First of all, let me define “big project”. A “big project” is a project with a few hundred header files. Py++ was born to create Python bindings for such projects. If you take a look here you will find a few such projects.
Create one header file, which will include all project header files.
Doing it this way means CastXML is called only once, which avoids the overhead of passing it every file individually. Otherwise CastXML would have to run hundreds of times, and each call would end up parsing much of the same common code anyway. This approach reduces the CastXML processing time from multiple hours with gigabytes of caches to a couple of minutes with a reasonable cache size.
    from pyplusplus.module_builder import module_builder_t
    from pygccxml.parser import file_cache_t

    mb = module_builder_t( ..., cache=file_cache_t( <<<path to project cache file>>> ), ... )
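The single header file described above can be generated by a small script. The following is only a sketch: the function name write_master_header, the default list of extensions, and the use of "#pragma once" are assumptions made for illustration and are not part of Py++.

```python
import os


def write_master_header(project_root, output_path, extensions=(".h", ".hpp")):
    """Write a header that #includes every header found under project_root.

    Returns the include paths (relative to project_root) that were written.
    """
    output_abs = os.path.abspath(output_path)
    includes = []
    for dirpath, dirnames, filenames in os.walk(project_root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            if not name.endswith(tuple(extensions)):
                continue
            full = os.path.join(dirpath, name)
            if os.path.abspath(full) == output_abs:
                continue  # never include the generated header in itself
            rel = os.path.relpath(full, project_root)
            includes.append(rel.replace(os.sep, "/"))

    with open(output_path, "w") as out:
        out.write("// Generated file: includes all project headers.\n")
        out.write("#pragma once\n")
        for inc in includes:
            out.write('#include "%s"\n' % inc)
    return includes
```

The generated file can then be passed to the module builder as the only header, with the project root on the include path so that the relative #include lines resolve.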
A single header file will also improve the compile time of the generated bindings.
When Py++ has generated the bindings, you have a lot of .cpp files to compile. The project you are working on is big, so compiling anything that depends on it surely takes a lot of time. The generated code also depends on it; moreover, this code contains a lot of template instantiations, so compiling it can take a great deal of time. Allen Bierbaum investigated this problem. He found that most of the time is actually spent processing all the headers, templates, and macros from the project and from the Boost library. He came to the conclusion that, in order to improve compilation speed, the user should be able to control (that is, to generate) a precompiled header file. He implemented an initial version of the functionality. After a short discussion, we agreed on the following interface:
    class module_builder_t( ... ):
        ...
        def split_module( self, directory_path, huge_classes=None, precompiled_header=None ):
            ...
The precompiled_header argument can be None or a string that contains the name of the precompiled header file, which will be created in the directory. Py++ will add to it the header files from the Boost.Python library and your header files.
The huge_classes argument can be None or a list of references to class declarations. It is there to provide a solution to this error. Py++ will automatically split the generated code for the huge classes into a few files:
    mb = module_builder_t( ... )
    ...
    my_big_class = mb.class_( 'my_big_class' )
    mb.split_module( ..., huge_classes=[my_big_class], ... )
Consider the following file layout:
    boost/
        date_time/
            ptime.hpp
            time_duration.hpp
            date_time.hpp  //main header, which includes all other header files
Py++ currently does not handle relative paths as input very well, so it is recommended that you use “os.path.abspath()” to transform the header file to be processed into an absolute path:
    import os

    #the following code will expose nothing
    mb = module_builder_t( [ 'date_time/date_time.hpp' ], ... )

    #while this one will work as expected
    mb = module_builder_t( [ os.path.abspath('date_time/date_time.hpp') ], ... )
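When several headers are passed, the same conversion can be applied to all of them in one place. This is a minimal sketch; the helper name absolute_headers and its base_dir parameter are hypothetical and not part of Py++.

```python
import os


def absolute_headers(headers, base_dir=None):
    """Return absolute, normalized paths for the given header files.

    Relative paths are resolved against base_dir (default: the current
    working directory); paths that are already absolute are kept.
    """
    base = os.path.abspath(base_dir or os.getcwd())
    return [os.path.normpath(os.path.join(base, h)) for h in headers]
```

Resolving against an explicit base_dir, for example the directory of the generator script, makes the result independent of the directory the script happens to be started from.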
Keep the declaration tree small.
When parsing the header files to build the declaration tree, there will also be the occasional “junk” declaration inside the tree that is not relevant to the bindings you want to generate. These extra declarations come from header files that were included somewhere in the header files that you were actually parsing (e.g. if that library uses the STL or OpenGL or other system headers then the final declaration tree will contain those declarations, too). It can happen that the majority of declarations in your declaration tree are such “junk” declarations that are not required for generating your bindings and that just slow down the generation process (reading the declaration cache and doing queries will take longer).
To speed up your generation process you might want to make the declaration tree as small as possible and only store those declarations that actually influence the bindings. Ideally, this is done as early as possible, and luckily CastXML provides an option that reduces the number of declarations it stores in the output XML file. You can specify one or more declarations using the -fxml-start option (spelled --castxml-start by CastXML itself; -fxml-start is the older GCC-XML spelling), and only the sub-trees starting at the specified declarations will be written. For example, if you specify the name of a particular class, only this class and all its members will be written. Or, if your project already uses a dedicated namespace, you can simply use this namespace as a starting point and all declarations stemming from system headers will be ignored (except for those declarations that are actually used within your library).
In the pygccxml package you can set the value for the -fxml-start option using the start_with_declarations attribute of the pygccxml.parser.config_t object that you are passing to the parser.
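As a rough illustration, the parser settings can be collected in one place before the configuration object is created. The helper parser_config_kwargs is hypothetical, and the commented-out call assumes that the configuration class accepts start_with_declarations and include_paths as keyword arguments; check this against the pygccxml version you use.

```python
def parser_config_kwargs(start_declarations, include_paths=()):
    """Collect keyword arguments for the pygccxml parser configuration.

    start_declarations: qualified names (classes or namespaces) at which
    the XML generator should start writing declarations.
    """
    names = [name for name in start_declarations if name]
    if not names:
        raise ValueError("at least one starting declaration is required")
    return {
        "start_with_declarations": names,
        "include_paths": list(include_paths),
    }


kwargs = parser_config_kwargs(["date_time"], include_paths=["/path/to/boost"])

# Assumed usage (requires pygccxml to be installed):
# from pygccxml import parser
# config = parser.config_t(**kwargs)
```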
Use the Py++ repository of md5 sums of generated files.
Py++ is able to store the md5 sums of generated files in a file. The next time you generate code, Py++ will compare the generated file content against the stored sum, instead of loading the content of the previously generated file from disk and comparing against it.
    mb = module_builder_t( ... )
    ...
    my_big_class = mb.class_( 'my_big_class' )
    mb.split_module( ..., use_files_sum_repository=True )
Py++ will generate a file named “<your module name>.md5.sum” in the directory where it generates all the files.
Enabling this functionality should give you a 10-15% performance boost.
If you manually changed some of the files, don’t forget to delete the relevant line from the “md5.sum” file. You can also delete the whole file. If the file is missing, Py++ will fall back to the old, plain method of comparing file contents. It will not rewrite “unchanged” files, and you will not be forced to recompile the whole project.
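The idea behind the repository can be sketched in a few lines of plain Python. This is not the Py++ implementation: the function names and the one-line-per-file "<digest> <file name>" format are assumptions made for illustration only.

```python
import hashlib
import os


def load_sums(repo_path):
    """Read "<digest> <file name>" lines; a missing file yields an empty dict."""
    sums = {}
    if os.path.exists(repo_path):
        with open(repo_path) as repo:
            for line in repo:
                digest, _, name = line.strip().partition(" ")
                if name:
                    sums[name] = digest
    return sums


def write_if_changed(directory, name, content, sums):
    """Write content only if its md5 differs from the recorded one.

    Returns True when the file was (re)written. The old file is never read;
    only the stored digest is compared.
    """
    digest = hashlib.md5(content.encode("utf-8")).hexdigest()
    path = os.path.join(directory, name)
    if sums.get(name) == digest and os.path.exists(path):
        return False  # unchanged: keep the file and its timestamp
    with open(path, "w", encoding="utf-8") as out:
        out.write(content)
    sums[name] = digest
    return True


def save_sums(repo_path, sums):
    """Store the digests, one file per line, sorted by file name."""
    with open(repo_path, "w") as repo:
        for name in sorted(sums):
            repo.write("%s %s\n" % (sums[name], name))
```

Because an unchanged file is never rewritten, its timestamp stays the same and the build system does not recompile it. The sketch also shows why a manually edited file needs its line removed: the stored digest describes the generated content, not what is currently on disk.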