Best practices

Introduction

Py++ has a rich interface and a lot of functionality. A rich interface often helps, but it can also be confusing. This document describes how to use Py++ effectively.

Big projects

Definition

First, let me define a “big project”: a project with a few hundred header files. Py++ was created to generate Python bindings for such projects. If you take a look here you will find a few such projects.

Tips

  • Create one header file, which will include all project header files.

    Doing it this way means CastXML is called only once, which removes the overhead of passing every file to CastXML individually. Otherwise CastXML would have to run hundreds of times, and each run would end up processing much of the same common code anyway. This approach cuts CastXML processing time from multiple hours with gigabytes of caches to a couple of minutes with a reasonable cache size.
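
    The aggregate header does not have to be maintained by hand. The following is a minimal sketch of generating it, assuming all project headers live under one include root; the function name write_aggregate_header and the extension list are hypothetical and not part of Py++ or pygccxml.

```python
import os


def write_aggregate_header(include_root, output_path, extensions=(".h", ".hpp")):
    """Write one header that #includes every header found under include_root."""
    headers = []
    for dirpath, dirnames, filenames in os.walk(include_root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            if not name.endswith(extensions):
                continue
            # Never let the aggregate header include itself on a later run.
            if os.path.abspath(full_path) == os.path.abspath(output_path):
                continue
            # Include paths are written relative to include_root, using '/'.
            rel = os.path.relpath(full_path, include_root)
            headers.append(rel.replace(os.sep, "/"))
    with open(output_path, "w") as out:
        out.write("// Generated file: includes all project headers.\n")
        out.write("#pragma once\n")
        for header in headers:
            out.write('#include "%s"\n' % header)
    return headers
```

    The resulting file is then the only header handed to the module builder, with include_root on the include path.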

    You can read more about the different caches supported by pygccxml here. The module_builder_t.__init__ method takes a reference to an instance of a cache class, or None:

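    from pyplusplus.module_builder import module_builder_t
    from pygccxml.parser import file_cache_t
    # reuse parsed declarations between runs instead of invoking CastXML again
    mb = module_builder_t( ..., cache=file_cache_t( <<<path to project cache file>>> ), ... )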
    
  • A single header file will also improve the compilation time of the generated bindings.

    Once Py++ has generated the bindings, you have a lot of .cpp files to compile. The project you are working on is big, so projects that depend on it certainly take a long time to compile. The generated code depends on it too; moreover, it contains a lot of template instantiations, so compiling it can take a great deal of time. Allen Bierbaum investigated this problem. He found that most of the time is spent processing the headers, templates, and macros from the project and from the Boost library. He concluded that, to improve compilation speed, the user should be able to generate and control a precompiled header file, and he implemented an initial version of this functionality. After a short discussion, we agreed on the following interface:

    class module_builder_t( ... ):
        ...
        def split_module( self, directory_path, huge_classes=None, precompiled_header=None ):
            ...
    

    The precompiled_header argument can be None or a string containing the name of the precompiled header file, which will be created in the directory. Py++ will add to it the header files from the Boost.Python library and your own header files.

    What is the huge_classes argument for? It can be None or a list of references to class declarations. It is there to provide a solution to this error. Py++ will automatically split the generated code for the huge classes into several files:

    mb = module_builder_t( ... )
    ...
    my_big_class = mb.class_( 'my_big_class' )
    mb.split_module( ..., huge_classes=[my_big_class], ... )
    
    • Caveats

      Consider the following file layout:

      boost/
        date_time/
          ptime.hpp
          time_duration.hpp
          date_time.hpp //main header, which includes all other header files
      

      Py++ currently does not handle relative paths as input very well, so it is recommended that you use “os.path.abspath()” to convert the path of the header file to be processed into an absolute path:

      #the following code will expose nothing
      mb = module_builder_t( [ 'date_time/date_time.hpp' ], ... )

      #while this one will work as expected
      import os
      mb = module_builder_t( [ os.path.abspath('date_time/date_time.hpp') ], ... )
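
      Note that os.path.abspath() resolves a relative path against the current working directory, so the result depends on where the script is started. A minimal sketch of a helper that resolves against an explicit base directory instead is shown here; absolute_headers and base_dir are hypothetical names, not part of Py++.

```python
import os


def absolute_headers(headers, base_dir):
    """Return the header paths as absolute, normalized paths.

    Relative entries are resolved against base_dir rather than against the
    current working directory; absolute entries are only normalized.
    """
    resolved = []
    for header in headers:
        if not os.path.isabs(header):
            header = os.path.join(base_dir, header)
        resolved.append(os.path.abspath(header))
    return resolved
```

      With the layout above, base_dir would be the path of the boost directory and the list would contain 'date_time/date_time.hpp'; the returned list is what gets passed to the module builder.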
      
  • Keep the declaration tree small.

    When parsing the header files to build the declaration tree, there will also be the occasional “junk” declaration inside the tree that is not relevant to the bindings you want to generate. These extra declarations come from header files that were included somewhere in the header files that you were actually parsing (e.g. if that library uses the STL or OpenGL or other system headers then the final declaration tree will contain those declarations, too). It can happen that the majority of declarations in your declaration tree are such “junk” declarations that are not required for generating your bindings and that just slow down the generation process (reading the declaration cache and doing queries will take longer).

    To speed up your generation process, consider making the declaration tree as small as possible and storing only those declarations that have some influence on the bindings. Ideally, this is done as early as possible, and luckily CastXML provides an option that lets you reduce the number of declarations it stores in the output XML file. You can specify one or more declarations using the -fxml-start option (the GCC-XML spelling; the CastXML command line calls it --castxml-start), and only the sub-trees starting at the specified declarations will be written. For example, if you specify the name of a particular class, only this class and all its members will get written. Or, if your project already uses a dedicated namespace, you can simply use this namespace as a starting point, and all declarations stemming from system headers will be ignored (except for those declarations that are actually used within your library).

    In the pygccxml package you can set the value for the -fxml-start option using the start_with_declarations attribute of the pygccxml.parser.config_t object that you are passing to the parser.
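
    To illustrate the effect of choosing starting declarations, here is a toy model of a declaration tree and of pruning it to the named sub-trees. It is only a sketch: the Declaration class and both functions are hypothetical, it does not use pygccxml, and it does not model the rule that declarations used by the kept sub-trees are retained as well.

```python
class Declaration:
    """A toy declaration node: a name plus nested child declarations."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def count(decl):
    """Count a declaration together with everything nested inside it."""
    return 1 + sum(count(child) for child in decl.children)


def find_subtrees(root, start_names, prefix=""):
    """Return the sub-trees whose qualified name is listed in start_names."""
    found = []
    for child in root.children:
        qualified = prefix + "::" + child.name if prefix else child.name
        if qualified in start_names:
            found.append(child)  # keep the whole sub-tree, do not descend
        else:
            found.extend(find_subtrees(child, start_names, qualified))
    return found


# A global namespace with system declarations and one project namespace.
tree = Declaration("::", [
    Declaration("std", [Declaration("vector"), Declaration("string")]),
    Declaration("mylib", [Declaration("widget", [Declaration("draw")])]),
])
kept = find_subtrees(tree, {"mylib"})
print(count(tree), sum(count(d) for d in kept))  # total vs. kept declarations
```

    In a real project the system part of the tree is usually far larger than the project part, which is where the savings in cache size and query time come from.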

  • Use the Py++ repository of MD5 sums of generated files.

    Py++ can store the MD5 sums of the generated files in a file. The next time you generate code, Py++ compares the newly generated content against the stored sum, instead of loading the previously generated file from disk and comparing against its content.

    mb = module_builder_t( ... )
    ...
    my_big_class = mb.class_( 'my_big_class' )
    mb.split_module( ..., use_files_sum_repository=True )
    

    Py++ will create a file named “<your module name>.md5.sum” in the directory where it generates the files.

    Enabling this functionality should give you a 10-15% performance boost.

    • Caveats

      If you manually change some of the generated files, don’t forget to delete the relevant lines from the “md5.sum” file. You can also delete the whole file. If the file is missing, Py++ falls back to the plain old method of comparing the content of the files. It will not rewrite “unchanged” files, and you will not be forced to recompile the whole project.
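
      The idea behind the sum repository, including the fallback and the reason for the caveat, can be sketched as follows. This is an illustration under stated assumptions, not the Py++ implementation: write_if_changed is a hypothetical name, and the sums dictionary stands in for the contents of the “md5.sum” file.

```python
import hashlib
import os


def write_if_changed(path, content, sums):
    """Write content to path only when it differs from the previous run.

    sums maps file names to the MD5 hex digest recorded on the previous run.
    Returns True when the file was (re)written.
    """
    digest = hashlib.md5(content.encode("utf-8")).hexdigest()
    name = os.path.basename(path)
    if name in sums:
        # Fast path: compare against the recorded sum, the file is not read.
        unchanged = sums[name] == digest and os.path.exists(path)
    elif os.path.exists(path):
        # No recorded sum: fall back to comparing the file content itself.
        with open(path, "r", encoding="utf-8") as existing:
            unchanged = existing.read() == content
    else:
        unchanged = False
    sums[name] = digest
    if unchanged:
        return False
    with open(path, "w", encoding="utf-8") as out:
        out.write(content)
    return True
```

      Because the fast path never reads the file, a manual edit goes unnoticed for as long as the recorded sum still matches the generated content, which is why the relevant line has to be deleted after editing a file by hand.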