Architecture

Introduction

This document will describe an architecture behind Py++.

Py++ & pygccxml integration

C++

C++ is very powerful programming language. The power brings complexity. It is not an easy task to parse C++ source files and to create in memory representation of declarations tree. The declarations tree is worth nothing, if a user is not able to explorer it, to run queries against it or to find out traits of a declaration or a type.

On the earlier stage of the development, I realized, that all this functionality does not belong to code generator and should be implemented out side of it. pygccxml project was born. pygccxml made the code generator to be smaller and C++ parser independent. It provides the following services:

  • definition of classes, that describe C++ declaration and types, and their analyzers ( type traits )
  • C++ source files parsing and caching functionality

Py++ uses those services to:

  • extract declarations from source files and to provide powerful query interface
  • find out a declaration default configuration:
    • call policies for functions
    • indexing suite parameters
    • generate warnings/hints

Integration details

Py++ uses different approaches to expose these services to the user.

Parsing integration

Py++ provides it’s own “API” to configure pygccxml parsing services. The “API” I am talking about, is arguments to module_builder.__init__ method. We think, that exposing those services via Py++ simplifies its usage.

Declarations tree integration

Declarations tree API consists from 3 parts:

  • interface definition:
    • declaration_t and all classes that derive from it
    • type_t and all classes that derive from it
  • type traits
  • query engine API

The user should be familiar with these parts and relevant API. In my opinion, wrapping or hiding the API will not provide an additional value. The interface of all those services is pretty simple and well polished.

Before I explain how these services are integrated, take a look on the following source code:

mb = module_builder_t( ... )

details = mb.namespace( 'details' )
details.exclude()

my_class = mb.class_( 'my_class' )
my_class.rename("MyClass")

What you see here, is a common pattern, that will appear in all projects, that use Py++:

  • find the declaration(s)
  • give the instruction(s) to the code generator engine

What is the point of this example? From the user point of view it is perfectly good, it makes a lot of sense to configure the code generation engine, using the declarations tree. How does Py++ add missing functionality to pygccxml.declarations classes? There were few possible solutions to the problem. The following one was implemented:

  1. pygccxml.parser package interface was extended. Instead of creating a concrete instance of declaration classes, pygccxml.parser package uses a factory.
  2. pyplusplus.decl_wrappers package defines classes, which derive from pygccxml.declarations classes and defines the factory.

The implemented solution is not the simplest one, but it provides an additional value to the project:

  • the code generation engine configuration and declarations tree are tightly coupled
  • the functionality provided by pygccxml.declarations and pygccxml.parser packages is available for pyplusplus.decl_wrappers classes
  • classes defined in pyplusplus.decl_wrappers package implement the following functionality:
    • setting reasonable defaults for the code generation engine( call policies, indexing suite, … )
    • provides user with additional information( warnings and hints )
  • as a bonus, pygccxml remained to be stand-alone project

Code generation engine

Code generation for Boost.Python library is a difficult process. There are two different problems the engine should solve:

  • What code should be created in order to export a declaration?

  • How it should be written to files?

    Remember, Py++ is targeting big projects. It cannot generate all code in one file - this will not work, not at all.

Code creators and file writers provides solution for both problems.

Code creators

Do you know how many ways exist to export member function? If you will try to answer the question, consider the following function characteristics and their mix:

  • virtuality( non virtual, virtual or pure virtual )
  • access level( public, protected or private )
  • static\non static
  • overloads

As you see, there are a lot of use cases. How do code creators solve the problem?

Definition

Code creator is an in-memory fragment of a C++ code.

Also, code creator can represent an arbitrary C++ code, in practice it represents logically complete block.

Example of code creators:

  • code_creators.enum_t generates registration code for an enumeration
  • code_creators.mem_fun_pv_t generates registration code for public, pure virtual function
  • code_creators.mem_fun_pv_wrapper_t generates declaration code for public, pure virtual function
  • code_creators.include_t generates include directives
  • code_creators.custom_text_t adds some custom( read user ) text\code to the generated code

There are primary two groups of code creators: declaration based and others.

Declaration based code creator keeps reference to the declaration ( pyplusplus.decl_wrapper.* class instance ). During code generation process, it reads its settings( the code generation engine instructions ) from the declaration. Declaration based code creators also divided into two groups. The first group creates registration code, where the second one creates wrapper\helper declaration code.

I will reuse this example, from Boost.Python tutorials.

  1. BaseWrap::f, BaseWrap::default_f - declaration code is created by code_creators.mem_fun_v_wrapper_t
  2. f registration code is created by code_creators.mem_fun_v_t. This code creator also keeps reference to the relevant instance of code_creators.mem_fun_v_wrapper_t class.

Composite code creator is a creator, which contains other creators. Composite code creator embeds the code, created by internal code creators, within the code it creates. For example:

  • code_creators.class_t:

    First of all it creates class registration code ( class_<...> ), after this it appends to it code generated by internal creators.

  • code_creators.module_body_t:

    Here is “cut & paste” of the relevant code from the source file:

    def _create_impl(self):
        result = []
        result.append( "BOOST_PYTHON_MODULE(%s){" % self.name )
        result.append( compound.compound_t.create_internal_code( self.creators ) )
        result.append( "}" )
        return os.linesep.join( result )
    

Code creators tree

code_creators.module_t class is a top level code creator. Take a look on the following possible “snapshot” of the code creators tree:

<module_t ...>
    <license_t ...>
    <include_t ...>
    <include_t ...>
    <class_wrapper_t ...>
        <mem_fun_v_wrapper_t ...>
        <mem_fun_v_wrapper_t ...>
    <module_body_t ...>
        <enum_t ...>
        <class_t ...>
            <mem_fun_v_t ...>
            <member_variable_t ...>
        <free_function_t ...>
        <...>

You can think about code creators tree as some kind of AST.

Code creators tree construction

pyplusplus.creators_factory package is responsible for the tree construction. pyplusplus.creators_factory.creator_t is the main class of the package. It creates the tree in few steps:

  1. It builds set of exposed declarations.
  2. It sort the set. Boost.Python has few rules, that forces the user to export a declaration before another one.
  3. It creates code creators and put them into the right place within the tree.
  4. If a declaration describes C++ class, it applies these steps to it.

Another responsibility of creator_t class, is to analyze declarations and their dependency graphs. As a result, this class can:

  • find out a class HeldType
  • find out smart pointers conversion, which should be registered
  • find out STD containers, which should be exported
  • warn user, if some declaration is not exported and it used somewhere in exported declarations ( not implemented )

File writers

File writers classes are responsible for writing code creators tree into the files. Py++ implements the following strategies of writing code creators tree into files:

The more sophisticated approach, the better understanding of code creators is required from the file writers.

module_builder package

This package provides an interface to all code generator engine services.

Conclusion

It safe to use Py++ for big and small projects!