Routing Celery tasks for simple prioritization

Like most businesses, where I work we need to send lots of notifications to our users, mainly emails and push notifications; the setup is quite simple:

  • a service accepts requests to send notifications to users
  • the notification service prepares the message and puts it in a queue
  • a pool of workers fetches messages from the queue and performs the actual delivery

This works reasonably well and we can scale the service by increasing the number of instances of the notification service and of the delivery workers.

This setup is also used when a user requests an export of their historical data; since this process can take a while, a background job fetches the data, generates a PDF and sends it via email using the notifications service. At the same time we generate hundreds of thousands of notifications, usually in the morning, and these fill up the notification queue, so if a user requests an export during this time frame their email will have to wait a long time before it can be processed.

Solution

We have evaluated a couple of solutions:
  • per message priority
  • dedicated low priority queue for high volume automatically generated notifications, using task routing

The first solution is the more generic one, but as far as we have seen it is not easy to guarantee that a high priority message will be delivered within our desired time frame. We opted for the second solution because we don't need a fine grained prioritization system (maybe in the future), just a way to keep delivering user generated notifications while we are sending our automated high volume notifications during the morning.

Implementation

Our stack is based on Celery and it is composed mainly of two parts:
  • the notifications service, which sends messages to a queue
  • a pool of workers that fetch messages from the queue and deliver the notifications

To achieve our goal we only had to change the way the notifications service sends messages to the queue, by specifying the low or default priority queue based on the message type, and run a dedicated pool of workers bound to each priority queue.

Example code with routing:

from celery import Celery

celery = Celery()
celery.main = 'Test app'
celery.config_from_object('celeryconfig')  # broker settings live in celeryconfig.py
celery.autodiscover_tasks(
    ['tasks'],
    force=True
)


@celery.task
def task(prio, message):
    print(f'{prio}: {message}')


# a trivial message source standing in for the generator used in the example repo
generator = (f'message {i}' for i in range(10))

# calling the task specifying the queue
task.apply_async(args=('default', next(generator)),
                 queue='default')
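
With a recent Celery version the same result can also be obtained through the routing configuration instead of passing queue= on every call; a minimal sketch (the task name matches the example above and the setting would live in celeryconfig):

# in celeryconfig.py
task_routes = {
    'tasks.task': {'queue': 'low_priority'},
}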

How to run a worker specifying the queue:
$ celery -A tasks worker --loglevel=INFO -Q default

This solution works but there is an efficiency problem: when the low priority queue is empty, the low priority workers sit idle, wasting precious (and paid for) resources. Fortunately there is a simple fix, because it is possible to specify more than one queue with the -Q parameter. This way we have a dedicated pool of workers that only handles messages generated by user activity, and a second pool that handles the automated messages and, once those are finished, helps with the default priority messages; see the example below.
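
For example (assuming the two queues are named "default" and "low_priority"), the two pools could be started like this:

# workers dedicated to user generated notifications
$ celery -A tasks worker --loglevel=INFO -Q default

# workers that consume the automated notifications and also help with the default queue
$ celery -A tasks worker --loglevel=INFO -Q low_priority,default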

An example implementation is provided in this repo with instructions to run the different use cases.

P.S. We are hiring!

Cython and C pointers

Cython is a powerful tool that can speed up your Python code or help you quickly create an extension module for a library that has C bindings.

This post is not a Cython tutorial, but is about a problem I encountered during the development of a wrapper for the Wakaama LWM2M library; if you want to learn more about Cython please refer to its documentation.

During the development of said wrapper I ran into a nasty, strange bug: at some point an object referenced by a pointer kept on the C library side seemed to change its value! For example, think about this situation:

  • you pass an object to a certain C function
  • the object's pointer is stored internally by the C library
  • you retrieve the object from the C library but its content is not the same...it's not even an instance of the same class!

To reproduce this behaviour please have a look at overwrite.py in my test project.

At first I suspected memory corruption caused by the C library code; this wrong assumption cost me at least two days of messing with gdb (which saved my life in another memory related bug) without gaining any useful insight.

I don't remember exactly how I spotted the problem; I probably just tried everything until something pointed me in the right direction.

What seems to happen here is something like this:

  1. create an object and assign it to a variable A
  2. store its pointer in the C library
  3. get back the pointer of the object; it will be the same object as referenced by A
  4. assign a new object to A
  5. create an object and assign it to a variable B
  6. get back the pointer of the object; it will NOT be the same object as the one referenced by A, but the one referenced by B

When variable A starts referencing the new object, the old one is no longer referenced by anyone in the Python world, so its memory becomes available to store the new object assigned to B; the pointer on the C side now points to what is referenced by B.
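
A pure Python sketch of the same reuse behaviour, using id() as a stand-in for the raw pointer kept on the C side (the class names are made up and the memory reuse is a CPython implementation detail, so the result is not guaranteed):

class Foo:
    pass

class Bar:
    pass

a = Foo()
stored = id(a)          # the "pointer" kept on the C side; no reference is held
a = Foo()               # rebind A: nothing references the first Foo anymore
b = Bar()               # CPython will likely reuse the freed memory block
print(stored == id(b))  # often prints True: the stored address now belongs to b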

It may seem obvious, but when it happened to me it was inside a callback originating in the C code and handled by a function defined in the Cython context.

Lesson learned

When making two different languages work together nothing is trivial, especially when you are wrapping some code that you don't really know and that can bite you in the ass as soon as you think "what can go wrong?"

The joy of contributing to open source/free software

From time to time I like to give back to the community with small patches or by trying to solve issues; sometimes my contributions come from bugs found while implementing some kind of software, and sometimes I just notice something that can be fixed or improved.

Usually the process is a kind of cold, synthetic handshake between my code and whatever (D)VCS system is used to accept my work, but sometimes you find a real person on the other side, and this is especially true at Mozilla.

In my opinion Mozilla's infrastructure and people are key factors in welcoming new developers; there are clear documents about the processes, which the team will point you to, and there are real people that will help you until you master the procedure.

What are the takeaways of contributing?

First of all you are forced to read someone else's code; if you work alone or your employer does not enforce code reviews this is really helpful, because you are probably going to learn a lot from other sources.

Similarly, when contributing to other people's code you are forced to add unit tests to your patches, otherwise your code will not be accepted; this is ideal in situations where, in your day to day work, tests are seen as useless nerd goodies.

When contributing to bigger projects there is usually a strict procedure that must be followed; this is not just bureaucracy, it must be seen as the backbone of the development/deployment/release process. Knowing that any regression "shall not pass" is a huge boost to your confidence while creating your patch.

And finally, people! I think that is the most amazing part of the work; you'll get to know people that will teach you something (or maybe the opposite), and you will have to communicate effectively to make your point or just to understand what problem you are going to fix, which improves your communication skills; oh, and let's not forget the joy of talking to nice people :)

Conclusions

Contributing can be a beautiful and useful experience for beginner, intermediate and senior developers alike; everyone can learn something while doing useful work for the community (which in turn will provide better tools for yourself too). Even if the free time you have at hand is only a couple of hours a week, I suggest trying at least to document some software you use or fix that annoying bug that is driving you crazy, you will thank yourself!

P.S.

I'd like to thank Andrew Halberstadt for his help and his patience while working on some issues on Mozilla Central repo/bugzilla, thank you Andrew!

cx_Freeze, again

It's been a long time since the last post and I surely was not thinking about another post on cx_Freeze. I have spent this weekend trying (and in the end succeeding) to build an update to an old piece of software. For future reference I'll briefly write down the steps needed to get a functional build without losing my mind.

Brief description of the problem

The software is a desktop application used to upload email attachments to a web based data store; the application talks to the web app using a "RESTy" API and fetches the files from an email address; the GUI is written with the beautiful PyQT5 (someone may prefer PySide) and usually this dependency is a pain to work with, but I have to admit that the newer version installs gracefully.

OS version and things to download

The person using this software needs to run it on a Windows OS, and at this time Windows 7 is the oldest version of the OS I want to support (no Vista, please), since targeting Windows 10 would produce an executable with no backward compatibility and Windows 8 has fewer installs than 7.

The first piece of software to download is obviously the Python 3.5 interpreter, and after its installation came the first surprise: it didn't run because of some missing DLLs; after some research I downloaded the redistributable runtime and it started to work as expected.

After Python was ready I had to install the pip utility, which was not installed automatically; maybe I forgot to check the corresponding option in the installer. Anyway, everything installed correctly using it; this is a brief list of dependencies:

  • requests: OK
  • paramiko: OK
  • imapclient: OK
  • PyQT: (surprisingly!) OK
  • cx_Freeze: Not ok at all..

cx_Freeze requires the Windows build tools, which can be freely (as in beer) downloaded and installed, but even after that I was not able to install it from pip; fortunately some kind people provide a precompiled wheel for it, but be sure to download the 5.0 version because the 4.x one was not able to produce a proper executable.

Additional dependencies and some setup.py notes

Since I had not touched the setup script I thought it was still somewhat correct, given that the last build was successful; nothing could be further from the truth. The first thing, though, was a nice surprise: cx_Freeze correctly recognized all the DLLs to include in the final package, so referencing additional DLLs is not needed anymore, good work cx_Freeze guys!

After the first build I started a cycle of try, fix, retry until the application ran as expected; here is a list of additional dependencies that I had to install and reference in the setup.py file:

  • appdir
  • packaging
  • atexit

appdir and atexit only need to be referenced as packages, while packaging requires some more fine tuning, so I had to add these additional sub-packages to the includes setting of the build_exe_options dictionary (see the sketch after the list):

  • packaging.version
  • packaging.specifiers
  • packaging.requirements
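
Putting it together, a minimal sketch of the relevant part of setup.py (the application name, script and base are placeholders; the package and include lists are the ones above):

from cx_Freeze import setup, Executable

build_exe_options = {
    "packages": ["appdir", "atexit"],
    "includes": [
        "packaging.version",
        "packaging.specifiers",
        "packaging.requirements",
    ],
}

setup(
    name="my_app",
    version="1.0",
    options={"build_exe": build_exe_options},
    executables=[Executable("main.py", base="Win32GUI")],
)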

Final words

It took me a couple of hours of trial and error to be able to ship the build and I hope not to have to repeat this madness again soon; if I need to create a new build in the future I hope that this little post will help me not waste my time again.

PyQT5 and cx_Freeze Windows™ target tips

One of my favourite benefits of Python is its portability, and coupled with good libraries such as PyQT5 the possibilities are endless.

Now imagine you created a new shiny desktop application and you want to distribute it, how can you package the app for easy installation?

Say hello to cx_Freeze

cx_Freeze is a tool that creates a standalone version of your application so you can easily distribute it, and the end user doesn't have to:

  • download your source code
  • install a Python interpreter
  • set up pip or easy_install, libraries and so on

Another nice feature of cx_Freeze is that it can create an installer for many different operating systems giving your application a more "professional" look.

I strongly suggest giving it a try; using one of the sample setup scripts should be enough to get started.

Some minor issues

A simple application which only depends on pure Python packages usually will not give you any headaches, but if you have dependencies like PyQT5 a bit of attention is required. In my case not all the required DLLs were included in the installer package, which generated a strange error message that was very hard to debug; thanks to sites like StackOverflow I found a nice fix for it. It is worth noting that the linked solution is not (always?) enough, but there is a quick fix (at least in my case): add the file libegl.dll to the "include_files" cx_Freeze build options, as in the sketch below.
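
A minimal sketch of that option (assuming libegl.dll sits next to setup.py; all other options are omitted):

build_exe_options = {
    "include_files": ["libegl.dll"],
    # ...the rest of the build options...
}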

How to test your shiny new installer

In order to test your installer and be sure that all the DLLs are included and your application is not "cheating" on you by using system DLLs, I suggest creating a clean Windows installation inside a virtual machine; this way you can test your installer in a real case scenario and fix your build scripts accordingly.

Logging model changes with SQLAlchemy listeners

Scenario

Imagine you have a slew of models in your application and at some point you feel the need to log the creation, modification or deletion of data belonging to these models somewhere. How to proceed without having to modify the classes one by one?

What SQLAlchemy offers

SQLAlchemy (http://sqlalchemy.org) offers a couple of interesting mechanisms: the first is the possibility to hook into event listeners such as before_insert, before_update, before_delete and the corresponding after_*. Additional help comes from the opportunity to work on a model after its definition by overriding the method __declare_last__. Using these facts, and assuming that you have defined a model named MyModel, if we wanted to intercept the "after_insert" event we could write the following code:

from sqlalchemy import event


class MyModel(Base):  # let's pretend Base is our declarative base and the columns are defined

    @staticmethod
    def after_insert(mapper, connection, target):
        # do some stuff
        pass

    @classmethod
    def __declare_last__(cls):
        event.listen(cls, "after_insert", cls.after_insert)

Whenever an object of class MyModel is inserted into the database, the after_insert method will be called, receiving as parameters the mapper of the model, the connection and the target, which is none other than the object that has just been inserted into the database.

If you are intercepting the creation or deletion of an object, it is sufficient to access its primary key to identify it in your log; but if we want to know which fields have been modified as a result of an update, with new and old values, it gets a little more complicated, but not too much. In fact SQLAlchemy allows us, quite easily, to check the status of the fields of an object using the function sqlalchemy.orm.attributes.get_history (http://docs.sqlalchemy.org/en/latest/orm/session.html#sqlalchemy.orm.attributes.get_history). This function is called for each field and returns an object of type History (http://docs.sqlalchemy.org/en/latest/orm/session.html#sqlalchemy.orm.attributes.History) on which we use the method has_changes() to check for changes and, if there were any, to get the new and old values of the field that we are analyzing, for example:

from sqlalchemy.orm.attributes import get_history

h = get_history(target, "a_field")
if h.has_changes():
    # do something using the h.deleted list to get the old values
    # do something using the h.added list to get the new values
    pass
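
As a sketch of how this could be used inside an after_update listener, here is a hypothetical helper that collects every modified column (the mapper is the one passed to the listener):

def collect_changes(mapper, target):
    # build a {field: (old value, new value)} dict for every modified column
    changes = {}
    for column in mapper.columns:
        h = get_history(target, column.key)
        if h.has_changes():
            old = h.deleted[0] if h.deleted else None
            new = h.added[0] if h.added else None
            changes[column.key] = (old, new)
    return changes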

LoggableMixin

Clearly doing this for all the models of an application may be costly in terms of time and code maintenance (and extremely annoying), so you might think about creating a generic mixin with which to extend the models of our application. Below is the skeleton of an implementation of such a mixin, omitting the details of where and how the logs are stored:

from sqlalchemy import event


class LoggableMixin(object):

    @staticmethod
    def after_insert(mapper, connection, target):
        # do some stuff for the insert
        pass

    @staticmethod
    def after_update(mapper, connection, target):
        # do some stuff for the update, maybe saving the changed field values using get_history
        pass

    @staticmethod
    def after_delete(mapper, connection, target):
        # do some stuff
        pass

    @classmethod
    def __declare_last__(cls):
        event.listen(cls, "after_insert", cls.after_insert)
        event.listen(cls, "after_update", cls.after_update)
        event.listen(cls, "after_delete", cls.after_delete)

So, for each model whose changes we want to log, it is sufficient to inherit from LoggableMixin:

class MyModel(SomeSuperClass, LoggableMixin):
  pass

Improvements

One of the first improvements you could make to the LoggableMixin class is splitting it into three different classes, e.g. LogInsertMixin, LogUpdateMixin and LogDeleteMixin; in my case I preferred to keep it all together given the small size of the class. A second improvement would be generalizing the mixin to allow specifying which functions (or methods) to assign to the different listeners; once more, the specific needs of the application I'm working on do not require this level of abstraction and can live well with this approach.

Conclusions

SQLAlchemy provides a number of facilities for working with models, and the system just described would not have been so easy to implement if it were not for the quality of SQLAlchemy's API. I invite anyone to dig deeper into the SQLAlchemy documentation (http://docs.sqlalchemy.org) because it holds gems of great value. For those wishing to see a concrete implementation of the topics discussed in this post, take a look at the file sysgrove/models.py in the repository at https://bitbucket.org/sysgrove/sysgrove

Our QT date picker

One of the first tasks I carried out right after joining SysGrove (http://sysgrove.com) was rewriting the date picker component of our application; the requirement for this component was that it be similar, at least feature-wise, to the date picker available in Windows 7™. The most interesting aspect of this component is the ability to do a sort of zoom on dates; let me explain with an example:

  • It starts as a classic date picker showing the days of the current month, with the classic arrows to change the month
  • Clicking the button between the arrows replaces the days with months, and the arrows now change the current year
  • Clicking that middle button again replaces the months with years, and the arrows now change the decade
  • Selecting a year brings back the month view and selecting a month brings back the day view (a minimal code sketch of this navigation follows the list)
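
A minimal, hypothetical PyQt5 sketch of the zoom navigation (placeholder pages stand in for the real day/month/year grids; the real widget lives in the repository mentioned at the end of this post):

from PyQt5.QtWidgets import (QApplication, QPushButton, QStackedWidget,
                             QVBoxLayout, QWidget)


class ZoomingPicker(QWidget):
    LEVELS = ("days", "months", "years")

    def __init__(self):
        super().__init__()
        self.views = QStackedWidget()
        for name in self.LEVELS:
            self.views.addWidget(QPushButton(name))  # placeholder page
        self.middle = QPushButton(self.LEVELS[0])    # the button between the arrows
        self.middle.clicked.connect(self.zoom_out)
        layout = QVBoxLayout(self)
        layout.addWidget(self.middle)
        layout.addWidget(self.views)

    def zoom_out(self):
        # each click switches day -> month -> year, like the Windows 7 picker
        nxt = min(self.views.currentIndex() + 1, self.views.count() - 1)
        self.views.setCurrentIndex(nxt)
        self.middle.setText(self.LEVELS[nxt])


if __name__ == "__main__":
    app = QApplication([])
    picker = ZoomingPicker()
    picker.show()
    app.exec_()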

Some images could help to understand the behaviour:

day selector

The day selector, with month changer arrows; if you click the middle button you'll see the month selector.

/images/qt-calendar/date-picker-month.png

The month selector, if you click on the middle button you'll see the year selector.

/images/qt-calendar/date-picker-year.png

The year selector.

Fortunately SysGrove is a company that believes strongly in open source and sharing, so I got permission to release this component as an open source project, nice! :)

At present the appearance of the component is not that great and you have to dig into the component's code to customize it; in the future I hope to allow customizing the appearance directly when creating an instance rather than in the base code.

For the moment the source of this component can be found in the main repository of our application (https://bitbucket.org/sysgrove/sysgrove) at the path sysgrove/ui/classes/widgets/calendar.py

Using buildbot at SysGrove

BuildBot (http://buildbot.net/) is a framework created to simplify the development of automation solutions in the field of software development. Unlike other systems such as Jenkins (http://jenkins-ci.org/) or TravisCI (https://travis-ci.org/), buildbot only provides a framework on which to build the infrastructure that you need. The project has a client-server architecture where the server sends the clients the work to be performed; the server is called master and the clients are called slaves. The framework supports multiple masters and multiple slaves, so the system architecture can easily grow with the needs of your project. Buildbot is available for virtually any platform, even if in the current 0.8.x version the win32/64 platform support is not exactly the best, though still usable with some changes (see http://trac.buildbot.net/wiki/RunningBuildbotOnWindows).

Buildbot at sysgrove

For our development needs at SysGrove (http://sysgrove.com) buildbot is used to generate the release of our application for the win32 platform. Every 12 hours we verify that there are changes in our repository (https://bitbucket.org/sysgrove/sysgrove) and, if so, the following operations are performed:

  • download the project from the repository (we use git, but the system supports hg, svn, bazaar etc...)
  • launch the test suite (in this case we use the nosetests discovery testing tool)
  • generate the executable using py2exe
  • build the initial databases (sqlite and postgres)
  • create the archive with the application executable and test database(s)
  • load the new release on "Confluence"

Clearly, doing all these steps manually is tedious and prone to errors or oversights that would decrease the quality of our product.

Our setup

As mentioned at the beginning, we (http://sysgrove.com/) need to create an executable for the win32 platform (and possibly, in the future, for linux and osx), so we need to run our scripts inside a win32 OS; given the relative difficulty of configuring a master on Windows, it was decided to have the master on a linux host (debian jessie) and a slave in a virtual machine with Windows 7™, since the slaves give far fewer problems.

Configuration

The configuration file, with a cfg extension, is a simple Python script; you can use this file to configure these main areas (a minimal sketch of such a file follows the list):

  • slaves: you can define how many slaves you want, each with its own identity
  • the schedulers: they indicate how and when the activities will be planned by the master
  • the builders: they indicate the steps to take to create a build
  • notifiers: the master can communicate the status of the build in many ways, such as irc, email or the web interface
  • various configurations such as the database used to store the state of the system, the identity of the project (name, url) etc.
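
A minimal, hypothetical master.cfg sketch for the 0.8.x series (slave name, password and step commands are placeholders, and a simple periodic scheduler stands in for our change-polling setup):

from buildbot.buildslave import BuildSlave
from buildbot.config import BuilderConfig
from buildbot.process.factory import BuildFactory
from buildbot.schedulers.timed import Periodic
from buildbot.steps.shell import ShellCommand
from buildbot.steps.source.git import Git

c = BuildmasterConfig = {}

# the slaves and the port they connect to
c['slaves'] = [BuildSlave("win32-slave", "password")]
c['slavePortnum'] = 9989

# the builder: checkout, tests, freeze
factory = BuildFactory()
factory.addStep(Git(repourl="https://bitbucket.org/sysgrove/sysgrove"))
factory.addStep(ShellCommand(command=["nosetests"]))
factory.addStep(ShellCommand(command=["python", "setup.py", "py2exe"]))
c['builders'] = [BuilderConfig(name="win32-release",
                               slavenames=["win32-slave"],
                               factory=factory)]

# run the builder every 12 hours
c['schedulers'] = [Periodic(name="every-12h",
                            builderNames=["win32-release"],
                            periodicBuildTimer=12 * 60 * 60)]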

Pitfalls

As I said earlier I had some problems with Windows, probably due to my inexperience with the version of Python for this operating system. One of the first difficulties concerned sqlalchemy, an ORM for Python: buildbot requires sqlalchemy == 0.7.10 while our application requires a newer version. Typically such a problem is easily solved using the handy virtualenv tool (https://pypi.python.org/pypi/virtualenv), creating one environment for the master (the part that needs access to a db, sqlite by default) and one for the slave, but given my experience with Windows I could not separate the environments (which is why the master is on a debian host and the slave is on a Windows VM).

Conclusions

Once you understand the general concepts of buildbot, extending and customizing its functionality simply becomes a matter of configuration and of creating the scripts that do the real work; besides, the default web interface lets us keep track of our builds and force new ones if needed.

Clearly this is not a "point and click" tool for everyone and, in my humble opinion, this is a good thing: the responsibility for maintaining such a system requires the knowledge of a typical IT department, not of a manager who, if desired, can always send us an email to request a build rather than messing around with it himself ;)

Having fun with RabbitMQ

Preface

Lately I've had to work on a complex web application which had started to get an increased (and increasing) number of users. Unfortunately this web application was not built to scale, so problems started to surface. When the user base was small to medium the load on the application server was fairly low and we could focus on adding new features and growing our user base, until we landed on the Facebook platform. At some point users started to come in and the scheduled tasks took 10-20-40-70-90 minutes to complete, leaving the users staring at the "calculating tasks" page.

The first approved (the correct word would be "imposed") solution was to migrate our servers to AWS so we could increase host performance using bigger and bigger servers, until we reached the limits of a single machine again.

After a lot of battling with the management we had the opportunity to detach the application core from the web site and started to build the new architecture with these loosely defined requirements:

  • better use of resources
  • better code organization and quality
  • horizontal scaling
  • manageability
  • ready for future development

I admit that those were my requirements, but the management understood that they had lost the ability to understand the architecture beyond a simple web site with a bunch of scheduled routines that, honestly, worked with the help of an infinite dose of luck…and also my prototype ran in a fraction of the time.

Building blocks

Our old backend was broken into these pieces:

  • libcore: the base code extracted from the old and stinky web site
  • RabbitMQ message broker
  • job scheduler: sends messages to the appropriate exchanges at the appropriate time
  • workers: consume the messages sent by the scheduler

Leaving libcore aside to do its stuff, let's focus on the scheduler and the workers; for details about RabbitMQ please refer to its web site (http://www.rabbitmq.com/).

As the scheduling library I chose the open source Quartz.NET (http://quartznet.sourceforge.net/), which provides a simple but powerful interface to schedule jobs; I especially liked the ability to specify job triggers with the well known and compact cron syntax. Long story short, the scheduled jobs harvest the tasks and send messages to the exchanges accordingly. I think that the whole core plus the jobs are not bigger than a few hundred lines of code. Wonderful!

In order to create a worker one simply has to inherit from a BaseConsumer class and override the "consume" method to do its task when a message is received. Done. Cool :)
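
BaseConsumer is part of our internal (.NET) code, but the pattern is easy to show; here is a rough Python/pika analogue of it (queue name, host and message fields are made up):

import json

import pika


class BaseConsumer:
    def __init__(self, queue, host="localhost"):
        self.queue = queue
        self.connection = pika.BlockingConnection(pika.ConnectionParameters(host))
        self.channel = self.connection.channel()
        self.channel.queue_declare(queue=queue, durable=True)

    def consume(self, message):
        raise NotImplementedError  # subclasses do the real work here

    def run(self):
        def on_message(channel, method, properties, body):
            self.consume(json.loads(body))  # messages are plain JSON objects
            channel.basic_ack(delivery_tag=method.delivery_tag)

        self.channel.basic_consume(queue=self.queue, on_message_callback=on_message)
        self.channel.start_consuming()


class ReportWorker(BaseConsumer):
    def consume(self, message):
        print("building report for", message["user_id"])


# ReportWorker("reports").run()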

Conclusion

Given this simple architecture, what are the benefits? Let's see:

  • scaling: I simply have to add more workers on the same or different machines and the computing power will increase accordingly
  • flexibility: if a new task is required I have to create a new scheduled job and its consumer (even in a different language or platform) and maybe a new message type (messages are simple JSON described objects); after everything is tested I have to update the scheduler and add a new worker, either by updating a running worker or adding a new one, without affecting the whole system; yes, it can be done while the system is crunching its tasks and yes, obviously, without affecting the web site(s)
  • cost effective: if we have a lot of tasks to complete at a certain hour of the day I can simply turn on a couple more machines when I need them and turn them off when the whole work is completed; if you have ever used the Amazon AWS infrastructure you understand how much one can save using this approach
  • testing: having everything well separated I can test each block individually instead of having a code blob that is barely manageable

With this short post I hope to increase interest in this simple way of doing distributed computing, using a message broker and a simple but well architected system.