Le blog de pingou


Tag - Python


Wednesday, June 29 2016

Profiling in python

While working on FMN's new architecture, I wanted to profile the application a little, to see where it spends most of its time.

I knew about the classic cProfile builtin in python, but it didn't quite fit my needs since I wanted to profile a very specific part of my code, preferably without refactoring it so that I could use cProfile.

Searching for a solution using cProfile (or something else), I ran into the PyCon presentation by A. Jesse Jiryu Davis entitled 'Python performance profiling: The guts and the glory'. It is a really interesting talk and, if you have not seen it, I would encourage you to watch it (it is on YouTube).

This talk presents yappi (Yet Another Python Profiling Implementation), written by Sümer Cip, together with some code that makes it easy to use and to write its output in a format compatible with callgrind (allowing us to use KCacheGrind to visualize the results).

To give you an example, this is how it looked before (without profiling):

    t = time.time()
    results = fmn.lib.recipients(PREFS, msg, valid_paths, CONFIG)
    log.debug("results retrieved in: %0.2fs", time.time() - t)

And this is the same code, integrated with yappi

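    import time

    import yappi

    yappi.set_clock_type('cpu')  # measure CPU time rather than wall-clock time
    t = time.time()
    yappi.start(builtins=True)   # also profile calls into builtin functions
    results = fmn.lib.recipients(PREFS, msg, valid_paths, CONFIG)
    yappi.stop()                 # only profile the call above
    stats = yappi.get_func_stats()
    stats.save('output_callgrind.out', type='callgrind')
    log.debug("results retrieved in: %0.2fs", time.time() - t)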

As you can see, it only takes a handful of extra lines to profile the function fmn.lib.recipients and dump the stats in the callgrind format.

And this is what the output looks like in KCacheGrind :)

[Screenshot: kcachegrind_fmn.png, the FMN profiling results viewed in KCacheGrind]
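Coming back to the `t = time.time()` / `log.debug(...)` timing lines used in both snippets above: if you find yourself repeating them around several calls, they can be wrapped in a small context manager. This is only a sketch; the name `timed` is mine, not something from FMN or yappi, and it uses `time.perf_counter()`, which is better suited than `time.time()` for measuring intervals:

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger(__name__)


@contextmanager
def timed(label):
    """Log (at DEBUG level) how long the wrapped block took, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so slow failures get logged too.
        log.debug("%s in: %0.2fs", label, time.perf_counter() - start)
```

The original three lines then become `with timed("results retrieved"):` followed by the indented call to fmn.lib.recipients, and they log the same kind of message.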

Saturday, June 25 2016

New FMN architecture and tests


Introduction

FMN is the FedMsg Notification service. It allows any contributor (or actually, anyone with a FAS account) to tune which notifications they want to receive and how.

For example, it lets you say things like:

  • Send me a notification on IRC for every package I maintain that has successfully built on koji
  • Send me a notification by email for every request made in pkgdb to a package I maintain
  • Send me a notification by IRC when a new version of a package I maintain is found

How it works

The principle is that anyone can log in to FMN's web UI, where they can create filters on a specific backend (mainly email or IRC) and add rules to each filter. These rules must either be validated or invalidated for the notification to be sent.

Then the FMN backend listens to all the messages sent on Fedora's fedmsg and for each message received, goes through all the rules in all the filters to figure out who wants to be notified about this action and how.
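To make it clearer why this is CPU intensive, here is a simplified sketch of that loop. The `Rule`, `Filter` and `match_recipients` names are hypothetical, not the actual fmn.lib API, and I assume that all the rules of a filter must pass, with a negated rule standing for one that must be invalidated:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Rule:
    """Hypothetical rule: a predicate on a message, optionally negated."""
    check: Callable[[dict], bool]
    negated: bool = False  # True: the rule must be *invalidated* to pass

    def passes(self, msg: dict) -> bool:
        return bool(self.check(msg)) != self.negated


@dataclass
class Filter:
    """Hypothetical filter: a delivery backend plus the rules guarding it."""
    backend: str  # e.g. "email" or "irc"
    rules: List[Rule] = field(default_factory=list)

    def passes(self, msg: dict) -> bool:
        # Assumption: every rule of the filter must pass.
        return all(rule.passes(msg) for rule in self.rules)


def match_recipients(prefs: Dict[str, List[Filter]], msg: dict) -> Dict[str, Set[str]]:
    """Return {backend: users} for one message by checking every
    rule of every filter of every user."""
    result: Dict[str, Set[str]] = {}
    for user, filters in prefs.items():
        for flt in filters:
            if flt.passes(msg):
                result.setdefault(flt.backend, set()).add(user)
    return result
```

Every message goes through every rule of every filter of every user, so the work per message grows with the number of users times their filters times their rules.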

The challenge

Today, computing who wants to be notified and how takes about 6 to 12 seconds per message and is really CPU intensive. This means that when an operation sends a few thousand messages on the bus (for example, mass-branching, or a packager who maintains a lot of packages orphaning them), the message queue grows and it can take hours or even days for a notification to be delivered, which can be problematic in some cases.

The architecture

This is the current architecture of FMN:

|                        +--------\
|                   read |  prefs | write
|                  +---->|  DB    |<--------+
|                  |     \--------+         |
|        +-----+---+---+            +---+---+---+---+   +----+
|        |     |fmn.lib|            |   |fmn.lib|   |   |user|
v        |     +-------+            |   +-------+   |   +--+-+
fedmsg+->|consumer     |            |central webapp |<-----+
+        +-----+  +---+|            +---------------+
|        |email|  |irc||
|        +-+---+--+-+-++
|          |        |
|          |        |
v          v        v

As you can see, the CPU-intensive part does not appear in this diagram, and that's because it is in fact integrated into the fedmsg consumer. This design, while making things easier, makes it practically impossible to scale when an event produces lots of messages. We multi-threaded the application as much as we could, but we quickly reached the limits imposed by the GIL.

To improve on this situation, we reworked the backend architecture as follows:

                                                     +-------------+
                                              Read   |             |   Write
                                              +------+  prefs DB   +<------+
                                              |      |             |       |
   +                                          |      +-------------+       |
   |                                          |                            |   +------------------+   +--------+
   |                                          |                            |   |    |fmn.lib|     |   |        |
   |                                          v                            |   |    +-------+     |<--+  User  |
   |                                    +----------+                       +---+                  |   |        |
   |                                    |   fmn.lib|                           |  Central WebApp  |   +--------+
   |                                    |          |                           +------------------+
   |                             +----->|  Worker  +--------+
   |                             |      |          |        |
fedmsg                           |      +----------+        |
   |                             |                          |
   |                             |      +----------+        |
   |   +------------------+      |      |   fmn.lib|        |       +--------------------+
   |   | fedmsg consumer  |      |      |          |        |       | Backend            |
   +-->|                  +------------>|  Worker  +--------------->|                    |
   |   |                  |      |      |          |        |       +-----+   +---+  +---+
   |   +------------------+      |      +----------+        |       |email|   |IRC|  |SSE|
   |                             |                          |       +--+--+---+-+-+--+-+-+
   |                             |      +----------+        |          |        |      |
   |                             |      |   fmn.lib|        |          |        |      |
   |                             |      |          |        |          |        |      |
   |                             +----->|  Worker  +--------+          |        |      |
   |                         RabbitMQ   |          |    RabbitMQ       |        |      |
   |                                    +----------+                   |        |      |
   |                                                                   v        v      v
   |
   |
   |
   v

The idea is that the fedmsg consumer listens to Fedora's fedmsg and puts the messages in a queue. These messages are then picked from the queue by multiple workers, which do the CPU-intensive task and put their results in another queue. The results are then picked from this second queue by a backend process that does the actual notification (sending the email, the IRC message).
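The flow between the three stages can be sketched in a few lines of Python. This is an illustration under simplifying assumptions: RabbitMQ is replaced by in-process `queue.Queue` objects and the workers are threads, whereas in the real setup they are separate processes (which is what gets around the GIL). `compute_recipients` and `send` are hypothetical callables standing in for the fmn.lib matching and the email/IRC delivery:

```python
import queue
import threading

STOP = object()  # sentinel telling a stage that no more items are coming


def consumer(messages, work_q):
    """Stand-in for the fedmsg consumer: only enqueues, no heavy work."""
    for msg in messages:
        work_q.put(msg)


def worker(work_q, results_q, compute_recipients):
    """Stand-in for a worker: runs the CPU-heavy matching per message."""
    while True:
        msg = work_q.get()
        if msg is STOP:
            results_q.put(STOP)  # tell the backend this worker is done
            return
        results_q.put((msg, compute_recipients(msg)))


def backend(results_q, send, n_workers):
    """Stand-in for the backend: delivers until every worker has stopped."""
    stopped = 0
    while stopped < n_workers:
        item = results_q.get()
        if item is STOP:
            stopped += 1
            continue
        msg, recipients = item
        for user in recipients:
            send(user, msg)


def run_pipeline(messages, compute_recipients, send, n_workers=4):
    work_q, results_q = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(work_q, results_q, compute_recipients))
        for _ in range(n_workers)
    ]
    delivery = threading.Thread(target=backend, args=(results_q, send, n_workers))
    for t in workers + [delivery]:
        t.start()
    consumer(messages, work_q)
    for _ in workers:
        work_q.put(STOP)  # one sentinel per worker
    for t in workers + [delivery]:
        t.join()
```

Adding capacity then simply means starting more workers reading from the same queue, which is what the tests below measure.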

We also added an SSE component to the backend, something we want for fedora-hubs, but it still needs to be written.

Testing the new architecture

The new architecture looks fine on paper, but one would wonder how it performs in real-life and with real data.

In order to test it, we wrote two scripts (one for the current architecture and one for the new one) that either send messages via fedmsg or put messages directly in the queue the workers listen to, thereby mimicking the behavior of the fedmsg consumer. Then we ran different tests.

The machine

The machine on which the tests were run is:

  • CPU: Intel i5 760 @ 2.8GHz (quad-core)
  • RAM: 16G DDR2 (1333 MHz)
  • Disk: SanDisk SDSSDA12 (120G)
  • OS: RHEL 7.2, up to date
  • Dataset: 15,000 (15K) messages

The results

The current architecture

The current architecture only allows one test: send 15K fedmsg messages, let the fedmsg consumer process them and monitor how long it takes to digest them.

Test #0 - fedmsg based
  Lasted for 9:05:23.313368
  Maxed at:  14995
  Avg processing: 0.458672376874 msg/s

The new architecture

Since the new architecture can scale, we ran several tests with it, using 2 workers, then 4, then 6 and finally 8 workers. This tells us whether the scaling is linear and how much we gain by adding more workers.

Test #1 - 2 workers - 1 backend
  Lasted for 4:32:48.870010
  Maxed at:  13470
  Avg processing: 0.824487297215 msg/s
Test #2 - 4 workers - 1 backend
  Lasted for 3:18:10.030542
  Maxed at:  13447
  Avg processing: 1.1342276217 msg/s
Test #3 - 6 workers - 1 backend
  Lasted for 3:06:02.881912
  Maxed at:  13392
  Avg processing: 1.20500359971 msg/s
Test #4 - 8 workers - 1 backend
  Lasted for 3:14:11.669631
  Maxed at:  13351
  Avg processing: 1.15160928467 msg/s
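To put numbers on the scaling, the speedup of each run can be computed from the average processing rates reported above, relative to the current architecture (Test #0). A minimal sketch, using only the figures from these tests:

```python
# Average processing rates (msg/s) as reported by the tests above.
BASELINE_RATE = 0.458672376874  # Test #0: current architecture
NEW_RATES = {  # number of workers -> msg/s, new architecture
    2: 0.824487297215,
    4: 1.1342276217,
    6: 1.20500359971,
    8: 1.15160928467,
}


def speedup(rate, baseline=BASELINE_RATE):
    """How many times faster than the current architecture."""
    return rate / baseline


for workers, rate in sorted(NEW_RATES.items()):
    print(f"{workers} workers: {speedup(rate):.2f}x the baseline, "
          f"{rate / workers:.3f} msg/s per worker")
```

This gives roughly 1.8x with 2 workers, 2.5x with 4, 2.6x with 6 and 2.5x again with 8, while the rate per worker keeps dropping: the non-linear scaling noted in the conclusions.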

Conclusions

Looking at the results of the tests, the new architecture is clearly handling its load better and faster. However, the gains are not as linear as we would like. My feeling is that retrieving information from the cache (here redis) gets slower at some point, possibly also because of the central lock we tell redis to use.

As time permits, I will try to investigate this further to see if we can still gain some speed.

Wednesday, March 2 2016

Monitor the performance of WSGI apps

Assessing pagure's performance via mod_wsgi-express

Continue reading...

Tuesday, January 5 2016

Setting up pagure on a banana pi

This is a small blog post about setting up pagure on a banana pi.

Continue reading...

Thursday, November 19 2015

Introducing mdapi

I have recently been working on a new small project, an API to query the information stored in the meta-data present in the RPM repositories (Fedora's and EPEL's).

These meta-data include the package name, summary, description, epoch, version and release, but also the changelog and the list of all the files in a package. They also include the dependency information: the regular Provides, Requires, Obsoletes and Conflicts, but also the new soft-dependency ones: Recommends, Suggests, Supplements and Enhances.

With this project, we are exposing all this information to everyone, in an easy way.

mdapi checks whether the requested package is present in the updates-testing, updates or release repositories (in this order) and returns the information found in the first repo where there is a match (and says which one it is). So for example: https://apps.fedoraproject.org/mdapi/f23/pkg/guake?pretty=True*

shows the package information for guake in Fedora 23, where guake has been updated but the latest version is in updates, not updates-testing. Therefore it says "repo": "updates".
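Since mdapi only returns JSON, querying it from Python needs nothing beyond the standard library. A minimal client sketch, assuming the URL layout shown above; the base URL is the one from this post and the service may have moved since, and the function names are mine:

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL as used in this post; the service may have moved since.
MDAPI_URL = "https://apps.fedoraproject.org/mdapi"


def pkg_url(branch, name):
    """Build the /<branch>/pkg/<name> URL (without ?pretty=True)."""
    return f"{MDAPI_URL}/{quote(branch)}/pkg/{quote(name)}"


def get_pkg(branch, name, timeout=30):
    """Fetch and decode the JSON description of a package."""
    with urlopen(pkg_url(branch, name), timeout=timeout) as resp:
        return json.load(resp)
```

`get_pkg("f23", "guake")["repo"]` should then tell you in which repository the package was found ("updates" in the example above).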

The application is written entirely in python3 using aiohttp, which is itself based on asyncio, allowing it to handle load very nicely.

Just to show you, here is the result of a little test performed with the apache benchmark tool:

    $ ab -c 100 -n 1000 https://apps.fedoraproject.org/mdapi/f23/pkg/guake
    This is ApacheBench, Version 2.3 <$Revision: 1663405 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/
    
    Benchmarking apps.fedoraproject.org (be patient)
    Completed 100 requests
    Completed 200 requests
    Completed 300 requests
    Completed 400 requests
    Completed 500 requests
    Completed 600 requests
    Completed 700 requests
    Completed 800 requests
    Completed 900 requests
    Completed 1000 requests
    Finished 1000 requests
    
    
    Server Software:        Python/3.4
    Server Hostname:        apps.fedoraproject.org
    Server Port:            443
    SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,4096,128
    
    Document Path:          /mdapi/f23/pkg/guake
    Document Length:        1843 bytes
    
    Concurrency Level:      100
    Time taken for tests:   41.825 seconds
    Complete requests:      1000
    Failed requests:        0
    Total transferred:      2133965 bytes
    HTML transferred:       1843000 bytes
    Requests per second:    23.91 [#/sec] (mean)
    Time per request:       4182.511 [ms] (mean)
    Time per request:       41.825 [ms] (mean, across all concurrent requests)
    Transfer rate:          49.83 [Kbytes/sec] received
    
    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:      513  610 207.1    547    1898
    Processing:   227 3356 623.2   3534    4025
    Waiting:      227 3355 623.2   3533    4024
    Total:        781 3966 553.2   4085    5377
    
    Percentage of the requests served within a certain time (ms)
      50%   4085
      66%   4110
      75%   4132
      80%   4159
      90%   4217
      95%   4402
      98%   4444
      99%   4615
     100%   5377 (longest request)

Note the:

    Time per request:       41.825 [ms] (mean, across all concurrent requests)

We are below 42ms (0.042 seconds) per package lookup, averaged across all concurrent requests, and that's while executing 100 requests at the same time against a server in the US while I am in Europe.
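The two "Time per request" lines of ApacheBench can be confusing, so here is how they relate to the totals: the 41.825 ms figure is the total time divided by the number of requests (in other words, the inverse of the throughput), while the 4182.5 ms one is that same value multiplied by the concurrency level, i.e. roughly how long each client waited for its answer. A small sketch recomputing them from the numbers above:

```python
def ab_times(total_seconds, n_requests, concurrency):
    """Recompute ApacheBench's summary figures from the raw totals."""
    requests_per_second = n_requests / total_seconds
    # "Time per request (mean, across all concurrent requests)"
    across_all_ms = total_seconds * 1000 / n_requests
    # "Time per request (mean)": the above, scaled by the concurrency level
    per_request_ms = across_all_ms * concurrency
    return requests_per_second, per_request_ms, across_all_ms


rps, per_request, across_all = ab_times(41.825, 1000, 100)
print(f"{rps:.2f} req/s, {per_request:.1f} ms, {across_all:.3f} ms")
```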




* Note the ?pretty=True in the URL: it is handy for viewing the JSON returned, but I advise against using it in your applications as it increases the amount of data returned and thus slows things down.

Note2: Your mileage may vary when testing mdapi yourself, but it should remain pretty fast!

Wednesday, August 5 2015

Faitout changes home

Faitout is an application giving you full access to a postgresql database for 30 minutes.

This is really handy to run tests against.

For example, for some of my applications, I run the tests locally against an in-memory sqlite database (very fast) and, when I push, the tests are run on jenkins, this time using faitout (a little slower, but much closer to the production environment). This setup allows me to catch early some potential errors in the code that sqlite does not trigger.
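One simple way to get this kind of setup is to pick the database URL from an environment variable that is only set on the CI side, and to fall back to in-memory sqlite locally. A sketch, where the `TEST_DB_URL` variable name and the function name are my own choices and the URLs are written in the SQLAlchemy style:

```python
import os

# Fast local default: an in-memory sqlite database (SQLAlchemy-style URL).
DEFAULT_DB_URL = "sqlite:///:memory:"


def get_test_db_url(environ=os.environ):
    """Use the URL provided by the CI (e.g. one from faitout), else sqlite."""
    return environ.get("TEST_DB_URL", DEFAULT_DB_URL)
```

On jenkins, you would then export TEST_DB_URL with the postgresql URL provided by faitout before running the tests.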

Faitout runs in the Fedora Infrastructure cloud and, since this cloud has just been rebuilt, we had to move it. While doing so, faitout got a nice new address:

http://faitout.fedorainfracloud.org/

So if you are using it, don't forget to update your URL ;-)



See also: Previous blog posts about faitout

Thursday, July 23 2015

Introducing flask-multistatic

flask is a micro-web-framework in python. I have been using it for different projects for a couple of years now and I am quite happy with it.

I have been using it for some of the applications run by the Fedora Infrastructure. Some of these applications could be re-used outside Fedora, and this is of course something I would like to encourage.

One of the problems currently is that all these apps are branded for Fedora, so re-using them elsewhere can become complicated. This can be solved by theming, which means adjusting two components: templates and static files (images, css...).

Adjusting templates

jinja2, the template engine used by flask, already supports loading templates from several directories. This lets the application look for your own templates first and, if it does not find them, fall back to the directory of the default theme.

Code wise it could look like this:

    import os

    import jinja2

    # Build a chain of template loaders; jinja2 uses the first one that
    # finds the requested template:
    # 1. the core templates directory (contains stuff that users won't see)
    # 2. the THEME_FOLDER defined in the configuration
    # 3. the `default` theme folder, as a fallback
    templ_loaders = []
    templ_loaders.append(APP.jinja_loader)
    templ_loaders.append(jinja2.FileSystemLoader(os.path.join(
        APP.root_path, APP.template_folder, APP.config['THEME_FOLDER'])))
    templ_loaders.append(jinja2.FileSystemLoader(os.path.join(
        APP.root_path, APP.template_folder, 'default')))
    APP.jinja_loader = jinja2.ChoiceLoader(templ_loaders)

Adjusting static files

This is a little trickier, as static files are not templates and flask has no logic allowing one file to override another depending on where it is located.

To solve this challenge, I wrote a small flask extension: flask-multistatic that basically allows flask to have the same behavior for static files as it does for templates.

Getting it to work is easy. At the top of your flask application, add the imports:

    import flask
    from flask_multistatic import MultiStaticFlask

And create your flask application as a multistatic one (MultiStaticFlask is a subclass of flask.Flask):

    APP = MultiStaticFlask(__name__)

You can then specify multiple folders where static files are located, for example:

    APP.static_folder = [
        os.path.join(APP.root_path, 'static', APP.config['THEME_FOLDER']),
        os.path.join(APP.root_path, 'static', 'default')
    ]

Note: The order of the folders matters: the last one should be the folder with all the usual files (ie: the default theme), the other ones are the folders for your specific theme(s).
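To illustrate what "the order matters" means, here is a plain-Python sketch of the lookup: the first folder containing the requested file wins. It mimics the behavior described above; it is not flask-multistatic's actual code, and the function name is mine:

```python
import os


def resolve_static(filename, folders):
    """Return the path of `filename` in the first folder that has it."""
    # Refuse absolute paths and '..' so lookups stay inside the folders.
    if os.path.isabs(filename) or ".." in filename.replace("\\", "/").split("/"):
        return None
    for folder in folders:  # theme folder(s) first, default theme last
        candidate = os.path.join(folder, filename)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With the folder list above, a logo present in your theme folder shadows the default one, while files your theme does not provide are still served from the default theme.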


Patrick Uiterwijk pointed out to me that this method, although it works, is not ideal for production, as it means that all the static files are served by the application instead of by the web-server. He therefore contributed an example apache configuration that obtains the same behavior (overriding static files) directly in apache!



So using flask-multistatic I will finally be able to make my apps entirely theme-able, allowing other projects to re-use them under their own brand.

Thursday, June 25 2015

EventSource/Server-Sent events: lesson learned

Recently I have been looking into Server-sent events, also known as SSE or eventsource.

The idea of server-sent events is to push notifications to the browser; in a way, it can be seen as a read-only web-socket (from the browser's point of view).

Implementing SSE is fairly easy code-wise (this article from html5rocks pretty much covers all the basics). The principle is:

  • Add a little javascript to make your page connect to a specific URL on your server
  • Add a little more javascript to your page to react upon messages sent by the server



Server-side, things are also fairly easy but need a little consideration:

  • You basically need to create a streaming server, broadcasting messages as they occur or whenever you want.
  • The format is fairly simple: data: <your data> \n\n
  • You cannot run this server behind apache. The reason is simple: the browser keeps the connection open, which means apache keeps the worker process busy. So after a few pages are opened, apache reaches its maximum number of worker processes and ends up waiting forever for an available one (ie: your apache server is not responding anymore).
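The `data:` line above is the minimal case. The event-stream format also allows an optional `event:` name and an `id:` field, and multi-line data is sent as one `data:` line per line of text. A small helper formatting a message accordingly (the function name is mine):

```python
def format_sse(data, event=None, event_id=None):
    """Format one server-sent event; a blank line terminates the event."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload gets its own "data:" field.
    for line in data.splitlines() or [""]:
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"
```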

So after running into the third point listed above, I moved the SSE server out of my flask application and into its own application, based on trollius (a backport of asyncio to python2), but any other async library would do (such as twisted or gevent).
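For reference, here is what such a standalone server can look like today with asyncio from the standard library (Python 3.7+) rather than trollius. This is a minimal sketch under stated assumptions: it speaks raw HTTP over asyncio streams instead of using a web framework, ignores the requested path, only sends single-line data, and keeps one `asyncio.Queue` per connected client so that `publish()` fans each message out to every open page. In the setup described below, `publish()` would be fed by a redis subscription, which is left out here:

```python
import asyncio


class SSEHub:
    """Fan each published message out to every connected client."""

    def __init__(self):
        self.clients = set()  # one asyncio.Queue per open connection

    def publish(self, data):
        for q in self.clients:
            q.put_nowait(data)

    async def handle(self, reader, writer):
        # Skip the HTTP request line and headers, up to the empty line.
        while (await reader.readline()) not in (b"\r\n", b"\n", b""):
            pass
        writer.write(b"HTTP/1.1 200 OK\r\n"
                     b"Content-Type: text/event-stream\r\n"
                     b"Cache-Control: no-cache\r\n"
                     b"\r\n")
        await writer.drain()
        q = asyncio.Queue()
        self.clients.add(q)
        try:
            while True:  # the connection stays open: this is the "stream"
                data = await q.get()
                # Assumes `data` is a single line of text.
                writer.write(f"data: {data}\n\n".encode("utf-8"))
                await writer.drain()
        except ConnectionError:
            pass  # the browser went away
        finally:
            self.clients.discard(q)
            writer.close()


async def demo():
    """Start the hub on a free local port, connect once, publish once."""
    hub = SSEHub()
    server = await asyncio.start_server(hub.handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"GET /stream HTTP/1.1\r\nHost: localhost\r\n\r\n")
    await writer.drain()
    await reader.readuntil(b"\r\n\r\n")  # response headers
    while not hub.clients:  # wait until the handler has registered us
        await asyncio.sleep(0.01)
    hub.publish("page 42 changed")
    event = await reader.readuntil(b"\n\n")
    writer.close()
    server.close()
    return event
```

Because each connection only costs a coroutine and a queue, keeping many browsers connected does not tie up worker processes the way it does with apache.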

After splitting the code out and testing it some more, I found that there is a limit on the number of persistent connections a browser can open to the same domain. I found a couple of pages mentioning this issue, but the most useful resource for me was this old blog post from 2008: Roundup on Parallel Connections, which also provides the solution to get around this limitation: the limit is per domain, so if you set up a bunch of CNAME sub-domains pointing to the main domain, it will work for as many connections as you like :-) (note: this is also what github and facebook are using to implement web-socket support on as many tabs as you want).

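The final step is not to forget to set the HTTP Cross-Origin Resource Sharing (CORS) policy in the responses sent by your SSE server. Since the SSE server is now a separate application (served from its own sub-domains), the browser treats its stream as a cross-origin request, and CORS is what controls such cross-site HTTP requests (which are a known security risk).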
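A minimal sketch of such a policy, assuming an allow-list of origins (the domain names are placeholders): echo the requesting origin back only when it is allowed, and add `Vary: Origin` so caches do not serve one origin's response to another:

```python
# Hypothetical allow-list: the pages allowed to open the event stream.
ALLOWED_ORIGINS = {
    "https://example.org",
    "https://www.example.org",
}


def cors_headers(origin):
    """Response headers for a request carrying the given Origin header."""
    if origin in ALLOWED_ORIGINS:
        # Echo the specific origin rather than "*", and tell caches the
        # response depends on the Origin request header.
        return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
    return {}  # no CORS header: the browser refuses the cross-origin stream
```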



So in the end, I went for the following architecture:

[Figure: SSE_layout3.png, diagram of the final SSE architecture]

Two users are viewing the same page. One of them edits it (ie: sends a POST request to the flask application). The web application (here flask) processes the request as usual (changes something, updates the database...) and also queues a message in Redis with information about the change (depending on what you want to do, specifying what has changed).

The SSE server is listening to redis; it picks up the message and sends it to the browsers of both users. The javascript in the displayed page picks up the message, processes it and updates the page with the change.

This way, the first user updated the page and the second user had the changes displayed automatically and without having to reload the page.
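On the web-application side, the only addition is publishing a small message once the POST has been handled. A sketch, assuming redis pub/sub is used (with redis-py, `redis.Redis().publish` could be passed as `publish`; a list used as a queue would work too) and a hypothetical `page-updates:<id>` channel naming scheme:

```python
import json


def notify_change(publish, page_id, changes):
    """Announce what changed on a page, after the POST has been processed.

    `publish(channel, message)` is injected, e.g. redis.Redis().publish.
    The channel naming scheme is hypothetical.
    """
    channel = f"page-updates:{page_id}"
    payload = json.dumps({"page": page_id, "changes": changes})
    publish(channel, payload)
    return channel, payload
```

Injecting `publish` keeps the view code testable without a running redis.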



Note: asyncio has a redis connector via asyncio-redis and trollius via trollius-redis.

Wednesday, June 17 2015

Contribute to pkgdb2

How to get started with contributing to pkgdb2.

Continue reading...

Monday, April 20 2015

PyCon 2015 - Montreal

This year, for the first time, I was lucky enough to attend PyCon, the Python Conference.

This conference changes location every two years and this year was the second edition at the Palais des Congrès in Montréal, Canada.

Before I venture further into the conference itself, I would like to thank the organizers. The location was great! The organization flawless! And, as an attendee, everything went really smoothly.

The conference itself is divided into three sections:

  • The tutorials and the language summit (2 days)
  • The conference per se (3 days)
  • The sprints (4 days)

I did not attend the tutorials, but I was invited to the language summit by Kushal Das (PSF board member and CPython contributor). It was a unique occasion for me to meet people and discover how things are discussed and decided within the python community. I must say it made me want to participate more in this community, join the mailing lists and, who knows, maybe try to tackle some easyfix bugs :)

During the summit we had a number of presentations about alternative python implementations like jython. We also had a short presentation by Guido about changes coming in python 3.5 to support declaring types in the function definition. Another interesting discussion was around the requests library and whether it could ever make it into the standard library. While I think that specific question wasn't really answered during the summit, it triggered some interesting discussion around endorsing some external libraries within the documentation of the standard library (ie: advising users to use requests on the urllib documentation pages). Another really interesting topic was the state of python on mobile platforms (Windows Mobile, iOS and Android). While there is still some work that needs to happen, things seem to be progressing on that front, and I'm quite looking forward to the day we'll be able to easily ship python applications in the different stores.

The second day of the tutorials was more relaxed for me. I took this opportunity to wander around Montreal a little and joined the crew of volunteers at the end of the morning to help prepare the swag bags for every attendee of the conference.
We first took out all the goodies shipped to the conference by the different sponsors and aligned them on two long tables. Then in the middle of the afternoon we started the 'bag stuffing' process :) This is a complicated process in which experts carry bags along the two long tables while another set of experts transfer items from the tables into the bags.
Placing myself at the very beginning of the chain, I have probably been in contact with 2500+ of the 3000+ bags prepared (I would set up the bags, helped by one or two people who would either hand out the bags or help me set them up, depending on the stash of prepared bags :)).
These were an interesting, fun, relaxing and athletic two and a half hours! If you have not had the opportunity to stuff bags this year and are going to pycon next year, I highly recommend joining this crew. It is a lot of fun!

The following three days were the conference itself. To summarize, here is an overview of the talks I went to over those three days:

Friday

  • Opening: Julia Evans
  • Keynote: Catherine Bracy

Her keynote was about the Code for America project and, more broadly, what I would call civic coding (ie: how developers can help the community at large by making information and tools publicly accessible). This was a really great keynote: her talk was inspiring and motivating, and it called for further reflection on the role of FOSS developers in our society, within our field of expertise (developing) but outside our traditional scope (web, desktop, OS, company).

  • Machine learning 101: Kyle Kastner

This was also a very interesting talk, going over the different machine learning algorithms, libraries and use-cases. It helped me get an overview of the field.

  • Introduction to HTTPS: A comedy of errors: Ashwini Oruganti & Christopher Armstrong

This talk presented the current issues when dealing with https in general, within python or not. I can't say I learned new things there, but it is always good to get a refresher on this topic.

  • Inside the Hat: Python @ Walt Disney Animation Studios: Paul Hildebrandt

This was a really interesting presentation about the use of IT in general (and python in particular) at Walt-Disney Animation Studios and of course it was full of pretty pictures from Big Hero 6 as well as some other pretty pictures from a couple of other movies. Paul also presented the overview of how animation movies are made and how Disney developed their own tools to facilitate this process insisting on the idea that the tools have to adapt to the artist rather than the other way around.

  • How to interpret your own genome using (mostly) python: Titus Brown

This talk presented tools and workflows that can be used to analyze and compare genomes, taking as an example a population with a particular history and going down into the genome to figure out what (at the gene level) makes this population so specific. It also gave an overview of the possibilities of high-throughput genome sequencing and of the applications that can derive from it, and touched the surface of the ethical concerns that arise from these technologies.

  • How to build a brain with Python: Trevor Bekolay

While still bioinformatics, this was a very different topic from the previous talk I attended. This presentation was really about modeling the inner (ie: chemical and physical) workings of the neurons of a brain. It started by introducing a couple of applications used to model a single neuron and then introduced their own application, used to model several neurons at once. A quite impressive and interesting presentation, although knowing more about the biochemical and biophysical properties of the brain would probably have led to a better comprehension of the work presented :)

Saturday

  • Investigating python wats: Amy Hanlon

While I knew most of the examples of curious python behavior she presented, I did not fully know the reasons behind them. The presentation was really nice in that it gave some clues as well as some tools to help figure out what is actually happening in the code and why these sometimes surprising behaviors occur.

  • Learning from others' mistakes: Data-driven analysis of python code: Andreas Dewes

This was an interesting presentation describing the approach developed by his company to do static code analysis, considering the code not as text but as a graph. This approach makes it possible to find bugs due to, for example, typos in property names. It seems that the service is freely available but unfortunately, if I understood correctly, the tool is not FOSS.

  • Technical Debt - the code monster in everyone's closet: Nina Zakharenko

The interesting bit about this presentation is that anyone who has worked on a reasonably sized project could relate to what was presented. There were many times when I thought I had been in the situation described, and sometimes I thought I wasn't doing too badly (but here I guess it depends on the project). There were some good elements to help figure out the size of the debt, as well as some good ideas on how to organize the work to reduce it.

  • Achieving continuous delivery: An automation story: James Cammarata

This presentation was about Ansible and how different companies are using it to automate their deployments. Several examples of companies were given, some even integrating Ansible with an IRC bot allowing everyone on the IRC channel to see what the other admins are doing.

  • Build and test wheel packages on Linux, OSX and Windows: Olivier Grisel

Wheels are a built-package format: python packages, including any compiled extensions, are distributed as ready-to-install archives (binaries) that can be built for, and installed on, multiple platforms. There are clearly some advantages to this, but I am not quite convinced, especially with regard to architecture-specific code and the different architectures we have today (x86, arm, aarch64, ppc...). But anyway, since Fedora does not allow shipping binary files directly, wheels aren't quite an option for us. On the other hand, they might be one for applications such as liveusb-creator or pyrasite that aim at being cross-platform.

  • Graph database patterns in python: Elizabeth Ramirez

The presenter of this talk works at the New York Times and presented the approach they use internally (as well as the tools and libraries) to store semantic concepts, link them and navigate the graph they form. After the presentation I ended up having a very interesting talk about the difference between full graph databases and rdf databases, and what the former allow that the latter do not. While I am still a little unclear about this difference, it was a really interesting conversation and something I would like to look further into if I were still working with/on semantic web technologies.

Sunday

  • Keynote: Van Lindberg

This was a presentation from the head of the PSF board about the state of the python community and python in general, how it went from being a trendy language when it was created into something stable and sure these days, but also how other languages are growing, potentially threatening python by being the new trendy languages. Community wise, I have written one quote from this talk that I really like:

  A community where people interact only when they are paid to do
  so is not a community, it's a bunch of mercenaries

  • Keynote: Jacob Kaplan-Moss

This was a great talk about the perception we, as developers, have of ourselves. For example, have you noticed that we tend to think there are only two kinds of developers, the great ones and the terrible ones? Yet if the quality of developers could be quantified, we know that, just like everything else, it would follow a normal distribution: most people would be average developers, only a few would be great and a few would be terrible. If you have seen it, I would like to say:

 Hello, I'm pingou, I'm a mediocre programmer

If you haven't seen it, I invite you to watch it as it was an inspiring talk, really.

  • Interactive data for the web - Bokeh for web developers: Sarah Bird

Bokeh is a library that can be used to create interactive graphs that can be included in web pages. The examples shown during the presentation were really impressive and, while it probably takes some understanding of the different ideas and concepts and of the library itself, it is definitely something I will look into the next time I have to do some data visualization.

  • WebSockets from the wire up: Christine Spang

While I had heard about WebSockets, I had not had the opportunity to play with them. In this talk, the history and principles of WebSockets were described, giving a nice idea of what they can be used for. I must say I now kinda want to play more with them and build more reactive UIs using WebSockets. However, for the projects I work on these days I feel it would be a little bit overkill. Maybe for the next one ;-)
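The talk went through WebSockets "from the wire up". As a small illustration of what happens on the wire (my own sketch, not code from the talk), this is the server side of the opening handshake defined in RFC 6455: the server answers the client's HTTP upgrade request by hashing its Sec-WebSocket-Key header together with a fixed GUID. The helper names are hypothetical and only the standard library is used.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept value for a client's
    Sec-WebSocket-Key header (hypothetical helper name)."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(sec_websocket_key):
    """Build the HTTP 101 response that switches the connection from
    HTTP to the WebSocket protocol."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        "Sec-WebSocket-Accept: %s\r\n"
        "\r\n" % websocket_accept(sec_websocket_key)
    )


# Sample key taken from RFC 6455, section 1.3
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
# → s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After the 101 response, both sides stop speaking HTTP and exchange WebSocket frames over the same TCP connection.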

  • Type hints: Guido Van Rossum

This was a very similar presentation to the one Guido gave during the language summit, presenting the work coming in Python 3.5 to support type annotations in function definitions. Here, as during the language summit, I got quite enthusiastic about the idea, but the syntax of putting the types in the function definition is really not appealing to me. It makes the function definition both harder to read and, in some cases, much longer. To be honest, I would love to see the same syntax supported in docstrings, which is where I believe it belongs (plus, as a bonus, it kind of encourages developers to document their code: if you start writing docstrings for the types, maybe you will add documentation about the arguments themselves and the function, and so on).
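To illustrate what I mean, here is a sketch (not taken from the talk) of the same hypothetical function written twice: once with the annotation syntax introduced in Python 3.5 (PEP 484), and once with the types documented in a Sphinx-style docstring, which is closer to what I would like. Only the first form is understood by `typing.get_type_hints()`; as far as Python is concerned, the docstring form is just text.

```python
from typing import Dict, List, Optional, get_type_hints


# Python 3.5+ syntax: the types live in the function definition
def notify(users: List[str], message: str,
           channels: Optional[Dict[str, str]] = None) -> int:
    """Send message to every user, return how many were notified."""
    return len(users)


# The same (hypothetical) function with the types in the docstring
def notify_doc(users, message, channels=None):
    """Send message to every user, return how many were notified.

    :param users: the usernames to notify
    :type users: list of str
    :param message: the text of the notification
    :type message: str
    :param channels: optional mapping of username to delivery channel
    :type channels: dict or None
    :returns: the number of users notified
    :rtype: int
    """
    return len(users)


print(get_type_hints(notify)["return"])   # <class 'int'>
print(get_type_hints(notify_doc))         # {}
```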

  • Keynote: Gary Bernhardt

This keynote was probably the most technical keynote we had (except for Guido's presentation just before). It presented a comparison between statically typed languages and dynamically typed languages.

This is it for the talks I attended. There are more talks I would have liked to see, but either I was doing something else or there was another talk at the same time. Luckily, all the talks have been recorded and are available on YouTube.

Among the talks I would like to see are:

  • Building secure systems - lvh
  • What can programmers learn from pilots? - Andrew Godwin
  • "Words, words, words": reading Shakespeare with Python - Adam Palay
  • Is your REST API RESTful - Miguel Grinberg
  • i18n: World domination the Easy Way - Sarina Canelake
  • Good test, Bad test - Dan Crosta
  • How our engineering environments are killing diversity (and how we can fix it). - Kate Heddleston
  • Open Source for Newcomers and the People who want to welcome them - Shauna Gordon-McKeon
  • Cutting off the internet: Testing applications that use requests - Ian Cordasco
  • Rethinking packaging, development and deployment - Domen Kozar
  • Describing descriptors - Laura Rupprecht
  • Avoiding burnout, and other essentials of Open Source self care - Kathleen Danielson
  • Python performance profiling: the Guts and the Glory - A. Jesse Jiryu Davis

As you can see, I'm in for a few hours of watching YouTube videos :)

The third part of the conference was the sprints.

The idea of the sprints is to take advantage of the fact that many developers come to the conference: keep them a little longer and offer them projects to work on.
During these four days, you can see people hacking on Django, Mailman, Jython, CPython itself, Sage, PyPy and many more projects. I took this opportunity to spend more time with the people from my team; not that we don't work together most of the time, but it is nice to work together in the same room. As for the project, most of the time was spent on bringing pagure closer to something we would want to deploy/use. I must say that at the end of this week, things are looking good. Pagure now has support for webhooks, pull-requests can be assigned, they have a score, and a project can require a certain score for a pull-request to be merged. Basically, for what I want, pagure needs: a) more documentation, b) more unit-tests, c) more tests and d) support for uploading tarballs/releases (although this might only arrive in 0.2). So once documentation and unit-tests are there, I will tag a 0.1 release and move pagure to production (I'll announce it here, so stay tuned! ;-))

As final words: I started this (long, sorry) blog post by saying how lucky I am to have been able to attend this conference, and I would like to thank Red Hat in general, and more precisely the OSAS team, for funding my flights and conference pass.

Friday, December 12 2014

Infra FAD 2014 - Part 2: Ansible

Part 1: MirrorManager

It has been two days since I came back and others have already reported on our progress (Ralph, kevin day 0 & 1, kevin day 2, kevin day 3, kevin day 4 and finally, kevin day 5), but I wanted to come back to it as well :)

So seven of us from the Fedora Infrastructure team met up in Raleigh, in the Red Hat office there. We had Matt Domsch with us for the first couple of days to help us understand how MirrorManager works (see Part 1).

The second part of the FAD was dedicated to moving forward with the infrastructure's migration away from Puppet in favor of Ansible. This led to the most productive week we have ever had on our Ansible git repo. I was able to start porting things like varnish or haproxy while Ralph was doing the heavy lifting of porting the proxies themselves. Patrick worked on porting the nameservers and managed to actually re-install them using Ansible (moving them to RHEL7 while at it). Smooge has been poking at the setup for fedorapeople.

With all that, we also managed to get MirrorManager2 into staging, and Luke wrote some awesome unit-tests for mirrorlist which have already allowed us to make some small optimizations.

All in all, I have to say that I have had a great time. I have the feeling that we achieved a lot of what we wanted to do and that we have been really efficient at it :-)

To remain critical about the organization, I agree with Ralph that for the next FAD we should be extra careful to really organize some sort of social event. We had strange hours (having lunch at 3pm or even 5pm once), and the one afternoon we said we would take off, we ended up working... Being involved in the organization while not being on site makes it difficult to find something nice for the social event, but I think we/I should have tried harder.

Anyway, like I said, I had a great time, and I'm thankful to everyone who was able to make it to Raleigh, to the OSAS team at Red Hat that funded most of this FAD and to Ansible for inviting us to dinner on Friday evening :-)

Thanks a bunch folks!

DSC_0026.1.JPG

Saturday, December 6 2014

Infra FAD 2014 - Part 1: MirrorManager

The last two days have been quite busy for the Fedora infrastructure team. Most of us are meeting up in Raleigh, in the Red Hat tower downtown, and together with Matt Domsch, the original developer of MirrorManager, we have been working on MirrorManager2.

It was really great for us that Matt could join. MirrorManager is pretty straightforward in theory but also full of small details, which can make it hard to understand fully. Having Matt with us allowed us to ask him as many questions as we wanted. We were also able to go with him through all the utility scripts and all the cron jobs that keep MirrorManager working.

The good surprise was that a significant part of the code had already been converted for MirrorManager2, but we still found some cron jobs and scripts that needed to be ported.

So after spending most of the first day on getting to understand and know more about the inner processes of MirrorManager, we were able to start working on porting the missing parts to MirrorManager2.

We also took the opportunity to discuss with Matt, Luke and David how things should look for Atomic, and Ralph was able to make the first changes to make this a reality :-)

So by yesterday evening we had all the crons/scripts converted to MirrorManager2 (all but one, and that one isn't actually needed for MM2) \ó/

That was a good point to stop and go quickly to the Red Hat Christmas party before meeting Greg who invited us for a dinner sponsored by Ansible. We had a really nice meal and evening, thanks Greg, thanks Ansible!

Today started the second part of the FAD: Ansible, but more on that later ;-)

Thursday, November 27 2014

Python multiprocessing and queue

Every once in a while I want to run a program in parallel but gather its output in a single process so that I do not have concurrent accesses (think, for example, of several processes computing something and storing the output in a file or in a database). I could use locks for this, but I figured I could also use a queue.

My problem is that I always forget how I do it and always need to search for it when I want to do it again :-) So, for you as much as for me, here is an example:

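# -*- coding: utf-8 -*-

import itertools
from multiprocessing import Pool, Manager


def do_something(arg):
    """ This function does something important in parallel but where we
    want to centralize the output, thus using the queue
    """
    data, myq = arg
    print(data)
    # Workers only put; task_done() is called by the consumer after get()
    myq.put(data)


if __name__ == '__main__':
    # The guard is needed on platforms that start workers by spawning a
    # fresh interpreter (Windows, macOS) instead of forking
    data = range(100)
    m = Manager()
    q = m.Queue()
    p = Pool(5)
    # map() blocks until every call has returned, so the queue holds
    # all the results once it returns
    p.map(do_something, itertools.product(data, [q]))
    p.close()
    p.join()

    # Only this process touches the file: no concurrent access
    with open('output', 'w') as stream:
        while not q.empty():
            item = q.get()
            print(item)
            stream.write('%s\n' % item)
            q.task_done()
    # Returns at once: every put() has been matched by a task_done()
    q.join()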

There are probably other/better ways to do this but that's a start :-)

Thursday, June 26 2014

Faitout, 1000 sessions

A while back, I introduced faitout on this blog.

Since then I have been using it to test most, if not all, of the projects I work on. I basically use the following set-up:

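DB_PATH = 'sqlite:///:memory:'
FAITOUT_URL = 'http://209.132.184.152/faitout/'
try:
    import requests
    # FAITOUT_URL already ends with a slash; the timeout keeps the tests
    # from hanging on a flaky network
    req = requests.get('%snew' % FAITOUT_URL, timeout=5)
    if req.status_code == 200:
        DB_PATH = req.text
        print('Using faitout at: %s' % DB_PATH)
except Exception:
    # No requests module or no network: keep the sqlite default
    pass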

This way, if I have network access, the tests run with faitout and thus against a real PostgreSQL database, while if I do not, they run against an in-memory sqlite database.
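If you would rather not depend on requests, the same fallback can be wrapped in a function using only the standard library. This is a minimal sketch under a few assumptions: that a GET on the `new` endpoint returns the connection URL as plain text (which the snippet above also relies on), and that a 5-second timeout is acceptable; the function name is my own.

```python
import http.client
import urllib.request

SQLITE_FALLBACK = 'sqlite:///:memory:'


def get_test_db_url(faitout_url, timeout=5):
    """Return a fresh faitout database URL, or an in-memory sqlite URL
    when the faitout server cannot be reached (e.g. when offline)."""
    try:
        # Assumption: GET <faitout_url>new answers with the DB URL as text
        with urllib.request.urlopen(faitout_url + 'new', timeout=timeout) as resp:
            if resp.status == 200:
                return resp.read().decode('utf-8').strip()
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError, timeouts and refused connections are all
        # OSError subclasses; ValueError covers a malformed URL
        pass
    return SQLITE_FALLBACK
```

In a test module, `DB_PATH = get_test_db_url(FAITOUT_URL)` then replaces the whole try/except block.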

This set-up allows me to work offline and still easily run all the unit-tests as I change the code.

The actual point of this blog post, though, is to announce that despite its limited spread (only 25 different IP addresses have requested sessions), the tool is used and has already reached 1,000 sessions created (and dropped) in less than a year.



If you're not using it, I invite you to have a look at it: I find it marvelous in combination with Jenkins and it does help find bugs in your code.

If you are using it, congrats and keep up the good work!!

Tuesday, June 17 2014

Fedocal 0.7

This morning I released fedocal version 0.7.1.

With this version comes a number of new features which I thought would be nice to advertise a little :-)

The main calendar view & the menu

The main calendar view has had two additions:

  • a pop-up indicating whether there are meetings that week which are not displayed in the current window (for example, if you're seeing the meetings from 8am to 6pm and there is a meeting at 7pm, or at 4am).
  • shortcuts to interact more easily with the calendar. These shortcuts provide three actions: add a meeting, switch to list/calendar view, and iCal feed.

The menu now highlights the calendar you are looking at to make things easier on you.

popup-highlight-shortcuts.png

The list view

When viewing a calendar as a list, fedocal will now automatically scroll down to display today's meetings or the upcoming ones.

In addition, this page also has the three shortcut buttons mentioned above (add meeting, switch view mode, iCal feed).

autoscroll-shortcuts.png

The detail view

We have added three new features to the page showing the details of a meeting:

  • permalink: when the user clicks on the pop-up showing the details of a meeting the url is updated to provide a permalink to that specific meeting. This allows one to copy/paste the url and send it to someone.
  • countdown: with help from mpduty we have added a countdown in the meeting detail view showing the remaining time before the meeting starts. This nicely sidesteps the timezone conversion if you are not logged into fedocal and want to know when a meeting starts.
  • UTC titles: if you hover over the dates/times with your mouse, the date/time will be shown in UTC which is always handy as in our community UTC remains quite often the most used way to communicate date/time.

detail_view2.png



I would like to take this opportunity to thank kparal, ralph, willo, red and lbrabec for their bug reports and RFEs that led to all these changes, which I think make fedocal 0.7.1 its best release so far :-)

Friday, June 6 2014

Small update on dgroc

It has been a little while since I last spoke about dgroc, the daily git rebuild on copr program.

The problem is that it kinda already lost its name...

Thanks to the great work of Miro Hrončok dgroc now supports mercurial as well :)

Miro took that opportunity to add a generic structure making it easy to add support for other source control management software!

Miro also fixed dgroc to take into account the release number used and automatically bump it upon rebuild.

So if you wanted to use dgroc but could not because your project was using mercurial, well, now you have no excuse anymore!

Thursday, March 20 2014

Introducing dgroc

dgroc stands for "Daily Git Rebuild On Copr".

copr is a build system made publicly available to Fedora contributors, allowing them to provide package repositories for packages that are not, or cannot be, part of the standard Fedora repositories. There are multiple reasons a package may be allowed in copr but not in the standard repositories, for example:

  • bundled libraries in the sources that have not been cleaned
  • unstable version
  • version introducing too many changes to be introduced to a stable Fedora release
  • packages that are in the process of being integrated into Fedora but have not yet been approved


The use-case for dgroc is the second point on this list: unstable version.

I know some of us out there are crazy testers, and for two projects I was interested in having daily builds: they allow easy install/update (just run yum/dnf) and easy testing.

What dgroc does is provide an easy way to automatically build packages on copr from a git repository.

It works fairly simply:

  • Create a ~/.config/dgroc file and include in it some basic, generic information that will be needed either to update the spec file, make the source rpm available or build on copr:
[main]
username = me
email = my_email@example.com
copr_url = https://copr.fedoraproject.org/
upload_command = cp %s /var/www/html/subsurface/
upload_url = http://my_server/subsurface/%s
#no_ssl_check = True # no longer needed now that copr has a valid ssl cert


  • Then, for each project, you have to define at least three pieces of information, for example for subsurface:
[subsurface]
git_url = git://subsurface.hohndel.org/subsurface.git
git_folder = /tmp/subsurface/
spec_file = ~/dgroc/subsurface.spec

Optionally, you can specify a patch_files argument: a comma-separated list of the patches that are needed to build the project.
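For illustration, here is a minimal sketch of how such a configuration can be read with Python's configparser. It is not dgroc's actual code: the `load_projects` function and the way missing keys are reported are assumptions of mine. One detail worth noting is that the values in the [main] section contain `%s` placeholders, so the parser is created with interpolation disabled; with the default interpolation, reading `upload_command` would fail on the bare `%`.

```python
import configparser

REQUIRED_KEYS = ('git_url', 'git_folder', 'spec_file')


def load_projects(config_text):
    """Parse a dgroc-like config and return (main_settings, projects).

    Hypothetical helper: every section other than [main] is a project
    that must define git_url, git_folder and spec_file; patch_files is
    optional and comma-separated.
    """
    # interpolation=None keeps the '%s' in upload_command/upload_url as-is
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    main = dict(parser['main']) if parser.has_section('main') else {}
    projects = {}
    for name in parser.sections():
        if name == 'main':
            continue
        section = parser[name]
        missing = [key for key in REQUIRED_KEYS if key not in section]
        if missing:
            raise ValueError('Project %s is missing: %s'
                             % (name, ', '.join(missing)))
        project = {key: section[key] for key in REQUIRED_KEYS}
        patches = section.get('patch_files', '')
        project['patch_files'] = [
            patch.strip() for patch in patches.split(',') if patch.strip()]
        projects[name] = project
    return main, projects


EXAMPLE = """
[main]
username = me
upload_command = cp %s /var/www/html/subsurface/
upload_url = http://my_server/subsurface/%s

[subsurface]
git_url = git://subsurface.hohndel.org/subsurface.git
git_folder = /tmp/subsurface/
spec_file = ~/dgroc/subsurface.spec
"""

main, projects = load_projects(EXAMPLE)
print(main['upload_command'] % 'subsurface.src.rpm')
# → cp subsurface.src.rpm /var/www/html/subsurface/
print(projects['subsurface']['patch_files'])   # → []
```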

From there, all dgroc does is:

  • clone the git repo if it is not already in the filesystem
  • run a git pull to get the latest changes
  • generate a new tarball (in the rpm %_sourcedir)
  • update the spec file (release, source0 and changelog)
  • generate the source rpm
  • move that source rpm somewhere to make it available to copr (see the upload_command in the config file)
  • start the build on copr
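The "update the spec file" step is probably the least obvious one, so here is a rough sketch of what it can look like: rewriting the Release and Source0 tags and prepending a changelog entry. This is not dgroc's implementation; the `update_spec` function, the date-plus-git-hash release and the example spec are hypothetical, for illustration only.

```python
import datetime
import re


def update_spec(spec_text, version, release, source0, packager):
    """Return spec_text with new Release and Source0 values and a new
    %changelog entry at the top (hypothetical helper, not dgroc's code).

    packager: 'Name <email>' as it should appear in the changelog.
    """
    # Replace only the value, keeping the tag and its alignment; a
    # function replacement avoids surprises with backslashes in values
    spec_text = re.sub(r'^(Release:[ \t]*).*$', lambda m: m.group(1) + release,
                       spec_text, count=1, flags=re.MULTILINE)
    spec_text = re.sub(r'^(Source0:[ \t]*).*$', lambda m: m.group(1) + source0,
                       spec_text, count=1, flags=re.MULTILINE)

    # rpm changelog header: * Day Mon DD YYYY Name <email> - version-release
    # (assumes an English/C locale for day and month names; the %{?dist}
    # macro is usually left out of changelog entries)
    today = datetime.date.today().strftime('%a %b %d %Y')
    entry = '* %s %s - %s-%s\n- Daily git rebuild\n\n' % (
        today, packager, version, release.replace('%{?dist}', ''))
    return spec_text.replace('%changelog\n', '%changelog\n' + entry, 1)


# Hypothetical spec file, for illustration only
SPEC = """Name:           example
Version:        1.0
Release:        1%{?dist}
Source0:        example-1.0.tar.gz

%changelog
* Thu Mar 20 2014 Me <my_email@example.com> - 1.0-1
- Initial package
"""

print(update_spec(SPEC, '1.0', '2.20140321git1a2b3c4%{?dist}',
                  'example-1a2b3c4.tar.gz', 'Me <my_email@example.com>'))
```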



I have been running dgroc for both subsurface and guake and it seems to work fine :)

The project isn't packaged yet but I thought I would announce it in case there are people interested in testing it and reporting bugs and RFE.

Hope you like it! :)

Tuesday, December 10 2013

RDFa with rdflib, python and cnucnu web

source.png

Fooling around with RDFa and some projects


Monday, October 28 2013

Faitout - test against a real database

  • Do you do unit-tests?
  • Do you do continuous integration?
  • Do you use sqlite for your tests while deploying against postgresql?
  • Do you hate using sqlite for your tests?


If you answer 'yes' to any of these questions, the following post is for you.

Otherwise, well, stay, it might still be interesting ;-)

When doing unit-tests you want something fast which allows you to quickly see if your latest changes affect other parts of your code.

sqlite is great for that. You can easily create an in-memory database, no file I/O, it all goes fast and smooth.

That is, until you push your application to production, where it is deployed against a real database system such as PostgreSQL. Then suddenly, queries which ran fine under sqlite start breaking under PostgreSQL: sqlite and PostgreSQL implement some things differently, and this leads to this kind of situation.
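A small example of the kind of difference meant here, using Python's built-in sqlite3 module (the table is hypothetical): sqlite uses type affinity rather than strict column types and does not enforce the length of a VARCHAR, so both inserts below succeed. PostgreSQL would reject each of them (an invalid integer, and a value too long for the column), so a test suite running only against sqlite would never notice the bug.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE meetings (id INTEGER, duration INTEGER, name VARCHAR(5))')

# Both statements succeed in sqlite; PostgreSQL rejects both of them
conn.execute("INSERT INTO meetings VALUES (1, 'one hour', 'short')")
conn.execute("INSERT INTO meetings VALUES (2, 60, 'much too long a name')")

for row in conn.execute(
        'SELECT id, duration, typeof(duration), name FROM meetings ORDER BY id'):
    print(row)
# (1, 'one hour', 'text', 'short')
# (2, 60, 'integer', 'much too long a name')
```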

The solution for this is of course to run your tests in an environment as close as possible to the production one, i.e. run your tests on the same database system as the one you use in production.

But this can also become complex: it means setting up a new database server, creating a new database, cleaning the database after the tests, handling permissions...

With this in mind, projects such as postgression appeared.

The idea is simple: easily get access to postgresql databases which are thrown away after a certain time.

The problem is that postgression is not FOSS, so when, a couple of weeks ago, there was no way to get a database from it, there was also no way to set up our own postgression server that could be used by a restricted number of people.

So after discussing it with Aurélien, somewhere between lunch and dessert, faitout appeared.

The idea was simple: have a small web application that creates, on the fly, a user and a database made available to whoever asks, and after 30 minutes (via a cron job for the moment) destroys the database and the user.
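The cleanup rule itself is simple. As a sketch only (this is not faitout's code; the session records are hypothetical and I assume each database and its user share the same generated name), the cron job essentially has to find every session created more than 30 minutes ago and drop its database and user:

```python
import datetime

SESSION_LIFETIME = datetime.timedelta(minutes=30)


def expired_sessions(sessions, now):
    """Return the names of the sessions older than SESSION_LIFETIME.

    sessions: iterable of (db_name, created_at) pairs, hypothetical
    records of what the web application handed out.
    """
    return [name for name, created_at in sessions
            if now - created_at >= SESSION_LIFETIME]


def drop_statements(db_name):
    """PostgreSQL statements a cleanup job could run for one session.
    The names are generated by the application, never user input."""
    return ['DROP DATABASE IF EXISTS "%s";' % db_name,
            'DROP USER IF EXISTS "%s";' % db_name]


now = datetime.datetime(2013, 10, 28, 12, 0)
sessions = [('db_a', datetime.datetime(2013, 10, 28, 11, 15)),
            ('db_b', datetime.datetime(2013, 10, 28, 11, 45))]
for name in expired_sessions(sessions, now):
    for statement in drop_statements(name):
        print(statement)
# DROP DATABASE IF EXISTS "db_a";
# DROP USER IF EXISTS "db_a";
```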

The API is pretty simple and all is documented on the front page of the application.

So feel free to have a look at it, test it, break it (but let us know how you did that ;-)) at the test instance we have:

http://209.132.184.152/faitout/

Thursday, March 28 2013

GNOME-tagger

Surprise!!

Over the last few weeks Ralph Bean and I have been working on the new version of tagger.

One of the ideas of this new version is the integration with gnome-software that Richard Hughes introduced at the beginning of the month.

In order to do so, this new version comes with a clearly defined API.

To be completely honest, Ralph and I had already started to think about defining an API before Richard's post on gnome-software. One reason we started on this is gnome-tagger:

tagger.py3.png

GNOME-tagger is a desktop application, written in Python and GTK3 and developed as a GNOME application, which is an alternative to the tagger web application. With it you can add tags to packages, vote on tags (up vote or down vote), see the statistics about tags on the Fedora package collection, as well as see who is winning the tagger game. Pretty much everything you can do from the web application, but locally :-)

tagger_menu.png

This is still a bit of a work in progress as we still need to assess how we want to handle authentication, the blacklist of tags and anonymous tags, but hopefully we will have this new version ready soon and you can start playing with gnome-tagger!



Note: The new tagger API supports rating apps as well, but at the moment this is something integrated with gnome-software, not with the tagger web application or gnome-tagger.
