New FMN architecture and tests
By Pierre-Yves on Saturday, June 25 2016, 13:23
Introduction
FMN is the FedMsg Notification service. It allows any contributor (in fact, anyone with a FAS account) to tune which notifications they want to receive and how.
For example, it allows saying things like:
- Send me a notification on IRC for every package I maintain that has successfully built on koji
- Send me a notification by email for every request made in pkgdb to a package I maintain
- Send me a notification by IRC when a new version of a package I maintain is found
How it works
The principle is that anyone can log in to FMN's web UI. There, they can create filters for a specific backend (mainly email or IRC) and add rules to each filter. Each of these rules must be validated or invalidated for the notification to be sent.
The FMN backend then listens to all the messages sent on Fedora's fedmsg bus and, for each message received, goes through all the rules in all the filters to figure out who wants to be notified about this action and how.
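To make this concrete, here is a minimal sketch of what that matching logic looks like conceptually. The function names, message layout, and package names below are purely illustrative; FMN's real rules live in fmn.rules and have a different signature.

    def is_koji_build_complete(message):
        """Rule: does this fedmsg report a successful Koji build?"""
        # On Koji's topic, msg['new'] == 1 means the build state COMPLETE.
        return (message['topic'].endswith('buildsys.build.state.change')
                and message['msg'].get('new') == 1)

    def is_own_package(message, maintained):
        """Rule: does the message concern a package the user maintains?"""
        return message['msg'].get('name') in maintained

    def filter_matches(message, rules):
        """A filter fires only when every one of its rules validates."""
        return all(rule(message) for rule in rules)

    # "Notify me on IRC for every package I maintain that built on koji":
    maintained = {'guake', 'pkgdb2'}
    rules = [is_koji_build_complete,
             lambda message: is_own_package(message, maintained)]
    message = {'topic': 'org.fedoraproject.prod.buildsys.build.state.change',
               'msg': {'name': 'guake', 'new': 1}}
    print(filter_matches(message, rules))  # True -> send the IRC notification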
The challenge
Today, computing who wants to be notified and how takes between 6 and 12 seconds per message and is really CPU intensive. At, say, 9 seconds per message, a burst of 5,000 messages already represents over 12 hours of processing. This means that when we have an operation sending a few thousand messages on the bus (for example, a mass branching, or a packager who maintains a lot of packages orphaning them), the queue of messages grows and it can take hours, or even days, for a notification to be delivered, which could be problematic in some cases.
The architecture
This is the current architecture of FMN:
                     +-------+
              read   | prefs |   write
            +------->|  DB   |<-------+
            |        +-------+        |
            |                         |
       +----+-----+        +----------+-----+     +------+
       | fmn.lib  |        |    fmn.lib     |<----+ user |
fedmsg>| consumer |        | central webapp |     +------+
       +----+-----+        +----------------+
            |
       +----+----+
       |         |
       v         v
    +-----+   +-----+
    |email|   | irc |
    +-----+   +-----+
As you can see, it is not clear where the CPU-intensive part is, and that is because it is in fact integrated in the fedmsg consumer. This design, while making things easier, brings the downside of making it practically impossible to scale when an event produces lots of messages. We multi-threaded the application as much as we could, but we quickly hit the limit of the GIL.
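For reference, a fedmsg consumer is a class along these lines. This is a simplified sketch, not FMN's actual consumer code; the topic, config key, and compute_recipients helper are made up for the example.

    import fedmsg.consumers

    def compute_recipients(message):
        """Stand-in for the expensive part: walking every filter and rule."""
        return []  # placeholder

    class FMNLikeConsumer(fedmsg.consumers.FedmsgConsumer):
        topic = 'org.fedoraproject.*'   # subscribe broadly to the bus
        config_key = 'fmnlike.enabled'  # switch in the fedmsg-hub config

        def consume(self, message):
            # The 6-12 seconds of CPU work happen inline, inside this one
            # process; extra threads all contend on the same GIL.
            for recipient, backend in compute_recipients(message):
                print('notify %s via %s' % (recipient, backend))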
To try to improve on this situation, we reworked the architecture of the backend as follows:
                              +----------+
                      Read    | prefs DB |   Write
                   +--------->|          |<---------+
                   |          +----------+          |
                   |                                |
                   |                   +------------+---+    +------+
                   |                   |    fmn.lib     |<---+ User |
                   |                   | Central WebApp |    +------+
                   |                   +----------------+
                   |
                +--+-------+
                | fmn.lib  |
             +->|  Worker  +--+
             |  +----------+  |
+----------+ |  +----------+  |   +---------------+
| fedmsg   | |  | fmn.lib  |  |   |               |
| consumer +-+->|  Worker  +--+-->|    Backend    |
+----+-----+ |  +----------+  |   +--+----+----+--+
     ^       |  +----------+  |      |    |    |
     |       +->| fmn.lib  +--+      v    v    v
  fedmsg        |  Worker  |       email  IRC  SSE
                +----------+
          RabbitMQ         RabbitMQ
The idea is that the fedmsg consumer listens to Fedora's fedmsg bus and puts the messages in a queue. These messages are then picked from the queue by multiple workers, which do the CPU-intensive task and put their results in another queue. The results are then picked from this second queue by a backend process that performs the actual notification (sending the email or the IRC message).
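In code, a worker's main loop could look something like the sketch below, using pika against the RabbitMQ queues shown in the diagram. The queue names and the compute_recipients helper are invented for the example; the real code lives in FMN itself.

    import json
    import pika

    def compute_recipients(message):
        """Stand-in for the CPU-intensive fmn.lib matching step."""
        return [('pingou', 'irc')]  # placeholder result

    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.queue_declare(queue='fmn.incoming', durable=True)
    channel.queue_declare(queue='fmn.outgoing', durable=True)

    def on_message(ch, method, properties, body):
        message = json.loads(body)
        # Do the heavy matching here, then hand the results to the backend.
        for recipient, backend in compute_recipients(message):
            ch.basic_publish(
                exchange='', routing_key='fmn.outgoing',
                body=json.dumps({'recipient': recipient, 'backend': backend,
                                 'message': message}))
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_qos(prefetch_count=1)  # one message per worker at a time
    channel.basic_consume(queue='fmn.incoming', on_message_callback=on_message)
    channel.start_consuming()

Because each worker is a separate process with its own interpreter, running more of them side-steps the GIL limitation entirely.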
We also included an SSE component in the backend, which is something we want for fedora-hubs, but it still needs to be written.
Testing the new architecture
The new architecture looks fine on paper, but one might wonder how it performs in real life with real data.
In order to test it, we wrote two scripts (one for the current architecture and one for the new one): one sends messages via fedmsg, the other puts messages directly into the queue the workers listen to, thereby mimicking the behavior of the fedmsg consumer. Then we ran different tests.
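The scripts themselves are not reproduced in this post, but the queue-feeding one (for the new architecture) boils down to something like this sketch; the queue name and the sample message are invented for illustration:

    import json
    import pika

    # Push 15K canned fedmsg messages straight into the workers' queue,
    # playing the role the fedmsg consumer normally plays.
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.queue_declare(queue='fmn.incoming', durable=True)

    message = {'topic': 'org.fedoraproject.prod.buildsys.build.state.change',
               'msg': {'name': 'guake', 'new': 1}}

    for _ in range(15000):
        channel.basic_publish(exchange='', routing_key='fmn.incoming',
                              body=json.dumps(message))
    connection.close()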
The machine
The machine on which the tests were run is:
- CPU: Intel i5 760 @ 2.8GHz (quad-core)
- RAM: 16G DDR2 (1333 MHz)
- Disk: SanDisk SDSSDA12 (120G)
- OS: RHEL 7.2, up to date
- Dataset: 15,000 (15K) messages
The results
The current architecture
The current architecture only allows running one test: send 15K fedmsg messages, let the fedmsg consumer process them, and monitor how long it takes to digest them.
Test #0 - fedmsg based
    Lasted for:     9:05:23.313368
    Maxed at:       14995
    Avg processing: 0.458672376874 msg/s
The new architecture
Since the new architecture is able to scale, we performed several tests with it, using 2 workers, then 4, then 6, and finally 8. This gives us an idea of whether the scaling is linear and of how much improvement we get by adding more workers.
Test #1 - 2 workers - 1 backend
    Lasted for:     4:32:48.870010
    Maxed at:       13470
    Avg processing: 0.824487297215 msg/s

Test #2 - 4 workers - 1 backend
    Lasted for:     3:18:10.030542
    Maxed at:       13447
    Avg processing: 1.1342276217 msg/s

Test #3 - 6 workers - 1 backend
    Lasted for:     3:06:02.881912
    Maxed at:       13392
    Avg processing: 1.20500359971 msg/s

Test #4 - 8 workers - 1 backend
    Lasted for:     3:14:11.669631
    Maxed at:       13351
    Avg processing: 1.15160928467 msg/s
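To make the trend explicit, here is a quick computation of each test's speedup over the single-consumer baseline, using the averages reported above:

    # Speedup of each test over the single-consumer baseline (test #0).
    baseline = 0.458672376874  # msg/s
    rates = {2: 0.824487297215, 4: 1.1342276217,
             6: 1.20500359971, 8: 1.15160928467}
    for workers, rate in sorted(rates.items()):
        print('%d workers: %.2fx speedup' % (workers, rate / baseline))
    # 2 workers: 1.80x, 4 workers: 2.47x, 6 workers: 2.63x, 8 workers: 2.51x

Doubling the workers never doubles the throughput, and going from 6 to 8 workers is actually a small regression.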
Conclusions
Looking at the results of these tests, the new architecture clearly handles the load better and faster. However, the gains are not as linear as we would like. My feeling is that retrieving information from the cache (here, Redis) becomes slower at some point, possibly also because of the central lock we tell Redis to use.
As time permits, I will try to investigate this further to see if we can still gain some speed.