Le blog de pingou


Tag - datagrepper


Thursday, February 25 2021

datanommer/datagrepper investigations

A few members of the CPE team have investigated how to improve the performance of datanommer/datagrepper.


Tuesday, February 28 2017

Some stats about our dist-git and updates

I recently started looking at our dist-git usage but my data was a little limited.

Instead of querying datagrepper, I managed to directly access the data in the database to get some stats:

Dist-git commits

Here is the output:

Over 1582 days (from 2012-10-08 to 2017-02-28)
   There was an average of 376.300884956 commits per day
   The median is of 327.0 commits per day
   The minimum number of commits was 1
   The maximum number of commits was 34716

For the average and median, we removed all the days with more than 3,000 commits, since they mostly correspond to mass-rebuilds (18 days were above 3,000 and were thus removed).

This is how it looks in a graph:

Commits in dist-git per day

dist_git_commit_per_day.png
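
The filtering described above (dropping the mass-rebuild days before computing the average and the median) could be sketched as follows. This is a minimal illustration, not the original script: the `summarize` function, its `per_day` input mapping and the `cutoff` parameter are hypothetical names, and it assumes the minimum and maximum are taken over all days, which matches the reported maximum being well above 3,000.

```python
from statistics import mean, median


def summarize(per_day, cutoff=3000):
    """Summarize daily counts, ignoring outlier days for average/median.

    per_day maps a date string to the number of commits on that day
    (hypothetical input format).
    """
    counts = list(per_day.values())
    # Days above the cutoff are treated as mass-rebuild days and skipped
    # for the average and the median only.
    kept = [count for count in counts if count <= cutoff]
    return {
        "days": len(counts),
        "removed": len(counts) - len(kept),
        "average": mean(kept),
        "median": median(kept),
        # Assumption: min and max are computed over every day.
        "minimum": min(counts),
        "maximum": max(counts),
    }


if __name__ == "__main__":
    sample = {"2017-02-01": 100, "2017-02-02": 300, "2017-02-03": 5000}
    print(summarize(sample))
```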




Bodhi updates

Using the same data source, I went on to look at the number of bodhi updates flagged to go to testing and the number of bodhi updates flagged to go to stable per day.

Here is the output:

Over 1541 days (from 2012-10-08 to 2017-02-28)
   there was an average of 76.9000648929 requests to testing per day
   The median is of 75 requests to testing per day
   the minimum number of requests to testing was 4
   the maximum number of requests to testing was 291
Over 1561 days (from 2012-10-08 to 2017-02-28)
   there was an average of 57.4477898783 requests to stable per day
   The median is of 54 requests to stable per day
   the minimum number of requests to stable was 1
   the maximum number of requests to stable was 217

(No data was removed here since there is no equivalent to mass-rebuilds for these.)

Graphically:

Updates requests for testing:

bodhi_requests_testing_per_day.png

Updates requests for stable:

bodhi_requests_stable_per_day.png

Thursday, February 23 2017

Some stats about our dist-git usage

You may have heard that there is some discussion about adding continuous integration to our packaging work in Fedora.

To get an idea of how many resources we would need to offer CI in Fedora, I tried to gather some stats about our dist-git usage.

Querying datagrepper was, as always, the way to go, although the amount of data in datagrepper is such that it is becoming hard to query some topics (such as koji builds) or to go back very far in history.

Anyway, I went on and retrieved 87600 messages from datagrepper, covering 158 days.

Here is the output:

Over 158 days (from 2016-09-19 to 2017-02-23)
   There was an average of 554.430379747 commits per day
   The median is of 418.0 commits per day
   The minimum number of commits was 51
   The maximum number of commits was 10029
Over 158 days (from 2016-09-19 to 2017-02-23)
   There was an average of 254.151898734 packages updated per day
   The median is of 119.5 package updated per day
   The minimum number of package updated was 20
   The maximum number of package updated was 9612

To be honest, I was expecting a little more. I'll try re-generating this data, maybe in another way, to see if that changes something, but this gives us a first clue.
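
For readers who want to reproduce this kind of count, here is a rough sketch of paging through datagrepper and counting messages per day. It is not the script used for the numbers above. The endpoint URL, the `org.fedoraproject.prod.git.receive` topic and the `delta`, `rows_per_page`, `page` and `order` query parameters are assumptions about the datagrepper `/raw` API as deployed at the time, and `fetch_messages` and `commits_per_day` are hypothetical helper names.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint and topic; adjust to the actual deployment.
DATAGREPPER = "https://apps.fedoraproject.org/datagrepper/raw"
TOPIC = "org.fedoraproject.prod.git.receive"


def fetch_messages(topic, delta_seconds, rows_per_page=100):
    """Yield every message for a topic over the last delta_seconds."""
    page = 1
    while True:
        query = urlencode({
            "topic": topic,
            "delta": delta_seconds,
            "rows_per_page": rows_per_page,
            "page": page,
            "order": "asc",
        })
        with urlopen(f"{DATAGREPPER}?{query}") as response:
            data = json.load(response)
        yield from data["raw_messages"]
        # "pages" is assumed to hold the total number of result pages.
        if page >= data["pages"]:
            break
        page += 1


def commits_per_day(messages):
    """Count messages per UTC day, keyed by ISO date string."""
    days = Counter()
    for message in messages:
        moment = datetime.fromtimestamp(message["timestamp"], tz=timezone.utc)
        days[moment.date().isoformat()] += 1
    return days


if __name__ == "__main__":
    # 158 days expressed in seconds; this performs real HTTP requests.
    counts = commits_per_day(fetch_messages(TOPIC, 158 * 24 * 3600))
    for day, count in sorted(counts.items()):
        print(day, count)
```

Counting the distinct packages touched per day would work the same way, using a set per day instead of a counter, provided the package name can be read from each message body.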

Wednesday, December 18 2013

Fedora packagers activity

Following up on the thoughts about activity on our packages using the last build date, I was curious to investigate the activity of our packagers.

So here again, I wrote a script that uses FAS to retrieve the list of people in the packager group. For each of these people, it then queries datagrepper for their last fedmsg message, thus retrieving the date of their last activity.

Graphically it looks like this: On the X axis is presented the number of packagers whose last activity was on that day, on the Y axis is how many days ago that day was.

last_packager.png

Converted to a log scale, we get: On the X axis is the log of the number of packagers whose last activity was on that day, on the Y axis is how many days ago that day was.

last_packager_log.png

On both graphs, the peak at the end represents the number of packagers for whom no activity could be found in datagrepper.



To provide some more numbers:

  • There are 1476 users in the packager group
  • 224 were active today (day 0)
  • 878 (59.5%) were active over the last 30 days
  • 386 (26.2%) were not active for the last 100 days
  • 296 (20%) were not active for the last 200 days
  • 253 (17%) had no activity registered by fedmsg.
  • The oldest activity registered is from 308 days ago.
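
The "last activity" lookup described in this post could look roughly like the sketch below. It is an illustration under assumptions rather than the original script: the datagrepper endpoint and its `user`, `rows_per_page` and `order` parameters are assumed, the FAS lookup is left out entirely, and `last_message_url`, `days_since` and `activity_histogram` are hypothetical helpers. Packagers with no message at all are represented by `None`, which corresponds to the peak at the end of the graphs.

```python
from collections import Counter
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed endpoint; adjust to the actual deployment.
DATAGREPPER = "https://apps.fedoraproject.org/datagrepper/raw"


def last_message_url(username):
    """Build a query asking for only the newest message of one user."""
    query = urlencode({"user": username, "rows_per_page": 1, "order": "desc"})
    return f"{DATAGREPPER}?{query}"


def days_since(timestamp, now):
    """Number of whole calendar days (UTC) between timestamp and now."""
    last = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return (now.date() - last.date()).days


def activity_histogram(last_seen, now):
    """Count packagers per 'days since last activity'.

    last_seen maps a username to the timestamp of their newest message,
    or to None when no message was found.
    """
    histogram = Counter()
    for timestamp in last_seen.values():
        key = None if timestamp is None else days_since(timestamp, now)
        histogram[key] += 1
    return histogram


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = {"alice": now.timestamp(), "bob": None}  # made-up users
    print(last_message_url("alice"))
    print(activity_histogram(sample, now))
```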

Tuesday, December 17 2013

Fedora package build history

Recently I have been thinking about a way to do a mass-rebuild, but only of packages that have not been built in a while (since the last release?).

At the moment, we only do a mass-rebuild when there is a specific need for one, for example a new version of GCC.

This is a very specific process which is run over multiple days and just rebuilds all the packages. As a result, some packages that are of very low maintenance may just sit around, untouched, until the next mass-rebuild.

I was wondering if we could not simply take all the packages in rawhide and, say once a month (or once a week, or every day?), check when their last successful build was and, if it is older than X (to be defined), do a simple scratch build of the package. We could query koji, or fedmsg via datagrepper, to get the date of the last successful build of each package.
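
The selection step of that idea is small enough to sketch. The code below assumes the last successful build dates have already been collected into a dictionary; `packages_needing_rebuild` and `max_age_days` (the "X" above) are hypothetical names, and treating a package with no known successful build as a candidate is an assumption, not something decided here.

```python
from datetime import date


def packages_needing_rebuild(last_build, today, max_age_days):
    """Return the packages whose last successful build is too old.

    last_build maps a package name to the date of its last successful
    build, or to None when no successful build is known.
    """
    stale = []
    for package, built_on in sorted(last_build.items()):
        # Assumption: a package without any known build is also a candidate.
        if built_on is None or (today - built_on).days > max_age_days:
            stale.append(package)
    return stale


if __name__ == "__main__":
    sample = {  # made-up package names and dates
        "foo": date(2013, 12, 16),
        "bar": date(2013, 8, 6),
        "baz": None,
    }
    print(packages_needing_rebuild(sample, date(2013, 12, 17), 100))
```

Each package returned would then be submitted for a scratch build, with a report sent only if that build fails.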

So technically it is doable and in theory it makes sense, but the question is: does it in practice?

The first check to assess this is simply looking at the distribution of the last successful build dates of the packages.

So I wrote a small script that queries the packagedb to get the list of all the packages and then queries datagrepper to retrieve the date of the last successful build of each one. The number of days between this date and today is then computed, and the output provides the number of packages that were last rebuilt on each day.

Graphically it looks like this: On the X axis is presented the number of packages built on that day, on the Y axis is how many days ago that day was.

last_build.png

Converted to a log scale, we get: On the X axis is a log of the number of packages built on that day, on the Y axis is how many days ago that day was.

last_build_log.png

To provide some more statistics:

  • 14397 packages were checked
  • 49 packages were built yesterday (day 0, when the data was gathered)
  • 1 package has not been successfully built for 271 days
  • 66 packages have not been successfully re-built for 200 days or more
  • 11418 packages have not been successfully re-built for 100 days or more
  • The two peaks that can be seen are from 132 and 133 days ago (last mass-rebuild?)



Is this something worth pursuing? Should we automatically re-build packages after a while and report when the build fails?

What do you think?