adams's blog


Now With Extra Added Fellowship

Last week I was surprised to receive an email from Shane Coughlan inviting me to become a Fellow of the OpenForum Academy. The current Fellows include some of my personal heroes, so it was exciting and humbling to receive such an offer.

From their website:

OpenForum Academy is a think tank with a broad aim to examine the paradigm shift towards openness in computing that is currently underway, and to explore how this trend is changing the role of computing in society.

and:

OpenForum Academy is an independent programme established by OpenForum Europe. It has created a link with academia in order to provide new input and insight into the key issues which impact the openness of the IT market. Central to the operation of OpenForum Academy are the Fellows, each selected as individual contributors to the work of OFA. A number of academic organisations have agreed to work with OFA, working both with the Fellows and within a network of contributors in support of developing research initiatives.

During Akademy last week I was very impressed by the keynote presentation given by Will Schroeder of Kitware. At the core of his talk was a concern that research should be open by default, but it isn’t: peer review is a black box and publishers charge a fortune for access to the final paper. Over dinner we spoke at length on this matter.

I am lucky in that my career is no longer tied to my publication record. This gives me the freedom to do my research when and where I want. Typically this means I treat all my research as “work in progress”. As and when I make little steps forward, I publish what I have done on my blog and gather feedback.

I have now taken the decision that whenever I have a complete piece of work I will publish:


GIMP Doesn’t

GIMP does not drive me to the airport if my flight is before 10am.


I'm Going To Akademy; So Is Kolab

So, it’s that time of year again, the annual meeting of all things KDE… Akademy! This year it is coming to you from Tallinn, Estonia, and it will be my 6th outing to the event :)

Of course, KDE is very dear to me and to Kolab, so, in addition to me, a few other members of the Kolab community will be at this year’s Akademy. Key contributors Christian Mollekopf and Jeroen van Meeuwen will be present and available to discuss Kolab-related issues. Jeroen will also give a talk about release engineering processes, using KDE as an example. His background in the Fedora Project, Cyrus IMAP and Cyrus SASL, together with his role as a Systems Architect at Kolab Systems, gives him ample experience to offer some insight into how release engineering and quality assurance within the fast-paced KDE project could be improved further.

The Kolaborators will also be taking part in a Task Management sprint featuring Zanshin and Kolab developers. If you are interested in task management in KDE, you are invited to join. The sprint will focus on bringing a Zanshin-like experience to Kolab on the desktop and the web. This meeting will take place during the workshop week after the main conference; no date or time has been set yet, but if you track down Christian, Kevin Ottens or me, we’ll work it out.

Akademy is one of my favourite conferences of the year and I’m really excited to be catching up with my KDE buddies. If you want to talk about Kolab (or anything else) just come track me down… I’ll be around until Wednesday.


Cyrus IMAP: What Happened When They Switched To GIT?

So recently Jeroen van Meeuwen asked me to take a look at Cyrus IMAP. He had been involved in their switch from CVS to GIT and was curious to see what the results looked like. Let’s start with the usual green blobs:

Cyrus IMAP: Full History in Green Blobs

So, since I do not know precisely when the switch from CVS to GIT was made, I’m using Jeroen’s start date in the project (2010-09) as a rough guideline. Looking at the green blobs it is pretty clear that something happened after he joined the project. But let’s start by looking at what was going on before he joined.

Between 1993-07 and 2010-09 there were 25 accounts in CVS. Note: accounts and not contributors; clearly some of these accounts belong to the same person. For much of these first 7 years the project is also displaying what I refer to as “token-based development”; that is, in many weeks there is only one contributor (as if you had to hold a shared token to commit). I first noticed this phenomenon when studying Evince during my PhD and I have seen it only a couple of times since. I wish I could explain it.
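To make “token-based development” a little more concrete, here is a minimal sketch of one way to measure it: the share of active weeks in which exactly one account committed. This is my illustration rather than the script behind the green blobs; it assumes the log has already been reduced to (account, date) pairs, uses ISO weeks, and ignores weeks with no commits at all. The function name and the toy data are hypothetical.

```python
from collections import defaultdict
from datetime import date


def single_committer_share(commits):
    """Share of active weeks in which exactly one account committed.

    `commits` is an iterable of (account, datetime.date) pairs. Weeks are keyed
    by ISO (year, week) so a week spanning New Year is not split in two.
    Weeks with no commits at all are not counted.
    """
    accounts_by_week = defaultdict(set)
    for account, day in commits:
        iso_year, iso_week, _ = day.isocalendar()
        accounts_by_week[(iso_year, iso_week)].add(account)
    if not accounts_by_week:
        return 0.0
    solo_weeks = sum(1 for accounts in accounts_by_week.values() if len(accounts) == 1)
    return solo_weeks / len(accounts_by_week)


# Hypothetical toy history (not real Cyrus IMAP data).
history = [
    ("alice", date(1999, 1, 4)),   # ISO week 1999-W01: alice alone
    ("alice", date(1999, 1, 6)),
    ("bob", date(1999, 1, 11)),    # 1999-W02: bob alone
    ("alice", date(1999, 1, 18)),  # 1999-W03: alice and bob
    ("bob", date(1999, 1, 19)),
]
print(single_committer_share(history))  # two of the three active weeks were "solo"
```

A value close to 1 over a long stretch of history is the “only one person holds the token” pattern.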

Now, since 2010-09 we can see that 27 new accounts have contributed to the project; most are only around for one week (if we look deeper, I bet for only one or two commits) never to be seen again. Perhaps one of the effects of switching to GIT is that it is simply a lot easier for people to contribute? No-brainer.

But I think there is slightly more at play here. How did Cyrus IMAP manage to get to its 17th anniversary and then basically double the size of its developer community just because of a switch to GIT? A project of such importance surely must have been attracting more folk before the switch to GIT… It is not like activity has increased significantly since the switch, right? Right?

Let’s take a look at some simple measures… Commits and committers per month:
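Both of these measures are simple aggregations over the log. As a rough sketch, assuming the history has been reduced to (account, date) pairs, commits and distinct committers per calendar month could be computed like this (the function name and data are hypothetical):

```python
from collections import Counter, defaultdict
from datetime import date


def monthly_activity(commits):
    """Map 'YYYY-MM' to (number of commits, number of distinct committers)."""
    commit_counts = Counter()
    committers = defaultdict(set)
    for account, day in commits:
        month = day.strftime("%Y-%m")
        commit_counts[month] += 1
        committers[month].add(account)
    return {m: (commit_counts[m], len(committers[m])) for m in sorted(commit_counts)}


# Hypothetical toy history.
commits = [
    ("alice", date(2010, 8, 2)),
    ("alice", date(2010, 8, 20)),
    ("bob", date(2010, 9, 1)),
    ("carol", date(2010, 9, 3)),
    ("bob", date(2010, 9, 30)),
]
for month, (n_commits, n_committers) in monthly_activity(commits).items():
    print(month, n_commits, n_committers)
```

Note that months with no commits simply do not appear; a plotting script would need to fill those in as zero.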


Getting Cohesive

So, in my mission to see how we can automatically detect “core” teams, I need a measure of how closely people work together. Those of you with long memories will remember that I once coined the term “cohesion” for this measure. I introduced it in a paper at the International Conference on Software Maintenance three years ago and blogged about it around that time.

This measurement is based on some basic graph theory that I have covered before, but for the sake of completeness here is a quick recap. Let’s start by taking a look at a graph which represents one month of KDELIBS development, in this case April 2009:

Each node here represents someone who has committed to KDELIBS in the month. The edges represent resource sharing: two nodes are connected if the committers both commit to the same file in the month. These edges have a weight (not shown) which is the number of shared files between the nodes.
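As an illustration of that construction (a sketch, not my actual tooling), here is how the weighted graph for one month might be built from a list of (committer, file) pairs. The function name, the frozenset-keyed edge dictionary and the file names are all hypothetical choices for this example.

```python
from collections import defaultdict
from itertools import combinations


def build_cocommit_graph(touches):
    """Build one month's resource-sharing graph.

    `touches` is an iterable of (committer, file_path) pairs from that month's
    log. Two committers share an edge if they both touched at least one common
    file; the edge weight is the number of files they share.
    Returns (nodes, weights), where weights maps frozenset({a, b}) -> int.
    """
    nodes = set()
    committers_by_file = defaultdict(set)
    for committer, path in touches:
        nodes.add(committer)
        committers_by_file[path].add(committer)

    weights = defaultdict(int)
    for committers in committers_by_file.values():
        # Each pair of committers on this file gains one shared file.
        for a, b in combinations(sorted(committers), 2):
            weights[frozenset((a, b))] += 1
    return nodes, dict(weights)


# Hypothetical committers and file names.
month = [
    ("alice", "kdecore/a.cpp"),
    ("bob", "kdecore/a.cpp"),
    ("bob", "kdeui/b.cpp"),
    ("carol", "kdeui/b.cpp"),
    ("alice", "kdeui/b.cpp"),
]
nodes, weights = build_cocommit_graph(month)
for pair, w in sorted(weights.items(), key=lambda item: sorted(item[0])):
    print(sorted(pair), w)
```

Grouping by file first means each file contributes one unit of weight to every pair of people who touched it, no matter how many commits each of them made to it.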

Using the Floyd-Warshall algorithm it is possible to find the shortest paths between all pairs of nodes in the graph. This, in turn, allows us to find the mean shortest path length, and this is what I call “community cohesion” (which should not be confused with graph structural cohesion). Now, this number is not really comparable between communities; their differing working practices rule this out. However, within a community, we can certainly trend this metric and see how it varies over time. Perhaps, for example, certain events (such as release deadlines) cause the metric to increase? An increase in this metric shows the community is working together more tightly (higher edge weights, contributors sharing more resources).
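Here is a minimal Floyd-Warshall sketch of that mean shortest path length for one month’s graph. Two details are my assumptions rather than something stated here: edge weights (shared files) are used directly as edge lengths, which matches the remark that higher edge weights push the metric up, and pairs with no connecting path are left out of the mean. The function name and input shape (a frozenset-keyed weight dictionary) are hypothetical.

```python
import math
from itertools import combinations


def mean_shortest_path(nodes, weights):
    """Mean shortest-path length over all connected pairs (Floyd-Warshall).

    `weights` maps frozenset({a, b}) to an edge weight, used directly as the
    edge length. Pairs with no connecting path are excluded from the mean.
    """
    nodes = sorted(nodes)
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)
    dist = [[0 if i == j else math.inf for j in range(size)] for i in range(size)]
    for pair, w in weights.items():
        a, b = tuple(pair)
        i, j = index[a], index[b]
        dist[i][j] = dist[j][i] = min(dist[i][j], w)

    # Floyd-Warshall: allow each node k in turn as an intermediate stop.
    for k in range(size):
        for i in range(size):
            for j in range(size):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]

    lengths = [dist[i][j] for i, j in combinations(range(size), 2) if dist[i][j] < math.inf]
    return sum(lengths) / len(lengths) if lengths else 0.0


# Hypothetical three-person month: alice-bob share 2 files, the other pairs 1 each.
toy_nodes = {"alice", "bob", "carol"}
toy_weights = {
    frozenset({"alice", "bob"}): 2,
    frozenset({"alice", "carol"}): 1,
    frozenset({"bob", "carol"}): 1,
}
print(mean_shortest_path(toy_nodes, toy_weights))
```

The triple loop is O(n³), which is fine for a month of committers to one repository; running it once per month gives the trend line.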

The next step, of course, is to actually measure this and see how the trend looks for different projects. So, I have picked KDEPIM and KDELIBS to look at; below are their cohesion trends for the 120 months from 2001 to 2010:


KDEPIM: A Little Look At 2011

So this is about the time I usually do my annual review of activity in KDE SVN. Of course, I have now stopped my analysis of KDE SVN and moved on to git. Instead of analysing every repo in KDE git, I will focus on what happened in KDEPIM in 2011 (KDEPIM exclusively: no PIMLIBS or PIMRUNTIME).

OK, to kick off, the green blobs:

The first thing I noticed here is that there is no account which has committed in every week of 2011. Notice, also, Laurent: he is not the most regular contributor to this repo (he committed in 67% of the weeks), and yet he is by far the most prolific. If we look at commits per committer, we get the following top 10 for last year:

  58 Bjoern Ricks
  89 Christophe Giboudeauw
 109 Torgny Nyblom
 142 Volker Krause
 162 Script Kiddy
 195 Tobias Koenig
 196 Allen Winter
 273 David Jarvie
 273 Sergio Martins
1198 Montel Laurent
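A list like this is just a frequency count of commit authors, printed smallest first. A minimal sketch, assuming you already have one author name per commit (the function name and toy authors are hypothetical); note that when two accounts tie at the cut-off, which one makes the top n is arbitrary:

```python
from collections import Counter


def top_committers(authors, n=10):
    """Return the n accounts with the most commits, smallest count first."""
    return sorted(Counter(authors).most_common(n), key=lambda pair: pair[1])


# Hypothetical toy data: one entry per commit.
authors = ["laurent"] * 5 + ["david"] * 3 + ["sergio"] * 3 + ["allen"]
for name, count in top_committers(authors, n=3):
    print(f"{count:5d} {name}")
```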

The second thing I noticed about the green blobs is how “white” the image is towards the bottom; that is, developers whose first commit to KDEPIM in 2011 came after the first week tended not to stay around for long. To me this suggests that the people towards the top are most likely part of an existing “core” team.

My “Oracle of Ervin” tool reveals Laurent to be the most highly-connected developer in this repo; this comes as no surprise. If we visualise the community we can see him along with others in the “core” of the community:


Collatz Conjecture: Dabbling with Python and Graphviz

[This is slightly off topic from my usual Free Software analysis.]

So the Collatz Conjecture came to mind. I took a look at the Wikipedia article and was struck by a couple of things: I liked the stopping time (the number of steps you have to take to get from the given starting number to 1) plot and the graph showing the paths from certain starting numbers to 1.

Both also disappointed me by not showing enough data; this had clearly been done for clarity. Fair enough, but sometimes if you throw enough data into a visualisation it just “looks” right. Right? (OK, this is far from true.) So, since it had been a while since I last dusted off my Python and Graphviz skills, I thought I would try to replicate these visualisations, just with more data.

So let’s start with the stopping time plot:

Nice pattern. Hardly exciting.
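For anyone wanting to reproduce a plot like this, the data behind it is straightforward to generate. A minimal sketch (not necessarily how I did it) of the stopping time as defined above, i.e. the number of steps from n to 1, which Wikipedia calls the “total stopping time”:

```python
def stopping_time(n):
    """Number of Collatz steps needed to reach 1 from n (n >= 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        # Halve even numbers; send odd numbers to 3n + 1.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


# Stopping times for starting numbers 1..10.
print([stopping_time(n) for n in range(1, 11)])  # [0, 1, 7, 2, 5, 8, 16, 3, 19, 6]
```

Plotting `stopping_time(n)` against `n` for a few hundred thousand values gives the familiar banded scatter.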

What is a little more fun is the graph showing the paths from given starting numbers back to 1:
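A graph like this can be produced by writing out Graphviz DOT text and handing it to the `dot` tool (for example `dot -Tpng collatz.dot -o collatz.png`). This sketch is an assumption about the approach, not my original script: it walks each starting number down to 1 and stops as soon as it reaches an edge that is already in the graph, since the rest of that path has been drawn already.

```python
def collatz_dot(limit):
    """Return Graphviz DOT source for the Collatz paths from 1..limit down to 1."""
    edges = set()
    for start in range(1, limit + 1):
        n = start
        while n != 1:
            nxt = n // 2 if n % 2 == 0 else 3 * n + 1
            if (n, nxt) in edges:
                break  # the rest of this path is already in the graph
            edges.add((n, nxt))
            n = nxt
    lines = ["digraph collatz {"]
    lines += [f"  {a} -> {b};" for a, b in sorted(edges)]
    lines.append("}")
    return "\n".join(lines)


print(collatz_dot(3))
```

With a few thousand starting numbers the output gets very large, which is why the rendered image ends up so big.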


When Git Push Comes To Shove

[If you are not familiar with the English idiom "When push comes to shove" you can read more here.]

For some time I have been hesitant to start publishing data about usage of Git. You see, when a community changes a tool as fundamental as the SCM it will need to change its processes (to some degree). Of course, this is often the reason why the SCM has been switched. It is also the first reason why it is difficult to compare SVN data from “before” to Git data from “after”. Reason 2 is that the two systems work in very different ways. A commit in a DVCS is very different from a commit in a centralised system. It is probably the “push” that is more comparable. Right? Right??

Let’s take a look at the daily commits for KDEPIM:

KDEPIM switched to Git on 28 January, 2011 (or thereabouts). Before this date the average number of daily commits was 16 (14 in the month prior); after it, the average drops to 11. I’m sure the KDEPIM community is not crying into its collective beer tonight. Here’s why:
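A before/after comparison like this boils down to counting commits in a date window and dividing by the number of days. A minimal sketch, assuming one date per commit and assuming that days with no commits count as zero (the post does not say how idle days were treated); the function name and toy data are hypothetical:

```python
from datetime import date


def mean_daily_commits(commit_days, start, end):
    """Average commits per calendar day in [start, end), idle days counting as zero."""
    span = (end - start).days
    if span <= 0:
        raise ValueError("end must be after start")
    return sum(1 for day in commit_days if start <= day < end) / span


switch = date(2011, 1, 28)  # approximate KDEPIM switch date
# Hypothetical toy data: one date per commit.
days = (
    [date(2011, 1, 26)] * 4
    + [date(2011, 1, 27)] * 2
    + [date(2011, 1, 28)]
    + [date(2011, 1, 29)] * 3
)
before = mean_daily_commits(days, date(2011, 1, 26), switch)
after = mean_daily_commits(days, switch, date(2011, 1, 30))
print(before, after)
```

Whether idle days are included makes a real difference to the averages, so it is worth being explicit about it.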

  • Human factor: The initial large drop in commit rate could easily be caused by people needing to learn how to use Git properly.
  • Process factor: Git allows the user to squash multiple commits into one.

The change of tool will always have human and process impacts. Here I have suggested just one of each; there are many more. But these factors illustrate my concern about coming forward with Git data… It is up to me to make it absolutely clear why (or potentially why) the figures change in the way that they do. Whilst the need for education and commit squashing are two factors that might apply to any project, the factors that actually apply can only really be revealed by those directly involved.

So what can we conclude? Two things:

  1. The impact of the switch to Git can be shown in the measurement of something as simple as daily commits;
  2. Watching the new trends develop over time is going to be fascinating.

Delving Into Git (KDEPIM)

OK, now that KDE is 15 years old, it is time for my work to grow up and start looking at git. One of the questions I get asked from time to time is how much code rewriting I will need to do in order to work with git. Thankfully… none.

All of my scripts parse SVN logs and it is easy enough to get git to give back logs in SVN format. Just like this:

git log --reverse --format="<logentry revision=\"%H\">%n  <author>%ae</author>%n  <date>%ci</date>%n</logentry>"
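For scripts that want to consume this output directly, here is a minimal sketch in Python. It passes the same format string to `git log` as a list argument (so no shell quoting is needed) and wraps the entries in a `<log>` root element, the same root that `svn log --xml` uses, so that ElementTree sees one well-formed document. The function names are hypothetical, and it assumes no author email contains characters such as `&` that would need XML escaping.

```python
import subprocess
import xml.etree.ElementTree as ET

LOG_FORMAT = '<logentry revision="%H">%n  <author>%ae</author>%n  <date>%ci</date>%n</logentry>'


def parse_log(text):
    """Parse the pseudo-SVN entries emitted by git log into dicts."""
    root = ET.fromstring("<log>" + text + "</log>")
    return [
        {
            "revision": entry.get("revision"),
            "author": entry.findtext("author"),
            "date": entry.findtext("date"),
        }
        for entry in root.iter("logentry")
    ]


def read_git_log(repo="."):
    """Run git log in `repo` with the format above and parse the result."""
    out = subprocess.run(
        ["git", "log", "--reverse", f"--format={LOG_FORMAT}"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return parse_log(out)
```

Note that the “revision” here is a commit hash rather than an SVN revision number, so anything that sorts by revision needs to rely on log order instead.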

So as a first brief experiment with git, here is the result of generating the green blobs for KDEPIM:


So What Does 15 Years Of KDE Look Like?

So, I thought I would take a quick look at what the KDE community “looks” like after 15 years of development. Here I will briefly show off three visualisations with no particular comment; I will just leave them here for your amusement.

So let’s start with the now-infamous green blobs:

Green Blobs for KDE's First 15 Years

For the uninitiated, a quick lesson: each column in this visualisation represents the commit history of one account that has committed to KDE SVN. Each row represents a week, with the most recent weeks at the top. If the contributor committed during that week, they get a green blob; otherwise the cell is left empty. For each column, the committer, the date of their first commit and the % of weeks in which they committed (of those they /could/) are given.
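The per-column figures can be sketched as follows. This is an illustration, not the script that draws the blobs: it assumes weeks start on Monday and that the weeks an account “could” have committed in run from the week of its first commit up to the week containing the last day of the data. The function names and toy data are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day):
    """Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def green_blob_summary(commits, last_day):
    """Per account: (first commit date, % of possible weeks with a commit).

    "Possible" weeks are assumed to run from the week of the account's first
    commit up to and including the week containing `last_day`.
    """
    active_weeks = defaultdict(set)
    first_commit = {}
    for account, day in commits:
        active_weeks[account].add(week_start(day))
        if account not in first_commit or day < first_commit[account]:
            first_commit[account] = day
    final_week = week_start(last_day)
    summary = {}
    for account, weeks in active_weeks.items():
        possible = (final_week - week_start(first_commit[account])).days // 7 + 1
        summary[account] = (first_commit[account], 100.0 * len(weeks) / possible)
    return summary


# Hypothetical toy history: alice is active in 2 of 3 possible weeks, bob in 1 of 2.
toy = [("alice", date(2011, 1, 3)), ("alice", date(2011, 1, 10)), ("bob", date(2011, 1, 12))]
for account, (first, pct) in green_blob_summary(toy, date(2011, 1, 23)).items():
    print(account, first, f"{pct:.0f}%")
```

Each account’s set of active weeks is exactly the set of green blobs in its column.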

You might remember from my last blog post that I charted the growth in the number of accounts in KDE SVN. With such a steady growth in contributors, should we expect something similar in the daily commits and committer trends? Of course we should…

  • Daily Commits:

http://dl.dropbox.com/u/46229283/images/commits-kde15.png

I will admit that I have doctored this data ever so slightly in order to filter out the days on which a script went crazy and created 1000s of commits by itself.

  • Daily Committers:

Daily Committers in KDE SVN
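Both daily series come from the same per-day grouping. The post does not say how the runaway-script days were identified, so this sketch simply assumes a fixed per-day cut-off (`max_commits_per_day`, a hypothetical parameter) and drops any day above it from both series. The account names in the example are made up.

```python
from collections import defaultdict
from datetime import date


def daily_series(commits, max_commits_per_day=None):
    """Return {day: (commits, distinct committers)}, optionally dropping outlier days.

    `max_commits_per_day` is a hypothetical cut-off for days on which an
    automated script produced an abnormal burst of commits; days above it are
    removed from both series.
    """
    per_day = defaultdict(list)
    for account, day in commits:
        per_day[day].append(account)
    series = {}
    for day in sorted(per_day):
        accounts = per_day[day]
        if max_commits_per_day is not None and len(accounts) > max_commits_per_day:
            continue  # treat as a runaway-script day and drop it
        series[day] = (len(accounts), len(set(accounts)))
    return series


# Hypothetical data: a normal day followed by a day of 1500 scripted commits.
d1, d2 = date(2005, 3, 1), date(2005, 3, 2)
sample = [("anna", d1), ("ben", d1), ("anna", d1)] + [("scripty", d2)] * 1500
print(daily_series(sample, max_commits_per_day=1000))
```

A fixed threshold is crude: it can also drop a genuinely busy day, so checking which accounts produced the burst is a sensible second step.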