Over the last few months I've been reading more than the usual number of papers on a selection of software development topics that are of recent interest to me. The topics have been fairly far flung as there are a few projects I have been poking at in my free time.
By way of example, I took a couple weeks reading about transitory trust algorithms that are resistant to manipulation, which is a pretty interesting problem with some rather elegant (partial) solutions which are actually implementable at the individual agent level, though computationally impractical if you wish to simulate a whole network which thankfully was not what I was interested in. (So reasonable for implementing real-world systems with, though not simulations or finding definitive solutions to specific problems.)
This past week I've been reading up on a variety of B-tree algorithms. These have been around since the early 1970s and are extremely common in all sorts of software, so one might expect that after 40+ years of continuous use of such a simple concept that there'd be very little to talk about, but it's quite a vast territory. In fact, each year for the last two decades Donald Knuth has held a public lecture around Christmas-time about trees. (Yes, they are Christmas Tree Lectures. ;) Some of the papers I've been reading were published in just the last few years, with quite a bit of interesting research having gone on in this area over the last decade.
The motivation for reading up on the topic is I've been looking for a tree that is well suited to storing the sorts of indexes that Akonadi Next is calling for. They need to be representable in a form that multiple processes can access simultaneously without problems with multiple readers and (at least) one writer; they also need to be able to support transactions, and in particular read transactions so that once a query is started the data being queried will remain consistent at least until the query is complete even if an update is happening concurrently. Preferably without blocking, or at least as little blocking as possible. Bonus points for being able to roll-back transactions and keeping representations of multiple historic versions of the data in certain cases.
In the few dozen papers I downloaded onto the tablet for evening reading, I came across Transactions on the Multiversion B+-Tree which looks like it should do the trick nicely and is also (thankfully) nice and elegant. Worth a read if you're into such things.
As those who have been following Akonadi Next development know, we are using LMDB for storage and it does a very nice job of that but, unfortunately, does not provide "secondary" indexes on data which Akonadi Next needs. Of course one can "fake" this by inserting the values to be indexed (say, the dates associated with an email or calendar event) as keys with the value being they key of the actual entry, but this is not particularly beautiful for various reasons, including:
- this requires manually cleaning up all indexes rather than having a way to efficiently note that a given indexed key/value pair has been removed and have the indexes cleaned up for you
- some data sets have a rather low cardinality which would be better represented with approaches such as bitmap indexes that point to buckets (themselves perhaps trees) of matching values
- being able to index multiple boolean flags simultaneously (and efficiently) is desirable for our use cases (think: "unread mails with attachments")
- date range queries of the sort common in calendars ("show this month", "show this week", e.g.) could also benefit from specialized indexes
I could go on. It's true that these are the sorts of features that your typical SQL database server provides "for free", but in our case it ends up being anything but "free" due to overhead and constraints on design due to schema enforcement. So I have been looking at what we might be able to use to augment LMDB with the desired features, and so the hunt for a nice B+-tree design was on. :) I have no idea what this will all lead to, if anything at all even, as it is purely an evening research project for me at the moment.
They application-facing query system itself in Akonadi Next is slowly making its way to something nice, but that's another topic for another day.