In my last blog entry, I mentioned that we have been working on a comprehensive data loss prevention (DLP) and audit trail system for use with Kolab, with the end goal being not only DLP but also a platform for business intelligence. In that entry I listed the three parts of the system, noting that I'd be writing about one at a time. I had hoped to jump on the first of those a day or two after writing the entry, but life and work intervened and then I was off on a short family vacation ... but now I'm back. So let's talk about the capture side of the system.
Kolab can be viewed as a set of cooperating microservices: SMTP, IMAP, LDAP, spam/virus protection, invitation auto-processing, the web UI, and so on. There are a couple dozen of these, and up until now they have all done the traditional, and correct, thing of logging events to a system log.
This has numerous drawbacks, however. First, on a distributed system where different services run on different hosts (physical or virtual), the result is data spread over many systems, which is not great for subsequent reporting. Second, at the time of logging the events are in a "raw" state: each service likely knows nothing about the rest of the Kolab services and thus how its events relate to the whole system. Third, with logs passing through the host systems it is difficult to ensure they are not tampered with; remote logging alleviates this somewhat, but it only goes so far. Finally, logging tends to be a firehose of data, and for our specific interests here we want a very specific sub-stream of that total flow.
So we have written yet another service whose entire job is to collect events as they are generated. This service is itself distributed, allowing collection agents to be run across a cluster hosting a Kolab instance, and it stores its data in a dedicated key-value store which can be housed on an isolated (and specially secured, if desired) system. The program providing this service is called Egara, which is Sumerian for "storehouse", and it is written in Erlang for its robustness (this service must simply never go down), scalability and distributed-communication features. The source repository can be found here. Egara itself is part of the overall DLP/auditing system we have named Bonnie.
The high-level purpose of Egara is to create a consistent and complete history of what happens to objects within the groupware system over time. An "object" might be an email, a user account, a calendar event, a tag, a note, a todo item, etc. An event (the "what happens") includes things such as the creation of new objects, deletions, the setting of flags or tags, changes of state (e.g. from unread to read), starting or tearing down an authenticated session, etc. In other words, its job is to create, in real time, a complete history of who did what, and when. As such I've come to view it as an automated historian for your world of groupware.
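To make that concrete, a single fully processed history entry might conceptually look something like the following Erlang map; the field names and values here are purely illustrative, not Egara's actual schema:

```erlang
%% Hypothetical, fully normalized history entry expressed as an Erlang map.
%% Field names and values are illustrative only.
Entry = #{ event     => <<"message.flags.set">>,
           timestamp => <<"2015-05-04T09:13:22Z">>,
           actor     => <<"jane.doe@example.com">>,   %% who
           object    => #{ type   => email,           %% did what to what
                           folder => <<"user/jane.doe/INBOX">>,
                           uid    => 4242 },
           changes   => #{ flags_added => [<<"\\Seen">>] } }.
```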
Egara itself is divided into three core parts:
- incoming handlers: these components implement a standard behavior and are responsible for collecting events from a specific service (e.g. cyrus-imap) and relaying them to the core application once received (a minimal sketch of such a behavior appears after this list)
- event normalizers: these workers process events from the new event queue and are tasked with normalizing and augmenting the data within them, creating complete point-in-time additions to the history. Many events come in with simple references to other objects, such as a mail folder; the event normalization workers need to turn those implicit bits of information into explicit links that can be reliably followed over time
- middleware: these are mainly the bits that provide process supervision, populate and manage the shared queues of events as information arrives from incoming handlers and is processed by normalizers.
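As promised above, here is a minimal sketch of what an incoming-handler behavior could look like; the module and callback names are assumptions made for illustration, not Egara's actual API:

```erlang
%% Sketch of an incoming-handler behaviour: each service-specific handler
%% would implement these callbacks. Names are illustrative assumptions.
-module(incoming_handler_behaviour).

%% begin listening for events from the service this handler covers
-callback start_listening(Config :: map()) -> {ok, pid()} | {error, term()}.

%% translate one raw notification into an Erlang term for the event queue
-callback translate_event(Raw :: binary()) -> {ok, Event :: term()} | ignore.
```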
This all happens asynchronously and provides guarantees of correct handling at each step (inasmuch as each reporting service allows for that). This means that individual normalizers can fail, even in spectacular fashion, without disrupting the system, and that an admin can halt and restart the system at will without fear of losing events (save those generated during downtime, assuming a full Egara take-down), etc.
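Much of that resilience comes from Erlang/OTP's supervision model. As a rough sketch (the module and worker names are hypothetical, not Egara's actual process tree), a one_for_one supervisor simply restarts a crashed normalizer worker without disturbing its siblings or the queued events:

```erlang
%% Hypothetical supervisor for normalizer workers: if a worker crashes,
%% only that worker is restarted; the queues and other workers carry on.
-module(normalizer_sup_sketch).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    Worker = {normalizer_worker,
              {normalizer_worker, start_link, []},
              permanent, 5000, worker, [normalizer_worker]},
    %% one_for_one: restart only the crashed child, up to 10 times per minute
    {ok, {{one_for_one, 10, 60}, [Worker]}}.
```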
Final storage is done in a Riak database, with queues managed by the Mnesia database built into Erlang's OTP system itself. Mnesia can best be thought of as a built-in Redis: entirely in-memory (fast) with disk backing (robust); just add built-in clustering and a native, first-class API for storage and retrieval (e.g. we are able to use Erlang functions to perform updates and filtering over all or part of a queue's dataset). Data in Mnesia is stored as native Erlang records, while data in Riak is stored as JSON documents.
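As a small illustration of that Mnesia usage pattern (the table, record and function names here are hypothetical, not Egara's actual schema), an in-memory, disc-backed queue table and a transactional write might look roughly like this:

```erlang
%% Sketch of a queued-events table in Mnesia: records live in RAM for speed
%% with disc_copies providing the disk backing described above.
-module(event_queue_sketch).
-export([create_table/0, enqueue/2]).

-record(queued_event, {id, received_at, payload}).

create_table() ->
    mnesia:create_table(queued_event,
                        [{attributes, record_info(fields, queued_event)},
                         {disc_copies, [node()]}]).   %% RAM copy + disk backing

enqueue(Id, Payload) ->
    Write = fun() ->
                    mnesia:write(#queued_event{id = Id,
                                               received_at = os:timestamp(),
                                               payload = Payload})
            end,
    mnesia:transaction(Write).   %% transactional, so a crash mid-write is safe
```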
Incoming events may arrive in any format and over any delivery mechanism. They can be parallelized and spread across a cluster of machines ... it doesn't matter. The incoming handler is tasked with translating the stream of events into Erlang terms that can be passed on to the normalizers for processing. This allows us to extend Egara very easily with new service-specific handlers for virtually any dataset we wish to keep track of within Kolab or its surroundings.
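For illustration, a translation step for IMAP notifications might look roughly like the following; the notification is assumed to arrive as a JSON binary (as cyrus-imap's event notifications do), and the jsx library and tuple layout are assumptions for the sketch, not Egara's actual code:

```erlang
%% Hypothetical handler: decode one raw JSON notification and hand the
%% normalizer a plain Erlang term, keeping the full payload for augmentation.
-module(imap_handler_sketch).
-export([translate_event/1]).

translate_event(RawJson) ->
    Decoded = jsx:decode(RawJson, [return_maps]),
    {ok, {imap_event,
          maps:get(<<"event">>, Decoded),
          maps:get(<<"user">>, Decoded, undefined),
          maps:get(<<"uri">>, Decoded, undefined),
          Decoded}}.
```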
Normalizers will eventually also join this level of abstraction, though right now the sole worker implementation is specific to groupware data objects. Future releases of Egara will add support for different workers for different classes of events, giving a nice symmetry with the incoming event handlers.
The middleware is designed to scale and to be used without modification as the system grows in capability. Multiple instances can be run across different systems and the results should (eventually) be the same. I say "eventually" because in such a system one cannot guarantee the exact order of events, only the exact results after some period of time. Or, in more familiar terms, it is eventually consistent.
The whole system is quite flexible at runtime, as well. One can configure which kinds of events one cares to track; which data payloads (if any) to archive; which incoming handlers to run on a given node, etc. This will expand over time as well to allow normalizers and their helpers to be quarantined to specific systems within a cluster.
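As a purely hypothetical sketch of that kind of per-node tuning, expressed in Erlang's sys.config style (the application name, keys and values are invented for illustration and are not Egara's real configuration schema):

```erlang
%% Illustrative per-node configuration: which events to track, what payload
%% data to archive, and which incoming handlers to run on this node.
[
 {egara, [
     {tracked_events,    [message_new, message_delete, flag_change, login, logout]},
     {archive_payloads,  headers_only},                  %% none | headers_only | full
     {incoming_handlers, [imap_handler, ldap_handler]}   %% handlers on this node
 ]}
].
```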
Egara works nicely with Kolab 3.4 and Kolab Enterprise 14, though Bonnie is not officially a part of either. I expect the entire system will be folded into a future Kolab release to ease usage. It will almost certainly remain an optional component, however: not everyone needs these features, and if you don't then there's no reason to pay the price of the runtime overhead and maintenance.
That's a "50,000 foot" view of the historian component of Bonnie. The next installments in this blog series will look a bit closer at the storage model, history querying and replayability and, finally, what this means for end-users and organizations running Kolab with the Bonnie suite.