In my last blog entry, I mentioned that we have been working on a comprehensive data loss prevention (DLP) and audit trail system for use with Kolab, with the end goal being not only DLP but also a platform for business intelligence. In that entry I listed the three parts of the system, noting that I'd be writing about one at a time. I had hoped to jump on the first of those a day or two after writing the entry, but life and work intervened and then I was off on a short family vacation ... but now I'm back. So let's talk about the capture side of the system.
Kolab can be viewed as a set of cooperative microservices: SMTP, IMAP, LDAP, spam/virus protection, invitation auto-processing, web UI, etc. There are a couple dozen of these, and up until now they have all done the traditional, and correct, thing of logging events to a system log.
This has numerous drawbacks, however. First, on a distributed system where different services run on different hosts (physical or VMs), the data ends up spread over many systems, which is not great for subsequent reporting. Second, at the time of logging the events are in a "raw" state: each service likely does not know about the rest of the Kolab services and thus how its events relate to the whole system. Third, because the logs go through the host systems, it is difficult to ensure that they cannot be easily tampered with; setting up remote logging alleviates this somewhat, but it only goes so far. Finally, logging tends to be a firehose of data, and for our specific interests here we want a very specific sub-stream of that total flow.
So we have written yet another service whose entire job is to collect events as they are generated. This service is itself distributed, allowing collection agents to be run across a cluster running a Kolab instance, and it stores its data in a dedicated key-value store which can be housed on an isolated (and specially secured, if desired) system. The program running this service is called Egara, which is Sumerian for "storehouse", and it is written in Erlang because of that language's robustness (this service must simply never go down), scalability and distributed communication features. The source repository can be found here. Egara itself is part of the overall DLP/auditing system we have named Bonnie.
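To make the idea a little more concrete, here is a rough sketch of the capture step in Python. It is not Egara's code (Egara is Erlang) and it makes no claims about how Egara works internally; it only illustrates the two ideas above: keep just the sub-stream of events we care about, and write them to a single store instead of per-host logs. The event type names, the `Event` fields, the `EventStore` class and the key format are all assumptions invented for this illustration, and an in-memory dict stands in for the real key-value store.

```python
import json
import time
from dataclasses import dataclass, asdict

# Hypothetical set of event types of interest for DLP/auditing.
# Everything not listed here is part of the "firehose" we ignore.
AUDIT_EVENTS = {"MessageNew", "MessageRead", "MessageTrash", "Login"}


@dataclass
class Event:
    """A normalized event; the fields are illustrative assumptions."""
    service: str      # which service produced it, e.g. "imap"
    host: str         # host the collection agent runs on
    type: str         # event type, e.g. "MessageNew"
    user: str         # affected user, if known
    timestamp: float  # seconds since the epoch


class EventStore:
    """Stand-in for the dedicated key-value store (here just a dict)."""

    def __init__(self):
        self._data = {}
        self._seq = 0

    def put(self, event):
        # Assumed key scheme: host plus a per-store sequence number.
        self._seq += 1
        key = f"{event.host}:{self._seq}"
        self._data[key] = json.dumps(asdict(event))
        return key

    def get(self, key):
        return json.loads(self._data[key])


def collect(raw_events, store):
    """Filter raw events down to the audit sub-stream and store them.

    Returns the keys of the stored events, in the order they were stored.
    """
    keys = []
    for raw in raw_events:
        if raw.get("type") not in AUDIT_EVENTS:
            continue  # not part of the sub-stream we want
        event = Event(
            service=raw["service"],
            host=raw["host"],
            type=raw["type"],
            user=raw.get("user", ""),
            timestamp=raw.get("timestamp", time.time()),
        )
        keys.append(store.put(event))
    return keys
```

An agent on each host would call something like `collect(events, store)` with whatever its local services emit; in a real deployment the store would live on the isolated system rather than in the agent's memory.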