Geek2Geek Berlin: NoSQL
NoSQL Database Duel: CouchDB vs. Neo4j vs. ArangoDB
At Sociomantic Labs, data is everything. The way we store our information is a key component to our business. In fact, it might be one of the most important ingredients in what makes the company special. And while I won’t go into the details of our D-based, distributed data storage, I will touch on some topics that I find incredibly relevant to anyone concerned with database technology in general. More specifically I’ll be (lightly) digging through the contents of the latest Geek2Geek meetup.
Geek2Geek is a Berlin-based tech talk with a few hundred fellow self-identified geeks who get together to devour developer topics whilst munching on pizza and sipping on beer or cola provided by a variety of loving sponsors. The meetup is intended to take place monthly; February’s session was the first I had personally attended, and its broad topic was a brief introduction to the world of NoSQL databases.
For those who have never heard of it before, the concept behind NoSQL is quite simple: applying methods of data modeling that differ from the well-known relational model. The name is also sometimes read as “Not only SQL”, referencing the fact that such databases can also be multi-purpose.
Commonly split into four categories based on their features, NoSQL databases can be modeled as key-value stores, column stores, document stores, or graph databases (all in contrast to the relational databases they depart from). Performance, scalability, flexibility, complexity, and functionality are the traits that can be used to judge which model is best suited to a particular project or business.
And this is precisely what February’s Geek2Geek set out to do: sell us geeks in attendance on one (or more) currently available NoSQL databases. Having not been formally introduced to NoSQL at all before, I had just a glimpse of what MongoDB (a document model store) was – thanks to a fellow colleague who had shared his own, extra-curricular learnings with me over a fluted glass of Weizenbier – and I felt that the talks, overall, were incredibly enlightening as well as ridiculously entertaining. But we’ll get into the hilarity of it all later…
First and foremost, my hat comes gently off to both TempoDB (who unfortunately weren’t represented in a physical presence, so we couldn’t learn much about their contribution to the NoSQL world other than a quick view of their website) and Locafox (a start-up building a local, physical, e-commerce-oriented mobile application) who sponsored the event and took us out for beers and pizza afterwards.
Second up to bat was the only-mildly-bashful Stefan Plantikow, who is a key contributor to Neo4j, a graph database with a super sexy, visual interface. I must admit that for many of my creative purposes Neo4j was already winning the race, but their “gotcha” is the fact that once you begin to make any money with their database technology, then you’ll need to toss them some coin, too, for a commercial license. Fair enough: from what we saw it was quite an impressive and powerful alternative to the more typical document database structure.
Some of you who are new to the world of NoSQL might be wondering: what exactly is a graph database? Just imagine that you wanted to store not only “documents,” but also the relationships (or actions) between each document. Keep in mind that your documents have attributes and values, and then imagine you want to apply further attributes and values to their intertwining relationships as well. Lastly, imagine that you want to formulate a relatively complex query — for example, to get the names of all of the geeks who attended the February Geek2Geek, came with someone else, and either of them did not know about one of the three (or four, technically, counting TempoDB) databases presented. With a graph model database, you can do that all in one query. Quite powerful (and picky) stuff!
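To make that a little more concrete, here is a minimal sketch in plain Python of the idea described above: nodes with attributes, relationships that carry their own attributes, and a query that walks both. Everything in it is an assumption for illustration (the attendee names, the `ATTENDED`, `CAME_WITH`, and `KNOWS` relationship types, and the helper functions are all made up), and it is not how Neo4j or any other graph database is actually implemented or queried. A real graph database would let you express the same question declaratively in a single query rather than in hand-written loops.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    """A directed relationship that can carry its own attributes."""
    source: str
    kind: str
    target: str
    props: dict = field(default_factory=dict)


# Hypothetical nodes: geeks, one event, and the three databases presented.
nodes = {
    "ana": {"label": "Geek", "name": "Ana"},
    "ben": {"label": "Geek", "name": "Ben"},
    "cleo": {"label": "Geek", "name": "Cleo"},
    "dana": {"label": "Geek", "name": "Dana"},
    "emil": {"label": "Geek", "name": "Emil"},
    "g2g_feb": {"label": "Event", "name": "Geek2Geek February"},
    "couchdb": {"label": "Database", "name": "CouchDB"},
    "neo4j": {"label": "Database", "name": "Neo4j"},
    "arangodb": {"label": "Database", "name": "ArangoDB"},
}
DATABASES = {"couchdb", "neo4j", "arangodb"}

# Hypothetical relationships, some with attributes of their own.
edges = [Edge(g, "ATTENDED", "g2g_feb") for g in ("ana", "ben", "cleo", "dana", "emil")]
edges += [
    Edge("ana", "CAME_WITH", "ben", {"arrived": "19:00"}),
    Edge("dana", "CAME_WITH", "emil", {"arrived": "19:15"}),
    Edge("ben", "KNOWS", "couchdb"),
]
edges += [Edge(g, "KNOWS", db) for g in ("ana", "dana", "emil") for db in DATABASES]


def targets(source, kind):
    """Ids of all nodes reachable from `source` over edges of type `kind`."""
    return {e.target for e in edges if e.source == source and e.kind == kind}


def companions(geek):
    """Geeks linked to `geek` by CAME_WITH, in either direction."""
    outgoing = targets(geek, "CAME_WITH")
    incoming = {e.source for e in edges if e.kind == "CAME_WITH" and e.target == geek}
    return outgoing | incoming


def knows_all(geek):
    """True if `geek` has a KNOWS edge to every presented database."""
    return DATABASES <= targets(geek, "KNOWS")


def attendees_with_a_gap(event):
    """Names of attendees who came with someone, where either of them
    did not know at least one of the presented databases."""
    names = []
    for node_id, node in nodes.items():
        if node["label"] != "Geek" or event not in targets(node_id, "ATTENDED"):
            continue
        pals = {c for c in companions(node_id) if event in targets(c, "ATTENDED")}
        if pals and any(not knows_all(g) for g in pals | {node_id}):
            names.append(node["name"])
    return sorted(names)


print(attendees_with_a_gap("g2g_feb"))  # ['Ana', 'Ben']
```

In this toy data set, Ana and Ben arrived together and Ben only knows CouchDB, so both show up in the result; Cleo came alone, and Dana and Emil already knew all three databases. The point is only that the relationships are first-class data you can filter on, just like the documents themselves.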
However, what really pleased me and my fellow colleagues was that Stefan didn’t try to win us over with pure tech-talk (as Jan did, devilishly delivering the distribution aspects of CouchDB, which was arguably quite a useful approach to get all those utilitarian tidbits fully told in a tight-packed amount of time), but instead ushered us through a variety of creative and engaging applications and examples built upon Neo4j that stressed the strangeness, the silliness, and especially the storyline-specific power that graph database queries can provide. What’s important to note is that all of these examples were provided by actual users and community members of Neo4j, and not its creators themselves, and thus further pushed home the point that people were actually using this database model, regardless of the real-life application or availability of their creations.
I had almost forgotten, until checking out Robert Reiz’s blog post on the same topic, that Stefan actually did a live demo as well, which at the beginning seemed almost doomed by long download times but worked in the end. As Robert says, for such a bashful start, Stefan really came out of the gates brave as hell by the end!
And finally, Martin Schoener – who’d been raising a liberal, literal handful of reasonably alarming questions and concerns about the other two database technologies during the course of the talks – got up to drench us in the all-powerful ArangoDB. The project was originally named “AvocadoDB” but was later forced (for legal reasons) to rename itself. Martin showed off not only that his team’s database was capable of dealing with various model formats (document, graph, and simple key-value store) at the same time, but also that an “arango” was actually a type of avocado after all!
In fact, for such a scientifically well-mannered and incredibly well-spoken candidate, Martin surprised the snippets out of us when he began not only dropping direct (semi-friendly) attacks on the other databases, but also diving deep into their (alleged) weaknesses via visual graphs that depicted the pros and cons of both CouchDB and Neo4j in comparison to ArangoDB. The balls on this guy! Admittedly he did show some honesty in that ArangoDB isn’t the best in all operations, though the moment of most hilarious truth arose when Jan — startled by a critical bar graph gleaming clearly from the overhead beamer onto co.up’s clean white wall — rose from the off-stage couch where he sat next to Stefan, and mentioned out loud that he had just coded and deployed a patch which would normalize the particular request times (if I’m not mistaken). The audience was in guttural uproar.
I have to hand it to these three guys and their three camps of database technologies. They torched their code contributions with vigor, touched upon the topic of NoSQL with an inspiringly competitive edge, and still treated each other with an air of humor and respect in the end (after all, the “fights” or developer debates, albeit public, were all in good faith and entertained us so much that they bordered on seeming staged). All of this really helps in satisfying skepticism, courting challenges, and giving gusto to the general growth of the whole NoSQL movement. To be honest, at the end, we were torn between the types: CouchDB definitely seemed the best for distribution; Neo4j’s creative applications, community, and sexy interface were basically unbeatable; however, the entirely free-of-cost, multi-purpose ArangoDB was a very tempting finale of the four. (Yes, four, but hopefully next time TempoDB will actually be in attendance to give us the run-down on why they beat the pants off the other three.)
All in all, we were sold on one thing: NoSQL. That, and Geek2Geek of course. Because even after such a full library of talks the whole crew headed out together to hit a local pizza joint, talk shop, talk shit, or simply talk inaudibly with mouths full of both cheese and beverages, happily slipping away into the buzz of an enjoyable Kreuzberg evening.
So, until next month, thanks to everyone involved in the talks, the sponsors, and especially the venue — Berlin’s good old, funky, developer-friendly co.up.