On my second and (unfortunately) last day at Devoxx I attended two completely different talks.
The first talk was about Cassandra, one of the many NoSQL databases that have emerged in the last few years. NoSQL was definitely one of the biggest topics at Devoxx. Four university slots were reserved for it. Apart from the Cassandra talk, there were talks on Hadoop, MongoDB and HBase.
Cassandra by Example
by Jonathan Ellis
Cassandra is one of the many NoSQL databases around. It offers scalability, reliability and high availability, with good clustering and failover capabilities. It also integrates some Hadoop functionality for analyzing its datastore; currently it supports Pig and Hive. The scalability and reliability come at a price though. Cassandra doesn’t support ACID transactions and has only limited support for ad-hoc queries (hence the name NoSQL). This makes it especially suitable for very large internet applications where ACID isn’t that big a deal. Applications like Twitter and Facebook come to mind.
The idea behind Cassandra (and other NoSQL databases) is that IO should be optimized. Data closely related to each other is stored (often redundantly) close together on disk, thus significantly reducing IO overhead. Compared to relational databases, where scaling is accomplished predominantly by adding CPUs and memory, Cassandra scales by optimizing IO, which comes at the cost of increased disk space. But since disk space has become relatively cheap these days compared to memory and processors, this is a fair price to pay.
There is another price to pay, however. As mentioned, Cassandra stores a lot of data redundantly; the data needed for a particular query, for example, is stored together in one row. This greatly increases complexity. Since Cassandra doesn’t support referential integrity (there’s no such thing as a cascaded delete, for example), it’s up to the application, and hence the programmer, to make sure that the stored data remains consistent with the data model. And as mentioned before, there is limited support for ad-hoc querying.
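To make that trade-off a bit more concrete, here is a minimal sketch in Python. It does not use Cassandra or Hector at all: plain dicts stand in for column families, and the names `tweets`, `timelines`, `followers` and the three functions are made up for illustration. It only shows the idea of writing redundant copies so that a read touches one row, and of the application having to clean up every copy itself.

```python
from collections import defaultdict

# Plain dicts stand in for column families; this is NOT Cassandra's API.
tweets = {}                              # tweet_id -> {"author": ..., "body": ...}
timelines = defaultdict(dict)            # username -> {tweet_id: body}, one "row" per reader
followers = {"alice": ["bob", "carol"]}  # hypothetical follower graph


def post_tweet(tweet_id, author, body):
    """Store the tweet once, then copy it into every follower's timeline row."""
    tweets[tweet_id] = {"author": author, "body": body}
    for reader in followers.get(author, []):
        timelines[reader][tweet_id] = body  # redundant copy: more disk, less read IO


def read_timeline(reader):
    """Reading a timeline touches a single row; no join is needed."""
    return list(timelines[reader].values())


def delete_tweet(tweet_id):
    """There is no cascaded delete, so the application removes every copy itself."""
    tweet = tweets.pop(tweet_id)
    for reader in followers.get(tweet["author"], []):
        timelines[reader].pop(tweet_id, None)


post_tweet(1, "alice", "Hello from Devoxx")
print(read_timeline("bob"))
delete_tweet(1)
print(read_timeline("bob"))
```

If `delete_tweet` forgot one of the timeline rows, nothing in the store would complain; the stale copy would simply stay there. That is the kind of bookkeeping a relational database would normally do for you.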
A good deal of the talk consisted of examples using Cassandra’s shell client and Hector, a Java API for Cassandra. The examples were pretty complex, and coding against the API comes with a lot of boilerplate. Jonathan made an interesting comparison: he likened Hector to JDBC (for relational databases) and foresaw a JPA-like API in the near future. Now this would be really interesting from a JEE programmer’s point of view. JPA could abstract away the fact that you’re programming against a NoSQL database! All the examples Jonathan showed belonged to an online example Twitter-like application called Twissandra. The example project for Hector is called Twissjava.
At the end Jonathan mentioned some more advanced features of Cassandra:
- Batch insertions;
- Secondary indexes, i.e. the ability to add indexes on additional columns in a row;
- SuperColumns, which are basically maps of columns, used to denormalize data to avoid extra queries;
- RipCord, a management suite for Cassandra;
- JMX monitoring.
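Two of these features are easier to picture with a small example. The sketch below is again plain Python, not Cassandra’s API: a nested dict plays the role of a row with SuperColumns (a map of columns per name), and a hand-built inverted map plays the role of a secondary index on one column. The data and the names `user_rows` and `build_city_index` are assumptions for illustration only.

```python
from collections import defaultdict

# row key -> super column name -> column name -> value (hypothetical data)
user_rows = {
    "alice": {
        "profile": {"city": "Antwerp", "lang": "nl"},
        "tweet:1": {"body": "Hello from Devoxx"},
    },
    "bob": {
        "profile": {"city": "Antwerp", "lang": "en"},
    },
    "carol": {
        "profile": {"city": "Ghent", "lang": "nl"},
    },
}


def build_city_index(rows):
    """Secondary index sketch: map a column value to the row keys that hold it."""
    index = defaultdict(set)
    for row_key, super_columns in rows.items():
        city = super_columns.get("profile", {}).get("city")
        if city is not None:
            index[city].add(row_key)
    return index


city_index = build_city_index(user_rows)

# Lookup by a non-key column, without scanning every row at query time.
print(sorted(city_index["Antwerp"]))

# Everything about one tweet sits inside a single super column of the row.
print(user_rows["alice"]["tweet:1"]["body"])
```

In a real cluster the database would maintain such an index for you; the point here is only that the index is itself another redundant structure derived from the rows.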
All in all, a very interesting talk. I’m definitely going to dive into this topic in the near future.
And here’s a link to the slides.
Ajax Library Smack down: Prototype vs. jQuery
by Nathaniel Schutta
Both libraries offer a lot of the same goodies:
- cross browser abstractions;
- simplified AJAX;
- CSS selectors;
- event handling;
- widgets, effects, animations.
Both have excellent online documentation available, are widely used, have good community support and are very small libraries.
Apart from the similarities, there are also a lot of differences between the two. According to Nathaniel, Prototype has been developed from a programmer’s viewpoint (API centric), whereas jQuery is more focussed on HTML elements. Here’s a list of the pros and cons.
Prototype pros:
- adds useful functions to core elements (on very large pages this can cause a significant memory footprint);
- widgets and effects available via script.aculo.us;
- widely used.
Prototype cons:
- no minified version;
- performance not always a priority;
- pollutes the global namespace.
jQuery pros:
- focussed on HTML elements;
- doesn’t pollute the global namespace;
- DOM traversal is a snap;
- extensive array of plugins available.
jQuery cons:
- parameter ordering in APIs not always intuitive;
- plugins required for a variety of functionality;
- some functions reassign this.
I’m definitely not an expert on either of these libraries, but after Nathaniel’s comparison I think jQuery has a slight advantage. This is because it had plugins in mind when it was first designed. There are a lot of plugin libraries available, and you can pick the ones you need and omit the ones you don’t. In Prototype you only have two options: with or without script.aculo.us.
All of this is probably stating the obvious to most programmers, but I still honestly believe that in practice a lot of these rules are violated.
Here’s a link to the slides.