This project continues to be an active learning experience. As any Map/Reduce veteran would probably have told me, it's a terrible framework for performing queries. In fact, as soon as my queries started taking abysmally long to run, I found all of the articles explaining why that's exactly what I should have expected. That pushed me toward a complication I had hoped to avoid: distributed database systems.
I looked at the available systems and, although several of them support Hadoop interaction, I decided to go with HBase. It's a distributed NoSQL database system modeled on Google's Bigtable. The main reason I chose it is that it is so closely coupled with Hadoop that it seemed like the one I could pick up most easily, and therefore the one with the smallest impact on my timeline. It may well turn out that some other distributed database provides a better solution in the future, for reasons I can't currently foresee.
The switch also gave me a good opportunity to incorporate the Semantically-Interlinked Online Communities (SIOC) RDF vocabulary into the project more explicitly. Each table corresponds to a type from the ontology, and the columns hold the properties associated with that type. The column families of the tables are "sioc" and (where needed) "foaf". I believe that will make the data representation easy to integrate with existing and future technologies. Now that the project is starting to solidify in particular directions, I'll start writing blog posts about the technologies involved and how they apply to social analytics.
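To make the table layout described above a little more concrete, here is a minimal sketch in plain Python of how a record's properties might be turned into HBase-style cells, which are addressed as "family:qualifier". It does not talk to HBase at all. The function name `to_hbase_row`, the choice of a `UserAccount` table, and the sample values are all assumptions made for illustration, not the project's actual code or data.

```python
from typing import Dict

# The two column families named in the post.
ALLOWED_FAMILIES = {"sioc", "foaf"}


def to_hbase_row(properties: Dict[str, str]) -> Dict[bytes, bytes]:
    """Map prefixed RDF properties (e.g. "sioc:name") to HBase-style cells.

    Hypothetical helper: the vocabulary prefix becomes the column family
    and the property name becomes the column qualifier.
    """
    row: Dict[bytes, bytes] = {}
    for name, value in properties.items():
        family, _, qualifier = name.partition(":")
        if family not in ALLOWED_FAMILIES or not qualifier:
            raise ValueError(f"unsupported property: {name!r}")
        # HBase stores column names and values as raw bytes.
        row[f"{family}:{qualifier}".encode("utf-8")] = value.encode("utf-8")
    return row


# Illustrative row for a table named after the sioc:UserAccount type.
user_row = to_hbase_row({
    "sioc:name": "alice",   # made-up sample value
    "foaf:nick": "al",      # made-up sample value
})
```

Keeping the vocabulary prefix as the column family means a cell's full column name reads the same as the RDF property it came from, which is the integration benefit I'm hoping for.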