Fibercorps Social: Getting the Project Started

This blog is about the Fibercorps social media analytics framework, a 2011 Google Summer of Code project. The framework is meant to be a toolkit for building analytical tools for social media. The project proposal can be found at http://www.google-melange.com/gsoc/proposal/review/google/gsoc2011/blakelemoine/1 .

Right off the bat, though, there are some changes from that original document. It was pointed out to me last week that IBM has a proprietary project very similar to the one I've proposed: IBM BigSheets. That's not at all discouraging; in fact, I think it shows that the concept is a good idea. Good enough for IBM to turn it into a product, in any case. Fibercorps asked me to address one of the major differences between my proposal and BigSheets, so the Fibercorps social media analytics framework will now incorporate distributed computing using Apache Hadoop.

This change triggers a cascade of other changes that will need to be propagated through the system design. The first is that the machine learning library will no longer be Java-ML but Apache Mahout, which is built to run on Hadoop. Also, OpenCog's AtomSpace is explicitly intended to be an in-memory resource, so it is no longer an appropriate representation. I am currently looking into using RDF to represent information. Representing RDF data in the Hadoop Distributed File System (HDFS) seems to be an open problem, but one that should be relatively easy to solve for the special case of what Fibercorps Social is intended to do. I'll post updates as I make progress on that topic.
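
To make the RDF-on-HDFS question a little more concrete, here is a rough sketch of one possible approach. It is only an illustration, not the design the project has settled on. It assumes the RDF is stored as N-Triples, a one-statement-per-line text format, which would let Hadoop split a file across mappers on line boundaries. The function names (parse_ntriple, map_predicate, reduce_counts, run_local), the example.org URIs, and the simplified parser are all hypothetical, and the map and reduce steps run locally in plain Python rather than on a cluster.

```python
from collections import defaultdict


def parse_ntriple(line):
    """Parse one simplified N-Triples line into (subject, predicate, object).

    Assumption: terms are separated by whitespace and the statement ends
    with a '.'; this does not handle every case in the N-Triples grammar.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # skip blank lines and comments
    if not line.endswith("."):
        return None  # not a complete statement
    body = line[:-1].strip()
    # Split into at most three parts so literals containing spaces stay whole.
    parts = body.split(None, 2)
    if len(parts) != 3:
        return None
    return tuple(parts)


def map_predicate(line):
    """Map step: emit (predicate, 1) for each well-formed triple."""
    triple = parse_ntriple(line)
    if triple is not None:
        yield triple[1], 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each predicate."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def run_local(lines):
    """Run the map and reduce steps in-process, standing in for a Hadoop job."""
    pairs = (pair for line in lines for pair in map_predicate(line))
    return reduce_counts(pairs)


# Hypothetical sample data: who follows whom, and one posted message.
SAMPLE = [
    "<http://example.org/alice> <http://example.org/follows> <http://example.org/bob> .",
    "<http://example.org/bob> <http://example.org/follows> <http://example.org/carol> .",
    '<http://example.org/alice> <http://example.org/posted> "Hello, world" .',
]

if __name__ == "__main__":
    for predicate, count in sorted(run_local(SAMPLE).items()):
        print(predicate, count)
```

Because each line is a complete statement, a mapper never needs to see more than one line at a time, which is the property that makes this layout a plausible fit for HDFS. Queries that join many triples together would need more than this sketch covers.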

Tuesday, May 24, 2011
