Fibercorps Social: First Use Case

I've been a bit negligent about posting here, so I'll just do a quick update. The first use case is now done. Rather than going through all of the particulars, I'll just give a quick bullet list of the capabilities that are wrapped into the application:

Connect to the Twitter Streaming API

Add and remove topics from the stream monitor dynamically

Extract the keyphrases from the tweets downloaded

Query the tweets using the extracted keyphrases

Examine the frequency and co-occurrence frequency of keyphrases
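
To make the last capability concrete, here is a minimal in-memory sketch of counting keyphrase frequency and co-occurrence. It is only an illustration, not the application's actual code: the function name `keyphrase_counts`, the sample data, and the choice to count each keyphrase once per tweet are all assumptions.

```python
from collections import Counter
from itertools import combinations


def keyphrase_counts(tweet_keyphrases):
    """Count keyphrase frequency and pairwise co-occurrence across tweets.

    `tweet_keyphrases` is a list with one list of keyphrases per tweet.
    (Hypothetical input shape, chosen for this sketch.)
    """
    frequency = Counter()
    cooccurrence = Counter()
    for phrases in tweet_keyphrases:
        # Assumption: a keyphrase counts once per tweet, however often it repeats.
        unique = sorted(set(phrases))
        frequency.update(unique)
        # Sorting first means each unordered pair always maps to the same key.
        cooccurrence.update(combinations(unique, 2))
    return frequency, cooccurrence


# Hand-written sample data, not real tweets.
sample = [
    ["fiber", "broadband"],
    ["fiber", "lafayette"],
    ["broadband", "fiber", "fiber"],
]
frequency, cooccurrence = keyphrase_counts(sample)
```

With this sample, `frequency["fiber"]` is 3 and `cooccurrence[("broadband", "fiber")]` is 2. At real stream volumes the same counting would have to be distributed rather than held in one process.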

Getting the keyphrase model working was a real learning experience. It ended up involving seven map/reduce cycles: two are executed once, and five are executed iteratively over the keyphrase sizes from 1 to N. There are some major opportunities to improve on the current design, however. Potentially the largest is that I now realize the need for a distributed database; using the map/reduce framework to perform queries is simply too slow. I will also need to adapt the keyphrase extraction algorithm so that it can be iterative. Other than that, though, it's on to the next use case, which is user/topic clustering.
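
The idea of repeating a map/reduce pass for each keyphrase size can be sketched in a few lines. This is a single-process illustration under stated assumptions, not the seven-cycle pipeline described above: it assumes candidate keyphrases are runs of consecutive whitespace-separated words, and the names `map_ngrams`, `reduce_counts`, and `count_candidates` are made up for the example.

```python
from collections import defaultdict


def map_ngrams(tweet, size):
    """Map step: emit (candidate, 1) for every run of `size` consecutive words."""
    words = tweet.lower().split()
    for i in range(len(words) - size + 1):
        yield " ".join(words[i:i + size]), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def count_candidates(tweets, max_size):
    """Run one map/reduce pass per keyphrase size from 1 to max_size."""
    results = {}
    for size in range(1, max_size + 1):
        pairs = (pair for tweet in tweets for pair in map_ngrams(tweet, size))
        results[size] = reduce_counts(pairs)
    return results


# Hand-written sample input, not real tweets.
counts = count_candidates(["fiber to the home", "fiber to the curb"], 2)
```

Here `counts[1]` holds the single-word counts and `counts[2]` the two-word counts. Each size needs its own full pass over the tweets, which hints at why answering ad hoc queries the same way gets slow.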


Sunday, June 26, 2011
