- Connect to the Twitter Streaming API
- Add and remove topics from the stream monitor dynamically
- Extract the keyphrases from the tweets downloaded
- Query the tweets using the extracted keyphrases
- Examine frequency and co-occurrence frequency of keyphrases
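As a rough illustration of the last item, here is a minimal in-memory sketch of counting keyphrase frequency and co-occurrence frequency. It assumes each tweet has already been reduced to a list of keyphrases; the function name `keyphrase_counts` and the sample data are hypothetical and are not taken from the actual project, which runs on map/reduce rather than in a single process.

```python
from collections import Counter
from itertools import combinations


def keyphrase_counts(tweets_keyphrases):
    """Count how often each keyphrase appears and how often pairs co-occur.

    tweets_keyphrases: iterable of keyphrase lists, one list per tweet
    (hypothetical input shape, assumed for this sketch).
    """
    freq = Counter()
    cooc = Counter()
    for phrases in tweets_keyphrases:
        # Deduplicate within a tweet and sort so each pair has one canonical order.
        unique = sorted(set(phrases))
        freq.update(unique)
        cooc.update(combinations(unique, 2))
    return freq, cooc


# Made-up sample data, for illustration only.
sample = [
    ["hadoop", "map reduce"],
    ["hadoop", "twitter"],
    ["hadoop", "map reduce", "twitter"],
]
freq, cooc = keyphrase_counts(sample)
```

Counting each keyphrase at most once per tweet means both numbers are "tweets containing X" counts, which keeps frequency and co-occurrence directly comparable.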
Getting the keyphrase model working was a real learning experience. It ended up involving seven map/reduce cycles: two are executed once, and five are executed iteratively, once for each keyphrase size from 1 to N. There are some major opportunities to improve on the current design, however. Potentially the largest is that I now realize I need a distributed database; using the map/reduce framework to perform queries is simply too slow. I will also need to adapt the keyphrase extraction algorithm so that it can be iterative. Other than that, though, it's on to the next use case, which is user/topic clustering.
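To make the "one pass per keyphrase size" idea concrete, here is a small single-process sketch of a map step and a reduce step repeated for each size from 1 to N. It is not the seven-cycle pipeline described above, only the general pattern; the names `map_ngrams`, `reduce_counts`, and `count_candidates`, the whitespace tokenization, and the sample tweets are all assumptions made for illustration.

```python
from collections import defaultdict


def map_ngrams(tweet, n):
    """Map step: emit (candidate phrase, 1) for every n-word window in a tweet."""
    tokens = tweet.lower().split()  # naive whitespace tokenization (assumption)
    for i in range(len(tokens) - n + 1):
        yield " ".join(tokens[i:i + n]), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def count_candidates(tweets, max_n):
    """Run one map/reduce pass per phrase size, from 1 up to max_n."""
    results = {}
    for n in range(1, max_n + 1):
        pairs = (pair for tweet in tweets for pair in map_ngrams(tweet, n))
        results[n] = reduce_counts(pairs)
    return results


# Made-up sample tweets, for illustration only.
tweets = ["map reduce is slow", "map reduce is fun"]
counts = count_candidates(tweets, 2)
```

Each pass depends only on the raw tweets and the current size, so in a real cluster the passes could be scheduled as separate jobs.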