Timeline: June 2017
GitHub: Twitter Searcher
Data mining, Scripting, Threading
Python, Selenium Webdriver
This project was part of my work as an undergraduate research assistant for Joseph Shin, who is doing research for Professor Emily Phanke at the University of Washington. The purpose of the research was to analyze the Twitter activity of unicorn firms in order to draw a connection between social media activity and unicorn firm success. I wrote a Twitter searcher in Python to collect all the tweets for this research.
One challenge was that I had to go from knowing no Python to delivering a working searcher within two weeks. Another was that I couldn't use the Twitter API: it only returned the first 2000 tweets of a user, but we needed every tweet from each account. I had to come up with my own method to scrape the tweets from Twitter's website. The method I came up with was to split the period during which the account had been open into 100-day chunks, then launch several web browsers (how many depended on the machine), each on its own thread, and have each one run a chunk through Twitter's advanced search. Once a browser had scrolled to the end of the results page, the searcher extracted the HTML source, pulled the tweets out of it, and wrote them to a CSV file. Although the process was quite tedious, the searcher worked effectively and was able to go beyond the 2000-tweet limit.
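The chunk-and-thread idea above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the function names (`split_into_chunks`, `search_chunk`, `collect_tweets`, `write_csv`), the `date`/`text` columns, and the default of four workers are all assumptions made for the example. `search_chunk` is a placeholder standing in for the Selenium step (open a browser, run the advanced search for that date range, scroll to the end, parse the HTML), so the sketch runs without a browser.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def split_into_chunks(start, end, days=100):
    """Return (chunk_start, chunk_end) pairs covering start..end inclusive."""
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        # Each chunk spans at most `days` days; the last one may be shorter.
        chunk_end = min(chunk_start + timedelta(days=days - 1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks


def search_chunk(chunk):
    """Placeholder for the per-browser work (hypothetical).

    A real version would drive a browser over this date range and parse
    the tweets out of the page source. Here it returns one dummy row.
    """
    chunk_start, chunk_end = chunk
    return [{
        "date": chunk_start.isoformat(),
        "text": f"placeholder for {chunk_start} to {chunk_end}",
    }]


def collect_tweets(start, end, workers=4):
    """Search every chunk on a pool of threads and flatten the results."""
    chunks = split_into_chunks(start, end)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input chunks.
        per_chunk = list(pool.map(search_chunk, chunks))
    return [row for rows in per_chunk for row in rows]


def write_csv(rows, out):
    """Write tweet rows to any file-like object as CSV."""
    writer = csv.DictWriter(out, fieldnames=["date", "text"])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    rows = collect_tweets(date(2017, 1, 1), date(2017, 6, 30))
    buffer = io.StringIO()
    write_csv(rows, buffer)
    print(buffer.getvalue())
```

Keeping the chunks independent is what makes the threading simple: each worker only needs its own date range, and the results can be merged in date order once all of the workers have finished.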
What I learned:
I became much more comfortable with Python, having gone from no prior knowledge to writing a complete program in about two weeks. I also became much more comfortable with threading and with automating web browsers through Selenium WebDriver.