Build a Twitter Data Collector

In recent years, the spread of false information to influence people has been a recurring topic in the media. For example, the 2016 US presidential election sparked a broad debate about the influence of Russian bots on social media. Furthermore, Donald Trump, who popularized the term "Fake News", is himself a considerable source of misinformation (Sweden(2), Inauguration(1), etc.). The consequences of such Fake News are not only reflected in election results but can also have other effects. For example, the claim made in 2013 by the "Syrian Electronic Army" about a terrorist attack on the White House in which President Obama was injured caused a crash on the stock markets. Meanwhile, WhatsApp had to deactivate the group share function, as it caused lynchings of innocent people in India. When fast-spreading misinformation has severe real-world consequences, we speak of "Digital Wildfires".

We have created the UMOD(3) project, which aims to understand such Digital Wildfires on all forms of electronic news platforms. The work described here focuses on social networks, especially Twitter.

The task

A tool is to be developed that makes it possible to acquire, process and evaluate Twitter data. The following features are to be integrated for this purpose:

• Sentiment Analysis for Tweets
• Language Analysis for Tweets
• Output in different formats
• Bot Detection
• Fake Follower Detection
• Load Balancing for Requests to the Twitter API
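One possible reading of the load-balancing item is to spread requests over several API credentials, each of which has its own rate limit per time window. The following is a minimal sketch under that assumption, not a prescribed design. The class name `CredentialRotator`, the credential labels, and the quota and window values are hypothetical; the actual limits depend on the Twitter API endpoint and access level. The clock is injected so the behaviour can be checked without waiting for a real window to elapse.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.LongSupplier;

/** Hypothetical helper: rotates over several API credentials, each with a fixed-window quota. */
class CredentialRotator {
    private final List<String> credentials;
    private final int[] used;          // requests used in the current window, per credential
    private final long[] windowStart;  // start of the current window in ms, per credential
    private final int quotaPerWindow;  // assumed limit per credential and window
    private final long windowMillis;   // assumed window length
    private final LongSupplier clock;  // injected time source (ms)
    private int next = 0;              // round-robin cursor

    CredentialRotator(List<String> credentials, int quotaPerWindow,
                      long windowMillis, LongSupplier clock) {
        if (credentials.isEmpty()) {
            throw new IllegalArgumentException("need at least one credential");
        }
        this.credentials = new ArrayList<>(credentials);
        this.used = new int[credentials.size()];
        this.windowStart = new long[credentials.size()];
        this.quotaPerWindow = quotaPerWindow;
        this.windowMillis = windowMillis;
        this.clock = clock;
        Arrays.fill(windowStart, clock.getAsLong());
    }

    /** Returns a credential that still has quota left, or empty if all are exhausted. */
    synchronized Optional<String> acquire() {
        long now = clock.getAsLong();
        for (int i = 0; i < credentials.size(); i++) {
            int idx = (next + i) % credentials.size();
            if (now - windowStart[idx] >= windowMillis) {
                // The window has elapsed: start a new one and reset the counter.
                windowStart[idx] = now;
                used[idx] = 0;
            }
            if (used[idx] < quotaPerWindow) {
                used[idx]++;
                next = (idx + 1) % credentials.size();
                return Optional.of(credentials.get(idx));
            }
        }
        return Optional.empty();
    }
}
```

A caller would ask `acquire()` for a credential before each request and back off when it returns an empty result. With `System::currentTimeMillis` as the clock, the rotator follows wall-clock windows.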

Web services or existing implementations are available for all of these features. The task is therefore mainly to combine these components into an API that provides useful functionality in the project context. Suggested technologies are:

• Neo4j
• JGraphT
• Google Sentiment
• Twitter4j
• Java Language detection library
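To illustrate what combining such components behind one API could look like, here is a small sketch that uses only the Java standard library. The names `TweetAnnotator`, `AnnotationPipeline` and `WordCountAnnotator` are hypothetical. The word counter is only a stand-in for a real component, such as a wrapper around a sentiment service or a language detection library, whose actual APIs are not shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical enrichment step, e.g. a wrapper around a sentiment or language service. */
interface TweetAnnotator {
    String name();
    String annotate(String tweetText);
}

/** Stand-in annotator for illustration only: counts whitespace-separated words. */
class WordCountAnnotator implements TweetAnnotator {
    public String name() {
        return "wordCount";
    }

    public String annotate(String tweetText) {
        String trimmed = tweetText.trim();
        int count = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        return String.valueOf(count);
    }
}

/** Runs every registered annotator over a tweet and collects the results by annotator name. */
class AnnotationPipeline {
    private final List<TweetAnnotator> steps = new ArrayList<>();

    AnnotationPipeline register(TweetAnnotator step) {
        steps.add(step);
        return this; // allows chained registration
    }

    Map<String, String> process(String tweetText) {
        Map<String, String> annotations = new LinkedHashMap<>();
        for (TweetAnnotator step : steps) {
            annotations.put(step.name(), step.annotate(tweetText));
        }
        return annotations;
    }
}
```

With this shape, each external service sits behind the same small interface, so components can be added or swapped without changing the code that calls `process`.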

If possible, the tool should be written in Java. Here are a few examples of basic requests the tool should be able to answer:

• The network of all users who speak Norwegian, Swedish, or Danish, are more than 18 years old, and have fewer than 100 tweets on their timeline.
• The follower network of X up to degree Y that deals with a specific topic, e.g. climate change.
• All tweets related to a URL X within a certain timeframe.
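The follower-network request can be read as a breadth-first search over follower edges that stops at degree Y. The sketch below assumes the follower lists are already held in memory as a map and that topic membership is decided by a caller-supplied predicate; the class `FollowerNetwork` and its method are hypothetical. It also assumes that users who do not match the topic are still traversed but not reported, which is only one possible interpretation of the request.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class FollowerNetwork {
    /**
     * Breadth-first search over follower edges starting at root.
     * Returns every user within maxDegree hops that satisfies onTopic,
     * mapped to its hop distance from root. The root itself is not reported.
     */
    static Map<String, Integer> collect(Map<String, List<String>> followersOf,
                                        String root, int maxDegree,
                                        Predicate<String> onTopic) {
        Map<String, Integer> distance = new LinkedHashMap<>(); // visited users and their hop distance
        Map<String, Integer> result = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        distance.put(root, 0);
        queue.add(root);
        while (!queue.isEmpty()) {
            String user = queue.remove();
            int d = distance.get(user);
            if (d > 0 && onTopic.test(user)) {
                result.put(user, d);
            }
            if (d == maxDegree) {
                continue; // do not expand beyond the requested degree
            }
            for (String follower : followersOf.getOrDefault(user, List.of())) {
                // putIfAbsent returns null only for users seen for the first time
                if (distance.putIfAbsent(follower, d + 1) == null) {
                    queue.add(follower);
                }
            }
        }
        return result;
    }
}
```

In a real tool the map lookup would be replaced by calls to a graph store or to the Twitter API, and the predicate by a topic classifier over each user's tweets.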



This thesis is offered by Simula Research Laboratory in cooperation with the University of Oslo (UiO). We work mostly at Simula in Fornebu, where you will also have a place to work. The thesis will be supervised by Johannes Langguth and Prof. Carsten Griwodz.


At Simula we offer exciting work in close cooperation with leading research groups as well as excellent working conditions. We provide a stimulating work environment and the opportunity to build future networks. Simula strives to achieve a balance between genders, and women are particularly encouraged to apply.


We expect self-motivation, initiative and the ability to work independently, as well as working knowledge of a programming language such as Python or Java. Knowledge of social network analysis is not required, although familiarity with Twitter would be helpful.


Published Apr. 3, 2019 08:00 - Last modified Apr. 3, 2019 08:00

Scope (credits)