9:15 – 10:30: Keynote
Computational Social Science and Microblogs — The Good, the Bad and the Ugly
Markus Strohmaier
abstract
According to the Computational Social Science Society of the Americas (CSSSA), computational social science is “The science that investigates social phenomena through the medium of computing and related advanced information processing technologies”. Positioned between the computer and social sciences, this emerging interdisciplinary field is fuelled by at least two developments: (i) the availability of data: with the Web, huge volumes of social data are now available, enabling the study of traces of social interaction at new scales; (ii) the increasing quantification of social theories: with recent advances in the social sciences, social theories are becoming increasingly formal and/or mathematical and thus amenable to quantification. Taken together, these two developments give rise to a whole range of new and interesting problems at the intersection of the computer and social sciences. While a multitude of social data is available on the World Wide Web, microblogs are of particular interest due to their real-time nature, their rich social fabric and their presumed on/offline coupling. In this talk, I will discuss the potential and the challenges of doing computational social science with data obtained from microblogs such as Twitter. In particular, I want to present previous work by my group and others to identify research avenues where progress has already been made or is on the horizon, and contrast these with what I consider to be open research challenges in this emerging field. Work that demonstrates the potential of microblogs for computational social science includes, for example, [1], where we operationalized a number of theoretical constructs from sociology to characterize the online conversational practices of political parties on Twitter. In other work, we studied how users’ fields of expertise can be inferred from microblog data [4].
Work that demonstrates the pitfalls and challenges of doing computational social science with microblog data includes, for example, [5], where we studied a network of bots that compete against each other in attacking users on Twitter. In subsequent work, we found that such attacks have the potential to impact the social graph of Twitter [3], i.e., the network of who follows whom and of who replies to whom. In other work, [2] have shown that there is a stark difference between the demographics of Twitter and those of the general population of the US, finding that Twitter users significantly over-represent densely populated regions and are predominantly male. I will argue that these and other factors need to be considered when we aim to unlock the full potential of microblog data for computational social science purposes.
bibtex
@InProceedings{microposts2014_strohmaier:2014,
author = {Markus Strohmaier},
title = {Computational Social Science and Microblogs -- The Good, the Bad and the Ugly},
crossref = {proc_microposts2014@www2014},
pages = {1--1},
booktitle = {4th Workshop on Making Sense of Microposts {(\#Microposts2014)}},
year = 2014,
keywords = {Social data, computational social science, social behavior, web science, online social networks},
url = {http://ceur-ws.org/Vol-1141/keynote_abstract.pdf},
}
@Proceedings{proc_microposts2014@www2014,
title = {Proceedings, 4th Workshop on Making Sense of Microposts {(\#Microposts2014)}: Big things come in small packages, Seoul, Korea, 7th April 2014},
year = 2014,
booktitle = {Making Sense of Microposts {(\#Microposts2014)}},
editor = {Matthew Rowe and Milan Stankovic and Aba-Sah Dadzie},
month = {April},
url = {http://ceur-ws.org/Vol-1141},
}
10:30 – 10:35: Lightning Round – Posters
Sentiment Analysis of Wimbledon Tweets
Priyanka Sinha, Anirban Dutta Choudhury & Amit Kumar Agrawal
abstract
Annotating videos in the absence of textual metadata is a major challenge, as it involves complex image and video analytics, which are often error-prone. However, if the video is live coverage of an event, a time-correlated textual feed about the same event can serve as a valuable aid for such annotation. Popular real-time microblog streams such as Twitter feeds can be an ideal source of such textual information. In this paper, we explore the possibility of such correlation through sentiment analysis of a set of tweets about the semi-final match between Roger Federer and Novak "Nole" Djokovic at Wimbledon 2012.
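To illustrate the kind of time correlation described above, here is a minimal sketch that buckets time-stamped tweets into fixed intervals and averages a per-tweet sentiment score, so that each bucket could be aligned with the matching stretch of the live video. It assumes a tiny hypothetical word lexicon (`LEXICON`), timestamps in seconds from the start of the broadcast and a 60-second bin width; the function names are illustrative, and this is not the authors' sentiment analysis method.

```python
from collections import defaultdict

# Hypothetical toy polarity lexicon (an assumption, not the authors' model).
LEXICON = {"great": 1, "brilliant": 1, "win": 1, "ace": 1,
           "bad": -1, "awful": -1, "lose": -1, "error": -1}


def tweet_score(text):
    """Sum of lexicon polarities over the words of a tweet (punctuation stripped)."""
    return sum(LEXICON.get(word.strip(".,!?#@").lower(), 0) for word in text.split())


def sentiment_timeline(tweets, bin_seconds=60):
    """Return {bin start in seconds: mean tweet score} for (timestamp, text) pairs."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for ts, text in tweets:
        start = int(ts // bin_seconds) * bin_seconds
        totals[start] += tweet_score(text)
        counts[start] += 1
    return {start: totals[start] / counts[start] for start in sorted(totals)}


if __name__ == "__main__":
    sample = [(5, "What an ace by Federer!"), (42, "Brilliant rally"),
              (75, "Awful error from Djokovic"), (90, "bad bounce")]
    print(sentiment_timeline(sample))  # {0: 1.0, 60: -1.5}
```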
bibtex
@InProceedings{microposts2014_sinha.ea:2014,
author = {Priyanka Sinha and Anirban Dutta Choudhury and Amit Kumar Agrawal},
title = {Sentiment Analysis of {Wimbledon} Tweets},
crossref = {proc_microposts2014@www2014},
pages = {51--52},
booktitle = {4th Workshop on Making Sense of Microposts {(\#Microposts2014)}},
year = 2014,
keywords = {Twitter, Wimbledon, Sentiment Analysis, TV},
url = {http://ceur-ws.org/Vol-1141/paper_10.pdf},
}
11:00 – 15:30: Main Track – Paper Presentations
Micropost Mining & Analysis
11:00 – 11:30
Mining Concurrent Topical Activity in Microblog Streams
André Panisson, Laetitia Gauvin, Marco Quaggiotto, Ciro Cattuto
abstract
Streams of user-generated content in social media exhibit patterns of collective attention across diverse topics, with temporal structures determined by both exogenous and endogenous factors. Teasing apart different topics and resolving their individual, concurrent activity timelines is a key challenge in extracting knowledge from microblog streams. Meeting this challenge requires methods that expose latent signals by exploiting term correlations across posts and over time.
Here we focus on content posted to Twitter during the London 2012 Olympics, for which a detailed schedule of events is independently available and can be used for reference. We mine the temporal structure of topical activity by using two methods based on non-negative matrix factorization.
We show that for events in the Olympics schedule that can be semantically matched to Twitter topics, the extracted Twitter activity timeline closely matches the known timeline from the schedule.
Our results show that, given appropriate techniques to detect latent signals, Twitter can be used as a social sensor to extract topical-temporal information on real-world events at high temporal resolution.
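The abstract names non-negative matrix factorization (NMF) as the basis of both methods. As a hedged illustration only, the sketch below applies plain NMF with Lee and Seung's multiplicative updates to a toy term-by-time-window count matrix: columns of `W` act as topics (term profiles) and rows of `H` as their activity timelines. The toy matrix, the rank `k` and the iteration count are assumptions, and the paper's two specific methods are not reproduced.

```python
import numpy as np


def nmf(V, k, iterations=500, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (terms x time windows) as W @ H.

    Lee and Seung's multiplicative updates for the Frobenius-norm objective:
    column j of W is a latent topic's term profile and row j of H is that
    topic's activity timeline across the time windows.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_windows = V.shape
    W = rng.random((n_terms, k))
    H = rng.random((k, n_windows))
    for _ in range(iterations):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # eps avoids division by zero
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


if __name__ == "__main__":
    # Toy counts: terms 0-1 peak early and terms 2-3 peak late, mimicking two
    # concurrent topics with different timelines (made-up data).
    V = np.array([[5, 4, 1, 0, 0],
                  [4, 5, 1, 0, 0],
                  [0, 0, 1, 4, 5],
                  [0, 1, 1, 5, 4]], dtype=float)
    W, H = nmf(V, k=2)
    print(np.round(H, 2))
    print("relative error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```

Each row of the printed `H` should rise in the windows where its group of terms is active, which is the topical-temporal signal the abstract describes.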
bibtex
@InProceedings{microposts2014_panisson.ea:2014,
author = {Andr\'e Panisson and Laetitia Gauvin and Marco Quaggiotto and Ciro Cattuto},
title = {Mining Concurrent Topical Activity in Microblog Streams},
crossref = {proc_microposts2014@www2014},
pages = {3--10},
booktitle = {4th Workshop on Making Sense of Microposts {(\#Microposts2014)}},
year = 2014,
keywords = {topic detection, microblogs, matrix and tensor factorization, collective attention, event detection},
url = {http://ceur-ws.org/Vol-1141/paper_04.pdf},
}
11:30 – 12:00
TEA: Episode Analytics on Short Messages
Prapula G, Soujanya Lanka & Kamalakar Karlapalem
abstract
Twitter is a widely used microblogging service that has recently become a reliable source of current news from around the world [11]. Breaking news is covered on Twitter, with the magnitude and volume of tweets reflecting the nature and intensity of the news. During events, many tweets are posted either expressing sentiments about the event or simply reporting its occurrence. Events related to an entity that have attracted a large number of tweets can be considered significant in the entity's Twitter lifetime. An entity could be a person, a movie, a community, an electronic gadget, a software product, and so on. In this work, we attempt to automatically detect significant events related to an entity. An episode is an event of importance, identified by processing the volume of tweets/posts within a short time span.
The key features implemented in the Tweet Episode Analytics (TEA) system are:
(i) detecting episodes among the streaming tweets related to a given entity over a period of time (from the entity's birth, i.e., its first mention on Twitter, to date);
(ii) providing visual analytics (such as sentiment scores and tweet frequency over time) for each episode through graphical interpretation.
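The abstract does not spell out the detection rule, so the following is only a minimal sketch of volume-based episode detection. It assumes per-interval (e.g. daily) tweet counts for one entity and a simple "mean plus k standard deviations over a trailing window" burst rule; the function name and the `window`, `k` and `min_count` parameters are hypothetical, not TEA's.

```python
from statistics import mean, pstdev


def detect_episodes(counts, window=7, k=2.0, min_count=10):
    """Detect bursts in a per-interval tweet-count series for one entity.

    Interval i is flagged when counts[i] exceeds mean + k * stdev of the
    preceding `window` intervals and is at least `min_count`.
    Consecutive flagged intervals are merged into one episode, returned as
    an inclusive (start, end) index pair.
    """
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        threshold = mean(history) + k * pstdev(history)
        if counts[i] > threshold and counts[i] >= min_count:
            flagged.append(i)

    episodes = []
    for i in flagged:
        if episodes and episodes[-1][1] == i - 1:
            episodes[-1] = (episodes[-1][0], i)  # extend the running episode
        else:
            episodes.append((i, i))              # start a new episode
    return episodes


if __name__ == "__main__":
    daily = [12, 15, 11, 14, 13, 12, 15, 90, 120, 16, 14, 13]
    print(detect_episodes(daily))  # [(7, 8)]
```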
bibtex
@InProceedings{microposts2014_prapulag.ea:2014,
author = {Prapula G and Soujanya Lanka and Kamalakar Karlapalem},
title = {{TEA}: Episode Analytics on Short Messages},
crossref = {proc_microposts2014@www2014},
pages = {11--18},
booktitle = {4th Workshop on Making Sense of Microposts {(\#Microposts2014)}},
year = 2014,
keywords = {Tweets, Episode, Text Analytics},
url = {http://ceur-ws.org/Vol-1141/paper_08.pdf},
}
12:00 – 12:30
Sentic API: A Common and Common-Sense Knowledge API for Cognition-Driven Sentiment Analysis
Erik Cambria, Soujanya Poria, Alexander Gelbukh & Kenneth Kwok
abstract
The bag-of-concepts model can represent the semantics associated with natural language text much better than the bag-of-words model. In the bag-of-words model, in fact, a concept such as cloud_computing would be split into two separate words, disrupting the semantics of the input sentence. Working at the concept level is important for tasks such as opinion mining, especially in the case of microblog analysis. In this work, we present Sentic API, a common-sense based application programming interface for concept-level sentiment analysis, which provides the semantics and sentics (that is, denotative and connotative information) associated with 15,000 natural language concepts.
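The contrast between the two models can be sketched with a greedy longest-match tokenizer that keeps known multiword concepts such as `cloud_computing` together. The concept lexicon below is a tiny hypothetical stand-in; Sentic API's actual concept inventory, parser and sentics are not reproduced, and the function names are illustrative.

```python
# Hypothetical multiword concept lexicon (an assumption for illustration).
CONCEPTS = {("cloud", "computing"), ("ice", "cream"), ("new", "york", "city")}
MAX_CONCEPT_LEN = max(len(c) for c in CONCEPTS)


def bag_of_words(text):
    """Plain whitespace tokenization: multiword concepts are split apart."""
    return text.lower().split()


def bag_of_concepts(text):
    """Greedy longest-match tokenization that keeps known multiword concepts
    together as a single underscore-joined token."""
    words = text.lower().split()
    tokens, i = [], 0
    while i < len(words):
        # Try the longest possible concept first, down to two-word concepts.
        for n in range(min(MAX_CONCEPT_LEN, len(words) - i), 1, -1):
            if tuple(words[i:i + n]) in CONCEPTS:
                tokens.append("_".join(words[i:i + n]))
                i += n
                break
        else:  # no multiword concept starts here: keep the single word
            tokens.append(words[i])
            i += 1
    return tokens


if __name__ == "__main__":
    tweet = "I love cloud computing"
    print(bag_of_words(tweet))     # ['i', 'love', 'cloud', 'computing']
    print(bag_of_concepts(tweet))  # ['i', 'love', 'cloud_computing']
```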
bibtex
@InProceedings{microposts2014_cambria.ea:2014,
author = {Erik Cambria and Soujanya Poria and Alexander Gelbukh and Kenneth Kwok},
title = {{Sentic API}: A Common-Sense Based {API} for Concept-Level Sentiment Analysis},
crossref = {proc_microposts2014@www2014},
pages = {19--24},
booktitle = {4th Workshop on Making Sense of Microposts {(\#Microposts2014)}},
year = 2014,
keywords = {Natural language processing; Sentiment analysis},
url = {http://ceur-ws.org/Vol-1141/paper_02.pdf},
}
Micropost Classification & Extraction
14:00 – 14:30
Evaluating Multi-label Classification of Incident-related Tweets
Axel Schulz, Eneldo Loza Mencía, Thanh Tung Dang and Benedikt Schmidt
abstract
Microblogs are an important source of information in emergency management, as a great deal of situational information is shared by both citizens and official sources. It has been shown that incident-related information can be identified within the huge amount of available information using machine learning. Nevertheless, the classification techniques currently in use assign only a single label to a micropost, resulting in a loss of important information that would be valuable for crisis management.
In this paper, we contribute the first in-depth analysis of multi-label classification of incident-related tweets. We present an approach that assigns multiple labels to these messages, providing additional information about the situation at hand. An evaluation shows that multi-label classification is applicable for detecting multiple labels, achieving an exact match of 84.35%. Thus, it is a valuable means for classifying incident-related tweets. Furthermore, we show that correlations between labels can be taken into account in these kinds of classification tasks.
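Assuming "exact match" refers to the usual subset-accuracy measure for multi-label classification (an instance counts only if its whole predicted label set equals the true set), it can be computed as below; Hamming loss, which gives partial credit per label, is shown for contrast. The label names and data are hypothetical.

```python
def exact_match(true_labels, predicted_labels):
    """Fraction of instances whose predicted label set equals the true set
    exactly (subset accuracy): no partial credit."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have equal length")
    if not true_labels:
        return 0.0
    hits = sum(set(t) == set(p) for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def hamming_loss(true_labels, predicted_labels, label_space):
    """Fraction of individual (instance, label) decisions that are wrong."""
    errors = sum(len(set(t) ^ set(p)) for t, p in zip(true_labels, predicted_labels))
    return errors / (len(true_labels) * len(label_space))


if __name__ == "__main__":
    labels = {"fire", "injury", "traffic"}  # hypothetical incident labels
    gold = [{"fire", "injury"}, {"traffic"}, {"fire"}, set()]
    pred = [{"fire", "injury"}, {"traffic", "injury"}, {"fire"}, set()]
    print(exact_match(gold, pred))           # 0.75
    print(hamming_loss(gold, pred, labels))  # 1 wrong decision out of 12
```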
bibtex
@InProceedings{microposts2014_schulz.ea:2014,
author = {Axel Schulz and Eneldo {Loza Menc\'ia} and Thanh Tung Dang and Benedikt Schmidt},
title = {Evaluating Multi-label Classification of Incident-related Tweets},
crossref = {proc_microposts2014@www2014},
pages = {26--33},
booktitle = {4th Workshop on Making Sense of Microposts {(\#Microposts2014)}},
year = 2014,
keywords = {Microblogs, Multi-label Learning, Social Media},
url = {http://ceur-ws.org/Vol-1141/paper_01.pdf},
}
14:30 – 15:00
Combining Named Entity Recognition Methods for Concept Extraction in Microposts
Štefan Dlugolinský, Peter Krammer, Marek Ciglan, Michal Laclavík, Ladislav Hluchý
abstract
Named entity recognition (NER) in microposts is a key and challenging task in mining semantics from social media. Our evaluation of a number of popular NE recognizers on a micropost dataset has shown a significant drop in result quality: current state-of-the-art NER methods perform much better on formal text than on microposts. However, the experiment also yielded an interesting observation: although individual NER tools did not perform very well on micropost data, we obtained a recall of over 90% when we merged the results of all the examined tools. This means that if we were able to combine different NE recognizers in a meaningful way, we might be able to achieve NER in microposts of acceptable quality. In this paper, we propose a method for NER in microposts that is designed to combine the annotations yielded by existing NER tools in order to produce more precise results than the input tools alone. We combine NE recognizers using ML techniques, namely decision trees and random forests using the C4.5 algorithm. The main advantages of the proposed method lie in the possibility of combining arbitrary NER methods and in its applicability to short, informal texts. The evaluation on a standard dataset shows that the proposed approach outperforms the underlying NER methods as well as a baseline recognizer, which is a simple combination of the best underlying recognizers for each target NE class. To the best of our knowledge, the proposed approach achieves the highest F1 score to date on the #MSM2013 dataset.
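The observation that merging all tools' outputs raises recall (at the cost of precision) can be sketched with set operations over annotated spans. This is a hedged illustration with made-up spans; the naive majority vote is only a simple stand-in and not the paper's method, which learns the combination with decision-tree and random-forest classifiers.

```python
from collections import Counter

# An annotation is a (start, end, entity_type) span; all data below is made up.


def recall(gold, predicted):
    return len(gold & predicted) / len(gold) if gold else 0.0


def precision(gold, predicted):
    return len(gold & predicted) / len(predicted) if predicted else 0.0


def union(tool_outputs):
    """Merge every span proposed by any tool: high recall, low precision."""
    merged = set()
    for spans in tool_outputs:
        merged |= spans
    return merged


def majority_vote(tool_outputs, min_votes=2):
    """Keep only spans proposed by at least `min_votes` tools."""
    votes = Counter(span for spans in tool_outputs for span in spans)
    return {span for span, n in votes.items() if n >= min_votes}


if __name__ == "__main__":
    gold = {(0, 5, "PER"), (10, 16, "LOC"), (20, 27, "ORG")}
    tool_a = {(0, 5, "PER"), (10, 16, "ORG")}
    tool_b = {(10, 16, "LOC"), (30, 33, "MISC")}
    tool_c = {(0, 5, "PER"), (20, 27, "ORG")}
    tools = [tool_a, tool_b, tool_c]
    for name, spans in [("a", tool_a), ("b", tool_b), ("c", tool_c)]:
        print(name, recall(gold, spans))
    merged = union(tools)
    print("union", recall(gold, merged), precision(gold, merged))  # union 1.0 0.6
    voted = majority_vote(tools)
    print("vote", recall(gold, voted), precision(gold, voted))
```

In the toy run, the union reaches full recall but only 0.6 precision, while the vote is precise but misses two entities, which is the gap a learned combiner is meant to close.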
bibtex
@InProceedings{microposts2014_dlugolinsky.ea:2014,
author = {\v{S}tefan Dlugolinsk\'y and Peter Krammer and Marek Ciglan and Michal Laclav\'ik and Ladislav Hluch\'y},
title = {Combining Named Entity Recognition Methods for Concept Extraction in {Microposts}},
crossref = {proc_microposts2014@www2014},
pages = {34--41},
booktitle = {4th Workshop on Making Sense of Microposts {(\#Microposts2014)}},
year = 2014,
keywords = {named entity recognition, machine learning, microposts},
url = {http://ceur-ws.org/Vol-1141/paper_09.pdf},
}
15:00 – 15:30
HG-RANK: A Hypergraph-based Keyphrase Extraction for Short Documents in Dynamic Genre
Abdelghani Bellaachia and Mohammed Al-Dhelaan
abstract
Conventional keyphrase extraction algorithms are applied to a fixed corpus of lengthy documents, where keyphrases distinguish documents from each other. However, with the emergence of social networks and microblogs, the nature of such documents has changed: documents are now short and have evolving topics, which require specific algorithms to capture all of their features. In this paper, we propose a hypergraph-based ranking algorithm that models all of these features in a random walk approach. Our random walk uses weights on both hyperedges and vertices, modeling the temporal and social features of short documents with the former and discriminative word features with the latter, while measuring the centrality of words in the hypergraph. We empirically test the effectiveness of our approach on two different data sets of short documents and show that it improves precision over the closest baseline by 14% to 25% on a Twitter data set and by 10% to 27% on the Opinosis data set.
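A random walk that uses both hyperedge and vertex weights can be sketched as follows. This is a minimal sketch, assuming hyperedges are tweets, vertices are words, and a two-step transition (pick an incident hyperedge in proportion to its weight, then a vertex inside it in proportion to its weight) with PageRank-style teleportation; HG-Rank's exact weighting schemes and its keyphrase assembly step are not reproduced, and all names are hypothetical.

```python
def hypergraph_rank(hyperedges, edge_weight, vertex_weight, damping=0.85, iterations=100):
    """Rank vertices by the stationary distribution of a weighted hypergraph walk.

    hyperedges:    dict edge id -> set of vertices (e.g. tweet -> its words)
    edge_weight:   dict edge id -> positive weight (e.g. temporal/social signal)
    vertex_weight: dict vertex -> positive weight (e.g. word discriminativeness)
    With probability 1 - damping the walker teleports to a uniform random vertex.
    """
    vertices = sorted({v for members in hyperedges.values() for v in members})
    incident = {v: [e for e, members in hyperedges.items() if v in members]
                for v in vertices}
    n = len(vertices)
    score = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in vertices}
        for u in vertices:
            total_edge = sum(edge_weight[e] for e in incident[u])
            for e in incident[u]:
                p_edge = edge_weight[e] / total_edge          # step 1: pick hyperedge
                members = hyperedges[e]
                total_vertex = sum(vertex_weight[v] for v in members)
                for v in members:                             # step 2: pick vertex
                    new[v] += damping * score[u] * p_edge * vertex_weight[v] / total_vertex
        score = new
    ranking = sorted(vertices, key=lambda v: score[v], reverse=True)
    return ranking, score


if __name__ == "__main__":
    # Hypothetical toy data: edge weights could encode recency or retweets.
    tweets = {
        "t1": {"olympics", "opening", "ceremony"},
        "t2": {"olympics", "london"},
        "t3": {"opening", "ceremony", "fireworks"},
    }
    edge_w = {"t1": 3.0, "t2": 1.0, "t3": 2.0}
    vertex_w = {w: 1.0 for words in tweets.values() for w in words}
    ranking, scores = hypergraph_rank(tweets, edge_w, vertex_w)
    print(ranking)
```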
bibtex
@InProceedings{microposts2014_bellaachia.ea:2014,
author = {Abdelghani Bellaachia and Mohammed Al-Dhelaan},
title = {{HG-Rank}: A Hypergraph-based Keyphrase Extraction for Short Documents in Dynamic Genre},
crossref = {proc_microposts2014@www2014},
pages = {42--49},
booktitle = {4th Workshop on Making Sense of Microposts {(\#Microposts2014)}},
year = 2014,
keywords = {Text hypergraphs; Keyphrase extraction; Random walks; Short documents; Hypergraph random walks},
url = {http://ceur-ws.org/Vol-1141/paper_06.pdf},
}
16:10 – 17:10: NEEL Challenge Presentations & Results
16:10 – 16:20
Named Entity Extraction and Linking Challenge: University of Twente at #Microposts2014
Mena B. Habib, Maurice van Keulen & Zhemin Zhu
abstract
16:20 – 16:30
Adapting AIDA for Tweets
Mohamed Amir Yosef, Johannes Hoffart, Yusra Ibrahim, Artem Boldyrev & Gerhard Weikum
abstract
This paper presents our system for the "Making Sense of Microposts 2014 (#Microposts2014)" challenge. Our system is based on AIDA, an existing system that links entity mentions in natural language text to their corresponding canonical entities in a knowledge base (KB). AIDA collectively exploits the prominence of entities, contextual similarities, and coherence to effectively disambiguate entity mentions. The system was originally developed for clean and well-structured text (e.g. news articles). We adapt it for microposts, specifically tweets, with special focus on the named entity recognition and the entity candidate lookup.
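The idea of combining entity prominence, context similarity and coherence can be illustrated with a toy joint objective: each mention's local score mixes a prior and a similarity, and the chosen entities earn a bonus for being related to each other. This is a hedged sketch under assumed scores and weights (`alpha`, `beta`); AIDA's own inference procedure and features are not reproduced, and exhaustive search is only feasible for the few mentions of a single tweet.

```python
from itertools import combinations, product


def disambiguate(candidates, prior, similarity, coherence, alpha=0.5, beta=1.0):
    """Jointly pick one entity per mention by exhaustive search.

    candidates: dict mention -> list of candidate entities
    prior:      dict (mention, entity) -> prominence score in [0, 1]
    similarity: dict (mention, entity) -> context similarity in [0, 1]
    coherence:  dict frozenset({e1, e2}) -> relatedness in [0, 1]
    """
    mentions = list(candidates)
    best, best_score = None, float("-inf")
    for assignment in product(*(candidates[m] for m in mentions)):
        local = sum(alpha * prior[(m, e)] + (1 - alpha) * similarity[(m, e)]
                    for m, e in zip(mentions, assignment))
        pairwise = sum(coherence.get(frozenset((a, b)), 0.0)
                       for a, b in combinations(assignment, 2))
        score = local + beta * pairwise
        if score > best_score:
            best, best_score = dict(zip(mentions, assignment)), score
    return best


if __name__ == "__main__":
    # Made-up scores for a tweet mentioning "Page" and "Plant".
    cands = {"Page": ["Jimmy_Page", "Larry_Page"],
             "Plant": ["Robert_Plant", "Power_plant"]}
    prior = {("Page", "Jimmy_Page"): 0.4, ("Page", "Larry_Page"): 0.6,
             ("Plant", "Robert_Plant"): 0.3, ("Plant", "Power_plant"): 0.7}
    sim = {key: 0.5 for key in prior}
    coh = {frozenset({"Jimmy_Page", "Robert_Plant"}): 0.9}
    print(disambiguate(cands, prior, sim, coh))
```

With `beta=0.0` (no coherence), the same data yields the more prominent but mutually unrelated pair `Larry_Page` and `Power_plant`.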
16:30 – 16:40
E2E: An End-to-end Entity Linking System for Short and Noisy Text
Ming-Wei Chang, Bo-June Hsu, Hao Ma, Ricky Loynd & Kuansan Wang
abstract
We present E2E, an end-to-end entity linking system that is designed for short and noisy text found in microblogs and text messages. Mining and extracting entities from short text is an essential step for many content analysis applications. By jointly optimizing entity recognition and disambiguation as a single task, our system can process short and noisy text robustly.
16:40 – 16:50
DataTXT at #Microposts2014 Challenge
Ugo Scaiella, Michele Barbera, Stefano Parmesan, Gaetano Prestia, Emilio Del Tessandoro & Mario Verí
abstract
In this paper, we describe the approach taken for the "Making Sense of Microposts challenge 2014" (#Microposts2014), where participants were asked to cross-reference microposts extracted from Twitter with DBpedia URIs belonging to a given taxonomy.
For this task, we deployed dataTXT, an evolution of [3], the state-of-the-art topic annotator for short texts, which has proven to be very effective and efficient in several challenging scenarios [2].
16:50 – 17:00
Linking Entities in #Microposts
Romil Bansal, Sandeep Panem, Priya Radhakrishnan, Manish Gupta & Vasudeva Varma
abstract
Social media has emerged as an important source of information. Entity linking in social media provides an effective way to extract useful information from microposts shared by users. Entity linking in microposts is a difficult task, as microposts lack sufficient context to disambiguate entity mentions. In this paper, we perform entity linking by first identifying entity mentions and then disambiguating them based on three different features: (1) the similarity between the mention and the corresponding Wikipedia entity pages; (2) the similarity of the mention and the tweet text to anchor-text strings across multiple web pages; and (3) the popularity of the entity on Twitter at the time of disambiguation. The system is tested on the manually annotated dataset provided by the Named Entity Extraction and Linking (NEEL) Challenge 2014, and the results obtained are on par with state-of-the-art methods.
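The abstract lists three features but not how they are computed or combined. As an assumption-laden sketch, the code below scores each candidate with token-level Jaccard similarity for features (1) and (2) and a pre-normalized popularity value for feature (3), combined linearly with hypothetical weights; the candidate dictionary keys and example data are illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity of two token collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def rank_candidates(tweet_tokens, candidates, weights=(0.4, 0.4, 0.2)):
    """Score candidate entities for one mention, best first.

    Each candidate is a dict with hypothetical keys:
      'entity'        - entity identifier (e.g. a Wikipedia title)
      'page_tokens'   - tokens from the entity's Wikipedia page
      'anchor_tokens' - tokens from anchor texts linking to the entity
      'popularity'    - current popularity on Twitter, normalized to [0, 1]
    """
    w_page, w_anchor, w_pop = weights
    scored = []
    for c in candidates:
        score = (w_page * jaccard(tweet_tokens, c["page_tokens"])
                 + w_anchor * jaccard(tweet_tokens, c["anchor_tokens"])
                 + w_pop * c["popularity"])
        scored.append((c["entity"], score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    tweet = ["apple", "launches", "new", "iphone"]
    cands = [
        {"entity": "Apple_Inc.", "page_tokens": ["apple", "iphone", "company", "launches"],
         "anchor_tokens": ["apple", "iphone"], "popularity": 0.9},
        {"entity": "Apple", "page_tokens": ["apple", "fruit", "tree"],
         "anchor_tokens": ["apple", "fruit"], "popularity": 0.1},
    ]
    print(rank_candidates(tweet, cands))
```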
17:00 – 17:10
Part-of-Speech is (almost) enough: SAP Research & Innovation at the #Microposts2014 NEEL Challenge
Daniel Dahlmeier, Naveen Nandan & Wang Ting
abstract
This paper describes the submission of the SAP Research & Innovation team to the #Microposts2014 NEEL Challenge. We use a two-stage approach for named entity extraction and linking, based on conditional random fields and on an ensemble of search APIs and rules, respectively. A surprising result of our work is that part-of-speech tags alone are almost sufficient for entity extraction. Our F1 scores for the combined extraction and linking task are 34.6% and 37.2% on a development split and a test split of the training set, respectively, and 37% on the test set.
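Why part-of-speech tags carry so much of the signal can be illustrated with a deliberately crude rule that treats maximal runs of proper-noun tags as entity mentions. This is not the submission's conditional random field, only a simplified stand-in; it assumes the tweet has already been tagged with Penn Treebank tags (`NNP`, `NNPS`), and a Twitter-specific tagger with a different tagset would need different tag names in `PROPER_NOUN_TAGS`.

```python
PROPER_NOUN_TAGS = {"NNP", "NNPS"}  # Penn Treebank proper-noun tags (assumed tagset)


def proper_noun_spans(tagged):
    """Return maximal runs of consecutive proper-noun tokens as entity candidates.

    tagged: list of (token, tag) pairs produced by some POS tagger.
    """
    spans, current = [], []
    for token, tag in tagged:
        if tag in PROPER_NOUN_TAGS:
            current.append(token)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:  # flush a run that ends the tweet
        spans.append(" ".join(current))
    return spans


if __name__ == "__main__":
    tweet = [("Watching", "VBG"), ("Roger", "NNP"), ("Federer", "NNP"),
             ("at", "IN"), ("Wimbledon", "NNP"), ("today", "NN")]
    print(proper_noun_spans(tweet))  # ['Roger Federer', 'Wimbledon']
```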