Abstract: |
Social networking sites play an important role in daily life, providing a platform on which thoughts can be easily shared and expressed. As a result, these sites generate a vast amount of information on an extensive range of topics. Application Programming Interfaces (APIs) now make it possible to collect and analyse this content in software. One application of such content analysis is the detection of traffic events. Social networking sites therefore have the potential to serve as highly cost-effective social sensors, with individual posts acting as the sensor readings. Advances in machine learning provide techniques by which social media posts can be harvested to detect small-scale events, particularly traffic events, in a timely manner. This work aims to develop a traffic information system based on the analysis of social media content. Social media content is classified as either ‘traffic-related’ or ‘non-traffic-related’. ‘Traffic-related’ posts are further classified into sub-categories such as ‘accidents’, ‘incidents’, ‘traffic jams’, and ‘construction/road works’. The date, time, and geographical information of each associated traffic event are also determined.
To achieve these aims, several algorithms are developed: i) an adaptive data acquisition algorithm that gathers data from social media; ii) several supervised binary classification algorithms that analyse the content of social media and classify it as either ‘traffic-related’ or ‘non-traffic-related’; iii) a topic classification algorithm that further analyses the ‘traffic-related’ posts and assigns them to the sub-categories listed above; and iv) a geoparser algorithm that obtains the date, time, and geographical information of each traffic event. Interconnecting these algorithms yields a fully functional, real-time, automated system. The system produces very promising results when applied to Twitter data. The results show that social networking sites have the potential to serve as an efficient means of detecting not only small-scale events, such as traffic events, but also large-scale events when the approach is scaled up. |