Crowdsourcing the character of a place: Character-level convolutional networks for multilingual geographic text classification
Adams, Benjamin; McKenzie, Grant
Abstract – Tuhinga Whakarāpopoto
How do people talk about a place? In this paper, the researchers present a new character-level convolutional neural network model that can classify multilingual text written in any character set that can be encoded with UTF-8, a standard and widely used variable-width character encoding built on 8-bit units. To evaluate the model on the geographic classification of text, it was tested on four crowdsourced data sets made up of Wikipedia articles, online travel blogs, GeoNames toponyms, and Twitter posts. Unlike word-based methods, which require data cleaning and pre-processing, the proposed model works for any language without modification and achieves classification accuracy comparable to existing methods. The results indicate that UTF-8 character-level convolutional neural networks are a promising technique for georeferencing noisy text, such as that found in colloquial social media posts and in texts scanned with optical character recognition.
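This record does not describe the network's input layer, alphabet size, sequence length, or filter sizes, so the following is only a minimal sketch under stated assumptions. It assumes text is fed to the network as raw UTF-8 bytes, one-hot encoded over a 256-value alphabet and padded to a fixed length, and it shows a single convolutional filter followed by max pooling. The function names, the padding value of 0, and the length of 16 are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List


def utf8_byte_indices(text: str, max_len: int = 16) -> List[int]:
    """Encode text as UTF-8 bytes, truncate to max_len, and right-pad with 0.

    Assumption: each byte value (0-255) is one input position, so every script
    expressible in UTF-8 shares the same 256-symbol alphabet. Byte 0 doubles as
    padding (hypothetical choice). Truncation may split a multi-byte character,
    which a byte-level model can tolerate.
    """
    data = list(text.encode("utf-8"))[:max_len]
    return data + [0] * (max_len - len(data))


def one_hot(indices: List[int], vocab: int = 256) -> List[List[int]]:
    """Turn byte indices into a len(indices) x vocab one-hot matrix."""
    return [[1 if j == i else 0 for j in range(vocab)] for i in indices]


def conv1d_max(x: List[List[int]], kernel: List[List[int]]) -> float:
    """Slide one filter (width k, depth vocab) along x, then max-pool the responses."""
    k = len(kernel)
    responses = []
    for start in range(len(x) - k + 1):
        total = 0.0
        for offset in range(k):
            total += sum(a * b for a, b in zip(x[start + offset], kernel[offset]))
        responses.append(total)
    return max(responses)


if __name__ == "__main__":
    # "ā" (U+0101) becomes the two bytes 196, 129 in UTF-8.
    idx = utf8_byte_indices("Tāmaki")
    print(idx[:7])
    # A filter tuned to that byte pair responds most strongly where "ā" occurs.
    print(conv1d_max(one_hot(idx), one_hot([196, 129])))
```

Because the alphabet is the 256 possible byte values rather than a per-language character list, the same input layer accepts Latin, macronised Māori, or CJK text without language-specific tokenisation, which is the property the abstract emphasises.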
Keywords – Kupu Hāngai
urban understanding tool, urban development, language data mining, harvesting geo data, crowdsourcing information
Fields of Research – Āpure Rangahau
Geographical Information Systems (GIS); Crowdsourcing data; Neural network modelling
Date – Te Wā Whakarewa
2018-01
Type – Te Auaha
Journal paper
Citation – Kupu Hautoa
Adams, B. & McKenzie, G. (2018). Crowdsourcing the character of a place: Character-level convolutional networks for multilingual geographic text classification. Transactions in GIS, 22(2), 394-408. DOI: 10.1111/tgis.12317