Spatio-temporal modeling of documents

Temporal and geographic information are important aspects of text documents and occur frequently in many types of documents in the form of temporal and geographic expressions. Such spatio-temporal expressions can be normalized so that their meaning is unambiguous and they can be placed on a timeline or pinpointed on a map. However, a general text document can contain many spatio-temporal expressions that are unrelated to its content. In this thesis, we propose estimating the focus time and focus place of documents, defined as the time and place to which a document’s content refers. We use statistical knowledge from the English Wikipedia to calculate association scores, which are then used to estimate the focus time and focus place of a document. We implement two different methods for calculating association scores and compare their accuracy. The effectiveness of our methods is evaluated on three time-tagged datasets of documents about historical events spanning a total time frame of 4000 years. Our methods achieve an average error of less than 15 years. Our methods are also able to estimate the focus place of each document correctly.
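
The idea of estimating a focus time from association scores can be illustrated with a minimal sketch. It is not one of the two scoring methods implemented in the thesis, which are not specified here. The sketch assumes a hypothetical background corpus of (tokens, year) pairs standing in for time-tagged Wikipedia text, uses a simple relative-frequency score as the word-year association, and takes the year with the highest summed score as the focus time. The function names `build_association` and `estimate_focus_time` are illustrative only.

```python
from collections import Counter, defaultdict


def build_association(corpus):
    """Build word -> {year: score} from (tokens, year) pairs.

    Assumption: the score is the fraction of a word's occurrences
    (counted once per text) that fall in a given year. This is a
    stand-in, not the scoring used in the thesis.
    """
    word_year = defaultdict(Counter)
    word_total = Counter()
    for tokens, year in corpus:
        for word in set(tokens):  # count each word once per text
            word_year[word][year] += 1
            word_total[word] += 1
    return {
        word: {year: count / word_total[word] for year, count in years.items()}
        for word, years in word_year.items()
    }


def estimate_focus_time(tokens, assoc):
    """Return the year with the highest summed association score,
    or None if no token of the document is known."""
    totals = Counter()
    for word in tokens:
        for year, score in assoc.get(word, {}).items():
            totals[year] += score
    if not totals:
        return None
    return max(totals, key=totals.get)
```

The same scheme would carry over to focus place by replacing years with normalized place identifiers, again under the assumption that a place-tagged background corpus is available.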

