Automatic dating of documents and temporal text classification
Given an initial set of $k$ means $m_1, \dots, m_k$, the algorithm is often presented as assigning each object to the cluster whose mean is nearest by distance, then recomputing each mean from the objects assigned to it.
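A minimal sketch of this nearest-mean assignment, alternated with the centroid update step of standard (Lloyd's) k-means. It assumes Euclidean distance, points given as tuples of floats, and initial means supplied by the caller; the names `lloyd_kmeans` and `_nearest` are hypothetical and not taken from the source.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _nearest(point: Point, means: Sequence[Point]) -> int:
    """Index of the mean closest to `point` in Euclidean distance."""
    return min(range(len(means)), key=lambda j: math.dist(point, means[j]))


def lloyd_kmeans(points: Sequence[Point], initial_means: Sequence[Point],
                 max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Alternate assignment and update steps until no point changes cluster."""
    means = [tuple(m) for m in initial_means]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        new_labels = [_nearest(p, means) for p in points]
        if new_labels == labels:
            break  # converged: the assignments did not change
        labels = new_labels
        # Update step: each mean moves to the centroid of its assigned points.
        for j in range(len(means)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous mean if a cluster is empty
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, labels
```

For example, `lloyd_kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], [(0.0, 0.0), (10.0, 10.0)])` separates the two well-spaced pairs after one update. The result depends on the initial means, which is why the method only guarantees a local optimum.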
The frequency of occurrence of words in natural languages exhibits a periodic and a non-periodic component when analysed as a time series.
This work presents an unsupervised method for extracting periodicity information from text. It enables time-series creation and filtering to be used in building sophisticated language models that can distinguish repetitive trends from non-repetitive writing patterns.
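The text does not specify how periodicity is extracted, so the following is only a generic illustration, not the authors' method. It computes a periodogram with a plain discrete Fourier transform of a word-frequency series and reports the strongest period. It assumes evenly spaced observations (for instance, a word's daily relative frequency); `dominant_period` is a hypothetical name.

```python
import cmath
from typing import Sequence, Tuple


def dominant_period(series: Sequence[float]) -> Tuple[float, float]:
    """Return (period, share of power) for the strongest non-zero DFT frequency.

    The share is the power of the winning bin divided by the summed power of
    all positive-frequency bins (a rough indicator of how periodic the series is).
    """
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]  # remove the constant (DC) component
    powers = []
    for k in range(1, n // 2 + 1):  # positive frequencies up to Nyquist
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(centred))
        powers.append(abs(coeff) ** 2)
    total = sum(powers)
    best = max(range(len(powers)), key=powers.__getitem__)
    k_best = best + 1  # list index 0 corresponds to frequency bin k = 1
    return n / k_best, (powers[best] / total if total else 0.0)
```

A series with a weekly cycle should yield a period near 7 and a high power share. A series dominated by a slow, non-repeating drift concentrates its power in the lowest bins instead, which is one simple way to separate the two components the text describes.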
The k-means clustering problem is computationally difficult (NP-hard); however, efficient heuristic algorithms converge quickly to a local optimum.
These heuristics are usually similar to the expectation-maximization (EM) algorithm for mixtures of Gaussian distributions: both k-means and Gaussian mixture modeling use an iterative refinement approach.
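To make the parallel concrete, here is a minimal sketch of EM for a one-dimensional Gaussian mixture. It assumes initial means supplied by the caller, unit starting variances, and equal starting weights; `em_gmm_1d` and `_gauss` are hypothetical names. The comments mark where each step mirrors k-means.

```python
import math
from typing import List, Sequence, Tuple


def _gauss(x: float, mu: float, var: float) -> float:
    """Density of the normal distribution N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs: Sequence[float], means: Sequence[float], n_iter: int = 50,
              min_var: float = 1e-6) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    k = len(means)
    mu = list(means)
    var = [1.0] * k      # assumed starting variances
    w = [1.0 / k] * k    # equal starting weights
    for _ in range(n_iter):
        # E-step: soft assignment (responsibility) of each point to each
        # component; the probabilistic counterpart of k-means' nearest-mean step.
        resp = []
        for x in xs:
            dens = [w[j] * _gauss(x, mu[j], var[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the
        # responsibilities; the counterpart of k-means' centroid update.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            w[j] = nj / len(xs)
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            # The variance floor stops a component collapsing onto one point.
            var[j] = max(min_var, sum(r[j] * (x - mu[j]) ** 2
                                      for r, x in zip(resp, xs)) / nj)
    return w, mu, var
```

If the soft responsibilities are replaced by a hard 0/1 assignment to the nearest mean, with equal, fixed variances and weights, the update of `mu` reduces to the k-means centroid step. Like k-means, EM only reaches a local optimum that depends on the starting means.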