Abstract:
To effectively impute incomplete bus arrival times, this paper proposes an improved k*-means clustering algorithm. Four kinds of features are first extracted from historical travel records: travel distance, passenger count, time period, and travel time. The improved k*-means algorithm is then used to cluster these features, and a complete dictionary of travel times between stations is constructed, from which the arrival time between any two stations can be imputed indirectly. Empirical data from a bus transit route in Beijing are used to validate the proposed algorithm. Three typical imputation methods (linear regression, k-nearest neighbors, and k-means clustering) are adopted for comparison. Experimental results show that the imputation proportion achieved by the proposed method exceeds 97%, the highest among the four methods. Moreover, the average imputation error is no higher than 100 seconds, which demonstrates the effectiveness of the method.