Imputation of Incomplete Bus Arrival Times Based on an Improved k*-means Algorithm

    Abstract: To effectively impute incomplete bus arrival times, this paper proposes an imputation method based on an improved k*-means clustering algorithm. Four features are first extracted from historical travel records: the number of passengers at arrival, the time period of arrival, the distance between stations, and the travel time between stations. The similarity between stations is measured by weighting these features. The features are then clustered with the improved k*-means algorithm, a modification of the existing k-means algorithm, to construct a complete table of travel times between bus stations, from which the arrival time between any two stations can be imputed indirectly. Operating data from surface bus routes in Beijing were used to validate the reliability of the method. Comparison experiments were also conducted against three typical imputation methods: linear regression, nearest-neighbor (k-nearest neighbors) interpolation, and k-means clustering. The results show that the proposed method fills over 97% of the incomplete arrival times, the highest imputation proportion among the four methods, and that its average imputation error on known arrival times is no more than 100 s.
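A minimal Python sketch of the pipeline outlined in the abstract follows. It is not the authors' improved k*-means: it substitutes plain Lloyd-style k-means under a feature-weighted squared Euclidean distance. The record layout, the weights in WEIGHTS, the min-max scaling, the rule that a missing travel time is taken from the nearest cluster center matched on the other three features, the (segment, period) table keys, and all function names (fit_scaler, weighted_kmeans, impute_travel_time, complete_table, arrival_times) are hypothetical choices made only for illustration. The toy records are invented and are not the Beijing data.

```python
import random

# Hypothetical record layout (not from the paper); travel time must be the last field:
# (distance_m, passengers, period_index, travel_time_s)
TRAVEL_TIME = 3
# Hypothetical feature weights; the paper's actual weighting is not reproduced here.
WEIGHTS = (1.0, 0.5, 0.5, 1.0)


def fit_scaler(points):
    """Min-max parameters that map every feature onto [0, 1]."""
    lo = [min(col) for col in zip(*points)]
    span = [(max(col) - m) or 1.0 for col, m in zip(zip(*points), lo)]
    return lo, span


def scale(p, lo, span):
    return [(x - m) / s for x, m, s in zip(p, lo, span)]


def weighted_dist(a, b, weights, features):
    """Feature-weighted squared Euclidean distance over the given feature indices."""
    return sum(weights[i] * (a[i] - b[i]) ** 2 for i in features)


def weighted_kmeans(points, k, weights, iters=100, seed=0):
    """Lloyd-style k-means with a weighted distance (a stand-in, not k*-means)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    features = range(len(points[0]))
    labels = None
    for _ in range(iters):
        # Assignment step: each record joins its nearest center under the weighted distance.
        new_labels = [
            min(range(k), key=lambda c: weighted_dist(p, centers[c], weights, features))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: with fixed per-feature weights the plain mean is still optimal.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def impute_travel_time(known, centers, lo, span, weights):
    """Travel time (s) of the nearest center, matched on the known features only."""
    q = scale(list(known) + [lo[TRAVEL_TIME]], lo, span)  # placeholder travel time, ignored
    cols = [i for i in range(len(lo)) if i != TRAVEL_TIME]
    best = min(centers, key=lambda c: weighted_dist(q, c, weights, cols))
    return best[TRAVEL_TIME] * span[TRAVEL_TIME] + lo[TRAVEL_TIME]


def complete_table(observed, candidates, centers, lo, span, weights):
    """Fill a {(segment, period): travel_time_s} table, imputing missing keys."""
    table = dict(observed)
    for key, known in candidates.items():
        if key not in table:
            table[key] = impute_travel_time(known, centers, lo, span, weights)
    return table


def arrival_times(start, segment_times):
    """Arrival time at each stop: a known start time plus cumulative segment times."""
    times = [start]
    for t in segment_times:
        times.append(times[-1] + t)
    return times


if __name__ == "__main__":
    # Invented toy records: short busy peak segments and long quiet off-peak ones.
    records = [(500, 10, 8, 90), (520, 12, 8, 95), (480, 11, 8, 88),
               (1500, 2, 14, 200), (1550, 3, 14, 210), (1450, 2, 14, 195)]
    lo, span = fit_scaler(records)
    centers, _ = weighted_kmeans([scale(r, lo, span) for r in records], 2, WEIGHTS)
    table = complete_table({("S1-S2", 8): 90.0},
                           {("S1-S2", 8): (500, 10, 8), ("S2-S3", 8): (510, 11, 8)},
                           centers, lo, span, WEIGHTS)
    print(round(table[("S2-S3", 8)], 1))  # 91.0
    print(arrival_times(0.0, [table[("S1-S2", 8)], table[("S2-S3", 8)]]))
```

Because each feature carries a fixed weight, the per-cluster mean still minimizes the weighted within-cluster distance, so only the assignment step differs from ordinary k-means. Once the table is complete, the arrival time at any downstream stop follows by adding the segment travel times to a known upstream arrival time.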

       
