Online Clustering based Fault Data Detection Method for Distributed PV Sites
Abstract: In distributed photovoltaic (PV) sites, fault data detection is critical to the safety of the power grid. Accurate and reliable PV data are the basis of PV generation performance analysis and power load forecasting. However, many PV sites record a high proportion of faulty power measurements, which greatly impairs the analysis of site performance. This paper summarizes three typical types of fault data in PV measurements based on engineering experience. Combining Spark Streaming with the k-means algorithm, a streaming k-means method operating under different time windows is adopted to detect faulty PV data in real time. The Silhouette Coefficient is used to choose a suitable number of clusters in each detection period, and principal component analysis (PCA) is applied to display the data distribution and clustering results in real time. In the numerical experiments, actual data from the Wuxi Hongdou PV power sites and an artificially generated data set are used to verify the proposed method. The experimental results show that the streaming k-means method can effectively identify various types of fault data and achieves a better detection rate than the 3-sigma recognition method and logistic regression.
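
The per-window step described above (cluster one time window with k-means, then pick the number of clusters by the Silhouette Coefficient) can be sketched roughly as follows. This is a minimal single-machine illustration in plain NumPy, not the Spark Streaming implementation used in the paper. The function names, the two normalized features (irradiance, power), the candidate range of k, and the rule that flags any cluster holding less than `min_share` of the window are all assumptions made for the example; the abstract does not specify them.

```python
import numpy as np

def kmeans(X, k, iters=25, seed=0):
    """Plain Lloyd's k-means on one window; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return centers, dist.argmin(axis=1)

def silhouette(X, labels):
    """Mean Silhouette Coefficient; singleton clusters score 0."""
    ids = np.unique(labels)
    if len(ids) < 2:
        return -1.0
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue
        a = D[i, own].sum() / (n_own - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in ids if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores.mean()

def detect_window(X, k_range=(2, 3, 4), min_share=0.1):
    """Return a boolean mask of suspect samples in one window.

    Assumes the window holds more than two samples. The min_share
    flagging rule is a hypothetical choice for this sketch.
    """
    best_score, best_labels = -np.inf, None
    for k in k_range:
        if k >= len(X):
            break
        _, labels = kmeans(X, k)
        score = silhouette(X, labels)
        if score > best_score:
            best_score, best_labels = score, labels
    share = np.bincount(best_labels) / len(X)
    return share[best_labels] < min_share

# Synthetic window: 95 normal samples and 5 "power stuck at zero" samples.
rng = np.random.default_rng(42)
normal = rng.normal([0.8, 0.8], 0.02, size=(95, 2))
stuck = rng.normal([0.8, 0.0], 0.02, size=(5, 2))
window = np.vstack([normal, stuck])
flags = detect_window(window)
print(int(flags.sum()), "suspect samples in this window")
```

In a streaming setting, `detect_window` would be called once per time window as new measurements arrive; how cluster centers are carried over between windows is left out of this sketch.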