||Statistical Analysis and Fault Detection in DSB IC3 Train
||Andersen, Klaus Kaae (Mathematical Statistics, Department of Informatics and Mathematical Modeling, Technical University of Denmark, DTU, DK-2800 Kgs. Lyngby, Denmark)
||Technical University of Denmark, DTU, DK-2800 Kgs. Lyngby, Denmark
||The objective of this report is to mine the error log of the IC3 trains owned by DSB (the Danish State Railways).
This report explains methods for extracting explanatory features from a simple error log generated by the 96 IC3 trains that DSB currently owns. The overall goal of the features is to identify latent dependencies among errors in the error log. Some additional features are designed to describe how consistently the errors behave across different trains. The errors that are consistent across all trains are identified and referred to as global errors. Other errors are specific to a small number of trains; these are detected and called local errors. The generated features succeed in extracting valuable information and in detecting strong dependencies among some errors. The correlations among the feature values are also rather sensible. The rest of the study focuses on the global errors, which also appear to occur more frequently than other types of errors.
After a large number of features have been generated, factor analysis is applied to study potential latent correlations within the feature space. The resulting factors are interesting but not interpretable. In an attempt to reduce the feature space, factor analysis is then applied to two subsets of features separately. This yields some interpretable and usable factors, which are used as input to cluster analysis.
As the next step, cluster analysis is used to investigate potential clusters in the feature space. Different types of features are used as input to the method, but no clear cluster pattern is seen in the data. However, cluster analysis is able to isolate the desired global errors in one cluster. This cluster is considered an effective outcome, and its content is studied in more detail by means of logistic regression.
Finally, logistic regression is introduced in order to investigate chain dependencies among several errors. More specifically, the significance of several errors is studied as predictors of an outcome error. The outcome error in this part is chosen to be a potential central error, and the predictor errors are the errors that follow the central error. This produces interesting results that should be analyzed by a mechanical expert. The method is intended mainly as a guideline on how a technical expert can gain information about different error dependencies. Verifying whether the significant predictors of an error are sensible is therefore left to mechanical experts. An example is shown in this part, in which some of the significant predictors appeared to indicate a disconnection procedure and were sensible to a technical expert from DSB.
||Technical University of Denmark (DTU) : Kgs. Lyngby, Denmark
Creation date: 2010-02-16
Update date: 2010-10-28