|Title||Data Mining and Automatic Quality Assurance of Survey Data|
|Author||Selimaj, Ledian (Copenhagen University College of Engineering, IHK)
Pedersen, Hans Christian
|Institution||Ingeniørhøjskolen i København, IHK, DK-2750 Ballerup, Denmark|
|Thesis level||Bachelor thesis|
|Education||Diplomingeniør, Informations- og kommunikationsteknologi|
|Education||Bachelor of Engineering in Information and Communication Technology|
|Abstract||Data mining is an emerging field of computer science that sits between statistics, machine learning and pattern recognition. Classification is one of its main tasks; its ultimate goal is to assign unknown data to predefined categories, based on knowledge acquired beforehand from data of the same type.
Programmatic quality checking of field survey data is difficult because identifying and differentiating data trends normally requires manual supervision. This paper explores the possibility of running various classifiers on field survey data and using different techniques to build classification models that mimic manual supervision, so that the quality of field survey data can be checked programmatically.
The choice of classifiers and model testing techniques is of crucial importance to the whole process, as the main purpose is to ensure high reliability of the predictive classification models that will be used to predict the category to which new, previously unknown data belongs. Emphasis is placed on selecting the best classifiers, testing them in several ways and combining them for a classification task, in order to provide more accurate results than a single algorithm, which might suffer from structural problems.
|External partner||Managers and employees in private companies|
|Admin||Creation date: 2012-01-09 Update date: 2013-11-14 Source: ihk ID: ihk-10622017|