Title Data Mining and Automatic Quality Assurance of Survey Data
Author Selimaj, Ledian (Copenhagen University College of Engineering, IHK)
Supervisor Chew, Chengzi
Pedersen, Hans Christian
Institution Ingeniørhøjskolen i København, IHK, DK-2750 Ballerup, Denmark
Thesis level Bachelor thesis
Education Bachelor of Engineering in Information and Communication Technology (Diplomingeniør, Informations- og kommunikationsteknologi)
Publication date 2012-01
Abstract Data mining is an emerging field of computer science that sits at the intersection of statistics, machine learning and pattern recognition. Classification is one of its main tasks: the goal is to assign previously unseen data to predefined categories, based on knowledge acquired beforehand from data of the same type.
Programmatic quality checking of field survey data is difficult because identifying and differentiating data trends requires manual supervision. This paper explores whether various classifiers can be run on field survey data, and whether different techniques can then be used to build classification models that mimic this manual supervision, so that the quality of field survey data can be checked programmatically.
The choice of the right classifiers and model testing techniques is of crucial importance for the whole process, because the main purpose is to ensure high reliability of the predictive classification models that will be used to predict the category to which new and previously unknown data belong. Emphasis is placed on selecting the best classifiers, testing them in several ways, and combining them for a classification task, in order to obtain more accurate results than a single algorithm, which might suffer from structural problems, would provide.
Pages 79
External partner Managers and employees in private companies
Fulltext
Document MSc.pdf (3.12 MB)
Admin Creation date: 2012-01-09    Update date: 2013-11-14    Source: ihk    ID: ihk-10622017    Original MXD