National University of Kaohsiung, Department of Computer Science and Information Engineering, Master's Program

 

  • 特定偽缺漏值好發生之資料群集之偵測 = Detecting the Data Group most Prone to a Specific Disguise Value
  • Record Type: Language materials, printed : monographic
    Parallel Title: Detecting the Data Group most Prone to a Specific Disguise Value
    Author: 馮文榆,
    Secondary Intellectual Responsibility: National University of Kaohsiung (國立高雄大學)
    Place of Publication: [Kaohsiung City]
    Published: the author;
    Year of Publication: 2013 [ROC year 102]
    Description: 50 pages : illustrations, tables ; 30 cm;
    Subject: 資料清理
    Subject: data cleansing
    Online resource: http://handle.ncl.edu.tw/11296/ndltd/93939081335077285614
    Notes: Bibliography: pp. 40-41
    Notes: Made publicly available on 16 December 2014 (ROC year 103)
    Notes: Text in English
    Summary: Disguised missing data is a special kind of missing data: the data entry is not empty, but its value does not reflect the truth. Because disguised missing data can severely bias analysis results, detecting disguise values has become an important issue in data cleansing. Following the taxonomy proposed by Little and Rubin, disguised missing data can be classified into three categories: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Previous work on detecting disguised missing data focused on the first type; no work has addressed the other two. In this thesis, we present a variant of the problem of detecting the second type (missing at random), namely finding the data group most prone to a specific disguise value. We formalize this problem as an optimization problem and propose a method based on genetic algorithms to solve it. Experiments on two real datasets show that our genetic-algorithm-based method can effectively discover the data group most prone to a given disguise value.
Items
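310002502048 | Theses and Dissertations Area (2nd floor) | Non-circulating | Thesis | TH 008M/0019 464103 3104 2013 | Normal use | On shelf | 0
310002502055 | Theses and Dissertations Area (2nd floor) | Non-circulating | Thesis | TH 008M/0019 464103 3104 2013 c.2 | Normal use | On shelf | 0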