National University of Kaohsiung Library (國立高雄大學圖資館)


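Master's Program, Department of Computer Science and Information Engineering, National University of Kaohsiung (國立高雄大學資訊工程學系碩士班)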

特定偽缺漏值好發生之資料群集之偵測 = Detecting the Data Group most Prone to a Specific Disguise Value

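Record type:	Bibliographic record - language material, printed : monograph
Parallel title:	Detecting the Data Group most Prone to a Specific Disguise Value
Author:	馮文榆
Other corporate author:	國立高雄大學 (National University of Kaohsiung)
Place of publication:	[Kaohsiung City]
Publisher:	The author
Year of publication:	2013 [ROC year 102]
Extent:	50 pages : illustrations, tables ; 30 cm
Subject:	data cleansing (資料清理)
Electronic resource:	http://handle.ncl.edu.tw/11296/ndltd/93939081335077285614
Note:	Bibliography: pp. 40-41
Note:	Made publicly available on December 16, 2014 (ROC year 103)
Note:	Text in English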
Abstract:	Disguised missing data is a special kind of missing data: the data entry is not actually empty, but the value recorded in it does not reflect the true fact. The presence of disguised missing data may severely bias analysis results, so detecting disguise values has become an important issue in data cleansing. Following the taxonomy proposed by Little and Rubin, disguised missing data can be classified into three categories: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Previous work on detecting disguised missing data focused on the first type; the other two types have not been studied. In this thesis, we present a variant of the problem of detecting the second type of disguised missing data, namely finding the data group most prone to a specific disguise value. We formalize this problem as an optimization problem and propose a method based on genetic algorithms to solve it. Experimental results on two real datasets show that our genetic-algorithm-based method can effectively discover the data group most prone to the occurrence of a given disguise value.
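The abstract does not say how a data group is encoded or scored, or which genetic operators the thesis uses. Purely as an illustration, the Python sketch below assumes that a group is a conjunction of attribute=value conditions (each gene is a required value or a wildcard). It scores a group by the share of its records that carry the given disguise value, and groups smaller than a hypothetical `min_support` score 0. Candidate groups evolve through tournament selection, uniform crossover, per-gene mutation, and elitism. The function `find_prone_group`, its parameters, the fitness objective, and the demo data are all hypothetical and are not the thesis's actual method.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical sketch of a GA search for a "data group"; not the thesis's actual method.
Record = Dict[str, str]
# One gene per condition attribute: a required value, or None meaning "any value".
Chromosome = Tuple[Optional[str], ...]


def matches(record: Record, attrs: Sequence[str], chrom: Chromosome) -> bool:
    """True if the record satisfies every non-wildcard condition of the chromosome."""
    return all(g is None or record[a] == g for a, g in zip(attrs, chrom))


def find_prone_group(records: List[Record], target: str, value: str,
                     min_support: int = 5, pop_size: int = 30,
                     generations: int = 40, mutation_rate: float = 0.1,
                     seed: int = 0) -> Tuple[Dict[str, str], float]:
    """Return the conditions of the group where `target == value` is most frequent."""
    rng = random.Random(seed)
    attrs = [a for a in records[0] if a != target]
    domains = {a: sorted({r[a] for r in records}) for a in attrs}
    cache: Dict[Chromosome, float] = {}

    def fitness(chrom: Chromosome) -> float:
        # Assumed objective: share of the group carrying the given disguise value;
        # groups smaller than min_support score 0 so tiny groups cannot win trivially.
        if chrom not in cache:
            group = [r for r in records if matches(r, attrs, chrom)]
            hits = sum(r[target] == value for r in group)
            cache[chrom] = hits / len(group) if len(group) >= min_support else 0.0
        return cache[chrom]

    def random_gene(a: str) -> Optional[str]:
        # Wildcard half the time so early groups tend to be large enough to score.
        return None if rng.random() < 0.5 else rng.choice(domains[a])

    def tournament(pop: List[Chromosome]) -> Chromosome:
        # Pick the fittest of three randomly sampled chromosomes.
        return max(rng.sample(pop, 3), key=fitness)

    population = [tuple(random_gene(a) for a in attrs) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: carry the two best groups over unchanged
        while len(next_gen) < pop_size:
            p1, p2 = tournament(ranked), tournament(ranked)
            # Uniform crossover, with each gene mutated at rate mutation_rate.
            child = tuple(
                random_gene(a) if rng.random() < mutation_rate
                else (g1 if rng.random() < 0.5 else g2)
                for a, g1, g2 in zip(attrs, p1, p2)
            )
            next_gen.append(child)
        population = next_gen

    best = max(population, key=fitness)
    return {a: g for a, g in zip(attrs, best) if g is not None}, fitness(best)


def demo_records() -> List[Record]:
    """Synthetic data: "0" is the suspected disguise value for income."""
    records = []
    for region in ("N", "S", "E"):
        for channel in ("web", "phone"):
            for i in range(10):
                # Every S/web record reports "0"; other groups report it 2 times in 10.
                income = "0" if (region, channel) == ("S", "web") or i < 2 else "high"
                records.append({"region": region, "channel": channel, "income": income})
    return records


if __name__ == "__main__":
    print(find_prone_group(demo_records(), target="income", value="0"))
```

In the synthetic data, the S/web group is the only group in which every record reports "0", so a working search should return `{"region": "S", "channel": "web"}`. The `min_support` floor matters. Without it, any tiny group that happens to contain only the disguise value would score as high as a genuinely suspicious large group.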

LDR:03445nam0a2200301 450
001 389689
005 20170214094525.0
009 389689
010 0 $b hardcover
010 0 $b paperback
100 $a 20170214d2013 k y0chiy05 b
101 1 $a eng $d chi $d eng
102 $a tw
105 $a ak am 000yy
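200 1 $a 特定偽缺漏值好發生之資料群集之偵測 $d Detecting the Data Group most Prone to a Specific Disguise Value $z eng $f written by 馮文榆
210 $a [Kaohsiung City] $c the author $d 2013 [ROC year 102]
215 0 $a 50 pages $c illustrations, tables $d 30 cm
300 $a Bibliography: pp. 40-41
300 $a Made publicly available on December 16, 2014 (ROC year 103)
300 $a Text in English
314 $a Advisor: Dr. 林文揚
328 $a Master's thesis--Master's Program, Department of Computer Science and Information Engineering, National University of Kaohsiung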
510 1 $a Detecting the Data Group most Prone to a Specific Disguise Value $z eng
610 1 $a data cleansing $a data mining $a data quality $a disguised missing data $a genetic algorithms $a missing at random $a unbiased sampling
681 $a 008M/0019 $b 464103 3104 $v 2007 edition
700 1 $a 馮 $b 文榆 $4 author $3 614549
712 0 2 $a 國立高雄大學 $b 資訊工程學系碩士班 $3 353878
801 0 $a tw $b NUK $c 20141204 $g CCR
856 7 $z Electronic resource $2 http $u http://handle.ncl.edu.tw/11296/ndltd/93939081335077285614