具複合型屬性之特徵群聚與選取 = Feature Clustering ...
Master's Program, Department of Computer Science and Information Engineering, National University of Kaohsiung

 

  • 具複合型屬性之特徵群聚與選取 = Feature Clustering and Selection with Composite Attributes
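  • Record type: Bibliographic - language material, printed : monograph
    Parallel title: Feature Clustering and Selection with Composite Attributes
    Author: 宋偉屏,
    Other corporate author: National University of Kaohsiung
    Place of publication: [Kaohsiung City]
    Publisher: the author;
    Year of publication: ROC 99 [2010]
    Extent: 110 pages : illustrations, tables ; 30 cm
    Subject: Genetic algorithms
    Subject: Feature Selection
    Electronic resource: http://handle.ncl.edu.tw/11296/ndltd/29243580338543295788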
    Abstract note: Feature selection is an important pre-processing step in data mining and machine learning. It is usually applied when the number of attributes in the training data is very large, because the computation time on such data is considerable. A properly selected feature subset can not only achieve high classification accuracy but also reduce the computation time needed to derive rules and the classification cost. Earlier studies extracted features by feature clustering, but all of them clustered single attributes only. In this thesis, we propose two GA-based methods for composite-attribute clustering and feature selection. The first method presents a new chromosome representation divided into two parts, the composition part and the cluster part. The composition part represents which attributes are combined into a composite attribute, and the cluster part denotes which cluster each attribute belongs to. The fitness of each individual is evaluated using the average accuracy of the composite or single attribute substitutions in clusters, the cluster balance (how evenly the features are spread across clusters), and the total penalty for the composite attributes. The second method extends the first to improve time performance. It uses a new fitness function based on accuracy, composition penalty, cluster balance, and attribute similarity, which greatly reduces the time spent scanning the training set. In addition, we design an adjustment process that makes the composite attributes consistent with their cluster numbers. Finally, the experimental results show that the proposed approaches perform well and achieve a good trade-off between accuracy and time complexity. The proposed approaches thus provide more flexible alternatives than previous feature selection techniques and can also easily handle missing values in classification.
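The two-part chromosome and the adjustment process described in the abstract might be sketched as follows. This is only an illustration under stated assumptions: the record gives no formulas, so the label encoding of the composition part, the "first member wins" repair rule, the entropy-based balance measure, and the way the three terms are combined in `fitness` are all hypothetical choices, not the thesis's actual definitions.

```python
from collections import Counter
from dataclasses import dataclass
from math import log
from typing import Dict, List


@dataclass
class Chromosome:
    # composition[i] == 0: attribute i stays a single attribute.
    # Attributes sharing the same non-zero label form one composite
    # attribute (assumed encoding, not taken from the thesis).
    composition: List[int]
    # cluster[i]: index of the cluster that attribute i belongs to.
    cluster: List[int]


def adjust(ch: Chromosome) -> Chromosome:
    """Repair step (assumed rule): every member of a composite attribute
    takes the cluster of that composite's first member."""
    first_cluster: Dict[int, int] = {}
    fixed = list(ch.cluster)
    for i, label in enumerate(ch.composition):
        if label == 0:
            continue  # single attributes keep their own cluster
        first_cluster.setdefault(label, ch.cluster[i])
        fixed[i] = first_cluster[label]
    return Chromosome(list(ch.composition), fixed)


def cluster_balance(ch: Chromosome, k: int) -> float:
    """Normalized entropy of cluster sizes (assumed measure): close to 1
    when the k clusters hold equal numbers of attributes, 0 when all
    attributes sit in one cluster."""
    if k < 2:
        return 1.0
    n = len(ch.cluster)
    sizes = Counter(ch.cluster)
    entropy = -sum((c / n) * log(c / n) for c in sizes.values())
    return entropy / log(k)


def fitness(accuracy: float, balance: float, penalty: float) -> float:
    """One assumed way to combine the three terms named in the abstract:
    reward accuracy and balance, discount by the composition penalty."""
    return accuracy * balance / (1.0 + penalty)


# Six attributes: {0, 1} and {3, 4} are composites, 2 and 5 are single.
ch = Chromosome(composition=[1, 1, 0, 2, 2, 0], cluster=[0, 1, 1, 2, 0, 2])
repaired = adjust(ch)
print(repaired.cluster)
print(fitness(accuracy=0.9, balance=cluster_balance(repaired, k=3), penalty=0.5))
```

In this sketch the accuracy value is passed in rather than computed, since the classifier and the attribute-substitution scheme used to measure it are not specified in the record.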
Holdings
  • 2 items • Page 1 •
 
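310002032053; Theses and Dissertations Area (2F); Not for loan; Thesis; TH 008M/0019 464103 3027 2010; General use (Normal); On shelf; 0
310002032061; Theses and Dissertations Area (2F); Not for loan; Thesis; TH 008M/0019 464103 3027 2010 c.2; General use (Normal); On shelf; 0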