Title: Cluster-based sampling of multiclass imbalanced data
Speaker: Nuanwan Soonthornphisaj
Time: 10:30 a.m., October 14, 2016
Venue: Room 305, Telecommunications Building
Host: Prof. Xingming Zhao
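About the Speaker:
Nuanwan Soonthornphisaj received a master's degree from the Asian Institute of Technology and a Ph.D. from Chulalongkorn University, and did postdoctoral research at Osaka University. Dr. Soonthornphisaj has been a visiting scholar at universities including the University of Freiburg (Germany), the University of Reading (UK), and the University of Bern (Switzerland). Dr. Soonthornphisaj is currently an Associate Professor in the Department of Computer Science at Kasetsart University, Thailand, with research interests in computer vision, machine learning, and natural language processing, and has published more than 30 papers in international journals such as the Journal of Information Science and Intelligent Data Analysis.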
Abstract:
Class imbalance is prevalent in many real-world applications, such as bioinformatics, anomaly detection, intrusion detection, and fraud detection. Imbalanced data is one of the most important problems in data mining, machine learning, and pattern recognition, because prediction performance is usually good for the majority classes but poor for the minority classes. In this talk, we introduce a new resampling approach, Clustering with sampling for Multiclass Imbalanced classification using Ensemble (C-MIEN). C-MIEN uses clustering to create a new training set for each cluster, in which instances with similar characteristics are given new labels. This step reduces the number of classes, which makes the problem less complex for C-MIEN to solve. We then apply two resampling techniques (oversampling and undersampling) to rebalance the class distribution of each training set. Finally, once every training set is balanced, an ensemble combines the models trained on them through majority voting. We also carefully design experiments to analyze the behavior of C-MIEN under different parameter settings (imbalance ratio and number of classifiers). The experimental results show that C-MIEN achieves higher performance than state-of-the-art methods.
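The general pipeline outlined in the abstract (cluster the data, rebalance each cluster's training set by over- and undersampling, train several classifiers, combine them by majority vote) can be illustrated with a minimal, standard-library-only Python sketch. It is not the speaker's C-MIEN algorithm: plain k-means clustering, routing each test point to its nearest cluster, resampling every class to the median class size, the 3-nearest-neighbour base learner, and the names `ClusterResampleEnsemble`, `n_clusters` and `n_classifiers` are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centres):
    """Index of the centre closest to point p."""
    return min(range(len(centres)), key=lambda i: dist2(p, centres[i]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k cluster centres as tuples."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centres)].append(p)
        # New centre = mean of its group; keep the old centre if the group is empty.
        centres = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centres[i]
                   for i, g in enumerate(groups)]
    return centres


def rebalance(rows, rng):
    """Resample every class to the (upper) median class size.

    Larger classes are randomly undersampled (without replacement);
    smaller classes are randomly oversampled (with replacement).
    """
    by_class = {}
    for x, label in rows:
        by_class.setdefault(label, []).append((x, label))
    sizes = sorted(len(v) for v in by_class.values())
    target = sizes[len(sizes) // 2]
    out = []
    for members in by_class.values():
        if len(members) >= target:
            out += rng.sample(members, target)
        else:
            out += members + rng.choices(members, k=target - len(members))
    return out


def knn_predict(train, x, k=3):
    """Base learner: majority label among the k nearest training rows."""
    neighbours = sorted(train, key=lambda r: dist2(r[0], x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


class ClusterResampleEnsemble:
    """Hypothetical cluster -> resample -> majority-vote ensemble (not C-MIEN itself)."""

    def __init__(self, n_clusters=2, n_classifiers=5, seed=0):
        self.n_clusters = n_clusters
        self.n_classifiers = n_classifiers
        self.seed = seed

    def fit(self, X, y):
        centres = kmeans(list(X), self.n_clusters, seed=self.seed)
        parts = [[] for _ in centres]
        for x, label in zip(X, y):
            parts[nearest(x, centres)].append((x, label))
        # Drop empty clusters so every remaining cluster has its own training set.
        kept = [(c, p) for c, p in zip(centres, parts) if p]
        self.centres = [c for c, _ in kept]
        rng = random.Random(self.seed)
        # Each member is a lazy k-NN model whose training set is a differently
        # resampled, class-balanced copy of its cluster's data.
        self.members = [[rebalance(p, rng) for _ in range(self.n_classifiers)]
                        for _, p in kept]
        return self

    def predict(self, x):
        # Route x to its nearest cluster, then majority-vote that cluster's members.
        cluster_members = self.members[nearest(x, self.centres)]
        votes = Counter(knn_predict(m, x) for m in cluster_members)
        return votes.most_common(1)[0][0]
```

Usage follows the familiar fit/predict pattern: `ClusterResampleEnsemble(n_clusters=2, n_classifiers=5).fit(X, y).predict(point)`, where `X` is a list of feature tuples and `y` the matching labels. Routing each point to its nearest cluster keeps every sub-problem limited to the classes that occur in that cluster; letting all clusters' members vote together would be an alternative design.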
All faculty and students are welcome to attend!