Instructor and Course Information
Announcements
1/26: Suggested projects are posted!
Schedule
Topic | Reading | Assigned | Due
1. Introduction to Data Mining (01/16, 01/20) | | |
| | | 02/10/09
3. Clustering (01/26, 01/29, 02/03, 02/05) | | | 02/23/09
4. Classification (02/10, 02/12, 02/17, 02/19, additional) | | | 03/10/09
5. Semi-supervised Clustering (02/26) | | |
6. Dimensionality Reduction (03/03, 03/05) | | |
7. Graph Mining | | |
8. David Kreig | | |
9. Devin Cook: Stupid Filter | | |
10. Yin Hu: Pattern Mining in Images | | |
11. Dan Staley: Game Learning | | |
12. ChingJoo Khor: EM algorithm for approximate linkage analysis | | |
13. Daniel Harris | | |
14. Fo Bo | | |
15. Joshua Guerin | | |
16. Matt Caldwell | | |
17. Wille Miller | | |
18. Kai Wang | | |
19. Sami Taha | | |
20. Tom Shearing | | |
21. Jeremy Howard | | |
22. Mohan Chandrasekarapuram | | |
Syllabus
With data now being collected at an unprecedented rate in almost every field of human endeavor, there is an emerging economic and scientific need to extract useful information from it. Data mining is the process of automatically discovering patterns, changes, associations, and anomalies in massive databases. It is a highly interdisciplinary field at the confluence of several disciplines, including database systems, data warehousing, machine learning, statistics, algorithms, data visualization, and high-performance computing. This seminar will provide an introductory survey of the main topics in data mining and knowledge discovery (including but not limited to classification, regression, clustering, association rules, trend detection, feature selection, similarity search, data cleaning, and privacy and security issues), as well as a wide spectrum of data mining applications such as biomedical informatics, bioinformatics, financial market analysis, image processing, network monitoring, and social service analysis.
For each topic, a few of the most relevant research papers will be selected as the main teaching material. Students are expected to read the assigned paper before each class and to participate in the in-class discussion.
Prerequisite
- None
- Some background in algorithms, data structures, statistics, machine learning, artificial intelligence, and databases is helpful.
References
No required textbook
- Data Mining: Concepts and Techniques, by Han and Kamber, Morgan Kaufmann, 2006. (ISBN: 1-55860-901-6)
- Principles of Data Mining, by Hand, Mannila, and Smyth, MIT Press, 2001. (ISBN: 0-262-08290-X)
- Introduction to Data Mining, by Tan, Steinbach, and Kumar, Addison-Wesley, 2006. (ISBN: 0-321-32136-7)
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Hastie, Tibshirani, and Friedman, Springer, 2001. (ISBN: 0-387-95284-5)
Grading
Each student in CS685 is expected to present a paper, lead the discussion that follows the presentation, and complete a project on a selected topic. There will be neither homework nor an exam.
- Homework assignments (4): 40%
- Presentation: 30%
- Project: 30%
- Exam: 15%
Tentative Course Outline
1. Introduction
· What is data mining?
2. Data Preprocessing
· Data sampling, data cleaning, feature selection, and dimensionality reduction
3. Classification
· Tree-based, rule-based, and instance-based methods
· Bayesian methods (naive Bayes and Bayesian belief networks)
· Neural networks, linear discriminant analysis, support vector machines, and ensemble methods
· Model evaluation
4. Association Analysis
· Apriori algorithm and its extensions
· Pattern evaluation (subjective and objective interestingness measures)
· Sequential patterns and graph mining
5. Clustering
· Partitional and hierarchical clustering methods
· Graph-based and density-based methods
· Cluster evaluation
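The Bayesian methods listed under Classification in the outline above can be illustrated with a small categorical naive Bayes classifier. The following Python sketch is only an illustration under stated assumptions: the function names `train_naive_bayes` and `predict_naive_bayes`, the add-one (Laplace) smoothing parameter `alpha`, and the toy weather data are hypothetical choices, not material from the course readings. Each class c is scored by log P(c) plus the sum of log P(x_j | c), which relies on the "naive" assumption that features are independent given the class.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[c][j][v] = number of class-c rows whose feature j equals v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    # vocab[j] = distinct values seen for feature j (used for smoothing)
    vocab = defaultdict(set)
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[c][j][v] += 1
            vocab[j].add(v)
    return class_counts, value_counts, vocab


def predict_naive_bayes(model, row, alpha=1.0):
    """Return the class with the highest smoothed log-posterior score."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        # log P(c) + sum_j log P(x_j | c), features assumed independent given c
        score = math.log(n_c / total)
        for j, v in enumerate(row):
            count = value_counts[c][j][v]  # Counter returns 0 for unseen values
            score += math.log((count + alpha) / (n_c + alpha * len(vocab[j])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("rainy", "mild")))  # → yes
```

Working in log space avoids underflow when many small probabilities are multiplied, and the smoothing term keeps an unseen feature value from zeroing out a class.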
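The Apriori algorithm listed under Association Analysis in the outline above rests on one property: every subset of a frequent itemset must itself be frequent, so candidates with an infrequent subset can be pruned before counting. The Python sketch below is a minimal illustration, assuming an absolute support-count threshold `min_support` and a hypothetical function name `apriori`; for simplicity it builds candidates by a pairwise union of frequent (k-1)-itemsets rather than the sorted-prefix join often described, and the basket data is made up.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count individual items
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        candidates = set()
        # Candidate generation: unions of two frequent (k-1)-itemsets of size k
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) != k:
                    continue
                # Apriori pruning: every (k-1)-subset must already be frequent
                if all(frozenset(sub) in frequent for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # Support counting: one pass over the transactions per level
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
frequent = apriori(baskets, min_support=3)
print(frequent[frozenset({"diaper", "beer"})])  # → 3
```

In this toy run, {bread, diaper, beer} is pruned without being counted because {bread, beer} appears in only two baskets, which is exactly the saving Apriori is designed to provide.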
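For the partitional clustering methods listed under Clustering in the outline above, k-means (Lloyd's algorithm) is the standard example: it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The Python sketch below is a minimal illustration, assuming Euclidean distance, random initialization from the input points, and a hypothetical function name `kmeans`; it uses only the standard library (`math.dist` requires Python 3.8 or later).

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k input points chosen at random
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: the next assignment step would change nothing
        centroids = new_centroids
    return centroids, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(sorted(c) for c in clusters))
# → [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11), (11, 10)]]
```

k-means only converges to a local optimum that depends on the initial centroids, so in practice it is usually run several times with different seeds and the result with the lowest within-cluster sum of squared distances is kept.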