Executive Software Engineering Program
EE382C – Data Mining
Spring 2003
Instructor: Joydeep Ghosh, Ph.D., Professor
Email address: ghosh@ece.utexas.edu
URL: http://www.lans.ece.utexas.edu/~ghosh
Course Title and Description:
Many companies that gather huge amounts of electronic data have begun applying data mining techniques to their data warehouses to discover and extract “hidden” patterns useful for making smart business decisions. Effective data mining requires an understanding of concepts from exploratory data analysis, pattern recognition, machine learning/AI, heterogeneous databases, parallel processing and data visualization, in addition to knowledge of the application domain. I will focus on basic techniques for data mining, including methods useful for analyzing information from the World Wide Web. Demos using industrial-strength software (SAS) as well as a public-domain Java package (WEKA) will be given, and some applications/case studies will be discussed. The course involves a mid-term exam, a paper presentation and a term project. There will be no final exam.
Textbook(s):
Title: Data Mining: Concepts and Techniques
Author: Han and Kamber (HK)
Publisher: Morgan Kaufmann
ISBN: 1-55860-489-8
Title: Data Mining
Author: Witten and Frank (WF)
Publisher: Morgan Kaufmann
ISBN: 1-55860-552-5
Course Expectations:
This course requires a very basic knowledge of Java. An undergraduate-level understanding of probability/statistics, data analysis, databases and linear algebra is assumed. This is a graduate course, so the workload will be medium to heavy.
While studying techniques for database representation/modeling, clustering, classification, finding associations and sequence processing, we will emphasize algorithm scalability, performance, interpretability and the ability to deal with low-quality (“garbage”) data. Depending on class size, 10-15 minute student talks will be interwoven with the lectures. The last two classes will largely consist of student term-project presentations, followed by active discussion.
Class outline:
January 24 and 25 -- Introduction
Reading Assignment: HK ch. 1-3; WF ch. 1, 2
Area of study: overview, SAS demos, data warehousing, OLAP; data quality and preprocessing
February 21 and 22
Reading Assignment: HK ch. 5, 6, 8; WF ch. 3, 4.5, 6.6, 7
Area of study: clustering/segmentation; market basket analysis; introduction to finding association rules
March 21 and 22
Reading Assignment: HK ch. 7; WF rest of ch. 4-7
Area of study: association rules (contd.); classification; prediction/forecasting
April 11 and 12
Reading Assignment: from papers/notes
Area of study: combining multiple models; web analytics: analyzing the hyperlink structure and content of websites
May 9 and 10
Reading Assignment: HK ch. 9; notes
Area of study: web analytics (contd.): analyzing usage of websites; project presentations; course wrap-up; the future of data mining
Grading Information:
45% final project
20% written homework
20% mid-term
10% brief presentation of a research paper (groups of 2)
5% class participation