EE 380L: DATA MINING
Spring 2006
Class times: TTh: 9:30-11am, ENS 126, Unique No. 15845
Instructor: Joydeep Ghosh. ghosh@ece.utexas.edu;
www.lans.ece.utexas.edu/~ghosh
Office: ACES 3.118, 471-8980
Office Hrs: TTh 1:30-2:30pm. Other times by appointment only.
TA info: TBD, xx@lans.ece.utexas.edu; office hrs: ACES 3.106
PREREQUISITES: (Graduate standing in ECE, BME, CS or Maths) OR (consent of the instructor). You are expected to know the basics (undergraduate level) of probability/statistics. Knowledge of basic linear algebra will help, but is not required.
COURSE URL: http://www.lans.ece.utexas.edu/~rsuju/ee380l
COURSE OUTLINE: The information explosion of the past few years has us drowning in data but often starved of knowledge. Many companies that gather huge amounts of electronic data have now begun applying data mining techniques to their data warehouses to discover and extract pieces of information useful for making smart business decisions. Effective data mining, as opposed to data dredging, requires an understanding of concepts from exploratory data analysis, pattern recognition, machine learning, heterogeneous databases, parallel processing and data visualization, in addition to knowing the problem domain.
I will first give a series of lectures. These cover techniques for database representation/modeling, clustering, classification, finding associations and sequence processing, with emphasis on the issues of algorithm scalability, performance, interpretability and the ability to deal with garbage data. You will be using Java-based public domain software, such as WEKA, for some class exercises. The last few classes will consist of student term-project presentations, followed by active discussion.
GRADING:
5+35+5 pts: Pre-proposal presentation + term paper (due May 2) + 20 min. presentation (groups of 2-3).
25 pts: Homework assignments.
25 pts: Written exam; Thursday, March 23, in class.
5 pts: Participation in discussions.
There will be no final exam.
A set of class notes and supplementary materials will be available via HKN.
Strongly Recommended Books
I. H. Witten and E. Frank (2nd Ed, 2005), Data Mining, Morgan Kaufmann.
Machine learning viewpoint, closely tied to the WEKA software.
P. Tan, M. Steinbach and V. Kumar (2005), Introduction to Data Mining, Addison-Wesley.
Other recommended books:
Hastie/Tibshirani/Friedman (2001), The Elements of Statistical Learning, Springer.
Solid; stats oriented.
Duda/Hart/Stork (2000), Pattern Classification (2nd Ed).
Solid again. Gives pattern recognition perspective.
D. Hand, H. Mannila, P. Smyth (2001), Principles of Data Mining, MIT Press.
More conceptual and statistically oriented.
J. Han and M. Kamber (2000), Data Mining: Concepts and Techniques, Morgan Kaufmann.
Database oriented.
Disabilities statement: "The University of Texas at Austin provides upon request appropriate academic accommodations for qualified students with disabilities. For more information, contact the Office of the Dean of Students at 471-6259, 471-4641 TTY."
The above was a mandated statement, quoted verbatim. It does not imply that this course is disabled. I wonder what TTY means.
WEBSITES:
Data Mining and Knowledge Discovery Resources