New Course

EE 379K: Introduction to DATA MINING
FALL  2008

 

Class times: TTh: 11am-12:15pm, ENS 126, Unique No. 17110
Instructor: Joydeep Ghosh. ghosh@ece.utexas.edu; www.ideal.ece.utexas.edu/~ghosh
Office: ACES 3.118, 471-8980
Office Hrs: M 3-4:30pm; Th 3:30-5pm, in ACES 3.118
TA/grader info: Goo Jun, gjun@ece

PREREQUISITES: EE351K. Knowledge of very basic linear algebra will help.

COURSE URL: http://www.ideal.ece.utexas.edu/courses/f08_ee379k/
The list of topics is at http://pegasus.ece.utexas.edu/%7Eghosh/ug-dmtopics.html. Assignments, course notes, etc. are posted on Blackboard.

COURSE OUTLINE: The information explosion of the past few years has us drowning in data but often starved of knowledge. Many companies that gather huge amounts of electronic data have now begun applying data mining techniques to their data warehouses to discover and extract pieces of information useful for making smart business decisions. Analysis of web data (links, content, usage) also has many applications, including better search engines and finding key players in social networks. This course will expose you to relevant concepts from pattern recognition, machine learning, large-scale “cloud” computing, and data visualization that are needed for effective and practical data mining.

I will first give a series of lectures on some basic techniques for predictive modeling and go over several real-life applications. This includes an introduction to MapReduce (Google)/Hadoop (public version): basic parallel programming constructs used to work on large-scale computers such as the computing “clouds” offered by Amazon. I’ll also provide an introduction to [...]. Extensive MATLAB-based exercises will be conducted to give you "hands-on" experience with data analysis. The last few classes will consist of student term-project presentations, followed by active discussion.
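
For a rough feel of the MapReduce programming model mentioned above, here is a minimal single-machine sketch of the classic word-count example. It is written in plain Python purely for illustration and does not use Hadoop or any real MapReduce API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and not taken from the course materials. A real framework would run the map and reduce steps in parallel across many machines and handle the grouping step itself.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key (a real framework does this between map and reduce)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["data mining finds patterns", "mining large data"]
    print(word_count(docs))
```

Because each call to map_phase looks at only one document and each call to reduce_phase looks at only one key, both steps can in principle be spread over many machines without coordination, which is what makes the model attractive for large data sets.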

GRADING:
5+30+5 pts: pre-proposal presentation + Term paper (due Dec 4) + 20 min. presentation (groups of 2-3).
25 pts: Homework assignments
25 pts: Written Exam; Nov 6, in class
5 pts: Participation in discussions.
There will be no final exam.
 

Text:
P. Tan, M. Steinbach and V. Kumar, Introduction to Data Mining, Addison-Wesley, 2005. I'll be mostly focusing on the introductory chapters: 1-5, 8; Appendices.
Some sample chapters are available at the book's website, http://www-users.cs.umn.edu/~kumar/dmbook/index.php.

A set of class notes and supplementary materials will be available via Blackboard/Web.

Reference books:
I. H. Witten and E. Frank, Data Mining (2nd Ed), Morgan Kaufmann, 2005.
Machine learning viewpoint, closely tied to the WEKA software.
T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning, Springer, 2001.
Solid; stats oriented.
R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification (2nd Ed), 2000.
Solid again. Gives pattern recognition perspective.
Hadoop: http://hadoop.apache.org/core/


Disabilities statement: "The University of Texas at Austin provides upon request appropriate academic accommodations for qualified students with disabilities. For more information, contact the Office of the Dean of Students at 471-6259, 471-4641 TTY."
The above was a mandated statement, quoted verbatim. It does not imply that this course is disabled. I wonder what TTY means.

WEBSITES:

Data Mining and Knowledge Discovery Resources