News
  • Check the latest schedule! Project 2 presentations are now due on 1 Dec (Wed).
  • The "Density Biased Sampling" paper is in Reading List 2.
  • All lecture notes are current on the secured site.
  • CVIZ, an IBM data visualization tool, is available in the 'Online Demos' section.
  • Data warehousing gets a big push from key vendors

    This is an advanced course in data mining, supported by Dell under The Leadership Alliance for Research, Instruction and Technology, a strategic partnership between Dell Computer Corporation and The University of Texas at Austin. It is designed for students who have completed an introduction to data mining (e.g., MIS 382N.10: Introduction to Data Mining, 97-98; EE 380L: Data Mining, Fall 98; or CS 395T: Mining and Monitoring Databases, Fall 97), or who have a substantial background in applied statistics and/or data analysis and visualization.

    We shall work on large, real-life data sets provided by Dell and others.

    Grading

  • 20% Minor Project
  • 50% Major Project and Presentation
  • 20% Paper Presentation in Class
  • 10% Class Participation

    For both major and minor projects, you will be working in groups of 3 or 4.
    Syllabus
  • Schedule (tentative)
  • Office Hours
  • Course proposal

    Restricted Access (log in once for access to the following)
  • SSH access to LANS (NEW)
  • Calendar
  • Newsgroup (lab machine access info)
  • Project Data Files
  • Online Copies of lectures
  • Reference Links
  • Reading List 1 (Course Reader)
  • Reading List 2 (Online, under construction)
  • SAS Online Tutorial
    1. Getting Started with SAS Software
    2. SAS tutorial @ Univ. of New Mexico
  • Enterprise Miner Manual (Online)
  • SVM page in Germany
  • Presentations (FILO)
    1. Michael Anderson - Optimal Sample Allocation for Normal Discrimination and Logistic Regression Under Stratified Sampling
    2. Alexander Strehl - CURE: An Efficient Clustering Algorithm for Large Databases
    3. Amanda Whaley - Interactive Data Analysis: The Control Project
    4. Shi Zhong - Feature Selection for Classification by M. Dash and H. Liu
    5. Rubeena Shahnaz - A Survey of Methods for Scaling Up Inductive Learning
    6. Gunjan Gupta - ROCK: A Robust Clustering Algorithm for Categorical Attributes
    7. Fabio da Silva - Understanding Consumer Database Marketing
    8. Shailesh Kumar - Entropy-Based Subspace Clustering for Mining Numerical Data
    9. Chase Krumpelman - CLARANS: Clustering
  • Online Demos (Java)
    1. Data Visualization
    2. Statistics
      1. 2-D Gaussian data generator (adjustable correlation)
      2. Real Time 2-D regression
    3. Data Reduction
      1. Principal Curves (nonlinear)
    4. Support Vector Machines
      1. 2-D SVM classifier



    Report problems to kuiyu@lans.ece.utexas.edu