Reading List 2
EE 380L - Data Mining (ESE)
Spring 2003

Notices

Paper Selection Policy

  • Students must select and present one of the following papers.
  • Papers are allocated on a first-come, first-served basis.
  • To select a paper, email your selection to Srujana Merugu.
  • Scheduling of your talk depends on the topic you have chosen. Ideally it will take place in the "Student Paper Presentations" slot right after that topic has been covered in class.

A. Exploratory Data Analysis

  1. "Mining Frequent Patterns by Pattern-Growth: Methodology and Implications," 
    J. Han and J. Pei, SIGKDD Explorations, vol. 2(2), Dec. 2000

  2. "On Interactive Visualization of high-dimensional Data using the Hyperbolic Plane,"
    Joerg Walter, Helge Ritter, KDD 2002

B. Clustering/Segmentation

  1. "Scalability for Clustering Algorithms Revisited,"
    Fredrik Farnstrom, James Lewis, Charles Elkan, SIGKDD Explorations 2(1), 2000

  2. "Hierarchical Model-Based Clustering of Large Datasets Through Fractionation and Refractionation,"
    Jeremy Tantrum, Alejandro Murua, Werner Stuetzle, KDD 2002

  3. "Learning to Match and Cluster Large High-Dimensional Data Sets For Data Integration,"
    William W. Cohen, Jacob Richman, KDD 2002

  4. "High-Performance Clustering of Streams and Large Data Sets,"
    L. O'Callaghan, N. Mishra, A. Meyerson, and S. Guha, ICDM 2002

  5. "Mining the Stock Market: Cluster Discovery,"
    M. Gavrilov, D. Angelov, and P. Indyk, KDD 2000

C. Association Rules

  1. "Discovering Frequent Substructures from Hierarchical Semi-structured Data,"
    Gao Cong, Lan Yi, Bing Liu, Ke Wang, SDM 2002

  2. "Learning Simple Relations: Theory and Applications,"
    Pavel Berkhin and Jonathan D. Becher

  3. "Selecting the Right Interestingness Measure for Association Patterns,"
    Pang-Ning Tan, Vipin Kumar, Jaideep Srivastava, KDD 2002

  4. "Small is Beautiful: Discovering the Minimal Set of Unexpected Patterns,"
    Padmanabhan, B. and Tuzhilin, A., KDD-2000, pp. 54-63

  5. "Empirical Bayes Screening for Multi-item Associations,"
    William DuMouchel and Daryl Pregibon, KDD-2001, pp. 67-76

D. Classification and Prediction

  1. "Segmented Regression Estimators for Massive Data Sets,"
    Ramesh Natarajan and Edwin Pednault, SDM 2002

  2. "Why the Information Explosion Can Be Bad for Data Mining, and How Data Fusion Provides a Way Out,"
    Peter van der Putten, Joost N. Kok, and Amar Gupta, SDM 2002

  3. "What's the Code? Automatic Classification of Source Code Archives,"
    Secil Ugurel, Robert Krovetz, Lee Giles, David M. Pennock, Eric J. Glover, Hongyuan Zha, KDD 2002

  4. "Transforming classifier scores into accurate multiclass probability estimates,"
    Bianca Zadrozny, Charles Elkan, KDD 2002

  5. "BOAT -- Optimistic Decision Tree Construction,"
    J. E. Gehrke, Venkatesh Ganti, Raghu Ramakrishnan, and Wei-Yin Loh.
    Proceedings of the 1999 SIGMOD Conference, Philadelphia, Pennsylvania, 1999.

  6. "Probabilistic Classification and Clustering in Relational Data,"
    B. Taskar, E. Segal & D. Koller, IJCAI-01

E. Web Mining and Information Retrieval

  1. "Mining Knowledge-Sharing Sites for Viral Marketing,"
    Matthew Richardson, Pedro Domingos, KDD 2002
  2. "Efficiently Mining Frequent Trees in a Forest,"
    Mohammed Zaki, KDD 2002
  3. "ANF: A Fast and Scalable Tool for Data Mining in Massive Graphs,"
    Christopher R. Palmer, Phillip B. Gibbons, Christos Faloutsos, KDD 2002
  4. "Web Site Mining: A new way to spot Competitors, Customers and Suppliers in the World Wide Web,"
    Martin Ester, Hans-Peter Kriegel, Matthias Schubert, KDD 2002
  5. "Agglomerative Clustering of A Search Engine Query Log,"
    Beeferman, D. and Berger, A., KDD-2000, pp. 407 - 416
  6. "Intermediaries: An Approach to Manipulating Information Streams,"
    Barrett, R. and Maglio, P.P., IBM Systems Journal, 38, 1999
  7. "Personalization from Incomplete Data: What you don't know can hurt,"
    Padmanabhan, B., Zheng, Z., and Kimbrough, S., KDD-2001

F. Miscellaneous

  1. "Transforming Data to Satisfy Privacy Constraints,"
    Vijay S. Iyengar, KDD 2002

  2. "From Run-time Behavior to Usage Scenarios: An Interaction-pattern Mining Approach,"
    Mohammad El-Ramly, Eleni Stroulia, Paul Sorenson, KDD 2002

  3. "Customer Lifetime Value Modeling and Its Use for Customer Retention Planning,"
    Saharon Rosset, Einat Neumann, Uri Eick, Nurit Vatnik, Yizhak Idan, KDD 2002

  4. "Learning Nonstationary Models of Normal Network Traffic for Detecting Novel Attacks,"
    Matthew V. Mahoney, Philip K. Chan, KDD 2002


Last updated 01/2003