Reading List 2 (Online) 1 (Course Reader)

    Exploratory Data Analysis

  1. Comprehensible Knowledge Discovery: Gaining Insight from Data
    Michael J. Pazzani
    First Federal Data Mining Conference and Exposition, pp 73-82, Washington, DC.
  2. Visualization Techniques for Mining Large Databases: A Comparison
    Daniel A. Keim, Hans-Peter Kriegel
    IEEE Transactions on Knowledge and Data Engineering, Special Issue on Data Mining, Vol. 8, No. 6, 1996, pp 923-938
  3. Discovery-driven Exploration of OLAP Data Cubes
    Nimrod Megiddo, Sunita Sarawagi, Rakesh Agrawal
    In Proc. Sixth International Conference on Extending Database Technology (EDBT), Mar 1998

    Clustering/Segmentation

  4. Clustering Large Datasets in Arbitrary Metric Spaces
    Venkatesh Ganti, Raghu Ramakrishnan, Johannes Gehrke, Allison L. Powell, James C. French
    Proceedings of the 15th International Conference on Data Engineering, 23-26 March 1999, Sydney, Australia, IEEE CS Press, 1999, pp. 502-511
  5. ROCK: A Robust Clustering Algorithm for Categorical Attributes
    Sudipto Guha, Rajeev Rastogi, Kyuseok Shim
    Serendip Data Mining Project (Bell Labs)
    Proceedings of the 15th International Conference on Data Engineering, 23-26 March 1999, Sydney, Australia, IEEE CS Press, 1999, pp. 512-521
  6. CACTUS-Clustering Categorical Data Using Summaries
    Venkatesh Ganti, Raghu Ramakrishnan, Johannes Gehrke
    DEMON project
    Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), Aug 1999, pp 73-83

    Association Rules

  7. User Profiling in Personalization Applications through Rule Discovery and Validation
    Gediminas Adomavicius, Alexander Tuzhilin
    Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), Aug 1999, pp 377-381
  8. Using Association Rules for Product Assortment Decisions: A Case Study
    Tom Brijs, Gilbert Swinnen, Koen Vanhoof, Geert Wets
    Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), Aug 1999, pp 254-260
  9. A Statistical Theory for Quantitative Association Rules
    Yonatan Aumann, Yehuda Lindell
    Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), Aug 1999, pp 261-270
  10. Constraint-Based Rule Mining in Large, Dense Databases
    Roberto J. Bayardo Jr., Rakesh Agrawal and Dimitrios Gunopulos
    Proceedings of the 15th International Conference on Data Engineering, Mar 1999, Sydney, Australia, IEEE CS Press, 1999
  11. Beyond Market Baskets: Generalizing Association Rules to Correlations (Dependence Rules)
    Craig Silverstein, Sergey Brin, Rajeev Motwani
    Data Mining and Knowledge Discovery, 2, 1998, pp. 39-68
  12. Pruning and Grouping Discovered Association Rules
    H. Toivonen, M. Klemettinen, P. Ronkainen, K. Hätönen, and H. Mannila
    In MLnet Workshop on Statistics, Machine Learning and Discovery in Databases, pp 47-52, Heraklion, Crete, Greece, Apr 1995

    Classification

  13. SLIQ: A Fast Scalable Classifier for Data Mining
    Manish Mehta, Rakesh Agrawal and Jorma Rissanen
    Proc. Fifth Int'l Conference on Extending Database Technology, Avignon, France, Mar 1996
  14. An Interval Classifier for Database Mining Applications
    Rakesh Agrawal, S. Ghosh, T. Imielinski, B. Iyer, and A. Swami
    Proc. 18th Int'l Conference on Very Large Databases, pp 560-573, Vancouver, Aug 1992
  15. Support vector classifiers: a first look
    David M.J. Tax, D. de Ridder, Robert P.W. Duin
    Proceedings of the Third Annual Conference of the Advanced School for Computing and Imaging, ASCI, Delft, June 1997
  16. A tutorial on Support Vector Machines for Pattern Recognition
    Christopher J. C. Burges
    Bell Labs SVM page
    Data Mining and Knowledge Discovery, Vol. 2, Number 2, pp. 121-167, 1998
  17. On Support Vector Decision Trees for Database Marketing
    Kristin P. Bennett, D. H. Wu, L. Auslender
    R.P.I. Math Report No. 98-100, Rensselaer Polytechnic Institute, Troy, NY, 1998

    Sampling

  18. Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules
    S.D. Lee, David W. Cheung, Ben Kao
    Data Mining and Knowledge Discovery, An International Journal, Vol. 2, pp. 233-262, Kluwer Academic Publishers, 1998
  19. Sampling Large Databases for Association Rules
    Hannu Toivonen
    In 22nd International Conference on Very Large Databases (VLDB'96), pp 134-145, Mumbai, India, September 1996. Morgan Kaufmann
  20. Density Biased Sampling: An Improved Method for Data Mining and Clustering
    Christopher R. Palmer and Christos Faloutsos
    CMU Technical Report CMU-CS-99-113

    Scalability Issues

  21. Scaling EM (Expectation-Maximization) Clustering to Large Databases
    Paul Bradley, Usama Fayyad, and Cory Reina
    Microsoft Research Technical Report MSR-TR-98-35, Revised Feb 1999
  22. Scalable Parallel Data Mining for Association Rules
    Eui-Hong (Sam) Han, George Karypis, and Vipin Kumar
    IEEE Transactions on Knowledge and Data Engineering (To appear)
  23. The Effects of Training Set Size on Decision Tree Complexity
    Tim Oates, David Jensen
    Proceedings of the 14th International Conference on Machine Learning, 1997
  24. A Survey of Methods for Scaling Up Inductive Learning Algorithms
    Foster J. Provost, Venkat Kolluri
    Data Mining and Knowledge Discovery Journal 3(2), pp 131-169

    Web Mining & Information Retrieval

  25. Searching the World Wide Web
    Steve Lawrence and C. Lee Giles
    Science 280, p. 98, April 3, 1998
  26. Empirical Analysis of Predictive Algorithms for Collaborative Filtering
    John S. Breese, David Heckerman, Carl Kadie
    Microsoft Research Technical Report MSR-TR-98-12, May 1998

    Mining of Sequential Patterns and Time Series

  27. Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases
    Rakesh Agrawal, King-Ip Lin, Harpreet S. Sawhney, and Kyuseok Shim
    Proc. 21st International Conference on Very Large Databases, Zurich, Switzerland, Sep 1995

    Miscellaneous

  28. What Makes Patterns Interesting in Knowledge Discovery Systems
    Avi Silberschatz, Alexander Tuzhilin
    IEEE Transactions on Knowledge and Data Engineering Vol. 8, No. 6, Dec 1996, pp 970-974
  29. Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem
    Mauricio A. Hernández, Salvatore J. Stolfo
    Data Mining and Knowledge Discovery, 1997, pp. 9-37
  30. Discovering Robust Knowledge from Databases that Change
    Chun-Nan Hsu and Craig A. Knoblock
    Data Mining and Knowledge Discovery, 2(1), 1998, pp. 69-95
  31. Statistics and Data Mining Techniques for Lifetime Value Modeling
    D.R. Mani, James Drew, Andrew Betz, Piew Datta
    Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), Aug 1999, pp 94-103
  32. Detecting Changes in Categorical Data: Mining Contrast Sets
    Stephen D. Bay, Michael J. Pazzani
    Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), Aug 1999, pp 302-306
  33. The Impact of Changing Populations on Classifier Performance
    Mark G. Kelly, David J. Hand, Niall M. Adams
    Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), Aug 1999, pp 367-371
  34. MetaCost: A General Method for Making Classifiers Cost-Sensitive
    Pedro Domingos
    Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), Aug 1999, pp 155-164
  35. Interactive Data Analysis: The Control Project
    Joseph M. Hellerstein, et al.
    The CONTROL project
    IEEE Computer, 32(8), Aug, 1999, pp. 51-59