Reading List

    Database Marketing

  1. Discovering Roll-Up Dependencies (Detailed Technical Report)
    Jef Wijsen, Raymond T. Ng, Toon Calders
    Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999 Aug, pp. 213-222

    Feature Selection, Stepwise Regression

  2. The Variable Selection Problem (VIGNETTE)
    Edward I. George
    (Dept. of MSIS, U.T. Austin)
  3. Transformations of the Explanatory Variables in the Logistic Regression Model for Binary Data
    Richard Kay, Sarah Little
    Biometrika 74, 1987, pp. 495-501
  4. Feature Selection for Classification
    Manoranjan Dash, Huan Liu
    Intelligent Data Analysis, 1(3), 1997, pp. 131-156

    Imputation

  5. Large-Scale Imputation for Complex Surveys
    David A. Marker, David R. Judkins, Marianne Winglee

    Clustering & Segmentation

  7. Clustering Large Datasets in Arbitrary Metric Spaces
    Venkatesh Ganti, Raghu Ramakrishnan, Johannes Gehrke, Allison L. Powell, James C. French
    Proceedings of the 15th International Conference on Data Engineering, 23-26 March 1999, Sydney, Australia, IEEE CS Press, 1999, pp. 502-511
  8. ROCK: A Robust Clustering Algorithm for Categorical Attributes
    Sudipto Guha, Rajeev Rastogi, Kyuseok Shim
    Serendip Data Mining Project (Bell Labs)
    Proceedings of the 15th International Conference on Data Engineering, 23-26 March 1999, Sydney, Australia, IEEE CS Press, 1999, pp. 512-521
  9. BIRCH: An Efficient Data Clustering Method for Very Large Databases
    Tian Zhang, Raghu Ramakrishnan, Miron Livny
    Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, 1996, pp. 103-114
  10. CURE: An Efficient Clustering Algorithm for Large Databases
    Sudipto Guha, Rajeev Rastogi, Kyuseok Shim
    Serendip Data Mining Project (Bell Labs)
    Proceedings of the ACM SIGMOD Conference, 1998
  11. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise
    M. Ester, H.-P. Kriegel, J. Sander, X. Xu
    Ludwig-Maximilians-Universität München
    Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD-96), Portland, OR, 1996, pp. 226-231
  12. Efficient and Effective Clustering Methods for Spatial Data Mining
    Raymond T. Ng, Jiawei Han
    Intelligent Database Systems Research Laboratory
    Proc. of 1994 Int'l Conf. on Very Large Data Bases (VLDB'94), Santiago, Chile, September 1994, pp. 144-155
  13. Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values
    Zhexue Huang
    Data Mining and Knowledge Discovery 2(3): 283-304 (1998)

    Combining Multiple Models

  14. On Combining Artificial Neural Nets
    A. J. C. Sharkey
    Boosting, bagging and the theory of ensembles
    Connection Science, 8(3/4), 1996, pp. 299-314

    Beyond Association Rules

  15. User Profiling in Personalization Applications through Rule Discovery and Validation
    Gediminas Adomavicius, Alexander Tuzhilin
    Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999 Aug, pp. 377-381
  16. Scalable Techniques for Mining Causal Structures
    Craig Silverstein, Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman
    Proceedings of the 24th International Conference on Very Large Data Bases (VLDB), 1998
  17. Using Association Rules for Product Assortment Decisions: A Case Study
    Tom Brijs, Gilbert Swinnen, Koen Vanhoof, Geert Wets
    Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999 Aug, pp. 254-260
  18. A Statistical Theory for Quantitative Association Rules
    Yonatan Aumann, Yehuda Lindell
    Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999 Aug, pp. 261-270

    Support Vector Machines

  19. A Tutorial on Support Vector Machines for Pattern Recognition
    Christopher J. C. Burges
    Bell Labs SVM page
    Data Mining and Knowledge Discovery, 2(2), 1998, pp. 121-167
  20. A Tutorial on Support Vector Regression
    Alex Smola, Bernhard Schölkopf
    German SVM page
    NeuroCOLT Technical Report TR-1998-030, Royal Holloway College

    Web Mining & Information Retrieval

  22. Searching the World Wide Web
    Steve Lawrence, C. Lee Giles
    Science 280, 1998 Apr 3, p. 98

    Online Interaction & Processing

  24. Interactive Data Analysis: The Control Project
    Joseph M. Hellerstein et al.
    The CONTROL project
    IEEE Computer, 32(8), 1999 Aug, pp. 51-59

    Sampling

  25. Density Biased Sampling: An Improved Method for Data Mining and Clustering
    Christopher R. Palmer and Christos Faloutsos
    CMU Technical Report CMU-CS-99-113
  26. Optimal Sample Allocation for Normal Discrimination and Logistic Regression Under Stratified Sampling
    Tzu-Cheg Kao, George P. McCabe
    Journal of the American Statistical Association, 86, 1991, pp. 432-436

    Scalability Issues

  27. A Survey of Methods for Scaling Up Inductive Learning Algorithms
    Foster J. Provost, Venkateswarlu Kolluri
    Data Mining and Knowledge Discovery, 3(2), pp. 131-169

    Miscellaneous

  28. Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem
    Mauricio A. Hernández, Salvatore J. Stolfo
    Data Mining and Knowledge Discovery, 1997, pp. 9-37
  29. Discovering Robust Knowledge from Databases that Change
    Chun-Nan Hsu and Craig A. Knoblock
    Data Mining and Knowledge Discovery, 2(1), 1998, pp. 69-95
  30. Statistics and Data Mining Techniques for Lifetime Value Modeling
    D.R. Mani, James Drew, Andrew Betz, Piew Datta
    Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999 Aug, pp. 94-103
  31. Detecting Changes in Categorical Data: Mining Contrast Sets
    Stephen D. Bay, Michael J. Pazzani
    Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999 Aug, pp. 302-306
  32. The Impact of Changing Populations on Classifier Performance
    Mark G. Kelly, David J. Hand, Niall M. Adams
    Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999 Aug, pp. 367-371

    General Interest Stories

  33. Wall Street Journal Special Report: Technology: Overload!
  34. Getting Business Smarts: Focus on customer information from the Internet is reviving data warehousing