| Reading List 2 (Online) 1 (Course Reader) |
Database Marketing |
|
|
Discovering Roll-Up Dependencies
(Detailed Technical Report)
Jef Wijsen, Raymond T. Ng, Toon Calders Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999 Aug, pp. 213-222 |
Feature Selection, Stepwise Regression |
|
|
The Variable Selection Problem (VIGNETTE)
Edward I. George (Dept. of MSIS, U.T. Austin) |
|
|
Transformations of the Explanatory Variables in the Logistic Regression Model for Binary Data
Richard Kay, Sarah Little Biometrika 74, 1987, pp. 495-501 |
|
|
Feature Selection for Classification
Manoranjan Dash, Huan Liu Intelligent Data Analysis, 1(3), 1997, pp. 131-156 |
Imputation |
|
|
Large-Scale Imputation for Complex Surveys
David A. Marker, David R. Judkins, and Marianne Winglee |
Clustering & Segmentation |
|
|
Clustering Large Datasets in Arbitrary Metric Spaces
Venkatesh Ganti, Raghu Ramakrishnan, Johannes Gehrke, Allison L. Powell, James C. French Proceedings of the 15th International Conference on Data Engineering, 23-26 March 1999, Sydney, Australia, IEEE CS Press, 1999, pp. 502-511 |
|
|
ROCK: A Robust Clustering Algorithm for Categorical Attributes
Sudipto Guha, Rajeev Rastogi, Kyuseok Shim Serendip Data Mining Project (Bell Labs) Proceedings of the 15th International Conference on Data Engineering, 23-26 March 1999, Sydney, Australia, IEEE CS Press, 1999, pp. 512-521 |
|
BIRCH: an efficient data clustering method for very large databases
Tian Zhang, Raghu Ramakrishnan, Miron Livny Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, 1996, pp. 103-114 |
|
CURE: An Efficient Clustering Algorithm for Large Databases
Sudipto Guha, Rajeev Rastogi, Kyuseok Shim Serendip Data Mining Project (Bell Labs) Proceedings of the ACM SIGMOD Conference, 1998 |
|
|
A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise
M. Ester, H.-P. Kriegel, J. Sander, X. Xu Ludwig-Maximilians-Universität München Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD-96), Portland, OR, 1996, pp. 226-231 |
|
|
Efficient and Effective Clustering Methods for Spatial Data Mining
Raymond T. Ng, Jiawei Han Intelligent Database Systems Research Laboratory Proc. of 1994 Int'l Conf. on Very Large Data Bases (VLDB'94), Santiago, Chile, September 1994, pp. 144-155 |
|
|
Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values
Zhexue Huang Data Mining and Knowledge Discovery 2(3): 283-304 (1998) |
Combining Multiple Models |
|
|
On Combining Artificial Neural Nets
Sharkey, A.J.C. (1996) Boosting, bagging and the theory of ensembles Connection Science, 8(3/4), pp. 299-314 |
Beyond Association Rules |
|
|
User Profiling in Personalization Applications through Rule Discovery and Validation
Gediminas Adomavicius, Alexander Tuzhilin Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999 Aug, pp. 377-381 |
|
|
Scalable Techniques for Mining Causal Structures
Craig Silverstein, Rajeev Motwani, Sergey Brin, and Jeffrey D. Ullman Proceedings of the 24th International Conference on Very Large Data Bases (VLDB), 1998 |
|
|
Using Association Rules for Product Assortment Decisions: A Case Study
Tom Brijs, Gilbert Swinnen, Koen Vanhoof, Geert Wets Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999 Aug, pp. 254-260 |
|
|
A Statistical Theory for Quantitative Association Rules
Yonatan Aumann, Yehuda Lindell Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999 Aug, pp. 261-270 |
Support Vector Machines |
|
|
A tutorial on Support Vector Machines for Pattern Recognition
Christopher J. C. Burges Data Mining and Knowledge Discovery, Vol. 2, Number 2, pp. 121-167, 1998 |
|
|
A tutorial on Support Vector Regression
Alex Smola and Bernhard Schölkopf NeuroCOLT Technical Report TR-1998-030, Royal Holloway College |
Web Mining & Information Retrieval |
|
|
Searching the World Wide Web
Steve Lawrence and C. Lee Giles Science 280, p. 98, April 3, 1998 |
Online Interaction & Processing |
|
|
Interactive Data Analysis: The Control Project
Joseph M. Hellerstein, et al. IEEE Computer, 32(8), Aug. 1999, pp. 51-59 |
Sampling |
|
|
Density Biased Sampling: An Improved Method for Data Mining and Clustering
Christopher R. Palmer and Christos Faloutsos CMU Technical Report CMU-CS-99-113 |
|
|
Optimal Sample Allocation for Normal Discrimination and Logistic Regression Under Stratified Sampling
Tzu-Cheg Kao, George P. McCabe Journal of the American Statistical Association, 86, 1991, pp. 432-436 |
Scalability Issues |
|
|
A Survey of Methods for Scaling Up Inductive Learning Algorithms
Foster J. Provost, Venkat Kolluri Data Mining and Knowledge Discovery Journal 3(2), pp. 131-169 |
Miscellaneous |
Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem
Mauricio A. Hernández, Salvatore J. Stolfo Data Mining and Knowledge Discovery, 1997, pp. 9-37 |
|
Discovering Robust Knowledge from Databases that Change
Chun-Nan Hsu and Craig A. Knoblock Data Mining and Knowledge Discovery, 2(1), 1998, pp. 69-95 |
|
|
Statistics and Data Mining Techniques for Lifetime Value Modeling
D.R. Mani, James Drew, Andrew Betz, Piew Datta Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999 Aug, pp. 94-103 |
|
|
Detecting Changes in Categorical Data: Mining Contrast Sets
Stephen D. Bay, Michael J. Pazzani Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999 Aug, pp. 302-306 |
General Interest Stories |