EE 380L : Practicum in Data Mining

Spring 2003
Prof. Joydeep Ghosh
 

Reading List

I. General Reading (not for class presentations): supplements to the 7 papers in your course packet.


  1. Information retrieval on the web
    Mei Kobayashi and Koichi Takeda
    ACM Computing Surveys, vol.32, no.2, 144-173, 2000
  2. Data mining for hypertext: A tutorial survey
    S. Chakrabarti
    ACM SIGKDD Explorations, 1(2), 1-11, 2000
  3. Impact of Similarity Measures on Web-page Clustering
    A. Strehl, J. Ghosh and R. Mooney
    Proc. AAAI workshop on AI for Web Search, K. Bollacker (Ed)
    TR WS-00-01, AAAI Press, July 2000, pp. 58-64
  4. Data Preparation for Mining World Wide Web Browsing Patterns
    Robert Cooley, Bamshad Mobasher, and Jaideep Srivastava
    Knowledge and Information Systems, 1(1), 1999
  5. An Internet-enabled Knowledge Discovery Process
    Alex Buchner et al., MINEit Software Ltd., 1999

II. Hyperlinks

  1. Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery
    Soumen Chakrabarti, Martin van den Berg, Byron Dom, WWW8
    Selected: Tabassum
  2. Random Walks with "Back Buttons"
    Ronald Fagin et al., Proc. 2000 ACM Symposium on Theory of Computing
  3. Stochastic models for the Web graph
    R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal
    Proc. of the 41st IEEE Symp. on Foundations of Computer Science, 2000
    Selected: Rajalingam Alagesan : 3/4/03
  4. Winners Don't Take All: Characterizing the Competition for Links on the Web
    D. Pennock, G.W. Flake, S. Lawrence, E.J. Glover, C.L. Giles
    Proceedings of the National Academy of Sciences, 99(8), 5207-5211, April 2002
    Selected: Kunal
  5. Learning Probabilistic Models of Link Structure
    Lise Getoor, Nir Friedman, Daphne Koller et al.
    JMLR 3 (Dec), 2002

III. Information Retrieval

  1. Hierarchical Bayesian models for applications in information retrieval
    David Blei, Michael Jordan and Andrew Y. Ng
    Bayesian Statistics 7, Oxford University Press, 2003
    Selected: Andromache Howe : 3/4/03
  2. A Probabilistic Framework for the Hierarchic Organisation and Classification of Document Collections
    Alexei Vinokourov and Mark Girolami
    Information Processing and Management, 2002
    Selected: Matt MacMahon : 3/6/03
  3. Latent Semantic Kernels
    Nello Cristianini, Huma Lodhi and John Shawe-Taylor
    Journal of Intelligent Information Systems, 18:2/3, 127-152, 2002
    Selected: Hyuk Cho : 3/6/03
  4. Learning Approaches for Detecting and Tracking News Events
    Y. Yang et al., IEEE Intelligent Systems, 14(4):32-43, 1999
    Selected: Kunal : 3/18/03
  5. Text Classification in a Hierarchical Mixture Model for Small Training Sets
    Kristina Toutanova, Francine Chen, Kris Popat, and Thomas Hofmann
    Proceedings of the Tenth International ACM Conference on Information and Knowledge Management, CIKM 2001
    Selected: Vidya Narayan : 3/18/03
  6. Topic Distillation at TrecWeb2002
    Tsinghua, City Univ. London, IBM Haifa.
    Selected: Abhinav Sharma : 4/17/03
  7. Named Page Finding at TrecWeb2002
    Top 3: Tsinghua, CMU, Glasgow

IV. Contents + Links

  1. The missing link - a probabilistic model of document content and hypertext connectivity
    David Cohn and Thomas Hofmann, NIPS-13, 2001
    Selected: Hyuk Cho : 3/27/03
  2. Categorization of web pages and user clustering with mixture of hidden Markov models
    A. Ypma and T. Heskes, WEBKDD'02, pp. 31-43
    Selected: Matt MacMahon : 3/27/03
  3. The impact of site structure and user environment on session reconstruction in Web usage analysis
    B. Masand, M. Spiliopoulou, J. Srivastava et al.
    Working Notes of the Fourth WebKDD Workshop (Web Mining for Usage Profiles) at KDD 2002, pp. 115-129
    Selected: Alexander Y. Liu : 4/17/03
  4. Usage Mining for and on the Semantic Web
    G. Stumme, A. Hotho and B. Berendt
    Proc. National Science Foundation Workshop on Next Generation Data Mining, Baltimore, Nov. 1-3, 2002
    Selected: Tabassum : 4/22/03
  5. Shaping the Web: Why the politics of search engines matters
    Lucas D. Introna and Helen Nissenbaum.
    Selected: Vidya Venkat : 3/20/03

V. Personalization

  1. LumberJack: Intelligent Discovery and Analysis of Web User Traffic Composition (2002)
    Ed H. Chi, Adam Rosien, Jeffrey Heer, WebKDD'02
    Selected: Rajalingam Alagesan : 4/22/03
  2. Efficient and Anonymous Web-Usage Mining for Web Personalization
    Cyrus Shahabi, Farnoush Banaei-Kashani
    Selected: Abhinav Sharma : 4/24/03
  3. A Critical View of Recommender Systems
    Andreas Mild and Martin Natter
    Working Paper No. 82, July 2001
    Selected: Alexander Y. Liu : 4/24/03
  4. Generative Models for Cold-Start Recommendations
    Andrew I. Schein, Alexandrin Popescul, Lyle H. Ungar and David M. Pennock
    SIGIR-01 Workshop on Recommender Systems
  5. PVA: A Self-Adaptive Personal View Agent
    Chien Chin Chen, Meng Chang Chen and Yeali Sun
    Journal of Intelligent Information Systems, 18:2/3, 173-194, 2002