Tuesday, April 15, 2014
Saturday, April 12, 2014
Reading Notes - Unit 13
IIR - Chapter 13
IIR - Chapter 14
IIR - Chapter 16
IIR - Chapter 17
- Text Classification and Naive Bayes
- The Text Classification Problem
- Supervised learning
- Naive Bayes Text Classification
- A probabilistic learning method
- Relation to Multinomial Unigram Language Model
- Formally identical; it's a special case
- The Bernoulli Model
- Equivalent to the binary independence model
- Properties of Naive Bayes
- An alternative formalization of the multinomial model represents each document as an M-dimensional vector of term counts
- Feature Selection
- The process of selecting a subset of the terms occurring in the training set
- Mutual information
- χ² feature selection
- Frequency-based feature selection
- Feature selection for multiple classifiers
- Mutual information and χ² represent rather different feature selection methods
- Evaluation of Text Classification
- The classic Reuters-21578 collection was the main benchmark for text classification evaluation
- We can measure recall, precision, and accuracy
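To make the multinomial Naive Bayes model in these notes concrete, here is a minimal Python sketch. It assumes add-one (Laplace) smoothing and works in log space to avoid underflow. The function names (`train_multinomial_nb`, `classify_nb`) and the toy training set are made up for illustration; they are not taken from the Reuters-21578 collection.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs):
    """docs: list of (label, tokens). Returns (priors, cond_prob, vocab)."""
    vocab = {t for _, tokens in docs for t in tokens}
    docs_per_class = Counter(label for label, _ in docs)
    term_counts = defaultdict(Counter)
    for label, tokens in docs:
        term_counts[label].update(tokens)

    # Prior P(c): fraction of training documents in class c.
    priors = {c: n / len(docs) for c, n in docs_per_class.items()}

    # Conditional P(t|c) with add-one smoothing over the vocabulary.
    cond_prob = {}
    for c in docs_per_class:
        total = sum(term_counts[c].values())
        cond_prob[c] = {
            t: (term_counts[c][t] + 1) / (total + len(vocab)) for t in vocab
        }
    return priors, cond_prob, vocab


def classify_nb(tokens, priors, cond_prob, vocab):
    """Return the class with the highest log posterior score."""
    scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for t in tokens:
            if t in vocab:  # terms unseen in training are ignored
                score += math.log(cond_prob[c][t])
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy training set: (label, tokens).
training = [
    ("china", ["chinese", "beijing", "chinese"]),
    ("china", ["chinese", "chinese", "shanghai"]),
    ("china", ["chinese", "macao"]),
    ("other", ["tokyo", "japan", "chinese"]),
]
priors, cond_prob, vocab = train_multinomial_nb(training)
predicted = classify_nb(
    ["chinese", "chinese", "chinese", "tokyo", "japan"], priors, cond_prob, vocab
)
print(predicted)
```

The three occurrences of "chinese" outweigh the two terms that point to the other class, which is the kind of behavior the multinomial (count-based) model has and the Bernoulli (presence-only) model does not.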
IIR - Chapter 14
- Vector Space Classification
- Document representations and measures of relatedness in vector spaces
- Rocchio classification
- k nearest neighbor
- Time complexity and optimality of kNN
- Linear versus nonlinear classifiers
- Classification with more than two classes
- The bias-variance tradeoff
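As a rough illustration of Rocchio classification from the list above, the sketch below computes one centroid per class and assigns a new document to the class with the nearest centroid. It assumes documents are already plain numeric vectors and uses Euclidean distance; the function names and the toy vectors are hypothetical.

```python
import math
from collections import defaultdict


def train_rocchio(labeled_vectors):
    """labeled_vectors: list of (label, vector). Returns {label: centroid}."""
    sums = {}
    counts = defaultdict(int)
    for label, vec in labeled_vectors:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    # The centroid is the component-wise mean of the class's vectors.
    return {
        label: [s / counts[label] for s in total] for label, total in sums.items()
    }


def classify_rocchio(centroids, vec):
    """Assign vec to the class whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Hypothetical two-dimensional training vectors.
training_vectors = [
    ("a", [1.0, 0.0]),
    ("a", [0.8, 0.2]),
    ("b", [0.0, 1.0]),
    ("b", [0.2, 0.8]),
]
centroids = train_rocchio(training_vectors)
label = classify_rocchio(centroids, [0.7, 0.3])
print(label)
```

Because each class is summarized by a single centroid, the decision boundary between two classes is a hyperplane, so this is a linear classifier. kNN, by contrast, keeps every training vector and can produce nonlinear boundaries.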
IIR - Chapter 16
- Flat clustering
- Clustering in information retrieval
- The cluster hypothesis states the fundamental assumption we make when using clustering in information retrieval: documents in the same cluster behave similarly with respect to relevance to information needs
- Problem statement: Given (i) a set of documents, (ii) a desired number of clusters K, and (iii) an objective function that evaluates the quality of a clustering, we want to compute an assignment that minimizes the objective function
- Cardinality - the number of clusters
- The evaluation of clustering
- K-means
- Cluster cardinality in K-means
- Model-based clustering
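The K-means item above can be sketched in a few lines of Python. This is a minimal version under some assumptions: the initial seeds are passed in explicitly (rather than chosen at random) so the run is repeatable, distance is Euclidean, and the objective being minimized is the residual sum of squares (RSS). The function names and sample points are hypothetical.

```python
import math


def kmeans(points, seeds, max_iter=100):
    """Return (assignment, centroids) after alternating the two K-means steps."""
    centroids = [list(s) for s in seeds]
    assignment = None
    for _ in range(max_iter):
        # Reassignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Recomputation step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def rss(points, assignment, centroids):
    """Residual sum of squares: squared distance of each point to its centroid."""
    return sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignment))


# Hypothetical data: two obvious groups, K = 2.
points = [(0, 0), (0, 1), (5, 5), (6, 5)]
assignment, centroids = kmeans(points, seeds=[(0, 0), (6, 5)])
print(assignment, centroids, rss(points, assignment, centroids))
```

The cardinality K is fixed by the number of seeds. Running this for several values of K and comparing RSS is one simple way to explore the cluster cardinality question, keeping in mind that RSS alone always falls as K grows.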
IIR - Chapter 17
- Hierarchical clustering
- Flat clustering is efficient but has some drawbacks
- Hierarchical agglomerative clustering
- Single-link and complete-link clustering
- The complexity of the naive HAC algorithm is O(n^3)
- Group-average agglomerative clustering
- Centroid Clustering
- The similarity of two clusters is defined as the similarity of their centroids
- Optimality of HAC
- Single-link
- GAAC
- Complete-link
- Divisive clustering
- Cluster hierarchy can be generated top down
- Cluster Labeling
- When human users interact with clusters, the clusters need labels
- Implementation notes
- Problems that require the computation of a large number of dot products benefit from an inverted index
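Here is a minimal sketch of the naive HAC algorithm with the single-link criterion, to go with the notes above. It assumes Euclidean distance between points (smallest distance stands in for highest similarity) instead of a similarity such as cosine, and the function name and sample points are hypothetical. Each of the N - 1 merge steps scans all pairs of points across clusters, which is where the cubic running time of the naive algorithm comes from.

```python
import math


def single_link_hac(points):
    """Naive bottom-up clustering.

    Returns the merge history as a list of (cluster_a, cluster_b, distance),
    where each cluster is a tuple of point indices.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single-link: distance between the two closest members.
                d = min(
                    math.dist(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical data: two tight pairs that are far from each other.
sample = [(0, 0), (0, 1), (5, 5), (5, 7)]
merges = single_link_hac(sample)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

Replacing `min` with `max` in the inner distance computation would give complete-link clustering. The merge history is the hierarchy itself, so cutting it at a chosen distance yields a flat clustering.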
Wednesday, April 9, 2014
Friday, April 4, 2014
Reading Notes - Unit 12
Ahn et al. - Personalized Web Exploration with Task Models
This paper is about personalized web search, specifically exploratory web search. Exploratory searches go beyond the typical "how many inches in a foot" searches that seek a simple answer. The paper covers the testing of TaskSieve, a tool the authors developed that uses relevance feedback to offer the user personalized search.
Pazzani & Billsus - Content-Based Recommendation Systems
This paper is about content-based recommendation systems. These systems are used every day, from web search to Amazon.com, to help customers find other items they may enjoy and/or to help retailers sell more products. They usually rely on algorithms that analyze a user's prior history, though some also have the user enter information directly.
Gauch et al. - User Profiles for Personalized Information Access
This paper dovetails nicely with the previous two papers this week. It covers the profiling of users, including methods for user identification and other collection techniques. The paper looks at the need of companies and projects to have access to more specific information about their customers and participants. It is interesting that the authors only briefly touch on the privacy implications of all the facts that can be gleaned from the user, both implicitly and explicitly.
Tuesday, April 1, 2014
Muddiest Points - Unit 11
Wouldn't it be easier to translate the documents and then search them, rather than trying to dynamically search documents with queries in a different language?