Collaborative filtering

This article is currently being developed as part of an Eduzendium student project. The course homepage can be found at CZ:Special_Topics_2010.
To provide students with experience in collaboration, you are warmly invited to join in here, or to leave comments on the discussion page. The anticipated date of course completion is 13 August 2010. One month after that date at the latest, this notice shall be removed.
Besides, many other Citizendium articles welcome your collaboration!

Collaborative Filtering Techniques

Collaborative Filtering techniques can be separated into 3 classes:

1. Memory-based (Heuristic) Recommendation Technique

Memory-based algorithms make predictions by operating on data (users, items and ratings) stored in memory. They can be classified into:
(i) Nearest Neighbor algorithms
(ii) Top-N recommendation algorithms

(i) Nearest neighbor algorithms are the most commonly used memory-based CF algorithms. Users whose preferences are similar to those of the current user are called neighbors. Nearest neighbor algorithms can be further classified into:

User-based nearest neighbor
Item-based nearest neighbor algorithms

User-based nearest neighbor algorithms generate predictions for a given user based on ratings provided by users in the neighborhood. A common approach for a user-based nearest neighbor algorithm is:

Weight all users by their similarity to the current user. Similarity weighting is done by using either:
- the Pearson correlation coefficient between the ratings of the current user, c, and another user, u; or
- cosine-based similarity, in which the two users c and u are treated as two vectors in an m-dimensional space and the similarity between them is measured by computing the cosine of the angle between the vectors.
Select a subset of the users (neighbors) to use as predictors.
- Neighbor selection is done by taking the most similar users, or all users whose similarity weight is above a certain threshold.
Normalize ratings and compute a prediction from a weighted combination of the selected neighbors' ratings.
Present the items with the highest predicted ratings as recommendations.
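
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the toy RATINGS data, the choice of Pearson correlation over co-rated items, mean-centering as the normalization step, and the default threshold of 0 are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}. Missing keys mean "not rated".
RATINGS = {
    "A": {"i1": 4, "i2": 3, "i3": 3, "i4": 2},
    "B": {"i1": 5, "i2": 2, "i3": 3, "i4": 1},
    "C": {"i1": 5, "i2": 3},
    "D": {"i1": 4, "i2": 4, "i3": 4, "i4": 2},
}

def pearson(ratings_c, ratings_u):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = [i for i in ratings_c if i in ratings_u]
    if len(common) < 2:
        return 0.0
    mean_c = sum(ratings_c[i] for i in common) / len(common)
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    num = sum((ratings_c[i] - mean_c) * (ratings_u[i] - mean_u) for i in common)
    den_c = sqrt(sum((ratings_c[i] - mean_c) ** 2 for i in common))
    den_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    if den_c == 0 or den_u == 0:
        return 0.0
    return num / (den_c * den_u)

def predict(ratings, current, item, threshold=0.0):
    """Predict a rating from neighbors whose similarity weight exceeds threshold."""
    mean_c = sum(ratings[current].values()) / len(ratings[current])
    num = den = 0.0
    for user, user_ratings in ratings.items():
        if user == current or item not in user_ratings:
            continue
        weight = pearson(ratings[current], user_ratings)
        if weight <= threshold:
            continue  # not similar enough to act as a neighbor
        mean_u = sum(user_ratings.values()) / len(user_ratings)
        # Normalize the neighbor's rating by subtracting the neighbor's mean.
        num += weight * (user_ratings[item] - mean_u)
        den += abs(weight)
    return mean_c if den == 0 else mean_c + num / den

def recommend(ratings, current, n=1, threshold=0.0):
    """Return the n unrated items with the highest predicted ratings."""
    all_items = {i for r in ratings.values() for i in r}
    unrated = all_items - set(ratings[current])
    scored = [(predict(ratings, current, i, threshold), i) for i in unrated]
    return [item for _, item in sorted(scored, reverse=True)[:n]]
```

With this toy data, recommend(RATINGS, "C", n=2) ranks the two items that user C has not rated by their predicted rating. With only two co-rated items a Pearson coefficient is always +1, -1 or undefined, so real systems usually require a larger overlap before trusting a neighbor.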

Item-based algorithms generate predictions based on similarities between items.

The correlation of each item i_k with other items is computed.
For each user u_i, the ratings of the items highly correlated with i_k are aggregated.

The table below represents a user-item rating matrix for 4 users and 4 movies. The movies 'A-Team' and 'Salt' have broadly similar ratings. Based on this, we can predict the rating X for User C by building a weighted average of User C's other ratings. Since the ratings of 'A-Team' resemble those of 'Salt' more closely than the ratings of 'Inception' do, User C's rating for 'A-Team' (3) is given a larger weight than User C's rating for 'Inception' (5). Hence, we predict rating X = 0.25*5 + 0.75*3 = 3.5.
The weights 0.25 and 0.75 are assumed for this calculation; they are not derived from the table.

	Inception	A-Team	Salt	The Last Airbender
User A	4	3	3	2
User B	5	2	3	1
User C	5	3	X
User D	4	4	4	2
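
The worked example can be checked with a short Python sketch. It assumes that X is User C's rating for 'Salt' and that the blank cell in User C's row means "not rated". The helper names and the use of cosine similarity over co-rated users are illustrative choices, not something the example prescribes.

```python
from math import sqrt

MOVIES = ["Inception", "A-Team", "Salt", "The Last Airbender"]

# Rows of the table above; None marks a missing rating (assumed for User C).
RATINGS = {
    "User A": [4, 3, 3, 2],
    "User B": [5, 2, 3, 1],
    "User C": [5, 3, None, None],
    "User D": [4, 4, 4, 2],
}

def item_cosine(ratings, i, j):
    """Cosine similarity between item columns i and j over users who rated both."""
    pairs = [(r[i], r[j]) for r in ratings.values()
             if r[i] is not None and r[j] is not None]
    dot = sum(a * b for a, b in pairs)
    norm_i = sqrt(sum(a * a for a, _ in pairs))
    norm_j = sqrt(sum(b * b for _, b in pairs))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0

def weighted_average(values, weights):
    """Weighted average of values, normalized by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

salt = MOVIES.index("Salt")
sim_inception = item_cosine(RATINGS, MOVIES.index("Inception"), salt)
sim_a_team = item_cosine(RATINGS, MOVIES.index("A-Team"), salt)

# The article's assumed weights: 0.25 for Inception (rated 5), 0.75 for A-Team (rated 3).
x_assumed = weighted_average([5, 3], [0.25, 0.75])

# Alternative: use the computed similarities themselves as weights.
x_from_similarity = weighted_average([5, 3], [sim_inception, sim_a_team])
```

On this table the cosine similarity of 'A-Team' to 'Salt' comes out slightly higher than that of 'Inception' to 'Salt', which agrees with the ordering of the assumed weights. Because raw cosine values on positive ratings are all close to 1, using them directly as weights gives a prediction slightly below 4 rather than 3.5; mean-centered similarities would separate the items more sharply.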

(ii) Top-N recommendations
Top-N recommendation recommends a set of N top-ranked items that are likely to be of interest to a certain user. Top-N recommendation techniques analyze the user-item matrix to discover correlations between different users or items, and use these correlations to compute the recommendations. They are further classified into:

User-based Top-N algorithms
Item-based Top-N algorithms

User-based Top-N Collaborative Filtering Algorithms.

Identify the k most similar users (nearest neighbors) to the active user using the Pearson correlation or vector-space model.
The corresponding rows of the k most similar users in the user-item matrix R are aggregated to identify a set of items, C, purchased by the group.
With the set C, user-based CF techniques recommend the top-N most frequent items in C that the current user has not purchased.
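
A minimal Python sketch of these three steps follows. The PURCHASES data, the function names, and the parameters k and n are hypothetical, and the vector-space model is taken here to be cosine similarity between binary purchase vectors; ties are broken alphabetically only to keep the output deterministic.

```python
from collections import Counter
from math import sqrt

# Hypothetical purchase history: user -> set of purchased items.
PURCHASES = {
    "ann": {"a", "b", "c"},
    "bob": {"a", "b", "d"},
    "cat": {"a", "b", "d", "e"},
    "dan": {"x", "y"},
    "eve": {"c", "e"},
}

def cosine(items_u, items_v):
    """Cosine similarity between two purchase sets seen as binary vectors."""
    if not items_u or not items_v:
        return 0.0
    return len(items_u & items_v) / sqrt(len(items_u) * len(items_v))

def user_based_top_n(purchases, active, k=2, n=2):
    """Recommend the n most frequent items among the k nearest neighbors."""
    owned = purchases[active]
    others = [u for u in purchases if u != active]
    # Step 1: the k users most similar to the active user.
    neighbors = sorted(others, key=lambda u: cosine(owned, purchases[u]),
                       reverse=True)[:k]
    # Step 2: aggregate the neighbors' rows into candidate set C, counting
    # only items the active user has not purchased.
    counts = Counter(item for u in neighbors
                     for item in purchases[u] if item not in owned)
    # Step 3: the n most frequent candidates.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]
```

For example, user_based_top_n(PURCHASES, "ann") first picks "bob" and "cat" as neighbors and then ranks the items they bought that "ann" has not.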

User-based top-N recommendation algorithms have limitations related to scalability and real-time performance.

Item-Based Top-N Collaborative Filtering Algorithms.

Compute the k most similar items for each item according to the similarities.
Identify the set, C, as candidates of recommended items by taking the union of the k most similar items and removing each of the items in the set, U, that the user has already purchased.
Calculate the similarities between each item of the set C and the set U.
The resulting set of items in C, sorted in decreasing order of similarity, is the recommended item-based Top-N list.
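
The item-based procedure can be sketched in Python as follows. The data and names are hypothetical. Two details are assumptions of this sketch: item similarity is measured as cosine similarity between the sets of buyers, and the similarity between a candidate and the set U is taken to be the sum of its similarities to the items in U.

```python
from math import sqrt

# Hypothetical purchase history: user -> set of purchased items.
PURCHASES = {
    "ann": {"a", "b", "c"},
    "bob": {"a", "b", "d"},
    "cat": {"a", "b", "d", "e"},
    "dan": {"x", "y"},
    "eve": {"c", "e"},
}

def item_based_top_n(purchases, active, k=2, n=2):
    """Recommend up to n items using the k most similar items of each owned item."""
    # Invert the user-item data: item -> set of users who bought it.
    buyers = {}
    for user, items in purchases.items():
        for item in items:
            buyers.setdefault(item, set()).add(user)

    def sim(i, j):
        return len(buyers[i] & buyers[j]) / sqrt(len(buyers[i]) * len(buyers[j]))

    # Step 1: the k most similar items for each item (zero similarity is ignored).
    neighbors = {
        i: sorted((j for j in buyers if j != i and sim(i, j) > 0),
                  key=lambda j: (-sim(i, j), j))[:k]
        for i in buyers
    }
    # Step 2: candidate set C = union of neighbors of owned items, minus U.
    owned = purchases[active]
    candidates = {j for i in owned for j in neighbors[i]} - owned
    # Step 3: similarity of each candidate to the set U (sum over owned items).
    scores = {c: sum(sim(c, i) for i in owned) for c in candidates}
    # Step 4: sort candidates by decreasing similarity and keep the top n.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]
```

In a deployed system the neighbors table in step 1 would typically be computed ahead of time, so that only steps 2 to 4 run when a user asks for recommendations.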

Item-based top-N recommendation algorithms have been developed to address the scalability problem of user-based top-N recommendation algorithms.

Advantages of memory-based algorithms:
1. They are easy to implement.
2. They scale well with correlated items.
3. They do not require the content of items; the ratings alone are sufficient.
4. New data can be added easily.

Disadvantages of memory-based algorithms:
1. They depend on human ratings.
2. Correlations are skewed when data is sparse.
3. Time and memory requirements scale with the number of users and ratings.
4. They cannot make recommendations for new users and items (the cold start problem).

2. Model-based Collaborative Filtering Technique

Model-based algorithms use the collection of ratings to learn a model, which is then used to make rating predictions. Model-based CF algorithms include Bayesian models (probabilistic) and clustering models.
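
As a rough illustration of the clustering variant, the sketch below learns user clusters from the ratings once and then predicts from the cluster averages alone. Everything specific in it is an assumption made for the example: the toy data, the function names, filling missing ratings with the user's mean, plain k-means with the first k users as initial centroids, and a fixed number of iterations.

```python
ITEMS = ["a", "b", "c"]

# Hypothetical toy data: user -> {item: rating}.
RATINGS = {
    "u1": {"a": 5, "b": 5, "c": 1},
    "u2": {"a": 1, "b": 1, "c": 5},
    "u3": {"a": 5, "b": 4},
    "u4": {"a": 1, "c": 5},
}

def to_vector(user_ratings, items):
    """Dense rating vector; missing ratings are filled with the user's mean."""
    mean = sum(user_ratings.values()) / len(user_ratings)
    return [user_ratings.get(i, mean) for i in items]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit(ratings, items, k=2, iterations=10):
    """Learn the model: k cluster centroids and each user's cluster."""
    vectors = {u: to_vector(r, items) for u, r in ratings.items()}
    users = sorted(vectors)
    centroids = [list(vectors[u]) for u in users[:k]]  # deterministic start
    assignment = {}
    for _ in range(iterations):
        # Assign every user to the nearest centroid.
        assignment = {
            u: min(range(k), key=lambda c: sq_dist(vectors[u], centroids[c]))
            for u in users
        }
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[u] for u in users if assignment[u] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return {"items": list(items), "assignment": assignment, "centroids": centroids}

def predict(model, user, item):
    """Predict a rating as the centroid value of the user's cluster."""
    cluster = model["assignment"][user]
    return model["centroids"][cluster][model["items"].index(item)]

MODEL = fit(RATINGS, ITEMS)
```

After fit has run, predict only touches the small model and no longer needs the full ratings data, which is the property that distinguishes model-based from memory-based approaches.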

Advantages:

Model-based CF techniques address shortcomings of memory-based CF algorithms, such as scalability and sparsity.
They can also improve prediction performance.

Disadvantages:

Model-based CF techniques involve a trade-off between scalability and prediction performance.
Model building is expensive.

3. Hybrid Collaborative Filtering Technique

Hybrid Recommendation Systems are another class of Recommendation Systems that combine Content-based and Collaborative-filtering techniques so as to overcome the limitations of either approach. See also: Recommendation system

Challenges of Collaborative Filtering

Data sparsity
Scalability
Synonymy
Gray Sheep
Shilling
Privacy and Security
Trust

Collaborative Filtering in e-commerce

Collaborative Filtering algorithms are employed in websites of several e-commerce businesses such as:

Amazon.com
Amazon.com is an online retailer that uses recommendation algorithms to personalize the online store for each customer. Amazon uses an iterative item-to-item collaborative filtering algorithm: it correlates each of a customer's purchased and rated products with similar products, and aggregates these similar products to recommend the most popular ones.

For each item in product catalog I₁
  For each customer C who purchased I₁
    For each item I₂ purchased by customer C
      Record that a customer purchased I₁ and I₂
  For each item I₂
    Compute the similarity between I₁ and I₂ [1]

If the number of customers is M and the number of products is N, the worst-case cost of this computation is O(N²M). However, Amazon computes this similar-items table offline, unlike traditional CF algorithms, whose online computation scales with the number of products and customers. Because the expensive step runs offline, the Amazon algorithm remains scalable even for large data sets, which in turn improves the quality of recommendations.
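
The loop above can be turned into a small runnable Python sketch. The pseudocode does not fix a similarity measure, so cosine similarity over co-purchase counts is used here as an assumption; the PURCHASES data and the function name are likewise hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical purchase history: customer -> set of purchased items.
PURCHASES = {
    "ann": {"a", "b", "c"},
    "bob": {"a", "b", "d"},
    "cat": {"a", "b", "d", "e"},
    "dan": {"x", "y"},
    "eve": {"c", "e"},
}

def similar_items_table(purchases):
    """Build the offline table: item -> {other item: similarity}."""
    # Index the catalog: item -> set of customers who purchased it.
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    table = {}
    for i1 in buyers:                          # for each item I1 in the catalog
        co_purchases = defaultdict(int)
        for customer in buyers[i1]:            # each customer C who purchased I1
            for i2 in purchases[customer]:     # each item I2 purchased by C
                if i2 != i1:
                    co_purchases[i2] += 1      # record that I1 and I2 co-occur
        # Similarity between I1 and every I2 that was bought together with it.
        table[i1] = {
            i2: count / sqrt(len(buyers[i1]) * len(buyers[i2]))
            for i2, count in co_purchases.items()
        }
    return table

TABLE = similar_items_table(PURCHASES)
```

Only item pairs that share at least one customer ever enter the table, so the inner loops do work proportional to the actual co-purchases rather than to every possible pair of items. At request time a recommender would only need lookups in TABLE.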

Netflix.com
Netflix.com is an e-commerce site that offers online video streaming and Blu-ray and DVD rentals by mail. Netflix uses a recommendation system similar to Amazon's to generate recommendations for a user based on the videos or movies that the user has watched and rated. In 2006, Netflix announced an open competition for a collaborative filtering algorithm that would improve on Netflix's own algorithm, Cinematch, by 10% in predicting how users would rate items based on their past preferences and ratings. The $1 million grand prize was won in 2009 by the team 'BellKor's Pragmatic Chaos', whose algorithm beat Cinematch by 10.06%.

References

[1] Amazon.com Recommendations
