Breaking Through the Filter Bubble — How to Approach Content Personalization

25/02/2014

There are 2.4 billion Internet users in the world. Two million blog posts and 60,000 new websites are added to the web every day. As of December 2012, there were more than 634 million websites, and more than 51 million are added every year.

The problem of information overload is, like any problem, just one side of a bright new opportunity. As the quantity of news and information grows by the hour, it becomes increasingly hard to find what matters to us, and we need new means and approaches to organize and make sense of it all. Many tools use content filters to try to solve this problem, which media theorist Clay Shirky once described as “not information overload, but filter failure.”

Dissecting Traditional Approaches

The first and most common way to determine the significance of an article is its social rating. This is determined through a technique called Collaborative Filtering, which collects taste preferences or personal information (such as language, country, etc.) from many users and uses that data to make automatic predictions. However, this approach causes a widely known problem in the area of content personalization: the emergence of the ‘Filter Bubble.’
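As a rough illustration of the idea, here is a minimal user-based collaborative filtering sketch. The users, article IDs, and ratings are made up, and cosine similarity is just one common choice of similarity measure, not something the approach requires.

```python
from math import sqrt

# Hypothetical data: user -> {article_id: rating}, 1.0 = liked, 0.0 = disliked.
ratings = {
    "ann": {"a1": 1.0, "a2": 1.0, "a3": 0.0},
    "bob": {"a1": 1.0, "a2": 1.0, "a4": 1.0},
    "cat": {"a3": 1.0, "a5": 1.0},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating dictionaries."""
    shared = set(u) & set(v)
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(user, article):
    """Predict a rating as the similarity-weighted average of other users' ratings."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, their_ratings in ratings.items():
        if other == user or article not in their_ratings:
            continue
        similarity = cosine(ratings[user], their_ratings)
        weighted_sum += similarity * their_ratings[article]
        weight_total += abs(similarity)
    return weighted_sum / weight_total if weight_total else 0.0


print(predict("ann", "a4"))  # article liked by a user with similar taste
print(predict("ann", "a5"))  # article liked only by a dissimilar user
```

In this toy data, an article liked only by a user whose taste does not overlap with Ann's gets no weight at all, which is the mechanism behind the bubble in miniature.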

When algorithms selectively guess what information a user would like to see based on information about the user (such as location, past click behavior and search history), users become separated from information that disagrees with their viewpoints. This effectively isolates them in their own cultural or ideological bubbles. Prime examples are Google’s personalized search results, which Maria Popova has described, and Facebook’s personalized news stream.

As a result, a significant number of news articles become invisible to the user. This set of unseen articles is sometimes called the Long Tail.

Moreover, selecting news based solely on its rating value increases the risk of showing irrelevant news, and articles with unfairly high ratings may distort the true picture of a user’s preferences. Unfairly high ratings happen when one news source is much more popular than others, and thus its articles get more social network likes. Because Facebook doesn’t provide site traffic metrics that could be used to balance user activity feedback, content popularity ends up being determined by likes alone, with articles that have more likes receiving higher ratings.
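To make the distortion concrete, here is a small sketch that compares raw like counts with likes scaled by each source's average. The scaling is an assumption made for illustration, standing in for the missing traffic metrics; the article does not prescribe it, and the sources and numbers are invented.

```python
from collections import defaultdict

# Hypothetical articles from one very popular source and one small one.
articles = [
    {"id": "big-1", "source": "bigsite", "likes": 900},
    {"id": "big-2", "source": "bigsite", "likes": 1100},
    {"id": "small-1", "source": "smallblog", "likes": 30},
    {"id": "small-2", "source": "smallblog", "likes": 10},
]


def normalized_scores(items):
    """Score each article by its likes relative to its own source's average."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for item in items:
        totals[item["source"]] += item["likes"]
        counts[item["source"]] += 1

    scores = {}
    for item in items:
        average = totals[item["source"]] / counts[item["source"]]
        scores[item["id"]] = item["likes"] / average if average else 0.0
    return scores


raw_top = max(articles, key=lambda a: a["likes"])["id"]
scores = normalized_scores(articles)
normalized_top = max(scores, key=scores.get)
print(raw_top, normalized_top)
```

By raw likes the large site always wins; relative to its source's usual performance, the small blog's standout article ranks first.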

Methods for Solving the Long Tail and Filter Bubble

The most common way to break a Filter Bubble is to use a variety of mathematical algorithms, such as cluster analysis, to detect Web classifications (defining news topics) and prepare recommendations based on a user’s preferences (in a tag cloud). Unfortunately, in most cases, this only serves to split the “bubble” into subgroups, creating many small bubbles.

To structure them properly and avoid recommending similar news articles (or worse, duplicate articles), which happens from time to time on the Web, there is a need to detect the proximity of topics within these subgroups. Most personalization engines then attempt to combine similar topics into common segments, applying auto-classification methods that determine the dominant topic tags. However, this omits less important tags outside of the main topics.
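One simple way to measure topic proximity is the Jaccard similarity of two articles' tag sets. The sketch below uses it to drop near-duplicates; the tag data and the 0.8 threshold are assumptions chosen for the example, not values from any particular engine.

```python
def jaccard(a, b):
    """Share of tags two articles have in common (0.0 = none, 1.0 = identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(items, threshold=0.8):
    """Keep an article only if it is not too similar to one already kept."""
    kept = []
    for item in items:
        if all(jaccard(item["tags"], other["tags"]) < threshold for other in kept):
            kept.append(item)
    return kept


# Hypothetical articles with already-extracted tags.
news = [
    {"id": "n1", "tags": ["google", "reader", "shutdown", "rss"]},
    {"id": "n2", "tags": ["google", "reader", "shutdown", "rss", "feeds"]},
    {"id": "n3", "tags": ["facebook", "news", "feed"]},
]

unique_ids = [item["id"] for item in drop_near_duplicates(news)]
print(unique_ids)
```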

In fact, while auto-classification does determine the similarity of tags, it does not detect the similarity of the tag groups, preventing it from going beyond the “tag bubble.” The system determines new tags that are associated with a user’s cloud of interests, but does not show popular tags from similar topics.

Characteristics of Effective Content Personalization

One approach to effective content personalization is called ‘the classification of trends,’ and is based on the principle of identifying the most significant relationships between keywords, creating a unique chain of keywords called a trend. A trend contains one or more keywords from Web content, and determines a specific subtopic. The main characteristic of a trend is its use of dynamic chains of keywords, with positive (growing) or negative (fading) conditions over a specific period of time.
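A toy version of this idea: treat a pair of co-occurring keywords as the simplest possible chain, count how often each pair appears in two consecutive periods, and label it growing or fading. The pair-based chains, the `min_count` cutoff, and the sample keywords are all assumptions for illustration; the article does not specify how chains are built.

```python
from collections import Counter
from itertools import combinations


def pair_counts(docs):
    """Count how many documents mention each keyword pair together."""
    counts = Counter()
    for keywords in docs:
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


def trend_states(previous_docs, current_docs, min_count=2):
    """Label each sufficiently frequent keyword chain as growing, fading, or stable."""
    previous = pair_counts(previous_docs)
    current = pair_counts(current_docs)
    states = {}
    for chain in set(previous) | set(current):
        if max(previous[chain], current[chain]) < min_count:
            continue  # too rare to count as a trend
        if current[chain] > previous[chain]:
            states[chain] = "growing"
        elif current[chain] < previous[chain]:
            states[chain] = "fading"
        else:
            states[chain] = "stable"
    return states


# Hypothetical keyword lists extracted from two days of articles.
yesterday = [
    ["google", "reader"],
    ["google", "reader", "rss"],
    ["olympics", "sochi"],
    ["olympics", "sochi"],
    ["olympics", "sochi"],
]
today = [
    ["google", "reader"],
    ["google", "reader"],
    ["google", "reader", "shutdown"],
    ["olympics", "sochi"],
]

states = trend_states(yesterday, today)
print(states)
```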

Trends are determined by analyzing the current state of daily news content on the Web; in their most basic form, they are the most relevant daily news topics. If a recommendation engine calculates the thematic proximity of trends, then it can auto-classify them into trend segments, so that similar sub-topics are put in the same segments. This auto-classification of segments splits Web content into various major topics.
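The grouping step can be sketched as a simple greedy clustering: a trend joins every existing segment that contains a sufficiently similar trend, merging those segments together. Keyword overlap as the proximity measure and the 0.25 threshold are assumptions for this sketch; a production engine would likely use something richer.

```python
def overlap(a, b):
    """Keyword overlap between two trends (Jaccard similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def segment_trends(trends, threshold=0.25):
    """Group trends so that thematically close ones share a segment."""
    segments = []
    for trend in trends:
        keywords = set(trend)
        merged = [trend]
        remaining = []
        for segment in segments:
            # A segment absorbs the trend if any of its members is close enough.
            if any(overlap(keywords, set(member)) >= threshold for member in segment):
                merged.extend(segment)
            else:
                remaining.append(segment)
        segments = remaining + [merged]
    return segments


# Hypothetical trends, each a chain of keywords.
trends = [
    ("google", "reader"),
    ("reader", "rss"),
    ("olympics", "sochi"),
    ("sochi", "hockey"),
]

segments = segment_trends(trends)
for segment in segments:
    print(sorted(segment))
```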

A recommendation engine that applies this classification process to trends (and not tags) solves two major personalization problems:

1. It solves the Long Tail, making news recommendations from different segments possible.
2. It solves the problem of thematic proximity, making sure that similar or duplicate news is filtered out.
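Both points can be illustrated with one small routine: take articles from each segment in turn so that no single topic dominates, and skip anything whose title has already been recommended. The segment names, the article data, and the title-based duplicate check are simplifying assumptions made for the sketch.

```python
from itertools import zip_longest


def recommend(segments, limit):
    """Pick articles round-robin across segments, skipping repeated titles.

    `segments` maps a segment name to its articles, best first.
    """
    if limit <= 0:
        return []
    picks = []
    seen_titles = set()
    # Each round takes the next-best article from every segment.
    for round_of_articles in zip_longest(*segments.values()):
        for article in round_of_articles:
            if article is None:
                continue  # this segment has run out of articles
            key = article["title"].strip().lower()
            if key in seen_titles:
                continue  # duplicate of something already recommended
            seen_titles.add(key)
            picks.append(article["id"])
            if len(picks) == limit:
                return picks
    return picks


# Hypothetical segments produced by an earlier classification step.
segments_by_topic = {
    "tech": [
        {"id": "t1", "title": "Google Reader shuts down"},
        {"id": "t2", "title": "google reader shuts down "},
        {"id": "t3", "title": "RSS alternatives"},
    ],
    "sport": [
        {"id": "s1", "title": "Sochi hockey final"},
    ],
}

picks = recommend(segments_by_topic, 3)
print(picks)
```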

Content Personalization of the Future

The end of Google Reader meant the beginning of a lot of new, similar services. And as information becomes increasingly dense, consumers deserve to get the news that they want to read – not the news an algorithm thinks they want. All it takes is a content personalization algorithm that can solve the challenges of the filter bubble and long tail.

Slawa Gorobets is the CTO and lead developer of SNATZ

Read more: http://insights.wired.com/profiles/blogs/breaking-through-the-filter-bubble-how-to-approach-content#ixzz2uKC8Bs74
