|
Andreas Weigend
Stanford University
Data Mining and Electronic Business
Stat 252 and MS&E 238
Spring 2008

Class 6

Contributors: Quinn Slack, Alex Gleitz, Charles Tripp, Jaehyeok Heo, Myunghwan Kim, Bill Whiteley

What is the space recommendation systems live in?
Discovery and recommendations are key elements in online commerce.

Amazon Survey
Amazon makes 20-30% off its recommendation systems. A survey asked: "What do you plan to do at Amazon.com today?" The response rate was 3%. Free-text user responses were classified by category, with multiple categories possible:
Survey Data Analysis
To gain insight, users' stated preferences from the survey were compared with their actions:
What other problems have the same underlying structure as recommender systems?
All manner of tasks, people, or events can be couched as recommendation problems.
Recommendation history
Generally, items do not have any persistent history: a book on Amazon would be shown the same way to two different people. An individual, however, does have a history. Example: Netflix, where a user is the sum of all his or her ratings. Socially, people listen to their friends: a book recommendation from a friend warrants at least a look.

Compare Amazon's recommendations with a social network's:
What can $5 do for you?
For $5, Acxiom in Little Rock, AR will provide you with 350 fields of information that summarize what the world can see about you. How likely are you to buy a hot tub? They have that answer :-) Here is an older article about the depth and breadth of Acxiom's involvement in economic, political, and security endeavors, not to mention security problems of their own.

What do all discussed tasks have in common?
Recommendations touch practically every aspect of our lives. Consequently, it is not surprising that almost any problem can be framed as a recommendation.

User Actions
Collecting Recommendation Data From Users
Speaker: **Toby Segaran**, who currently works at **Metaweb Technologies** (those on the Stanford campus can read his book online here)

Ways of collecting recommendation data:
* Voting data
* Consumer Data
Implicit Data
* Forcing the User to Act
ASW Metrics / Cost Function
Getting to the right cost function is important. To evaluate recommendation systems we need standards for measuring their performance, so this part of the class discussed what makes a proper cost function.

A/B Testing
A/B testing is a method for comparing the performance of two or more versions of a page or system. The testers show different versions to different groups of people and look at the differences in performance. The method can be used for several purposes.

Examples
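One common way to "look at the differences in performance" between two versions is a two-proportion z-test on conversion counts. The sketch below is an illustration under that assumption, not the method presented in class; the function name `two_proportion_z_test` and the visitor and conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a: conversions and visitors for version A.
    conv_b / n_b: conversions and visitors for version B.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both versions perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value of a standard normal, written with the complementary
    # error function: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 5000 visitors saw each version of the page.
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between the versions is unlikely to be due to chance alone. What counts as a "conversion" (a click, a purchase, a rating) is the choice of metric discussed in the rest of this section.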
What to measure
References of A/B Testing
1. http://www.sitetoolcenter.com/google-adsense-optimization/ab-testing.php
2. http://en.wikipedia.org/wiki/A/B_testing
3. http://help.yahoo.com/l/us/yahoo/ysm/sps/start/overview_adtesting.html

Clicks!!
Single-click sessions (bounce rate, etc.)
The bounce rate is the share of sessions that end after a single click, so it indicates whether people stay on a page or leave right away. If the main function of a web page is to provide information, the bounce rate is more important than the number of clicks. The longer someone spends on a page, the more interested in it he or she is likely to be. The bounce rate is mostly determined by factors such as the amount of valuable information on the page and the type of media used to present it. By observing it, we can evaluate the value of web pages and of the system.

Maximization of the number of clicks vs. optimization of clicks
Maximization and optimization are related but different concepts. Sometimes you need to maximize the number of clicks, and sometimes you need to optimize them. "Pay per click" is one situation in which maximizing the number of clicks matters more. Although web pages are mostly evaluated by the number of clicks, that number may carry no meaning: if the service provider designs the page so that it forces visitors to click, the click count tells us nothing. In that sense, optimizing clicks is what we need to consider.
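The single-click-session metric above can be computed directly from click logs. This is a minimal sketch that assumes each session is stored as a list of visited page paths; the function names and the sample sessions are hypothetical.

```python
def bounce_rate(sessions):
    """Fraction of sessions that consist of exactly one click (page view)."""
    if not sessions:
        raise ValueError("need at least one session")
    bounces = sum(1 for clicks in sessions if len(clicks) == 1)
    return bounces / len(sessions)


def clicks_per_session(sessions):
    """Average number of clicks per session (the raw click-count metric)."""
    if not sessions:
        raise ValueError("need at least one session")
    return sum(len(clicks) for clicks in sessions) / len(sessions)


# Hypothetical click log: one list of visited pages per session.
sessions = [
    ["/home"],
    ["/home", "/product/1", "/cart"],
    ["/search"],
    ["/home", "/about"],
]
print(bounce_rate(sessions))         # 2 of 4 sessions are single-click
print(clicks_per_session(sessions))  # 7 clicks over 4 sessions
```

Reporting both numbers side by side shows why one metric alone can mislead: a page redesigned to force extra clicks raises clicks per session without making visitors any more interested.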
Others
When you build a model of purchasing behavior, you might want to predict the probability that somebody clicks on an item. This is different again from maximizing or optimizing the number of clicks. There is no single metric; you may need a variety of metrics depending on what you want to look at.

Predict the rating
Suppose you are forced to rate a new movie. It is hard to predict whether you would give it 5 stars or 1 star. This is a very difficult metric, and there will be errors. It is even hard to predict whether the errors increase or decrease when the movie is really good. So predicting ratings is hard even when viewers are forced to rate. In reality, nobody is forced to rate particular items, which makes prediction harder still: the fact that someone chose to rate an item already carries a lot of meaning, and people tend to rate in the more extreme cases.

Use of rating prediction
Example - Cinematch and Netflix Prize
The Netflix Prize seeks to substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences. Netflix has a recommender system called Cinematch. Its job is to predict a person's rating of a movie based on his or her ratings of other movies. Netflix thinks it would be hard to improve the system by more than 10%, so it created the Netflix Prize, to be awarded to the team whose algorithm performs very well at rating prediction. It is hard to predict ratings exactly, but many people are trying, and the Netflix Prize is one such effort. For more information, you can follow the link and see the rules.

Music - skipping songs
If a person likes a song, he or she will keep listening to it; if not, he or she will skip it. A skipped song is a negative signal for the recommender, and this kind of information can be used to evaluate music recommenders.

Conclusion
You must not look at only one metric for the system, and you should choose your metrics carefully. Many firms do not do this well. Suppose a firm uses only the bounce rate as its metric. This might give the firm a bad result: one way to reduce the bounce rate is CLOSING the website. To avoid this kind of effect, you need to consider several metrics together.

Data sources
Metadata (data about the data)
Metadata is often the first kind of data source to come to mind. In many cases, a correlation between two things' metadata indicates that there is a correlation between those two things. Metadata examples for music (beyond the obvious artist/album/year/genre):
Session context
In the music example, a recommender could take into account the session context:
The session context could help build a Markov model, which would play songs according to their probability given the current song.

Active learning
Some actions by the user yield far more new information for the recommendation model than others. For example, if a user has rated all of the Star Wars movies except Return of the Jedi as 5s, the recommender can say with high certainty that the user would also rate Return of the Jedi a 5, so asking for that rating would add little new data from which to make recommendations. But suppose the user rated the first Matrix movie a 5 and the last one zero stars. The recommender is likely to have little certainty about how he would rate the second Matrix movie. It can scan its recommendation model for cases like this and ask the user explicitly to rate The Matrix II, and in doing so fill in the holes in its certainty.

"Cheating"
Prof. Weigend noted that sometimes you can just recommend items that the user has already told you he likes. Amazon will often recommend items from your wishlist. Not only are these effective recommendations, but they also increase users' confidence and trust in Amazon's recommendations.

Efron's bootstrap paper
Prof. Weigend says Bradley Efron's paper on bootstrapping is one of his favorites.
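For readers who have not met the bootstrap: the idea is to resample the observed data with replacement many times and use the spread of a statistic across the resamples as an estimate of its uncertainty. The sketch below uses the simple percentile method on a made-up list of ratings; the function name `bootstrap_ci`, its parameters, and the data are hypothetical and only meant as an illustration.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Each resample draws n observations from the data with replacement.
    estimates = sorted(
        stat([rng.choice(data) for _ in range(n)]) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical star ratings for one movie.
ratings = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4]
low, high = bootstrap_ci(ratings)
print(f"mean = {statistics.mean(ratings):.2f}, 95% CI roughly ({low:.2f}, {high:.2f})")
```

The same function works for statistics with no simple formula for their error, for example by passing `stat=statistics.median`.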
Algorithms
Distance Measures
Distance measures are a simple way of quantifying the similarity (or difference) between elements of a set. They can capture many different relationships, from similarity of preferences to the similarity of two documents or phrases. A distance measure requires a method for computing the distance between two points in a space, that is, a notion of distance (a metric, often derived from a norm) defined on that space. A space equipped with such a distance is known as a metric space. There are many possible notions of distance, and how meaningful each one is varies both by application and by the choice of dimensions. Useful distances can be computed between many different entities, such as users, items for sale, documents, and news headlines. Distance metrics between click streams or pages visited can help to compare users. Some great examples of using distance metrics in recommendation systems (with an emphasis on application using Ruby) can be found here.

Example: Preference Difference
The Euclidean norm of the difference between two users' ratings on a selection of movies can be used to compute the similarity of those two users in terms of their movie preferences. For instance, here are three people's ratings for Iron Man and Smart People:
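A minimal sketch of this preference distance follows. The user names and rating values are made up for illustration (they are not the ratings shown in class), and the function name is hypothetical.

```python
import math


def euclidean_distance(ratings_a, ratings_b):
    """Euclidean norm of the rating difference over movies both users rated."""
    shared = ratings_a.keys() & ratings_b.keys()
    if not shared:
        raise ValueError("no movies rated by both users")
    return math.sqrt(sum((ratings_a[m] - ratings_b[m]) ** 2 for m in shared))


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"Iron Man": 5, "Smart People": 2},
    "bob": {"Iron Man": 4, "Smart People": 2},
    "carol": {"Iron Man": 1, "Smart People": 5},
}

print(euclidean_distance(ratings["alice"], ratings["bob"]))    # 1.0
print(euclidean_distance(ratings["alice"], ratings["carol"]))  # 5.0
```

A smaller distance means more similar taste: with these made-up numbers, alice is much closer to bob than to carol, so bob's other ratings would be the better source of recommendations for her.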
Example: Linguistic Difference
One linguistic distance measure takes the Euclidean norm over a space in which each dimension is the number of times a given word appears in a document. Comparing the word counts of two documents via a norm gives a simple but effective way of measuring the difference in word choice, and possibly in content, between the two documents.

Bayesian Filtering
Bayesian filters, using a naive Bayes classifier, can classify items with gradually increasing accuracy as additional samples are added to the dataset. Bayesian classifiers are based on Bayes' rule and attempt to compute the probability that a given item belongs to each of the data classes, given each independent (or assumed to be independent) feature of the sample. Naive Bayes classifiers are called naive because they assume that each feature is independent of the other features, an assumption that is not generally true. Despite this strong assumption, naive Bayes classifiers tend to perform quite well on many real-world data sets, even when several features are correlated. The most famous use of such classifiers is spam filtering.
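The spam-filtering use can be sketched with a small word-count naive Bayes classifier. This is an illustrative sketch, not a production filter: the class name, the add-one (Laplace) smoothing choice, and the tiny training set are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Word-count naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # documents seen per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.class_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def log_scores(self, text):
        """Log of P(class) * product of P(word | class), per class."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                # The "naive" step: each word contributes independently.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical training messages.
nb = NaiveBayes()
nb.train("win cash prize now", "spam")
nb.train("cheap pills win big", "spam")
nb.train("meeting notes for class", "ham")
nb.train("lecture notes and homework", "ham")

print(nb.classify("win a cash prize"))            # spam
print(nb.classify("homework notes for lecture"))  # ham
```

Because `train` can be called at any time, the classifier improves as more labeled samples arrive, which matches the "gradually increasing accuracy" described above.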
Principal Components Analysis (PCA)
Principal Components Analysis is a method for reducing the dimensionality of a data set. PCA finds a set of new dimensions, each a linear combination of the original dimensions, ranked in order of decreasing variance of the data along them. The highest-variance dimensions can then be kept and the lower-variance ones ignored, giving an efficient dimensionality reduction that preserves the maximum amount of variance (when constrained to linear combinations of the original dimensions). PCA can be useful for mapping out the relationships and distributions between users, items, news stories, etc. Its largest advantage is that it allows data to be viewed in a space of reduced dimensionality, which helps both in filtering out less useful or interesting dimensions and in visualizing data. However, because the new dimensions generally lack the direct meaning of the original ones, graphs constructed using PCA can be difficult to interpret.

Independent Component Analysis (ICA)
Independent Component Analysis is a method for separating independent sources from multiple, related data streams. For example, ICA can be used to separate music from speech given two simultaneous recordings of both. Applications to electronic business include separating contributing factors from multiple trends such as click rates and purchasing behavior, as well as the possibility of separating multiple "shadow users" from a single account (for instance, multiple family members using the same IMDb account to rate movies, or buying books from the same amazon.com account). ICA can be a very powerful technique when appropriately applied, and one of the factors that affects its effectiveness is the particular type of ICA used. Several popular forms exist, including linear noisy and noiseless ICA, as well as a number of nonlinear ICA methods.
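To make the PCA description above concrete, here is a small pure-Python sketch that finds only the first principal component (the highest-variance direction) by power iteration on the covariance matrix. Power iteration is a technique chosen here for brevity; numerical libraries normally use an eigendecomposition or SVD instead. The function name and the sample points are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, variance along it, 1-D projection of each row)."""
    n, d = len(rows), len(rows[0])
    # Center the data: PCA measures variance around the mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly applying cov pulls v toward the top
    # eigenvector. Assumes the all-ones start is not orthogonal to it.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    projected = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, variance, projected


# Hypothetical 2-D data lying close to the line y = x.
points = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1)]
direction, variance, projected = first_principal_component(points)
print(direction)  # close to (0.707, 0.707)
print(variance)   # nearly all of the total variance
```

Keeping only `projected` reduces each two-dimensional point to one number while preserving most of the variance. The new axis is a mix of both original dimensions, which illustrates the interpretability problem noted above.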
Markov Models and Markov Chains
Markov chains are models of stochastic processes that are particularly well suited to several electronic business problems. A Markov chain models the probability of a transition from a particular state to each other state. For example, a Markov chain can be used to find the probability that a user will visit a page, given the previous three pages that he or she viewed. Other examples include predicting the next product a user might buy, or the next song a user will play from a playlist. When the state is not directly observable, the model is referred to as a hidden Markov model (HMM). HMMs are harder to construct than standard Markov chains, but are also applicable to a much wider range of domains.
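A first-order Markov chain over page visits can be estimated simply by counting transitions. The sketch below assumes each session is a list of page names; the function names and the sessions themselves are hypothetical. Conditioning on the previous three pages, as in the example above, would use a tuple of three pages as the state instead of a single page.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Estimate P(next page | current page) from observed sessions."""
    counts = defaultdict(Counter)
    for pages in sessions:
        # Count every consecutive (current, next) pair in the session.
        for current, nxt in zip(pages, pages[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for state, followers in counts.items()
    }


def most_likely_next(transitions, state):
    """Most probable next page, or None if the state was never left."""
    followers = transitions.get(state)
    if not followers:
        return None
    return max(followers, key=followers.get)


# Hypothetical browsing sessions.
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "product", "cart"],
    ["home", "search", "product"],
    ["search", "product", "cart"],
]
transitions = fit_transitions(sessions)
print(transitions["home"])                    # search: 2/3, product: 1/3
print(most_likely_next(transitions, "home"))  # search
```

The same counting works for products bought or songs played in sequence; only the meaning of a "state" changes.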
Set-Instance/Attribute-Relation
When looking at data from an electronic business perspective, it can be quite useful to think about how attributes, relations, sets, and instances combine within a particular context. Some examples in the space of websites:
This type of division can be quite helpful within several spaces; see the contents and tables below.

E-Business Perspective
What really is shopping?
Shopping as Process and Personal Activity
Shopping as Conversation (Dialog)
Shopping as Conversation (Multiple partners)
Shopping as Social Activity
Shopping spaces

What kind of technology is used?
Integration
Visit Modeling
Example : eStara
Overview of Visit Modeling
Relational Modeling
Dynamic Tracking (Reference: Wikipedia)
Components of Modeling
Dimension of Information Space
Behavioral Component
Location Component
Catalog Component
Other Contributions:
In the last class, while we were discussing the problem of music recommendations and discovery, someone suggested that Twitter could be used as a tool for this. I came across this article while going through my feeds a few days afterwards and thought it was relevant enough to share: http://www.techcrunch.com/2008/05/12/twitter-for-music/ In short, the article talks about the company Blip and how it provides a way to suggest and discuss thoughts on music through Twitter. - Jaebock Lee |