This is a mirror of the 2008 course wiki for Data Mining and Electronic Business. Some links may be unavailable.

Andreas Weigend
Stanford University
Data Mining and Electronic Business
Stat 252 and MS&E 238
Spring 2008

Class 3

The MP3 recordings for this class can be found at
Video lectures for the class can be found at

Introduction and Logistics

Class 3's topic is social networks. It focuses on the most exciting data source: relationship data. All types of data sources are listed in the Class 2 notes.
  • HW2 discussion: insights gained from this homework
  • HW3 discussion: group efforts. If you need to be in a group of 3 or 4, please click here and add your name.
  • Video lectures of the class will be available for download in WMV format. They will also be uploaded to Google Video, so that users can download them in MP4 format and watch them on their iPods. If you would like the videos in other formats, such as Flash FLV, or have comments on any of the formats, please send them to email removed as soon as possible.
  • Please comment on the blog about the Two Data Revolutions. Professor Weigend would love to see your comments.

What is a social network?

A social network is a "complex network or mix of irregularly connected entities". More formally, a social network is the set of relationships, both personal and professional, between individuals. Social networks are often used as a measure of "connectedness", since they represent the collection of ties between people as well as the strength of those ties.

Some examples of social networks include:

  • Email
  • IM (or instant messenger) -- IM programs allow individuals all over the world to communicate with each other almost instantaneously. Instant messaging takes a wide variety of forms, from text-based services such as AOL Instant Messenger (AIM), Yahoo! Messenger, Google Talk, and MSN Messenger to voice and video calling services such as Skype. If you are not already using such a service and want to get connected to your co-workers, family, and friends, each of these services offers a free download. There are plenty of other instant messenger services out there, so if none of these fits your needs, simply search for instant messenger services and you are bound to find one that suits you.
  • Phone Calling Patterns
  • Yahoo! Groups – Yahoo! Groups is a free online service from Yahoo! that gives individuals with a common interest a platform to meet and share stories, information, and photos.
  • Flickr – Flickr is an online photo sharing service offered by Yahoo! that allows users to share photos and videos with the rest of the world.
  • Twitter – Twitter is a free social messaging utility and micro-blogging service that allows individuals to stay connected in real time. It works by allowing users to send updates up to 140 characters long to the Twitter website. The updates are then displayed on the user’s profile and instantly delivered to other users who have signed up to receive them. If you are worried about the privacy issues associated with a service like this, don’t be: you can restrict who receives your updates by changing the default setting so that only people in your circle of friends see them. Similar products include frazr and Pownce. If you would like to stay up to date on Dr. Weigend’s whereabouts and happenings, visit his Twitter page.
  • PageRank – PageRank is an algorithm used by Google that assigns every web page a numerical value signifying the importance of that page on the web. Read “Google’s PageRank Explained and how to make the best of it” by Phil Craven to find out how Google determines the importance of a page.

If you want to read more about social networking sites, a good article to consult is "Social Network Sites: Definition, History, and Scholarship" by Danah M. Boyd from the University of California, Berkeley and Nicole B. Ellison from Michigan State University.

Also, if you want to find the most popular social networking sites in the past few years visit and/or

Now that we have discussed some examples of social networks, how can we visualize them? Why would we even want to try to visualize them?

“A sort of social alchemy happens. You put this stuff out there and you don’t know what happens. You might make a friend, get a job or a date.” ~ Biz Stone, Co-founder of Twitter, Inc.

  • First we need to distinguish between the observed (“realized”) network and the true underlying network. The distinction can be seen through a probability analogy:
  • We want to move from talking about dice throws and their relation to a known “underlying” probability distribution to talking about observed networks and their underlying distributions.
    • This is in hopes of obtaining a basic understanding of the diversity of these networks in business, society, nature, etc.
    • It is very important to get at the underlying network because this is where the potential of social networking is captured.

  • We want to understand the properties of the social network so that they can be USEFUL. But useful for what?
    • Abstracting networks to make PREDICTIONS
      • In the Facebook application, Friends for Sale, possible predictions could be: Who will get bought next? What price will a certain person be bought for? Is there NO chance of ever getting bought?
    • Abstracting networks to THEORIZE
      • In the Friends for Sale example, are there differences between females and males in being bought? What insights can be gained in gender behavior by analyzing this network? From this, can better theories be formed that are related to e-commerce?
    • Abstracting networks in order to VISUALIZE
      • Eleven examples were shown in class; only a few are covered in these notes. If interested, all the images can be found in Arun Sundararajan’s “Modeling Complex Networks For (Electronic) Commerce”, ACM EC’07, June 12, 2007.
      • Be VERY careful to distinguish ARTIFACTS of the visualization from the TRUE underlying structure of the social network.
        • For instance, one can flip nodes (explained below) so that a relationship looks very suggestive and is clear for anyone to see; the same nodes can also be flipped so that this "suggestive" relationship is hidden.
        • More often than not, visualizations of social networks are more problematic than insightful. However, we saw some examples in class that highlighted a few key points. [Image: collaboration.PNG]

  • Example 2: High school friendship network [Image: highschool.PNG]
    • This brings up the question of "Is friendship bilateral, i.e. symmetric, or one-sided?"
    • "Binary" is a weird way to describe a relationship between two network nodes (in this example, people)
    • For instance, is the friendship "ON" or "OFF"? This does not capture the full picture; rather, we would like to describe the network connections in terms of gradients.
      • Example: What kind of friendship is there? Academic, Romantic, Casual, Childhood, etc.
    • Discussion excerpt from class:
      • Student's remark: "In high school I liked a girl but she did not like me back."
      • Professor's remark: "Did she know about it?"

  • Example 3: Companies linked by news stories [Image: companies.PNG]
    • This illustrates that social networks are not just a concept tied to dating sites and Facebook; companies, for instance, also form networks.
    • Again, we see that there are binary links, but the strength of the connection should also be captured in some way. One way to do this is to put weights on the various connections.
    • We see that visualization can make social networks look pretty random at times. This is because there are no 27-dimensional printers that can capture the relationships in the various networks.

  • Having seen these visual examples of social networks, we want to be able to describe them:
    • Terminology necessary for describing visualizations of social networks:
      • Graph: the collection of nodes and edges, the entire visual
      • Node: a single entity in the network (like a data point), e.g. a person
      • Edge: a link that connects two nodes
      • Directed/Undirected: A directed graph shows one-way connections between nodes, whereas an undirected graph shows mutual, bilateral connections.
        • Example: A directed edge would be one where Jackson likes a girl in high school but she does not know. An undirected edge would be one where Jackson likes a girl in high school and she likes him as well.
        • As is evident in the above example, the links in the two types of graphs are very different in nature.
      • Degree: the number of edges that a node has
        • In the high school friendship network, pick a single node and count how many edges it has
        • Degree distribution: systematically go through the network and create a histogram that records how many nodes have each number of edges
      • Component: a set of nodes in which you can get from any node to any other by following edges
        • Some pairs are isolated from the rest; they do not "communicate" with any of the others at all
        • Clustering in graphs: which areas appear to have nodes bunching up together
      • Weights: a value placed on an edge to express the strength of the connection between the two nodes it links
        • For example, news feeds in Facebook had an option where users could explicitly state in their preferences whether they wanted to hear less from a particular person. This "up or down" factor is an example of placing a weight on an edge.
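
The terminology above can be made concrete with a small sketch. The names, edges, and weights below are made up for illustration and do not come from the class examples; the graph is treated as undirected, and components are found with a plain depth-first search.

```python
from collections import Counter, defaultdict

# Hypothetical weighted, undirected friendship edges: (person, person, weight).
EDGES = [
    ("Ann", "Bob", 1.0),
    ("Ann", "Cat", 0.5),
    ("Bob", "Cat", 2.0),
    ("Dan", "Eve", 1.0),
]


def build_graph(edges):
    """Return an adjacency map {node: {neighbor: weight}} for an undirected graph."""
    graph = defaultdict(dict)
    for a, b, weight in edges:
        graph[a][b] = weight
        graph[b][a] = weight  # undirected: store the edge in both directions
    return dict(graph)


def degrees(graph):
    """Degree of each node = number of edges touching it."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def degree_distribution(graph):
    """Histogram: degree -> number of nodes with that degree."""
    return dict(Counter(degrees(graph).values()))


def components(graph):
    """Group nodes into connected components using depth-first search."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(n for n in graph[node] if n not in component)
        seen |= component
        result.append(component)
    return result


graph = build_graph(EDGES)
print(degrees(graph))
print(degree_distribution(graph))
print(components(graph))
```

In this toy graph Dan and Eve form an isolated pair, which is the situation described under "Component". A directed version would store each edge in one direction only.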

Why Do People Study Social Networks?
  • To Learn:
    • Learning Goals:
      • Understand differences between traditional predictive modeling and predictive modeling with networked data
      • Describe example techniques and provide pointers into the literature to learn more
      • Illustrate with some experiments and successful applications
      • Learning: considerable power for predictive inference is inherent in the structure of many networks

  • To Predict:
    • Prediction Goals:
      • Node attribute value prediction
      • Node classification
      • Link attribute value prediction
      • Predicting link existence

    • Examples of Prediction
      • Fraud detection
      • Link-farm identification
      • Targeted marketing - When marketers target the neighbors of those who have already bought the product, they have higher success rates. An example can be found in the study by Hill, Provost, and Volinsky of a telephone company's network-based marketing scheme, which targets product advertisement at individuals who have had contact with an existing customer.


Hill, S., F. Provost, and C. Volinsky. “Network-based Marketing: Identifying likely adopters via consumer networks.” Statistical Science 21(2): 256–276, 2006.

In this example: 1.) Non-NN are those individuals who are not network neighbors (NN) of an existing customer and are targeted through the old model of marketing; 2.) NN 1-21 are those individuals who are targeted both through the old model of marketing and through the network-based marketing scheme; 3.) NN 22 are those individuals who are targeted using network-based marketing but not the traditional model; and 4.) NN not targeted are those individuals who are not targeted by either marketing scheme. Thus, we can see that the network-based marketing scheme increases sales rates by a factor of 4.82 among those who would be targeted using both marketing models.

      • Counterterrorism analysis
      • Patent analysis
      • Epidemiology
      • Bibliometrics
      • Movie classifications
      • Firm/Industry classification

Marketing and Virality

How can a company make money at the different levels of a network?

  • Page Level

    • Google AdSense

      • Delivers relevant, targeted ads that match the site's content

    • Model: Content
    • Action: Show Ads
  • Visit Level

    • Google AdWords
      • Delivers relevant ads according to the search keywords
    • Model: Intention, situation
    • Action: Session-based marketing

  • The per user business model

    • CLTV = LT * Gross margin/month - CAC
    • CLTV
      • Customer lifetime value
      • How much is each customer worth?
    • LT
      • Customer's lifetime in months
    • LT * Gross margin/month
      • How much do I make on each customer?
    • CAC
      • Customer acquisition cost
      • How much does a new customer cost?
    • This is a simplified model: it does not take into account the discount rate, retention costs, etc.
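
As a worked example of the per user formula, here is a minimal sketch. The function name and all dollar figures are hypothetical, chosen only to show how CAC enters the calculation; like the formula in the notes, it ignores discounting and retention costs.

```python
def customer_lifetime_value(lifetime_months, gross_margin_per_month, acquisition_cost):
    """CLTV = LT * gross margin per month - CAC (no discounting, no retention costs)."""
    return lifetime_months * gross_margin_per_month - acquisition_cost


# Hypothetical numbers, for illustration only.
paid_acquisition = customer_lifetime_value(24, 5.0, 40.0)   # customer bought via ads
viral_acquisition = customer_lifetime_value(24, 5.0, 0.0)   # customer arrived via a friend

print(paid_acquisition, viral_acquisition)
```

With these made-up inputs, the same customer is worth 80 when acquired for 40 and 120 when acquired for free.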

  • Virality - Defined

    • "How is your exponent doing?"
      • Exponential growth with exponent > 1
      • e.g., one user brings in two other users in one time period, who each in turn bring in two more in the next time period
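      • Assuming only new users (brought in during the latest time period) recruit, period n brings in 2^n new users, for a total of 2^(n+1) - 1 users
      • Note that the world's population as of March 2008 is approximately 6.65 billion, so at this rate the total would pass the world's population after 32 periods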
    • "The involuntary adoption of a product" - Steve Jurvetson
      • Example: ICQ in the early days of instant messaging. Someone signs up with ICQ and tells his or her friends about it. To try instant messaging with that person, they have no choice but to use ICQ.
    • Why is virality such a big deal?
    • It removes the CAC for huge savings in the per user business model!
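
The doubling example can be checked with a short sketch. It assumes, as in the notes, a single initial user and that only users who joined in the previous period recruit; the function name and parameter are illustrative.

```python
WORLD_POPULATION = 6_650_000_000  # March 2008 estimate quoted in the notes


def periods_to_reach(target, recruits_per_new_user=2):
    """Count time periods until the total number of users reaches `target`.

    Starts from one user. In each period, only the users who joined in the
    previous period recruit, each bringing in `recruits_per_new_user` people.
    """
    if recruits_per_new_user < 1:
        raise ValueError("recruits_per_new_user must be at least 1")
    new_users, total, periods = 1, 1, 0
    while total < target:
        new_users *= recruits_per_new_user  # this period's recruits
        total += new_users
        periods += 1
    return periods


print(periods_to_reach(WORLD_POPULATION))
```

With two recruits per new user the total after n periods is 2^(n+1) - 1, so the loop stops at n = 32. Real growth saturates long before that, because recruits increasingly overlap with existing users.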

  • Other virality information (extra)

  • The Four Viral App Objectives (Great Resource)
  • What determines virality?
    • Distribution
      • How many people will an infected host make contact with on average?
    • Infection
      • How likely is a person to become infected after contact with an infected host?
    • Distribution * Infection = R-zero
    • R-zero (viral reproduction rate from medicine) = # of people a host infects while the host remains infected
    • The R-zero is the exponent!
  • The Four Viral App Objectives
    • Increase the percentage of active hosts (infected hosts who actively make contact with uninfected individuals)
    • Increase the contact rate for active hosts (# of uninfected individuals an active host comes in contact with per time period)
    • Increase the duration of each active host's contagious period
    • Increase the infection conversion rate (the chance that uninfected individuals become infected)
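
The relation Distribution * Infection = R-zero can be sketched as follows. The contact counts and acceptance probabilities are hypothetical, and the model is deliberately naive: it assumes every generation of hosts behaves identically and that the pool of uninfected people never runs out.

```python
def r_zero(contacts_per_host, infection_probability):
    """R-zero = distribution (contacts per host) * infection (chance per contact)."""
    return contacts_per_host * infection_probability


def expected_new_users(initial_hosts, r0, generations):
    """Expected number of new users in each generation: initial_hosts * r0**g."""
    return [initial_hosts * r0 ** g for g in range(1, generations + 1)]


# Hypothetical app A: each user invites 8 friends, 25% of invitations are accepted.
growing = r_zero(8, 0.25)
# Hypothetical app B: each user invites 4 friends, 12.5% of invitations are accepted.
shrinking = r_zero(4, 0.125)

print(growing, expected_new_users(100, growing, 3))
print(shrinking, expected_new_users(100, shrinking, 3))
```

An R-zero above 1 means each generation is larger than the last; below 1, the spread dies out. The four objectives listed above are all ways of raising one of the two factors.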

Discussion of what metrics we want and what data we need

After creating a Facebook application, the creator may want to measure the application's success. Depending on the goal, there are many metrics which can be examined to determine the popularity or success of a particular application. To name a few:
  • As discussed below, some applications such as FriendsForSale (FFS) use a virtual economy, and the creators may be concerned with inflation in this economy. See below for a more detailed discussion.
  • Active users cause messages to be generated and sent to others. Does this make others more active?
  • A more basic breakdown of users helps the creators determine which market they are in. For example, they may want to explore gender, age, socioeconomic status, and school affiliation of users. For example, are Stanford students more likely to have FFS installed than students at Harvard?
  • The long term effect of an application is also an important metric. If people meet through an application, how does their underlying relationship end up? How do we measure when a user loses interest and the application becomes "stale"?
  • Is the user willing to broadcast his or her activity in the application on the news feed? Also, do they trust the application - which boxes do they check when installing the app?
  • Is the application more likely to grow through internetwork virality or transnetwork virality?
  • The creator may also want to track demographic data on the users. For example, how do age, gender, or location relate to virality? How does weather or time of day affect the likelihood of a new user accepting the invitation to install the app?
  • Revenues and page views are also important, particularly in relation to advertising. How much money is your application actually making you?
  • Conversion rates also provide important insights. We may want to know the percentage of existing users who invite new ones, or how many of those invitations are accepted.

This wiki has code that can be added to a Facebook app to get various daily metrics. It can help you if you want to create an app and measure its success, or it may simply be interesting to see which metrics can be easily obtained.

In order to rate an application on the above metrics, we need data from user activity to answer our questions. For example, when is a notification of activity in the application sent out? If a person uninstalls the application but then reinstalls it, what is his or her first action after the reinstall?

FriendsForSale currently has 7 million users with the application installed, with 20,000 uninstalls per day and 600,000 active users per day. They currently have 50 million users in their database, gathered from invitations sent out.
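
A few simple ratios can be derived from the figures just quoted. This is only a sketch: the metric names are made up here, and the last ratio is a rough indicator rather than a true invitation conversion rate, since the notes do not say exactly who is counted in the 50 million.

```python
def basic_metrics(installed, daily_uninstalls, daily_active, known_users):
    """Turn raw counts into simple ratios (all names here are illustrative)."""
    return {
        "daily_active_share": daily_active / installed,
        "daily_uninstall_rate": daily_uninstalls / installed,
        "installed_share_of_known": installed / known_users,
    }


# Figures quoted in the notes for FriendsForSale.
ffs = basic_metrics(
    installed=7_000_000,
    daily_uninstalls=20_000,
    daily_active=600_000,
    known_users=50_000_000,
)

for name, value in ffs.items():
    print(f"{name}: {value:.2%}")
```

Roughly 8.6% of installed users are active on a given day, about 0.29% uninstall per day, and installed users make up 14% of the users in the database.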

The most powerful part of looking at metrics and data for the Facebook applications is the ability to do cheap and easy experiments after having defined these metrics. Being online, experiments provide quick feedback, and the application creators can determine, for example, which color button encourages the most active users.
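
One way to read the result of such a button experiment is a two-proportion z-test. This is a standard statistical technique brought in here for illustration, not something covered in the lecture, and the click counts below are invented.

```python
import math


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """z statistic for the difference between two conversion rates (B minus A)."""
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (rate_b - rate_a) / standard_error


# Hypothetical experiment: 1000 users see each button color.
z = two_proportion_z(successes_a=100, n_a=1000, successes_b=130, n_b=1000)
print(round(z, 2))
```

A value of |z| above about 1.96 is conventionally taken as significant at the 5% level for a two-sided test, so with these invented counts the second button would look better, though only marginally.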

Another point of interest - Stanford offers a class called "Create Engaging Web Applications Using Metrics and Learning on Facebook."
For the syllabus, click here

The Use of Facebook's Social Data


  • How is a Facebook application different than a regular web app?

    • When a user requests the application, the application gets his or her Facebook data as well.
      • Relationship data for every request
      • Complete list of friends
      • Detailed demographics
        • Marital Status, interests, location, affiliations, books
        • Everything about a user’s friends
      • Where the request came from
  • How is a Facebook app different from a different social network app?

  • How can one use this data?
    Parking Wars' Interface

    • Natively integrate the data
    • Blur lines between uninstalled and installed users (because you get the data of a user's friends)
    • Expose interaction points
    • Example: Parking Wars
      • The application can convert a non-installed user through interaction with his or her installed friends.
      • It natively integrates all of the user's friend data, then encourages the user to interact with those friends.
    • Example: PackRat
      Johnny's expensive turntables are free for the taking.
      • You can steal stuff from a friend who doesn't have the application, and then he is notified that you stole from him, and is encouraged to install the application.
      • This application also natively integrates one's friend data.

  • What is Friends for Sale?

    • "Hot-or-Not with a market economy"
      Johnny's owner can push his pet around.
    • "The most cynical dating application in the world"
    • What is ownership?
      • You have a list of "pets" on your application page
      • You can give a "pet" a nickname
      • You can make your pets do certain things (such as a poke)
    • How do you get money?
      • Logging in
      • Being purchased
    • How do you spend money?
      • Purchasing pets
      • Purchasing gifts
    • Why?
      • "Owning" your friends
      • Meeting new people

The Virtual Economy in FriendsForSale

As mentioned earlier, FriendsForSale (FFS) uses a virtual economy. Since all the "money" is obviously fake in the virtual world, the scarcity in the economy is entirely artificial--there's no technological reason why FFS can't inject more money into the system or allow one person to be owned by multiple people.
Every virtual economy must have a currency faucet (a way for the creators to inject money into the system) and a currency sink (a way for the creators to take money away so that inflation does not become rampant).
There are several ways in which the FFS creators can tweak the amount of money circulating in the economy. The main method for injecting money into the system is by simply giving it to its users--every four hours, a user can log in to the system and receive $2000 in free money.
To take money away, FFS allows users to buy gifts--more suggestive/interesting toys cost more money (e.g. a weird ninja toy costs $500; a heart balloon costs $250,000). However, the main currency sink is FFS's system of transactional taxes--every time someone buys a pet, a tax is charged. The tax is a knob that can be adjusted to the whim of the creators. In order to keep inflation under control, they attempt to keep the average amount of currency per user in the economy roughly constant.
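
The faucet and sink can be put into a back-of-the-envelope balance. Only the $2000 login bonus comes from the notes; the user counts, purchase volume, and gift spending below are hypothetical, and the sketch assumes a constant number of users, so that balancing daily inflow and outflow keeps the average currency per user constant.

```python
LOGIN_BONUS = 2000  # dollars of virtual money per login, from the notes


def daily_faucet(active_users, logins_per_user):
    """Virtual money injected per day through login bonuses."""
    return active_users * logins_per_user * LOGIN_BONUS


def daily_sink(purchase_volume, tax_rate, gift_spending=0):
    """Virtual money removed per day through transaction taxes and gifts."""
    return purchase_volume * tax_rate + gift_spending


def balancing_tax_rate(active_users, logins_per_user, purchase_volume, gift_spending=0):
    """Tax rate at which the money removed equals the money injected."""
    if purchase_volume <= 0:
        raise ValueError("purchase_volume must be positive")
    needed = daily_faucet(active_users, logins_per_user) - gift_spending
    return needed / purchase_volume


# Hypothetical day: 1000 active users logging in twice, $50M of pet purchases,
# $500K spent on gifts.
rate = balancing_tax_rate(1000, 2, 50_000_000, gift_spending=500_000)
print(f"{rate:.1%}")
```

With these invented inputs the faucet injects $4,000,000, gifts absorb $500,000, and a 7% transaction tax removes the rest. If the purchase volume drops, the tax "knob" has to be turned up to keep the balance.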

An illustration of money flowing into the economy.

Interestingly, profiting in FFS does not work exactly like it would in a stock market. Every time a user is sold, the user gets half of the profits. The site's creator compares this to a line of girls vs. guys standing outside a club—people can get rich either by being popular or by being a good investor.
Although the currency is not real, there is some inherent value in money (being able to control people, for example). In fact, many people are willing to pay lots of real money for lots of fake money.

For further reference on virtual economies, these links may be helpful:
Castronova, Edward. "Virtual Worlds: A First-Hand Account of Market and Society on the Cyberian Frontier".
Presentation by Sam Lewis at AGDC, 9/7/06. "Economic Theory and MMOGs".
Raph Koster’s blog is also a good source for random ruminations on the topic of virtual economies:

Finally, is a fantastic resource for new papers on virtual economies.

Web Credibility (talk by Enrique Allen)

Enrique Allen spoke to the class about his research in web credibility. Enrique works with the Stanford Persuasive Technology Lab, which focuses on understanding how computing products can be designed to change what people believe and what they do. This study of computers as persuasive technologies is called "CAPTOLOGY".

Most of Enrique's past research has focused on credibility for general web sites, and he is now beginning to study social networks and web applications as well. He shared that the key ingredient of what makes a website or application credible is people. Trust online is becoming more and more about people and our perception of them based on the general factors of trustworthiness and expertise:


He also touched on the prominence-interpretation theory, which stipulates that a credibility assessment happens only when the user notices something (prominence) and makes a judgment about it (interpretation). The second slide below shows some examples of elements that can increase a credibility assessment.



Enrique also briefly spoke about the growing importance of Status Message Updates (SMU), for example in Facebook. Communicating status is essential to attention, which is our most valuable source of capital. As such, SMU can be a very powerful outlet for persuasion if used correctly. You can visit this page for more of Enrique's thoughts on this subject.

Finally, the following are additional resources on the area of web credibility:

The Future of Facebook

David Kirkpatrick, who writes the column Fast Forward at Fortune Magazine, wrote a column last week about Facebook, discussing its usefulness to adults who join after college. He e-mailed Professor Weigend to ask, “how big a deal student users think Facebook is likely to be over time--and how broad its ramifications are likely to be.” There seemed to be a consensus that people would be using some social networking site for the rest of their lives, but not necessarily Facebook. One student made a comment, salient to David’s column, that his usage of Facebook had become less frequent as he and his friends graduated from college.

Other responses ran the gamut from some who said they would be using Facebook forever to others who said they were likely to switch when the next big thing comes along. Those arguing in favor of Facebook’s longevity pointed to the time users have invested in setting up their profiles and building networks, suggesting that the transition costs would deter people from switching to another social networking site. New sites need to build critical mass before being useful, which will further deter people from switching. However, those who were more skeptical about Facebook’s long-term prospects pointed out that the transition between sites is likely to get easier over time with the increased prevalence of import/export capabilities on sites.

Some bloggers have suggested that Google’s AppEngine and OpenSocial are likely to threaten Facebook:

App Engine, Facebook Platform, OpenSocial, and the Future of the Web (O'Reilly Radar)
Why Facebook's future is not so bright (Computerworld)

Initial Contributors:

Tom Bankston
Shirley Bao
Jaebock Lee
Janine Molino
Nelson Ray
Elizabeth Reinoso
Valerie Smith
Eric Sun
Ross Wait

Other Contributions:

I stumbled upon a video lecture given by Jon Kleinberg on some insights and issues about Social Network Data. Thought I'd share it with the class.
Here is the link:

-- Yi Chai