| Andreas
Weigend
Stanford University
Data Mining and Electronic Business
Stat 252 and MS&E 238
Spring 2008

Note: thanks to James Mao, we have an mp3 version of the first class (33MB). It is at http://www.weigend.com/files/teaching/stanford/recordings/WeigendStanford2008Class1.mp3

Class 1

This wiki is designed to note and extend the first lecture of Data Mining and Electronic Business.
See also last year: http://aweigend.wikispaces.com

Introduction

Data is cheap, or approximately (asymptotically?) free.

Data Revolutions

There have been, roughly, three data revolutions:

1) Online Data Collection
Companies began to realize the potential of collecting data. Amazon.com was a big first mover in this revolution.
Users enjoy sharing information about themselves, and this information is mineable, especially the links.
Facebook - shows how much personal information people share.
NavTech - sold to Nokia, who understood the value of good map data; $200MM to start, sold for $8.5Bn.
Why pay for data when people are willing to give it for free? The question is how to create the incentive for people to provide the data.

3) Consumer Data Revolution
Companies can apply economic models to these emerging data dynamics. In the new consumer data revolution, the user is in the center, and economic systems are now being added to data. Users are beginning to realize that the data they share has value, and they want to be compensated for it.
Roughly organized: (1) 20 years ago, (2) 10 years ago, (3) 5 years ago, (4) right now.

Communication is (essentially) free
1. Data collection - Change in time scale; collection time shrinks from months to minutes as the collection process becomes automated.
2. Experiments - Easy (and cheap) to run side-by-side experiments with web pages.
3. User contribution - Architectures of participation - users create both format and content instead of being "handed tablets" by editors.
4. User interaction - Users can now "connect the dots" by combining existing elements.

Data - If people are paid for giving data, the data might not end up being truthful. The better way is to give people an incentive: using the data they provide to make their lives better.

Sources
1. Wall Street
Metrics
1. Trading models -
3. User centric - (Think Facebook) - metrics move to the point of engagement and away from the company-centric approach.
4. Relationships - This is the next step. We are not quite there yet. How can we get there?

Applications
1. “Idea” - products and services, not necessarily related to the web.
2. E-Business - The company is in the center.
3. Me-Business - Anti-Copernican - the user is in the center.
4. We-Business - Focus on the community - interactions, relationships, networks.

Recommendations
1. Expert
Conversations
1. None - Marketing agencies collect one-way data on opinions and preferences.
2. Push / Targeting - behavioral targeting. Ex) TV
3. Discovery (Pull) - people seeking their own content
4. True conversations - C2C - consumer to consumer

Companies mentioned in this section:
PART 2
• Data mining (insights | data) → Data mining (data | problem)
The 3 flavors of Collective Intelligence
1. Decomposition (parallel execution, similar to MTurk; results are added together)
2. Portfolio (prediction markets). Later we'll have a class about this.
3. Immersion (people creating the architecture of interaction)

Data sources
Data economics
Production: Proprietary → Peer-production ??
Data strategy?

Topics of Interest:
Maps - Ryan's Mashup
Collective Intelligence - Amazon's Mechanical Turk. A small example of an endeavor fuelled by MTurk: SheepMarket.
Facebook - how to use social data

Digital Network Economy
production costs: $high
distribution costs: $0
In other words: the cost of the first item is high, the cost of duplicating and distributing is zero.
ex) Walmart mandates RFID tags
Powerpoint slides - Set 9: New business of Consumer Data - Who pays whom?
Slide 17. Consumer Data in the Digital Networked Economy

Economics of bits: prices have dropped by 5 orders of magnitude over the last 20 years. Storage is free, communication is free. Communication is the heart of this economy.

It used to be that distribution was something people got paid for (e.g. the Chinese TV factory that wants to sell to the US). Now distribution is easy because of standards.

It is now easier to collect data than before. RFIDs: WalMart believes they save more than the revenues of Amazon (numbers?) just by knowing where their stuff is. Amazon collects about a hundred terabytes of clicks per year.
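As a rough check on what "5 orders of magnitude over the last 20 years" implies per year, the arithmetic can be sketched as below. This assumes a constant rate of decline, which the lecture does not claim; the function names are made up for illustration.

```python
import math

def annual_price_factor(orders_of_magnitude: float, years: float) -> float:
    """Yearly multiplier on price, assuming a constant rate of decline."""
    return 10 ** (-orders_of_magnitude / years)

def halving_time_years(orders_of_magnitude: float, years: float) -> float:
    """Years needed for the price to halve under the same assumption."""
    return years * math.log10(2) / orders_of_magnitude

# Figures from the notes: 5 orders of magnitude over 20 years.
factor = annual_price_factor(5, 20)
halving = halving_time_years(5, 20)
print(f"price multiplier per year: {factor:.3f}")
print(f"decline per year: {1 - factor:.1%}")
print(f"halving time: {halving:.2f} years")
```

Under that assumption the price falls by a bit under half each year, so it halves roughly every 14 months.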
Data Types
Bi-directional data flow to improve the GPS system: estimate the flow of traffic and how much time you need to reach a place, using data collected from vehicles that send information back to Dash.net.

Amazon
Amazon makes $40-$50 from each review. Experiment: strip the reviews from one of two very similar books, and measure how much each earns.

All these are examples of how data can be captured to help people make better decisions.

However, privacy can be a concern. In Pay as you Drive Insurance, GPS data can be used to determine that a user is speeding, and thus make him ineligible for claims. DNA results from 23andme.com might lead to higher insurance rates if the person has a higher risk of contracting cancer at a later age.

Fast innovation through experimentation
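A side-by-side web experiment of the kind mentioned in these notes (for example, a page with reviews versus one without) is commonly evaluated by comparing the conversion rates of the two variants. The sketch below uses a two-proportion z-test, which is one standard choice and not necessarily the method used at Amazon; the function name and the visitor counts are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a, conv_b: number of conversions (e.g. purchases) in each variant
    n_a, n_b: number of visitors shown each variant
    Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value

# Hypothetical counts: variant A without reviews, variant B with reviews.
z, p = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

A small p-value suggests the difference between the variants is unlikely to be due to chance alone; with web traffic, enough visitors to reach that point can often be collected quickly.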
Where do people get their information from?
Data Silos and the Attention Economy
Innovative Companies Exemplifying the New Economy: (bidirectional communication - reducing asymmetries of information)
Initial Contributors
|