Office Location: Sequoia 127
Office Hours: MTW 2:15-2:45pm
Stat252 Summer 2005
Class location: Skilling 193 (entrance upstairs)
Class time: MTW 3:00 (on time!) - 5:00pm
Welcome! This page is the skeleton of the course, which consists of 12 classes of 2 hours each. When there is a guest speaker, I tend to split the time with them, framing the problem they are solving in the context of the course. In the table below, the categories Key insights and Data mining opportunities will be filled out as we proceed. I am very open to feedback: please contact me via email with any suggestions or remarks about this page or the course, or, if you are in class, just talk to me. Thank you.
|
Date |
Topic |
Guest Speaker |
|
M 7/18 |
|
- |
|
After class: Read http://weigend.com/Teaching/Stanford/Papers/SeizeTheOccasion%2520RozanskiBollmanLipman%2520StrategyAndCompetition2001.pdf (clustering based on session behavior rather than on history), as well as an Interview on Data Mining and E-Business (SAS Magazine 2005).
Key insights: Size of e-business: 1 trillion USD, problems of counting. Atoms: 1M, Bits: 1Bn units per day. Communication is cheap (VoIP). Central: common protocol. Supply-side economies of scale vs demand-side economies of scale. Network effects. Fast feedback: empowers experiments. Constantly learn: threshold of new things = 0 -> Internet speed.
Data mining opportunities:
Company case: AMZN
Class notes: www.weigend.com/Teaching/Stanford/ppt/Weigend1Ebusiness.ppt |
||
|
T 7/19 |
|
- |
|
Before class: (1) Read the relevant chapters from the textbook by Baldi, Frasconi and Smyth. (2) Install Yahoo desktop search on your computer (or a similar product such as X1, or MSN toolbar). Hand in on one sheet of paper at beginning of class: What are the data mining opportunities and challenges based on the data that are collected with this tool? (10 points for well-formulated, solid thoughts.) actionable
Key concepts: Storage is free. Measuring, Modeling, Predicting, and Acting. RFID. Explicit data vs implicit data. Agents for mobile phones. Creating value to end user. LOTS LOTS of DM opps. Perspective of companies: create transparency. Airline: publishing delays.
Class notes: www.weigend.com/Teaching/Stanford/ppt/Weigend2Transparency.ppt |
||
|
W 7/20 |
|
|
|
Before class: Read www.weigend.com/Teaching/Stanford/Papers/EfficientSchedulingOfInternetBannerAdvertisements AmiriMenon ACMInternetTechnology2003.pdf
Key insights: Advertising is the fuel of the Internet economy. Sources of info for selecting an ad: context, current situation (where from?), past behavior (persistent history / pseudonyms / real person). (Recommendation = ads = navigation.) Purpose: What ad to show to a given visitor to a website. Structure of the industry: Parties involved and value propositions in online advertising. Network vs advertiser vs publisher (= website) perspective. Algorithms for ad placement based on decision trees. Pros and cons of CART compared to alternatives. (Equivalence to linear programming problem allows us to allocate resources based on nonlinear programming.)
Data mining opportunities:
Class notes: |
||
|
M 7/25 |
|
Ulf Reips, |
|
Before class: Create an account on Google’s AdWords or AdSense (or similar products by Yahoo or others) and use it for a site of interest to you. How would you create this powerful set of keyword suggestions? Hand in one page with the idea for the algorithm, and a printout of the page that shows your ad. (5 points for well-formulated, solid thoughts.)
Key insights: Produce info -> find info -> rate info. Relevance function. Two-stage deal. Have to have good search, that enables monetization. “Sell” keywords. Sell: using an auction. Scientific experiments on web. Tools: response time, error rates. Avoid typical mistakes. Recruit subjects. Experiments vs surveys. Paid search. Short-term vs long-term effects.
Data mining opportunities: Search relevance
Company case: GOOG
Class notes: www.weigend.com/Teaching/Stanford/ppt/Weigend3Search.ppt |
||
|
T 7/26 |
|
|
|
After class: Discuss the ingredients (e.g., what inputs) of an algorithm that presents the user with mutual matches based on implicit behavior on the site. What would you need to log? (10 points for well-formulated, solid thoughts.) Frame this in a decision-theoretic framework (10 bonus points). Due Monday 8/8 before class.
Key insights: Limitation to horizontal search. Pros more state / prefs provided.
Data mining opportunities:
Class notes:
|
||
|
W 7/27 |
|
Andrew Sispoidis and Darren Herman (CEO and CCO, www.iga-partners.com) |
|
Before class:
Key insights:
Data mining opportunities:
Class notes: |
||
|
M 8/1 |
|
|
|
Before class: Read www.firstmonday.org/issues/issue2_4/goldhaber/ (also at www.well.com/user/mgoldh/natecnet.html and as PDF at www.weigend.com/Teaching/Stanford/Papers/AttentionEconomy Goldhaber FirstMonday2005.pdf)
Key insights: Platform, network effects, lock-in.
Data mining opportunities:
Company case: MSFT
Additional reading: If you are interested in the networked digital economy, read chapters 1-3 and 5-7 of Shapiro and Varian.
Class notes: www.weigend.com/Teaching/Stanford/ppt/Weigend4Digital.ppt |
||
|
T 8/2 |
|
|
|
Before class: Required is the 1997 classic The Coming Battle for Customer Information (Hagel and Rayport, 1997; Harvard Business Review). For students with more technical background, additional reading is www.cs.washington.edu/homes/pedrod/papers/kdd04.pdf, also available at www.weigend.com/Teaching/Stanford/Papers/AdversarialClassification DalviEtAl KDD04.pdf
Key insights:
Data mining opportunities:
Class notes: |
||
|
W 8/3 |
|
Francisco |
|
Before class: Sign up at www.MusicStrands.com to obtain an understanding of how the discovery system is working. Hand in on one sheet of paper at the beginning of the next class: What are the data mining opportunities and challenges based on the data that are collected with this tool? (10 points for well-formulated, solid thoughts.)
Key insights:
Data mining opportunities:
Class notes: www.weigend.com/Teaching/Stanford/ppt/Weigend6Customers.ppt |
||
|
M 8/8 |
|
|
|
Before class: (Online dating algorithms due, see 7/26)
Key insights:
Data mining opportunities:
Company case: EBAY
Class notes: www.weigend.com/Teaching/Stanford/ppt/Weigend5Markets.ppt
Abstract: Multi-Industry, Multi-Purpose, Multi-Lingual Mining: Application and Language Matters. Text mining applications will ultimately rest on common service-oriented platforms which can support a variety of applications and industries. Factors forcing this outcome include the cost and competitive pressures on enterprises, the maturation of the software/service infrastructure, the likely combination of search and extraction capabilities, and the inherent challenge of robust, linguistic analysis of text. Nevertheless, the particular requirements of different applications and industries must be properly addressed. This talk will illustrate how a general platform can be applied across a range of industries and applications including competitive intelligence, counter-terrorism, law enforcement, publishing, legal, and compliance. |
||
|
T 8/9 |
|
Chris Alden (CEO, www.Rojo.com) |
|
Before class: Create accounts on www.rojo.com and on del.icio.us. Explore the power of discovery in both cases. What data mining opportunities do you see in these applications? Please hand in two thoughtful pages, one for each company. (2 * 10 = 20 points for well-formulated, solid thoughts.)
Key insights: Push ‘vs’ pull?
Data mining opportunities:
Class notes: www.weigend.com/Teaching/Stanford/ppt/Weigend7Innovation.ppt |
||
|
W 8/10 |
|
|
|
Before class: Read www.weigend.com/Teaching/Stanford/Papers/PrivacyInECommerce%2520BerendtGuentherSpiekermann%2520CACM2005.pdf
Key insights:
Data mining opportunities:
Class notes: |
||
Additional topics if time available:
Grading:
Teaching Assistants:
http://www.weigend.com/Teaching/Stanford/index.html
by
| +1 (917) 697-3800
| www.weigend.com