stanford2008

This is a mirror of the 2008 course wiki for Data Mining and Electronic Business (http://stanford2008.wikispaces.com). Some links may be unavailable. For the current wiki, see http://www.weigend.com/teaching.


Andreas Weigend
Stanford University
Data Mining and Electronic Business
Stat 252 and MS&E 238
Spring 2008

Audio (as mp3): http://www.weigend.com/files/teaching/stanford/recordings/WeigendStanford2008Class5

Class 5:

A. Life on an instrumented planet


In the first part of class, Dr Colin Harrison and Dr Martin Fleming of IBM gave their joint talk: Life on an Instrumented Planet. The purpose behind this work is to bring the data about people to bear on the serious problems that mankind faces.

The slides from the talk are at http://weigend.com/files/teaching/stanford/readings/ColinHarrisonMartinFleming.Stanford2008.05.05.ppt

Abstract: We are populating the planet with large numbers of sensors that generate new streams of data. The availability of this data is leading to the emergence of robust science- and heuristics-based models of real-world phenomena. What are the defining characteristics, requirements, and immediate and foreseeable applications of such platforms? How will The Instrumented Planet address a growing number of societal (data about people) challenges?

We also discussed with Colin how his view on the next big thing has changed since his brilliant 1997 talk, http://weigend.com/files/teaching/stanford/readings/970911ColinHarrisonNBT.ppt

1. It's a Brave New World


1.1 The Past:


About four to five decades ago there was abysmally little data about people, though this might be hard for the present generation to accept. In the beginning there was paper: birth certificates were handwritten, drivers' licenses did not have photos, and passports were bound in leather. Around the turn of the century, railway timetables were the first portable information systems: something people could carry around that was not an instrument but 'information'.

Day of the Jackal: The first few pages of this book give an intriguing account of a person who tries to obtain a new identity for himself and goes in search of one in a graveyard.

1.2 Identity of an Individual:

IDENTITY.jpg
A comparison is drawn between the identity of a person 40-50 years ago and that of a person today. In those days, identity was about a physical person; today it is more about a pattern of information about the person than about the physical person. For example, a person's SSN, email address, cell phone number, etc. form an abstraction of the real person behind them, who is almost considered non-existent.

1.3 The Present:


Then came information technology: The new age brought along with it a sudden deluge of information which can be categorized into levels:

Information about People
1. Institutional - given to us by the government, banks, employers, criminal systems, etc.
2. Personal - created by us- like emails, videos, shopping, downloads etc.

Real-World information - Information about our planet
1. Environmental - creating models to represent the state of the earth. Example: Weather forecasting using mathematical models on computers -> satellites, weather stations, images of the universe, experiments to measure etc.
2. Societal - How society behaves - financial systems, transportation systems - integrated global form
recent_sea_level_rise.png
Many new sources of data, many different approaches to obtain data, many different machines to analyze this data. Along with the large amounts of data came large machines that could process huge data and draw conclusions to provide great insights. These insights helped pave the way to numerous innovations.

The picture on the right shows one example of the kind of conclusions these data can give us. Using the data we have at present, we can draw insights about both the past and the future with a certain amount of accuracy. Such data can help us anticipate the problems that the world might face due to natural conditions or disasters, so that we can be as prepared as possible and take as many precautions as possible. Examples include exhaustion of resources, earthquakes, melting of ice and glaciers, droughts, global warming, nuclear power, and electricity. These conclusions cannot be drawn from just one set of data; a lot of data has to be aggregated, combined and analyzed to draw one simple conclusion.

Moore's law has brought us to the point where computation and bandwidth are almost freely available, leading to new forms of innovation. Are there innovations possible for the environmental markets? Oil and energy prices are rising rapidly and show no inclination to stop. If everyone on the planet were given a house and electricity, there would not be enough copper in the resources we have at present to make that possible. Copper plumbing pipes are being phased out in favor of PVC. (Picture to be inserted: Puerto Rican house with copper ripped out.)

A study published in 1972, The Limits to Growth, predicted that humanity would run out of resources. Not all of its predictions came true; copper prices went down for a while.

hg1700_1024.png

Oil still in the ground will last another 100-150 years. Consumption will double in the next 50 years. People will enjoy a higher standard of living and better food, but they will need more energy, doubling our current consumption.
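As a rough check on the doubling claim above, the implied annual growth rate can be worked out with the compound-growth formula. This is only a minimal sketch: the function names are made up for illustration, and the assumption of a constant annual growth rate is ours, not something stated in the talk.

```python
import math


def annual_growth_rate(factor: float, years: float) -> float:
    """Constant yearly growth rate that multiplies a quantity by `factor` in `years`."""
    return factor ** (1.0 / years) - 1.0


def years_to_multiply(factor: float, rate: float) -> float:
    """Years needed to grow by `factor` at a constant yearly `rate`."""
    return math.log(factor) / math.log(1.0 + rate)


# Assumption: consumption doubles over exactly 50 years at a steady rate.
rate = annual_growth_rate(2.0, 50)
print(f"{rate:.2%}")  # roughly 1.4% per year
```

In other words, a doubling over 50 years corresponds to growth of only about 1.4% per year, which is why the doubling can look undramatic from one year to the next.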
g2.jpg g3.jpg g1__.jpg

Climate.jpg
We are entering a time of climate change, and we will need to deal with that in various ways. One of the main problems concerns rainfall and rivers, snowfall, and the decline of snow mass and polar ice caps. The Sierra snow mass provides water for California, and this resource is said to be declining soon. The following rivers are in danger of going into distress (upper line: the ratio of demand/availability begins to approach 2 to 1) or shortage (lower line: the ratio approaches unity).



Similar problems exist in Australia and Sub-Saharan Africa.

Nuclear power station life cycles range from 60 years to a century. How can you predict that in a century there will still be enough water to cool that power station? 15% of California's energy supply is used for pumping water.

We have models for the climate and the oceans, most of them statistical. How do you combine these models with so much data? How can we use this data to guide people on building new nuclear power plants, cities, etc.?

Three Hard Truths:
1. Global demand for energy is rising - driven by population in Asia
2. Global energy supply is not rising - we are not finding new oil that is as easy to extract as old oil; it is expensive and messy, for example, to extract Canadian oil from beneath the sandstone.
3. Burning fossil fuels is bad for the environment - because of global warming, it is probably not a good thing to burn them.

waterusage.jpg

Emerging economies have an advantage over us. They are not stuck with the old infrastructures that we are.

The rising cost of energy and the availability of resources are becoming important issues. Is there enough water for my suppliers to run their plants too?

1.4 What the world faces ahead:


Some of the issues that people could face due to these climate changes are:
  1. Increase in cost of resources and depletion of quantity of resources
  2. New opportunities for services and products
  3. Reputation - Stocks, management of resources, social responsibility. E.g. cornflakes - companies advertise their green-ness, but are they willing to pay a premium for it?
  4. Regulation and Legislation - Government initiatives. Regulation is a major driver in shaping the behavior of consumers in how we consume energy. A new notion is to put a price on an entity that has so far been considered the free property of mankind. The power to do this rests only with the government.

Investors - Investment banks keep track of reports on how companies consume energy. Are these companies going to be exposed in the future in terms of lack of resources, taxes, etc.?

Employees - want or prefer to work for companies who are green

New Businesses - stress drives businesses to adapt, and they adapt through innovation. IBM, for example, develops more energy-efficient computers and software.

China has had a dirty past but now recognizes that it needs to work towards a cleaner way of manufacturing.



2. What roles can IT play in this?


2.1 Present Initiatives:


About 2% of the world's electricity consumption goes to running data centers (IT). The remaining 98% also has to be handled efficiently in order to reduce consumption.
It has now been realized that IT can do a lot to reduce the consequences of the problems that the world faces. IBM has been working in this direction through some market experiments, such as applying machine intelligence to:
  1. Traffic models - to help optimize the traffic flow in big cities (Singapore, Stockholm, London)
  2. Utility networks - smooth the demand for power to enable the generators to run their systems more efficiently so that they can utilize their infrastructure to the best possible level. Improve energy management.
  3. Water management - how we can best use the water we have now, and how to best use aquifers, rivers etc. Improved water quality and reduced water usage.
  4. Carbon credits - reduce emissions of carbon by bringing in solutions like recycling, waste management etc.

2.2 Some strategies:


"Greening" the existing "Brown" - a new strategy that would increase the efficiency and optimize the existing manufacturing and infrastructure systems through use of intelligent methodologies.
This will be done by collecting information through various means: surveillance cameras, security cameras, sensors embedded in the sea bed, etc. Once this large collection of data has been obtained, analytics are run on it to drive large-scale simulations.

"Increasing Adoption of Green"
There are plans to start a new project in Masdar, Abu Dhabi to make it a zero carbon city by changing the transportation system there using the new insights they gain from their experiments. They aim to develop small vans or cars to accommodate 4-6 people at a time and dynamically schedule stops and pick-ups by learning from the passengers instead of a pre-scheduled timetable.

2.3 Areas for Research:

Some avenues open are:
  1. Science: Physics, Synthetic Biology (new materials for purifying water), Climate Physics
  2. Math: Machine learning, Simulation
  3. Systems: Semiconductor physics, Virtualization
  4. Software: Efficiency, Power-aware applications
  5. E-Business: ERP (Enterprise Resource Planning - Integration technology) for utilities, Instrumented Planet



3. The Instrumented Planet


Instrumented Planet is but a move in the direction of using Integration Technology in this field.

3.1 The Idea:


For many years now, financial companies and banks have collected streams of data on transactions and customers and applied analytics to them. These analytical feeds were then sold back as "indicators" that could help guide more "informed" business decisions. This is one idea behind 'The Instrumented Planet': to apply the same approach to environmental and societal data, and to see whether analytics on the streams of data obtained through sensors can yield real-time insights that help in decision making.

3.2 Present Scope:


At present, we have been able to apply this to localized weather forecasting. These data feeds are collected and analyzed for 1 km by 1 km areas, on a time scale of 5-10 minutes. Many people would be interested in this kind of data, for example airports, baseball stadiums and anyone concerned with real-time events. Utility companies are another target customer: they want to keep track of storm fronts and other such natural events to know where they should position their repair teams to make effective use of time once the event has passed.

There are many more such conclusions and applications that can be thought of, and that is the direction in which progress is being made.


3.3 Life on an Instrumented Planet - The Bloomberg of Earth Systems:


lec.jpg The picture on the right captures the architecture of the system we want in place. There are various kinds of data available that can be tapped to draw interesting insights. The collection of earth systems consists of the analytics that will be used all the way from the collection of data to the final decision making. Different varieties of data can be integrated to see what kind of conclusions they might provide. For example, we could combine weather forecasts with traffic flow and, say, the program on TV tonight.

If we want to see how this could better our world, we just have to identify which businesses, processes or entities would benefit from foreknowledge of events, using data such as traffic data, water production, changes in real-time demand for a product, etc.

We then think about how to model these data to make the best of them in the 'Bloomberg' section.

We can think of the end result as some kind of Personal Optimizer that alerts you at the right time to make smart, critical decisions. It would be a means of connecting the individual to what is happening around the world.

Examples of the instrumented planet around us:

Instrumented Home: With the availability of ZigBee- and WiFi-enabled power consumption meters, it becomes possible to publish the consumption rates of a household on the web, enabling us to watch how one compares with peers in a social network. It is also possible to control household devices from the office or on the go. Another example is the SmartAC program launched by PG&E in California. They install a SmartAC controller in households and businesses that sign up. This device monitors the AC usage of the house, the temperature, etc. When consumption in a city reaches dangerous levels that may result in a widespread outage, PG&E uses cellular signals to turn off some of the AC devices for short periods of time.
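The load-shedding idea in the SmartAC example can be sketched in a few lines. This is a hypothetical illustration only, not PG&E's actual algorithm: the function name, the unit identifiers, the numbers and the "largest load first" rule are all assumptions made for the sake of the example.

```python
def select_units_to_cycle_off(ac_loads_kw, total_demand_kw, safe_limit_kw):
    """Pick AC units to switch off briefly until demand is back under the limit.

    ac_loads_kw maps a (hypothetical) unit id to its current load in kW.
    Units are chosen largest load first; returns the list of chosen ids.
    """
    excess = total_demand_kw - safe_limit_kw
    chosen = []
    by_load = sorted(ac_loads_kw.items(), key=lambda item: item[1], reverse=True)
    for unit_id, load_kw in by_load:
        if excess <= 0:
            break  # demand is already at or below the safe limit
        chosen.append(unit_id)
        excess -= load_kw
    return chosen


# Made-up numbers: demand is 6 kW above the safe limit.
loads = {"house_a": 3.0, "house_b": 5.0, "house_c": 2.0}
print(select_units_to_cycle_off(loads, total_demand_kw=106.0, safe_limit_kw=100.0))
```

A real program would also have to rotate which households are affected and cap how long any one unit stays off; the sketch leaves that out.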

Traffic Management and Parking: Siemens of Germany is a specialist in traffic management systems and implemented Ruhrpilot in Germany. You can use this system to see live traffic conditions and find the status of parking areas on the internet and on mobile devices.

Instrumented Automobile: In addition to being a GPS device, the DASH device provides real-time traffic trends and traffic-aware routes for a $10 monthly subscription. Each DASH device transmits destination, route and speed information to a central server that looks out for any adverse traffic conditions on the intended route and suggests live alternatives to drivers. If this technology is widely adopted as a standard, it may be possible to do very efficient routing of large volumes of traffic in cities.
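One way to picture traffic-aware routing is a shortest-path search in which each road segment is weighted by its current travel time rather than its length. The sketch below uses Dijkstra's algorithm for this; it is not a description of how DASH works internally, and the graph layout, node names and speeds are invented for illustration. It assumes every reported speed is positive.

```python
import heapq


def fastest_route(graph, start, goal):
    """Dijkstra's algorithm over travel times.

    graph[node] is a list of (neighbor, length_km, current_speed_kmh) tuples,
    where the speed is assumed to come from live reports. Returns (path, hours),
    or (None, inf) if the goal cannot be reached.
    """
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        hours, node = heapq.heappop(queue)
        if node == goal:
            break
        if hours > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, length_km, speed_kmh in graph.get(node, []):
            candidate = hours + length_km / speed_kmh
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    if goal not in best:
        return None, float("inf")
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1], best[goal]


# Made-up road network: the short road B -> D is congested (10 km/h).
roads = {
    "A": [("B", 10, 50), ("C", 15, 60)],
    "B": [("D", 10, 10)],
    "C": [("D", 15, 60)],
}
print(fastest_route(roads, "A", "D"))  # the longer route via C is faster here
```

When the speed on the congested segment recovers, rerunning the same search with updated speeds picks the shorter road again, which is the "live alternatives" behavior described above.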

3.4 Entities that benefit from fore-knowledge:


  1. Commodity trading: Predicted frosts in Florida can drive up orange juice futures.
  2. City / Regional Management - Flow of traffic, waste water, electricity etc through a city; Major weather events
  3. Transportation - Movement of traffic, real-time tracking of vehicles
  4. Manufacturing - Materials / process optimization
  5. Retail - Weather and Transportation related
  6. Consumers - For use by individuals.



4. People, Places, and Data: What we can do -


4.1 Challenges IT faces ahead:

  1. Find efficient ways to use energy and other resources.
  2. Find ways to extract new sources of energy and other resources.
  3. Find ways to recycle the waste from our consumption of energy and other resources.

4.2 Challenges we face ahead:

  1. How do we influence, induce or motivate human behavior?
  2. What are the incentives required to make people think this way?
  3. Does public access to such information induce people to change their behavior?
  4. How can we make a "user-interface" (some kind of black box) that would make a low-carbon lifestyle appealing to adopt?

Taking small initiatives like changing a light bulb might be a first step, but it is definitely not sufficient. An immense change in our lifestyle is necessary in order to reduce CO2 emissions. It is necessary to recognise that this change does not mean worsening our lifestyle or standard of living.

Behavioral economics has shown us that participation rates are higher when you make things mandatory. There needs to be a trade-off between economic growth and what is good for the planet. We should always remember that actions today will affect future generations.

Some thoughts from the discussion that followed:
    • Read Collapse by Jared Diamond - A book about how societies that have lived in a certain ecological pattern have fared as they continued in the same pattern.
    • The amount of energy it takes to send a gigabyte across the world over the internet equals a block of coal.
    • Switches are very energy intensive.
    • Carbon taxes are being considered for the coming years.
    • There has also been consideration for direct financial incentives to encourage organizations.
    • Developing countries are now trying to find a balance between actions that lead to long-term benefits and those that lead to short-term benefits. This is a very difficult decision for many.




B. What to do with your ideas?


In the second part of class, Warren Spar joined us. We had heard a little about him last week, when we were asked whether we wanted to see him. He hopped on a plane to share some of his Wall Street experience with you, and why he set up his own practice of bringing ideas and money together. Together, we will try to answer any question you have. You might want to look at his presentation http://weigend.com/files/teaching/stanford/readings/WarrenSpar.Stanford.2008.05.05.ppt to get a feeling of where his rich experience lies and what of it can be useful for you. Questions are best written on paper and given to me in the break or during his talk.

Warren met with interested students after class for dinner.

/* If interested, send Warren an email (email removed), with me in cc. He will have one hour at Three Seasons restaurant in Palo Alto, 6:30 – 7:30. */

1. Opportunity and how to seek it:


Zerofootprint.net - A website that helps you understand your present footprint on the world's environment, what you want your future footprint to be and how to realize the distance between them.

Warren Spar joined the firm 8 years ago and has helped countries, corporations and banks to raise money.

1.1 New Ideas:

There are a number of VC companies out there that have their websites and portfolios filled with their success stories. Bessemer Venture Partners stands out among these because of its Anti-Portfolio of some good companies it had turned down, like eBay, Apple, Google, Intel, Intuit and PayPal.

This should be an encouragement to all those who have been turned down by some of the VCs. A VC rejecting you does not necessarily mean your idea will not succeed. Don't be discouraged. You have to knock on a lot of doors to find someone who is interested and will support your idea. E.g. J.K. Rowling was turned down by 12 different publishers, and we all know what a huge success her books were. So don’t get frustrated! Another example from history is Walt Disney, who was fired from a newspaper because he lacked imagination.

It is necessary to move around the market to understand what keeps the market ticking, make acquaintances, acquire strategic corporate partners and do a lot of field digging before you make your pitch. The one thing you need in order to convince anyone about your idea is 'proof of concept'. It is essential to convince them that your idea will sell and that people will be interested in what you do. For example, how do you know people would really use your website, like eBay or Facebook?

Different firms like different businesses: some are more interested in biotechnology, some in enterprise applications. So before you pitch your idea, you might want to determine whether the VC you are approaching has domain expertise in your field and/or has companies in the same market segment that are not direct competitors. A theory from a Stanford psychology professor is self-efficacy: the unshakable belief some people have that they have what it takes to succeed. Even if some VC rejects your idea, you should still be able to sustain your belief in yourself and your idea; you have to keep yourself going.

“If you think that you can or you think that you can’t, you’re probably right” - A quote by Henry Ford. The general belief is that if you are not going to adjust your business model based on feedback from your peers, from someone whose opinion you trust, or from market demand, then you do not have what it takes to make your idea successful. There will be hurdles and bumps, but if you believe in it, you should stick with it.

Compete is a website that has done some analytics on the traffic and usage of a few domains across a few years. We can see in those analytics that most of the domains have lost popularity over time, some tremendously and some minimally. Very few have actually increased in popularity. It is thus a highly competitive market out there, awaiting new players. The market is very dynamic and the cost of entry very low: development cost, bandwidth cost, storage cost, access to the internet and people cost are all really low, and equipment cost might be the one thing to consider. If you are starting off right now, all you have to do is come up with a good idea or a good solution. If you are an incumbent in the area, you have to be very careful. There is always room in each of these fields for more players, but to be accepted you need to convince a lot of people that you can reach and sustain popularity. All the big giants in a field could very well be torn to pieces by newcomers in, say, another 12 months; you never know. From the perspective of newcomers, all they should care about is coming up with an idea or solution.

The internet is being transformed rapidly, with more processor power, more storage, more bandwidth and more data, and thus there is tremendous potential for change in the months and years to come. Things are constantly evolving: whether you are an incumbent or a newcomer, you have to be constantly on the watch and on top of change.

1.2 How do people come up with ideas:


People come up with ideas and companies because they have an inkling or an intuition, some information or data, or some empirical perception of what people want to see out there. Sometimes the idea comes from disagreements within a prior company, and this disagreement spurs them to break away and start afresh (e.g. Michael Bloomberg). There are a lot of different reasons for people to come up with new ideas. But whatever the reason, you really need a lot of passion and a lot of conviction to be able to pursue the idea and make it fruitful. The biggest ideas are often born out of ordinary needs!

1.3 Tidbits about the Venture Capital Market:


The amount of venture capital is enormous, at around $20-25B/year in the US alone. The average size of a deal is about 4-5 million dollars. The venture market has been very stable over the years in spite of various disruptions. The reason is that the amount of intelligence, talent, intuition, hunger of entrepreneurs, ingenuity and interest that people have is still very large and consistent. Capital needs to be raised for the modeling, development and implementation of the idea.

The market here is largely divided into segments with the largest of them being the following:
  1. Biotechnology - mainly because large amounts of data are available and the ability to process them is still being explored.
  2. Internet - for obvious reasons.
  3. Clean-tech - people looking at alternative energy and resources.

Transformation in China: In the first quarter of this year, $950M venture financing closed with 120 different companies. Of these only 25% was from US VCs and 75% from Indigenous VCs or European VCs.

When you want to start a company and retain your ownership in it, it is advisable to keep the capital raised as low as possible. Companies go through many rounds of capital raising. The first round is when they start off with the idea and are developing it; the capital raised during this period is quite low. The second round is when they enter various markets and are expanding their business. The final round is when they are ready to jump full-fledged into international markets or new markets. The capital raised increases with the age of the venture, because by then there are some signs of the future that the venture has: the business model might have been proven, the end market might have been proven, etc.

1.4 Why do people decide to put money into these ventures?

One main reason is that they end up being a stakeholder in that company.

Risk/Reward Profile:

  1. Private Equity: Invest in private companies that have mostly established themselves by then. It is a lower risk and modest returns area.
  2. Angel investors: High risk, High returns: This is when you go to your family and friends asking them to invest in your venture.
  3. Early Stage firms: High risk, High returns: They invest in new ventures when they find potential.
  4. Growth Equity: Medium risk, modest returns.

1.5 What makes a good business model:


The following are some of the criteria a VC uses to evaluate a new venture model.
  1. Total Addressable Market – VCs look for $1B market sizes. VCs prefer to work with ventures that have huge potential markets. That definitely does not mean that businesses on the scale of $10-50 million do not flourish; there are a lot of examples of these too.
  2. Conversion/Adoption Rates (low cost to go in, easy application access, available for download in most cases) - what is the brain damage involved in trying to adapt to this new idea? In the case of Facebook, Google, eBay, etc., the barriers to adoption are fairly low: the amount of effort required to adopt them is small.
  3. Revenue Model – How are you going to bring revenue into the company (e.g. annual contracts, advertisements, subscriptions, selling traffic somehow), so that at some point the company becomes self-sustaining?
  4. Growth Rates – think about how quick these growth rates should be – low at the beginning (relatively simple features, haven’t determined what the market place wants)
  5. Concentration of Client Base - Customer concentration.
  6. Depth of Management Team - There will be lot of issues to consider, and thus being able to have a good team to delegate all the work efficiently is necessary.
  7. Barrier to Entry /IP - Related to patents and Intellectual property. Might be a short lived thing that you will have to milk for all it can give if it is not a sustainable idea.

1.6 How do people come up with business models:


There are two ways to go about this:

1.6.1 Think “In-the-Box”

  1. Mimic Ideas from Overseas: Like Auction sites or Ring tones. E.g. GILT

  2. Improve on Existing Technology / Idea

  3. Apply One Technology to Another Industry: Like the GPS

1.6.2 Think “Out-of-the-Box”

  1. Create a Whole New Market

  • Social Networks

  • Clean Tech


1.7 Startup Tips


The Earliest Capital is the Most Expensive!
  • Beg your friends and family for money
  • Don’t pay yourself
  • Beg your developer friends to build the software for you
  • Give sweat equity to whomever you need in the early stages of the company
  • Penny pinch on everything – you most likely don’t need a CFO, nice offices, etc. Just get the product out of the door!

Yale used to give grants, and then you would pay back 2% of your future income stream.

Using a Knowledgeable Banker:

  • Use a banker with domain expertise.
  • Avoids conflicts with potential investors
  • Creates a perception of professionalism and investor demand
  • Running a sale process increases valuations and will result in better transaction terms
  • Bankers have strong relationships
  • Bankers have valuable insights into markets
  • Bankers can offer "out-of-the-box" thoughts on potential investors

The Capital Raise Process:

STAGE I - Develop Marketing Material
  • Executive Summary
  • PowerPoint Presentation
  • Financial Model
STAGE II - Contact Institutions
  • Contact Potential Strategic/Private Equity Investors
  • Distribute Executive Summary
  • Schedule Conference Calls/Meetings
STAGE III - Meetings & Conference Calls
  • Engage in Conference Calls/Meetings
  • Initial Due Diligence
STAGE IV - Review Term Sheets
  • Review Term Sheets
  • Negotiation Process
  • Select Acquirer / Investor
STAGE V - Due Diligence & Closing
  • Complete Due Diligence
  • Documentation
  • Close and Fund

Important Transaction Terms:

  • Pre-Money Valuation + Investment = Post-Money Valuation
  • Participating vs. Non Participating Preferred
  • Liquidation preference
  • Anti-Dilution provisions
  • Pay-to-Play

The world out there is very flat and there is immense potential out there to be tapped and explored.





C. Share your ideas

There are so many good ideas in class, ranging from useful apps to big things. The last half hour of class was set aside to give up to 10 students or groups of students the chance to give a crisp 2-minute presentation. Crisp does not mean a superficial marketing pitch: you need to make it clear what problem you are solving, why it is important, and why you have a chance to get there. It does not need to be a PowerPoint presentation; just think through your points and deliver them clearly. Brief feedback was given in class.

THOSE STUDENTS WHO DID MAKE THESE PRESENTATIONS IN CLASS ARE KINDLY REQUESTED TO WRITE A BRIEF PARA ABOUT THE SAME HERE





D. Remarks on homework

I am happy to hear from some of you that you are having fun with mining Friends for Sale. Since we got Homework 4 up late, the deadline is extended to Thursday May 8. I am greatly looking forward to going through your plots, no more than 5 graphs per group. Those will also be due at Thursday 5pm in my mailbox in the statistics department (in the general slots for Visitors that includes letter W).
Homework 5 (recommender system for delicious) will be posted tomorrow, and will be due on Sunday May 18. May I suggest you get an early start with Homework 5 so you can ask Toby Segaran, who will be in class next week, if there are things about it that are not clear. You can peek at last year’s version at http://stanford2007.wikispaces.com/Stanford2007HW3



James Mao - I am helping to write this page.
Lisa Seeman - I will help write this page.
Pavani Vantimitta - I would like to join the team on this page too. Email: email removed
Sreeram Duvur -email removed
Jiajing Xu - I'm joining to help (email removed)