Data Mining and Electronic Business
Stat 252 and MS&E 238
Audio (as mp3): http://www.weigend.com/files/teaching/stanford/recordings/WeigendStanford2008Class5
Table of Contents
A. Life on an instrumented planet
In the first part of class, Dr Colin Harrison and Dr Martin Fleming of IBM gave their joint talk: Life on an Instrumented Planet. The purpose behind this work is to connect the serious problems that mankind faces with the data about people.
The slides from the talk are at http://weigend.com/files/teaching/stanford/readings/ColinHarrisonMartinFleming.Stanford2008.05.05.ppt
Abstract: We are populating the planet with large numbers of sensors that generate new streams of data. The availability of this data is leading to the emergence of robust science- and heuristics-based models of real-world phenomena. What are the defining characteristics, requirements, and immediate and foreseeable applications of such platforms? How will The Instrumented Planet address a growing number of societal (data about people) challenges?
We also discussed with Colin how his view on the next big thing has changed since his brilliant 1997 talk, http://weigend.com/files/teaching/stanford/readings/970911ColinHarrisonNBT.ppt
1. It's a Brave New World
1.1 The Past:
About four to five decades ago, there was very little data about people, though this might be hard for the present generation to accept. In the beginning there was paper: birth certificates were handwritten, driver's licenses did not have photos, and passports were bound in leather. Around the turn of the century, railway timetables were the first portable information systems: something people could carry around that was not an instrument but 'information'.
Day of the Jackal: The first few pages of this book give an intriguing account of a person who, trying to obtain a new identity for himself, goes in search of one in a graveyard.
1.2 Identity of an Individual:
A comparison was drawn between the identity of a person 40-50 years ago and that of a person today. In those days, identity was about a physical person; today it is more a pattern of information about the person than the physical person. For example, a person's SSN, email address, cell phone number, etc. form an abstraction of the real person behind them, who is almost considered non-existent.
1.3 The Present:
Then came information technology: The new age brought along with it a sudden deluge of information which can be categorized into levels:
Information about People
1. Institutional - given to us by the government, banks, employers, criminal systems, etc.
2. Personal - created by us- like emails, videos, shopping, downloads etc.
Real-World information - Information about our planet
1. Environmental - creating models to represent the state of the earth. Example: Weather forecasting using mathematical models on computers -> satellites, weather stations, images of the universe, experiments to measure etc.
2. Societal - How society behaves - financial systems, transportation systems - integrated global form
There are many new sources of data, many different approaches to obtaining data, and many different machines to analyze it. Along with the large amounts of data came large machines that could process it and draw conclusions that provide great insights. These insights helped pave the way to numerous innovations.
The picture on the right shows one example of the kind of conclusions these data can give us. Using the data we have at present, we can draw insights about both the past and the future with a certain amount of accuracy. Such data can help us determine the problems that the world might face due to natural conditions or disasters. It helps us be as prepared as we can and take as many precautions as possible. Examples include exhaustion of resources, earthquakes, melting of ice and glaciers, droughts, global warming, nuclear power, and electricity. These conclusions cannot be drawn from just one set of data: a lot of data has to be aggregated, combined, and analyzed to draw one simple conclusion.
Moore's law brings us to the point where computation and bandwidth are freely available, leading to new forms of innovation. Are there innovations possible for the environmental markets? Oil and energy prices are rising rapidly and show no inclination to stop. If everyone on the planet were given a house and electricity, there would not be enough copper in the resources we have at present to make that possible. Copper plumbing pipes are being phased out for PVC. Insert picture of Puerto Rican house with copper ripped out.
A study published in 1972, The Limits to Growth, predicted that humanity would run out of resources. Not all of its predictions came true: copper prices went down for a while.
Oil still in the ground will last another 100-150 years. Consumption will double in the next 50 years: as more people attain a higher standard of living and better food, they will need more energy, doubling our current consumption.
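The two figures above interact: if the 100-150 year estimate is read as "at today's rate of consumption" (an assumption; the talk did not say which way it was meant), then a doubling of consumption every 50 years shortens it considerably. The sketch below works through that arithmetic under the further assumption of smooth exponential growth. The function names are illustrative only.

```python
import math


def annual_growth_for_doubling(years: float) -> float:
    """Constant yearly growth rate that doubles consumption in `years`."""
    return 2 ** (1 / years) - 1


def years_until_exhausted(static_years: float, doubling_years: float) -> float:
    """Years a reserve lasts if consumption grows exponentially.

    static_years: how long the reserve would last at today's consumption.
    doubling_years: time for consumption to double (continuous growth assumed).
    """
    g = math.log(2) / doubling_years  # continuous growth rate per year
    # Cumulative consumption after T years is (e^(g*T) - 1) / g, measured in
    # units of today's yearly consumption. Set it to static_years, solve for T.
    return math.log(1 + g * static_years) / g


rate = annual_growth_for_doubling(50)
low = years_until_exhausted(100, 50)
high = years_until_exhausted(150, 50)
print(f"growth: {rate:.2%} per year, reserve lasts {low:.0f} to {high:.0f} years")
```

Under these assumptions the implied growth is about 1.4% per year, and the reserve lasts a little over 60 to a little over 80 years rather than 100-150.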
We are entering a time of climate change, and we will need to deal with that in various ways. One of the main problems concerns rainfall and rivers, snowfall, and the decline of snow mass and polar ice caps. The Sierra snow mass provides water for California, and this resource is said to be declining soon. The following rivers are in danger of going into distress: at the upper line, the ratio of demand to availability begins to approach 2 to 1; at the lower line (shortage), it approaches unity.
Similar problems exist in Australia and Sub-Saharan Africa.
Nuclear power station life cycles range from 60 years to a century. How can you predict that in a century there will still be enough water to cool that power station? 15% of California's energy supply is used for pumping water.
We have models for the climate and for the oceans, most of them statistical. How do you combine these models with so much data? How can we use this data to guide people in building new nuclear power plants, cities, etc.?
Three Hard Truths:
1. Global demand for energy is rising - driven by population in Asia
2. Global energy supply is not rising - we are not finding new oil that is as easy to extract as old oil; it is expensive and messy, for example, to extract Canadian oil from beneath the sandstone.
3. Burning fossil fuels is bad for the environment - because of global warming, it is probably not a good thing to burn it.
Emerging economies have an advantage over us. They are not stuck with the old infrastructures that we are.
The rising cost of energy and the availability of resources are becoming important issues. Is there enough water for my suppliers to run their plants too?
1.4 What the world faces ahead:
Some of the issues that people could face due to these climate changes are:
Investors - Investment banks keep track of reports of how companies consume energy. Are these companies going to be exposed in the future in terms of lack of resources, taxes, etc.?
Employees - want or prefer to work for companies that are green
New Businesses - stress drives adaptation of business, and adaptation happens through innovation. IBM develops more energy-efficient computers and software
China has had a dirty past but now recognizes that it needs to work towards a cleaner way of manufacturing
2. What roles can IT play in this?
2.1 Present Initiatives:
About 2% of the electricity consumption in the world is for running data centers across the world (IT). The remaining 98% has to be handled efficiently in order to reduce consumption.
It has now been realized that IT can do a lot to reduce the consequences or effects of the problems that the world faces. IBM has been working in this direction through some market experiments they are trying, like:
Trying to apply machine intelligence to
2.2 Some strategies:
"Greening" the existing "Brown" - a new strategy that would increase the efficiency and optimize the existing manufacturing and infrastructure systems through use of intelligent methodologies.
This will be done by collecting information through various means: surveillance cameras, security cameras, sensors embedded in the sea bed, etc. Once this large collection of data has been obtained, analytics are run on it to drive large-scale simulations.
"Increasing Adoption of Green"
There are plans to start a new project in Masdar, Abu Dhabi to make it a zero carbon city by changing the transportation system there using the new insights they gain from their experiments. They aim to develop small vans or cars to accommodate 4-6 people at a time and dynamically schedule stops and pick-ups by learning from the passengers instead of a pre-scheduled timetable.
2.3 Areas for Research:Some avenues open are:
3. The Instrumented Planet
Instrumented Planet is but a move in the direction of using Integration Technology in this field.
3.1 The Idea:
For many years now, financial companies and banks have collected streams of data on transactions and customers and applied analytics to them. These analytical feeds were then sold back as "indicators" that could help guide more "informed" business decisions. One idea behind 'The Instrumented Planet' is to apply the same approach to environmental and societal data: to see whether such analytics, run on the streams of data obtained through these sensors, can produce real-time insights that help in decision making.
3.2 Present Scope:
At present, we have been able to apply this to localized weather forecasting. These data feeds are collected and analyzed for a 1 km by 1 km area, on a time scale of 5-10 minutes. There are many people who would be interested in this kind of data, for example airports, baseball stadiums, and anyone concerned with real-time events. Utility companies are another target customer: they want to keep track of storm fronts and other such natural events to know where they should position their repair teams, so as to make effective use of time once the event has passed.
There are many more such conclusions and applications that can be thought of, and that's the direction in which progress is being made.
3.3 Life on an Instrumented Planet - The Bloomberg of Earth Systems:
The picture on the right captures the architecture of the system we want in place. There are various kinds of data available that can be tapped to draw interesting insights. The collection of earth systems consists of the analytics that will be used all the way from the collection of data to the final decision making. Different varieties of data can be integrated to see what kind of conclusions they might provide. For example, we could combine weather forecasts with traffic flow and, say, the program on TV tonight.
If we want to see how this could better our world, we just have to identify what businesses, processes, or entities would benefit from fore-knowledge of events like traffic data, water production, changes in real-time demand for a product, etc.
We then think of how to model these data to make the best of them in the 'Bloomberg' section.
We can think of the end result as a kind of Personal Optimizer that prompts you at the right time to make smart, critical decisions. It would be a means of connecting the individual to what is happening around the world.
Examples of the instrumented planet around us:
Instrumented Home: With the availability of ZigBee- and WiFi-enabled power consumption meters, it becomes possible to publish the consumption rates of a household on the web, letting us see how we compare with peers in a social network. It is also possible to control household devices from the office or on the go. Another example is the SmartAC program launched by PG&E in California. They install a SmartAC controller in households and businesses that sign up. This device monitors the AC usage of the building, the temperature, etc. When consumption in a city reaches dangerous levels that may result in a widespread outage, PG&E uses cellular signals to turn off some of the AC devices for short periods of time.
Traffic Management and Parking: Siemens of Germany is a specialist in traffic management systems and implemented Ruhrpilot in Germany. You can use this system to see live traffic conditions and to find parking availability on the internet and on mobile devices.
Instrumented Automobile: The DASH device, in addition to being a GPS device, provides real-time traffic trends and traffic-aware routes for a $10 monthly subscription. Each DASH device transmits destination, route, and speed information to a central server that looks out for any adverse traffic conditions on the intended route and suggests live alternatives to drivers. If this technology is widely adopted as a standard, it may be possible to do very efficient routing of large volumes of traffic in cities.
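To make the SmartAC idea concrete, here is a minimal sketch of how a utility might choose which AC units to switch off briefly when city-wide load exceeds a safe limit. This is not PG&E's actual algorithm, which was not described in class. The `ACUnit` fields, the fairness rule (units that have been off least today go first), and all numbers are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ACUnit:
    unit_id: str
    load_kw: float          # current draw of this unit
    minutes_off_today: int  # how long it has already been cycled off


def units_to_cycle_off(units, total_load_kw, limit_kw):
    """Pick units to switch off briefly until total load is under the limit.

    Units that have been switched off least today are chosen first, so the
    inconvenience is spread across subscribers (an assumed fairness rule).
    """
    selected = []
    excess = total_load_kw - limit_kw
    for unit in sorted(units, key=lambda u: u.minutes_off_today):
        if excess <= 0:
            break
        selected.append(unit.unit_id)
        excess -= unit.load_kw
    return selected


# Hypothetical subscribers and loads
units = [
    ACUnit("home-a", 3.0, 30),
    ACUnit("home-b", 2.5, 0),
    ACUnit("shop-c", 6.0, 15),
]
chosen = units_to_cycle_off(units, total_load_kw=1005.0, limit_kw=1000.0)
print(chosen)
```

With these made-up numbers the load is 5 kW over the limit, so the two units that have been off the least are cycled off and the third is left running.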
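The traffic-aware routing described here can be sketched as a shortest-path search in which each road's cost is its current travel time rather than its length. The example below uses Dijkstra's algorithm; whether DASH's server works this way was not stated, so treat this as one plausible approach. The road network, the speeds, and the function name `fastest_route` are all hypothetical, and reported speeds are assumed to be positive.

```python
import heapq


def fastest_route(roads, speeds_kmh, start, goal):
    """Dijkstra's shortest path using travel time (hours) as the edge cost.

    roads: {node: [(neighbor, length_km), ...]}
    speeds_kmh: {(node, neighbor): current speed reported by devices}
    """
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        time_so_far, node = heapq.heappop(queue)
        if node == goal:
            break
        if time_so_far > best.get(node, float("inf")):
            continue  # stale queue entry, a faster way here was already found
        for neighbor, length_km in roads.get(node, []):
            candidate = time_so_far + length_km / speeds_kmh[(node, neighbor)]
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    if goal not in best:
        return None, float("inf")
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1], best[goal]


# Hypothetical network: a short route via B and a longer, faster road via C
roads = {"A": [("B", 10), ("C", 20)], "B": [("D", 10)], "C": [("D", 20)], "D": []}
free_flow = {("A", "B"): 60, ("B", "D"): 60, ("A", "C"): 90, ("C", "D"): 90}
jammed = {**free_flow, ("B", "D"): 10}  # devices report a jam between B and D

print(fastest_route(roads, free_flow, "A", "D")[0])
print(fastest_route(roads, jammed, "A", "D")[0])
```

When the devices report slow speeds between B and D, the same search returns the detour via C. This is the "live alternative" behavior described above.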
3.4 Entities that benefit from fore-knowledge:
4. People, Places, and Data: What we can do -
4.1 Challenges IT faces ahead:
4.2 Challenges we face ahead:
Taking small initiatives like changing a light bulb might be a first step, but it is definitely not sufficient in the end. An immense change in our lifestyle is necessary in order to reduce CO2 emissions. It is necessary to recognise that this change does not mean worsening our living style or standards.
Behavioral economics has shown us that participation rates are higher when you make things mandatory. There needs to be a trade-off between economic growth and what is good for the planet. We should always remember that actions today will affect future generations.
Some thoughts from the discussion that followed:
B. What to do with your ideas?
In the second part of class, Warren Spar joined us. We had been told a little about him last week, when we were asked if we wanted to see him. He hopped on a plane to share some of his Wall Street experience with you, and why he set up his own practice of bringing ideas and money together. Together, we will try to answer any question you have. You might want to look at his presentation http://weigend.com/files/teaching/stanford/readings/WarrenSpar.Stanford.2008.05.05.ppt to get a feeling of where his rich experience lies and what of it can be useful for you. Questions are best written on paper and given to me in the break or during his talk.
Warren met with interested students for dinner after class.
/* If interested, send Warren an email (email removed, please with me in cc, his mobile is). He will have one hour at Three Seasons restaurant in Palo Alto, 6:30 – 7:30.*/
1. Opportunity and how to seek it:
Zerofootprint.net - A website that helps you understand your present footprint on the world's environment, what you want your future footprint to be and how to realize the distance between them.
Warren Spar joined the firm 8 years ago and helped countries, corporations, and banks raise money.
1.1 New Ideas:
There are a number of VC companies out there that have their websites and portfolios filled with their success stories. Bessemer Venture Partners stands out among these because of its Anti-Portfolio of good companies it turned down, such as eBay, Apple, Google, Intel, Intuit, and PayPal.
This should be an encouragement to all those who have been turned down by VCs. A VC's reasons for rejecting you do not necessarily mean your idea will not succeed. You have to knock on a lot of doors to find someone who is interested and will support your idea. For example, J.K. Rowling was turned down by 12 different publishers, and we all know what a huge success her books were. So don’t get frustrated! Another example from history is Walt Disney, who was fired from a newspaper because he "lacked imagination".
It is necessary to move around the market to understand what keeps it ticking, make acquaintances, acquire strategic corporate partners, and do a lot of field digging before you throw your pitch. The one thing you need in order to convince anyone about your idea is 'proof of concept'. You have to convince them that your idea would sell and that people will be interested in what you do. For example, how do you know people would really use your website, as they do eBay or Facebook?
Different firms like different businesses: some are more interested in biotechnology, some in enterprise applications. So before you pitch your idea, first determine whether the VC you are approaching has domain expertise in your field and/or has companies in the same market segment that are not direct competitors. A theory from a Stanford psychology professor is self-efficacy: the unshakable belief some people have that they have what it takes to succeed. Even if some VCs reject your idea, you should be able to sustain your belief in yourself and your idea; you have to keep yourself going.
“If you think that you can or you think that you can’t, you’re probably right” is a quote by Henry Ford. The general belief is that if you are not going to adjust your business model based on feedback from your peers, on feedback from someone whose opinion you trust, or on market demands, then you do not have what it takes to make your idea successful. There will be hurdles and bumps, but if you believe in it, you should stick with it.
Compete is a website that has done some analytics on the traffic and usage of a few domains across a few years. Those analytics show that most of the domains have lost popularity over time, some tremendously and some minimally; very few have actually increased in popularity. It is thus a highly competitive market out there, awaiting new players. The market is very dynamic and the cost of entry is very low: development, bandwidth, storage, internet access, and people costs are all really low, though equipment cost might be one thing to consider. There is always room in each of these fields for more players, but to be accepted you must convince a lot of people that you can reach and sustain popularity. If you are starting off right now, all you have to do is come up with a good idea or a good solution. If you are an incumbent, you have to be very careful: the giants in a field today could very well be torn to pieces by newcomers in, say, another 12 months. You never know.
The internet is being transformed rapidly, with more processor power, more storage, more bandwidth, and more data, and thus there is tremendous potential for change in the months and years to come. Things are constantly evolving: whether you are an incumbent or a newcomer, you have to be constantly on the watch and on top of change.
1.2 How do people come up with ideas:
People come up with ideas and companies because they have an inkling or an intuition, some information or data, or some empirical perception of what people want to see out there. Sometimes the idea comes from disagreements within a prior company, and this disagreement spurs them to break away and start afresh (e.g. Michael Bloomberg). There are a lot of different reasons for people to come up with new ideas. But whatever the reason, you really need a lot of passion and a lot of conviction to pursue the idea and make it fruitful. The biggest ideas are often born out of ordinary needs!
1.3 Tidbits about the Venture Capital Market:
The amount of venture capital is enormous, at around $20-25B/year in the US alone. The average size of a deal is around 4-5 million dollars. The venture market has been very stable over the years in spite of various disruptions. The reason is that the intelligence, talent, intuition, hunger, ingenuity, and interest of entrepreneurs remain large and consistent. Capital needs to be raised for the modeling, development, and implementation of the idea.
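As a rough cross-check of the two figures above, dividing the yearly total by the average deal size gives an implied number of deals per year. This count was not stated in the talk. It is only what the quoted ranges imply, and the function name is illustrative.

```python
def deals_per_year(total_usd: float, avg_deal_usd: float) -> float:
    """Implied number of deals, given a yearly total and an average deal size."""
    return total_usd / avg_deal_usd


# Pair the smallest total with the largest deal size, and vice versa,
# to bracket the implied range.
fewest = deals_per_year(20e9, 5e6)
most = deals_per_year(25e9, 4e6)
print(f"implied deals per year: {fewest:,.0f} to {most:,.0f}")
```

On these figures the US market would be closing on the order of 4,000 to 6,250 deals a year.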
The market here is largely divided into segments with the largest of them being the following:
Transformation in China: In the first quarter of this year, $950M of venture financing closed with 120 different companies. Of this, only 25% came from US VCs and 75% from indigenous or European VCs.
When you want to start a company and keep your ownership in it, it is advisable to keep the capital raised as low as possible. Companies go through many rounds of capital raising. The first round is when they start off with the idea and are developing it; the capital raised during this period is quite low. The second round is when they enter various markets and are expanding their business. The final round is when they are ready to jump full-fledged into international or new markets. The capital raised increases with the age of the venture, because by then there are some signs of the future that the venture has: the business model might have been proven, the end market might have been proven, etc.
1.4 Why do people decide to put money into these ventures?
One main reason is that they end up being a stakeholder in that company.
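The link between capital raised and ownership can be illustrated with simple dilution arithmetic. The sketch below assumes each round issues new shares only, so existing holders keep the fraction pre-money / post-money of what they had, and it ignores option pools and other terms. The round sizes and valuations are hypothetical and not taken from the talk.

```python
def founder_stake_after_rounds(rounds):
    """Founder ownership after each round.

    rounds: list of (amount_raised, pre_money_valuation) tuples, in order.
    """
    stake = 1.0
    history = []
    for raised, pre_money in rounds:
        post_money = pre_money + raised
        stake *= pre_money / post_money  # existing holders are diluted
        history.append(stake)
    return history


# Hypothetical rounds: (amount raised, pre-money valuation) in dollars
lean = [(1e6, 4e6), (5e6, 15e6), (20e6, 60e6)]
heavy_first_round = [(2e6, 4e6), (5e6, 15e6), (20e6, 60e6)]

print([f"{s:.0%}" for s in founder_stake_after_rounds(lean)])
print([f"{s:.0%}" for s in founder_stake_after_rounds(heavy_first_round)])
```

In this made-up case, raising $2M instead of $1M at the same early valuation leaves the founders with noticeably less after all three rounds, which is the reason for keeping early capital low.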
Risk/Reward Profile: PUT Image here
1.5 What makes a good business model:
The following are some of the criteria a VC uses to evaluate a new venture model.
1.6 How do people come up with business models:
There are two ways to go about this:
1.6.1 Think “In-the-Box”
1.6.2 Think “Out-of-the-Box”
1.7 Startup Tips
The Earliest Capital is the Most Expensive!
Yale used to give grants and then you pay back 2% of future income stream.
Using a Knowledgeable Banker:
The Capital Raise Process (Enlarge):
Important Transaction Terms:
The world out there is very flat and there is immense potential out there to be tapped and explored.
C. Share your ideas
There are so many good ideas in class, ranging from useful apps to big things. The last half hour of class was set aside to give up to 10 students or groups of students the chance to give a crisp 2-minute presentation. Crisp does not mean a superficial marketing pitch: you need to make it clear what problem you are solving, why it is important, and why you have a chance to get there. It does not need to be a PowerPoint presentation; just think through your points and deliver them clearly. Brief feedback was given in class.
THOSE STUDENTS WHO DID MAKE THESE PRESENTATIONS IN CLASS ARE KINDLY REQUESTED TO WRITE A BRIEF PARA ABOUT THE SAME HERE
D. Remarks on homework
I am happy to hear from some of you that you are having fun with mining Friends for Sale. Since we got Homework 4 up late, the deadline is extended to Thursday May 8. I am greatly looking forward to going through your plots, no more than 5 graphs per group. Those will also be due on Thursday at 5pm in my mailbox in the statistics department (in the general slots for Visitors that includes letter W).
Homework 5 (recommender system for delicious) will be posted tomorrow, and will be due on Sunday May 18. May I suggest you get an early start with Homework 5 so you can ask Toby Segaran, who will be in class next week, if there are things about it that are not clear. You can peek at last year’s version at http://stanford2007.wikispaces.com/Stanford2007HW3
James Mao - I am helping to write this page.
Lisa Seeman - I will help write this page.
Pavani Vantimitta - I would like to join the team on this page too. Email: email removed
Sreeram Duvur -email removed
Jiajing Xu - I'm joining to help （email removed)