Background

Electronic commerce (eCommerce) is part of our everyday lives. Whether we purchase a book on Amazon.com, sell a DVD on eBay.com, or click on a sponsored link on Google.com, electronic commerce surrounds us. Electronic commerce also produces a large amount of data: when we click, bid, rate, or pay, our digital “footprints” are recorded and stored. Yet, despite this abundance of available data, the field of statistics has, at least to date, played a rather minor role in developing methods for empirical research related to electronic commerce. The goal of this book is to change that by highlighting the many statistical challenges that eCommerce data pose, by describing some of the methods that are currently being used and developed, and by engaging researchers in this exciting interdisciplinary area. The chapters are written by researchers and practitioners from the fields of statistics, data mining, computer science, information systems, and marketing.

The idea for this book originated from a conference that we organized in May 2005 at the University of Maryland. The theme of this workshop was unique: Statistical Challenges and Opportunities in Electronic Commerce Research. We organized this workshop because, during our collaboration with non-statistician researchers from the area of electronic commerce, we found that there was a disconnect between the available data (and their challenges) and the methods used to analyze those data. In particular, there was a strong disconnect between statistics (which, as a discipline, is based upon the science of data) and the domain research, where statistical methods were used for analyzing electronic commerce data. The conference was a great success: we were able to secure a National Science Foundation (NSF) grant; over 100 participants attended from academia, industry, and government; and, finally, the conference resulted in a special issue of the widely read statistics journal Statistical Science. Moreover, the conference has become an annual event and is currently in its third year (2006 at the University of Minnesota, 2007 at the University of Connecticut; future conferences are planned at NYU and Carnegie Mellon University). All in all, this inaugural conference has created a growing community of researchers from statistics, information systems, marketing, computer science, and related fields. This book is yet another fruitful outcome of the efforts of this community.

Electronic commerce has surged in popularity in recent years. By eCommerce we mean any form of transaction using the internet, such as buying or selling goods or exchanging information related to goods. eCommerce has had a huge impact on the way we live today compared to a decade or so ago: it has transformed the economy, eliminated borders, opened the door to many innovations, and created new ways in which consumers and businesses interact. Although many predicted the death of eCommerce with the “burst of the internet bubble” in the late 1990s, eCommerce is thriving more than ever.

There are many examples of eCommerce. These include electronic transactions (e.g., online purchasing, selling, or investing); electronic marketplaces like Amazon.com or online auctions like eBay.com; internet advertising (e.g., sponsored ads by Google, Yahoo! and Microsoft); clickstream data and cookie-tracking; e-bookstores and e-grocers; web-based reservation systems and ticket purchasing; marketing email and message postings on web-logs; downloads of music, video, and other information; user groups and electronic communities; online discussion boards and learning facilities; open source projects; and many, many more. All of these eCommerce components have a large impact on the economy in general, and they have also transformed the lives of consumers and businesses.

The public nature of many internet transactions has given empirical researchers new opportunities to gather and analyze data in order to learn about individuals, companies, and societies. Theoretical results, founded in economics and psychology and derived for the offline, brick-and-mortar world, have often proven not to hold in the online environment. Possible reasons are the world-wide reach of the internet and the related anonymity of its users, its unlimited resources, its constant availability, and its continuous change. For this reason, and also because massive amounts of high-quality web data are freely available, empirical research is thriving.

The fast-growing empirical eCommerce research has been concentrated in the fields of information systems, economics, computer science, and marketing. However, the availability of this new type of data also comes with many new statistical challenges in the different stages of data collection, data preparation and exploration, and modeling and analysis. These challenges have been widely overlooked in many of these research efforts. The absence of statisticians from this field is surprising. Two possible explanations are the physical distance between researchers from the fields of information systems and statistics, and a technological gap. In the academic world, it is rare to find the two groups or departments located within the same school or college. Information systems departments tend to be located within business schools, whereas statistics departments are typically within the social sciences, engineering, or the liberal arts and sciences. The same disconnect often takes place in industry, where it appears that statisticians are only now slowly being integrated into eCommerce companies. This physical disconnect has kept many statisticians unaware of the exciting empirical work done in information systems departments. The second explanation for the disconnect is the format in which eCommerce data often arrive. eCommerce data, although in many cases publicly available on the web, arrive in the form of HTML pages. This means that putting together a standard database requires collecting HTML pages and extracting the desired information from them. These skills are not common components of a statistics education, and thus the discipline is often unaware of web crawling and related data collection technologies, which open the door to empirical eCommerce research.
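
To make the extraction step concrete, the following is a minimal sketch in Python of how the cells of an HTML table might be turned into database-style records. It is only an illustration under stated assumptions: the page fragment, the bidder names, the amounts, and the helper names TableExtractor and extract_records are all hypothetical and are not taken from any real website or from the chapters of this book. Only the Python standard library is used.

```python
from html.parser import HTMLParser

# Hypothetical HTML fragment standing in for a downloaded auction page;
# the markup and the values are invented for illustration only.
SAMPLE_PAGE = """
<table>
  <tr><th>Bidder</th><th>Amount</th></tr>
  <tr><td>bidder_a</td><td>12.50</td></tr>
  <tr><td>bidder_b</td><td>13.00</td></tr>
</table>
"""


class TableExtractor(HTMLParser):
    """Collect the text of every table cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # row currently being built, if any
        self._cell = None   # text fragments of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while we are inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_records(html):
    """Use the first table row as field names and the rest as records."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = extract_records(SAMPLE_PAGE)
print(records)
```

Real pages are rarely this tidy: layouts change over time, fields go missing, and values arrive as text that still has to be cleaned and converted. Those complications are part of the data collection and preparation challenges referred to above.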

Our collaboration with information systems and marketing colleagues has shown just how much the two sides can benefit from crossing the road. eCommerce data are different from other types of data in many ways, and they pose real statistical challenges. Using off-the-shelf statistical methods can lead to incorrect or inaccurate results and, furthermore, to missing important real effects. The integration of statistical thinking into the entire process of collecting, cleaning, displaying, and analyzing eCommerce data can lead to sounder science and to new research advances. We therefore see this as an opportunity to establish a new interdisciplinary area: empirical research in eCommerce.