De Veaux is Professor of Statistics at Williams College; next
year (2006/7) he will be the Kenan Visiting Professor for Distinguished
Teaching at Princeton University. This year he is an invited professor
at the Université René Descartes in Paris (Medical School).
He was previously an Assistant Professor at the Wharton School and at Princeton.
He has won numerous teaching awards, received both the Wilcoxon and
Shewell awards from the American Society for Quality, and was elected
a fellow of the ASA in 1998.

De Veaux has been a consultant for over 20 years for such Fortune
500 companies as Hewlett-Packard, Alcoa, American Express, Bank
One, GlaxoSmithKline, DuPont, Pillsbury, Rohm and Haas, Ernst and
Young, and General Electric. He is the author of over 30 refereed
journal articles and the co-author of the critically acclaimed
textbooks “Intro Stats”, “Stats: Modeling the World” and “Stats: Data and Models”.

Predictive Analytics Insight was fortunate enough to have
this exclusive interview with the guru.

PAI: As a statistician, what do you understand by the term “Predictive Analytics”?

RD: Predictive analytics is the ability to model the world and to forecast
using data. It means that you are able to answer key questions about your customers
and about your business. In the past, a traditional shopkeeper knew his customers
inside out, and would be able to predict what they needed from day to day.
This is an example of predictive analytics on a small scale, but it is also
highly intuitive. Predictive analytics tries to distill that same kind of intelligence
for consistent, corporate use through algorithms and models.

PAI: How important is the role of statistics within Predictive Analytics?

RD: Many people talk about the perils of information overload,
but it is important to draw a distinction here. Information is
a good thing and you can never have enough. However, most firms
suffer not from too much information, but from too much data.
It is the job of predictive analytics to turn this raw data into
useful information. By analysing it carefully, it is possible
for companies to figure out where they are going. It
all comes down to being able to make the right decisions in an
uncertain world.

PAI: Why do we need software to help distill information from data?

RD: One of the biggest reasons for needing an automated process
is that humans tend to bend intelligence to fit in with their
own experiences and anecdotes. This probably has to do with the
way the human brain has evolved as it is constantly seeking connections
and explanations between different events. For example, if we
experience two unlikely events at the same time, we like to presume
there’s a relationship when it could be nothing more than
coincidence. In a business environment, we call this “the
manager’s folly” as it can lead to some highly dubious
decision making. Just because you did something and then something
else happened doesn’t mean you caused it. You need experiments
to determine that. For this reason it is important to create
data models that will help validate hypotheses.

PAI: Does predictive analytics require a certain cultural mind set before it can work?

RD: Senior managers have to believe in the power of the information
and be prepared to act upon it, otherwise there is little point
in building models and using them to predict. If an executive
has made the decision already there is little point in collecting
data and building models. For a potential product launch, one
of the first questions I ask executives is whether they will
go ahead with the launch if the data tell them to do it. Most
answer yes, of course, so then I ask them what they’ll
do if the data tell them not to launch. If they tell me that
they’ll go ahead anyway, I might advise them to save a
lot of time and money by not collecting the data and employing
predictive analytics. We need to be open to what the models have
to say. That doesn’t mean we shouldn’t be sceptical,
but we have to be open to incorporating the predictions into
our decision making process or there’s little to gain.

PAI: Which industry sector can benefit the most from Predictive Analytics?

RD: Credit card banks and other financial firms are often associated
with the use of Predictive Analytics. But any company that has
business applications, relevant data and access to them can benefit.
Within the company itself, marketing and customer service have
probably made the greatest use of Predictive Analytics up to
now. But in reality, all decision makers are battling against
uncertainty, and could all greatly benefit from high quality
information to boost their chances of success.

PAI: Why is it often associated with customer service?

RD: Well this is one of the killer applications
of Predictive Analytics, as so many organizations are trying
to view their customers holistically through customer data integration.
The ultimate goal of such systems is to provide a snapshot of
the customer’s value to the organization over their entire
lifetime. This means defining their lifecycle within the firm
in order to maximize cross-selling and up-selling opportunities and new
offers, improve retention rates and optimize product portfolios.

PAI: How do you get the most out of Predictive Analytics?

RD: Managers need to remain focused if they are to get the most
out of Predictive Analytics. The danger is that people often
have unrealistic expectations about what such systems are capable
of achieving. A common occurrence is for business goals to
remain too vague. For example, seeking to simply improve profit
by 20 percent in the next quarter is much too broad an aim
to benefit from analytics unless you have a very direct way
of measuring such efficiency. It is far better to achieve this
goal by focusing on more granular details, such as working
out which customers are more likely to respond to a particular
offer.

PAI: So how do you work out what data to focus on?

RD: It’s a good idea to pull together
a team of a manageable size across the business to work out the
business problem, and then determine what data you will need
to glean in order to meet the objectives. These teams will naturally
come up with the most relevant questions through open debate.
Once the goals are clear, the question of whether the right data
have been collected becomes clearer.

PAI: How do you spot significant trends?

RD: Good information is often locked away in data. To release
this information, it is important to build models from data
that will pick out consistent patterns, free of the biases
and anecdotes that humans tend to focus on. Anecdotal information
often consists of nothing more than random events. Assuming
links between them and your actions often leads to incorrect
business decisions.

PAI: What’s an example of a Predictive Analytics success?

RD: For example, by gathering the right data,
whether historical customer behaviour data or demographic
data, banks are able to work out the best time to make their
customers offers such as a new credit card or home refinancing
deals. Or catalogue retailers can work out which customers
are least likely to respond to their catalogues, saving them
a fortune on wasted mail-out costs.

PAI: How does Predictive Analytics keep data reliable?

RD: That is the role of what we call cross-validation. Under
this process, part of the data is used to build the model.
Then the rest is used to see how well the model will work on
data that it hasn’t seen before. In this way it is possible
to select the model that produces the best results based on
unseen data, providing a completely objective analysis. You
can then start to predict behavioural patterns to retest the
model. As the environment changes and new data become available,
this kicks off a constant process of honing and fine tuning.
The data set becomes akin to a living organism and you have
to feed it with new data to make it perform better and better.
With Predictive Analytics you can query data ad infinitum,
which is a key advantage over simple database queries.

PAI: How do you go about deploying your first Predictive Analytics project?

RD: It’s important to resist the temptation of starting
these projects on an enterprise-wide scale, as you can’t
do everything at once. It’s much better to begin with bite-sized
projects that will deliver impressive payoffs and create success
stories to gain further buy-in. It’s important to remember
that Predictive Analytics is a discipline and a process, as well
as technology.

PAI: What’s the most exciting thing about Predictive Analytics?

RD: It means that valuable information is
being put in the hands of people who need to make decisions,
as it becomes a desktop application. Historically, data was
given to a group of analysts or statistics experts but they
would fail to forward it or communicate the results to the
people who could really use it.

PAI: What are the most common causes of failure?

RD: The most likely sticking point is a lack of teamwork as getting
it right is a team effort. The first step is to define the
business objective. It’s imperative to have a specific
question in mind rather than giving in to the temptation to “see
what’s in the data”. And you have to prepare the
data for use. You need to collect the right data and have it
ready to feed into the system, otherwise the whole project
is on hold.

Data management is the next big hurdle. Data sets are large
and complex and you often need to work hard on merging data from
three or four different sources. The system will only deliver
meaningful results if the quality of the data is sufficiently
high.

Giving up too soon is also an occupational hazard. It sometimes
takes a lot of patience to pick up minute fluctuations in the
behavioural norm against a lot of background noise, but it is
worth persevering. Having said that, knowing when to give up
can be just as important. Sometimes the answer to a specific
question is simply not within the data’s capability.

It’s also important not to let human preconceptions and
prejudice interfere with the findings as the data can often produce
surprising and unexpected results that can be very illuminating
and valuable when fully analyzed. It’s wrong to jump to
conclusions, but it’s good to generate hypotheses.

PAI: How quickly can you see a return on investment?

RD: The speed of the return on investment depends on a whole
host of variables: how integrated IT is with the rest of the
organization, what level of management is involved and what
the competition is doing. Rather than focus on ROI, I often
ask firms whether they can afford not to predict where they
are going. This sometimes helps them get things into perspective.
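
As a rough illustration of the procedure De Veaux describes in his answer on cross-validation (build the model on part of the data, then judge it on the part it has not seen), the Python sketch below compares two candidate models on held-out data. Everything specific in it is an assumption made for illustration and does not come from the interview: the data are synthetic, the 30 percent hold-out fraction is arbitrary, and the names `split_holdout`, `fit_mean`, `fit_line` and `select_model` are hypothetical.

```python
import random


def split_holdout(rows, holdout_fraction=0.3, seed=0):
    """Shuffle the rows and split them into (build, holdout) parts."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    holdout_size = round(len(shuffled) * holdout_fraction)
    cut = len(shuffled) - holdout_size
    return shuffled[:cut], shuffled[cut:]


def fit_mean(rows):
    """Baseline model: always predict the average response."""
    mean_y = sum(y for _, y in rows) / len(rows)
    return lambda x: mean_y


def fit_line(rows):
    """Least-squares straight line y = intercept + slope * x.

    Assumes the x values are not all identical.
    """
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mean_squared_error(model, rows):
    """Average squared prediction error of a fitted model on the rows."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


def select_model(rows, candidates, holdout_fraction=0.3, seed=0):
    """Fit each candidate on the build part, score it on the holdout part."""
    build, holdout = split_holdout(rows, holdout_fraction, seed)
    scores = {
        name: mean_squared_error(fit(build), holdout)
        for name, fit in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Synthetic data (an assumption): response rises with x, plus noise.
    rng = random.Random(42)
    data = [(x, 2.0 + 0.5 * x + rng.gauss(0, 1)) for x in range(100)]
    best, scores = select_model(data, {"mean": fit_mean, "line": fit_line})
    print("selected model:", best)
    print("holdout errors:", scores)
```

The key design point is that the scores are computed only on rows the models never saw while being fitted, so the comparison does not reward a model for memorising its own build data. Re-running the selection as new data arrive corresponds to the ongoing retesting De Veaux mentions.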