Solving The Wrong Problem: Examples


In the last section, we discussed why understanding your problem is so important, reasons data scientists often fail to do so, and a few suggestions for ways to improve your understanding of the problem you are trying to solve.

In this section, we will work through a longer example to illustrate what “getting the problem wrong, then adapting and getting it right” looks like in practice.

Pizza Ltd. Advertising

Pizza Ltd. is a (fictitious) pizza delivery chain interested in improving its online sales. Last year the company increased its online advertising budget threefold but saw almost no change in online sales, even though in-store sales increased.

You have been hired to help improve the effectiveness of their advertising. Pizza Ltd. provides you with data on their previous advertising campaigns, including information on ad impressions and clicks broken down by user demographics, geography, and past interactions with Pizza Ltd.

“Well,” you reason, “maybe the problem is that Pizza Ltd.’s ads aren’t being shown to the right people. After all, it seems unlikely that any ad for pizza, no matter how appealing, will draw a click if it’s shown to a 75-year-old at 7 am.” And sure enough, the data provided by Pizza Ltd. shows that they are not doing much ad targeting: their ads are being shown to an extremely diverse set of users, including many who probably aren’t that interested in pizza!

Using the data provided, you train a model to answer the question, “given a user’s demographics and online behavior, how likely are they to click on a Pizza Ltd. ad?” You try out a few different models, tune the model parameters, and eventually settle on a neural network model with extremely high precision and recall. Hooray!

As expected, the model shows that Pizza Ltd. was showing too many ads to people who were probably not that interested in pizza, when they should have been targeting people who have ordered from Pizza Ltd. in the past, people searching for “Pizza Ltd.,” and people who live close to Pizza Ltd. locations.

You hand over your model to Pizza Ltd., who immediately reallocate their ads based on your model. Within a week, Pizza Ltd. sees that the share of ad impressions that result in clicks and pizza purchases has increased fivefold. Everyone congratulates you, and you move on to the next project feeling very smug.

The Other Shoe Drops

A few months later, you are called into a meeting with the Pizza Ltd. advertising team, online sales team, and the company’s Chief Financial Officer (CFO). They’ve been looking over the numbers, and despite the huge rise in ad clicks, ad clicks per impression, ad clicks per dollar spent, and clicks that result in sales, when they crunch their quarterly sales numbers they find that, to their surprise, overall online sales haven’t risen at all. Moreover, in-store sales are stable, searches for Pizza Ltd. haven’t declined, and social media sentiment and posting rates all seem stable, suggesting that the lack of growth in online sales isn’t due to a decline in overall demand.

So, what went wrong?

OK, this is the place in most books where the authors ask you that question, and you look up at the ceiling for a minute, shrug, and then read on.

But I’m really, really serious about this: close your laptop, stand up, set a 5-minute timer on your phone, and go for a walk. Ponder this example. See if you can figure out what’s going on. This is precisely the kind of problem you will soon face as a professional data scientist, so why not practice trying to think through the problem?

Counter-Factual Advertising

So how should Pizza Ltd. have approached solving their problem? The answer — as we’ll explore in detail in later readings — is that they should have run an A/B experiment. Track a group of users, and show a random subset of those users a Pizza Ltd. ad. Then measure the effect of the ads by comparing online purchase rates between the group that saw ads and the group that didn’t.
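To make the comparison concrete, here is a minimal sketch of how the effect of an ad might be estimated from an A/B experiment. Everything in it is an assumption for illustration: the function name `ad_lift`, the simulated users, the 10% baseline purchase rate, and the 2-percentage-point ad effect are all made up, and the two-proportion z-statistic is just one simple way to gauge whether a difference is larger than chance would produce.

```python
import math
import random


def ad_lift(treated, control):
    """Estimate an ad's effect as the difference in purchase rates.

    `treated` and `control` are lists of 0/1 purchase outcomes for users
    who were (randomly) shown the ad and users who were not.
    Returns (lift, z), where z is the two-proportion z-statistic.
    """
    n_t, n_c = len(treated), len(control)
    p_t, p_c = sum(treated) / n_t, sum(control) / n_c
    # Pooled purchase rate under the null hypothesis of "no ad effect".
    pooled = (sum(treated) + sum(control)) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se if se > 0 else 0.0
    return p_t - p_c, z


# Hypothetical simulation: these rates are invented, not Pizza Ltd. data.
rng = random.Random(0)
BASE_RATE = 0.10    # assumed purchase rate without an ad
TRUE_EFFECT = 0.02  # assumed increase in purchase rate caused by the ad

treated, control = [], []
for _ in range(20_000):
    saw_ad = rng.random() < 0.5  # random assignment is the key step
    purchase_prob = BASE_RATE + (TRUE_EFFECT if saw_ad else 0.0)
    bought = int(rng.random() < purchase_prob)
    (treated if saw_ad else control).append(bought)

lift, z = ad_lift(treated, control)
print(f"estimated lift: {lift:.3f} (z = {z:.2f})")
```

Note that clicks appear nowhere in this calculation. Because users are assigned to see the ad at random, the control group's purchase rate stands in for what the treated group would have done without the ad.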

This data can then be used to improve targeting by looking at the difference in purchase rates between the group that saw the ad and the group that didn’t for different demographic sub-populations (younger users, men versus women, users in different geographic areas, etc.). And of course this strategy can also be used to test different ads to figure out which ad is most effective.
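The sub-population comparison can be sketched the same way. The function `lift_by_segment`, the segment labels, and the toy counts below are hypothetical and chosen only to show the mechanics. A real analysis would also need enough users per segment for the differences to be distinguishable from noise.

```python
from collections import defaultdict


def lift_by_segment(records):
    """Compute the purchase-rate difference (ad minus no ad) per segment.

    `records` is an iterable of (segment, saw_ad, purchased) tuples.
    Segments missing either a treated or a control user are skipped.
    """
    # For each segment and arm, track [number of purchases, number of users].
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, saw_ad, purchased in records:
        cell = counts[segment][bool(saw_ad)]
        cell[0] += int(purchased)
        cell[1] += 1

    lifts = {}
    for segment, arms in counts.items():
        (buy_t, n_t), (buy_c, n_c) = arms[True], arms[False]
        if n_t and n_c:
            lifts[segment] = buy_t / n_t - buy_c / n_c
    return lifts


# Hypothetical toy data: (segment, saw_ad, purchased). Counts are invented.
records = (
    [("under_35", True, 1)] * 40 + [("under_35", True, 0)] * 60
    + [("under_35", False, 1)] * 40 + [("under_35", False, 0)] * 60
    + [("35_plus", True, 1)] * 10 + [("35_plus", True, 0)] * 90
    + [("35_plus", False, 1)] * 2 + [("35_plus", False, 0)] * 98
)

for segment, lift in lift_by_segment(records).items():
    print(f"{segment}: lift = {lift:+.2f}")
```

In this invented data, the segment with the higher purchase rate is not the one where the ad changes behavior the most, which is why targeting would be based on the per-segment lift rather than on raw purchase or click rates.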

This idea — that the goal of ads is to have an effect on consumer purchase behavior, not to be clicked on — is often referred to as “counter-factual advertising,”[3] and it’s the basis for how nearly all major advertising platforms work today.

It’s also why companies like Meta and Google are so eager to track user behavior across apps and websites. To demonstrate the effectiveness of ads, these companies need to be able to not only track users after they click an ad (to see whether they eventually make a purchase), but also track users who haven’t seen an ad (so they can establish a behavioral baseline for the “control” group of users who haven’t seen an ad). This allows these companies to estimate the true effect of ads on sales, data they use to improve ad targeting and justify higher prices to advertisers.
