Adversarial User Examples#

“Economics can be summed up in four words: people respond to incentives. The rest is commentary.”

  • Steven Landsburg

Whenever we deploy a model or algorithm to make decisions, that deployment changes what behavior is rewarded — that is, that deployment changes the incentives faced by actors. And people respond to incentives. Indeed, even when we try to design our model to reward exactly what we want, alignment issues are sure to arise, creating space between what we want our model to reward and what it actually rewards.

It is for this reason that an empirical regularity may be a good basis for answering Passive Predictive Questions in historical data, but often fails once the model is actually deployed — because deployment itself causes changes in behavior.

To illustrate just how pervasive this is, this reading details a set of examples of adversarial users in domains ranging from internet search to car insurance.

Search Engine Optimization#

The history of Google and other search engines is, essentially, a history of adversarial users ruining things for everyone. Have you ever wondered why, when you search for a recipe online, you have to scroll through paragraphs of pointless narrative before you get to the actual recipe? Or why YouTube thumbnails are full of shocked faces and clickbait titles? In short: adversarial users!

Google’s perpetual challenge is to (a) find features that identify the websites that users want to see at the top of their search results, (b) update their search algorithm to up-rank sites with those features, then (c) find new features to use as everyone figures out what features Google is rewarding and adds them to their spammy sites.

In the beginning, for example, Google’s first ranking algorithm, PageRank, essentially up-ranked sites that were linked to by other sites.[1] The more the web seemed to “like” a site, the higher it would rank in Google! In effect, PageRank outsourced the evaluation of website quality to the web itself, generating results of a quality that quickly turned “Google” from a noun into a verb.

It wasn’t long, though, before people realized there was a way to game this system. If a site owner could increase the number of links to their site, they could increase their ranking and, thus, their site traffic. So people started creating websites just to create links to the page on which they made money. Entire ecosystems emerged of people and sites linking to one another to “artificially” boost rankings, a practice known as Search Engine Optimization (SEO).
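To make the mechanics concrete, here is a minimal sketch of PageRank-style scoring on a toy link graph, followed by the same computation after a “link farm” of spam pages is pointed at one site. This is an illustration of the idea, not Google’s actual algorithm, and every site name and number below is made up.

```python
import numpy as np

def pagerank(links, damping=0.85, iters=100):
    """Toy PageRank via power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict of page -> score (all scores sum to 1).
    """
    pages = sorted(links)
    n = len(pages)
    idx = {p: i for i, p in enumerate(pages)}

    # Column-stochastic transition matrix: M[j, i] = prob. of moving from page i to page j.
    M = np.zeros((n, n))
    for page, outlinks in links.items():
        if outlinks:
            for target in outlinks:
                M[idx[target], idx[page]] += 1 / len(outlinks)
        else:
            M[:, idx[page]] = 1 / n  # dangling page: jump anywhere

    scores = np.full(n, 1 / n)
    for _ in range(iters):
        scores = (1 - damping) / n + damping * M @ scores
    return {p: scores[idx[p]] for p in pages}

# A tiny "honest" web of three sites linking to one another.
web = {
    "news.example": ["recipes.example", "spam.example"],
    "recipes.example": ["news.example"],
    "spam.example": ["news.example"],
}
print(pagerank(web))  # recipes.example and spam.example score the same

# Now spam.example's owner builds a link farm of 20 pages pointing at it.
farm = {f"farm{i}.example": ["spam.example"] for i in range(20)}
print(pagerank({**web, **farm})["spam.example"])  # spam.example now out-ranks recipes.example
```

The farm pages themselves have almost no incoming links, but even the small baseline share of score each one receives gets funneled into the target site, pushing it above sites that earned their links honestly.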

Google, of course, noticed this and shifted metrics. Over the years, Google has been forced to turn to a nearly endless number of different heuristics for evaluating page quality, including things like “time users spend on a page” or “number of user clicks on a result.” Each time this occurred, adversarial users looked for ways to game the system, and even well-intentioned websites were forced to join the rat race as well, often making their sites worse to ensure they could compete with the “high scores” being generated by bad actors.

(If you’re interested in the multitudinous ways in which SEO is responsible for how the internet looks and feels today, you can do no better than this recent feature from The Verge.)

Zillow#

In 2021, the real estate information website Zillow announced that it was shutting down an initiative to use its price models to buy and flip US residential houses, an initiative it later admitted had lost a staggering $881 million in 2021 alone.

So what went wrong? Part of Zillow’s loss appears to have been the result of over-confidence in its ability to predict overall housing market movements. But as pointed out by a group of finance professors at the Stanford Graduate School of Business, another major contributor was likely a phenomenon known as “adverse selection,” a special flavor of the adversarial user problem.

To understand what happened, put yourself in Zillow’s shoes — you have a model that’s quite good at predicting the price at which houses will sell (the estimates from this model are referred to as “Zestimates”). So good, in fact, that you think you could make some money by offering to pay homeowners cash to buy their homes at a discount to your “Zestimate” of the home’s value, then flip the house for what it’s really worth.

(If you’ve never done it before, selling a home in the US is an incredibly complicated and time-consuming process, so it’s not unreasonable to expect many people would accept a slightly low price in exchange for a quick sale.)

So you pour billions into the house flipping business, using your Zestimate model to decide what homes are worth. And… well, you know how this ends: by the end of 2021, you’d lost $881 million. But why?

Well, imagine there are two houses in similar neighborhoods with similar square footage, the same number of beds and bathrooms, the same school system, etc. Suppose your Zestimate for both homes — based on past sales and all publicly available data on the house and a few questions you ask the sellers — is $350,000. So you offer them both $300,000 for their homes, figuring you’ll make a profit of $50,000 on each.

But as a data scientist, you know that all models are imperfect. Sure, on average, your Zestimates are dead on, but neither of these homes is probably worth exactly $350,000. Suppose the first home — Home A — has a truly beautiful view over one of the best parks in the city, and while it’s not too far from the freeway, there is a set of tall apartment buildings between the home and the freeway that block all traffic noise. Let’s suppose that because of all of these factors — factors that aren’t available to the Zillow algorithm — the true value of Home A is actually $450,000.

Now let’s consider Home B. It’s the same distance from the freeway, but where tall apartment buildings block traffic noise from Home A, the local geography channels the noise right at Home B. Moreover, while every other home on the street has a great view of the nearby park, right across the street from Home B is a city electrical utility station. The house also has fewer windows, and the views from all of them are blocked by neighbors’ homes. Again, because of all these factors that aren’t available to the Zillow algorithm, the true value of Home B is actually only $250,000.

So when you, Zillow, make offers of $300,000 to the owners of both Home A and Home B, what do you think each owner will do? Well, obviously, the owner of Home A is gonna think, “I’m getting low-balled. No way I’m selling for $300,000.” And the owner of Home B is gonna say, “Holy cow, what are these idiots thinking? Yes! Please! I will absolutely sell you my house for $300,000!” And Zillow will lose at least $50,000 on Home B even before it has to pay fees and taxes. And that’s how companies lose hundreds of millions trying to buy and sell homes at scale based on models, a phenomenon that has impacted not only Zillow but also other companies that tried to do something similar during this period.
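To see how an unbiased model can still lose money once sellers get to choose, here is a minimal simulation with made-up numbers (and ignoring, for simplicity, the convenience value of a quick sale): the Zestimate errors are symmetric around each home’s true value, but only owners whose homes are worth less than the offer, the Home B’s of the world, accept it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_homes = 100_000

# What each home is really worth (known to the owner, not to the model).
true_value = rng.normal(350_000, 60_000, n_homes)

# The model is unbiased but imperfect: it misses views, traffic noise, etc.
zestimate = true_value + rng.normal(0, 50_000, n_homes)

offer = zestimate - 50_000      # offer a $50k discount to the Zestimate
accepted = offer > true_value   # owners only accept offers above the true value they know

print(f"Share of offers accepted:            {accepted.mean():.1%}")
print(f"Avg profit if every owner accepted:  ${(true_value - offer).mean():,.0f}")
print(f"Avg profit on homes actually bought: ${(true_value[accepted] - offer[accepted]).mean():,.0f}")
```

On paper the deal looks like a $50,000 profit per home; conditional on the owner saying yes, it is a loss, because acceptances come disproportionately from the homes the model over-valued.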

Adverse Selection and Asymmetric Information#

Economists use the term adverse selection to describe this phenomenon. Homeowners were selecting whether to accept Zillow’s offer and doing so in a way that was adverse (bad) for Zillow. Deals that were bad for Zillow were especially likely to happen because the homeowners in our example had private information about the value of their homes that was unavailable to Zillow’s algorithm. In economics, we refer to this as asymmetric information, and it happens everywhere.

Insurance#

Another context in which adversarial users are a problem (again due to adverse selection concerns) is insurance. Consider car insurance, for example. In the United States, car insurance is provided by private companies, and these companies have to make sure that the total amount they take in through monthly premiums from all their clients is enough to cover what they pay out to clients who experience accidents. That means the more safe drivers an insurance company has as clients, the less it will have to pay out for car repairs, and thus the lower it can set its monthly premiums and the more generous it can be with deductibles.

BUT: most people have a sense of whether they are good drivers or not, and the people who get the most out of car insurance are the people who get into a lot of accidents. So if you’re an insurance company, how do you ensure that you aren’t swamped by bad drivers while also offering as low a price as possible to win business?
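One way to see why this matters: if the insurer must charge a single break-even premium based on its current pool’s average cost, the safest drivers find that premium a bad deal and drop out, the pool gets riskier, and the premium has to rise again. Here is a toy sketch of that spiral, with invented numbers.

```python
# Each driver knows their own expected annual claim cost; the insurer only
# sees the pool average. Drivers stay only if the premium isn't too far
# above what they expect to cost. (All numbers are invented.)
drivers = [400, 600, 800, 1200, 2000, 3500]  # expected yearly claim cost per driver
markup_tolerance = 1.25                      # drivers tolerate premiums up to 25% above their own cost

pool = list(drivers)
while pool:
    premium = sum(pool) / len(pool)          # break-even premium for the current pool
    stayers = [c for c in pool if markup_tolerance * c >= premium]
    print(f"premium ${premium:,.0f} -> {len(stayers)} of {len(pool)} drivers stay")
    if len(stayers) == len(pool):
        break
    pool = stayers
```

Each round, the cheapest-to-insure drivers walk away, until only the riskiest driver is left. The two strategies described below are, in effect, attempts to stop this unraveling.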

Reduce the Information Asymmetry#

With car insurance, the main strategy companies use is to try to reduce the information asymmetry that gives rise to adverse selection (i.e., to learn as much as they can about whether a potential customer is a good driver). Car insurance companies collect as much information on clients as they can before setting a price, like customer accident and traffic ticket histories. They also offer discounts to drivers willing to install a device in their car that monitors how they drive (e.g., whether they speed or slam on the brakes) and provides evidence that they are a good driver. One of my students even reported that when they applied for car insurance, the company asked for their GPA!

Insurance companies will even go so far as to secretly buy data that car manufacturers are quietly (and in some cases, illegally) collecting on individuals’ driving behavior.

Limit Opportunities for Selection and Sorting#

Attempting to reduce information asymmetries is one strategy for avoiding adverse selection; another is to limit customers’ opportunities to shop around for the policy that is best for them. That’s the strategy commonly employed by US health insurance companies. If customers could change health insurance any time they wanted, a strategic customer would skip health insurance (or enroll in a low-premium, high-deductible, high-co-pay policy) while they were healthy and then, if they got sick, switch to a more expensive but more generous policy.

To prevent this, US health insurance policies only allow people to enroll or change their policies once a year (during an “open enrollment” period, which is usually October).[2] This limits customers’ ability to change policies in response to sudden changes in their health status.

Similarly, some health insurance policies are only open to qualified customers. For example, Duke has several health insurance policies available only to Duke faculty and employees. Most people who are especially prone to illness can’t just “become a Duke employee,” so adverse selection is less of a concern for this group of customers. Because the pool is less likely to be swamped by high-cost enrollees, the insurer can offer Duke employees lower-cost coverage.