How to Match#

In this reading, I’ll give a high-level summary of how matching works before referring you to a YouTube lesson on the nitty-gritty of a few specific implementations.

Pruning Your Data#

As noted in our last reading (this reading is a follow-on to The Why of Matching, so if you haven’t read that, start there), matching could more appropriately be called “pruning,” as the goal is to winnow down your dataset until you have a set of observations in which your treated and untreated observations look very similar in terms of observable characteristics. So how do we do that?

A simple matching algorithm would proceed like this:

  1. Loop over all your treated observations.

  2. For each treated observation, look for the most similar untreated observation (not already in a pair) in terms of your control variables.

  3. If that untreated observation is too dissimilar to the treated observation, throw away the treated observation. (As the user, you have to pick a threshold for how dissimilar is OK.)

  4. If not, call them a pair and keep both.

  5. When you’ve finished looping over your treated observations, throw away any unpaired untreated observations.

When you’re done, you’ll have a collection of pairs of observations (one treated, one untreated), where the members of each pair are very similar in terms of their observable control variables. All other data has been thrown away.
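To make the steps above concrete, here is a minimal sketch of that loop in Python. It is only an illustration, not a production matching routine: the function name `greedy_match`, the parameter `max_distance`, and the toy numbers are all made up for this example, and it uses plain Euclidean distance as a stand-in for whatever similarity measure you choose.

```python
import numpy as np

def greedy_match(X_treated, X_untreated, max_distance):
    """Greedy one-to-one nearest-neighbor matching with a distance threshold.

    Returns a list of (treated_index, untreated_index) pairs.
    Hypothetical helper written for illustration only.
    """
    X_treated = np.asarray(X_treated, dtype=float)
    X_untreated = np.asarray(X_untreated, dtype=float)
    available = set(range(len(X_untreated)))  # untreated rows not yet in a pair
    pairs = []
    for i, x in enumerate(X_treated):  # step 1: loop over treated observations
        if not available:
            break
        candidates = sorted(available)
        # step 2: find the most similar untreated observation still available
        # (plain Euclidean distance here, purely as a placeholder)
        dists = np.linalg.norm(X_untreated[candidates] - x, axis=1)
        best = int(np.argmin(dists))
        # steps 3-4: keep the pair only if it is similar enough
        if dists[best] <= max_distance:
            j = candidates[best]
            pairs.append((i, j))
            available.remove(j)
    # step 5: anything that never made it into `pairs` is dropped
    return pairs

# Made-up data: one control variable (say, years of education)
treated = [[12.0], [16.0], [30.0]]
untreated = [[11.5], [16.2], [20.0], [13.0]]
pairs = greedy_match(treated, untreated, max_distance=1.0)
print(pairs)
```

In this toy data the third treated observation (30 years) has no untreated observation within the threshold, so it is thrown away, as are the two untreated observations that never get paired.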

To illustrate with the example from the last reading, if we started with this data:

matching_king_1

A simple matching algorithm would probably prune it down to something like this:

matching_king_4

Then, and here’s the cool part, you take this dataset and analyze it just the way you would otherwise! Just run your regression on this dataset!

Measuring Similarity#

The biggest decision you have to make when doing matching is deciding how you want to measure whether two observations are “similar.”

The simplest commonly used strategy is what is called “Mahalanobis Distance Matching” (MDM). In MDM, the distance between two observations is measured as follows:

  1. Calculate the difference between the two observations in their value of each explanatory variable (so in the toy example above, the difference in the value of education).

  2. Normalize each difference by dividing it by the standard deviation of that explanatory variable. This helps ensure that the units of each explanatory variable don’t matter.

  3. Square those normalized values, add them up, and take the square root of the sum.

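(So basically, calculate Euclidean distance in terms of units normalized by variables’ standard deviations. Strictly speaking, Mahalanobis distance also adjusts for correlations between the explanatory variables by using their full covariance matrix, so this recipe matches it exactly only when the variables are uncorrelated.)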
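Here is a small sketch of that simplified distance calculation in Python. The function name `standardized_distance` and the two-variable toy dataset (education and income) are assumptions made up for illustration, and the code follows the recipe as described here, so it ignores any correlation between variables.

```python
import numpy as np

def standardized_distance(a, b, X):
    """Euclidean distance between rows a and b after dividing each
    variable's difference by that variable's standard deviation in X.

    Hypothetical helper; ignores correlations between variables.
    """
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)  # sample standard deviation of each column
    diff = (np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) / sd
    return float(np.sqrt(np.sum(diff ** 2)))

# Made-up data: columns are education (years) and income (dollars)
X = np.array([
    [10.0, 20000.0],
    [12.0, 30000.0],
    [14.0, 40000.0],
    [16.0, 50000.0],
])

d = standardized_distance(X[0], X[1], X)
print(d)
```

Notice that without the normalization, the income difference (10,000) would completely swamp the education difference (2). After dividing by the standard deviations, the two variables contribute equally in this toy example.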

Of course this is not the only strategy – the video linked at the bottom of this reading will direct you to a talk on three very good strategies, as well as their strengths and weaknesses – but that’s the idea.

When Can / Should I Use Matching?#

Matching is best used in a somewhat odd situation: a place where you have some overlap in what your treated and untreated observations look like (called having “common support”) but where you also have some areas where they don’t overlap (imbalance).

The first is necessary because when you prune your data, the goal is to keep only observations that look similar, so you need some area of overlap, or you won’t have anything to match!

At the same time, however, if there’s no imbalance, then you don’t really need to do matching.

So in other words, you use matching in situations where the distributions of your explanatory variables have both areas of overlap and areas of imbalance.
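One rough way to eyeball both conditions for a single explanatory variable is sketched below. Both helpers are hypothetical, the data is made up, and the standardized mean difference is just one common diagnostic I am assuming here, not something this reading prescribes.

```python
import numpy as np

def overlap_range(x_treated, x_untreated):
    """Range of values covered by BOTH groups, or None if they don't overlap."""
    lo = max(np.min(x_treated), np.min(x_untreated))
    hi = min(np.max(x_treated), np.max(x_untreated))
    return (float(lo), float(hi)) if lo <= hi else None

def standardized_mean_difference(x_treated, x_untreated):
    """Difference in group means, in units of the pooled standard deviation."""
    var_t = np.var(x_treated, ddof=1)
    var_u = np.var(x_untreated, ddof=1)
    pooled_sd = np.sqrt((var_t + var_u) / 2)
    return float((np.mean(x_treated) - np.mean(x_untreated)) / pooled_sd)

# Made-up years of education for each group
educ_treated = np.array([12.0, 14.0, 16.0, 18.0])
educ_untreated = np.array([8.0, 10.0, 12.0, 14.0])

overlap = overlap_range(educ_treated, educ_untreated)
smd = standardized_mean_difference(educ_treated, educ_untreated)
print("overlap:", overlap)
print("standardized mean difference:", smd)
```

Here the groups share the range from 12 to 14 years (some common support), but their means are far apart (lots of imbalance), which is the kind of situation where matching has something to offer.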

Checking Balance#

There’s a balance you have to strike when matching: the stricter you are about the maximum dissimilarity you’re willing to accept before you throw out a pair of observations, the more balanced your final dataset will be, but the smaller it will be too.

Right? If you reject any pairs that aren’t almost exactly identical, you’ll end up with less data, but what’s left will be more balanced.

So for your application, you have to decide whether it’s better to have the statistical power of more observations or the better balance that comes with fewer.
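A quick sketch of that trade-off, using a stripped-down, one-variable version of the simple matching loop described earlier in this reading. The function `match_1d`, the thresholds, and the data are all made up for illustration.

```python
import numpy as np

def match_1d(treated, untreated, max_distance):
    """Greedy one-to-one matching on a single variable (hypothetical helper)."""
    available = list(range(len(untreated)))  # untreated indices not yet paired
    pairs = []
    for i, x in enumerate(treated):
        if not available:
            break
        dists = [abs(untreated[j] - x) for j in available]
        k = int(np.argmin(dists))
        if dists[k] <= max_distance:
            pairs.append((i, available.pop(k)))
    return pairs

# Made-up values of one control variable
treated = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
untreated = [8.0, 9.95, 11.3, 12.6, 13.0, 17.0]

counts = []
worst_gaps = []
for threshold in (0.1, 0.5, 2.5):
    pairs = match_1d(treated, untreated, threshold)
    gaps = [abs(treated[i] - untreated[j]) for i, j in pairs]
    counts.append(len(pairs))
    worst_gaps.append(max(gaps))
    print(f"threshold={threshold}: {len(pairs)} pairs, worst gap={max(gaps):.2f}")
```

With these numbers, loosening the threshold from 0.1 to 0.5 to 2.5 grows the matched dataset from 2 to 3 to 5 pairs, but the worst within-pair gap grows right along with it.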

Analyze!#

Now the best part of matching: you just do what you would have done normally.

In other words, you can think of this as a kind of “pre-processing step”, and now you can carry forward by feeding this into a regression just the way you would with the original data.
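As a minimal sketch of what “feeding this into a regression” might look like, here is an ordinary least squares fit on a tiny, made-up matched dataset using NumPy. The variable names and numbers are hypothetical, and the outcome is constructed without noise so the example is easy to check. In practice you would use whatever regression tool you would have used on the original data.

```python
import numpy as np

# Hypothetical matched dataset: three treated/untreated pairs with
# similar education within each pair
education = np.array([12.0, 12.5, 16.0, 16.2, 14.0, 13.8])
treated = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])

# Made-up, noise-free outcome so the true coefficients are known:
# intercept 1.0, treatment effect 2.0, education coefficient 0.5
outcome = 1.0 + 2.0 * treated + 0.5 * education

# Design matrix: intercept, treatment indicator, control variable
X = np.column_stack([np.ones_like(education), treated, education])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)

print("estimated treatment effect:", coef[1])
```

Nothing about the regression itself changes. The only difference is that the rows going in are the ones that survived pruning.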

Specific Models#

OK, for the details of a few common models, please go watch this great video by Gary King. You can probably start 15 minutes in, and should watch until at least 45 minutes, though what follows is also really interesting!

Gary King on Matching