{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Fixed Effects: Indicator Variables for Groups\n",
    "\n",
    "One common use of indicator variables is as *fixed effects*. Fixed effects are used when our data has a \"nested\" structure (we think individual observations belong to groups) and we suspect different things may be happening in each group.\n",
    "\n",
    "For example, suppose we have a dataset of student test scores, and students are all grouped into different schools; or perhaps we have data on earnings and gender across US cities. In these examples, individual observations can be thought of as being grouped into schools or cities. \n",
    "\n",
    "One option with this kind of data is to simply ignore the groups. For example, if we want to know about differences in the academic performance of minority children across the school system, then we might not want to add controls for students' schools, because we think that part of the way race impacts performance is through the sorting of minority students into worse schools. If we added school fixed effects, we'd lose that variation.\n",
    "\n",
    "But suppose we were interested in understanding whether school administrators treat minority children differently, and whether this affects academic performance. Principals, for example, may be more suspicious of Black children than of White children. If that were our interest, then what we really want to know is how race impacts academic performance *among students in the same school*. And that's where fixed effects are useful: they let us control for group-level effects (like the fact that all children in one school might tend to get lower grades) so we can focus on explaining *intra-group* variation (differences among children *at the same school*).\n",
    "\n",
    "In this regard, fixed effects are analogous in purpose to hierarchical models, though they are slightly different in implementation (differences between fixed effects and hierarchical models are [discussed here](fixed_effects_v_hierarchical.ipynb)). \n",
    "\n",
    "## Implementing Fixed Effects\n",
    "\n",
    "To illustrate, let's try to estimate how gender impacts earnings in the US using data from the US Current Population Survey (CPS) on US wages in 2018. We'll begin with a simple model of earnings:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Load survey\n",
    "cps = pd.read_stata(\n",
    "    \"https://github.com/nickeubank/MIDS_Data/blob/\"\n",
    "    \"master/Current_Population_Survey/morg18.dta?raw=true\"\n",
    ")\n",
    "\n",
    "# Limit to people currently employed and working full time.\n",
    "cps = cps[cps.lfsr94 == \"Employed-At Work\"]\n",
    "cps = cps[cps.uhourse >= 35]\n",
    "\n",
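    "# Approximate annual earnings from weekly earnings (assumes 48 weeks of work)\n",
    "cps[\"annual_earnings\"] = cps[\"earnwke\"] * 48\n",
    "\n",
    "# And create gender and college education variables.\n",
    "cps[\"female\"] = (cps.sex == 2).astype(\"int\")\n",
    "# Note: in `grade92`, code 43 is a bachelor's degree, so `> 43` flags\n",
    "# graduate degrees only; `>= 43` would also count bachelor's degrees.\n",
    "cps[\"has_college_educ\"] = (cps.grade92 > 43).astype(\"int\")"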
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>OLS Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>     <td>annual_earnings</td> <th>  R-squared:         </th>  <td>   0.170</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>                   <td>OLS</td>       <th>  Adj. R-squared:    </th>  <td>   0.170</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>             <td>Least Squares</td>  <th>  F-statistic:       </th>  <td>   8393.</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>             <td>Thu, 14 Mar 2024</td> <th>  Prob (F-statistic):</th>   <td>  0.00</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>                 <td>12:47:27</td>     <th>  Log-Likelihood:    </th> <td>-1.4365e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>No. Observations:</th>      <td>122603</td>      <th>  AIC:               </th>  <td>2.873e+06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Df Residuals:</th>          <td>122599</td>      <th>  BIC:               </th>  <td>2.873e+06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Df Model:</th>              <td>     3</td>      <th>                     </th>      <td> </td>     \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Covariance Type:</th>      <td>nonrobust</td>    <th>                     </th>      <td> </td>     \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>t</th>      <th>P>|t|</th>  <th>[0.025</th>    <th>0.975]</th>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td> 3.639e+04</td> <td>  296.455</td> <td>  122.766</td> <td> 0.000</td> <td> 3.58e+04</td> <td>  3.7e+04</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>female</th>           <td>-1.168e+04</td> <td>  170.390</td> <td>  -68.540</td> <td> 0.000</td> <td> -1.2e+04</td> <td>-1.13e+04</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>age</th>              <td>  407.7144</td> <td>    6.406</td> <td>   63.650</td> <td> 0.000</td> <td>  395.160</td> <td>  420.269</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>has_college_educ</th> <td> 3.054e+04</td> <td>  239.482</td> <td>  127.513</td> <td> 0.000</td> <td> 3.01e+04</td> <td>  3.1e+04</td>\n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "  <th>Omnibus:</th>       <td>18030.472</td> <th>  Durbin-Watson:     </th> <td>   1.805</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Prob(Omnibus):</th>  <td> 0.000</td>   <th>  Jarque-Bera (JB):  </th> <td>27634.350</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Skew:</th>           <td> 1.059</td>   <th>  Prob(JB):          </th> <td>    0.00</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Kurtosis:</th>       <td> 3.963</td>   <th>  Cond. No.          </th> <td>    159.</td> \n",
       "</tr>\n",
       "</table><br/><br/>Notes:<br/>[1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
      ],
      "text/latex": [
       "\\begin{center}\n",
       "\\begin{tabular}{lclc}\n",
       "\\toprule\n",
       "\\textbf{Dep. Variable:}     & annual\\_earnings & \\textbf{  R-squared:         } &      0.170   \\\\\n",
       "\\textbf{Model:}             &       OLS        & \\textbf{  Adj. R-squared:    } &      0.170   \\\\\n",
       "\\textbf{Method:}            &  Least Squares   & \\textbf{  F-statistic:       } &      8393.   \\\\\n",
       "\\textbf{Date:}              & Thu, 14 Mar 2024 & \\textbf{  Prob (F-statistic):} &      0.00    \\\\\n",
       "\\textbf{Time:}              &     12:47:27     & \\textbf{  Log-Likelihood:    } & -1.4365e+06  \\\\\n",
       "\\textbf{No. Observations:}  &      122603      & \\textbf{  AIC:               } &  2.873e+06   \\\\\n",
       "\\textbf{Df Residuals:}      &      122599      & \\textbf{  BIC:               } &  2.873e+06   \\\\\n",
       "\\textbf{Df Model:}          &           3      & \\textbf{                     } &              \\\\\n",
       "\\textbf{Covariance Type:}   &    nonrobust     & \\textbf{                     } &              \\\\\n",
       "\\bottomrule\n",
       "\\end{tabular}\n",
       "\\begin{tabular}{lcccccc}\n",
       "                            & \\textbf{coef} & \\textbf{std err} & \\textbf{t} & \\textbf{P$> |$t$|$} & \\textbf{[0.025} & \\textbf{0.975]}  \\\\\n",
       "\\midrule\n",
       "\\textbf{Intercept}          &    3.639e+04  &      296.455     &   122.766  &         0.000        &     3.58e+04    &      3.7e+04     \\\\\n",
       "\\textbf{female}             &   -1.168e+04  &      170.390     &   -68.540  &         0.000        &     -1.2e+04    &    -1.13e+04     \\\\\n",
       "\\textbf{age}                &     407.7144  &        6.406     &    63.650  &         0.000        &      395.160    &      420.269     \\\\\n",
       "\\textbf{has\\_college\\_educ} &    3.054e+04  &      239.482     &   127.513  &         0.000        &     3.01e+04    &      3.1e+04     \\\\\n",
       "\\bottomrule\n",
       "\\end{tabular}\n",
       "\\begin{tabular}{lclc}\n",
       "\\textbf{Omnibus:}       & 18030.472 & \\textbf{  Durbin-Watson:     } &     1.805  \\\\\n",
       "\\textbf{Prob(Omnibus):} &    0.000  & \\textbf{  Jarque-Bera (JB):  } & 27634.350  \\\\\n",
       "\\textbf{Skew:}          &    1.059  & \\textbf{  Prob(JB):          } &      0.00  \\\\\n",
       "\\textbf{Kurtosis:}      &    3.963  & \\textbf{  Cond. No.          } &      159.  \\\\\n",
       "\\bottomrule\n",
       "\\end{tabular}\n",
       "%\\caption{OLS Regression Results}\n",
       "\\end{center}\n",
       "\n",
       "Notes: \\newline\n",
       " [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                            OLS Regression Results                            \n",
       "==============================================================================\n",
       "Dep. Variable:        annual_earnings   R-squared:                       0.170\n",
       "Model:                            OLS   Adj. R-squared:                  0.170\n",
       "Method:                 Least Squares   F-statistic:                     8393.\n",
       "Date:                Thu, 14 Mar 2024   Prob (F-statistic):               0.00\n",
       "Time:                        12:47:27   Log-Likelihood:            -1.4365e+06\n",
       "No. Observations:              122603   AIC:                         2.873e+06\n",
       "Df Residuals:                  122599   BIC:                         2.873e+06\n",
       "Df Model:                           3                                         \n",
       "Covariance Type:            nonrobust                                         \n",
       "====================================================================================\n",
       "                       coef    std err          t      P>|t|      [0.025      0.975]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept         3.639e+04    296.455    122.766      0.000    3.58e+04     3.7e+04\n",
       "female           -1.168e+04    170.390    -68.540      0.000    -1.2e+04   -1.13e+04\n",
       "age                407.7144      6.406     63.650      0.000     395.160     420.269\n",
       "has_college_educ  3.054e+04    239.482    127.513      0.000    3.01e+04     3.1e+04\n",
       "==============================================================================\n",
       "Omnibus:                    18030.472   Durbin-Watson:                   1.805\n",
       "Prob(Omnibus):                  0.000   Jarque-Bera (JB):            27634.350\n",
       "Skew:                           1.059   Prob(JB):                         0.00\n",
       "Kurtosis:                       3.963   Cond. No.                         159.\n",
       "==============================================================================\n",
       "\n",
       "Notes:\n",
       "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
       "\"\"\""
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import statsmodels.formula.api as smf\n",
    "\n",
    "smf.ols(\"annual_earnings ~ female + age + has_college_educ\", cps).fit().summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this model, we're estimating how gender, age, and education explain variation in earnings across all full-time workers in the sample.\n",
    "\n",
    "But in this dataset, we also have a variable that tells us the industry in which each respondent is employed. If we want to understand the relationship between gender and income through *both* workplace bias and sectoral sorting, we can use the model above. But suppose we want to estimate wage discrimination in the workplace after controlling for the industry into which someone chooses to work. In other words, we want to know about the impact of gender on wages *within industries*. \n",
    "\n",
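    "To do so, we can add indicator variables for each respondent's industry (stored in the `ind02` variable). In a `statsmodels` formula, wrapping a variable in `C()` tells it to treat that variable as categorical, so `C(ind02)` adds one indicator per industry (leaving one out as the reference category)."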
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we can run the following regression:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```python\n",
    "smf.ols(\n",
    "    \"annual_earnings ~ female + age + has_college_educ + C(ind02)\", cps\n",
    ").fit().summary()\n",
    "```"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "which will generate output that looks roughly like this. Note that your output will be VERY long; I'm omitting all the industry coefficients to save space, and we'll talk later about how to suppress them:\n",
    "\n",
    "\n",
    "![cps_fe_unclustered_p1](images/cps_fe_unclustered_p1.png)\n",
    "\n",
    "```\n",
    "               .\n",
    "               .\n",
    "               .\n",
    "```\n",
    "![cps_fe_unclustered_p2](images/cps_fe_unclustered_p2.png)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Voilà! What you've just estimated is no longer the relationship between gender and income across all Americans, but rather the relationship between gender and income *within each industry*. \n",
    "\n",
    "To be clear, fixed effects aren't *mathematically* different from adding a normal control variable. One could say that adding `has_college_educ` means that we're now estimating the relationship between gender and income among the college-educated and among the non-college-educated. *Mechanically*, fixed effects are just additional indicator variables. But because we usually use them to represent groups, it's a powerful idea to remember that once they're added, we're effectively estimating relationships *within* the groups they define.\n",
    "\n",
    "Perhaps nowhere is this clearer than in true panel data, where you observe the same entities over time. In a panel regression, adding entity fixed effects allows you to difference out any *constant* differences between entities and focus only on changes within each entity over time. This even works for people! In a panel of individuals observed over time, adding individual fixed effects means you're effectively controlling for anything constant about each individual (things that don't change over time), so you're only studying *changes over time* within each individual."
   ]
  },
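  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see the \"within-group\" idea concretely, here is a minimal sketch on *simulated* data. The data, the column names `group`, `x`, and `y`, and the true slope of 2 are all hypothetical and do not come from the CPS. It compares three estimates of the slope on `x`. The first comes from a regression with group indicators via `C(group)`. The second comes from a regression on data where each group's mean has been subtracted from `y` and `x` (the \"within\" transformation). The third comes from a pooled regression that ignores the groups. Under these assumptions, the first two should give the same coefficient. The pooled estimate should be pulled away from 2, because `x` is correlated with the group-level effect.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import statsmodels.formula.api as smf\n",
    "\n",
    "# Hypothetical simulated data: 20 groups of 50 observations each.\n",
    "rng = np.random.default_rng(0)\n",
    "n_groups, n_per_group = 20, 50\n",
    "df = pd.DataFrame({\"group\": np.repeat(np.arange(n_groups), n_per_group)})\n",
    "\n",
    "# Each group has its own baseline level, and x is correlated with it,\n",
    "# so ignoring the groups biases the slope on x (true value: 2).\n",
    "group_effect = rng.normal(0, 5, n_groups)[df[\"group\"].to_numpy()]\n",
    "df[\"x\"] = group_effect + rng.normal(0, 1, len(df))\n",
    "df[\"y\"] = 2.0 * df[\"x\"] + 3.0 * group_effect + rng.normal(0, 1, len(df))\n",
    "\n",
    "# 1. Fixed effects as indicator variables: one dummy per group.\n",
    "fe_dummies = smf.ols(\"y ~ x + C(group)\", df).fit()\n",
    "\n",
    "# 2. Within transformation: subtract each group's mean from y and x,\n",
    "#    then regress demeaned y on demeaned x (no intercept needed).\n",
    "group_means = df.groupby(\"group\")[[\"y\", \"x\"]].transform(\"mean\")\n",
    "demeaned = df[[\"y\", \"x\"]] - group_means\n",
    "fe_within = smf.ols(\"y ~ x - 1\", demeaned).fit()\n",
    "\n",
    "# 3. Pooled regression that ignores the groups entirely.\n",
    "pooled = smf.ols(\"y ~ x\", df).fit()\n",
    "\n",
    "print(\"indicators:\", fe_dummies.params[\"x\"])\n",
    "print(\"within:    \", fe_within.params[\"x\"])\n",
    "print(\"pooled:    \", pooled.params[\"x\"])\n",
    "```\n",
    "\n",
    "The equality of the first two estimates is the Frisch-Waugh-Lovell theorem at work: including the group indicators is equivalent to using only deviations from group means. One caveat applies to the hand-demeaned regression. Its standard errors are slightly too small, because its degrees of freedom ignore the group means that were estimated along the way."
   ]
  },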
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clustering\n",
    "\n",
    "When working with fixed effects, however, it's also often a good idea to cluster your standard errors by your fixed effect variable. Clustering accounts for the fact that some of the variation in your data comes not from the individual level (where you have lots of observations) but from the group level. Observations in the same group may share unobserved shocks, so their errors can be correlated. You therefore have less independent information (a smaller effective sample size) than your raw number of observations suggests, and clustering corrects your standard errors to reflect this. (Clustering *only* affects standard errors; it has no impact on the coefficients themselves. It's just an adjustment to how confident we are in our inferences.)\n",
    "\n",
    "Clustering is thankfully easy to do: just use the `get_robustcov_results` method from `statsmodels` with `cov_type=\"cluster\"`, and use the `groups` keyword to pass the group assignment of each observation.\n",
    "\n",
    "(R users: as we'll discuss below, I think the easiest way to do this is to use the [plm package](https://cran.r-project.org/web/packages/plm/vignettes/plmPackage.html).) \n",
    "\n",
    "**TWO IMPORTANT IMPLEMENTATION NOTES:**\n",
    "\n",
    "\n",
    "1. First, if you're using formulas in statsmodels, the regression automatically drops observations that can't be estimated because of missing data. You have to drop the same observations before passing your group assignments to `get_robustcov_results`; otherwise you'll get the error:\n",
    "\n",
    "```\n",
    "ValueError: The weights and list don't have the same length.\n",
    "```\n",
    "\n",
    "because the number of observations in the model doesn't match the number of observations in the group assignment vector you pass!\n",
    "\n",
    "2. Whatever you pass to `groups` has to be a numeric array of group identifiers. If it isn't, you'll get an error like:\n",
    "\n",
    "```\n",
    "TypeError: '<' not supported between instances of 'float' and 'str'\n",
    "```\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2     222\n",
       "3     201\n",
       "4     220\n",
       "6     158\n",
       "17    141\n",
       "dtype: int16"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.ols(\n",
    "    \"annual_earnings ~ female + age + has_college_educ + C(ind02)\", cps\n",
    ").fit()\n",
    "\n",
    "# Drop the same rows with missing data that the formula dropped,\n",
    "# so the group vector lines up with the model's observations.\n",
    "fe_groups = cps.dropna(\n",
    "    subset=[\"annual_earnings\", \"female\", \"age\", \"ind02\", \"has_college_educ\"]\n",
    ")\n",
    "\n",
    "# Convert `ind02` categorical into group codes by\n",
    "# pulling codes used in its categorical encoding.\n",
    "\n",
    "# If you have a string instead of a categorical,\n",
    "# just make it a categorical first with `pd.Categorical()`\n",
    "group_codes = fe_groups.ind02.cat.codes\n",
    "group_codes.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```python\n",
    "model.get_robustcov_results(cov_type=\"cluster\", groups=group_codes).summary()\n",
    "```"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![cps_fe_clustered_p1](images/cps_fe_clustered_p1.png)\n",
    "\n",
    "```\n",
    "               .\n",
    "               .\n",
    "               .\n",
    "```\n",
    "\n",
    "![cps_fe_clustered_p2](images/cps_fe_clustered_p2.png)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, while our point estimates haven't changed at all (the coefficient on `female`, for example, is still $\\sim$-10,650), we have increased the size of our standard errors. The SE on `female`, for example, has gone from 182 without clustering to 583 with clustering."
   ]
  },
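  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To reproduce this pattern (identical point estimates, larger standard errors) in a controlled setting, here is a minimal sketch on *simulated* data that is hypothetical and not the CPS. Each of 30 groups has its own baseline, which the fixed effects absorb. Each group also has its own deviation from an average slope of 1.5, which the fixed effects do not absorb, so residuals remain correlated within groups. The sketch follows both implementation notes from the clustering section. It builds the group vector from only the rows the formula kept, and it converts group labels to numeric codes with `pd.factorize`. Under these assumptions, clustering should leave every coefficient unchanged and enlarge the standard error on `x`.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import statsmodels.formula.api as smf\n",
    "\n",
    "# Hypothetical simulated data: 30 groups of 40 observations each.\n",
    "rng = np.random.default_rng(42)\n",
    "n_groups, n_per_group = 30, 40\n",
    "df = pd.DataFrame({\"group\": np.repeat(np.arange(n_groups), n_per_group)})\n",
    "g = df[\"group\"].to_numpy()\n",
    "\n",
    "# Group baselines are absorbed by the fixed effects; group-specific slope\n",
    "# deviations are not, so residuals stay correlated within groups.\n",
    "baseline = rng.normal(0, 3, n_groups)[g]\n",
    "slope_dev = rng.normal(0, 1, n_groups)[g]\n",
    "df[\"x\"] = rng.normal(0, 1, len(df))\n",
    "df[\"y\"] = baseline + (1.5 + slope_dev) * df[\"x\"] + rng.normal(0, 1, len(df))\n",
    "\n",
    "# Knock out 5% of outcomes so the formula API silently drops some rows.\n",
    "df.loc[df.sample(frac=0.05, random_state=0).index, \"y\"] = np.nan\n",
    "\n",
    "model = smf.ols(\"y ~ x + C(group)\", df).fit()\n",
    "\n",
    "# Note 1: rebuild the group vector from exactly the rows the model used.\n",
    "used = df.dropna(subset=[\"y\", \"x\", \"group\"])\n",
    "# Note 2: pass numeric codes; pd.factorize also works for string labels.\n",
    "group_codes = pd.factorize(used[\"group\"])[0]\n",
    "\n",
    "clustered = model.get_robustcov_results(cov_type=\"cluster\", groups=group_codes)\n",
    "\n",
    "# Compare by position, since the clustered results may hold plain arrays.\n",
    "x_pos = model.model.exog_names.index(\"x\")\n",
    "print(\"coef on x:\", np.asarray(model.params)[x_pos], np.asarray(clustered.params)[x_pos])\n",
    "print(\"SE on x:  \", np.asarray(model.bse)[x_pos], np.asarray(clustered.bse)[x_pos])\n",
    "```\n",
    "\n",
    "Another option is to pass `cov_type=\"cluster\"` and `cov_kwds={\"groups\": group_codes}` directly to `fit()`. In that case the group vector still has to line up with the rows the model keeps."
   ]
  },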
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Computationally Efficient Fixed Effects\n",
    "\n",
    "OK, so everything we've described up to this point is a reasonable approach to fixed effects, but it has two limitations: our regression output looks *terrible*, and computing all those intercepts was slow.\n",
    "\n",
    "This brings us to some of the specialized methods for calculating fixed effects. It turns out that if you aren't interested in the coefficient on each fixed effect, there are much more computationally efficient ways to calculate fixed effects; for example, subtracting each group's mean from every variable instead of estimating a separate intercept for each group. But to use them, we'll have to use a different library: [linearmodels](https://bashtage.github.io/linearmodels/doc/index.html) (installable using `conda install linearmodels` or `pip install linearmodels`).\n",
    "\n",
    "(R users: see note at bottom on doing this in R)\n",
    "\n",
    "In particular, we'll be using the `PanelOLS` class from `linearmodels`. As the name implies, `PanelOLS` is designed for linear regression (social scientists call linear regression Ordinary Least Squares, or OLS) with panel data, which is really any form of data organized along two dimensions. Normally a panel has data on many entities, each observed several times, so the first dimension is the `entity` dimension and the second is the `time` dimension.\n",
    "\n",
    "In this case, we don't really have a panel (just nested data), but because fixed effects are commonly used with panels, we'll use this tool anyway.\n",
    "\n",
    "The only catch is that you have to use a `MultiIndex` in `pandas`. I *know*, I hate them too. But the library requires the multi-index to understand which variable constitutes the \"group\" for which you want to add fixed effects. `PanelOLS` calls the first level of the multi-index the `entity` and the second level `time`. In this case, though, we'll just make the first level our industries and the second level individual identifiers, then use `entity` fixed effects (and clustering)."
   ]
  },
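  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of this setup on *simulated* data, assuming `linearmodels` is installed, consider the example below. The column names (`industry`, `person_id`, `x`, `y`) and all numbers are hypothetical. The first index level holds the group, and the second holds a unique numeric person identifier standing in for `time`. Including `EntityEffects` in the formula asks `PanelOLS` to absorb one effect per group without reporting a coefficient for each. Under these assumptions, the slope on `x` should match the one from an indicator-variable regression in `statsmodels`.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import statsmodels.formula.api as smf\n",
    "from linearmodels.panel import PanelOLS\n",
    "\n",
    "# Hypothetical simulated data: 25 groups (\"industries\") of 40 people each.\n",
    "rng = np.random.default_rng(1)\n",
    "n_groups, n_per_group = 25, 40\n",
    "group_ids = np.repeat(np.arange(n_groups), n_per_group)\n",
    "df = pd.DataFrame(\n",
    "    {\n",
    "        \"industry\": [f\"ind_{i}\" for i in group_ids],\n",
    "        \"person_id\": np.arange(len(group_ids)),\n",
    "    }\n",
    ")\n",
    "effect = rng.normal(0, 5, n_groups)[group_ids]\n",
    "df[\"x\"] = effect + rng.normal(0, 1, len(df))\n",
    "df[\"y\"] = 2.0 * df[\"x\"] + effect + rng.normal(0, 1, len(df))\n",
    "\n",
    "# First index level = entity (the group); second level = \"time\"\n",
    "# (here just a unique, numeric person identifier).\n",
    "panel = df.set_index([\"industry\", \"person_id\"])\n",
    "\n",
    "# EntityEffects absorbs one effect per industry; clustering by entity\n",
    "# mirrors the clustered statsmodels results above.\n",
    "fe = PanelOLS.from_formula(\"y ~ x + EntityEffects\", data=panel).fit(\n",
    "    cov_type=\"clustered\", cluster_entity=True\n",
    ")\n",
    "\n",
    "# The same slope via explicit indicator variables, for comparison.\n",
    "dummies = smf.ols(\"y ~ x + C(industry)\", df).fit()\n",
    "\n",
    "print(fe.params[\"x\"], dummies.params[\"x\"])\n",
    "```\n",
    "\n",
    "The `PanelOLS` summary lists only `x`, not 25 industry intercepts, which addresses the \"terrible output\" problem described above."
   ]
  },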
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>hhid</th>\n",
       "      <th>intmonth</th>\n",
       "      <th>hurespli</th>\n",
       "      <th>hrhtype</th>\n",
       "      <th>minsamp</th>\n",
       "      <th>hrlonglk</th>\n",
       "      <th>hrsample</th>\n",
       "      <th>hrhhid2</th>\n",
       "      <th>serial</th>\n",
       "      <th>hhnum</th>\n",
       "      <th>...</th>\n",
       "      <th>ch35</th>\n",
       "      <th>ch613</th>\n",
       "      <th>ch1417</th>\n",
       "      <th>ch05</th>\n",
       "      <th>ihigrdc</th>\n",
       "      <th>docc00</th>\n",
       "      <th>dind02</th>\n",
       "      <th>annual_earnings</th>\n",
       "      <th>female</th>\n",
       "      <th>has_college_educ</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>000110339935453</td>\n",
       "      <td>January</td>\n",
       "      <td>1.0</td>\n",
       "      <td>Unmarried civilian female primary fam householder</td>\n",
       "      <td>MIS 4</td>\n",
       "      <td>MIS 2-4 Or MIS 6-8 (link To</td>\n",
       "      <td>0701</td>\n",
       "      <td>07011</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>Office and administrative support occupations</td>\n",
       "      <td>Health care services , except hospitals</td>\n",
       "      <td>43344.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>000110339935453</td>\n",
       "      <td>January</td>\n",
       "      <td>1.0</td>\n",
       "      <td>Unmarried civilian female primary fam householder</td>\n",
       "      <td>MIS 4</td>\n",
       "      <td>MIS 2-4 Or MIS 6-8 (link To</td>\n",
       "      <td>0701</td>\n",
       "      <td>07011</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>Office and administrative support occupations</td>\n",
       "      <td>Administrative and support services</td>\n",
       "      <td>19200.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>000110359424339</td>\n",
       "      <td>January</td>\n",
       "      <td>1.0</td>\n",
       "      <td>Unmarried civilian female primary fam householder</td>\n",
       "      <td>MIS 4</td>\n",
       "      <td>MIS 2-4 Or MIS 6-8 (link To</td>\n",
       "      <td>0711</td>\n",
       "      <td>07111</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Healthcare practitioner and technical occupations</td>\n",
       "      <td>Hospitals</td>\n",
       "      <td>60000.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>000110651278174</td>\n",
       "      <td>January</td>\n",
       "      <td>1.0</td>\n",
       "      <td>Civilian male primary individual</td>\n",
       "      <td>MIS 8</td>\n",
       "      <td>MIS 2-4 Or MIS 6-8 (link To</td>\n",
       "      <td>0601</td>\n",
       "      <td>06011</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>Transportation and material moving occupations</td>\n",
       "      <td>Transportation and warehousing</td>\n",
       "      <td>32640.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>007680515071194</td>\n",
       "      <td>January</td>\n",
       "      <td>1.0</td>\n",
       "      <td>Civilian male primary individual</td>\n",
       "      <td>MIS 8</td>\n",
       "      <td>MIS 2-4 Or MIS 6-8 (link To</td>\n",
       "      <td>0611</td>\n",
       "      <td>06112</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>Transportation and material moving occupations</td>\n",
       "      <td>Retail trade</td>\n",
       "      <td>38400.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 101 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "               hhid intmonth  hurespli  \\\n",
       "2   000110339935453  January       1.0   \n",
       "3   000110339935453  January       1.0   \n",
       "4   000110359424339  January       1.0   \n",
       "6   000110651278174  January       1.0   \n",
       "17  007680515071194  January       1.0   \n",
       "\n",
       "                                              hrhtype minsamp  \\\n",
       "2   Unmarried civilian female primary fam householder   MIS 4   \n",
       "3   Unmarried civilian female primary fam householder   MIS 4   \n",
       "4   Unmarried civilian female primary fam householder   MIS 4   \n",
       "6                    Civilian male primary individual   MIS 8   \n",
       "17                   Civilian male primary individual   MIS 8   \n",
       "\n",
       "                       hrlonglk hrsample hrhhid2 serial  hhnum  ... ch35  \\\n",
       "2   MIS 2-4 Or MIS 6-8 (link To     0701   07011      1      1  ...    0   \n",
       "3   MIS 2-4 Or MIS 6-8 (link To     0701   07011      1      1  ...    0   \n",
       "4   MIS 2-4 Or MIS 6-8 (link To     0711   07111      1      1  ...    0   \n",
       "6   MIS 2-4 Or MIS 6-8 (link To     0601   06011      1      1  ...    0   \n",
       "17  MIS 2-4 Or MIS 6-8 (link To     0611   06112      2      2  ...    0   \n",
       "\n",
       "    ch613  ch1417  ch05  ihigrdc  \\\n",
       "2       0       1     0     12.0   \n",
       "3       0       0     0     12.0   \n",
       "4       0       0     0      NaN   \n",
       "6       0       0     0     12.0   \n",
       "17      0       0     0     12.0   \n",
       "\n",
       "                                               docc00  \\\n",
       "2       Office and administrative support occupations   \n",
       "3       Office and administrative support occupations   \n",
       "4   Healthcare practitioner and technical occupations   \n",
       "6      Transportation and material moving occupations   \n",
       "17     Transportation and material moving occupations   \n",
       "\n",
       "                                     dind02  annual_earnings  female  \\\n",
       "2   Health care services , except hospitals          43344.0       1   \n",
       "3       Administrative and support services          19200.0       1   \n",
       "4                                 Hospitals          60000.0       1   \n",
       "6            Transportation and warehousing          32640.0       0   \n",
       "17                             Retail trade          38400.0       0   \n",
       "\n",
       "    has_college_educ  \n",
       "2                  0  \n",
       "3                  0  \n",
       "4                  0  \n",
       "6                  0  \n",
       "17                 0  \n",
       "\n",
       "[5 rows x 101 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cps.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>hhid</th>\n",
       "      <th>intmonth</th>\n",
       "      <th>hurespli</th>\n",
       "      <th>hrhtype</th>\n",
       "      <th>minsamp</th>\n",
       "      <th>hrlonglk</th>\n",
       "      <th>hrsample</th>\n",
       "      <th>hrhhid2</th>\n",
       "      <th>serial</th>\n",
       "      <th>hhnum</th>\n",
       "      <th>...</th>\n",
       "      <th>ch35</th>\n",
       "      <th>ch613</th>\n",
       "      <th>ch1417</th>\n",
       "      <th>ch05</th>\n",
       "      <th>ihigrdc</th>\n",
       "      <th>docc00</th>\n",
       "      <th>dind02</th>\n",
       "      <th>annual_earnings</th>\n",
       "      <th>female</th>\n",
       "      <th>has_college_educ</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ind02</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Residential care facilities, without nursing (6232, 6233, 6239)</th>\n",
       "      <th>2</th>\n",
       "      <td>000110339935453</td>\n",
       "      <td>January</td>\n",
       "      <td>1.0</td>\n",
       "      <td>Unmarried civilian female primary fam householder</td>\n",
       "      <td>MIS 4</td>\n",
       "      <td>MIS 2-4 Or MIS 6-8 (link To</td>\n",
       "      <td>0701</td>\n",
       "      <td>07011</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>Office and administrative support occupations</td>\n",
       "      <td>Health care services , except hospitals</td>\n",
       "      <td>43344.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Business support services (5614)</th>\n",
       "      <th>3</th>\n",
       "      <td>000110339935453</td>\n",
       "      <td>January</td>\n",
       "      <td>1.0</td>\n",
       "      <td>Unmarried civilian female primary fam householder</td>\n",
       "      <td>MIS 4</td>\n",
       "      <td>MIS 2-4 Or MIS 6-8 (link To</td>\n",
       "      <td>0701</td>\n",
       "      <td>07011</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>Office and administrative support occupations</td>\n",
       "      <td>Administrative and support services</td>\n",
       "      <td>19200.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Hospitals (622)</th>\n",
       "      <th>4</th>\n",
       "      <td>000110359424339</td>\n",
       "      <td>January</td>\n",
       "      <td>1.0</td>\n",
       "      <td>Unmarried civilian female primary fam householder</td>\n",
       "      <td>MIS 4</td>\n",
       "      <td>MIS 2-4 Or MIS 6-8 (link To</td>\n",
       "      <td>0711</td>\n",
       "      <td>07111</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Healthcare practitioner and technical occupations</td>\n",
       "      <td>Hospitals</td>\n",
       "      <td>60000.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Truck transportation (484)</th>\n",
       "      <th>6</th>\n",
       "      <td>000110651278174</td>\n",
       "      <td>January</td>\n",
       "      <td>1.0</td>\n",
       "      <td>Civilian male primary individual</td>\n",
       "      <td>MIS 8</td>\n",
       "      <td>MIS 2-4 Or MIS 6-8 (link To</td>\n",
       "      <td>0601</td>\n",
       "      <td>06011</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>Transportation and material moving occupations</td>\n",
       "      <td>Transportation and warehousing</td>\n",
       "      <td>32640.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>****Department stores and discount stores (s45211)</th>\n",
       "      <th>17</th>\n",
       "      <td>007680515071194</td>\n",
       "      <td>January</td>\n",
       "      <td>1.0</td>\n",
       "      <td>Civilian male primary individual</td>\n",
       "      <td>MIS 8</td>\n",
       "      <td>MIS 2-4 Or MIS 6-8 (link To</td>\n",
       "      <td>0611</td>\n",
       "      <td>06112</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>Transportation and material moving occupations</td>\n",
       "      <td>Retail trade</td>\n",
       "      <td>38400.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 100 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                  hhid  \\\n",
       "ind02                                                                    \n",
       "Residential care facilities, without nursing (6... 2   000110339935453   \n",
       "Business support services (5614)                   3   000110339935453   \n",
       "Hospitals (622)                                    4   000110359424339   \n",
       "Truck transportation (484)                         6   000110651278174   \n",
       "****Department stores and discount stores (s45211) 17  007680515071194   \n",
       "\n",
       "                                                      intmonth  hurespli  \\\n",
       "ind02                                                                      \n",
       "Residential care facilities, without nursing (6... 2   January       1.0   \n",
       "Business support services (5614)                   3   January       1.0   \n",
       "Hospitals (622)                                    4   January       1.0   \n",
       "Truck transportation (484)                         6   January       1.0   \n",
       "****Department stores and discount stores (s45211) 17  January       1.0   \n",
       "\n",
       "                                                                                                 hrhtype  \\\n",
       "ind02                                                                                                      \n",
       "Residential care facilities, without nursing (6... 2   Unmarried civilian female primary fam householder   \n",
       "Business support services (5614)                   3   Unmarried civilian female primary fam householder   \n",
       "Hospitals (622)                                    4   Unmarried civilian female primary fam householder   \n",
       "Truck transportation (484)                         6                    Civilian male primary individual   \n",
       "****Department stores and discount stores (s45211) 17                   Civilian male primary individual   \n",
       "\n",
       "                                                      minsamp  \\\n",
       "ind02                                                           \n",
       "Residential care facilities, without nursing (6... 2    MIS 4   \n",
       "Business support services (5614)                   3    MIS 4   \n",
       "Hospitals (622)                                    4    MIS 4   \n",
       "Truck transportation (484)                         6    MIS 8   \n",
       "****Department stores and discount stores (s45211) 17   MIS 8   \n",
       "\n",
       "                                                                          hrlonglk  \\\n",
       "ind02                                                                                \n",
       "Residential care facilities, without nursing (6... 2   MIS 2-4 Or MIS 6-8 (link To   \n",
       "Business support services (5614)                   3   MIS 2-4 Or MIS 6-8 (link To   \n",
       "Hospitals (622)                                    4   MIS 2-4 Or MIS 6-8 (link To   \n",
       "Truck transportation (484)                         6   MIS 2-4 Or MIS 6-8 (link To   \n",
       "****Department stores and discount stores (s45211) 17  MIS 2-4 Or MIS 6-8 (link To   \n",
       "\n",
       "                                                      hrsample hrhhid2 serial  \\\n",
       "ind02                                                                           \n",
       "Residential care facilities, without nursing (6... 2      0701   07011      1   \n",
       "Business support services (5614)                   3      0701   07011      1   \n",
       "Hospitals (622)                                    4      0711   07111      1   \n",
       "Truck transportation (484)                         6      0601   06011      1   \n",
       "****Department stores and discount stores (s45211) 17     0611   06112      2   \n",
       "\n",
       "                                                       hhnum  ... ch35  ch613  \\\n",
       "ind02                                                         ...               \n",
       "Residential care facilities, without nursing (6... 2       1  ...    0      0   \n",
       "Business support services (5614)                   3       1  ...    0      0   \n",
       "Hospitals (622)                                    4       1  ...    0      0   \n",
       "Truck transportation (484)                         6       1  ...    0      0   \n",
       "****Department stores and discount stores (s45211) 17      2  ...    0      0   \n",
       "\n",
       "                                                       ch1417  ch05  ihigrdc  \\\n",
       "ind02                                                                          \n",
       "Residential care facilities, without nursing (6... 2        1     0     12.0   \n",
       "Business support services (5614)                   3        0     0     12.0   \n",
       "Hospitals (622)                                    4        0     0      NaN   \n",
       "Truck transportation (484)                         6        0     0     12.0   \n",
       "****Department stores and discount stores (s45211) 17       0     0     12.0   \n",
       "\n",
       "                                                                                                  docc00  \\\n",
       "ind02                                                                                                      \n",
       "Residential care facilities, without nursing (6... 2       Office and administrative support occupations   \n",
       "Business support services (5614)                   3       Office and administrative support occupations   \n",
       "Hospitals (622)                                    4   Healthcare practitioner and technical occupations   \n",
       "Truck transportation (484)                         6      Transportation and material moving occupations   \n",
       "****Department stores and discount stores (s45211) 17     Transportation and material moving occupations   \n",
       "\n",
       "                                                                                        dind02  \\\n",
       "ind02                                                                                            \n",
       "Residential care facilities, without nursing (6... 2   Health care services , except hospitals   \n",
       "Business support services (5614)                   3       Administrative and support services   \n",
       "Hospitals (622)                                    4                                 Hospitals   \n",
       "Truck transportation (484)                         6            Transportation and warehousing   \n",
       "****Department stores and discount stores (s45211) 17                             Retail trade   \n",
       "\n",
       "                                                       annual_earnings  \\\n",
       "ind02                                                                    \n",
       "Residential care facilities, without nursing (6... 2           43344.0   \n",
       "Business support services (5614)                   3           19200.0   \n",
       "Hospitals (622)                                    4           60000.0   \n",
       "Truck transportation (484)                         6           32640.0   \n",
       "****Department stores and discount stores (s45211) 17          38400.0   \n",
       "\n",
       "                                                       female  \\\n",
       "ind02                                                           \n",
       "Residential care facilities, without nursing (6... 2        1   \n",
       "Business support services (5614)                   3        1   \n",
       "Hospitals (622)                                    4        1   \n",
       "Truck transportation (484)                         6        0   \n",
       "****Department stores and discount stores (s45211) 17       0   \n",
       "\n",
       "                                                       has_college_educ  \n",
       "ind02                                                                    \n",
       "Residential care facilities, without nursing (6... 2                  0  \n",
       "Business support services (5614)                   3                  0  \n",
       "Hospitals (622)                                    4                  0  \n",
       "Truck transportation (484)                         6                  0  \n",
       "****Department stores and discount stores (s45211) 17                 0  \n",
       "\n",
       "[5 rows x 100 columns]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Move the industry groups (`ind02`) into the outer level of a\n",
    "# MultiIndex, keeping the original row index as the inner level.\n",
    "# PanelOLS treats the outer level as the `entity` identifier\n",
    "# and the inner level as the `time` identifier.\n",
    "cps_w_multiindex = cps.set_index([\"ind02\", cps.index])\n",
    "cps_w_multiindex.head()"
   ]
  },
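  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mechanically, entity fixed effects give every industry its own intercept. The cell below is a minimal sketch of why that isolates *within-industry* variation. It uses synthetic data and plain NumPy rather than `linearmodels`, and every name and number in it is illustrative, not taken from the CPS sample. Fitting one indicator column per group (least-squares dummy variables) and demeaning the outcome and regressor within each group give exactly the same slope. In this example, a pooled regression that ignores the groups is pulled away from the true slope because the regressor is correlated with the group effects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch on synthetic data (not the CPS sample above):\n",
    "# entity fixed effects estimated two equivalent ways, plus a pooled\n",
    "# regression that ignores the groups, for comparison.\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "n_groups, n_per = 20, 50\n",
    "group = np.repeat(np.arange(n_groups), n_per)  # group id for each row\n",
    "\n",
    "# Each group has its own level shift, and x is correlated with it.\n",
    "group_effect = rng.normal(0, 5, n_groups)[group]\n",
    "x = rng.normal(size=group.size) + 0.5 * group_effect\n",
    "y = 2.0 * x + group_effect + rng.normal(size=group.size)  # true slope = 2\n",
    "\n",
    "# 1) Least-squares dummy variables: x plus one indicator per group\n",
    "#    (no global intercept, so the design matrix has full rank).\n",
    "dummies = (group[:, None] == np.arange(n_groups)).astype(float)\n",
    "X_lsdv = np.column_stack([x, dummies])\n",
    "beta_lsdv = np.linalg.lstsq(X_lsdv, y, rcond=None)[0][0]\n",
    "\n",
    "# 2) Within transformation: subtract each group's mean from y and x,\n",
    "#    then regress demeaned y on demeaned x (no intercept needed).\n",
    "def demean(v, g):\n",
    "    group_means = np.bincount(g, weights=v) / np.bincount(g)\n",
    "    return v - group_means[g]\n",
    "\n",
    "x_w, y_w = demean(x, group), demean(y, group)\n",
    "beta_within = (x_w @ y_w) / (x_w @ x_w)\n",
    "\n",
    "# 3) Pooled OLS with a single intercept, ignoring the groups.\n",
    "X_pooled = np.column_stack([np.ones_like(x), x])\n",
    "beta_pooled = np.linalg.lstsq(X_pooled, y, rcond=None)[0][1]\n",
    "\n",
    "print(f'LSDV: {beta_lsdv:.4f}  within: {beta_within:.4f}  pooled: {beta_pooled:.4f}')"
   ]
  },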
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/nce8/opt/miniconda3/lib/python3.11/site-packages/linearmodels/panel/model.py:1214: MissingValueWarning: \n",
      "Inputs contain missing values. Dropping rows with missing observations.\n",
      "  super().__init__(dependent, exog, weights=weights, check_rank=check_rank)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>PanelOLS Estimation Summary</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>     <td>annual_earnings</td> <th>  R-squared:         </th>     <td>0.1414</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Estimator:</th>            <td>PanelOLS</td>     <th>  R-squared (Between):</th>    <td>0.2800</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>No. Observations:</th>      <td>122603</td>      <th>  R-squared (Within):</th>     <td>0.1414</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>             <td>Thu, Mar 14 2024</td> <th>  R-squared (Overall):</th>    <td>0.1683</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>                 <td>12:47:30</td>     <th>  Log-likelihood     </th>    <td>-1.43e+06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Cov. Estimator:</th>       <td>Clustered</td>    <th>                     </th>        <td></td>      \n",
       "</tr>\n",
       "<tr>\n",
       "  <th></th>                          <td></td>         <th>  F-statistic:       </th>     <td>6716.0</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Entities:</th>                <td>259</td>       <th>  P-value            </th>     <td>0.0000</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Avg Obs:</th>               <td>473.37</td>      <th>  Distribution:      </th>   <td>F(3,122341)</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Min Obs:</th>               <td>6.0000</td>      <th>                     </th>        <td></td>      \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Max Obs:</th>               <td>8244.0</td>      <th>  F-statistic (robust):</th>   <td>303.79</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th></th>                          <td></td>         <th>  P-value            </th>     <td>0.0000</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time periods:</th>          <td>122603</td>      <th>  Distribution:      </th>   <td>F(3,122341)</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Avg Obs:</th>               <td>1.0000</td>      <th>                     </th>        <td></td>      \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Min Obs:</th>               <td>1.0000</td>      <th>                     </th>        <td></td>      \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Max Obs:</th>               <td>1.0000</td>      <th>                     </th>        <td></td>      \n",
       "</tr>\n",
       "<tr>\n",
       "  <th></th>                          <td></td>         <th>                     </th>        <td></td>      \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<caption>Parameter Estimates</caption>\n",
       "<tr>\n",
       "          <td></td>          <th>Parameter</th> <th>Std. Err.</th> <th>T-stat</th>  <th>P-value</th>  <th>Lower CI</th>  <th>Upper CI</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>         <td>3.74e+04</td>   <td>747.33</td>   <td>50.041</td>  <td>0.0000</td>   <td>3.593e+04</td> <td>3.886e+04</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>female</th>           <td>-1.065e+04</td>  <td>581.53</td>   <td>-18.316</td> <td>0.0000</td>  <td>-1.179e+04</td>  <td>-9511.4</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>age</th>                <td>386.81</td>    <td>16.928</td>   <td>22.850</td>  <td>0.0000</td>    <td>353.63</td>    <td>419.99</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>has_college_educ</th>  <td>2.665e+04</td>  <td>1323.8</td>   <td>20.130</td>  <td>0.0000</td>   <td>2.405e+04</td> <td>2.924e+04</td>\n",
       "</tr>\n",
       "</table><br/><br/>F-test for Poolability: 54.365<br/>P-value: 0.0000<br/>Distribution: F(258,122341)<br/><br/>Included effects: Entity<br/>id: 0x2882e1fd0"
      ],
      "text/plain": [
       "                          PanelOLS Estimation Summary                           \n",
       "================================================================================\n",
       "Dep. Variable:        annual_earnings   R-squared:                        0.1414\n",
       "Estimator:                   PanelOLS   R-squared (Between):              0.2800\n",
       "No. Observations:              122603   R-squared (Within):               0.1414\n",
       "Date:                Thu, Mar 14 2024   R-squared (Overall):              0.1683\n",
       "Time:                        12:47:30   Log-likelihood                 -1.43e+06\n",
       "Cov. Estimator:             Clustered                                           \n",
       "                                        F-statistic:                      6716.0\n",
       "Entities:                         259   P-value                           0.0000\n",
       "Avg Obs:                       473.37   Distribution:                F(3,122341)\n",
       "Min Obs:                       6.0000                                           \n",
       "Max Obs:                       8244.0   F-statistic (robust):             303.79\n",
       "                                        P-value                           0.0000\n",
       "Time periods:                  122603   Distribution:                F(3,122341)\n",
       "Avg Obs:                       1.0000                                           \n",
       "Min Obs:                       1.0000                                           \n",
       "Max Obs:                       1.0000                                           \n",
       "                                                                                \n",
       "                                Parameter Estimates                                 \n",
       "====================================================================================\n",
       "                  Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept          3.74e+04     747.33     50.041     0.0000   3.593e+04   3.886e+04\n",
       "female           -1.065e+04     581.53    -18.316     0.0000  -1.179e+04     -9511.4\n",
       "age                  386.81     16.928     22.850     0.0000      353.63      419.99\n",
       "has_college_educ  2.665e+04     1323.8     20.130     0.0000   2.405e+04   2.924e+04\n",
       "====================================================================================\n",
       "\n",
       "F-test for Poolability: 54.365\n",
       "P-value: 0.0000\n",
       "Distribution: F(258,122341)\n",
       "\n",
       "Included effects: Entity\n",
       "PanelEffectsResults, id: 0x2882e1fd0"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from linearmodels import PanelOLS\n",
    "\n",
    "mod = PanelOLS.from_formula(\n",
    "    \"annual_earnings ~ 1 + female + age + has_college_educ + EntityEffects\",\n",
    "    data=cps_w_multiindex,\n",
    ")\n",
    "mod.fit(cov_type=\"clustered\", cluster_entity=True)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "base",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  },
  "vscode": {
   "interpreter": {
    "hash": "718fed28bf9f8c7851519acf2fb923cd655120b36de3b67253eeb0428bd33d2d"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}