---
title: "Poisson Pseudo-Maximum Likelihood (PPML) Model with Cluster-Robust Standard Errors"
output: rmarkdown::html_vignette
bibliography: "references.bib"
vignette: >
  %\VignetteIndexEntry{Poisson Pseudo-Maximum Likelihood (PPML) Model with Cluster-Robust Standard Errors}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

We will estimate a Poisson Pseudo-Maximum Likelihood (PPML) model using the
data available in this package with the idea of replicating the PPML results
from Table 3 in @yotov2016advanced.

This requires including exporter-time and importer-time fixed effects and
clustering the standard errors by exporter-importer pair.

The PPML specification corresponds to:
\begin{align}
X_{ij,t} =& \:\exp\left[\beta_1 \log(DIST)_{i,j} + \beta_2 BORDER_{i,j} +\right.\\
& \:\left.\beta_3 COMLANG_{i,j} + \beta_4 COLONY_{i,j} + \pi_{i,t} + \chi_{j,t}\right] \times \varepsilon_{ij,t}.
\end{align}
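
As a brief aside on why this is a *pseudo*-maximum likelihood estimator, the specification can be read as a statement about the conditional mean only. Assuming the usual PPML condition that the multiplicative error has conditional mean one (the text above does not state it explicitly), and writing $\eta_{ij,t}$ for the term in square brackets (a shorthand introduced here):
\begin{align}
E\left[X_{ij,t} \mid \cdot\,\right] = \exp(\eta_{ij,t}) \times E\left[\varepsilon_{ij,t} \mid \cdot\,\right] = \exp(\eta_{ij,t}).
\end{align}
Under that assumption only the conditional mean needs to be correctly specified, so trade flows do not need to be counts or to follow a Poisson distribution.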

Required packages:

```r
library(capybara)
```

We can use the `fepoisson()` function to obtain the estimated coefficients,
adding the fixed effects after the `|` in the formula as `| exp_year + imp_year`.
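
The fixed effects are ordinary columns that label each exporter-year and importer-year combination. A minimal sketch of how such labels can be built, using a made-up `toy` data frame (hypothetical rows, not taken from the package data) with the same column names as the estimation code:

```r
# Toy data frame; the rows and values are made up for illustration only.
toy <- data.frame(
  ctry1 = c("ARG", "ARG", "BRA", "BRA"),
  ctry2 = c("BRA", "BRA", "ARG", "ARG"),
  year  = c(1989, 1994, 1989, 1994)
)

# One label per exporter-year and per importer-year combination.
toy$exp_year <- paste0(toy$ctry1, toy$year)
toy$imp_year <- paste0(toy$ctry2, toy$year)

# Each distinct label corresponds to one fixed effect level.
length(unique(toy$exp_year))
length(unique(toy$imp_year))
```

Pasting the country code and the year together is only one way to obtain these labels; any column that takes a distinct value for each country-year combination would serve the same purpose.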

Model estimation:

```r
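# Keep the years 1989, 1994 and 1999
ross2004_subset <- ross2004[ross2004$year %in% seq(1989, 1999, 5), ]

# Recover trade in levels from its logarithm
ross2004_subset$trade <- exp(ross2004_subset$ltrade)

# Exporter-year and importer-year identifiers used as fixed effects
ross2004_subset$exp_year <- paste0(ross2004_subset$ctry1, ross2004_subset$year)
ross2004_subset$imp_year <- paste0(ross2004_subset$ctry2, ross2004_subset$year)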

fit <- fepoisson(
  trade ~ ldist + border + comlang + colony | exp_year + imp_year,
  data = ross2004_subset
)

summary(fit)
```

```r
Formula: trade ~ ldist + border + comlang + colony | exp_year + imp_year

Family: Poisson

Estimates:

|         | Estimate | Std. Error | z value     | Pr(>|z|)  |
|---------|----------|------------|-------------|-----------|
| ldist   |  -0.9800 |     0.0000 | -90771.8020 | 0.0000 ** |
| border  |   0.3200 |     0.0000 |  13707.7154 | 0.0000 ** |
| comlang |   0.2852 |     0.0000 |  12315.8981 | 0.0000 ** |
| colony  |   0.3958 |     0.0000 |  12508.5396 | 0.0000 ** |

Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10

Pseudo R-squared: 0.9591 

Fixed effects:
  exp_year: 457
  imp_year: 457

Number of observations: Full 21450; Missing 0; Perfect classification 0 

Number of Fisher Scoring iterations: 10 
```

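The coefficients are almost identical to those in Table 3 of
@yotov2016advanced, which were obtained with Stata. The small differences are
attributable to the different fitting algorithms used by each software. Capybara
uses the demeaning algorithm proposed by @stammann2018fast.

To cluster the standard errors by exporter-importer pair, we add the cluster
variable as a third part of the formula (`| pair`) and pass
`type = "clustered"` to `summary()`: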

```r
fit <- fepoisson(
  trade ~ ldist + border + comlang + colony | exp_year + imp_year | pair,
  data = ross2004_subset
)

summary(fit, type = "clustered")
```

```r
Formula: trade ~ ldist + border + comlang + colony | exp_year + imp_year | 
    pair

Family: Poisson

Estimates:

|         | Estimate | Std. Error | z value  | Pr(>|z|)  |
|---------|----------|------------|----------|-----------|
| ldist   |  -0.9800 |     0.0476 | -20.5747 | 0.0000 ** |
| border  |   0.3200 |     0.1077 |   2.9719 | 0.0030 ** |
| comlang |   0.2852 |     0.0881 |   3.2362 | 0.0012 ** |
| colony  |   0.3958 |     0.1032 |   3.8344 | 0.0001 ** |

Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10

Pseudo R-squared: 0.9591 

Fixed effects:
  exp_year: 457
  imp_year: 457

Number of observations: Full 21450; Missing 0; Perfect classification 0 

Number of Fisher Scoring iterations: 10
```

The slopes are identical, but the standard errors differ from those in the
previous example. Capybara's clustering algorithm is based on
@cameron2011robust.
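
To see how the clustered standard errors feed into the reported statistics, the following sketch recomputes the z values and two-sided p-values and adds approximate 95% Wald confidence intervals. It uses only base R and the rounded numbers copied from the clustered summary printed above, so the results will differ slightly from the printed ones; the object names (`est`, `se`, `ci`) are illustrative and not part of capybara.

```r
# Point estimates and clustered standard errors, copied (rounded to four
# decimals) from the clustered summary printed above.
est <- c(ldist = -0.9800, border = 0.3200, comlang = 0.2852, colony = 0.3958)
se  <- c(ldist =  0.0476, border = 0.1077, comlang = 0.0881, colony = 0.1032)

# z statistic and two-sided p-value under a standard normal reference.
z <- est / se
p <- 2 * pnorm(-abs(z))

# Approximate 95% Wald confidence intervals: estimate +/- critical value * SE.
crit <- qnorm(0.975)
ci <- cbind(lower = est - crit * se, upper = est + crit * se)

round(cbind(z = z, p = p, ci), 4)
```

Since the point estimates are unchanged, any difference in the z values, p-values, and interval widths relative to the first fit comes entirely from the standard errors.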

# References