---
title: "Getting Started with mpindex"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with mpindex}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>"
)
```

## What is the MPI?

The **Multidimensional Poverty Index (MPI)** measures poverty not just by income, but by whether people are deprived across multiple areas of their lives — health, education, and living standards. A person or household is considered poor only if they fall short on *enough* of these areas at the same time.

The approach used here is the **Alkire-Foster (AF) method**, developed by Sabina Alkire and James Foster at the Oxford Poverty and Human Development Initiative (OPHI). It is the basis of the Global MPI published annually by OPHI and the UNDP.

The `mpindex` package makes it straightforward to compute the MPI from survey data in R.

---

## Installation

Install the released version from CRAN:

```{r, eval = FALSE}
install.packages("mpindex")
```

Or install the development version from GitHub:

```{r, eval = FALSE}
# install.packages("devtools")
devtools::install_github("yng-me/mpindex")
```

Then load the package:

```{r setup}
library(mpindex)
```

---

## The workflow at a glance

Computing the MPI with `mpindex` follows five steps:

1. **Define your MPI specifications**: load a specification file that describes the dimensions, indicators, and weights.
2. **Prepare your data**: have household-level data ready, plus member-level data for any indicator measured on individuals.
3. **Compute the MPI**: run `compute_mpi()` with one deprivation cutoff per indicator, telling the package when a household is deprived.
4. **Explore the results**: inspect the index, headcount ratio, intensity, and indicator contributions.
5. **Save the results**: write everything to a formatted Excel workbook with `save_mpi()`.

The sections below walk through each step using the package's built-in datasets.

---

## Step 1: Define your MPI specifications

### What goes in a specification file?

The specification file tells `mpindex` how your indicators are organized. It must contain these columns (column names are not case-sensitive):

| Column        | What it means |
|---------------|---------------|
| `Dimension`   | The broad domain (e.g. Health, Education) |
| `Indicator`   | The specific measure within that dimension |
| `Variable`    | The column name in your dataset that holds this measure |
| `Weight`      | How much this indicator contributes to the overall score |
| `Description` | *(optional)* A plain-language description of the indicator |

The package accepts `.csv`, `.xlsx`, `.json`, and `.txt` (tab-separated) files.
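
A specification file is an ordinary table, so one way to create it is from R itself. The minimal base-R sketch below is only an illustration: the dimensions, indicators, variable names, and weights are hypothetical placeholders for a custom MPI, and the check that the weights sum to 1 reflects the usual normalization in the AF method rather than a documented `mpindex` requirement.

```{r}
# Hypothetical custom specification: two dimensions, four indicators.
# Each dimension gets half of the total weight, split equally across
# its two indicators, so every indicator is weighted 1/4.
toy_specs <- data.frame(
  Dimension   = c("Health", "Health", "Living Standards", "Living Standards"),
  Indicator   = c("Nutrition", "Child mortality", "Electricity", "Drinking water"),
  Variable    = c("nutrition", "child_mortality", "electricity", "drinking_water"),
  Weight      = rep(1/4, 4),
  Description = c(
    "A household member is undernourished",
    "A child in the household has died",
    "The household has no electricity",
    "The household lacks safe drinking water"
  )
)

# AF weights are conventionally normalized to sum to 1
stopifnot(isTRUE(all.equal(sum(toy_specs$Weight), 1)))

# Write the table to a temporary CSV file and read it back
toy_specs_file <- tempfile(fileext = ".csv")
write.csv(toy_specs, toy_specs_file, row.names = FALSE)
read.csv(toy_specs_file)
```

A file like this one could then be passed to `define_mpi_specs()`, as described under *Using your own specification file*.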

### Using the built-in Global MPI specification

`mpindex` ships with the Global MPI specification as an example. The `global_mpi_specs()` shortcut (which replaces an earlier, now-deprecated function) loads it in one line:

```{r}
mpi_specs <- global_mpi_specs(uid = "uuid", unit_of_analysis = 'households')
```

The `uid` argument names the column in your dataset that uniquely identifies each household (the unit of analysis). In the built-in dataset, it is `"uuid"` (yours might be different).

Behind the scenes this is equivalent to:

```{r, eval = FALSE}
specs_file <- system.file("extdata", "global-mpi-specs.csv", package = "mpindex")
mpi_specs  <- define_mpi_specs(specs_file, uid = "uuid", unit_of_analysis = 'households')
```

The Global MPI covers 10 indicators across three dimensions. Each dimension receives one-third of the total weight, which is split equally among the indicators in that dimension:

```{r, echo = FALSE}
specs_file <- system.file("extdata", "global-mpi-specs.csv", package = "mpindex")
read.csv(specs_file) |>
  gt::gt() |>
  gt::tab_header(
    title    = "Global MPI — Dimensions, Indicators, and Weights",
    subtitle = "Source: OPHI MPI Methodological Note 49 (2020)"
  ) |>
  gt::tab_options(table.width = "100%", table.font.size = 12) |>
  gt::fmt_number(columns = 4, decimals = 3)
```
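
The weights in this table follow a simple rule. The minimal base-R sketch below reproduces that rule using the indicator names from the deprivation list in Step 3; the data frame is only an illustration and not how `mpindex` stores specifications internally.

```{r}
# Global MPI weighting rule: each of the three dimensions receives 1/3 of
# the total weight, split equally across the indicators in that dimension.
toy_global <- data.frame(
  dimension = rep(c("Health", "Education", "Living Standards"), times = c(2, 2, 6)),
  indicator = c(
    "nutrition", "child_mortality",
    "year_schooling", "school_attendance",
    "cooking_fuel", "sanitation", "drinking_water",
    "electricity", "housing", "assets"
  )
)

# Number of indicators in each row's own dimension
toy_n_in_dim <- ave(seq_along(toy_global$dimension), toy_global$dimension, FUN = length)
toy_global$weight <- (1/3) / toy_n_in_dim
toy_global
```

Health and education indicators therefore carry 1/6 each and living-standards indicators 1/18 each, which together sum to 1.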

### Using your own specification file

If you are computing a custom MPI with a different set of dimensions and indicators, create a specification file with the same column structure and load it with `define_mpi_specs()`:

```{r, eval = FALSE}
mpi_specs <- define_mpi_specs(
  mpi_specs_file = "path/to/my-specs.csv",
  uid = "household_id",
  poverty_cutoffs = 1/3  # default: a household is MPI-poor if its weighted deprivation score is at least 1/3
)
```

You can pass multiple cutoffs if you want to compare results across different poverty thresholds:

```{r, eval = FALSE}
mpi_specs <- define_mpi_specs(
  "path/to/my-specs.csv",
  uid = "household_id",
  poverty_cutoffs = c(0.20, 1/3, 0.50)   # 20%, 33%, 50%
)
```
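
With several cutoffs, the same deprivation scores are simply classified several times: a household is identified as poor at cutoff *k* when its weighted deprivation score is at least *k*. The base-R sketch below illustrates this with eight hypothetical scores; the `k_XX` labels only mimic the naming used for results in Step 4.

```{r}
# Hypothetical weighted deprivation scores for eight households
toy_scores <- c(0, 0.10, 0.25, 1/3, 0.40, 0.50, 0.72, 1)

# Identify the poor at each cutoff k (score >= k) and take the share poor
toy_cutoffs <- c(0.20, 1/3, 0.50)
toy_H_by_k <- vapply(toy_cutoffs, function(k) mean(toy_scores >= k), numeric(1))
names(toy_H_by_k) <- paste0("k_", round(toy_cutoffs * 100))
toy_H_by_k
```

Raising the cutoff can only shrink the share of households identified as poor, never increase it.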

---

## Step 2: Prepare your data

### The two built-in datasets

The package includes synthetic household survey data modelled on the Global MPI:

- **`df_household`** — one row per household, containing household-level variables.
- **`df_household_roster`** — one row per household member, containing individual-level variables (e.g. nutrition, school attendance).

```{r, message = FALSE, warning = FALSE}
library(dplyr)

glimpse(df_household)
```

```{r}
glimpse(df_household_roster)
```

> **Why two tables?** Some indicators (e.g. whether *any* child in the household is not attending school) come from individual-level data and must be collapsed to the household level. `mpindex` handles this automatically — you just tell it which table to use and how to collapse it.
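
To see what collapsing means, here is a minimal base-R sketch on a small hypothetical roster (it is not the package's implementation): a member-level condition is evaluated first, then `max` reduces it to one value per household, so a household is flagged if any of its members is flagged.

```{r}
# Hypothetical roster: one row per household member
toy_roster <- data.frame(
  uuid           = c("A", "A", "A", "B", "B", "C"),
  age            = c(34, 8, 72, 45, 12, 30),
  undernourished = c(0, 1, 1, 0, 0, 1)
)

# Member-level condition, mirroring the nutrition cutoff used in Step 3
toy_roster$flag <- as.integer(toy_roster$undernourished == 1 & toy_roster$age < 70)

# Collapse to one row per household: deprived if ANY member is deprived
toy_hh <- aggregate(flag ~ uuid, data = toy_roster, FUN = max)
toy_hh
```

Household A is flagged because of its 8-year-old member; its undernourished 72-year-old does not count under the age condition.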

---

## Step 3: Compute the MPI

### The main function: `compute_mpi()`

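`compute_mpi()` takes your household data, your specifications, and a list of **deprivation cutoffs**, one per indicator. Each cutoff is a logical expression wrapped in `deprived()` that evaluates to `TRUE` when a household is deprived. For indicators measured on household members, pass the roster through `.data` and a `collapse_fn` (here `max`) that reduces the member-level results to one value per household.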

```{r}
mpi_result <- compute_mpi(
  df_household,
  mpi_specs = mpi_specs,
  deprivations = list(

    # --- Health ---
    nutrition = deprived(
      undernourished == 1 & age < 70,   # deprived if any member under 70 is undernourished
      .data        = df_household_roster,
      collapse_fn = max                 # household is deprived if any member is deprived
    ),
    child_mortality = deprived(with_child_died == 1),

    # --- Education ---
    year_schooling = deprived(
      completed_6yrs_schooling == 2,
      .data        = df_household_roster,
      collapse_fn = max
    ),
    school_attendance = deprived(
      attending_school == 2 & age %in% 5:24,
      .data        = df_household_roster,
      collapse_fn = max
    ),

    # --- Living Standards ---
    cooking_fuel   = deprived(cooking_fuel %in% c(4:6, 9)),
    sanitation     = deprived(toilet > 1),
    drinking_water = deprived(drinking_water == 2),
    electricity    = deprived(electricity == 2),
    housing        = deprived(
      roof %in% c(5, 7, 9) | walls %in% c(5, 8, 9, 99) | floor %in% c(5, 6, 9)
    ),
    # deprived if the household owns at most one small asset AND no car or truck
    assets = deprived(!(
      (asset_tv + asset_telephone + asset_mobile_phone + asset_computer +
         asset_animal_cart + asset_bicycle + asset_motorcycle +
         asset_refrigerator) > 1 |
        (asset_car + asset_truck) > 0
    ))
  )
)
```

The result is a named list:

```{r}
names(mpi_result)
```
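
Conceptually, each cutoff in the list above is an ordinary vectorised R expression evaluated against the columns of the household table. The sketch below evaluates three of them with `with()` on a small hypothetical table (the values are invented) to show the `TRUE`/`FALSE` flags they yield; it does not reproduce how `deprived()` works internally. The last line shows a pitfall worth checking in your own cutoffs: `%in%` already returns a logical vector, so comparing its result to a numeric code such as `2` is always `FALSE`.

```{r}
# Hypothetical household values, chosen so the cutoffs above can be applied
toy_households <- data.frame(
  uuid           = c("A", "B", "C"),
  electricity    = c(1, 2, 1),
  drinking_water = c(2, 1, 1),
  cooking_fuel   = c(5, 1, 9)
)

# Evaluate each cutoff against the table's columns; TRUE marks deprivation
toy_flags <- with(toy_households, data.frame(
  uuid           = uuid,
  electricity    = electricity == 2,
  drinking_water = drinking_water == 2,
  cooking_fuel   = cooking_fuel %in% c(4:6, 9)
))
toy_flags

# Pitfall: %in% returns TRUE/FALSE, so comparing it to a code is always FALSE
c(5, 1) %in% c(5, 8, 9, 99) == 2
```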

### Disaggregating by a subgroup

Pass `by` to break results down by a grouping variable (e.g. urban vs. rural):

```{r, eval = FALSE}
compute_mpi(
  df_household,
  mpi_specs = mpi_specs,
  deprivations = list(...),
  by = class        # column in df_household
)
```
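
Disaggregation is meaningful because the AF measure is subgroup decomposable: the overall MPI equals the population-share-weighted average of the subgroup MPIs, where each subgroup MPI is the mean censored deprivation score (0 for non-poor households) within that subgroup. A minimal base-R sketch with hypothetical censored scores and a hypothetical urban/rural label:

```{r}
# Hypothetical censored deprivation scores (0 = not poor) and a subgroup label
toy_area       <- c("urban", "urban", "urban", "rural", "rural")
toy_scores_cen <- c(0, 0, 0.5, 0.4, 1)

# Subgroup MPI: mean censored score within each subgroup
toy_mpi_by_area <- tapply(toy_scores_cen, toy_area, mean)

# Population share of each subgroup, in the same order as toy_mpi_by_area
toy_share <- as.numeric(table(toy_area)[names(toy_mpi_by_area)]) / length(toy_area)

# Overall MPI as the share-weighted average of subgroup MPIs
toy_mpi_overall <- sum(toy_share * as.numeric(toy_mpi_by_area))
toy_mpi_by_area
toy_mpi_overall
```

The weighted average (0.38) matches the mean censored score over all five households.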

### The step-by-step alternative

If you prefer to build and inspect each deprivation indicator before combining them, use `define_deprivation()` to create each one separately, then pass the list to `compute_mpi()` via `deprivations`:

```{r, eval = FALSE}
dp <- list()

dp$nutrition <- df_household_roster |>
  define_deprivation(
    indicator = nutrition,
    cutoff = undernourished == 1 & age < 70,
    collapse_fn = max,
    mpi_specs = mpi_specs
  )

dp$drinking_water <- df_household |>
  define_deprivation(
    indicator = drinking_water,
    cutoff = drinking_water == 2,
    mpi_specs = mpi_specs
  )

# ... define all remaining indicators, then:
mpi_result <- compute_mpi(df_household, mpi_specs = mpi_specs, deprivations = dp)
```

This is useful when your data preparation for some indicators is complex and you want to examine intermediate results.

---

## Step 4: Explore the results

### Understanding the output

`mpi_result` is a named list whose components depend on the arguments passed to `compute_mpi()`. Each component is itself a named list keyed by poverty cutoff, e.g. `k_33` for the 33% cutoff; `$headcount_ratio` and `$deprivation_matrix` also contain an `uncensored` element. Setting `include_deprivation_matrix = TRUE` adds `$deprivation_matrix`, and supplying a grouping variable through `by` adds `$overall`, which summarizes the results across all groups combined.

| Component | What it contains |
|-----------|-----------------|
| `$index` | The headline MPI value, headcount ratio (H), intensity (A), and sample size (n) |
| `$contribution` | Each indicator's percentage contribution to the overall MPI |
| `$headcount_ratio` | The share of households deprived on each indicator (uncensored and censored) |
| `$deprivation_matrix` | Row-level deprivation scores and indicator flags for every household if `include_deprivation_matrix` is set to `TRUE` (default is `FALSE`) |
| `$overall` | Overall summary if grouping is defined |

### The headline MPI

The three key numbers are:

- **H (headcount ratio)**: the share of the population (here, households) identified as multidimensionally poor.
- **A (intensity)**: the average weighted deprivation score among the poor, i.e. the average share of weighted indicators in which poor households are deprived.
- **MPI**: the product H × A, capturing both *how many* are poor and *how poor* they are.

```{r, eval = FALSE}
mpi_result$index$k_33
```

```{r, echo = FALSE}
mpi_result$index$k_33 |>
  gt::gt() |>
  gt::tab_header(title = "MPI — 33% Poverty Cutoff") |>
  gt::fmt_number(columns = 2:4, decimals = 3) |>
  gt::tab_options(table.width = "100%", table.font.size = 12)
```
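
The relationship between these three numbers can be checked by hand. The sketch below uses a hypothetical 5-household, 3-indicator deprivation matrix with weights summing to 1 and a cutoff of 1/3; it computes each household's weighted score, identifies the poor, and derives H, A, and MPI in base R. It illustrates the AF formulas and does not call `mpindex`.

```{r}
# Hypothetical deprivation flags (1 = deprived): 5 households x 3 indicators
toy_g <- matrix(
  c(1, 1, 0,
    0, 0, 1,
    1, 1, 1,
    0, 0, 0,
    1, 0, 1),
  nrow = 5, byrow = TRUE
)
toy_w <- c(1/2, 1/4, 1/4)   # hypothetical weights, summing to 1
toy_k <- 1/3                # poverty cutoff

toy_score <- as.vector(toy_g %*% toy_w)  # weighted deprivation score per household
toy_poor  <- toy_score >= toy_k          # identification: poor if score >= k

toy_H   <- mean(toy_poor)                # headcount ratio
toy_A   <- mean(toy_score[toy_poor])     # intensity: average score among the poor
toy_mpi <- toy_H * toy_A                 # MPI (adjusted headcount ratio)
c(H = toy_H, A = toy_A, MPI = toy_mpi)
```

Here H = 0.6 and A ≈ 0.833, so MPI = 0.5, which also equals the mean of the censored scores (non-poor households counted as 0).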

### Indicator contributions

How much does each indicator drive the overall MPI? The contribution table answers this — values sum to 100% across all indicators.

```{r, eval = FALSE}
mpi_result$contribution$k_33
```

```{r, echo = FALSE}
gtx <- function(.gt, .decimals = 1, .offset = 0) {
  d01_cp <- 2:3  + .offset
  d02_cp <- 4:5  + .offset
  d03_cp <- 6:11 + .offset
  .gt |>
    gt::tab_spanner(label = "Health",           columns = d01_cp) |>
    gt::tab_spanner(label = "Education",        columns = d02_cp) |>
    gt::tab_spanner(label = "Living Standards", columns = d03_cp) |>
    gt::fmt_number(columns = c(d01_cp, d02_cp, d03_cp), decimals = .decimals) |>
    gt::tab_options(table.font.size = 12)
}

mpi_result$contribution$k_33 |>
  gt::gt() |>
  gt::tab_header(title = "Contribution to MPI by Indicator — 33% Poverty Cutoff") |>
  gtx()
```
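
In the AF method, the contribution of indicator *j* is its weight times its censored headcount ratio, divided by the MPI. Because the MPI equals the sum of these weighted censored headcount ratios, the contributions add up to 100%. A base-R sketch on hypothetical data (three indicators with illustrative names and weights):

```{r}
# Hypothetical deprivation flags (1 = deprived): 5 households x 3 indicators
toy_g <- matrix(
  c(1, 1, 0,  0, 0, 1,  1, 1, 1,  0, 0, 0,  1, 0, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(NULL, c("ind_a", "ind_b", "ind_c"))
)
toy_w <- c(ind_a = 1/2, ind_b = 1/4, ind_c = 1/4)  # hypothetical weights

toy_poor <- as.vector(toy_g %*% toy_w) >= 1/3   # poor at the 1/3 cutoff
toy_CH   <- colMeans(toy_g * toy_poor)          # censored headcount ratios
toy_mpi  <- sum(toy_w * toy_CH)                 # MPI as weighted sum of censored ratios

# Percentage contribution of each indicator; sums to 100
toy_contrib <- 100 * toy_w * toy_CH / toy_mpi
toy_contrib
```

With these numbers the contributions are 60%, 20%, and 20%.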

### Headcount ratios

The **uncensored** headcount ratio is the raw deprivation rate on each indicator, whether or not the household is identified as multidimensionally poor. The **censored** headcount ratio counts a household as deprived on an indicator only if it is also poor at the given cutoff.

```{r, eval = FALSE}
mpi_result$headcount_ratio$uncensored   # deprivation rate — all households
mpi_result$headcount_ratio$k_33         # deprivation rate — poor households only
```

```{r, echo = FALSE}
mpi_result$headcount_ratio$uncensored |>
  dplyr::ungroup() |>
  gt::gt() |>
  gt::tab_header(title = "Uncensored Headcount Ratio (all households)") |>
  gtx(.decimals = 3)
```

```{r, echo = FALSE}
mpi_result$headcount_ratio$k_33 |>
  dplyr::ungroup() |>
  gt::gt() |>
  gt::tab_header(title = "Censored Headcount Ratio (poor households only, k = 33%)") |>
  gtx(.decimals = 3)
```
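
The difference between the two tables comes from censoring. In this hypothetical base-R sketch, household 2 is deprived only in `ind_c`, and its score (0.25) falls below the 1/3 cutoff, so it counts toward the uncensored but not the censored headcount ratio of that indicator.

```{r}
# Hypothetical deprivation flags (1 = deprived): 5 households x 3 indicators
toy_g <- matrix(
  c(1, 1, 0,  0, 0, 1,  1, 1, 1,  0, 0, 0,  1, 0, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(NULL, c("ind_a", "ind_b", "ind_c"))
)
toy_w    <- c(1/2, 1/4, 1/4)                     # hypothetical weights
toy_poor <- as.vector(toy_g %*% toy_w) >= 1/3    # poor at the 1/3 cutoff

# Uncensored: share deprived among all households
toy_uncensored <- colMeans(toy_g)
# Censored: flags of non-poor households are set to 0 before averaging
toy_censored   <- colMeans(toy_g * toy_poor)

rbind(uncensored = toy_uncensored, censored = toy_censored)
```

A censored ratio can never exceed its uncensored counterpart; here `ind_c` drops from 0.6 to 0.4.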

### The deprivation matrix

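The deprivation matrix (added when `include_deprivation_matrix = TRUE`) records each household's weighted deprivation score together with its indicator-level deprivation flags. To produce it, rerun `compute_mpi()` with that option enabled. The first few rows: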

```{r, echo=FALSE}
mpi_result <- compute_mpi(
  df_household,
  mpi_specs = mpi_specs,
  deprivations = list(

    # --- Health ---
    nutrition = deprived(
      undernourished == 1 & age < 70,   # deprived if any member under 70 is undernourished
      .data        = df_household_roster,
      collapse_fn = max                 # household is deprived if any member is deprived
    ),
    child_mortality = deprived(with_child_died == 1),

    # --- Education ---
    year_schooling = deprived(
      completed_6yrs_schooling == 2,
      .data        = df_household_roster,
      collapse_fn = max
    ),
    school_attendance = deprived(
      attending_school == 2 & age %in% 5:24,
      .data        = df_household_roster,
      collapse_fn = max
    ),

    # --- Living Standards ---
    cooking_fuel   = deprived(cooking_fuel %in% c(4:6, 9)),
    sanitation     = deprived(toilet > 1),
    drinking_water = deprived(drinking_water == 2),
    electricity    = deprived(electricity == 2),
    housing        = deprived(
      roof %in% c(5, 7, 9) | walls %in% c(5, 8, 9, 99) | floor %in% c(5, 6, 9)
    ),
    # deprived if the household owns at most one small asset AND no car or truck
    assets = deprived(!(
      (asset_tv + asset_telephone + asset_mobile_phone + asset_computer +
         asset_animal_cart + asset_bicycle + asset_motorcycle +
         asset_refrigerator) > 1 |
        (asset_car + asset_truck) > 0
    ))
  ),
  include_deprivation_matrix = TRUE
)
```


```{r, eval = FALSE}
mpi_result <- compute_mpi(
  df_household,
  mpi_specs = mpi_specs,
  deprivations = list(...),
  include_deprivation_matrix = TRUE
)

mpi_result$deprivation_matrix$uncensored |> head()
```

```{r, echo = FALSE}
mpi_result$deprivation_matrix$uncensored |>
  dplyr::ungroup() |>
  head() |>
  gt::gt() |>
  gt::tab_header(title = "Deprivation Matrix — first 6 households (uncensored)") |>
  gtx(.decimals = 0, .offset = 1) |>
  gt::fmt_number(columns = 3, decimals = 3)
```

After applying the poverty cutoff, households with a deprivation score below the threshold have all their indicator flags set to zero — they are not counted as poor:

```{r, eval = FALSE}
mpi_result$deprivation_matrix$k_33 |> head()
```

```{r, echo = FALSE}
mpi_result$deprivation_matrix$k_33 |>
  dplyr::ungroup() |>
  head() |>
  gt::gt() |>
  gt::tab_header(title = "Deprivation Matrix — first 6 households (k = 33% cutoff)") |>
  gtx(.decimals = 0, .offset = 1) |>
  gt::fmt_number(columns = 3, decimals = 3)
```

> Deprivation matrices are left out by default (`include_deprivation_matrix = FALSE`) to save memory, so request them only when you need household-level detail.

---

## Step 5: Save results to Excel

`save_mpi()` writes all results to a formatted Excel workbook:

```{r, eval = FALSE}
save_mpi(mpi_result, mpi_specs = mpi_specs, filename = "MPI Results")
```

Each component gets its own sheet. To also include the specification table as a reference:

```{r, eval = FALSE}
save_mpi(
  mpi_result,
  mpi_specs = mpi_specs,
  filename = "MPI Results",
  include_specs = TRUE
)
```

---

## Quick-reference script

Here is the complete workflow in one place:

```{r, eval = FALSE}
library(mpindex)

# 1. Load specifications
mpi_specs <- global_mpi_specs(uid = "uuid", unit_of_analysis = 'households')

# 2. Compute the MPI
mpi_result <- compute_mpi(
  df_household,
  mpi_specs = mpi_specs,
  deprivations = list(
    nutrition = deprived(undernourished == 1 & age < 70, .data = df_household_roster, collapse_fn = max),
    child_mortality = deprived(with_child_died == 1),
    year_schooling = deprived(completed_6yrs_schooling == 2, .data = df_household_roster, collapse_fn = max),
    school_attendance = deprived(attending_school == 2 & age %in% 5:24, .data = df_household_roster, collapse_fn = max),
    cooking_fuel = deprived(cooking_fuel %in% c(4:6, 9)),
    sanitation = deprived(toilet > 1),
    drinking_water = deprived(drinking_water == 2),
    electricity = deprived(electricity == 2),
    housing = deprived(roof %in% c(5, 7, 9) | walls %in% c(5, 8, 9, 99) | floor %in% c(5, 6, 9)),
    assets = deprived(!(
      (asset_tv + asset_telephone + asset_mobile_phone + asset_computer +
         asset_animal_cart + asset_bicycle + asset_motorcycle +
         asset_refrigerator) > 1 |
        (asset_car + asset_truck) > 0
    ))
  )
)

# 3. Inspect results
mpi_result$index$k_33          # headline MPI, H, A
mpi_result$contribution$k_33   # indicator contributions (%)
mpi_result$headcount_ratio$k_33
mpi_result$deprivation_matrix$k_33

# 4. Save to Excel
save_mpi(mpi_result, mpi_specs = mpi_specs, filename = "MPI Results")
```