This article describes a 2-phase process implemented by the American Board of Pediatrics in 2021 to investigate and remove potential bias on its General Pediatrics Certifying Examination at the item (question) level based on gender or race and ethnicity. Phase 1 used a statistical technique known as differential item functioning (DIF) analysis to identify items in which 1 subgroup of the population outperformed another subgroup after controlling for overall knowledge level. Phase 2 involved a review of items flagged for statistical DIF by the American Board of Pediatrics’ Bias and Sensitivity Review (BSR) panel, a diverse group of 12 volunteer subject matter experts tasked with identifying language or other characteristics of those items that may have contributed to the observed performance differences. Results indicated that no items on the 2021 examination were flagged for DIF by gender and 2.8% of the items were flagged for DIF by race and ethnicity. Of those items flagged for race and ethnicity, 14.3% (0.4% of total items administered) were judged by the BSR panel to contain biased language that may have undermined what the item was intending to measure and were therefore recommended to be removed from operational scoring. In addition to removing potentially biased items from the current pool of items, we hope that repeating the DIF/BSR process after each examination cycle will increase our understanding of how language nuances and other characteristics affect item performance so that we can improve our guidelines for developing future items.
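
To illustrate what "outperformed another subgroup after controlling for overall knowledge level" can mean in practice, the sketch below computes a Mantel-Haenszel common odds ratio for a single item. This abstract does not state which DIF statistic was used, so the choice of Mantel-Haenszel is an assumption, as are the function name `mantel_haenszel_dif`, the group labels `"ref"` and `"focal"`, and the use of total score as the matching variable. The sketch only shows the general idea of comparing groups within matched ability strata; it is not the Board's procedure.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(responses):
    """Mantel-Haenszel DIF sketch for one item (illustrative, hypothetical API).

    responses: iterable of (group, stratum, correct) where
      group   is "ref" or "focal" (assumed labels),
      stratum is the matching variable, e.g. total test score,
      correct is True/False for this item.
    Returns (alpha, delta): the common odds ratio and its value on the
    ETS delta scale, delta = -2.35 * ln(alpha).
    """
    # Per stratum: [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, stratum, correct in responses:
        idx = (0 if group == "ref" else 2) + (0 if correct else 1)
        tables[stratum][idx] += 1

    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n  # reference correct x focal incorrect
        den += b * c / n  # reference incorrect x focal correct

    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined for these data")

    alpha = num / den
    return alpha, -2.35 * math.log(alpha)
```

An odds ratio near 1 (delta near 0) means the two groups perform similarly on the item once matched on overall score; values far from 1 are what would prompt a statistical flag, with the flagging threshold left to the analyst.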
