BACKGROUND:

Universal screening is recommended to reduce the age of diagnosis for autism spectrum disorder (ASD). However, there are insufficient data on children who screen negative and no study of outcomes from truly universal screening. With this study, we filled these gaps by examining the accuracy of universal screening with systematic follow-up through 4 to 8 years.

METHODS:

Universal, primary care-based screening was conducted using the Modified Checklist for Autism in Toddlers with Follow-Up (M-CHAT/F) and supported by electronic administration and integration into electronic health records. All children with a well-child visit (1) between 16 and 26 months, (2) at a Children’s Hospital of Philadelphia site after universal electronic screening was initiated, and (3) between January 2011 and July 2015 were included (N = 25 999).

RESULTS:

Nearly universal screening was achieved (91%), and ASD prevalence was 2.2%. Overall, the M-CHAT/F’s sensitivity was 38.8%, and its positive predictive value (PPV) was 14.6%. Sensitivity was higher in older toddlers and with repeated screenings, whereas PPV was lower in girls. Finally, the M-CHAT/F's specificity and PPV were lower in children of color and those from lower-income households.

CONCLUSIONS:

Universal screening in primary care is possible when supported by electronic administration. In this “real-world” cohort that was systematically followed, the M-CHAT/F was less accurate in detecting ASD than in previous studies. Disparities in screening rates and accuracy were evident in traditionally underrepresented groups. Future research should focus on the development of new methods that detect a greater proportion of children with ASD and reduce disparities in the screening process.

What’s Known on This Subject:

Universal screening for autism spectrum disorder is recommended in primary care to facilitate early detection. However, the US Preventive Services Task Force concluded that there are currently insufficient data from primary care and with longitudinal follow-up to recommend universal screening.

What This Study Adds:

We examined the accuracy of autism screening in a diverse cohort screened nearly universally (91%) and followed up systematically. The M-CHAT/F had lower sensitivity and positive predictive value than in previous studies; disparities were observed in screening rates and accuracy.

Although autism spectrum disorder (ASD) manifests in the first few years of life, the average age of diagnosis remains older than 4 years1 and is even later for children of color and those from rural and lower-income backgrounds.2,3 Improving early diagnosis is critical because it affords children access to earlier intervention, which has been shown to significantly improve outcomes.4–6 The American Academy of Pediatrics (AAP)7 recommends universal screening for ASD at 18 and 24 months to facilitate earlier identification. However, the US Preventive Services Task Force concluded that there is insufficient evidence to recommend universal screening, in part because of limited data on outcomes for children who screen negative and from diverse samples. Coupled with the lack of data on truly universal screening (ie, all children are screened rather than a selected subset), there are critical gaps in knowledge about the short- and long-term benefits of universal ASD screening.8

The most widely used and studied screening tool is the Modified Checklist for Autism in Toddlers with Follow-Up (M-CHAT/F),9,10 a 2-stage tool that includes a 23-item parent questionnaire and a follow-up interview designed to reduce false-positives. The Modified Checklist for Autism in Toddlers, Revised, with Follow-Up (M-CHAT–R/F), reworded and removed items and introduced new scoring criteria that recommend bypassing the follow-up interview for scores of 8+. M-CHAT/F and M-CHAT–R/F data are frequently combined because M-CHAT–R/F scoring criteria can be applied to M-CHAT/F administrations, and accuracy is comparable across versions.9 Estimates of positive predictive value (PPV) for the M-CHAT/F and M-CHAT–R/F have varied widely (2%–65%) depending in part on the sample’s ASD prevalence.9,11–13 However, PPV is optimized when the follow-up interview is used.11 There are few sensitivity, specificity, or negative predictive value (NPV) estimates because these require systematic follow-up of all children, including those who screen negative.

To date, 2 large-scale studies have conducted systematic, longitudinal follow-up of children screened with the Modified Checklist for Autism in Toddlers without Follow-Up (M-CHAT). However, both have important limitations that restrict generalizability. A study conducted in Malaysian maternal-child health clinics yielded only a 0.2% screen-positive rate in a sample with a 0.3% ASD prevalence rate.14 A large study of population screening in Norway yielded a higher screen-positive rate (7.4%), but the ASD prevalence was still low (0.3%), suggesting that the study’s methods failed to detect many children with ASD.15 Despite large sample sizes, neither study achieved universal screening (ie, 65% and ∼28% of children were screened, respectively).14,15 Despite these limitations, these first estimates from samples that were systematically followed up suggested low sensitivity and PPV (34%–36% and 2%–47%, respectively).

It is critical to assess screening accuracy within the intended population (eg, all children in a primary care population) to reduce bias and facilitate the generalization of findings to similar cohorts. Sensitivity and specificity are often presumed to be fixed and inherent to the measure, but in reality, these measures are strongly influenced by the sample and/or cohort in which they are estimated.16  Samples that are not ascertained through universal screening are likely to overrepresent children with parent and/or professional concern and/or underrepresent children of color and those from lower-income households. In addition, PPV and NPV are directly linked to the sample’s prevalence rate, such that PPV increases and NPV decreases with prevalence.16  Thus, without universal screening research and representative cohorts, we risk drawing incorrect conclusions about the accuracy of screening tools in real-world applications.
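
The dependence of PPV and NPV on prevalence can be illustrated with a short calculation. The following is a minimal sketch, not part of the study's analysis: it applies Bayes' rule with the overall sensitivity (38.8%) and specificity (94.9%) reported in this study's results and varies only the prevalence. The 0.3% and 10% prevalence values are illustrative assumptions, and the function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary screen using Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity as reported for the overall cohort in this study.
SENSITIVITY, SPECIFICITY = 0.388, 0.949

# 2.2% is the prevalence reported here; 0.3% and 10% are illustrative assumptions.
for prevalence in (0.003, 0.022, 0.10):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.1%}  PPV={ppv:.1%}  NPV={npv:.1%}")
```

With sensitivity and specificity held fixed, PPV rises and NPV falls as prevalence increases, which is why estimates from low-prevalence or enriched samples may not transfer to a primary care population.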

Our goal with this study was to examine the real-world accuracy of universal screening for ASD by using an epidemiological design and long-term follow-up through 4 to 8 years of age. Screening was conducted at the Children’s Hospital of Philadelphia (CHOP), a large pediatric network of primary and specialty care services with an integrated electronic health record (EHR). Our secondary goals were to examine the accuracy of repeated screenings and the effect of child and/or family characteristics on screening rates and accuracy.

The CHOP network includes 31 pediatric primary care sites that serve a diverse patient population in Pennsylvania and New Jersey. Of these, 4 sites in urban Philadelphia serve a racially and economically diverse patient population (88% children of color, 74% public insurance/Medicaid), whereas suburban sites are less diverse (35% children of color, 24% public insurance/Medicaid).

The M-CHAT/F is administered electronically at well-child visits between 16 and 26 months in accordance with Pennsylvania’s Early and Periodic Screening, Diagnostic, and Treatment program17  and is available in English and Spanish. Screening is automatically triggered at all well-child visits in this age range, regardless of previous screening results, to ensure that children are screened twice as recommended by the AAP. Given that not all children present for 18- and 24-month well-child visits, the M-CHAT/F can also be assessed manually at sick visits. Once the M-CHAT/F is completed, questionnaire results autopopulate into the child’s visit note along with a link for providers to complete the follow-up interview when children screen positive.

CHOP provides ASD diagnostic services through a multidisciplinary program that includes developmental pediatrics, psychology and psychiatry, and neurology clinics.

This study was approved by the CHOP Institutional Review Board with a waiver of consent.

The cohort included all children who presented for a well-child visit (1) between 16 and 26 months of age, (2) at a CHOP site where universal screening had been initiated, and (3) between January 2011 and July 2015 to allow for longitudinal follow-up through ≥4 years. If the child’s first screening occurred before initiation of universal screening and the second occurred after universal screening initiation, the first screening was included to accurately represent first and second screens. Screenings at sick visits were also included (0.3% of M-CHAT/F administrations). When multiple M-CHAT/Fs were completed, the first administration was used unless otherwise noted.

Preliminary analyses of screening rates and results included the entire cohort. Children without a primary care visit at ≥4 years of age were excluded from diagnostic outcome analyses because of insufficient length of follow-up. Four years was chosen given recent estimates of the median age of diagnosis for ASD.18  Those without a documented language of English or Spanish (n = 82; 0.004%) were also excluded because the M-CHAT/F was not available in other languages.

The subgroup whose visits closely followed AAP guidelines was also examined; these children were screened during primary care visits at 18 months (±2) and 24 months (±2) with ≥3 months between screenings.
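
The subgroup rule above can be expressed as a simple check. This is a hedged sketch only: it assumes that screening ages are available in months, that the ±2-month windows are inclusive (16 to 20 and 22 to 26 months), and that any qualifying pair of screenings suffices. The function name `follows_aap_schedule` is hypothetical and does not come from the study.

```python
def follows_aap_schedule(screen_ages_months):
    """Return True if one screening falls at 18 (+/-2) months and another at
    24 (+/-2) months, with at least 3 months between the two."""
    # Assumption: window boundaries are inclusive.
    first_window = [age for age in screen_ages_months if 16 <= age <= 20]
    second_window = [age for age in screen_ages_months if 22 <= age <= 26]
    return any(
        later - earlier >= 3
        for earlier in first_window
        for later in second_window
    )


print(follows_aap_schedule([18.0, 24.5]))  # both windows covered, 6.5 months apart
print(follows_aap_schedule([20.0, 22.0]))  # both windows covered, only 2 months apart
```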

Demographic Data

Demographics, gestational age, and insurance payer for the screening visit were extracted from the EHR. Federal information processing standard codes were linked to census tract–level data to generate estimates of median income, and a median split was performed. Language was coded as English only (ie, only English was documented) or other language (ie, documentation of any non-English language).

Diagnostic Data

Diagnoses were extracted from visit diagnoses and problem lists (a comprehensive list of active and relevant past diagnoses). All available data were used to determine diagnostic outcome. As a result, the length of follow-up period varied across children (although all children included in these analyses had follow-up through at least 4 years).

Children were considered to have ASD if an ASD diagnosis appeared in the EHR more than once or was provided by a specialist because these criteria have been associated with the greatest accuracy in other large health care systems.19,20  For example, when comparing EHR diagnostic codes and manual chart review, Coleman et al20  found that ≥2 ASD diagnoses yielded a PPV of 87%; specialist diagnoses were also associated with higher odds of a confirmed diagnosis.

Children were considered to have a non-ASD disorder and/or delay if they did not meet ASD classification criteria described above but had ≥1 code from 1 of the following categories: attention-deficit/hyperactivity disorder (ADHD) and related behaviors, anxiety disorder and related behaviors, disruptive behavior disorder and related behaviors, developmental delay, language disorder and/or delay, motor disorder and/or delay, sensory processing difficulty, and social delay without ASD. See Supplemental Table 5 for specific diagnostic codes used.
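
The outcome rules described above can be sketched as a small classifier. The record structure (`Diagnosis`), the category labels, and the function name `classify_outcome` are hypothetical and chosen for illustration; the actual diagnostic codes are those listed in Supplemental Table 5, and how the study extracted them from the EHR is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Diagnosis:
    category: str        # hypothetical label, e.g. "ASD", "language", "ADHD"
    by_specialist: bool  # True if the diagnosis was provided by a specialist


# Hypothetical labels standing in for the non-ASD code categories in the text.
NON_ASD_CATEGORIES = {
    "ADHD", "anxiety", "disruptive_behavior", "developmental_delay",
    "language", "motor", "sensory_processing", "social_delay",
}


def classify_outcome(diagnoses):
    """Apply the ASD rule first, then the non-ASD disorder and/or delay rule."""
    asd = [d for d in diagnoses if d.category == "ASD"]
    # ASD: diagnosis appears more than once OR was provided by a specialist.
    if len(asd) >= 2 or any(d.by_specialist for d in asd):
        return "ASD"
    # Non-ASD: ASD criteria not met, but at least 1 code from a listed category.
    if any(d.category in NON_ASD_CATEGORIES for d in diagnoses):
        return "non-ASD disorder and/or delay"
    return "no documented concern"


print(classify_outcome([Diagnosis("ASD", False), Diagnosis("ASD", False)]))
print(classify_outcome([Diagnosis("ASD", False), Diagnosis("language", False)]))
```

In this sketch, a single ASD code entered by a non-specialist does not by itself meet the ASD criteria, which follows the rule as stated in the text.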

Screening rates and results were summarized with percentages; χ2 analyses and odds ratios (ORs) were calculated for subgroup comparisons. Sensitivity, specificity, PPV, and NPV were calculated from 2 × 2 contingency tables. Effect sizes for the proportion comparisons (Cohen’s h) and measures of statistical significance (2-sample tests of proportion) were used to estimate differences in M-CHAT/F accuracy across subgroups. Emphasis was placed on interpreting effect sizes rather than P values alone given the likelihood of statistical significance for trivial differences in this large cohort. Thus, only results with statistically significant comparisons and effect sizes ≥0.20 are reported as meaningful. Finally, Kaplan-Meier survival curves estimated the cumulative probability of ASD diagnosis across time since screening. The log-rank test was used to compare survival curves between children who screened negative and positive to detect differences in mean time to diagnosis by M-CHAT/F outcome.
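
As an illustration of these calculations, the sketch below derives the 4 accuracy measures from a 2 × 2 table and computes Cohen's h for a difference between 2 proportions. The cell counts are hypothetical and are not the study's data; the function names are also assumptions. The Cohen's h example uses the sensitivities reported in Table 4 for older and younger toddlers (48.8% and 35.1%).

```python
import math


def accuracy_from_2x2(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from 2 x 2 cell counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def cohens_h(p1, p2):
    """Cohen's h: absolute difference of arcsine-transformed proportions."""
    return abs(2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2)))


# Hypothetical counts, for illustration only.
metrics = accuracy_from_2x2(tp=40, fp=160, fn=60, tn=9740)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")

# Sensitivity by age at first screening, as reported in Table 4.
h = cohens_h(0.488, 0.351)
print(f"Cohen's h = {h:.2f}; meets the 0.20 threshold: {h >= 0.20}")
```

Because h is computed on transformed proportions, the same absolute difference in percentage points yields a larger h near 0% or 100% than near 50%.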

A total of 25 999 children had 42 973 eligible visits during the study period (see Table 1 and Fig 1). A total of 23 634 children (90.9%) were screened and 50.4% were screened more than once. White children were screened more often than other racial groups (see Table 2). Children with English-only exposure, higher incomes, private insurance, and from suburban primary care sites were also screened more often, and premature children were screened less often. Only 47.8% were screened at 18 and 24 months largely because of failure to attend both well-child visits. Children who received 2 screenings according to the AAP schedule were more likely to be white and non-Hispanic, from a suburban site, and have English-only exposure, higher incomes, and private insurance (see Table 2).

TABLE 1

Cohort Demographics

| Characteristic | Entire Cohort (N = 25 999) |
|---|---|
| Sex, n (%) | |
| Female | 12 553 (48.3) |
| Male | 13 446 (51.7) |
| Gestational age, n (%) | |
| Full term, 37+ wk | 20 249 (77.9) |
| Premature, <37 wk | 2870 (11.0) |
| Gestational age not documented in the EHR | 2880 (11.1) |
| Language documented in the EHR, n (%) | |
| English only | 24 371 (93.7) |
| Language other than English | 1271 (4.9) |
| No language documented in the EHR | 357 (1.4) |
| Race, n (%) | |
| White | 11 118 (42.8) |
| Black | 9497 (36.5) |
| Asian | 1109 (4.3) |
| Other or multiple races | 4232 (16.3) |
| Race not documented in the EHR | 43 (0.2) |
| Ethnicity, n (%) | |
| Not Hispanic or Latino | 23 983 (92.2) |
| Hispanic or Latino | 1929 (7.4) |
| Ethnicity not documented in the EHR | 87 (0.3) |
| Insurance payer at screening visit, n (%) | |
| Private insurance | 14 087 (54.2) |
| Public insurance/Medicaid | 11 769 (45.3) |
| Self-pay or other | 143 (0.6) |
| Type of primary care site at first visit, n (%) | |
| Urban, n = 4 sites | 10 958 (42.1) |
| Suburban, n = 27 sites | 15 041 (57.9) |
| Median income, $, (SD)a | 59 596.56 (32 579.09) |
a Missing for 17 children because the median income was unavailable for the census tract.

FIGURE 1

Cohort flowchart.

TABLE 2

Screening and Follow-up Rates by Child and Family Characteristics

| Characteristic | Screened at Least Once, No. (%) | OR (95% CI) | Screened at 18 and 24 mo, No. (%) | OR (95% CI) | CHOP Patient Through at Least 4 y of Age (ie, Follow-Up Data Available), No. (%) | OR (95% CI) |
|---|---|---|---|---|---|---|
| Entire cohort (N = 25 999) | 23 628 (90.9) | — | 12 418 (47.8) | — | 22 392 (86.1) | — |
| By sex | | | | | | |
| Female (n = 12 553) | 11 375 (90.6) | 0.94 (0.86–1.02) | 5936 (47.3) | 0.96 (0.92–1.01) | 10 841 (86.4) | 1.04 (0.97–1.11) |
| Male (n = 13 446) | 12 253 (91.1) | — | 6482 (48.2) | — | 11 551 (85.9) | — |
| By gestational age | | | | | | |
| Full term, 37+ wk (n = 20 249) | 18 567 (91.7) | 2.00 (1.79–2.24)a | 10 051 (49.6) | 1.69 (1.56–1.83)a | 17 658 (87.2) | 1.15 (1.03–1.29)a |
| Premature, <37 wk (n = 2870) | 2429 (84.6) | — | 1057 (36.8) | — | 2454 (85.5) | — |
| By language documented in the EHR | | | | | | |
| English only (n = 24 371) | 22 221 (91.2) | 2.02 (1.73–2.36)a | 11 800 (48.4) | 1.69 (1.50–1.90)a | 21 244 (87.2) | 1.27 (1.09–1.48)a |
| Language other than English (n = 1271) | 1063 (83.6) | — | 454 (35.7) | — | 1071 (84.3) | — |
| By race | | | | | | |
| White (n = 11 118) | 10 761 (96.8) | — | 7151 (64.3) | — | 9710 (87.3) | — |
| Black (n = 9497) | 7924 (83.4) | 5.98 (5.32–6.74)a | 2637 (27.8) | 4.69 (4.42–4.98)a | 8236 (86.7) | 1.05 (0.97–1.15)a |
| Asian (n = 1109) | 1016 (91.6) | 2.76 (2.18–3.50)a | 566 (51.0) | 1.73 (1.53–1.96)a | 888 (80.1) | 1.72 (1.47–2.01)a |
| Other or multiple races (n = 4232) | 3884 (91.8) | 2.70 (2.32–3.14)a | 2039 (48.2) | 1.94 (1.81–2.08)a | 3527 (83.3) | 1.38 (1.25–1.52)a |
| By ethnicity | | | | | | |
| Not Hispanic or Latino (n = 23 983) | 21 811 (90.9) | 1.07 (0.92–1.25) | 11 544 (48.1) | 1.21 (1.10–1.33)a | 20 724 (86.4) | 1.15 (1.01–1.31)a |
| Hispanic or Latino (n = 1929) | 1743 (90.4) | — | 836 (43.3) | — | 1633 (84.7) | — |
| By insurance payer at screening visit | | | | | | |
| Private insurance (n = 14 087) | 13 437 (95.4) | 3.45 (3.13–3.79)a | 8668 (61.5) | 3.42 (3.26–3.61)a | 12 212 (87.1) | 1.19 (1.11–1.28)a |
| Public insurance/Medicaid (n = 11 769) | 10 086 (85.7) | — | 3743 (31.8) | — | 9955 (85.0) | — |
| By primary care site type | | | | | | |
| Suburban (n = 15 041) | 14 667 (97.5) | 8.77 (7.81–9.80)a | 9501 (63.2) | 4.72 (4.48–5.00)a | 13 101 (87.1) | 1.21 (1.13–1.30)a |
| Urban (n = 10 958) | 8961 (81.8) | — | 2917 (26.6) | — | 9291 (84.8) | — |
| By median income | | | | | | |
| Higher income (n = 12 991) | 12 571 (96.8) | 5.26 (4.70–5.86)a | 8167 (62.9) | 3.48 (3.31–3.67)a | 11 306 (87.0) | 1.16 (1.08–1.25)a |
| Lower income (n = 12 991) | 11 049 (85.1) | — | 4249 (32.7) | — | 11 074 (85.2) | — |

The last group listed is the reference group for ORs, except for comparisons by race. Here, black, Asian, and other or multiple races are the reference groups, respectively, and compared with the white group. Some subgroups do not add to 25 999 participants because of missing data. CI, confidence interval; —, not applicable.

a Indicates screening percentages that are different by subgroup at P < .05 after Bonferroni correction.
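
For readers who want to check an OR in Table 2, the sketch below computes an odds ratio with a Wald 95% confidence interval on the log scale. The choice of the Wald interval is an assumption; the text does not state which interval method was used. The function name `odds_ratio_wald_ci` is hypothetical. The example uses the "screened at least once" counts for full-term and premature children from Table 2.

```python
import math


def odds_ratio_wald_ci(events_a, n_a, events_b, n_b, z=1.959964):
    """Odds ratio of group A vs reference group B with a Wald 95% CI."""
    a, b = events_a, n_a - events_a  # group A: screened, not screened
    c, d = events_b, n_b - events_b  # group B (reference): screened, not screened
    odds_ratio = (a / b) / (c / d)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Full term (18 567 of 20 249 screened) vs premature (2429 of 2870 screened).
odds_ratio, lower, upper = odds_ratio_wald_ci(18567, 20249, 2429, 2870)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Under this assumption, the result agrees with the tabulated value of 2.00 (1.79–2.24) for gestational age.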

When considering the first M-CHAT/F, 9.5% (n = 2256) screened positive on the 23-item questionnaire, a rate comparable to other large-scale US-based screening studies.9,21 Of those who screened positive, 88.7% (n = 2002) required the follow-up interview (ie, scores of 3–7), and 41.2% of these (n = 825) were administered it. Almost all (n = 782; 94.8%) no longer screened positive after the follow-up interview. Of note, these numbers reflect all screened children (including those without follow-up data), so they differ somewhat from Fig 2, which includes only children in the accuracy analyses (ie, screened with follow-up data).

FIGURE 2

M-CHAT/F results for screened cohort with outcome data.

For accuracy analyses, children who screened negative after the questionnaire or follow-up interview were considered screen negatives. Those who continued to screen positive after the interview were considered screen positives. Children who screened positive on the questionnaire but did not receive the follow-up interview were also considered screen positives. Excluding this group would introduce substantial bias to the cohort because there were demographic and clinical differences between children who did and did not receive the follow-up interview (see below). Furthermore, the positive questionnaire results were available to providers to base clinical action on (even in the absence of the follow-up interview), and many were referred after an incomplete M-CHAT/F screening (K.W., W.G., A.B., et al, unpublished data).
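
The classification used for the accuracy analyses can be summarized as a decision rule. This is a minimal sketch under stated assumptions: the cutoffs (questionnaire scores of 3–7 require the follow-up interview, scores of 8+ bypass it) are taken from the scoring described in this article, whereas the function name `final_screen_result` and the representation of a missing interview as `None` are hypothetical.

```python
from typing import Optional


def final_screen_result(questionnaire_score: int,
                        followup_positive: Optional[bool] = None) -> str:
    """Final M-CHAT/F result as used in the accuracy analyses.

    followup_positive is None when the follow-up interview was not administered.
    """
    if questionnaire_score < 3:
        return "negative"  # negative on the questionnaire
    if questionnaire_score >= 8:
        return "positive"  # interview bypassed for scores of 8+
    if followup_positive is None:
        # Positive questionnaire without an interview: treated as screen positive.
        return "positive"
    return "positive" if followup_positive else "negative"


print(final_screen_result(2))                           # negative
print(final_screen_result(5, followup_positive=False))  # negative
print(final_screen_result(5))                           # positive
print(final_screen_result(9))                           # positive
```

Treating an incomplete screening as positive keeps these children in the cohort, at the cost of counting some children as screen positive who might have screened negative had the interview been completed.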

FIGURE 3

M-CHAT/F results for screened cohort with outcome diagnosis of ASD.

This approach resulted in a final screen-positive rate of 6.2%. Children of color, those from lower-income households, those with public insurance/Medicaid or non-English exposure, and those seen in urban practices screened positive more frequently, as did boys and premature children (see Table 3).

Older toddlers (21–26 months) screened positive more often (8.9%) than younger toddlers (16–20 months; 5.5%) on the first screening (see Table 3). However, screen-positive rates were somewhat higher at 18 months (4.2%) than at 24 months (3.5%) in the subgroup screened twice; 6.4% screened positive on 1 or both screenings.

TABLE 3

M-CHAT/F Screen-Positive Rates by Child and Family Characteristics

| Characteristic | Screened Positive on M-CHAT/F, No. (%) | OR (95% CI) |
|---|---|---|
| Screened cohort (n = 23 628) | 1474 (6.2) | — |
| By age at first screening | | |
| 16–20 mo (n = 18 585) | 1027 (5.5) | — |
| 21–26 mo (n = 5043) | 447 (8.9) | 1.66 (1.48–1.87)a |
| By multiple screenings | | |
| First screening at ∼18 mo (n = 11 624) | 518 (4.2) | — |
| Second screening at ∼24 mo (n = 11 624) | 440 (3.5) | 0.84 (0.74–0.96)a |
| By sex | | |
| Female (n = 11 375) | 635 (5.6) | — |
| Male (n = 12 253) | 839 (6.8) | 1.24 (1.12–1.38)a |
| By gestational age | | |
| Full term, 37+ wk (n = 18 567) | 1022 (5.5) | — |
| Premature, <37 wk (n = 2429) | 302 (12.4) | 2.44 (2.13–2.80)a |
| By language documented in the EHR | | |
| English only (n = 22 221) | 1301 (5.9) | — |
| Language other than English (n = 1063) | 159 (15.0) | 2.83 (2.37–3.38)a |
| By race | | |
| White (n = 10 761) | 319 (3.0) | — |
| Black (n = 7924) | 752 (9.5) | 3.45 (3.01–3.94)a |
| Asian (n = 1016) | 108 (10.6) | 3.90 (3.11–4.90)a |
| Other or multiple races (n = 3884) | 295 (7.6)c | 2.70 (2.29–3.16)a |
| By ethnicity | | |
| Not Hispanic or Latino (n = 21 811) | 1305 (6.0) | — |
| Hispanic or Latino (n = 1743) | 167 (9.6) | 1.67 (1.41–1.97)a |
| By insurance payer at screening visit | | |
| Private insurance (n = 13 437) | 427 (3.2) | — |
| Public insurance/Medicaid (n = 10 086) | 1038 (10.3) | 3.50 (3.11–3.93)a |
| By primary care site type | | |
| Suburban (n = 14 667) | 610 (4.2) | — |
| Urban (n = 8961) | 864 (9.6) | 2.46 (2.21–2.74)a |
| By median income | | |
| Higher income (n = 12 571) | 496 (3.9) | — |
| Lower income (n = 11 049) | 977 (8.8) | 2.36 (2.11–2.64)a |

The first group listed is the reference group for ORs. Some subgroups do not add to 23 628 participants because of missing data. A comparison of the first and second screening was conducted in subsample with 2 screenings according to AAP guidelines (n = 11 624 children). CI, confidence interval; —, not applicable.

a Indicates screen-positive rates that are different by subgroup at P < .05 after Bonferroni correction.

Most screened children (n = 20 437; 86.5%) continued to receive CHOP primary care at ≥4 years and were included in M-CHAT/F accuracy analyses because outcome diagnostic data were available. ASD prevalence was 2.2%, which is comparable to recent prevalence estimates in nearby New Jersey.1,22 Of children with ASD, 62.8% received an ASD diagnosis from a specialist (of these, 94.7% also had an ASD diagnosis documented by a primary care provider); the remaining 37.2% had a diagnosis made or documented only by a primary care provider.

Overall, 36.4% had other delays and/or concerns, which included codes related to development (10.6%), language (23.0%), behavior (8.4%), motor (7.9%), ADHD and related behaviors (2.9%), anxiety and related behaviors (1.8%), sensory processing (0.4%), and social delays without ASD (0.6%; categories are not mutually exclusive).

The M-CHAT/F’s sensitivity to detect ASD was 38.8%, and its specificity was 94.9%. PPV was 14.6% and NPV was 98.6% (see Table 4, Fig 2, and Fig 3). The M-CHAT/F’s accuracy in detecting any documented delay and/or concern (including ASD) was as follows: sensitivity was 11.8%, specificity was 97.4%, PPV was 72.4%, and NPV was 65.9%.

TABLE 4

M-CHAT/F Accuracy in Detecting ASD by Child and Family Characteristics

| Characteristic | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) |
|---|---|---|---|---|
| Screened cohort with outcome data (n = 20 375) | 38.8 (34.3–43.3) | 94.9 (94.5–95.2) | 14.6 (12.6–16.6) | 98.6 (98.4–98.7) |
| By age at first screening | | | | |
| 16–20 mo | 35.1 (30.0–40.3)a | 95.4 (95.1–95.7) | 13.9 (11.6–16.2) | 98.6 (98.4–98.8) |
| 21–26 mo | 48.8 (39.6–57.7)a | 92.3 (92.0–93.5) | 16.4 (12.6–20.3) | 98.4 (98.0–98.8) |
| Comparison of accuracy by age | P = .009, h = 0.28a | P < .001, h = 0.11 | P = .25, h = 0.07 | P = .41, h = 0.01 |
| By multiple screenings | | | | |
| First screening only at ∼18 mo without regard for results of second screening | 31.8 (25.9–37.7)a | 96.6 (96.3–97.0) | 17.0 (13.5–20.4) | 98.5 (98.3–98.7) |
| Second screening only at ∼24 mo without regard for results of first screening | 39.8 (33.5–46.0) | 97.4 (97.1–97.7) | 24.7 (20.4–29.1) | 98.7 (98.5–98.9) |
| Comparison of accuracy by screening, first vs second | P = .07, h = 0.17 | P = .001, h = 0.04 | P = .006, h = 0.19 | P = .25, h = 0.02 |
| First or second screening positive | 51.1 (44.7–57.4)a | 94.8 (94.4–95.2) | 17.6 (14.8–20.5) | 98.9 (98.7–99.1) |
| Comparison of accuracy by no. of screenings, first screening vs both screenings | P < .001, h = 0.39a | P < .001, h = 0.09 | P = .77, h = 0.02 | P = .01, h = 0.04 |
| By sex | | | | |
| Female | 39.6 (30.1–49.1) | 95.1 (94.7–95.5) | 7.7 (5.4–10.0)a | 99.4 (99.2–99.5) |
| Male | 38.5 (33.5–43.6) | 94.6 (94.2–95.1) | 19.9 (16.9–22.9)a | 97.8 (97.5–98.1) |
| Comparison of accuracy by sex | P = .85, h = 0.02 | P = .15, h = 0.02 | P < .001, h = 0.36a | P < .001, h = 0.14 |
| By gestational age | | | | |
| Full term, 37+ wk | 35.8 (30.8–40.8)a | 95.4 (95.1–95.8)a | 14.8 (12.4–17.2) | 98.6 (98.4–98.7) |
| Premature, <37 wk | 54.3 (42.6–66.0)a | 89.3 (87.9–90.6)a | 15.0 (10.6–19.4) | 98.2 (97.6–98.8) |
| Comparison of accuracy by gestational age | P = .004, h = 0.37a | P < .001, h = 0.24a | P = .90, h = 0.01 | P = .31, h = 0.02 |
| By language documented in the EHR | | | | |
| English only | 38.5 (33.9–43.1) | 95.2 (94.9–95.5)a | 15.3 (13.2–17.5)a | 98.6 (98.4–98.7) |
| Language other than English | 43.5 (23.2–63.7) | 86.9 (84.6–89.2)a | 8.5 (3.4–13.5)a | 98.2 (97.3–99.2) |
| Comparison of accuracy by language | P = .63, h = 0.10 | P < .001, h = 0.30a | P = .046, h = 0.21a | P = .46, h = 0.03 |
| By race | | | | |
| White | 37.7 (30.4–45.1) | 97.9 (97.6–98.1)a | 24.0 (18.9–29.2)a | 98.9 (98.6–99.1) |
| Black | 40.7 (33.5–47.8) | 91.7 (91.0–92.3)a | 11.7 (9.2–14.2)a | 98.3 (98.0–98.6) |
| Comparison of accuracy by race vs white | P = .58, h = 0.06 | P < .001, h = 0.29a | P < .001, h = 0.33a | P = .002, h = 0.05 |
| Asian | 30.0 (13.6–46.4) | 90.4 (88.3–92.4)a | 10.8 (4.2–17.5)a | 97.1 (95.8–98.3) |
| Comparison of accuracy by race vs white | P = .42, h = 0.16 | P < .001, h = 0.34a | P = .01, h = 0.35a | P < .001, h = 0.13 |
| Other or multiple races | 40.0 (28.9–51.1) | 93.8 (93.0–94.7)a | 13.4 (8.9–17.9)a | 98.5 (98.1–98.9) |
| Comparison of accuracy by race vs white | P = .73, h = 0.05 | P < .001, h = 0.21a | P = .003, h = 0.28a | P = .11, h = 0.03 |
| By ethnicity | | | | |
| Hispanic or Latino | 42.1 (26.4–57.8) | 92.2 (90.8–93.6) | 12.5 (6.8–18.2) | 98.4 (97.7–99.0) |
| Not Hispanic or Latino | 38.5 (33.8–43.1) | 95.1 (94.7–95.4) | 14.9 (12.8–17.0) | 98.6 (98.4–98.7) |
| Comparison of accuracy by ethnicity | P = .66, h = 0.07 | P < .001, h = 0.12 | P = .47, h = 0.07 | P = .56, h = 0.02 |
| By insurance payer at screening visit | | | | |
| Public insurance/Medicaid | 42.8 (36.3–49.3) | 91.0 (90.4–91.6)a | 11.3 (9.1–13.4)a | 98.3 (98.1–98.6) |
| Private insurance | 34.4 (28.2–40.5) | 97.6 (97.3–97.9)a | 22.1 (17.8–26.4)a | 98.7 (98.5–98.9) |
| Comparison of accuracy by insurance | P = .07, h = 0.17 | P < .001, h = 0.30a | P < .001, h = 0.29a | P = .05, h = 0.03 |
| By primary care site type | | | | |
| Urban | 40.2 (33.6–46.8) | 91.5 (90.9–92.1)a | 12.0 (9.6–14.4) | 98.1 (97.8–98.5) |
| Suburban | 37.5 (31.4–43.6) | 96.8 (96.5–97.1)a | 18.5 (15.0–21.9) | 98.8 (98.6–99.0) |
| Comparison of accuracy by site type | P = .56, h = 0.06 | P < .001, h = 0.23a | P = .002, h = 0.18 | P < .001, h = 0.05 |
| By median income | | | | |
| Lower income | 39.0 (32.8–45.2) | 92.3 (91.8–92.9)a | 11.8 (9.5–14.0)a | 98.3 (98.0–98.6) |
| Higher income | 38.7 (32.1–45.2) | 97.0 (96.7–97.3)a | 20.4 (16.5–24.4)a | 98.8 (98.6–99.0) |
| Comparison of accuracy by income | P = .94, h = 0.01 | P < .001, h = 0.22a | P < .001, h = 0.24a | P = .006, h = 0.04 |

CI, confidence interval.

a These comparisons are statistically significant (P < .05) and clinically meaningful (h ≥ 0.20).
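The effect size h reported in the comparison rows appears to be Cohen's h for a difference between two proportions; that reading is an assumption, because this section does not give the formula. A minimal Python sketch under that assumption, using the sensitivity values by age from Table 4 (the function name `cohens_h` is illustrative, not from the article):

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: absolute difference between arcsine-transformed proportions."""
    phi1 = 2 * math.asin(math.sqrt(p1))
    phi2 = 2 * math.asin(math.sqrt(p2))
    return abs(phi1 - phi2)


# Sensitivity at 21-26 mo (48.8%) vs 16-20 mo (35.1%), taken from Table 4.
h_age = cohens_h(0.488, 0.351)
print(round(h_age, 2))  # 0.28, matching the tabulated h for this comparison

# PPV in boys (19.9%) vs girls (7.7%), taken from Table 4.
h_sex = cohens_h(0.199, 0.077)
print(round(h_sex, 2))  # 0.36
```

Both values clear the h ≥ 0.20 cutoff that the footnote uses to flag a comparison as clinically meaningful.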

Effect of Age and Multiple Screenings

Across all first-time screenings, screenings at older ages (21–26 months) were more sensitive than those at younger ages (16–20 months; 48.8% vs 35.1%). In the AAP subgroup of children screened twice, the second screening at 24 months was marginally more sensitive (39.8% vs 31.8%) and had higher PPV (24.7% vs 17.0%) than the first screening at 18 months. Combining results from 18- and 24-month screenings yielded greater sensitivity than either screening alone (51.1%). See Table 4.

Effect of Child and/or Family Characteristics

Despite comparable sensitivities across racial groups, specificity and PPV were higher in white children (97.9%; 24.0%) compared with black children (91.7%; 11.7%), Asian children (90.4%; 10.8%), and those from other or multiple racial groups (93.8%; 13.4%). Differences were not observed between black, Asian, and other or multiple racial groups, or by ethnicity.

Higher specificity and PPV were observed in children with English-only exposure compared with children with non-English exposure (95.2% vs 86.9%; 15.3% vs 8.5%) as well as in children from higher- versus lower-income families (97.0% vs 92.3%; 20.4% vs 11.8%). The same pattern was observed for insurance payer: children with private insurance had higher specificity (97.6% vs 91.0%) and PPV (22.1% vs 11.3%) than children with public insurance. It was also observed for practice type: specificity was higher in children screened in suburban sites than in urban sites (96.8% vs 91.5%), as was PPV (18.5% vs 12.0%), although the PPV effect size fell below the cutoff (h = 0.18).

PPV was higher in boys than in girls (19.9% vs 7.7%). Sensitivity was higher in children born premature (54.3% vs 35.8%), but specificity was lower (89.3% vs 95.4%).

Children requiring the follow-up interview (ie, scores of 3–7) who received it were more likely to be full term (P = .007; OR = 1.41), have lower incomes (P < .001; OR = 1.65), and be from urban practices (P < .001; OR = 1.60) than those who did not receive the follow-up interview. Black children were also more likely than white (P = .001; OR = 1.48), Asian (P = .06; OR = 1.40), or children of other or multiple races (P < .001; OR = 1.64) to receive the follow-up interview. Children without ASD were also more likely to receive the follow-up interview than children with ASD (P = .02; OR = 1.54).

PPV was examined separately on the basis of the presence or absence of follow-up interview results. PPV was 34.8% in children with a questionnaire score of 8+ (ie, follow-up interview bypassed, n = 201). PPV was 38.2% in children who continued to screen positive after receiving the follow-up interview (n = 34). PPV was 9.6% in those who did not receive the interview (n = 967). Other metrics could not be calculated separately by follow-up results because they are calculated by using screen negatives, and it is not possible to know how many of the 967 children who did not receive the interview would have screened negative.
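As a rough consistency check, the three stratum-specific PPVs above can be pooled, weighting each by its group size. The sketch below assumes that these three groups together make up all screen-positive children in the outcome cohort, which the text implies but does not state outright; the variable names are illustrative.

```python
# (description, number of screen-positive children, reported PPV)
strata = [
    ("score of 8+, interview bypassed", 201, 0.348),
    ("still positive after interview", 34, 0.382),
    ("interview not received", 967, 0.096),
]

total_positive = sum(n for _, n, _ in strata)

# Approximate true positives per stratum; the reported PPVs are rounded,
# so these products are estimates rather than exact counts.
approx_true_positive = sum(n * ppv for _, n, ppv in strata)

pooled_ppv = approx_true_positive / total_positive
print(total_positive)        # 1202
print(round(pooled_ppv, 3))  # about 0.146
```

Under that assumption the pooled value is close to the overall PPV of 14.6% reported for the screened cohort, and it shows how heavily the large group without an interview weighs on the overall estimate.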

Excluding all follow-up interview data (ie, using M-CHAT questionnaire results only), sensitivity was 45.2%, specificity was 91.7%, PPV was 11.0%, and NPV was 98.7%.

Kaplan-Meier survival curves revealed that the mean time to diagnosis was significantly shorter for children with ASD who screened positive than for those who screened negative (mean difference = 7.45 months; P < .001).

In this study we examined the M-CHAT/F’s accuracy within a universally screened cohort. The results revealed high screening rates, achieved through robust EHR support for screening. Systematic follow-up of children who screened positive and negative allowed an estimation of sensitivity, specificity, PPV, and NPV. The M-CHAT/F’s sensitivity to detect ASD was just 39%, indicating that the majority of children later diagnosed with ASD screened negative. PPV was just 15%, an estimate consistent with recent large (but not universally screened) studies conducted in Norway and Malaysia.14,15  However, this estimate was much lower than that found in some US-based studies conducted in research settings, likely partially because of different prevalence rates across studies.9,21  PPV improved substantially when any diagnosis or concern was considered (72%). Specificity and NPV for ASD were high (95% and 99%, respectively), but it is important to remember that with low prevalence and screen-positive rates, specificity and NPV will tend toward high rates.
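The dependence of PPV and NPV on prevalence can be illustrated with Bayes' rule. The sketch below is not the authors' analysis; it takes the overall sensitivity (38.8%), specificity (94.9%), and cohort prevalence (2.2%) reported in this article as inputs, and the 10% prevalence in the second call is a purely hypothetical value chosen to represent a higher-risk research sample.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Values reported for the universally screened cohort.
ppv, npv = predictive_values(0.388, 0.949, 0.022)
print(round(ppv, 3), round(npv, 3))  # 0.146 0.986

# Hypothetical higher-prevalence sample (10% is an assumed value).
ppv_high, npv_high = predictive_values(0.388, 0.949, 0.10)
print(ppv_high > ppv, npv_high < npv)  # True True
```

With sensitivity and specificity held fixed, raising prevalence raises PPV and lowers NPV, which is one reason PPV estimates differ across studies and why NPV is high when prevalence is low.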

Although the M-CHAT/F identified fewer children with ASD than expected, those who did screen positive received an ASD diagnosis 7 months earlier than those who screened negative. This suggests that a positive M-CHAT/F screening may have contributed to an earlier diagnosis for children with ASD in this cohort. However, continued research is needed to understand the specific effect of a positive screen on age of diagnosis.

The M-CHAT/F was significantly more sensitive at older ages (49% at 21–26 months) than at younger ages (35% at 16–20 months). However, PPV did not improve with age (16% vs 14%), which is in contrast to findings from some previous studies.12  PPV did improve for repeated screenings; 25% of children who screened positive at the second screening had ASD (regardless of results of the first screening) compared with 17% for the first screen. These results highlight the potential importance of screening twice as well as the difficulty of accurate screening at ∼18 months.

Sensitivity was higher, but specificity was lower, for premature children compared with those born full term, consistent with several other studies in very premature children.23–25 PPV was lower in girls (8%) than in boys (20%), but it is unclear whether this was because of delayed diagnosis for some girls or poorer M-CHAT/F performance in girls. Future research should examine sex differences in each stage of detection, from screening to diagnosis.

Although electronic screening yielded high screening rates, when children were missed, they were more likely to be children of color, from lower-income households, seen in an urban practice, receive public insurance/Medicaid, and be exposed to a language other than English. These same children were also less likely to present for 2 well-child visits, resulting in disparities in receiving care according to AAP guidelines.

Despite these disparities, screening rates among children from traditionally underrepresented groups were still relatively high (>80%). Children of color and those from lower-income households screened positive at 2 to 3 times the rates of white, higher-income, privately insured, and suburban children. Elevated screen-positive rates resulted in somewhat higher sensitivity to detect ASD but also higher false-positive rates (ie, lower PPV and specificity) in these groups. These data suggest that disparities in age of diagnosis are likely preceded by disparities in screening rates and differential accuracy of screening tools for children from underrepresented and underresourced groups. Future research should attempt to disentangle the effects of race and/or ethnicity, income, language, and primary care setting to better understand the role of child-, family-, and practice-level variables in screening.

With regard to the follow-up interview, pediatricians did not complete the interview systematically but instead may have used clinical judgment when deciding when to administer it. Children who received the follow-up interview were significantly less likely to have ASD, and when children received the follow-up, almost all (95%) screened negative. Thus, pediatricians may have used the interview to confirm a clinical opinion of “not ASD” and chosen to skip the interview and go straight to referral when they suspected ASD. Additional research is needed to disentangle differences in follow-up interview rates by race and income; this may have been a clinical adaptation to artificially high screen-positive rates, or it may incorrectly delay diagnosis for these children. However, the follow-up interview did appear to reduce false-positives (ie, improve PPV), underscoring the importance of this step of the screening process.

There are limitations to the current study, largely surrounding the real-world nature of this cohort. Although this epidemiological study represented all children within the CHOP primary care network, findings may not generalize to other populations, particularly those with less access to ASD specialty care. Diagnoses were given in real-world clinical settings rather than through rigorous research studies. Diagnostic information was also only available through 4 to 8 years of age, so the M-CHAT/F’s accuracy may differ as children in this cohort age.

The M-CHAT/F, rather than the M-CHAT–R/F, was used, although accuracy of these 2 versions is comparable.9  As indicated above, not all eligible children received the follow-up interview, which likely downwardly biased specificity and PPV and upwardly biased sensitivity and NPV in that some would have screened negative if they had received the follow-up interview. Thus, this study cannot estimate the accuracy of perfect M-CHAT/F administration but instead provides critical information on how well this tool detects ASD in a real-world, universally screened cohort.

These results suggest that universal screening in primary care is possible when supported by electronic administration and EHR integration. However, universal screening and systematic follow-up revealed low accuracy of the M-CHAT/F, particularly for children of color and those from lower-income households. Although some may interpret these findings as evidence against universal screening, we caution against this interpretation given the earlier age of diagnosis for screen positives. Instead, results suggest that augmentative screening methods should be developed to detect more children through universal screening efforts and reduce disparities. Promising new methods include parent-report tools that are supported by picture or video models26 and direct data gathering methods that leverage technological advances in computing and machine learning.27,28 However, any new method should be tested in cohorts that are universally screened and systematically followed up to reduce the bias associated with screening and following selected populations.

We thank R. Christopher Sheldrick for his helpful feedback on this article, as well as the providers and families who contributed data to this project through clinical care at CHOP.

Dr Guthrie conceptualized and designed the study, performed data analysis and interpretation, and drafted the initial manuscript; Drs Wallis and Miller contributed substantially to the study design and helped to draft the initial manuscript; Ms Brooks and Ms Dudley contributed substantially to data acquisition; Dr Gerdes contributed substantially to the study design, data acquisition, and interpretation; Drs Bennett, Levy, Pandey, and Schultz contributed substantially to the study design and data interpretation; and all authors reviewed and revised the manuscript, approved the final manuscript as submitted, and agree to be accountable for all aspects of the work.

FUNDING: Funded by the Allerton Foundation, Eagles Charitable Foundation, and the National Institute of Mental Health (R03MH116356). Funded by the National Institutes of Health (NIH).

COMPANION PAPER: A companion to this article can be found online at www.pediatrics.org/cgi/doi/10.1542/peds.2019-0925.

AAP: American Academy of Pediatrics
ADHD: attention-deficit/hyperactivity disorder
ASD: autism spectrum disorder
CHOP: Children’s Hospital of Philadelphia
EHR: electronic health record
M-CHAT: Modified Checklist for Autism in Toddlers without Follow-Up
M-CHAT/F: Modified Checklist for Autism in Toddlers with Follow-Up
M-CHAT–R/F: Modified Checklist for Autism in Toddlers, Revised, with Follow-Up
NPV: negative predictive value
OR: odds ratio
PPV: positive predictive value

1. Baio J, Wiggins L, Christensen DL, et al. Prevalence of autism spectrum disorder among children aged 8 years - Autism and Developmental Disabilities Monitoring Network, 11 sites, United States, 2014 [published correction appears in MMWR Morb Mortal Wkly Rep. 2018;67(45):1279]. MMWR Surveill Summ. 2018;67(6):1–21
2. Mandell DS, Morales KH, Xie M, et al. Age of diagnosis among Medicaid-enrolled children with autism, 2001-2004. Psychiatr Serv. 2010;61(8):822–829
3. Mandell DS, Listerud J, Levy SE, Pinto-Martin JA. Race differences in the age at diagnosis among Medicaid-eligible children with autism. J Am Acad Child Adolesc Psychiatry. 2002;41(12):1447–1453
4. Dawson G, Rogers S, Munson J, et al. Randomized, controlled trial of an intervention for toddlers with autism: the Early Start Denver Model. Pediatrics. 2010;125(1).
5. Wetherby AM, Guthrie W, Woods J, et al. Parent-implemented social intervention for toddlers with autism: an RCT. Pediatrics. 2014;134(6):1084–1093
6. Warren Z, McPheeters ML, Sathe N, Foss-Feig JH, Glasser A, Veenstra-Vanderweele J. A systematic review of early intensive intervention for autism spectrum disorders. Pediatrics. 2011;127(5).
7. Johnson CP, Myers SM; American Academy of Pediatrics Council on Children With Disabilities. Identification and evaluation of children with autism spectrum disorders. Pediatrics. 2007;120(5):1183–1215
8. Siu AL, Bibbins-Domingo K, Grossman DC, et al; US Preventive Services Task Force (USPSTF). Screening for autism spectrum disorder in young children: US Preventive Services Task Force recommendation statement. JAMA. 2016;315(7):691–696
9. Robins DL, Casagrande K, Barton M, et al. Validation of the modified checklist for Autism in toddlers, revised with follow-up (M-CHAT-R/F). Pediatrics. 2014;133(1):37–45
10. Robins DL, Fein D, Barton ML, Green JA. The Modified Checklist for Autism in Toddlers: an initial study investigating the early detection of autism and pervasive developmental disorders. J Autism Dev Disord. 2001;31(2):131–144
11. Kleinman JM, Robins DL, Ventola PE, et al. The modified checklist for autism in toddlers: a follow-up study investigating the early detection of autism spectrum disorders. J Autism Dev Disord. 2008;38(5):827–839
12. Pandey J, Verbalis A, Robins DL, et al. Screening for autism in older and younger toddlers with the Modified Checklist for Autism in Toddlers. Autism. 2008;12(5):513–535
13. Canal-Bedia R, García-Primo P, Martín-Cilleros MV, et al. Modified checklist for autism in toddlers: cross-cultural adaptation and validation in Spain. J Autism Dev Disord. 2011;41(10):1342–1351
14. Toh TH, Tan VW, Lau PS, Kiyu A. Accuracy of Modified Checklist for Autism in Toddlers (M-CHAT) in detecting autism and other developmental disorders in community clinics. J Autism Dev Disord. 2018;48(1):28–35
15. Stenberg N, Bresnahan M, Gunnes N, et al. Identifying children with autism spectrum disorder at 18 months in a general population sample. Paediatr Perinat Epidemiol. 2014;28(3):255–262
16. Brenner H, Gefeller O. Variation of sensitivity, specificity, likelihood ratios and predictive values with disease prevalence. Stat Med. 1997;16(9):981–991
17. Pennsylvania Department of Human Services. Pennsylvania Early and Periodic Screening, Diagnosis, and Treatment (EPSDT) program periodicity schedule and coding matrix. Available at: https://www.keystonefirstpa.com/pdf/provider/resources/epsdt/periodicity-schedule.pdf. Accessed September 4, 2019
18. Christensen DL, Baio J, Van Naarden Braun K, et al; Centers for Disease Control and Prevention (CDC). Prevalence and characteristics of autism spectrum disorder among children aged 8 years--Autism and Developmental Disabilities Monitoring Network, 11 Sites, United States, 2012 [published correction appears in MMWR Morb Mortal Wkly Rep. 2016;65(15):404]. MMWR Surveill Summ. 2016;65(3):1–23
19. Burke JP, Jain A, Yang W, et al. Does a claims diagnosis of autism mean a true case? Autism. 2014;18(3):321–330
20. Coleman KJ, Lutsky MA, Yau V, et al. Validation of autism spectrum disorder diagnoses in large healthcare systems with electronic medical records. J Autism Dev Disord. 2015;45(7):1989–1996
21. Chlebowski C, Robins DL, Barton ML, Fein D. Large-scale use of the modified checklist for autism in low-risk toddlers. Pediatrics. 2013;131(4).
22. Christensen DL, Maenner MJ, Bilder D, et al. Prevalence and characteristics of autism spectrum disorder among children aged 4 years - Early Autism and Developmental Disabilities Monitoring Network, seven sites, United States, 2010, 2012, and 2014. MMWR Surveill Summ. 2019;68(2):1–19
23. Guy A, Seaton SE, Boyle EM, et al. Infants born late/moderately preterm are at increased risk for a positive autism screen at 2 years of age. J Pediatr. 2015;166(2):269–275.e3
24. Luyster RJ, Kuban KC, O’Shea TM, et al; ELGAN Study investigators. The Modified Checklist for Autism in Toddlers in extremely low gestational age newborns: individual items associated with motor, cognitive, vision and hearing limitations. Paediatr Perinat Epidemiol. 2011;25(4):366–376
25. Kuban KC, O’Shea TM, Allred EN, Tager-Flusberg H, Goldstein DJ, Leviton A. Positive screening on the Modified Checklist for Autism in Toddlers (M-CHAT) in extremely low gestational age newborns. J Pediatr. 2009;154(4):535–540.e1
26. Janvier YM, Coffield CN, Harris JF, Mandell DS, Cidav Z. The Developmental Check-In: development and initial testing of an autism screening tool targeting young children from underserved communities. Autism. 2019;23(3):689–698
27. Campbell K, Carpenter KL, Hashemi J, et al. Computer vision analysis captures atypical attention in toddlers with autism. Autism. 2019;23(3):619–628
28. Kanne SM, Carpenter LA, Warren Z. Screening in toddlers and preschoolers at risk for autism spectrum disorder: evaluating a novel mobile-health screening tool. Autism Res. 2018;11(7):1038–1049

Competing Interests

POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.

FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
