Correct diagnosis is essential for the appropriate clinical management of attention-deficit/hyperactivity disorder (ADHD) in children and adolescents.
This systematic review provides an overview of the available diagnostic tools.
We identified diagnostic accuracy studies in 12 databases published from 1980 through June 2023.
We included evaluations of any tool for the diagnosis of ADHD, with a reference standard of a clinical diagnosis made by a mental health specialist.
Data were abstracted and critically appraised by 1 reviewer and checked by a methodologist. Strength of evidence and applicability assessments followed Evidence-based Practice Center standards.
In total, 231 studies met eligibility criteria. Studies evaluated parental ratings, teacher ratings, youth self-reports, clinician tools, neuropsychological tests, biospecimens, EEG, and neuroimaging. Multiple tools showed promising diagnostic performance, but estimates varied considerably across studies, with a generally low strength of evidence. Performance depended on whether youth with ADHD were being differentiated from neurotypically developing children or from clinically referred children.
Studies used different components of available tools and did not report sufficient data for meta-analytic models.
A valid and reliable diagnosis of ADHD requires the judgment of a clinician who is experienced in the evaluation of youth with and without ADHD, along with the aid of standardized rating scales and input from multiple informants across multiple settings, including parents, teachers, and youth themselves.
Attention-deficit/hyperactivity disorder (ADHD) is one of the most prevalent neurodevelopmental conditions in youth. Its prevalence has remained constant at ∼5.3% worldwide when based on rigorous diagnostic procedures, and the diagnostic criteria have likewise remained stable over the years.1 Clinical diagnoses, however, have increased steadily over time,2 and currently ∼10% of US children receive an ADHD diagnosis.3 Higher rates of clinical compared with research-based diagnoses stem in part from increasing clinician recognition of youth who have ADHD symptoms that are functionally impairing but do not fully meet formal diagnostic criteria.4 The higher diagnostic rates over time in clinical samples also result from youth receiving a diagnosis incorrectly. Some youth, for example, are misdiagnosed as having ADHD when they have symptoms of other disorders that overlap with ADHD symptoms, such as difficulty concentrating, which occurs in many other conditions.5 Moreover, ADHD is more than twice as likely to be diagnosed in boys than in girls,3 in lower-income families,6 and in white compared with nonwhite youth,7 differences that derive at least in part from diagnostic and cultural biases.8–11
Improving clinical diagnostic accuracy is essential to ensure that youth who truly have ADHD benefit from receiving treatment without delay. Similarly, youth who do not have ADHD should not be diagnosed, because an incorrect diagnosis risks exposing them to treatments that cannot benefit them.12,13 Clinician judgment alone, however, especially by nonspecialist clinicians, performs poorly in diagnosing ADHD14 compared with expert, research-grade diagnoses made by mental health clinicians.15 Accurately diagnosing ADHD is difficult because diagnoses are often made using subjective clinical impressions, and putative diagnostic tools have a confusing, diverse, and poorly described evidence base that is not widely accessible. The availability of valid diagnostic tools would especially help to reduce misdiagnoses arising from cultural biases and from symptom overlap with other conditions.12,16–19
This review summarizes evidence on the performance of diagnostic tools for ADHD in children and adolescents. We did not restrict the review to a set of known diagnostic tools but instead explored the full range of available tools, including machine learning–assisted and virtual reality–based tools. The review aimed to assess how diagnostic performance varies by clinical setting and patient characteristics.
Methods
The review aims were developed in consultation with the Agency for Healthcare Research and Quality (AHRQ), the Patient-Centered Outcomes Research Institute, the topic nominator American Academy of Pediatrics, key informants, a technical expert panel (TEP), and public input. The TEP reviewed the protocol and advised on key outcomes. Subgroup analyses and key outcomes were prespecified. The review is registered in PROSPERO (CRD42022312656), and the protocol is available on the AHRQ Web site as part of a larger evidence report on ADHD. The systematic review followed the methods of the AHRQ Evidence-based Practice Center Program.20
Selection Criteria
Population: age <18 years.
Interventions: any ADHD tool for the diagnosis of ADHD.
Comparators: diagnosis by a mental health specialist, such as a psychologist, psychiatrist, or other provider, who often used published scales or semistructured diagnostic interviews to ensure a reliable DSM-based diagnosis of ADHD.
Key outcomes: diagnostic accuracy (eg, sensitivity, specificity, area under the curve).
Setting: any.
Study design: diagnostic accuracy studies.
Other: English language, published from 1980 to June 2023.
Search Strategy
We searched PubMed, Embase, PsycINFO, ERIC, and ClinicalTrials.gov. We identified reviews for reference-mining through PubMed, Cochrane Database of Systematic Reviews, Campbell Collaboration, What Works in Education, PROSPERO, ECRI Guidelines Trust, G-I-N, and ClinicalKey. The peer-reviewed search strategy is in the Supplemental Appendix. All citations were screened by trained literature reviewers supported by machine learning (Fig 1). Two independent reviewers assessed full-text studies for eligibility. The TEP reviewed the included studies to ensure that all relevant studies were captured. Publications reporting on the same participants were consolidated into 1 record.
Data Extraction
The data abstraction form included extensive guidance to aid reproducibility and standardization in recording study details, results, risk of bias, and applicability. One reviewer abstracted data and a methodologist checked accuracy and completeness. Data are publicly available in the Systematic Review Data Repository.
Risk of Bias and Applicability
We assessed characteristics pertaining to patient selection, index test, reference standard, flow and timing that may have introduced bias, and evaluated applicability of study results, such as whether the test, its conduct, or interpretation differed from how the test is used in clinical practice.21,22
Data Synthesis and Analysis
We differentiated parent, teacher, and youth self-report ratings; tools for clinicians; neuropsychological tests; biospecimens; EEG; and neuroimaging. We organized analyses according to prespecified outcome measures. A narrative overview summarized the range of diagnostic performance for key outcomes. Because lack of reported detail in many individual studies hindered use of meta-analytic models, we created summary figures to document the diagnostic performance reported in each study. We used meta-regressions across studies to assess the effects of age, comorbidities, racial and ethnic composition, and diagnostic setting (differentiating primary care, specialty care, school settings, mixed settings, and not reported) on diagnostic performance. One researcher with experience in use of specified standardized criteria23 initially assessed the overall strength of evidence (SoE) (see Supplemental Appendix) for each study, then discussed it with the study team to communicate our confidence in each finding.
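A minimal sketch of such a study-level meta-regression, using invented numbers rather than data from this review, regresses each study's reported sensitivity on a binary moderator coding the comparison population; an actual meta-regression would additionally weight studies by their precision:

```python
# Hypothetical study-level data (invented for illustration): each tuple is
# (moderator, sensitivity), where moderator = 1 if the study differentiated
# ADHD youth from a clinical comparison sample, 0 if from neurotypical youth.
studies = [
    (0, 0.90), (0, 0.85), (0, 0.95),  # ADHD vs neurotypical comparison
    (1, 0.70), (1, 0.65), (1, 0.75),  # ADHD vs clinical comparison
]

def ols_slope_intercept(pairs):
    """Unweighted least-squares fit of y on x via the normal equations."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx

slope, intercept = ols_slope_intercept(studies)
# With a binary moderator, the slope estimates the drop in mean sensitivity
# for studies using a clinical comparison sample.
print(round(intercept, 2), round(slope, 2))
```

With equal group sizes, the intercept recovers the mean sensitivity of the neurotypical-comparison studies and the slope the between-group difference; real analyses would also model specificity, accuracy, and AUC, as done in this review.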
Results
We screened 23 139 citations and 7534 publications retrieved as full text against the eligibility criteria. In total, 231 studies reported in 290 publications met the eligibility criteria (see Fig 1).
Methodological quality of the studies varied. Selection bias was likely in two-thirds of studies; several were determined to be problematic in terms of reported study flow and timing of assessments (eg, not stating whether diagnosis was known before the results of the index test); and several lacked details on diagnosticians or diagnostic procedures (Supplemental Fig 1). Applicability concerns limited the generalizability of findings (Supplemental Fig 2), usually because youth with comorbidities were excluded. Many different tools were assessed within the broader categories (eg, within neuropsychological tests), and even when reporting on the same diagnostic tool, studies often used different components of the tool (eg, different subscales of rating scales), or they combined components in a variety of ways (eg, across different neuropsychological test parameters).
The evidence table (Supplemental Table 10, Supplemental Appendix) shows each study’s finding. The following highlights key findings across studies.
Parent Ratings
Fifty-nine studies used parent ratings to diagnose ADHD (Fig 2). The most frequently evaluated tool was the CBCL (Child Behavior Checklist), alone or in combination with other tools, often using different score cutoffs for diagnosis, and evaluating different subscales (most frequently the attention deficit/hyperactivity problems subscale). Sensitivities ranged from 38% (corresponding specificity = 96%) to 100% (specificity = 4% to 92%).24,25
Area under the curve (AUC) for receiver operating characteristic curves ranged widely from 0.55 to 0.95, but 3 CBCL studies reported AUCs of 0.83 to 0.84.26–28 Few studies reported measurement of reliability. SoE was downgraded for study limitations (lack of detailed reporting), imprecision (large performance variability), and inconsistent findings (Supplemental Table 1).
Teacher Ratings
Twenty-three studies used teacher ratings to diagnose ADHD (Fig 2). No 2 studies reported on rater agreement, internal consistency, or test-retest reliability for the same teacher rating scale. The highest sensitivity was 97% (specificity = 26%).25 The Teacher Report Form, alone or in combination with Conners teacher rating scales, yielded sensitivities of 72% to 79%29 and specificities of 64% to 76%.30,32 Reported AUCs ranged from 0.65 to 0.84.32 SoE was downgraded to low for imprecision (large performance variability) and inconsistency (results for specific tools not replicated), see Supplemental Table 2.
Youth Self-Reports
Six studies used youth self-reports to diagnose ADHD. No 2 studies used the same instrument. Sensitivities ranged from 53% (specificity = 98%) to 86% (specificity = 70%).35 AUCs ranged from 0.56 to 0.85.36 We downgraded SoE for domain inconsistency (only 1 study reported on a given tool and outcome), see Supplemental Table 3.
Combined Rating Scales
Thirteen studies assessed diagnostic performance of ratings combined across informants, often using machine learning for variable selection. Only 1 study compared performance of combined data to performance from single informants, finding negligible improvement (AUC youth = 0.71; parent = 0.85; combined = 0.86).37 Other studies reported on limited outcome measures and used ad hoc methods to combine information from multiple informants. The best AUC was reported by a machine learning supported study combining parent and teacher ratings (AUC = 0.98).38
Additional Clinician Tools
Twenty-four studies assessed additional tools, such as interview guides, that clinicians can use to aid diagnosis of ADHD. Sensitivities varied, ranging from 67% (specificity = 65%) to 98% (specificity = 100%); specificities ranged from 36% (sensitivity = 89%) to 100% (sensitivity = 98%).39 Some tools measured activity levels objectively using an actometer or a commercially available activity tracker, either alone or as part of a diagnostic test battery. Reported performance was variable (sensitivity range 25% to 100%,40 specificity range 66% to 100%,40 AUC range 0.75 to 0.9996).41 SoE was downgraded for imprecision (large performance variability) and inconsistency (outcomes and results not replicated), see Supplemental Table 4.
Neuropsychological Tests
Seventy-four studies used measures from various neuropsychological tests, including continuous performance tests (CPTs). Four of these included 3- and 4-year-old children.42–44 A large majority used a CPT, which assessed omission errors (reflecting inattention), commission errors (impulsivity), and reaction time SD (response time variability). Studies varied in their use of traditional visual CPTs (such as the Test of Variables of Attention); more novel, multifaceted “hybrid” CPT paradigms; and virtual reality CPTs built on environments designed to emulate real-world classroom distractibility. Studies used idiosyncratic combinations of individual cognitive measures to achieve the best performance, though many reported on CPT attention and impulsivity measures.
Sensitivity for all neuropsychological tests ranged from 22% (specificity = 96%) to 100% (specificity = 100%)45 (Fig 3), though the latter study reported performance for unique composite measures without replication. Specificities ranged from 22% (sensitivity = 91%)46 to 100% (sensitivity = 75% to 100%).45,47 AUCs ranged from 0.59 to 0.93.48 Sensitivity for CPT studies specifically ranged from 22% (specificity = 96%) to 100% (specificity = 75%).49 Specificities for CPTs ranged from 22% (sensitivity = 91%) to 100% (sensitivity = 89%)47 (Fig 3). AUCs ranged from 0.59 to 0.93.50,51 SoE was deemed low because of imprecision (large performance variability), see Supplemental Table 5.
Biospecimens
Seven studies assessed blood or urine biomarkers to diagnose ADHD. These measured erythropoietin or erythropoietin receptor, membrane potential ratio, microRNA levels, or urine metabolites. Sensitivities ranged from 56% (specificity = 95%) to 100% (specificity = 100%, for erythropoietin and erythropoietin receptor levels).52 Specificities ranged from 25% (sensitivity = 79%) to 100% (sensitivity = 100%).52 AUCs ranged from 0.68 to 1.00.52 Little information was provided on the reliability of markers or their combinations. SoE was downgraded for inconsistent and imprecise studies (Supplemental Table 6).
EEG
Forty-five studies used EEG markers to diagnose ADHD. EEG signals were obtained in a variety of patient states, including during neuropsychological test performance. Two-thirds used machine learning algorithms to select classification parameters. Several combined EEG with demographic variables or rating scales. Sensitivity ranged widely, from 46% to 100% (corresponding specificities = 74% and 71%).53,54 One study that combined EEG with demographic data supported by machine learning reported perfect sensitivity and specificity.54 Specificity was also variable, ranging from 38% (sensitivity = 95%) to 100% (specificities = 71% or 100%).53–56 Reported AUCs ranged from 0.63 to 1.0.57,58 SoE was downgraded for imprecision (large performance variability) and study limitations (diagnostic approaches poorly described), see Supplemental Table 7.
Neuroimaging
Nineteen studies used neuroimaging for diagnosis. One public data set (ADHD-200) produced several analyses. All but 2 used MRI: some functional MRI (fMRI), some structural, and some in combination, with or without magnetic resonance spectroscopy (2 used near-infrared spectroscopy). Most employed machine learning to detect markers that optimized diagnostic classifications. Some combined imaging measures with demographic or other clinical data in the prediction model. Sensitivities ranged from 42% (specificity = 95%) to 99% (specificity = 100%), the latter using resting state fMRI and a complex machine learning algorithm56 to differentiate ADHD from neurotypical youth. Specificities ranged from 55% (sensitivity = 95%) to 100%56 using resting state fMRI data. AUCs ranged from 0.58 to over 0.99.57 SoE was downgraded for imprecision (large performance variability) and study limitations (diagnostic models were often not well described, and the number and type of predictor variables entering the model were unclear). Studies generally did not validate diagnostic algorithms or assess performance measures in an independent sample (Supplemental Table 8).
Variation in Diagnostic Accuracy With Clinical Setting or Patient Subgroup
Regression analyses indicated that setting was associated with both sensitivity (P = .03) and accuracy (P = .006) but not specificity (P = .68) or AUC (P = .28), with sensitivities lowest in primary care (Fig 4). Sensitivity, specificity, and AUC were also lower when differentiating youth with ADHD from a clinical sample than from typically developing youth (sensitivity, P = .04; specificity, P < .001; AUC, P < .001) (Fig 4), suggesting that the clinical population is a source of heterogeneity in diagnostic performance. Findings should be interpreted with caution, however, as they were not obtained in meta-analytic models and, consequently, do not take into account study size or quality.
Supplemental Figs 3–5 in the Supplemental Appendix document effects by age and gender. We did not detect statistically significant associations of age with sensitivity (P = .54) or specificity (P = .37), or associations of the proportion of girls with sensitivity (P = .63), specificity (P = .80), accuracy (P = .34), or AUC (P = .90).
Discussion
We identified a large number of publications reporting on ADHD diagnostic tools. To our knowledge, no prior review of ADHD diagnostic tools has been as comprehensive in the range of tools, outcomes, participant ages, and publication years. Despite the large number of studies, we deemed the strength of evidence for the reported performance measures across all categories of diagnostic tools to be low because of large performance variability across studies and various limitations within and across studies.
Measures for Diagnostic Performance
We required that studies compare diagnoses made using the tool with diagnoses made by expert mental health clinicians. Studies most commonly reported sensitivity (true-positive rate) and specificity (true-negative rate) when a study-specific diagnostic threshold was applied to measures from the tool being assessed. Sensitivity and specificity depend critically on that study-specific threshold, and their values are inherently a trade-off: varying the threshold to increase either sensitivity or specificity reduces the other. Interpreting diagnostic performance in terms of sensitivity and specificity, and comparing those performance measures across studies, is therefore challenging. Consequently, more recent studies often report performance in terms of the receiver operating characteristic (ROC) curve, a plot of sensitivity against 1 − specificity across the entire range of possible diagnostic thresholds. The area under the ROC curve (AUC) provides a single, overall index of performance that ranges from 0.5 (indicating that the tool provides no information above chance for classification) to 1.0 (indicating a perfect test that correctly classifies all participants with ADHD as having it and all participants without ADHD as not having it). AUC values of 0.90 to 1.00 are commonly classified as excellent performance; 0.80 to 0.90 as good; 0.70 to 0.80 as fair; 0.60 to 0.70 as poor; and 0.50 to 0.60 as failed performance.
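These relationships can be made concrete with a small worked example (the scores below are invented for illustration, not data from any reviewed study): sensitivity and specificity are computed at a chosen threshold, and the AUC is computed with the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve.

```python
# Hypothetical rating-scale scores for youth with and without ADHD.
adhd_scores = [7, 8, 6, 9, 5, 8, 7]
control_scores = [3, 4, 6, 2, 5, 3, 4]

def sens_spec(adhd, controls, threshold):
    """Sensitivity and specificity when scores >= threshold are called ADHD."""
    sens = sum(s >= threshold for s in adhd) / len(adhd)
    spec = sum(s < threshold for s in controls) / len(controls)
    return sens, spec

def auc(adhd, controls):
    """Rank-based AUC: the probability that a randomly chosen ADHD score
    exceeds a randomly chosen control score, counting ties as one-half."""
    wins = sum((a > c) + 0.5 * (a == c) for a in adhd for c in controls)
    return wins / (len(adhd) * len(controls))

# Raising the threshold trades sensitivity for specificity:
for t in (4, 6, 8):
    print(t, sens_spec(adhd_scores, control_scores, t))

print(round(auc(adhd_scores, control_scores), 2))  # 0.96
```

Each threshold yields one (sensitivity, specificity) pair, that is, one point on the ROC curve; the AUC summarizes performance across all thresholds, which is why it allows comparisons across studies that applied different cutoffs.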
Available Tools
Most research is available on parental ratings. Overall, AUCs for parent rating scales ranged widely from “poor”58 to “excellent.”59 Analyses restricted to the CBCL, the most commonly evaluated scale, yielded more consistent “good” AUCs for differentiating youth with ADHD from others in clinical samples, but the number of studies contributing data was small. Internal consistency for rating scale items was generally high across most rating scales. Test-retest reliability was good, though only 2 studies reported it. One study reported moderate rater agreement between mothers and fathers for inattention, hyperactivity, and impulsivity symptoms. Few studies included youth under 7 years of age.
AUCs for teacher rating scales ranged from “failed”33 to “good.”34 Internal consistency for scale items was generally high. Teacher ratings demonstrated very low rater agreement with corresponding parent scales, suggesting either a problem with the instruments or a large variability in symptom presentation with environmental context (home or school).
Though data were limited, self-reports from youth seemed to perform less well than corresponding parent and teacher reports, with AUCs ranging from “failed” for CBCL or ASEBA when distinguishing ADHD from other patients33 to “good” for the SWAN in distinguishing ADHD from neurotypical controls.36,37
Studies evaluating neuropsychological tests yielded AUCs ranging from “poor”60,61 to “excellent.”50 Many used idiosyncratic combinations of cognitive measures, which complicates interpretation of the results across studies. Nevertheless, extracting specific, comparable measures of inattention and impulsivity from CPTs yielded diagnostic performance ranging from “poor” to “excellent” in differentiating ADHD youth from neurotypical controls and “fair” in differentiating ADHD youth from other patients.42,60,62 No studies provided an independent replication of diagnosis using the same measure.
Blood biomarkers yielded AUCs ranging from “poor” (serum miRNAs)63 to “excellent” (erythropoietin and erythropoietin receptors levels)52 in differentiating ADHD from neurotypical youth. None have been independently replicated, and test-retest reliability was not reported. Most EEG studies used machine learning for diagnostic classification. AUCs ranged from “poor”64 to “excellent” when differentiating ADHD youth from neurotypical controls.65 Diagnostic performance was not prospectively replicated in any independent samples.
Most neuroimaging studies relied on machine learning to develop diagnostic algorithms. AUCs ranged from “poor”66 to “excellent” for distinguishing ADHD youth from neurotypically developing controls.57 Most studies used pre-existing data sets or repositories to retrospectively discriminate youths with ADHD from neurotypical controls, not from other clinical populations and not prospectively, and none assessed test-retest reliability or the independent reproducibility of findings. Reporting of final mathematical models or algorithms for diagnosis was limited. Activity monitors have the advantage of providing inexpensive, objective, easily obtained, and quantified measures that can potentially be widely disseminated and scaled.
Studies of combined approaches, such as integrating diagnostic tools with clinician impressions, were limited. One study reported increased sensitivity and specificity when an initial clinician diagnosis was combined with EEG indicators (the reference standard was a consensus diagnosis from a panel of ADHD experts).67 These findings were not independently replicated, however, and no test-retest reliability was reported.
Importance of the Comparator Sample
Many studies aimed to distinguish ADHD youth from neurotypical controls, which is a distinction of limited clinical relevance. In clinically referred youth, most parents, teachers, and clinicians are reasonably confident that something is wrong, even if they are unsure whether the cause of their concern is ADHD. To be informed by a tool that the child is not typically developing is not particularly helpful. Moreover, we cannot know whether diagnostic performance for tools that discriminate ADHD youth only from neurotypical controls is determined by the presence of ADHD or by the presence of any other characteristics that accompany clinical “caseness,” such as the presence of comorbid illnesses or symptoms shared or easily confused with those of other conditions, or the effects of chronic stress or current or past treatment. The clinically more relevant and difficult question is, therefore, how well the tool distinguishes youth with ADHD from those who have other emotional and behavioral problems. Consistent with these conceptual considerations that argue for assessing diagnostic performance in differentiating youth with ADHD from those with other clinical conditions, we found significant evidence that, across all studies, sensitivity, specificity, and AUC were all lower when differentiating youth with ADHD from a clinical sample than when differentiating them from neurotypical youth. These findings also suggest that the comparison population was a significant source of heterogeneity in diagnostic performance.
Clinical Implications
Despite the large number of studies on diagnostic tools, a valid and reliable diagnosis of ADHD ultimately still requires the judgment of a clinician who is experienced in the evaluation of youth with and without ADHD, along with the aid of standardized rating scales and input from multiple informants across multiple settings, including parents, teachers, and youth themselves. Diagnostic tools perform best when the clinical question is whether a youth has ADHD or is healthy and typically developing, rather than when the clinical question is whether a youth has ADHD or another mental health or behavioral problem. Diagnostic tools yield more false-positive and false-negative diagnoses of ADHD when differentiating youth with ADHD from youth with another mental health problem than when differentiating them from neurotypically developing youth.
Scores for rating scales tended to correlate poorly across raters, and ADHD symptoms in the same child varied across settings, indicating that no single informant in a single setting is a gold standard for diagnosis. Diagnosis using rating scales will therefore likely benefit from a more complete representation of symptom expression across multiple informants (parents, school personnel, clinicians, and youth) and more than 1 setting (home, school, and clinic) to inform clinical judgment, consistent with current guidelines.68–70 Unfortunately, methods for combining scores across raters and settings that improve diagnosis compared with scores from single raters have not been developed or prospectively replicated.
Despite the widespread use of neuropsychological testing to “diagnose” youth with ADHD, often at considerable expense, indirect comparisons of AUCs suggest that performance of neuropsychological test measures in diagnosing ADHD is comparable to the diagnostic performance of ADHD rating scales from a single informant. Moreover, the diagnostic accuracy of parent rating scales is typically better than neuropsychological test measures in head-to-head comparisons.44,71 Furthermore, the overall SoE for estimates of diagnostic performance with neuropsychological testing is low. Use of neuropsychological test measures of executive functioning, such as the CPT, may help inform a clinical diagnosis, but they are not definitive either in ruling in or ruling out a diagnosis of ADHD. The sole use of CPTs and other neuropsychological tests to diagnose ADHD, therefore, cannot be recommended. We note that this conclusion regarding diagnostic value is not relevant to any other clinical utility that testing may have.
No independent replication studies have been conducted to validate EEG, neuroimaging, or biospecimens to diagnose ADHD, and no clinical effectiveness studies have been conducted using these tools to diagnose ADHD in the real world. Thus, these tools do not seem remotely close to being ready for clinical application to aid diagnosis, despite US Food and Drug Administration approval of 1 EEG measure as a purported diagnostic aid.67,72
Future Research
All studies of diagnostic tools should report data in more detail (ie, clearly report false-positive and -negative rates, the diagnostic thresholds used, and any data manipulation undertaken to achieve the result) to support meta-analytic methods. Studies should include ROC analyses to support comparisons of test performance across studies that are independent of the diagnostic threshold applied to measures from the tool. They should also include assessment of test-retest reliability to help discern whether variability in measures and test performance is a function of setting or of measurement variability over time. Future studies should address the influence of co-occurring disorders on diagnostic performance and how well the tools distinguish youth with ADHD from youth with other emotional and behavioral problems, not simply from healthy controls. More studies should compare the diagnostic accuracy of different test modalities, head-to-head. Independent, prospective replication of performance measures of diagnostic tools in real-world settings is essential before US Food and Drug Administration approval and before recommendations for widespread clinical use.
Research is needed to identify consensus algorithms that combine rating scale data from multiple informants to improve the clinical diagnosis of ADHD, which at present is often unguided, ad hoc, and suboptimal. Diagnostic studies using EEG, neuroimaging, and neuropsychological tests should report precise operational definitions and measurements of the variable(s) used for diagnosis, any diagnostic algorithm employed, the selected statistical cut-offs, and the number of false-positives and false-negatives the diagnostic tool yields to support future efforts at synthetic analyses.
Conclusions
Objective, quantitative neuropsychological test measures of executive functioning correlate only weakly with the clinical symptoms that define ADHD.73 Thus, many youth with ADHD have normal executive functioning profiles on neuropsychological testing, and many who have impaired executive functioning on testing do not have ADHD.74 Future research is needed to understand how test measures of executive functioning and the real-world functional problems that define ADHD map on to one another and how that mapping can be improved.
One of the most important potential uses of systematic reviews and meta-analyses in improving the clinical diagnosis of ADHD and treatment planning would be identification of effect modifiers for the performance of diagnostic tools: determining, for example, whether tools perform better in patients who are younger or older, in ethnic minorities, or those experiencing material hardship, or who have a comorbid illness or specific ADHD presentation. Future studies of ADHD should more systematically address the modifier effects of these patient characteristics. They should make available in public repositories the raw, individual-level data and the algorithms or computer code that will aid future efforts at replication, synthesis, and new discovery for diagnostic tools across data sets and studies.
Finally, no studies meeting our inclusion criteria assessed the consequences of being misdiagnosed or labeled as either having or not having ADHD, the diagnosis of ADHD specifically in preschool-aged children, or the potential adverse consequences of youth being incorrectly diagnosed with or without ADHD. This work is urgently needed.
Acknowledgments
We thank Cynthia Ramirez, Erin Tokutomi, Jennifer Rivera, Coleman Schaefer, Jerusalem Belay, Anne Onyekwuluje, and Mario Gastelum for help with data acquisition. We thank Kymika Okechukwu, Lauren Pilcher, Joanna King, and Robyn Wheatley from the American Academy of Pediatrics (AAP), Jennie Dalton and Paula Eguino Medina from PCORI, Christine Chang and Kim Wittenberg from AHRQ, and Mary Butler from the Minnesota Evidence-based Practice Center. We thank Glendy Burnett, Eugenia Chan, MD, MPH, Matthew J. Gormley, PhD, Laurence Greenhill, MD, Joseph Hagan, Jr, MD, Cecil Reynolds, PhD, Le'Ann Solmonson, PhD, LPC-S, CSC, and Peter Ziemkowski, MD, FAAFP who served as key informants. We thank Angelika Claussen, PhD, Alysa Doyle, PhD, Tiffany Farchione, MD, Matthew J. Gormley, PhD, Laurence Greenhill, MD, Jeffrey M. Halperin, PhD, Marisa Perez-Martin, MS, LMFT, Russell Schachar, MD, Le'Ann Solmonson, PhD, LPC-S, CSC, and James Swanson, PhD who served as a technical expert panel. Finally, we thank Joel Nigg, PhD, and Peter S. Jensen, MD for their peer review of the data.
Drs Peterson and Hempel conceptualized and designed the study, collected data, conducted the analyses, drafted the initial manuscript, and critically reviewed and revised the manuscript; Dr Trampush conducted the critical appraisal; Ms Brown, Ms Maglione, Drs Bolshakova and Padkaman, and Ms Rozelle screened citations and abstracted the data; Dr Miles conducted the analyses; Ms Yagyu designed and executed the search strategy; Ms Motala served as data manager; and all authors provided critical input for the manuscript, approved the final manuscript as submitted, and agree to be accountable for all aspects of the work.
This systematic review has been registered at PROSPERO (identifier CRD42022312656).
COMPANION PAPER: A companion to this article can be found online at www.pediatrics.org/cgi/doi/10.1542/peds.2024-065787.
Data Sharing: Data are available in SRDRPlus.
FUNDING: The work is based on research conducted by the Southern California Evidence-based Practice Center under contract to the Agency for Healthcare Research and Quality (AHRQ), Rockville, MD (Contract 75Q80120D00009). The Patient-Centered Outcomes Research Institute (PCORI) funded the research (PCORI Publication No. 2023-SR-03). The findings and conclusions in this manuscript are those of the authors, who are responsible for its contents; the findings and conclusions do not necessarily represent the views of AHRQ or PCORI, its Board of Governors, or Methodology Committee. Therefore, no statement in this report should be construed as an official position of PCORI, AHRQ or of the US Department of Health and Human Services.
CONFLICT OF INTEREST DISCLOSURES: The authors have indicated they have no conflicts of interest to disclose.