CONTEXT:

In the last few decades, data acquisition and processing have grown tremendously, sparking interest in machine learning (ML) within the health care system.

OBJECTIVE:

Our aim for this review is to map the currently available evidence on ML in pediatric and adolescent medicine and to provide insight for future research.

DATA SOURCES:

A literature search was conducted by using Medline, the Cochrane Library, the Cumulative Index to Nursing and Allied Health Literature Plus, Web of Science Library, and EBSCO Dentistry & Oral Science Source.

STUDY SELECTION:

Articles in which an ML model was assessed for the diagnosis, prediction, or management of any condition in children and adolescents (0–18 years) were included.

DATA EXTRACTION:

Data were extracted for year of publication, geographical location, age range, number of participants, disease or condition under investigation, study methodology, reference standard, type, category, and performance of ML algorithms.

RESULTS:

The review included 363 studies, with subspecialties such as psychiatry, neonatology, and neurology having the most literature. A majority of the studies were from high-income (82%; n = 296) and upper middle-income countries (15%; n = 56), whereas only 3% (n = 11) were from low middle-income countries. Neural networks and ensemble methods were most commonly tested in the 1990s, whereas deep learning and clustering emerged rapidly in the current decade.

LIMITATIONS:

Only studies published in the English language were included in this review.

CONCLUSIONS:

The interest in ML has been growing across various subspecialties and countries, suggesting a potential role in health service delivery for children and adolescents in the years to come.

Artificial intelligence (AI) is a branch of computer science that attempts to emulate human intelligence, whereas machine learning (ML) is a subset of AI in which computers learn from data without being explicitly programmed.1  The first real-life use of such technology dates back to the 1960s, when the first expert system, DENDRAL,2  was created. Although DENDRAL was developed primarily for application within biochemistry, it paved the way for MYCIN, the foremost computerized decision-support system, to be employed within medicine.3  Since then, the use of ML has increased manifold in a multitude of sectors.

With increasing computational efficiency and the large amount of data being produced in industries such as health care, the use of big data analytics has gained momentum.4  ML can play a critical role in analytics because it allows for the ingestion and interpretation of a massive volume of structured and unstructured data to support evidence-based decision-making and action-taking.5  With this newfound interest in its prospects in health care, ML has been described as the next cornerstone of health care delivery.

Synergistically, academic institutions have become increasingly engaged in ML-related research in all sectors, including health, with ∼212 ML-related publications indexed globally in 1990, which increased to 1153 in the year 2014.6  One of the first articles presenting the use of rule-based systems in pediatrics was published in 1984. This publication introduced a computer-aided decision-making tool called SHELP,7  which was used to diagnose inborn errors of metabolism. Recently, a large amount of data from 1 362 559 outpatient visits from electronic health records (EHRs) from China were used to diagnose common pediatric illnesses with an accuracy of 0.98, outperforming junior physicians.8  Further, IBM Watson’s cognitive platform (IBM Corporation, Armonk, NY) has the ability to augment the skills and knowledge of health care professionals to help clinicians diagnose and treat rare pediatric diseases in instances in which diagnosis can be resource intensive.9 

On the basis of the upsurge in available literature in this domain in recent years and increasing diversity in its applications in health care, we aim to map the current implications of ML algorithms in various subspecialties of child health across various world regions and time periods. In this review of the existing literature, we will highlight current applications of ML algorithms that have been tested and/or deployed and will identify areas for further improvement in both application and reporting.

Our aim for this review is to map the current evidence on the application of ML for child and adolescent health and to provide insight for future research.

The review included studies conducted in children and adolescents by using the World Health Organization definition, in which children are defined as those <10 years of age, whereas adolescents are defined as those aged 10 to 19 years.10  There were no restrictions on study design, and we included only studies published in the English language. For this review, we excluded all studies that were related to any surgical procedure and/or its outcome. We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines for analysis in this publication.11 

ML, one of the most rapidly growing fields in AI, is a domain in which machines are trained to emulate human-like intelligence without explicit programming.12  Conventional computing uses a logical series of steps to arrive at a conclusion, whereas ML has the advantage that it assesses linear as well as nonlinear patterns in the data, develops rules with minimal human interference, and generates an output.13  We included all primary studies in which at least 1 ML technique was compared against a reference standard or current practice and for which outcome measures were available. A broad classification was used for the type of ML algorithm, with studies grouped into the following categories: deep learning, ensemble methods, neural networks, regularization, rule system, regression, dimensionality reduction, instance based, clustering, Bayesian methods, and decision tree.14  Each of these categories is described in Table 1. Most unstructured textual data in health care can be converted into a machine-readable format by using natural language processing and then processed further by ML models.15 

TABLE 1

Brief Description of ML Algorithms

| Algorithm Category | Brief Descriptor |
| --- | --- |
| Neural networks | Mimic the biological neural network residing within the human brain to analyze data |
| Deep learning | Uses a combination of artificial neural networks in a computationally efficient manner |
| Ensemble methods | An amalgamation of predictions of multiple weaker models to strengthen overall prediction |
| Regression algorithms | Map the relationship between the input and output variables using a measure of error |
| Regularization methods | An extension of regression models that favors simpler models that are generalizable |
| Clustering methods | An unsupervised ML technique that uses the inherent structures in the data to best organize the data into groups of maximum commonality |
| Dimensionality reduction | Similar to clustering but summarizes data using less information |
| Rule system | Extracts rules between variables in the existing data set to explain observed relationships |
| Bayesian methods | Explicitly apply Bayes’ theorem to the problem |
| Decision tree methods | Use actual values of features in the data to build a model |
| Instance-based models | Compare new data to the example database (built by the model) using a similarity measure to make a prediction |
| Natural language processing | Converts textual data to a machine-readable format |

ML models were categorized as diagnostic if they were used to diagnose a medical condition, prognostic if they were used to predict a clinical outcome (ie, morbidity or mortality), and management if they were used to recommend care pathways and therapy options.

A literature search was conducted on January 2, 2020, in PubMed Medline, the Cochrane Library, the Cumulative Index to Nursing and Allied Health Literature Plus, Web of Science Library, and EBSCO Dentistry & Oral Science Source. The Medical Subject Headings and keywords used included “artificial intelligence,” “machine learning,” “deep learning,” “supervised learning,” “unsupervised learning,” “neural networks,” “clinical decision support,” “ensemble methods,” “machine learning algorithms,” “healthcare,” “medicine,” “disease,” “diagnosis,” “prediction,” “screening,” “precision medicine,” “evidence-based care,” “risk factors,” “child,” “adolescents,” “neonate,” and “infant.” The detailed search strategy is provided in the Supplemental Information.

Title and abstract screening was performed independently by 2 authors (Z.H. and S.M.J.). Full-text review and data extraction were performed by 6 reviewers working in pairs (Z.H., S.M.J., A.A., M.I.H., B.I., and W.A.). Data were extracted for baseline characteristics, including year of publication, geographical location in which the study was performed, age range and number of participants, disease or condition under investigation, study methodology, reference standard, type of ML model used, category of model use (ie, diagnostics, prediction, or management), and performance metrics. Conflicts were resolved by discussion between the reviewers within each pair. Conflicts not resolved within the pair were reviewed by Z.H. and/or S.M.J., and a unanimous decision was made. Risk of bias assessment was done by using the modified 7-item Methodological Index for Nonrandomized Studies criteria, which include disclosures, study aim, eligibility criteria, determination of ground truth, data set distribution, reporting of performance metrics, and explanation of the model used.16 

We performed a descriptive synthesis and categorized the published studies by subspecialty. Within each subspecialty, we categorized the data by the decade in which the studies were published (defined as 1990–2000, 2001–2010, and 2011–2019), age (defined as the following categories: up to the age of 1 month, >1 month to 10 years of age, between 10 and 18 years of age, and a combined category of >1 month to 18 years of age), sample size (grouped as 0–10, 11–100, 101–500, 501–1000, 1001–5000, and >5000 participants), disease condition, and ML model assessed. Studies were also grouped by region of publication according to the World Health Organization world regions (defined as Africa, the Americas, Southeast Asia, Europe, Eastern Mediterranean, and Western Pacific17 ) and by World Bank income group status.18 
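The grouping rules above can be illustrated with a minimal Python sketch. This is not the code used for the review; the function names (`sample_size_group`, `publication_decade`) and the label strings are assumptions made here for illustration, and only the cut points come from the text.

```python
from bisect import bisect_left

# Upper bounds of the sample-size groups named in the text;
# anything above the last bound falls in ">5000".
SIZE_BOUNDS = [10, 100, 500, 1000, 5000]
SIZE_LABELS = ["0-10", "11-100", "101-500", "501-1000", "1001-5000", ">5000"]

# Publication decades as defined in the text (inclusive year ranges).
DECADES = [(1990, 2000), (2001, 2010), (2011, 2019)]


def sample_size_group(n):
    """Return the sample-size group label for a study with n participants."""
    if n < 0:
        raise ValueError("sample size cannot be negative")
    # bisect_left returns the index of the first bound that is >= n,
    # which is exactly the index of the matching group label.
    return SIZE_LABELS[bisect_left(SIZE_BOUNDS, n)]


def publication_decade(year):
    """Return the decade label for a publication year, or None if out of range."""
    for start, end in DECADES:
        if start <= year <= end:
            return f"{start}-{end}"
    return None
```

For example, under these assumptions a study of 100 participants published in 1992 would fall in the `"11-100"` and `"1990-2000"` groups.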

The search strategy initially identified 7623 articles, of which 390 met the inclusion criteria; however, 26 articles were not accessible for full-length screening and were excluded from the final analysis, so 363 studies were included in this review for evidence mapping (see Table 2 for a summary of the included studies; details are available in Supplemental Table 1). We grouped the included studies into 11 core subspecialties: cardiology, emergency medicine, endocrinology, hematology and oncology, infectious disease, intensive care, neonatology, neurology, psychiatry, pulmonology, and radiology. Studies in which the condition being assessed was not attributable to one of these subspecialties were grouped in the “others” category, which covered the use of ML in areas such as nephrology, gastroenterology, immunology, and ophthalmology and in assessments such as abnormal gait detection, analysis of infant cry, and prediction of childhood obesity (Fig 1).

TABLE 2

Summary of Included Studies

| No. | Subspecialty | No. Articles (N = 363), n (%) | Economic Region, n (%) | Sample Size, Minimum–Maximum | Category of the Model, n (%) | Type of ML Model, n (%) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Neonatal medicine | 87 (24) | HIC = 79 (91); UMIC = 7 (8); LMIC = 1 (1) | 6–347 312 | Diagnosis = 59 (68); prediction = 24 (28); management = 3 (4) | Ensemble methods = 31 (33); neural networks = 33 (35); regularization = 14 (15); rule system = 1 (1); dimensionality reduction = 7 (8); instance based = 2 (2); Bayesian = 2 (2); decision tree = 2 (2); natural language processing = 2 (2) |
| 2 | Psychiatry | 73 (20) | HIC = 61 (84); UMIC = 11 (15); LMIC = 1 (1) | 10–21 563 | Diagnosis = 69 (95); prediction = 1 (1); management = 3 (4) | Deep learning = 1 (1); ensemble methods = 44 (46); neural networks = 13 (14); regularization = 8 (8); rule system = 1 (1); regression = 7 (7); dimensionality reduction = 7 (7); instance based = 3 (3); Bayesian = 7 (7); decision tree = 6 (6) |
| 3 | Neurology | 39 (11) | HIC = 31 (80); UMIC = 6 (15); LMIC = 2 (5) | 5–29 557 | Diagnosis = 33 (85); prediction = 5 (13); management = 1 (2) | Ensemble methods = 21 (41); neural networks = 14 (28); regularization = 1 (1); regression = 7 (14); dimensionality reduction = 2 (4); clustering = 2 (4); Bayesian = 2 (4); decision tree = 2 (4) |
| 4 | Pulmonology | 33 (9) | HIC = 30 (91); UMIC = 3 (9) | 10–29 362 | Diagnosis = 27 (82); prediction = 6 (18) | Deep learning = 2 (4); ensemble methods = 10 (20); neural networks = 9 (18); regularization = 4 (8); rule system = 5 (10); regression = 4 (8); dimensionality reduction = 2 (4); instance based = 2 (4); clustering = 2 (4); Bayesian = 8 (15); decision tree = 3 (5) |
| 5 | Radiology | 33 (9) | HIC = 28 (85); UMIC = 5 (15) | 33–15 149 | Diagnosis = 32 (97); prediction = 1 (3) | Deep learning = 8 (19); ensemble methods = 5 (12); neural networks = 12 (29); regularization = 5 (12); rule system = 1 (3); regression = 3 (7); dimensionality reduction = 2 (5); instance based = 1 (3); Bayesian = 2 (5); decision tree = 2 (5) |
| 6 | Others | 31 (9) | HIC = 19 (61); UMIC = 10 (33); LMIC = 2 (6) | 14–51 008 | Diagnosis = 28 (90); prediction = 3 (10) | Deep learning = 3 (7); ensemble methods = 10 (23); neural networks = 11 (25); regularization = 2 (5); rule system = 1 (2); regression = 2 (5); dimensionality reduction = 1 (2); instance based = 5 (12); Bayesian = 3 (7); decision tree = 5 (12) |
| 7 | Infectious diseases | 18 (5) | HIC = 10 (56); UMIC = 4 (22); LMIC = 4 (22) | 39–10 687 | Diagnosis = 12 (66); prediction = 5 (28); management = 1 (6) | Ensemble methods = 9 (35); neural networks = 4 (15); rule system = 1 (4); regression = 4 (15); dimensionality reduction = 4 (15); clustering = 1 (4); decision tree = 3 (12) |
| 8 | Cardiology | 16 (4) | HIC = 9 (56); UMIC = 6 (38); LMIC = 1 (6) | 24–33 831 | Diagnosis = 12 (75); prediction = 2 (12.5); management = 2 (12.5) | Ensemble methods = 6 (26); neural networks = 8 (35); regression = 2 (9); dimensionality reduction = 3 (13); instance based = 1 (4); Bayesian = 1 (4); decision tree = 2 (9) |
| 9 | Hematology and oncology | 11 (3) | HIC = 8 (73); UMIC = 3 (27) | 54–1571 | Diagnosis = 8 (73); prediction = 3 (27) | Ensemble methods = 3 (15); neural networks = 7 (37); regression = 2 (11); dimensionality reduction = 2 (11); instance based = 2 (11); Bayesian = 1 (5); decision tree = 1 (5); natural language processing = 1 (5) |
| 10 | Emergency medicine | 9 (2) | HIC = 8 (89); UMIC = 1 (11) | 119–125 940 | Diagnosis = 4 (44); prediction = 5 (56) | Ensemble methods = 5 (26); neural networks = 6 (32); regularization = 1 (5); rule system = 1 (5); regression = 3 (16); Bayesian = 1 (5); decision tree = 2 (11) |
| 11 | Endocrinology | 7 (2) | HIC = 7 (100) | 5–245 | Diagnosis = 6 (86); prediction = 1 (14) | Ensemble methods = 3 (15); neural networks = 7 (37); regression = 2 (11); dimensionality reduction = 2 (11); instance based = 2 (11); Bayesian = 1 (5); decision tree = 1 (5); natural language processing = 1 (5) |
| 12 | Intensive care | 6 (2) | HIC = 6 (100) | 493–150 000 | Diagnosis = 3 (50); prediction = 2 (33); management = 1 (17) | Deep learning = 1 (12.5); ensemble methods = 1 (12.5); neural networks = 3 (37.5); regression = 2 (25); decision tree = 1 (12.5) |
FIGURE 1

Search flow diagram.

The risk of bias assessment revealed that the study aim was clearly stated in 318 studies (87%). The eligibility criteria for input features were described in only 52% of the studies (n = 192). However, the ground truth for labeling of the output conditions was clearly described in all studies. Distribution of the data set (training, validation, and testing phases), explanation of the ML model, and the performance metrics were reported in a majority of the studies (Fig 2).

FIGURE 2

Risk of bias assessment.

The literature review revealed that ML algorithms have been most frequently tested in conditions attributable to neonatal medicine (n = 87; 23.9%), followed by psychiatry (n = 73; 20.1%) and neurology (n = 39; 10.7%). Interest in the use of AI in neonatology remained persistently high throughout the last 3 decades. The 3 most common conditions in neonates that were explored by using ML algorithms were prematurity (n = 12; 13.7%), neonatal seizures (n = 8; 9.2%), and mortality in neonatal intensive care (n = 6; 7.0%). The most commonly investigated conditions in psychiatry were autism spectrum disorder (n = 49; 67%), attention-deficit/hyperactivity disorder (n = 16; 22%), and mood disorders (n = 6; 8%), whereas in neurology, epilepsy (n = 12; 30.7%), cerebral palsy (n = 6; 15.3%), and cognitive disability (n = 5; 12.8%) were the most commonly explored conditions. Other specialties, such as pulmonology and radiology, had <5 articles each in 1990–2000, but their output increased nearly sixfold in 2011–2019, with asthma and traumatic brain injuries being the most common pathologies explored. The most common pediatric physiologic events in the “others” category were sleep and infant cry pattern analysis.

The largest share of the studies was conducted in participants aged 1 month to 18 years (n = 161; 44.4%), followed by studies conducted specifically on neonates (n = 102; 28.1%). The most common sample size was below 100 (n = 160; 44.1%), whereas only 8.5% (n = 31) of the studies used a sample size >5000. In a small number of studies (n = 5; 1.3%), the sample size was not clearly stated.

Globally, the largest share of the studies came from the region of the Americas (n = 158; 43.5%), followed by the European region (n = 120; 33.0%). The regions of Southeast Asia and Africa contributed the least (n = 7 [1.9%] and n = 3 [0.8%], respectively) to the available literature. The contributions of various countries to the literature on ML in pediatrics are displayed in Fig 3. Among the economic regions, high-income countries (HICs) (n = 296; 82%) have been the biggest contributors to all the categories of ML models (ie, diagnostics [n = 232; 78%], prediction [n = 53; 18%], and management [n = 11; 4%]). Upper middle-income countries (UMICs) have contributed 15% (n = 56) of the literature, mainly comprising ML work for diagnostics (n = 51; 91%) and prediction (n = 5; 9%). Low middle-income countries (LMICs) have published the least (n = 11; 3%), mainly on ML use in diagnostics (n = 10; 91%). Over time, other specialties began to explore the use of ML for diagnostics, and the use of ML algorithms for prediction (n = 59; 16%) and management (n = 11; 3%) of pediatric conditions also started expanding; however, this expansion was limited to work from HICs.
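The reported percentages can be checked against the reported counts with a few lines of Python. This is a minimal sketch for the reader, not part of the review's analysis; the helper name `share` is an assumption introduced here, and only the counts come from the text.

```python
def share(n, total):
    """Percentage of `total` represented by `n`, rounded to a whole number."""
    return round(100 * n / total)


TOTAL_STUDIES = 363

# Study counts by World Bank income group, as reported in the text.
income_counts = {"HIC": 296, "UMIC": 56, "LMIC": 11}

# Model categories within the HIC studies, as reported in the text.
hic_categories = {"diagnostics": 232, "prediction": 53, "management": 11}

income_shares = {g: share(n, TOTAL_STUDIES) for g, n in income_counts.items()}
hic_shares = {c: share(n, income_counts["HIC"]) for c, n in hic_categories.items()}
```

Under these inputs the income-group counts sum to 363 and the rounded shares reproduce the 82%, 15%, and 3% reported for HICs, UMICs, and LMICs.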

FIGURE 3

Global distribution of publications in pediatrics in which ML is used.

In Fig 4, we summarize the ML model used in the subspecialties, the category of model use, and the source of the published literature over the 3 decades.

FIGURE 4

Time trend of use of ML for diagnosis, management, and prognosis across world economic regions.

During the initial decade (1990–2000), neural networks and ensemble techniques were the ML algorithms most often used in pediatric research. Between 2001 and 2010, along with the algorithms used earlier, researchers started using dimensionality reduction and other techniques, such as natural language processing. In the current decade (2011–2019), the use of neural networks and ensemble methods continued to rise, whereas newer techniques, such as deep learning and clustering, started to emerge rapidly (Fig 5A).

FIGURE 5

A, Use of different ML algorithms over time. B, Use of different ML algorithms according to subspecialties.

When the studies were analyzed by economic region, ensemble methods and neural networks dominated the choice of algorithms in HICs, UMICs, and LMICs alike. However, HICs and UMICs also explored other ML algorithms, such as regression and dimensionality reduction techniques. Deep learning techniques, which were seen solely in the current decade, were more commonly reported from UMICs than from HICs.

When the studies were analyzed by subspecialty, decision trees, regression algorithms, neural networks, and ensemble methods were commonly used across the board. However, unsupervised learning techniques, such as clustering, were explored in only a few specialties, such as neurology. Because deep learning requires large amounts of data, it was most commonly deployed on radiology and cardiology data (Fig 5B).

The reference standard against which these models were tested mainly included experts (92.6%; n = 336), whereas laboratory investigations were used in 7.4% (n = 27) of the studies. Except for the models developed for the hematology and oncology subspecialty (73%), most of the models in other subspecialties lacked testing on data sets from different geographic locations (ie, external validation).

Because most of the disease or condition categorization was binary, the most commonly reported performance metrics across studies were sensitivity and specificity, along with the accuracy of the ML algorithm. Most reported values fell between 70% and 95%; however, some were as low as 20%, indicating room for improvement (Supplemental Table 2).
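For readers less familiar with these metrics, the following minimal Python sketch shows how sensitivity, specificity, and accuracy are derived from the 4 cells of a binary confusion matrix. The class and function names are assumptions introduced here, and the counts in the usage note are hypothetical; they are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing model output with the reference standard."""
    tp: int  # condition present, model positive
    fp: int  # condition absent, model positive
    tn: int  # condition absent, model negative
    fn: int  # condition present, model negative


def sensitivity(c):
    """Proportion of true cases the model detects: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c):
    """Proportion of non-cases the model correctly rules out: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def accuracy(c):
    """Proportion of all participants classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Hypothetical example: 50 children with the condition, 50 without.
example = ConfusionCounts(tp=45, fp=10, tn=40, fn=5)
```

With these hypothetical counts, sensitivity is 0.90, specificity is 0.80, and accuracy is 0.85. The functions do not guard against an empty denominator (eg, a test set with no positive cases), which a fuller implementation would need to handle.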

With this evidence review, we highlight the use of ML across various subspecialties for child and adolescent health. A majority of the studies were conducted in HICs, where ML was most commonly tested for diagnostic indications, with relatively less evidence emerging from LMICs.

Within the included studies, the earliest publication was on the use of ML in cardiology in 1992.19  In the 3 decades for which evidence on the use of ML in children and adolescents has been available, specialties such as neonatology, psychiatry, and neurology have been the major contributors. The reasons for such preferential exploration are not clear, but the interest in the use of ML in neonatology may be attributable to the large amount of observational data obtained from NICU monitoring. It is estimated that a well-functioning NICU generates ∼1 terabyte of data per bed per year,20  and the multitude of monitoring parameters and diagnostics can be analyzed efficiently by using ML models to aid clinicians in decision-making in critical situations, consequently impacting neonatal outcomes. The psychiatry subspecialty faces a tremendous human resource challenge, with <1 psychiatrist for every 100 000 people in half the world.21  The common conditions studied in psychiatry, such as autism and attention-deficit/hyperactivity disorder, require the use of diagnostic tools that are time and resource intensive.22  By using ML, conventional diagnostic tools can be narrowed to the minimum set of features required for accurate diagnosis of these conditions.23 

ML has been widely used to assist with the diagnosis of various clinical conditions in adults as well as children. However, longitudinal data tracking through EHRs has now provided fertile ground for applications that make accurate estimations regarding length of stay, rehospitalization, and morbidity and mortality.24  The ML models in this review were mainly used for diagnosing various conditions. This preponderance of diagnostic ML models is also seen in the list of models approved by the US Food and Drug Administration, albeit mainly for adult use.25  These algorithms range from automated detection of ejection fraction, with a root mean square deviation of 8% on echocardiographic images, to detection of pleural effusion on chest radiographs by using HealthCXR, with an accuracy of >98%.25  However, ML models for the prognosis and management of disease conditions, in both adult and pediatric medicine, are relatively few. This could be due to the inherent complexity of clinical management and outcome pathways along with vast amounts of unstructured data. Another possible explanation is the relatively recent emphasis on investing in systematic data tracking to estimate cost of care, study effectiveness of interventions, and monitor patient outcomes.26  This emphasis may have contributed to the increase in prognostic and care management ML models over the current decade, an increase that is likely to continue as the ML literature expands.

In this review, we also highlighted that ML publications were most commonly reported from HICs. One reason may be the way these countries store and handle health-related data. HICs have invested significantly in EHRs because of their potential to help improve quality of care.27  EHRs include longitudinally tracked multidimensional data from the history, investigations, and care plans of individual patients. With this vast amount of data, analytical tools, such as ML, have the ability to make accurate predictions, outperforming the reference standards.28  The priorities and requirements for EHRs in LMICs are still under development, and hence these nations have lagged considerably behind in producing ML literature.27  The time is ripe to build a collaborative approach between HICs and LMICs for data sharing and local capacity building for ML to address the global health research inequity gap between regions.29 

With the shortage of health workers in LMICs, coupled with rising rates of mobile device and Internet usage,30  advancements in computational techniques, such as ML, can play a pivotal role in revolutionizing health care. A recent report by the US Agency for International Development emphasizes the role of ML in low-resource settings. According to this report, techniques such as ML can be used for population-based surveillance, for frontline health worker assistance, for virtual assistance for patients, and as a clinical decision-support system to impact health outcomes in these regions.31  One such example is from Honduras, where an ML-enabled automated voice message system significantly improved home monitoring and glycemic control for patients with diabetes.32  ML techniques have achieved a high sensitivity (97.3%) and specificity (100%) for automated tuberculosis detection by using chest radiographs when an expert reader may not be available.33  CheXNet is a diagnostic tool for lung pathologies (mainly pneumonia) that has been shown to improve workflow by helping the radiologist focus on suspicious areas of the image, thus increasing the radiologist’s ability to manage work effectively and efficiently.34  Once adopted and deployed in a clinical management workflow, these advantages have the potential for high returns, particularly in LMICs, where there is a dearth of high-skilled health workers and where care delivery is predominantly through task sharing with frontline community health workers.35 

ML comprises mathematical algorithms that improve through experience. Technical advances in database development and cloud-based applications have enabled the handling of large data sets that can be used by ML models and have the potential to transform patient risk stratification and management.36 The exponential surge in technological advances, computational power, and data storage techniques has provided an environment for deep learning models to flourish.37 Deep learning can thus take in big data in its true sense and generate models that can affect human health effectively and efficiently.37 Because they can process big data as its magnitude, velocity, and variety increase, deep learning techniques have also become favored in the fields of genomics and new drug design.38,39 This was also noted in our review, in which such models were seen to emerge and flourish in the current decade, especially on imaging data in fields such as radiology and cardiology. The performance of the algorithms ranged widely, from extremely poor (<20% sensitivity) to ideal (100% accuracy on small data sets). Unsupervised ML techniques can also identify clinical groups that resemble existing diagnostic criteria with improved accuracy.40 It is likely that with the rapid pace of advancements in ML, the performance of ML algorithms will continue to improve, and studies using computationally complex techniques will continue to dominate.
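Because sensitivity, specificity, and accuracy are the metrics quoted above, a minimal sketch of how they are derived from a binary confusion matrix may be helpful. The class and function names and all counts below are hypothetical and for illustration only; they are not taken from any study included in this review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary classifier predictions tallied against a reference standard."""
    true_positive: int
    false_negative: int
    true_negative: int
    false_positive: int


def sensitivity(c: ConfusionCounts) -> float:
    """Proportion of reference-positive cases that the model flags as positive."""
    positives = c.true_positive + c.false_negative
    if positives == 0:
        raise ValueError("no reference-positive cases")
    return c.true_positive / positives


def specificity(c: ConfusionCounts) -> float:
    """Proportion of reference-negative cases that the model flags as negative."""
    negatives = c.true_negative + c.false_positive
    if negatives == 0:
        raise ValueError("no reference-negative cases")
    return c.true_negative / negatives


def accuracy(c: ConfusionCounts) -> float:
    """Proportion of all cases that the model classifies correctly."""
    total = (c.true_positive + c.false_negative
             + c.true_negative + c.false_positive)
    if total == 0:
        raise ValueError("no cases")
    return (c.true_positive + c.true_negative) / total


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = ConfusionCounts(true_positive=18, false_negative=2,
                          true_negative=70, false_positive=10)
# A very small hypothetical test set on which every case is classified correctly.
tiny = ConfusionCounts(true_positive=5, false_negative=0,
                       true_negative=5, false_positive=0)

if __name__ == "__main__":
    for name, counts in (("example", example), ("tiny", tiny)):
        print(name,
              f"sensitivity={sensitivity(counts):.3f}",
              f"specificity={specificity(counts):.3f}",
              f"accuracy={accuracy(counts):.3f}")
```

The second set of counts shows why 100% accuracy on a small data set should be interpreted with caution: with only 10 cases, a single misclassification would shift accuracy by 10 percentage points.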

To the best of our knowledge, this is the first comprehensive review of the use of ML in child and adolescent health across the world over time. In this review, we highlight the small number of publications on AI in LMICs despite evidence of it being beneficial in low-resource settings. We included only studies published in the English language; studies published in other languages (eg, from countries such as China, where the use of computational techniques such as AI and ML is rapidly expanding) were not included. A detailed quality assessment was not performed because it was beyond the scope of this review. However, it is noteworthy that the risk of selection bias (lack of clearly stated eligibility criteria) and outcome bias (poorly defined aim) across these studies may vary significantly.

Despite the vast number of ML algorithms used across different specialties over time, few models have become commercially available and are being used to influence clinical practice. To improve the generalizability of these models, heterogeneous data from various settings are needed to create models that are more robust yet contextual. Local optimization may require collaboration with leading institutes and the transfer of skill and technology to the developing world, together with improvements in the quality of data.

In this review, we consolidate the existing evidence on ML in pediatric subspecialties. The trends are encouraging: the field of ML has grown and expanded to various models, subspecialties, and countries, especially during the last decade. There is now a need to assess these algorithms in real-world clinical and community-based practice on a larger scale and in varying contexts. This could then help determine the usefulness and impact of this technology and help draft recommendations for routine use of ML in pediatric health care delivery.

We thank Mr Khawaja Mustafa for assistance with the literature search and Mr Uzair Ansari for the development of figures for this article.

Ms Hoodbhoy and Ms Masroor Jeelani were involved in the conceptualization, screening, extraction, and write-up of the manuscript; Mr Aziz, Mr Habib, Mr Iqbal, and Mr Akmal were involved in screening, extraction, and review of the analysis; Mr Das, Drs Leeflang and Siddiqui, and Mr Hasan were involved in the conceptualization and review of the results and manuscript; and all authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.

FUNDING: No external funding.

     
  • AI: artificial intelligence
  • EHR: electronic health record
  • HIC: high-income country
  • LMIC: low middle-income country
  • ML: machine learning
  • UMIC: upper middle-income country

1. Dreyer K, Allen B. Artificial intelligence in health care: brave new world or golden opportunity? J Am Coll Radiol. 2018;15(4):655–657
2. Lindsay RK, Buchanan BG, Feigenbaum EA, Lederberg J. DENDRAL: a case study of the first expert system for scientific hypothesis formation. Artif Intell. 1993;61(2):209–261
3. Shortliffe EH, Davis R, Axline SG, Buchanan BG, Green CC, Cohen SN. Computer-based consultations in clinical therapeutics: explanation and rule acquisition capabilities of the MYCIN system. Comput Biomed Res. 1975;8(4):303–320
4. Wang Y, Kung L, Byrd TA. Big data analytics: understanding its capabilities and potential benefits for healthcare organizations. Technol Forecast Soc Change. 2018;126:3–13
5. Neves J, Vicente H, Esteves M, et al. A deep-big data approach to health care in the AI age. Mobile Networks and Applications. 2018;23:1123–1128
6. Niu J, Tang W, Xu F, Zhou X, Song Y. Global research on artificial intelligence from 1990–2014: spatially-explicit bibliometric analysis. ISPRS Int J Geoinf. 2016;5(5):66
7. Sugiyama K, Hasegawa Y. Computer assisted medical diagnosis system for inborn errors of metabolism [in Japanese]. JMEBE. 1984;22:942–943
8. Liang H, Tsui BY, Ni H, et al. Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence. Nat Med. 2019;25(3):433–438
9. IBM Corporation. Boston Children’s Hospital to tap IBM Watson to tackle rare pediatric diseases. 2015. Available at: https://www.prnewswire.com/news-releases/boston-childrens-hospital-to-tap-ibm-watson-to-tackle-rare-pediatric-diseases-300175419.html. Accessed November 10, 2015
10. World Health Organization. Definition of key terms. 2013. Available at: https://www.who.int/hiv/pub/guidelines/arv2013/intro/keyterms/en/. Accessed June 24, 2013
11. Moher D, Liberati A, Tetzlaff J, Altman DG; PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med. 2009;151(4):264–269, W64
12. Das S, Dey A, Pal A, Roy N. Applications of artificial intelligence in machine learning: review and prospect. Int J Comput Appl. 2015;115(9):31–41
13. Glymour C, Scheines R, Spirtes P, Kelly K. Discovering Causal Structure: Artificial Intelligence, Philosophy of Science, and Statistical Modeling. Orlando, FL: Academic Press; 2014
14. Browlee J. A tour of machine learning algorithms. 2019.
15. Iroju OG, Olaleke JO. A systematic review of natural language processing in healthcare. International Journal of Information Technology and Computer Science. 2015;7(8):44–50
16. Langerhuizen DWG, Janssen SJ, Mallee WH, et al. What are the applications and limitations of artificial intelligence for fracture detection and classification in orthopaedic trauma imaging? A systematic review. Clin Orthop Relat Res. 2019;477(11):2482–2491
17. World Health Organization. ANNEX C WHO regional groupings. 2017.
18. World Bank Data Team. New country classifications by income level: 2019-2020. 2019.
19. Shono H, Oga M, Shimomura K, et al. Application of fuzzy logic to the Apgar scoring system. Int J Biomed Comput. 1992;30(2):113–123
20. Khazaei H, Mench-Bressan N, McGregor C, Pugh JE. Health informatics for neonatal intensive care units: an analytical modeling perspective. IEEE J Transl Eng Health Med. 2015;3:3000109
21. Lovejoy CA, Buch V, Maruthappu M. Technology and mental health: the role of artificial intelligence. Eur Psychiatry. 2019;55:1–3
22. Randall M, Egberts KJ, Samtani A, et al. Diagnostic tests for autism spectrum disorder (ASD) in preschool children. Cochrane Database Syst Rev. 2018;(7):CD009044
23. Levy S, Duda M, Haber N, Wall DP. Sparsifying machine learning models identify stable subsets of predictive features for behavioral detection of autism. Mol Autism. 2017;8:65
24. Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 2019;17(1):195
25. American College of Radiology Data Science Institute. FDA cleared AI algorithms.
26. Cox JL. The challenge with tracking health outcomes. Can J Clin Pharmacol. 2001;8(suppl A):10A–16A
27. Fraser HS, Biondich P, Moodley D, Choi S, Mamlin BW, Szolovits P. Implementing electronic medical record systems in developing countries. Inform Prim Care. 2005;13(2):83–95
28. Harerimana G, Kim JW, Yoo H, Jang B. Deep learning for electronic health records analytics. IEEE Access. 2019;7:101245–101259
29. Vidyasagar D. Global notes: the 10/90 gap disparities in global health research. J Perinatol. 2006;26(1):55–56
30. Poushter J, Stewart R. Smartphone Ownership and Internet Usage Continues to Climb in Emerging Economies. Washington, DC: Pew Research Center; 2016
31. The Rockefeller Foundation; US Agency for International Development. Artificial Intelligence in Global Health: Defining a Collective Path Forward. Washington, DC: US Agency for International Development; 2019
32. Piette JD, Mendoza-Avelares MO, Ganser M, Mohamed M, Marinec N, Krishnan S. A preliminary study of a cloud-computing model for chronic illness self-care support in an underdeveloped country. Am J Prev Med. 2011;40(6):629–632
33. Lakhani P, Sundaram B. Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology. 2017;284(2):574–582
34. Rajpurkar P, Irvin J, Zhu K, et al. CheXNet: radiologist-level pneumonia detection on chest x-rays with deep learning [preprint published online November 14, 2017]. arXiv. doi:
35. World Health Organization; President’s Emergency Plan For AIDS Relief; Joint United Nations Programme on HIV and AIDS. Task Shifting: Rational Redistribution of Tasks Among Health Workforce Teams: Global Recommendations and Guidelines. Geneva, Switzerland: World Health Organization; 2007
36. Wiens J, Shenoy ES. Machine learning for healthcare: on the verge of a major shift in healthcare epidemiology. Clin Infect Dis. 2018;66(1):149–153
37. Miotto R, Wang F, Wang S, Jiang X, Dudley JT. Deep learning for healthcare: review, opportunities and challenges. Brief Bioinform. 2018;19(6):1236–1246
38. Grapov D, Fahrmann J, Wanichthanarak K, Khoomrung S. Rise of deep learning for genomic, proteomic, and metabolomic data integration in precision medicine. OMICS. 2018;22(10):630–636
39. Zhang L, Tan J, Han D, Zhu H. From machine learning to deep learning: progress in machine intelligence for rational drug discovery. Drug Discov Today. 2017;22(11):1680–1685
40. Sanchez-Martinez S, Duchateau N, Erdei T, et al. Machine learning analysis of left ventricular function to characterize heart failure with preserved ejection fraction. Circ Cardiovasc Imaging. 2018;11(4):e007138

Competing Interests

POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.

FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
