OBJECTIVES

Delayed child skill development is common in low- and middle-income countries, and effective, low-cost strategies suited to less-developed countries are needed. We summarize empirical findings from recent papers that study China REACH, a replication in China of the Jamaica Reach Up and Learn home-visiting program, and compare child skill growth profiles in the China REACH and Jamaica interventions.

METHODS

Different interventions often use different measures to assess early childhood skill development. To estimate the growth of underlying skills across programs despite these differences, we use a modified version of the Rasch model that anchors scores on items common to both assessments.

RESULTS

Language skill growth curves are comparable across the two interventions, for both the treatment and control groups: the curves for the China REACH and Jamaican interventions are not statistically significantly different. We also find evidence of the importance of early investment.

CONCLUSIONS

The China REACH intervention significantly improves the development of multiple skills. At the same ages, treatment effect sizes and skill growth curves are comparable across the Jamaica and China REACH interventions, despite differences in scale and cultural settings. The scale of the program is much greater in China than in Jamaica, showing that the Jamaican curriculum can be effectively expanded to larger populations. Annual costs per child are roughly $500 (2015 US dollars).

The study of early childhood investment and its consequences is an active field. Many consider early childhood investment in developing countries to be a valuable strategy for promoting national skill development.1,2 The search is on for effective, low-cost strategies that are adaptable to less-developed countries. Jamaica Reach Up and Learn, established ∼35 years ago, is a successful home-visiting program emulated worldwide.3–5

This paper studies a replication of the original Jamaica Reach Up and Learn program, China Rural Education and Child Health (China REACH), which was brought to scale in an impoverished region of Western China. There are more than 1500 participants compared with the roughly 100 participants in the original Jamaica study. Zhou et al6  show that the program can be successfully implemented at scale. The unique implementation and data collection of China REACH make it possible to examine the mechanisms of Reach Up programs in greater depth than is possible with previous samples.

We compare the treatment effects and skill growth curves for the China REACH and Jamaica Reach Up and Learn programs of young children at the same ages. We find comparable treatment effect sizes and very similar skill growth curves during the intervention. The implementation costs of China REACH and Jamaica Reach Up and Learn are low, facilitating their application in less-developed environments. We develop and apply a method for comparing distinct tests by linking common items.

A growing body of research establishes the effectiveness of home-visiting programs targeted to the early years in developing the skills of disadvantaged children. Some promising home-visiting programs are relatively inexpensive, especially those established in developing countries. As a result, they are more cost-effective than other more intensive programs, such as child care. They often place less demanding training requirements on home visitors, which simplifies the infrastructure needed to support them. The Jamaica Reach program, established ∼35 years ago, is a successful prototype that is widely emulated around the world.5 

Little is known about the mechanisms through which home-visiting interventions produce treatment effects, or about whether such programs can be successfully implemented at scale. This paper addresses these two issues. To do so, we study China REACH, a replication of the original Jamaica Reach Up and Learn program that was launched in 2015 and brought to scale in China. The program we analyze closely resembles the original Jamaica program and was designed by its creators. Like the parent Jamaica program, China REACH seeks to improve the health, cognition, and engagement of children, caregivers, and associated communities. As with the original Jamaica program, it is evaluated by a randomized controlled trial. Unlike the Jamaica program, China REACH does not focus exclusively on stunted children.

Children in the China REACH treatment group have higher language and cognitive skills than controls at both midline and endline (Table 1). At midline (about 9 months after the intervention began), the language and cognitive skills of children in the treatment group are about 0.6 to 0.7 SDs higher than those of controls. At the end of the intervention, the treatment effects on language and cognitive skills range from about 0.9 to 1.1 SDs, depending on the specification. Using comparable tests, the treatment effect size is comparable to that of the source Jamaica Reach Up and Learn intervention (ie, about 0.75 SDs). The intervention significantly improves treated children’s language and cognitive skills. Treatment effects are larger for children in the treatment group who have longer exposure to the program (Table 1, columns 3 and 5).

TABLE 1

Treatment Effects on Standardized Scores for China REACH

| Denver Tasks | All | All | Children Aged ≤2 Years at Enrollment | All | Children Aged ≤2 Years at Enrollment |
|---|---|---|---|---|---|
| Midline | | | | | |
| Language and Cognitive | 0.589*** | 0.631*** | 0.674*** | 0.714*** | 0.741*** |
| | (0.234 to 0.965) | (0.237 to 1.036) | (0.279 to 1.067) | (0.319 to 1.093) | (0.350 to 1.144) |
| Fine Motor | 0.334 | 0.559 | 0.629* | 0.633* | 0.703* |
| | (−0.140 to 0.787) | (−0.032 to 1.174) | (0.023 to 1.324) | (0.003 to 1.313) | (0.057 to 1.375) |
| Socioemotional | 0.690** | 0.865*** | 0.624*** | 0.879*** | 0.620*** |
| | (0.260 to 1.117) | (0.421 to 1.312) | (0.129 to 1.118) | (0.467 to 1.289) | (0.204 to 1.067) |
| Gross Motor | −0.051 | −0.004 | 0.054 | −0.015 | 0.010 |
| | (−0.598 to 0.478) | (−0.564 to 0.577) | (−0.514 to 0.640) | (−0.567 to 0.554) | (−0.559 to 0.584) |
| Endline | | | | | |
| Language and Cognitive | 0.979*** | 0.914*** | 1.016*** | 1.036*** | 1.113*** |
| | (0.585 to 1.402) | (0.495 to 1.347) | (0.637 to 1.408) | (0.644 to 1.458) | (0.723 to 1.510) |
| Fine Motor | 0.585** | 0.574** | 0.561** | 0.676*** | 0.645** |
| | (0.006 to 0.956) | (0.067 to 1.091) | (0.030 to 1.095) | (0.180 to 1.170) | (0.139 to 1.158) |
| Socioemotional | −0.201 | −0.276 | −0.167 | −0.222 | −0.115 |
| | (−0.596 to 0.202) | (−0.688 to 0.123) | (−0.553 to 0.215) | (−0.636 to 0.194) | (−0.491 to 0.275) |
| Gross Motor | 0.067 | 0.125 | 0.155 | 0.173 | 0.219 |
| | (−0.479 to 0.632) | (−0.392 to 0.645) | (−0.406 to 0.732) | (−0.322 to 0.668) | (−0.294 to 0.775) |
| Pretreatment Covariates | No | No | No | Yes | Yes |
| IPW | No | Yes | Yes | Yes | Yes |

The 95% confidence intervals in parentheses are constructed by the wild bootstrap clustered at the village level. The standardized score is estimated from the pooled control group children of the Denver test.

IPW, inverse probability weight.

* P < .05, ** P < .01, *** P < .001.

Zhou et al6 developed and estimated a nonlinear factor model to assess program treatment effects. Although not yet used in Pediatrics, it is a valuable tool for exploring program impacts on skills and their development. Their method isolates the impact of the intervention on skills and identifies individual-level latent skills for each participant. It accounts for the progression of item difficulty in the program. Table 2 presents the treatment effects for the four skill factors they identify. Except for gross motor skills, all latent skill factors, including socioemotional skills, are significantly higher in the treatment group than in the control group. Figure 1A shows that the distribution of language and cognitive skills in the treatment group is shifted to the right and has a fatter upper tail than that of the control group. Figure 1B shows that the treated group has higher values of language and cognitive skills.

FIGURE 1

Language and cognitive skill distributions (endline) and dominance curves reported in Zhou et al (2022).

TABLE 2

China REACH: Treatment Effects on Latent Skill Factors

| | Socioemotional | Fine Motor | Language and Cognitive | Gross Motor |
|---|---|---|---|---|
| Treatment | 0.495*** | 0.726*** | 0.753*** | −0.095 |
| | (0.208 to 0.583) | (0.551 to 0.899) | (0.459 to 1.051) | (−0.280 to 0.089) |

Source: Zhou et al.6 

The 95% confidence intervals in parentheses are constructed by wild bootstrap clustered at the village level.

* P < .05, ** P < .01, *** P < .001.

In this section, we compare treatment effects and skill growth curves for the China REACH and Jamaica Reach Up and Learn programs. Table 3 shows the treatment effects for multiple skills. We test the equality of treatment effects across the 2 programs using the data available for each. We cannot reject the null hypothesis that the treatment effects are equal across the two programs.

TABLE 3

Treatment Effects on China REACH and Jamaica Reach Up and Learn

| | Socioemotional or Performance | Fine Motor | Language and Cognition or Hearing and Speech | Gross Motor |
|---|---|---|---|---|
| China REACH latent skill factors after 21 mo of intervention | | | | |
| Treatment | 0.40*** (0.21 to 0.58) | 0.73*** (0.55 to 0.90) | 0.75*** (0.46 to 1.05) | −0.10 (−0.28 to 0.09) |
| Jamaica Griffiths test after 24 mo of intervention | | | | |
| Treatment | 0.63*** (0.30 to 0.95) | 0.67*** (0.34 to 1.00) | 0.50*** (0.15 to 0.84) | 0.34*** (0.01 to 0.67) |
| P | .35 | .78 | .39 | .15 |

Source: Zhou et al.6 

For the China REACH program, the 95% confidence intervals in parentheses are constructed by wild bootstrap clustered at the village level. For the Jamaica Reach Up and Learn program, the 95% confidence intervals are also presented in parentheses. The P values in the last row correspond to the null hypothesis of equality of treatment effects across the programs.

* P < .05, ** P < .01, *** P < .001.

However, the two interventions use different tools for measuring skill development: children in China REACH were evaluated with the Denver II test, whereas children in the Jamaican program were evaluated with the Griffiths test. Luiz et al7 compared the Denver and Griffiths tests and found that “there was a significant relationship between the overall performance of the Denver II and the Griffiths Scales. However, the Personal-Social Scale of the Denver II appeared to have items that were culturally biased. Further, the Denver II further identified a higher percentage of the sample to have abnormal or questionable protocols than the Griffiths Scales did.” Elliman et al8 compared both tests for premature children. Rubio-Codina et al9 compared the Bayley test with Denver II and other tests (the Ages and Stages Questionnaire-3, the Battelle Developmental Inventory, the MacArthur-Bates short forms I and II, and the World Health Organization motor development milestones) and concluded that Denver II was the most feasible and valid multidimensional test. We build on this work to develop a more reliable method for making valid comparisons of the latent skills of the children in these two programs.

To conduct more reliable comparisons, we list the items in the Denver II and Griffiths tests that have the exact same content and examination criteria (Table 4). Because these items have the same content, we use them to link the two programs.

TABLE 4

List of Items With Same Content in Denver and Griffiths Tests

| Skill | Items |
|---|---|
| Language | Combine words, say two opposites |
| Fine Motor | Copy circle, copy cross |
| Gross Motor | Walk alone well, walk backward, jump off a step, go downstairs alone, throw ball |

To estimate the underlying unobserved skills across programs, we address the challenge that different programs use different assessment tools. We use a modified version of the Rasch model to separately estimate individual unobserved skill factors and item difficulty levels for each program.10  To convert the assessment outcomes from different instruments and link the different programs, we choose the items with the same content and examination criteria as anchors.

There are two types of measures in the Denver II and Griffiths tests: ordered measures $m \in M^{o}$ and unordered measures $m \in M^{no}$. The ordered test items are designed to reflect the fact that if children cannot perform a task with a lower requirement, they cannot achieve a harder task. For example, in the Denver II test, the items “speak one word,” “speak two words,” and “speak three words” are clearly stated in order.

Denoting unobserved skill by the scalar $\theta_{i,t}$ for each type of measured skill, for unordered measures $m \in M^{no}$ and individual $i$, the latent skill $\theta_{i,t}$ is assumed to generate a latent index $y^{m*}_{i,t}$ as follows:
(1)
If the latent index $y^{m*}_{i,t}$ is larger than 0, we observe that the child passes the corresponding unordered task measure $m$. Otherwise, we observe that the child fails the task. For ordered measures $m_j \in M^{o}_{g}$, $g \in \{1, \dots, G\}$, $j \in \{1, \dots, J\}$, and $M^{o} = \{M^{o}_{1}, \dots, M^{o}_{G}\}$,
(2)
where $j \in \{1, \dots, J\}$ and $\eta_{g,1} < \eta_{g,2} < \dots < \eta_{g,J-1} < \eta_{g,J}$.

For ordered task measures, different cutoff values correspond to the minimum requirement for passing each task in order. For example, for 3 ordered tasks (eg, “speak one word,” “speak two words,” “speak three words”), there are three cutoff values $\eta_{g,1} < \eta_{g,2} < \eta_{g,3}$. If the latent index satisfies $y^{g*}_{m_j,i,t} < \eta_{g,1}$, the child fails all three tasks; if $\eta_{g,1} < y^{g*}_{m_j,i,t} < \eta_{g,2}$, the child can speak one word but cannot speak two words. Similarly, if $\eta_{g,3} < y^{g*}_{m_j,i,t}$, the child can speak at least three words.
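The cutoff logic for ordered tasks can be sketched in a few lines of Python. This is only an illustration: the function name `tasks_passed` and the cutoff values are hypothetical and are not taken from the estimates reported in the paper.

```python
from bisect import bisect_left


def tasks_passed(latent_index, cutoffs):
    """Return how many ordered tasks are passed for a given latent index.

    A task is passed when the latent index is strictly above its cutoff,
    so the result is the number of cutoffs lying strictly below the index.
    """
    if any(a >= b for a, b in zip(cutoffs, cutoffs[1:])):
        raise ValueError("cutoffs must be strictly increasing")
    return bisect_left(cutoffs, latent_index)


# Hypothetical cutoffs for "speak one word", "speak two words", "speak three words".
example_cutoffs = [-1.0, 0.5, 2.0]
for index in (-2.0, 0.0, 1.0, 3.0):
    print(index, tasks_passed(index, example_cutoffs))
```

Because the cutoffs are increasing, a child can never be recorded as passing a harder task in the group while failing an easier one, which is the property the ordered specification is meant to enforce.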

We distinguish between ordered and unordered items because the Rasch model assumes that error terms are independent across items, which implies that a child who fails an easier task still has a positive probability of passing a harder task. This assumption does not hold for ordered items, so we model them with an ordered probit model. We use probit and ordered probit models to link all the items in both the Griffiths and Denver tests. In principle, we could control for family background in analyzing the China REACH data, but Zhou et al6 showed that baseline family background did not significantly improve treatment effects on skills, and home environment measures are not available for the Jamaican intervention.
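A minimal sketch of why the two item types are treated differently follows. It rests on assumptions that are not confirmed by the text: the unordered index is taken to be beta + alpha * theta plus a standard normal error, and the ordered index alpha * theta plus a standard normal error compared against increasing cutoffs. All function names and numeric values are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def probit_pass_prob(theta, beta, alpha):
    """P(pass) for an unordered item, assuming y* = beta + alpha*theta + eps."""
    return normal_cdf(beta + alpha * theta)


def ordered_count_probs(theta, alpha, cutoffs):
    """P(exactly k of the J ordered tasks are passed), for k = 0..J.

    Assumes a single latent index y* = alpha*theta + eps per ordered group
    and strictly increasing cutoffs (task j is passed when y* > cutoffs[j-1]).
    """
    index = alpha * theta
    exceed = [1.0 - normal_cdf(eta - index) for eta in cutoffs]  # P(y* > eta_j)
    bounds = [1.0] + exceed + [0.0]
    return [bounds[k] - bounds[k + 1] for k in range(len(cutoffs) + 1)]


def independent_fail_easy_pass_hard(theta, easy, hard):
    """Under independent probits, P(fail the easier item and pass the harder one).

    `easy` and `hard` are (beta, alpha) pairs.
    """
    p_easy = probit_pass_prob(theta, *easy)
    p_hard = probit_pass_prob(theta, *hard)
    return (1.0 - p_easy) * p_hard


# Hypothetical parameters: independent probits leave positive mass on the
# "fail easy, pass hard" pattern, while the ordered model only assigns
# probability to 0, 1, 2, or 3 tasks passed in sequence.
print(independent_fail_easy_pass_hard(0.0, easy=(1.0, 1.0), hard=(-1.0, 1.0)))
print(ordered_count_probs(0.0, alpha=1.0, cutoffs=[-1.0, 0.5, 2.0]))
```

The ordered probabilities sum to one over the admissible patterns only, so the inconsistent pattern has probability zero by construction.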

If item $m_k$ in the Denver II test and item $m_h$ in the Griffiths test examine the same content under the same examination criteria, we define those items as anchor items. For the anchor items, we require that the difficulty parameters be the same (ie, $\beta_{m_k} = \beta_{m_h}$) and that the factor loadings be the same across the two interventions (ie, $\alpha_{m_k} = \alpha_{m_h}$). Here, the factor loadings measure how effectively children use their existing latent skills to achieve the goal of each task.
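One way to impose the anchor restrictions in an estimation routine is to let anchor items from the two tests point to the same slot in the parameter vector. The sketch below illustrates that bookkeeping only. The item identifiers, the `ANCHORS` mapping, and both helper functions are hypothetical, and the pairing of language items follows Table 4.

```python
# Hypothetical identifiers: each anchor pair maps to one shared parameter key.
ANCHORS = {
    ("denver", "combine_words"): "anchor:combine_words",
    ("griffiths", "uses_word_combinations"): "anchor:combine_words",
    ("denver", "opposites_2"): "anchor:opposites_2",
    ("griffiths", "opposites_2"): "anchor:opposites_2",
}


def parameter_key(test, item):
    """Key under which the (beta, alpha) pair of an item is stored."""
    return ANCHORS.get((test, item), f"{test}:{item}")


def build_parameter_index(items):
    """Assign each distinct parameter key a position in the parameter vector."""
    index = {}
    for test, item in items:
        index.setdefault(parameter_key(test, item), len(index))
    return index


items = [
    ("denver", "combine_words"),
    ("griffiths", "uses_word_combinations"),
    ("denver", "name_1_color"),
    ("griffiths", "knows_own_name"),
    ("denver", "opposites_2"),
    ("griffiths", "opposites_2"),
]
# Six items, but only four distinct (beta, alpha) pairs are estimated.
print(build_parameter_index(items))
```

Sharing a slot is what places the non-anchor items of both tests on a common scale: their parameters are estimated relative to the same anchored difficulties and loadings.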

We estimate Equations (1) and (2) jointly, pooling the Jamaica Reach Up and Learn and China REACH data. For each item in both the Denver and Griffiths tests, we obtain estimates of the difficulty parameter $\beta_m$ and the latent factor loading $\alpha_m$. In forming our estimates, we assume that the latent factor distribution is normal and estimate its mean ($\mu_\theta$) and variance ($\sigma_\theta$) in Equations (1) and (2). We report our model estimates in Tables 5–8.

TABLE 5

Denver Language Skill Items

| Item | $\beta_m$ or Cut ($\beta_m^g$) | SE | $\alpha$ | SE |
|---|---|---|---|---|
| Items based on probit model | | | | |
| Combine words | 5.374 | 0.494 | 1.000 | — |
| Dada/mama specific | 8.730 | 1.698 | 0.872 | 0.196 |
| Body parts 6 | 3.795 | 0.271 | 0.661 | 0.074 |
| Name 1 color | −1.158 | 0.081 | 0.399 | 0.042 |
| Count 1 block | −2.186 | 0.139 | 0.502 | 0.055 |
| Understand 4 prepositions | −4.053 | 0.321 | 0.439 | 0.058 |
| Opposites 2 | −4.040 | 0.336 | 0.291 | 0.044 |
| Items based on ordered probit model | | | | |
| 3 words | −8.292 | 0.749 | 1.084 | 0.133 |
| 6 words | −7.233 | 0.671 | 1.084 | 0.133 |
| Name 1 picture | −2.934 | 0.156 | 0.640 | 0.065 |
| Name 4 pictures | 0.203 | 0.097 | 0.640 | 0.065 |
| Speech half understandable | −4.194 | 0.244 | 0.802 | 0.084 |
| Speech all understandable | 1.428 | 0.141 | 0.802 | 0.084 |
| Use 2 objects | 2.925 | 0.283 | 1.085 | 0.131 |
| Use 3 objects | 4.199 | 0.348 | 1.085 | 0.131 |
| Point 2 pictures | −4.395 | 0.245 | 0.733 | 0.077 |
| Point 4 pictures | −1.947 | 0.149 | 0.733 | 0.077 |
| Know 2 adjectives | 2.809 | 0.195 | 0.647 | 0.073 |
| Know 3 adjectives | 5.275 | 0.306 | 0.647 | 0.073 |

TABLE 6

Griffiths Language Skill Items

| Item | $\beta_m$ or Cut ($\beta_m^g$) | SE | $\alpha$ | SE |
|---|---|---|---|---|
| Items based on probit model | | | | |
| Uses word combinations | 5.374 | 0.494 | 1.000 | — |
| Shakes head for no | 3.089 | 0.453 | 0.217 | 0.053 |
| Short, babbled sentences of 6+ syllables | 5.496 | 1.451 | 0.383 | 0.134 |
| Looks at pictures for a few seconds | 3.358 | 0.543 | 0.241 | 0.061 |
| Tries definitely to sing | 2.799 | 0.368 | 0.201 | 0.045 |
| Knows own name | 5.092 | 1.055 | 0.439 | 0.112 |
| Likes rhymes and jingles | 2.505 | 0.288 | 0.154 | 0.037 |
| Picture vocabulary (12) | −1.395 | 0.185 | 0.320 | 0.046 |
| Talks well in sentences of 6+ syllables (record) | −0.827 | 0.228 | 0.546 | 0.088 |
| Names 6 or more objects in large picture | −1.119 | 0.237 | 0.504 | 0.080 |
| Opposites 2 | −4.040 | 0.336 | 0.291 | 0.044 |
| Names 12 objects in large picture | −3.579 | 0.535 | 0.439 | 0.085 |
| Items based on ordered probit model | | | | |
| One object in box identified | −6.862 | 0.444 | 0.733 | 0.077 |
| Two objects in box identified | −6.221 | 0.423 | 0.733 | 0.077 |
| Four objects in box identified | −5.188 | 0.390 | 0.733 | 0.077 |
| Eight objects in box identified | −3.755 | 0.344 | 0.733 | 0.077 |
| Says three clear words | −11.725 | 1.013 | 1.084 | 0.133 |
| Uses 4 clear words | −10.924 | 0.966 | 1.084 | 0.133 |
| Uses 5 clear words | −9.970 | 0.920 | 1.084 | 0.133 |
| Uses 6 or 7 clear words | −9.516 | 0.896 | 1.084 | 0.133 |
| Uses 9+ clear words | −8.513 | 0.836 | 1.084 | 0.133 |
| Uses 12+ clear words | −7.609 | 0.778 | 1.084 | 0.133 |
| Uses 20+ clear words | −6.351 | 0.691 | 1.084 | 0.133 |

TABLE 7

Griffiths Language Skill Items: Items Based on Ordered Probit Model

| Item | Cut ($\beta_m^g$) | SE | $\alpha$ | SE |
|---|---|---|---|---|
| Names 4 objects in box | −1.490 | 0.197 | 0.454 | 0.056 |
| Names 12 of 18 objects in box | −0.044 | 0.170 | 0.454 | 0.056 |
| Names 17–18 objects in box | 3.758 | 0.285 | 0.454 | 0.056 |
| Repeats one 6-syllable sentence | 1.449 | 0.183 | 0.330 | 0.045 |
| Repeats sentences of 10+ syllables | 2.930 | 0.253 | 0.330 | 0.045 |
| Comprehends 2+ items | 2.844 | 0.358 | 0.306 | 0.057 |
| Comprehends 4+ items | 4.516 | 0.513 | 0.306 | 0.057 |
| Picture vocabulary (1) | −2.587 | 0.407 | 0.794 | 0.124 |
| Picture vocabulary (2) | −1.952 | 0.364 | 0.794 | 0.124 |
| Picture vocabulary (4) | −0.880 | 0.302 | 0.794 | 0.124 |
| Picture vocabulary (18+) | 9.108 | 1.012 | 0.794 | 0.124 |
| Uses sentences of 4+ syllables, clear speech | −1.999 | 0.271 | 0.573 | 0.078 |
| Defines by use (2+) | 1.103 | 0.228 | 0.573 | 0.078 |
| Babbled monologue when alone | −6.829 | 0.811 | 0.596 | 0.093 |
| Long, babbled sentences, some words clear | −4.503 | 0.609 | 0.596 | 0.093 |
| Picture description (1+ sentences) | 3.187 | 0.444 | 0.464 | 0.080 |
| Picture description (3+ sentences) | 5.160 | 0.610 | 0.464 | 0.080 |
| Uses 2 descriptive words | 1.075 | 0.184 | 0.398 | 0.054 |
| Uses 6+ descriptive words | 3.416 | 0.300 | 0.398 | 0.054 |
| Looks at pictures with interest | −3.578 | 0.351 | 0.385 | 0.052 |
| Enjoys picture book | −2.632 | 0.293 | 0.385 | 0.052 |
| Uses 2+ personal pronouns | 0.382 | 0.195 | 0.510 | 0.072 |
| Uses 6+ personal pronouns | 3.891 | 0.388 | 0.510 | 0.072 |

TABLE 8

Variance of Latent Language Skill

| | Variance | SE |
|---|---|---|
| $\theta$ | 39.423 | 7.317 |

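We then use an empirical Bayes procedure (eg, Efron11) to form the empirical conditional posterior distribution of the latent factor, $g(\theta \mid Y, X; \beta, \alpha)$, as follows:
$$g(\theta \mid Y, X; \beta, \alpha) = \frac{\mu(Y \mid X, \theta; \beta, \alpha, \phi(\theta))\,\phi(\theta)}{\int \mu(Y \mid X, \theta; \beta, \alpha, \phi(\theta))\,\phi(\theta)\,d\theta} \qquad (3)$$
where $\phi(\cdot)$ is the normal density of the latent factor formed from the estimates of its mean ($\mu_\theta$) and variance ($\sigma_\theta$), $\alpha$ denotes the factor loadings, and $\beta$ denotes the difficulty parameters in Equations (1) and (2). $\mu(\cdot)$ is the empirical density of the task outcomes given the estimates of the factor model ($\beta$, $\alpha$, and $\phi(\cdot)$), and $\int \mu(Y \mid X, \theta; \beta, \alpha, \phi(\theta))\,\phi(\theta)\,d\theta$ is the likelihood of the task outcomes. We then calculate the empirical posterior density $g(\cdot)$ by Equation (3). The predicted individual latent factors are calculated as $\hat{\theta} = \int \theta\, g(\theta \mid Y, X; \beta, \alpha)\,d\theta$.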
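The posterior-mean calculation can be sketched numerically as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses only unordered (probit) items with an assumed index of beta + alpha * theta, ignores covariates, replaces the integrals with sums on an evenly spaced grid, and takes hypothetical parameter values. All function names are introduced here for illustration.

```python
from math import erf, exp, pi, sqrt


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def normal_pdf(x, mean, sd):
    """Normal density with the given mean and standard deviation."""
    z = (x - mean) / sd
    return exp(-0.5 * z * z) / (sd * sqrt(2.0 * pi))


def likelihood(theta, responses, items):
    """Probability of the observed pass/fail pattern given theta.

    `items` holds (beta, alpha) pairs; the probit index beta + alpha*theta
    is an assumed form.
    """
    value = 1.0
    for passed, (beta, alpha) in zip(responses, items):
        p = normal_cdf(beta + alpha * theta)
        value *= p if passed else 1.0 - p
    return value


def posterior_mean(responses, items, prior_mean, prior_sd, n_grid=2001, width=8.0):
    """Grid approximation of the posterior mean of theta (empirical Bayes)."""
    low = prior_mean - width * prior_sd
    step = 2.0 * width * prior_sd / (n_grid - 1)
    grid = [low + k * step for k in range(n_grid)]
    # Unnormalized posterior: likelihood times prior density at each grid point.
    weights = [
        likelihood(t, responses, items) * normal_pdf(t, prior_mean, prior_sd)
        for t in grid
    ]
    total = sum(weights)  # the grid step cancels in the ratio below
    return sum(t * w for t, w in zip(grid, weights)) / total


# Hypothetical items and responses, for illustration only.
items = [(1.0, 0.8), (0.0, 1.0), (-1.5, 0.6)]
print(posterior_mean([True, True, False], items, prior_mean=0.0, prior_sd=1.0))
```

The prior here plays the role of the estimated normal density of the latent factor, so a child with few observed items is pulled toward the population mean, while more items let the responses dominate.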

Figure 2 plots the scatter of $\hat{\theta}_i$ for a model that pools language and cognitive skills for both the Jamaica Reach Up and Learn and China REACH interventions. Figure 3 plots a fitted curve based on polynomial terms in monthly age, estimated from $\hat{\theta}_i$.

FIGURE 2

Language skill growth curve comparison.

FIGURE 3

Language skill growth curve comparison by treatment status.

We estimate the growth process for each program using $\hat{\theta}_i$, running two separate regressions, one for the treatment group and one for the control group:
(4)
where $d$ indicates the treatment status and $\mathbf{1}_{\text{China}}$ is an indicator of whether an observation comes from the China REACH sample.

In Table 9, we provide estimates of the language skill growth curves by treatment status based on Equation (4). We cannot reject the null hypothesis that the growth curves are the same for the China REACH and Jamaican interventions: none of the coefficients on the China REACH interaction terms is statistically significant. This pattern holds for both the treatment group and the control group.

TABLE 9

Estimates of Language Growth Curves by Treatment Status

| | Treatment | Control |
|---|---|---|
| Age | 0.978 | 1.085 |
| | (0.394 to 1.563) | (0.406 to 1.763) |
| Age × $\mathbf{1}_{\text{China}}$ | −0.364 | −0.545 |
| | (−0.972 to 0.243) | (−1.214 to 0.125) |
| Age² | −0.008 | −0.009 |
| | (−0.016 to 0.002) | (−0.018 to 0.001) |
| Age² × $\mathbf{1}_{\text{China}}$ | 0.007 | 0.009 |
| | (−0.002 to 0.015) | (−0.001 to 0.018) |
| Constant | −21.123 | −24.703 |
| | (−31.573 to −10.672) | (−36.537 to −12.869) |
| Constant × $\mathbf{1}_{\text{China}}$ | 3.264 | 7.410 |
| | (−7.353 to 13.883) | (−4.305 to 19.125) |

Figure 3 compares the language skill growth curves for China REACH and Jamaica Reach Up and Learn based on the estimates in Table 9. There is close agreement between the language skill development processes of the two programs. If children in the China REACH program continue on course, China REACH will reproduce the effects of the successful Jamaica program documented in Gertler et al.3,4
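To make the comparison concrete, the sketch below evaluates the fitted curves implied by the point estimates in Table 9. It assumes that Equation (4) is a quadratic in monthly age with each term interacted with the China REACH indicator, as the row labels of Table 9 suggest. The dictionary layout and the function `predicted_skill` are hypothetical. Because the Age² coefficients are reported to only three decimals, the output is sensitive to rounding and should not be read as a reproduction of Figure 3.

```python
# Point estimates copied from Table 9 (rounded as reported there).
COEF = {
    "treatment": {
        "const": -21.123, "age": 0.978, "age2": -0.008,
        "const_china": 3.264, "age_china": -0.364, "age2_china": 0.007,
    },
    "control": {
        "const": -24.703, "age": 1.085, "age2": -0.009,
        "const_china": 7.410, "age_china": -0.545, "age2_china": 0.009,
    },
}


def predicted_skill(age_months, group, china):
    """Fitted latent language skill at a given monthly age.

    Assumed functional form: a quadratic in age whose intercept and slopes
    shift by the China interaction coefficients when `china` is True.
    """
    c = COEF[group]
    d = 1.0 if china else 0.0
    return (
        (c["const"] + d * c["const_china"])
        + (c["age"] + d * c["age_china"]) * age_months
        + (c["age2"] + d * c["age2_china"]) * age_months ** 2
    )


for age in (18, 24, 30, 36):
    jamaica = predicted_skill(age, "treatment", china=False)
    china = predicted_skill(age, "treatment", china=True)
    print(age, round(jamaica, 3), round(china, 3))
```

The interaction coefficients are individually insignificant, so differences between the two printed curves fall within the sampling uncertainty reported in Table 9.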

An important question is whether investment at later ages can substitute for early childhood investment. In the China REACH program, the uniqueness of the implementation strategy makes it possible for us to examine this question. Between the ages of 10 and 24 months, children enter the program more or less randomly with respect to age because of administrative constraints (Fig 4). Because the intervention curriculum is designed based on children’s weekly ages, children have the same intervention at the same weekly ages. This means that if the child is enrolled at age 20 months, he or she starts the intervention with the content for 20-month-old children without exposure to previous trainings designed for those younger than 20 months in the curriculum. Similarly, if the child is enrolled at age 10 months, he or she starts with the tasks for 10-month-old children. Children who enroll at earlier ages get more investment than those who enroll at later ages.

FIGURE 4

The distribution of monthly age when enrolled into the program.

Heckman and Zhou (dynamic complementarity; J.J.H., J.Z., unpublished data) examine this question using China REACH data. Table 10 compares language passing rates at different difficulty levels for children of different ability levels who enroll early in the program with those who enroll late. The P-value rows report, for each difficulty level, the result of an individual test of the null hypothesis that passing rates are equal between the earlier-enrolled and later-enrolled groups.

TABLE 10

Language Passing Rate by Enrollment Age and Ability

| | High ability | | | | | Medium ability | | | | | Low ability | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Language difficulty level | 7 | 8 | 9 | 10 | 11 | 7 | 8 | 9 | 10 | 11 | 7 | 8 | 9 | 10 | 11 |
| Enroll (10–15) vs (16–20) | | | | | | | | | | | | | | | |
| Mean passing rate (age 10–15 mo) | 0.937 | 0.903 | 0.955 | 0.920 | 0.956 | 0.722 | 0.741 | 0.767 | 0.766 | 0.762 | 0.344 | 0.517 | 0.499 | 0.566 | 0.445 |
| Mean passing rate (age 16–20 mo) | 0.892 | 0.919 | 0.897 | 0.911 | 0.979 | 0.629 | 0.673 | 0.748 | 0.802 | 0.784 | 0.232 | 0.402 | 0.323 | 0.399 | 0.369 |
| P | 0.080* | 0.684 | 0.148 | 0.901 | 0.369 | 0.000* | 0.005* | 0.651 | 0.463 | 0.535 | 0.008* | 0.021* | 0.031* | 0.084* | 0.250 |
| N | 74 | 73 | 62 | 42 | 69 | 247 | 245 | 217 | 175 | 232 | 98 | 95 | 87 | 63 | 89 |
| Enroll (10–15) vs (21–25) | | | | | | | | | | | | | | | |
| Mean passing rate (age 10–15 mo) | 0.937 | 0.903 | 0.955 | 0.920 | 0.956 | 0.722 | 0.741 | 0.767 | 0.766 | 0.762 | 0.344 | 0.517 | 0.499 | 0.566 | 0.445 |
| Mean passing rate (age 21–25 mo) | 0.938 | 0.935 | 0.949 | 0.938 | 0.922 | 0.656 | 0.726 | 0.628 | 0.856 | 0.695 | 0.290 | 0.376 | 0.320 | 0.556 | 0.253 |
| P | 0.896 | 0.447 | 0.876 | 0.697 | 0.344 | 0.006* | 0.524 | 0.004* | 0.041* | 0.065* | 0.217 | 0.005* | 0.030* | 0.907 | 0.002* |
| N | 61 | 62 | 54 | 42 | 58 | 222 | 221 | 197 | 169 | 210 | 98 | 95 | 86 | 70 | 88 |
| Enroll (16–20) vs (21–25) | | | | | | | | | | | | | | | |
| Mean passing rate (age 16–20 mo) | 0.892 | 0.919 | 0.897 | 0.911 | 0.979 | 0.629 | 0.673 | 0.748 | 0.802 | 0.784 | 0.232 | 0.402 | 0.323 | 0.399 | 0.369 |
| Mean passing rate (age 21–25 mo) | 0.938 | 0.935 | 0.949 | 0.938 | 0.922 | 0.656 | 0.726 | 0.628 | 0.856 | 0.695 | 0.290 | 0.376 | 0.320 | 0.556 | 0.253 |
| P | 0.151 | 0.587 | 0.190 | 0.596 | 0.028* | 0.232 | 0.032* | 0.010* | 0.144 | 0.010* | 0.128 | 0.619 | 0.959 | 0.061* | 0.065* |
| N | 69 | 71 | 64 | 54 | 67 | 211 | 210 | 198 | 180 | 206 | 84 | 84 | 77 | 63 | 79 |

Source: J.J.H., J.Z. (unpublished data, 2022).

Group (10–15) represents children aged 10 to 15 months at enrollment. Group (16–20) represents children aged 16 to 20 months at enrollment. Group (21–25) represents children aged 21 to 25 months at enrollment. High ability: the child passes the first task at more than 80% of the difficulty levels, and the average passing rate at that level is greater than 80%. Medium ability: the child does not pass the first task, and the passing rate is greater than 50%; or the child passes the first task, and the passing rate is between 50% and 80%. Low ability: the average passing rate is less than 50%. The columns report the average passing rate at difficulty levels 7 to 11, the levels at which all 3 enrollment-age groups are trained during the intervention.

*P < 0.1.
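The ability definitions in the table note can be made concrete with a short Python sketch. This is one possible reading of the note, not the authors' implementation: the function name, the two summary inputs, and the handling of boundary values are all assumptions. In particular, "passes the first task" is taken to mean that the child passes the first task at more than 80% of the difficulty levels, and medium ability is treated as the residual category.

```python
def classify_ability(first_task_pass_share: float, avg_passing_rate: float) -> str:
    """Assign an ability group from two summary measures (both in [0, 1]).

    first_task_pass_share: share of difficulty levels at which the child
        passed the first task (hypothetical input name).
    avg_passing_rate: the child's average passing rate (hypothetical input name).
    """
    # Assumption: "passes the first task" means passing it at more than
    # 80% of the difficulty levels.
    passes_first_task = first_task_pass_share > 0.8

    if avg_passing_rate < 0.5:
        return "low"      # average passing rate below 50%
    if passes_first_task and avg_passing_rate > 0.8:
        return "high"     # first task passed and passing rate above 80%
    # Assumption: everything else (including exact boundary values) is medium.
    return "medium"


if __name__ == "__main__":
    for share, rate in [(0.9, 0.85), (0.5, 0.85), (0.9, 0.6), (0.9, 0.4)]:
        print(share, rate, classify_ability(share, rate))
```

Under this reading, a child with a high passing rate who often fails the first task is classified as medium rather than high, which matches the first clause of the medium-ability definition.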

We find a general pattern that early starters do better at the same task difficulty and child ability levels. Those who start learning earlier have persistent advantages in later life learning. This effect does not operate uniformly across ability groups. Medium- and low-ability children display strong effects of early initial training, but high-ability children do not. We measure ability using the speed of mastery of well-defined tasks.12  Early investment improves skills at later ages, especially for medium- and low-ability children. High-ability children without early investment catch up quickly.
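The pattern described here can be checked descriptively against the table of enrollment-age comparisons. The Python sketch below is an illustration only, not the authors' analysis: it transcribes the reported means and P values, assumes that the 5 columns within each ability group correspond to difficulty levels 7 to 11 (as the table note indicates), and counts the comparisons with P < 0.1, split by which enrollment group has the higher mean passing rate. All variable and function names are hypothetical.

```python
# Columns 0-4: high ability, 5-9: medium ability, 10-14: low ability.
ABILITY = ("high", "medium", "low")

# Mean passing rates by enrollment-age group (months), transcribed from the table.
MEANS = {
    "10-15": [0.937, 0.903, 0.955, 0.920, 0.956, 0.722, 0.741, 0.767, 0.766, 0.762,
              0.344, 0.517, 0.499, 0.566, 0.445],
    "16-20": [0.892, 0.919, 0.897, 0.911, 0.979, 0.629, 0.673, 0.748, 0.802, 0.784,
              0.232, 0.402, 0.323, 0.399, 0.369],
    "21-25": [0.938, 0.935, 0.949, 0.938, 0.922, 0.656, 0.726, 0.628, 0.856, 0.695,
              0.290, 0.376, 0.320, 0.556, 0.253],
}

# Reported P values for each (earlier group, later group) comparison.
P_VALUES = {
    ("10-15", "16-20"): [0.080, 0.684, 0.148, 0.901, 0.369, 0.000, 0.005, 0.651,
                         0.463, 0.535, 0.008, 0.021, 0.031, 0.084, 0.250],
    ("10-15", "21-25"): [0.896, 0.447, 0.876, 0.697, 0.344, 0.006, 0.524, 0.004,
                         0.041, 0.065, 0.217, 0.005, 0.030, 0.907, 0.002],
    ("16-20", "21-25"): [0.151, 0.587, 0.190, 0.596, 0.028, 0.232, 0.032, 0.010,
                         0.144, 0.010, 0.128, 0.619, 0.959, 0.061, 0.065],
}


def tally_significant(alpha: float = 0.1) -> dict:
    """Count comparisons with P < alpha by ability group and direction."""
    counts = {a: {"earlier_higher": 0, "later_higher": 0} for a in ABILITY}
    for (early, late), pvals in P_VALUES.items():
        for col, p in enumerate(pvals):
            if p >= alpha:
                continue  # not flagged with * in the table
            group = ABILITY[col // 5]
            earlier_wins = MEANS[early][col] > MEANS[late][col]
            counts[group]["earlier_higher" if earlier_wins else "later_higher"] += 1
    return counts


TALLY = tally_significant()

if __name__ == "__main__":
    for group in ABILITY:
        print(group, TALLY[group])
```

With the values as transcribed, each ability group has 15 comparisons. The differences that are significant at the 0.1 level and favor the earlier-enrolled group number 2 for high-ability children, 7 for medium-ability children, and 8 for low-ability children; those favoring the later-enrolled group number 0, 2, and 1, respectively. This is a simple count and does not replace the tests reported in the table.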

This section discusses the per-child costs of the China REACH program and compares them with those of the Jamaican program. Table 11 presents the cost comparison between China REACH and Jamaica Reach Up and Learn. Personnel costs are the largest component for both programs. Variable costs, which consist mostly of personnel costs, constitute 83% of the annual per-child cost for China REACH and 67% for the Jamaican program. The annual per-child cost of China REACH is approximately 70% of that of the Jamaican program. China REACH maintains a home visitor–child ratio that is very close to that of the Jamaican program (ie, approximately 1 home visitor per 8 children for China REACH and approximately 1 per 10 for the Jamaican program). This is promising for the scaled program.

China REACH shows that the beneficial impacts of the Jamaican program can be reproduced in a program at scale, at least through the early ages. Skill requirements for being a trained home visitor are low. Visitors are residents of the villages with the same (relatively low) levels of education as the other village residents. There is an ample supply of such women. Initial training took 2 weeks and was conducted by a relatively small number of more highly trained program teachers, who generally have advanced degrees (eg, a master's degree). After training, local supervisors regularly monitor each home visitor in the field. There was at least monthly field supervision of each visitor in the Jamaican intervention. Weekly group meetings and monthly supervisors' observation visits were conducted for the China REACH intervention.

TABLE 11

Comparison of Annual Program Cost per Child Across Interventions (2015 US Dollars)

Category                        China REACH (Huachi)    Jamaica Home Visiting
Annual cost per child           527.69                  751.60
  Fixed cost                    91.08                   251.47
    Expert fee                  37.54                   193.10
    Supplies and facilities     53.54                   58.37
  Variable cost                 436.61                  500.13
    Personnel cost              391.64                  467.26
    Toy-making and relevant     44.97                   32.87
Teacher/child ratio             93/718 ≃ 1/8            6/63 ≃ 1/10

China REACH cost data are collected by the program. The Jamaican program’s costs are based on interviews with the original home-visiting program members and the expenditure statements in historical Ford Foundation grant files. The original files presented expenditures in 1988 Jamaican dollars. For both programs, after adjusting for inflation and exchange rate, we report the costs in 2015 US dollars.
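The cost shares discussed in this section can be recomputed from Table 11. The following is a minimal Python sketch; the dictionary layout and function names are illustrative assumptions, and the figures are the annual per-child costs (2015 US dollars) transcribed from the table.

```python
# Annual cost per child in 2015 US dollars, transcribed from Table 11.
COSTS = {
    "China REACH": {"total": 527.69, "fixed": 91.08, "variable": 436.61,
                    "personnel": 391.64},
    "Jamaica": {"total": 751.60, "fixed": 251.47, "variable": 500.13,
                "personnel": 467.26},
}


def share(program: str, component: str) -> float:
    """Return a cost component as a fraction of the program's total cost."""
    costs = COSTS[program]
    return costs[component] / costs["total"]


def cost_ratio(program_a: str, program_b: str) -> float:
    """Return the total per-child cost of program_a relative to program_b."""
    return COSTS[program_a]["total"] / COSTS[program_b]["total"]


if __name__ == "__main__":
    for name in COSTS:
        print(f"{name}: variable {share(name, 'variable'):.1%}, "
              f"personnel {share(name, 'personnel'):.1%}")
    print(f"China REACH / Jamaica: {cost_ratio('China REACH', 'Jamaica'):.1%}")
```

With the figures as transcribed, the China REACH total is about 70% of the Jamaican total. Variable costs account for about 83% (China REACH) and 67% (Jamaica) of the annual per-child cost, and personnel costs alone for about 74% and 62%.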

Visits are approximately one hour per week. They are adapted to conditions in the village and do not require elaborate infrastructure. The county government and the county-town-village three-tier mother and child health care system support the management of the China REACH program in Huachi.

This paper summarizes findings from China REACH, a replication of the original Jamaica Reach Up and Learn program that was brought to scale in an impoverished region of Western China (more than 1500 participants, compared with roughly 100 participants in the original Jamaica study). We develop and implement a method for comparing diverse test scores. Using this approach, we find that skill growth curves are comparable for the China REACH and Jamaica Reach Up and Learn programs over the early childhood age range. Because of data limitations, our paper provides the comparison for the intervention age range only. Further examination of the comparability of the long-term skill growth profiles across these 2 interventions is warranted.

We compare treatment effects and skill growth curves of the China REACH and Jamaica Reach Up and Learn programs. We find evidence for the importance of early enrollment for final learning for low- and medium-ability groups in the replication program, but not for high-ability students. We investigate the mechanisms behind the original Jamaica program in Heckman and Zhou.12  We quantify the evidence that higher interaction quality between home visitors and caregivers significantly improves treated children’s skill development.

Our method can be used to compare different interventions or the same intervention at different ages. It would be valuable to investigate the common mechanisms that promote child skill development.

Drs Zhou and Heckman conceptualized and designed the study, drafted the initial manuscript, and reviewed and revised the manuscript; Dr Liu and Mr Lu supported the fieldwork, coordinated and supervised data collection, and reviewed the manuscript; Drs Chang and Grantham-McGregor conceptualized and designed the study, supported the fieldwork, and reviewed the manuscript; and all authors approved the final manuscript as submitted and agreed to be accountable for all aspects of the work.

FUNDING: Research reported in this publication was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health under award number R37HD065072, the Institute for New Economic Thinking, and a grant from an anonymous donor. The authors thank our partner China Development Research Foundation for both financial and scholarly support.

CONFLICT OF INTEREST DISCLOSURES: The authors have indicated they have no potential conflicts of interest to disclose.

China REACH: China Rural Education and Child Health

1. Britto PR, Engle PL, Super CM, eds. Handbook of Early Childhood Development Research and Its Impact on Global Policy. Oxford, UK: Oxford University Press; 2013
2. Engle PL, Fernald LCH, Alderman H, et al; Global Child Development Steering Group. Strategies for reducing inequalities and improving developmental outcomes for young children in low-income and middle-income countries. Lancet. 2011;378(9799):1339–1353
3. Gertler P, Heckman JJ, Pinto R, et al. Effect of the Jamaica early childhood stimulation intervention on labor market outcomes at age 31. Available at: https://openknowledge.worldbank.org/handle/10986/36335. Accessed February 27, 2023
4. Gertler P, Heckman J, Pinto R, et al. Labor market returns to an early childhood stimulation intervention in Jamaica. Science. 2014;344(6187):998–1001
5. Grantham-McGregor S, Smith JA. Extending the Jamaican early childhood development intervention. J Appl Res Child. 2016;7(2):article 4
6. Zhou J, Heckman JJ, Liu B, Lu M. The impacts of a prototypical home visiting program on child skills.
7. Luiz DM, Foxcroft CD, Tukulu AN. The Denver II scales and the Griffiths scales of mental development: a correlational study. J Child Adolesc Ment Health. 2004;16(2):77–81
8. Elliman AM, Bryan EM, Elliman AD, Palmer P, Dubowitz L. Denver developmental screening test and preterm infants. Arch Dis Child. 1985;60(1):20–24
9. Rubio-Codina M, Araujo MC, Attanasio O, Muñoz P, Grantham-McGregor S. Concurrent validity and feasibility of short tests currently used to measure early childhood development in large scale studies. PLoS One. 2016;11(8):e0160962
10. van der Linden WJ. Handbook of Item Response Theory: Volume 1: Models. CRC Press; 2016
11. Efron B. Large-scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Vol. 1. Cambridge, MA: Cambridge University Press; 2012
12. Heckman JJ, Zhou J. Measuring knowledge and learning. Available at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4098311. Accessed February 27, 2023