OBJECTIVES

To review and meta-analyze existing evidence regarding the impact of school start times (SSTs) on youth sleep and developmental outcomes considering the moderating effects of youth and school characteristics. Scopus, ScienceDirect, JSTOR, Pubmed, PsychInfo, ERIC, Proquest, EBSCO, and Google Scholar were used through 2019 to select studies measuring (1) school start time and (2) sleep or other developmental outcomes. Data from 28 studies and 1 774 509 participants were extracted and analyzed using random-effects models with robust variance estimation.

RESULTS

Later SSTs were associated with better overall developmental outcomes, longer sleep duration, and less negative mood. Specifically, new SSTs between 8:30 and 8:59 were associated with better outcomes than 8:00 to 8:29 start times. Later SSTs were more strongly associated with lower levels of sleepiness for high school (versus middle school) youth, and youth in private (versus public) schools reported better sleep and later wake times with later SSTs. Although this meta-analysis suggests an overall benefit of later SSTs, there was limited research to test outcomes such as sleep hygiene, naps, and behavioral and physical health outcomes.

CONCLUSIONS

There is converging evidence that later SSTs are associated with better overall developmental outcomes, longer sleep duration, and less negative mood. More research needs to consider student and school characteristics to obtain reliable estimates related to possible differences by sex, race, school size, percent free/reduced lunch, and percent minority.

What’s Known on This Subject:

Sleep is a biological imperative linked to pediatric development, socioemotional functioning, academic achievement, and health. Adolescents undergo a sleep phase delay, which may not correspond with early school start times, resulting in compromised development and outcomes.

What This Study Adds:

While some evidence supports later school start times as beneficial for pediatric outcomes, less research has considered possible differences by individual characteristics such as age, sex, race, and school characteristics such as region, socioeconomic status, and ethnic/racial composition.

Sleep is a biological imperative shaped by physiologic needs and social and environmental factors.1  The importance of sleep for youth development, socioemotional functioning, academic achievement, and health is well documented.2,3  Aside from targeting individual habits and sleep-hygiene behaviors,4  there is a growing interest in understanding and changing structural influences on youth sleep and related outcomes.5  In particular, parents and educators are considering how school-related factors, such as what time schools begin, may be matched or mismatched to the circadian rhythms of young people.6,7  Youth spend most of their waking hours in educational settings.8  They also spend a large portion of their lives from childhood (entering kindergarten around age 5) to young adulthood (graduating high school around age 18) in formal educational spaces. In the intermediary periods, youth encounter developmental milestones, perhaps the most notable of which is puberty. Puberty is associated with changes in sleep and circadian rhythms.9,10  Changes in circadian rhythms are associated with “sleep phase delay”,11  and adolescents tire later at night. However, as youth move from elementary to middle and high school, the start of school is not similarly delayed; in fact, school often starts earlier, effectively moving in the opposite direction of youth’s biological rhythms.12  Although the biological need for sleep may decline in the intervening years between childhood (7–12 hours may be appropriate) and adolescence (7–11 hours may be appropriate),13  the opportunity for sleep is truncated by later bedtimes and earlier or similar school start times at a magnitude much larger than the decline in need for sleep.

As a result, communities and schools are beginning to reconsider school start times (SSTs), with many districts and states (eg, California) enacting legislation to delay the start of school to match the delay in adolescent sleep phase.7,12,14  Although there is a growing evidence base to support these new policies,7,12,15–20  there has been less systematic analysis of the optimal start time for young people depending on individual characteristics, such as age and school level (ie, elementary, middle, and high school), sex, and socioeconomic status, as well as consideration of school characteristics, such as regional differences (ie, urban, suburban, or rural) and private versus public schools.

The current meta-analysis contributes to the science on SST and pediatric development through the investigation of 4 aims. The first aim is to investigate the associations of SST with sleep and developmental outcomes. The subsequent aims consider possible moderators. The second aim investigates whether the associations between SST and sleep and developmental outcomes are moderated by initial SST (ie, which start times would benefit most from a delay?). Third, the analysis investigates whether associations are moderated by new SST (ie, which new start times are associated with optimal youth outcomes?). Finally, the meta-analysis considers whether associations are moderated by youth and school characteristics, such as age, year in school, sex, ethnicity/race, private versus public schools, and urban, rural, or suburban communities.

Literature searches were conducted in Scopus, ScienceDirect, JSTOR, Pubmed, PsychInfo, ERIC, Proquest, EBSCO, and Google Scholar, using a combination of keywords associated with school start time (“school start time” or “school starting times”), sleep (“sleep”, “sleep duration”, or “sleep pattern”), and developmental outcomes (“achievement”, “academic performance”, “mental health”, “well-being”, or “health”) across samples with different ages and stages of development (ie, elementary, middle, or high school). The searches included studies through the end of 2019 and produced 1263 records. One hundred and nineteen duplicate records were removed, and the remaining 1144 abstracts were reviewed and screened. Of these, only articles meeting all of the following inclusion criteria were retained: (1) measured school start time; (2) measured sleep or other developmental outcomes; (3) published in English; and (4) could not be excluded based on available information in the abstract. The research team requested unpublished data from authors who have published on pediatric sleep, school start time, sleep, and developmental outcomes via e-mail and ResearchGate (n = 10); however, no records were added through this approach. Taken together, the literature searches and screening resulted in 80 reports retained for further examination.

The full texts of these 80 reports were screened for the following exclusion criteria: (1) did not include school start time (n = 9); (2) did not include empirical data (n = 21, including 17 meta-analysis and review articles); (3) did not include appropriate statistics (eg, qualitative research, n = 1); and (4) duplicate reports (n = 3); as a result, 34 were excluded (46 reports were retained). The research team also reviewed the reference lists of 17 meta-analysis and systematic review articles, producing 125 records. After screening and cross-checking, 114 reports were excluded and 11 articles were retained, yielding a total of 57 (46 + 11) reports. Approximately half of these 57 articles (n = 28) were coded by 2 to 3 coders. Coding discrepancies were discussed and resolved to achieve intercoder agreement (interrater reliability rs = 0.99–1.00 for continuous variables and κs = 0.83−1.00 for categorical or string variables). Once reliability was achieved, the remaining 29 articles were independently coded by 1 of the 3 trained coders.

The 57 reports were coded, and the primary investigator contacted authors (n = 53) to obtain missing information (eg, group comparison statistics, sample descriptives, and demographic information) for effect size computation (eg, correlations between school start time and sleep) or moderation analysis (eg, participants’ race or ethnicity). Only 21 requests were met; 13 authors indicated they were not able to provide requested statistics or raw data, and 19 authors did not respond despite several attempts. The coders coded 39 reports with complete data, whereas 29 reports had sufficient statistical information for inclusion (see Supplemental Table 9 for additional information about included studies). Two reports analyzed the same research data; therefore, 28 studies were included in the final analyses (see Fig 1 PRISMA flowchart21 ): 27 peer-reviewed articles and 1 research report. A detailed data-extraction manual can be found in the Supplemental Information. Additional information about studies for which there was insufficient information for inclusion is shown in Supplemental Table 10.
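The record flow described above can be checked with simple arithmetic. The sketch below only restates counts reported in the text; the variable names are illustrative and are not taken from the study's data-extraction manual.

```python
# Counts as reported in the text; variable names are illustrative assumptions.
records_found = 1263
duplicates_removed = 119
abstracts_screened = records_found - duplicates_removed

full_text_reports = 80
full_text_exclusions = {
    "no school start time": 9,
    "no empirical data": 21,
    "inappropriate statistics": 1,
    "duplicate report": 3,
}
retained_from_databases = full_text_reports - sum(full_text_exclusions.values())

reference_list_records = 125
reference_list_excluded = 114
retained_from_references = reference_list_records - reference_list_excluded

coded_reports = retained_from_databases + retained_from_references

# Outcomes of the requests sent to authors for missing information.
author_requests = {"met": 21, "unable to provide": 13, "no response": 19}

reports_with_sufficient_statistics = 29
# Two reports analyzed the same data, so they count as one study.
included_studies = reports_with_sufficient_statistics - 1

print(abstracts_screened, coded_reports, included_studies)
```

Each derived count matches the figure given in the text (1144 abstracts screened, 57 reports coded, 28 studies analyzed).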

Many studies (n = 18) measured specific school start times (eg, 7:45 am, “What time do you need to arrive at school?”17 ), whereas others (n = 10) used school start time ranges (eg, 7:30–7:45 am22 ). Approximately 36% of studies (n = 10) included pre- and postdata on developmental outcomes before and after a change in school start time (eg, delaying start time from 7:50 to 8:45 am in the Seattle School District23 ), whereas the remaining studies (n = 18) compared SSTs across countries (eg, Australia versus United States17 ), districts (eg, Altoona versus Chippewa Falls in Wisconsin24 ), or schools (eg, “early-starting schools” versus “late-starting schools”25 ).

The following terminology is used to synthesize across analytic approaches: “initial SST” refers to the SST before the implementation of an SST delay, whereas “new SST” refers to the SST after the implementation of an SST delay. Two additional terms allow for the inclusion of studies that did not implement a change in SST but analyzed schools with different start times. These terms include “earlier SSTs”, which are discussed in contrast to “later SSTs” as a descriptive comparison of SST timing irrespective of any changes implemented in the SSTs.

First, the current study investigated the direct effects of school start times (ie, new, later school start times) on outcomes without considering initial or new start time. Next, initial and new SST were considered as moderators of the direct effect. To conduct the moderation analyses, all SSTs were coded into 5 groups: (1) before 7:30 am, (2) between 7:30 and 7:59 am, (3) between 8:00 and 8:29 am, (4) between 8:30 and 8:59 am, and (5) after 9:00 am to account for differences in how SSTs were reported; for example, some studies reported SSTs as specific times whereas others reported a range of times. However, although SSTs were coded categorically in the moderation analyses, they were treated as a continuous variable when estimating direct effects on developmental outcomes.
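The 5-group coding can be illustrated with a short sketch. The function name is hypothetical, and the placement of exactly 9:00 am in group 5 is an assumption, because the text specifies only "after 9:00 am" for that group.

```python
from datetime import time


def sst_category(start: time) -> int:
    """Map a school start time to the 5 categories used for moderation.

    Assumption: a start time of exactly 9:00 am falls in category 5;
    the text states only "after 9:00 am" for that group.
    """
    minutes = start.hour * 60 + start.minute
    if minutes < 7 * 60 + 30:
        return 1  # before 7:30 am
    if minutes < 8 * 60:
        return 2  # 7:30 to 7:59 am
    if minutes < 8 * 60 + 30:
        return 3  # 8:00 to 8:29 am
    if minutes < 9 * 60:
        return 4  # 8:30 to 8:59 am
    return 5      # 9:00 am or later (boundary is an assumption)


# Example: the 7:50 to 8:45 am delay mentioned for the Seattle School District.
print(sst_category(time(7, 50)), sst_category(time(8, 45)))
```

A study reporting a range rather than a single time would need an additional rule (eg, coding by the midpoint); the text does not state which rule was used.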

To investigate the broad impact of SSTs, this study focused on a range of developmental outcomes: (1) sleep (n = 25), (2) socioemotional health (both positive and negative; n = 8), (3) academics (eg, grades; n = 9), (4) cognitive development (eg, sustained attention; n = 3), (5) behavioral health (eg, substance use; n = 3), and (6) physical health (n = 3). Sleep indicators consisted of duration (n = 23), quality (n = 10), bedtime (n = 17), wake time (n = 15), sleepiness (n = 13), chronotype (n = 7), hygiene (n = 2), and social/jet lag and naps (n = 3). Sleep indicators were assessed using self-reports (n = 18), actigraphy (n = 2), a combination of self-reports and actigraphy (n = 3), a combination of actigraphy and laboratory assessment (n = 1), and a combination of self-reports and other method (ie, “school records”; n = 1). Other outcomes were assessed using self-reports (n = 6), school records (n = 4), parent reports (n = 1), cognitive tests (n = 1), biomarkers (n = 1), a combination of self-reports and parent reports (n = 1), a combination of self-reports and school records (n = 1), a combination of self-reports, teacher reports, and cognitive tests (n = 1), a combination of self-reports, teacher reports, and school records (n = 1), and a combination of self-reports and health center records (n = 1).

Studies reported the effect of SST on youth developmental outcomes using a range of statistics, including correlations and mean comparisons. Correlation coefficients were transformed to Fisher’s z using the following equation:26 
$$Z = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)$$
When studies reported mean differences in youth outcomes by SSTs, effect sizes were converted to Cohen’s d using the online utility developed by Lenhard and Lenhard.27  Cohen’s d was then transformed into correlation coefficients using the following equation:26 
$$r = \frac{d}{\sqrt{d^{2}+a}}, \qquad a = \frac{(n_1+n_2)^{2}}{n_1 n_2}$$
where $n_1$ and $n_2$ were group sizes. When studies reported odds ratios for the effect of SST on youth outcomes, the odds ratios were converted to Cohen’s d using the following equation:26 
$$d = \ln(\text{OddsRatio}) \times \frac{\sqrt{3}}{\pi}$$
and were then transformed to correlation coefficients. When unadjusted effect sizes were not available, we converted regression coefficients into semipartial correlations using the following equation (Aloe and Becker, 201228 ):
$$r_{sp} = \frac{t\sqrt{1-R^{2}}}{\sqrt{n-p-1}}$$
where $t$ is the t statistic for the target regression coefficient, $R^{2}$ is the total amount of variance in adjustment explained by the regression model, $n$ is the sample size, and $p$ is the number of predictors.
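The conversions above can be written as a few small functions. This is a minimal sketch of the stated formulas, not the authors' code; the function names are illustrative, and the back-transformation from Fisher's z to r (the hyperbolic tangent) is the standard inverse of the first equation.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's z transformation of a correlation coefficient."""
    return 0.5 * math.log((1 + r) / (1 - r))


def z_to_r(z: float) -> float:
    """Inverse of Fisher's z (standard back-transformation)."""
    return math.tanh(z)


def d_to_r(d: float, n1: int, n2: int) -> float:
    """Convert Cohen's d to a correlation, with a correction for group sizes."""
    a = (n1 + n2) ** 2 / (n1 * n2)
    return d / math.sqrt(d ** 2 + a)


def odds_ratio_to_d(odds_ratio: float) -> float:
    """Convert an odds ratio to Cohen's d using the natural log."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


def semipartial_r(t: float, r_squared: float, n: int, p: int) -> float:
    """Semipartial correlation from a regression coefficient's t statistic."""
    return t * math.sqrt(1 - r_squared) / math.sqrt(n - p - 1)


# Hypothetical inputs, chosen only to illustrate the chain d -> r -> z.
r = d_to_r(0.5, n1=50, n2=50)
print(round(r, 4), round(fisher_z(r), 4))
```

With equal group sizes, a equals 4, so the conversion reduces to the familiar r = d / sqrt(d² + 4).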

To standardize interpretation, all effect sizes were coded such that positive effect sizes represent later SSTs having a stronger association with better youth outcomes. For example, if a study reported a negative correlation between SST and depressive symptoms, the correlation coefficient was reverse coded to capture a positive association between later SST and better socioemotional well-being. As another example, if a study reported a mean difference in sleep duration between 2 groups with different SSTs, Cohen’s d was computed to compare sleep duration in the later SST group to the earlier SST group.
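The direction coding amounts to a sign flip for outcomes on which higher scores are worse. A minimal sketch, assuming each effect is first expressed as the association between later SST and the raw outcome score (the function and argument names are hypothetical):

```python
def align_direction(effect: float, higher_is_better: bool) -> float:
    """Return the effect coded so that positive values mean later SSTs
    are associated with better outcomes.

    `effect` is assumed to be the association between later SST and the
    raw outcome score; `higher_is_better` describes that outcome's scale.
    """
    return effect if higher_is_better else -effect


# A negative correlation between SST and depressive symptoms (higher = worse)
# becomes a positive effect for socioemotional well-being.
print(align_direction(-0.15, higher_is_better=False))
```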

Risk of bias was evaluated for all synthesized studies using an adapted version of the Newcastle-Ottawa Scale for assessing the quality of nonrandomized studies in meta-analyses.29  The third and fourth authors coded the representativeness of the sample (r = 0.93), inclusion of intervention cohorts (r = 0.93), assessment of school start times (r = 0.98), whether analyses controlled for a baseline measure of the outcome (r = 1.00), control for confounding variables (r = 0.91), assessment of outcome (r = 0.98), and adequacy of follow-up of cohorts (r = 0.90). A detailed coding manual is reported in Supplemental Table 11. The rating resulted in a sum score per study (possible range = 0–21, with higher scores indicating better quality). Next, the quality of each study was coded as high (sum scores of 17–21), moderate (sum scores of 10–16), low (sum scores of 5–9), or poor (sum scores of 0–4). The strength of evidence was coded for each outcome domain following prior meta-analyses.30  The first and second authors coded risk of bias (r = 0.94), imprecision (r = 1.00), inconsistency (r = 1.00), indirectness (r = 1.00), and publication/reporting bias (r = 0.88). A detailed coding manual is reported in Supplemental Table 12. The coding resulted in a sum score for each study that ranged between 5 and 12. The sum scores were averaged within each outcome domain. Next, strength of evidence for each outcome was coded where scores of 5 to 6 were coded as high, 7 to 10 as moderate, and 11 to 12 as low.
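The study-quality bands map directly onto the 0 to 21 sum score. The sketch below covers only that banding; the function name is illustrative, and the item-level scoring rules are in the supplemental coding manual rather than reproduced here.

```python
def study_quality(sum_score: int) -> str:
    """Classify a risk-of-bias sum score (0-21) into the reported bands."""
    if not 0 <= sum_score <= 21:
        raise ValueError("sum score must be between 0 and 21")
    if sum_score >= 17:
        return "high"
    if sum_score >= 10:
        return "moderate"
    if sum_score >= 5:
        return "low"
    return "poor"


# Hypothetical scores at the edges of each band.
print([study_quality(s) for s in (21, 16, 9, 4)])
```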

All analyses were conducted in a meta-regression framework using the robumeta R package.31  Random-effects models were used to allow true effect sizes to vary among studies.26  Robust variance estimation (RVE)31  was used to handle nonindependence in effect sizes (eg, effect sizes drawn from the same study or project). Correlated effects weights were used based on the most prevalent source of dependence in the data (ie, multiple measures from the same participants32 ). The average correlation between dependent effect sizes (ρ; ranging between 0 and 1) was specified at the default value (0.80).31  Previous research demonstrates robustness in estimates of effect sizes across reasonable values of ρ.33  Finally, small sample adjustments were implemented to provide unbiased estimates with small numbers of studies (eg, <40) and skewed covariates.34  Simulation research shows that after incorporating the small sample adjustments, the RVE estimates of effect sizes are robust when the degrees of freedom are >4.34  As such, estimates that had <4 degrees of freedom were excluded from analysis.

Intercept-only meta-regression models were conducted to examine the effect of SSTs on youth outcomes overall and for each domain separately. The estimated mean effect sizes were then converted to correlations, and Cohen’s35  criteria were used to evaluate the effect sizes. The significance of SST effects was determined by the 95% confidence intervals (CIs) and P values. Heterogeneity was evaluated using 2 indices: the between-study variance in study-average effect sizes (τ²), and the ratio of true heterogeneity to total variance across the observed effect sizes (I²).
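To make the quantities in an intercept-only random-effects model concrete, the sketch below pools Fisher z effect sizes with the DerSimonian-Laird estimator and reports τ², I², and the mean converted back to a correlation. This is a simplified stand-in that assumes one independent effect size per study. It is not the robust variance estimation performed by robumeta, which additionally handles dependent effect sizes and small-sample corrections. All names are illustrative.

```python
import math


def random_effects_summary(z_values, variances):
    """DerSimonian-Laird random-effects pooling of Fisher z effect sizes.

    Assumes independent effect sizes (one per study); this is NOT the
    robust variance estimation used in the analyses described above.
    """
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    fixed_mean = sum(w * z for w, z in zip(weights, z_values)) / total_w

    # Cochran's Q and its degrees of freedom.
    q = sum(w * (z - fixed_mean) ** 2 for w, z in zip(weights, z_values))
    df = len(z_values) - 1

    # Between-study variance (tau^2), truncated at zero.
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c)

    # Share of total variation attributed to true heterogeneity (I^2).
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # Random-effects weights add tau^2 to each sampling variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    mean_z = sum(w * z for w, z in zip(re_weights, z_values)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))

    return {
        "mean_z": mean_z,
        "mean_r": math.tanh(mean_z),  # back-transform to a correlation
        "se": se,
        "ci_95": (mean_z - 1.96 * se, mean_z + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical effect sizes and sampling variances for two studies.
print(random_effects_summary([0.0, 0.2], [0.01, 0.01]))
```

The normal-approximation confidence interval here is a simplification; the small-sample adjustments described above would instead use a t distribution with adjusted degrees of freedom.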

Next, univariate meta-regression models examined how the association between SST and youth developmental outcomes was moderated by school start time, youth characteristics (age, sex, and race or ethnicity), and school characteristics (school level, sector, size, SES, racial or ethnic composition, urbanicity, region, and country). Categorical moderators were dummy coded, and the reference group was rotated to obtain all possible pairwise comparisons. Next, subgroup analyses were conducted for each categorical moderator to obtain the mean effect size for each group. Meta-regression analyses investigate whether there are statistically significant differences by groups (ie, are the groups statistically different from each other) based on the moderating variable. In contrast, subgroup analyses complement the moderation analyses by providing estimates for the individual effect size by subgroup (ie, the specific subgroup effect size estimate).
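Rotating the reference group means refitting the same model once per category, so that every pair of categories is contrasted directly in at least one fit. The sketch below builds the dummy-coded columns for each rotation; the function names and example levels are hypothetical, and fitting the meta-regression itself is left out.

```python
def dummy_code(levels, reference):
    """Return one 0/1 indicator column per non-reference category."""
    categories = sorted(set(levels))
    if reference not in categories:
        raise ValueError(f"unknown reference category: {reference}")
    return {
        category: [1 if level == category else 0 for level in levels]
        for category in categories
        if category != reference
    }


def rotate_reference(levels):
    """Build a dummy-coded design for every choice of reference group."""
    return {ref: dummy_code(levels, ref) for ref in sorted(set(levels))}


# Hypothetical school-level codes for four effect sizes.
school_level = ["middle", "high", "elementary", "high"]
for reference, columns in rotate_reference(school_level).items():
    print(reference, columns)
```

With k categories, each fit yields k − 1 contrasts against the reference, and the k rotations together cover all k(k − 1)/2 pairwise comparisons.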

Multiple sensitivity analyses were conducted to examine robustness. First, the overall association between SST and youth developmental outcomes was investigated by (1) controlling for participant age and (2) focusing on studies that reported youth-level data. Next, sensitivity analyses examined the moderating effects of SST separately at each school level (ie, elementary, middle, or high school). Two sets of sensitivity analyses were investigated for the moderating effects of youth and school characteristics: (1) controlling for age in all meta-regression analyses and (2) addressing potential heterogeneity across countries by focusing on US studies only for youth race or ethnicity, school sector, and urbanicity.

Finally, publication bias was investigated using funnel plots with Egger’s tests36  and trim-and-fill analyses.37  Because RVE is not available for assessing publication bias,38  effect sizes were aggregated within each project, and publication bias was examined using traditional meta-analytic methods (assuming effect sizes are independent) in random-effects models with the metafor R package.39 
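The logic of Egger's test can be sketched as an ordinary least-squares regression of the standardized effect (effect divided by its standard error) on precision (1 divided by the standard error); an intercept far from zero indicates funnel-plot asymmetry. This is the classical formulation and is offered as an illustration only. The analyses above used the metafor R package, whose exact settings are not given in the text and may differ. The function name and inputs are hypothetical.

```python
import math


def egger_regression(effects, standard_errors):
    """Classical Egger regression: standardized effect on precision.

    Returns the intercept, slope, and the intercept's t statistic
    (with n - 2 degrees of freedom). Illustrative only.
    """
    n = len(effects)
    if n < 3:
        raise ValueError("need at least 3 effect sizes")
    y = [e / s for e, s in zip(effects, standard_errors)]  # standardized effect
    x = [1.0 / s for s in standard_errors]                 # precision

    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    # Standard error of the intercept from the residual variance.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")

    return {"intercept": intercept, "slope": slope, "t": t_stat, "df": n - 2}


# Hypothetical symmetric case: the same effect at every level of precision.
print(egger_regression([0.2, 0.2, 0.2, 0.2], [0.05, 0.1, 0.2, 0.4]))
```

When every study estimates the same effect regardless of precision, the fitted line passes through the origin and the slope recovers that common effect.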

First, the average effect of SST was estimated across SST, youth characteristics, and school characteristics. This was done in 2 steps: (1) estimating the effect between SST and an aggregate indicator of overall development and (2) estimating the effect of SST for each indicator (ie, sleep, socioemotional, academic, behavioral, physical, and cognitive) separately. For socioemotional well-being, additional analyses examined positive and negative outcomes separately. Each sleep indicator (duration, quality, bedtime, wake time, hygiene, chronotype, social/jet lag and naps, and sleepiness) was analyzed separately. Table 1 presents the effect size, confidence interval, and heterogeneity estimates. Figure 2 presents a forest plot for all effect sizes. Supplemental Figs 4–13 present forest plots for each developmental indicator.

For aggregated developmental outcomes, there was a small SST effect (Table 1, top row) such that later SSTs were associated with better outcomes (mean effect size = 0.061; SE = 0.022, 95% CI = 0.015–0.107; P = .01). For sleep outcomes (Table 1, second row), there was no significant effect for aggregated sleep or for individual sleep indicators with 1 exception: later SSTs were associated with longer sleep duration, with a small effect size (mean effect size = 0.109; SE = 0.043; 95% CI = 0.019–0.199; P = .02). Studies on sleep hygiene and social/jet lag and naps were not sufficiently powered to produce reliable results (degrees of freedom [df] < 4). For socioemotional outcomes, positive and negative socioemotional well-being were investigated separately, and there was a small effect between later SSTs and lower levels of negative socioemotional well-being (higher scores indicate lower levels of negative socioemotional well-being; mean effect size = 0.060; SE = 0.023; 95% CI = 0.001–0.119; P = .04). There were no significant associations between SST and academic outcomes. Because of limited degrees of freedom (df < 4), there were not sufficient data to examine the association between SST and positive socioemotional well-being, and other developmental domains (ie, behavioral, physical, or cognitive).

Two sets of sensitivity analyses were conducted. The first set controlled for participant age and centered age at 15 years (ie, the mean age across studies; range was 9–21 years old), so that the intercept of each meta-regression model would estimate the mean effect size across studies. Results are shown in the top portion of Supplemental Table 13. Covarying age necessitated removing 2 studies that did not report participant age; this resulted in insufficient data to examine negative socioemotional outcomes, academic outcomes, or sleep chronotype. For outcomes that had sufficient degrees of freedom for analysis, an identical pattern of significance was observed as the primary findings reported above. The second set of sensitivity analyses examined the overall effects of SST on developmental outcomes, including only studies that report youth-level data. Results are shown in the bottom portion of Supplemental Table 13. Again, an identical pattern of significance was observed, supporting the robustness of the primary findings reported above.

To investigate whether associations between SST and outcomes were dependent upon initial SSTs, and to identify which start time categories may benefit most from later SSTs, moderation analyses were conducted on initial SST. Because previous research suggests potential nonlinear effects of SST,12  the 5 SST categories were dummy coded and tested as moderators in a meta-regression. The reference groups were rotated to obtain all possible comparisons. Subgroup analyses provided estimates of the effect sizes for the association between SST and youth outcomes for each SST category.

Several significant moderating effects emerged for initial SST, and the meta-regression results are shown in Table 2. For the aggregated developmental outcome indicator, later SST was associated with better outcomes for youth whose initial SST was 7:30 to 7:59 am (compared with youth with SSTs before 7:30 am). For sleep outcomes, later SST was associated with better overall sleep for youth whose initial SST was 7:30 to 7:59 am (compared with youth with SSTs before 7:30 am). No significant moderating effects were observed for other outcomes. Subgroup analyses (Table 4) observed that later SSTs were significantly associated with better aggregated outcomes, better aggregated sleep indicators, and longer sleep duration for youth whose initial SST was between 7:30 and 7:59 am, but not youth in other SST categories.

Sensitivity analyses examined the moderating effects of SST separately at each school level (elementary, middle, and high schools). Because of limited study numbers, however, analyses were only sufficiently powered for high school studies. Among high school studies, later SST was associated with better youth outcomes when initial SST was 7:30 to 7:59 am. Focusing on the moderating effects of initial SST (Supplemental Table 14), we observed 2 significant findings. Later SSTs were associated with better overall sleep and earlier bedtime for youth whose initial SST was 7:30 to 7:59 am (compared with youth with initial SSTs of 8:30–8:59 am).

To investigate whether associations between SST and outcomes were dependent upon new SSTs, moderation analyses investigated the impact of the new SST after the implementation of a later, delayed SST. The 5 SST categories were dummy coded and tested as moderators in a meta-regression. The reference groups were rotated to obtain all possible comparisons. Subgroup analyses provided the effect size estimates for the association between SST and youth outcomes for each SST category.

Significant moderating effects emerged for sleep outcomes, and estimates from the meta-regression results are presented in Table 3. Later SSTs were associated with better sleep quality for youth whose new SST was 8:30 to 8:59 am (compared with those whose new SST was 8:00–8:29 am). No significant moderating effects were observed for other outcomes. Subgroup analyses showed that later SSTs were significantly associated with better overall outcomes, better overall sleep, and longer sleep duration among youth whose new SST was 8:30 to 8:59 am but not for other new SSTs (Table 5).

Sensitivity analyses examined the moderating effects of SST separately at each school level (elementary, middle, and high schools). Because of limited study numbers, however, analyses were only sufficiently powered for high school studies. Among high school studies, later SST was associated with better youth outcomes only when new SST was 8:30 to 8:59 am.

The final set of analyses examined the extent to which the association between SST and youth outcomes varied by youth (age, sex, and race or ethnicity) and school characteristics (school level, sector, size, socioeconomic status [SES], racial or ethnic composition, urbanicity, region, and country). Meta-regression analyses were conducted for each moderator separately. Each moderator (school level, sector, urbanicity, region, and country) was dummy coded, and the reference group was rotated to obtain all possible comparisons. To obtain effect size estimates, subgroup analyses examined the association between SST and youth outcomes within each moderator category.

Youth Characteristics

No significant moderating effect emerged for youth age. Unfortunately, there were insufficient degrees of freedom to produce reliable estimates for moderation by sex or race or ethnicity (Table 6).

School Characteristics

Significant moderating effects emerged for school characteristics, including school level, sector, and country (Table 7). Specifically, a significant difference in effect sizes emerged between middle and high school youth in which later SSTs were more strongly associated with lower levels of sleepiness for high school (compared with middle school) youth. No significant differences emerged for other outcomes. All subgroup analyses and coefficient estimates can be found in Table 8. For school sector (Table 7), significant differences emerged; for youth in private (compared with public) schools, later SSTs were more strongly associated with better overall sleep and later wake times. For geographic differences, effect sizes among youth in North America, Europe, and Asia were compared (Table 7). A significant pattern emerged; for youth in North America (compared with Europe), later SSTs were more strongly associated with earlier bedtimes.

There were insufficient degrees of freedom to produce reliable estimates for moderation by school size, SES, or racial or ethnic composition (Table 5). Finally, no significant moderating effects for school urbanicity or US region were observed in the meta-regressions (see Table 7).

Sensitivity Analyses

Two sets of sensitivity analyses were conducted. The first set addressed the large age span in the analytic sample by controlling for age in all meta-regression analyses; similar to the primary analyses, no significant moderating effects were observed for youth characteristics (see Supplemental Table 16). In addition, similar significant moderating effects emerged for school characteristics, including school sector and country (see Supplemental Table 17). Specifically, later SSTs showed a stronger association with later wake time among youth in private (compared with public) schools, and a stronger association with earlier bedtimes among youth in North America (compared with Europe).

The second set of sensitivity analyses addressed potential heterogeneity by youth race or ethnicity, school sector, and urbanicity and focused on US studies only. The results mirrored the primary analyses in which significant moderating effects emerged for school sector but not for youth race or ethnicity or school urbanicity (see Supplemental Table 18). Specifically, later SSTs were associated with better overall sleep among youth in private (compared with public) schools.

Publication bias was assessed using 2 approaches. First, funnel plots (Figs 3 A–K for overall outcomes and outcomes by domains) were created and their symmetry assessed using Egger’s test (left portion of Supplemental Table 19). Significant asymmetries were observed for the aggregated developmental outcomes, overall sleep, and sleep duration, suggesting potential publication bias in these domains. Yet, trim-and-fill analyses (right portion of Supplemental Table 19) suggest that the significant patterns between SST and youth outcomes remained robust after removing studies with extreme estimates.

Our risk of bias assessment identified 7 reports as high quality, 18 reports as moderate quality, and 4 reports as poor quality (Supplemental Table 20). We also observed moderate strength of evidence for domains with sufficient degrees of freedom for analyses (Supplemental Tables 21 and 22). As such, more high-/moderate-quality research would provide support for more conclusive effects of SST.

Overall, later SSTs were associated with better aggregated developmental outcomes, longer sleep duration, and less negative mood. With respect to which (initial) SSTs would benefit the most from a later SST, this analysis suggests that youth in schools with 7:30 to 7:59 am SSTs would benefit more than youth in schools with SSTs before 7:30 am in the areas of aggregated outcomes and overall sleep. With respect to which new SSTs are associated with optimal youth outcomes, youth in schools with new SSTs between 8:30 and 8:59 am had better sleep quality compared with youth whose new SSTs were between 8:00 and 8:29 am. Although youth age did not moderate the association between SST and developmental outcomes, and analyses of youth sex and race or ethnicity were not sufficiently powered to detect effects, several school characteristics were significant moderators. In particular, later SSTs were more strongly associated with lower levels of sleepiness for high school (compared with middle school) students. In addition, students in private schools with later SSTs had better sleep and later wake times compared with students attending public schools. Finally, later SSTs were more strongly associated with earlier bedtimes for youth in North America than for youth in Europe.

The current study has direct educational and pediatric policy implications. First, although the analysis suggests that later SSTs were associated with better aggregated outcomes and overall sleep for youth whose initial SST was 7:30 to 7:59 am (compared with initial SSTs before 7:30 am), most of the schools included in this analysis that started before 7:30 am implemented new SSTs between 8:00 and 8:29 am (n = 5), and fewer implemented new SSTs in the more favorable window of 8:30 to 8:59 am (n = 2). In contrast, schools that had initial SSTs between 7:30 and 7:59 am were about equally likely to implement new SSTs between 8:00 and 8:29 am (n = 6) and between 8:30 and 8:59 am (n = 7). At the same time, the current analysis suggests that new SSTs between 8:30 and 8:59 am were associated with better sleep quality than new SSTs between 8:00 and 8:29 am. Taken together, the data suggest that youth with initial SSTs between 7:30 and 7:59 am would benefit the most from a delayed SST, especially if the new SST is between 8:30 and 8:59 am. Second, students in high schools (compared with middle schools) seemed to benefit from later SSTs; however, schools exist in systems embedded within communities, and changes in 1 component of the system often necessitate changes in another. For example, although the current analyses suggest that high school youth benefit the most from an SST delay, in reality such delays have reverberating consequences for middle and elementary schools in terms of busing, staffing, and scheduling (eg, many districts have elementary schools start earlier to allow high schools to start later). Third, youth in private schools (compared with public schools) benefited more from later SSTs, although the analysis lacked the power to detect differences in associations with SST by school SES. The designation of private (versus public) schools serves as a proximal marker of SES, and SES has been associated with sleep disparities.40 The current analyses suggest that SSTs may be a leverageable policy with the potential to drive (or remedy) socially determined disparities in sleep.41 Finally, the observation that later SSTs in North American (versus European) schools were associated with earlier bedtimes suggests possible geographic or cultural differences in youth sleep hygiene and provides a counterargument to those who claim that delaying SSTs will result in delayed bedtimes.
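The school counts reported in the first implication above can be tabulated to show how often each group of schools reached the 8:30 to 8:59 am window. The following is a minimal sketch that uses only the four counts stated in the text; the dictionary layout and the helper name `share_reaching` are illustrative assumptions and not part of the original analysis.

```python
from fractions import Fraction

# Counts of schools reported in the text, keyed by
# (initial SST window, new SST window). Only these four cells are
# reported; other combinations are not represented here.
counts = {
    ("before 7:30", "8:00-8:29"): 5,
    ("before 7:30", "8:30-8:59"): 2,
    ("7:30-7:59", "8:00-8:29"): 6,
    ("7:30-7:59", "8:30-8:59"): 7,
}


def share_reaching(initial, target="8:30-8:59"):
    """Fraction of schools with a given initial window whose new SST fell in `target`."""
    total = sum(n for (init, _), n in counts.items() if init == initial)
    return Fraction(counts[(initial, target)], total)


for initial in ("before 7:30", "7:30-7:59"):
    share = share_reaching(initial)
    print(f"{initial}: {share.numerator}/{share.denominator} = {float(share):.0%}")
# prints:
# before 7:30: 2/7 = 29%
# 7:30-7:59: 7/13 = 54%
```

Under these counts, roughly half of the schools that initially started between 7:30 and 7:59 am moved into the 8:30 to 8:59 am window, compared with fewer than one third of the schools that initially started before 7:30 am.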

This study is not without limitations, many of which relate to data collection or reporting that was insufficient to support well-powered analyses. For example, positive socioemotional outcomes, behavioral health, physical health, and cognitive outcomes were not sufficiently powered in this analysis and would benefit from additional investigation. In addition, limited reporting on youth characteristics such as sex and race or ethnicity precluded investigation of potential differences in SST associations across sociodemographic groups in diverse pediatric populations. Among the 28 studies that were excluded because of missing information, 11 observed nonsignificant or mixed findings that were, to some extent, inconsistent with the expected effects of SST. With respect to the strength of the evidence, all of the included effect sizes were in the “moderate” range, with none in the “low” or “high” categories. Similarly, with respect to outcomes, the strength of the evidence was moderate for all outcomes except sleep hygiene, which was coded as high. As such, it is possible that the current analyses overestimate associations between SSTs and pediatric outcomes. Finally, 1 caveat about this area of research is that it is not possible to estimate the Hawthorne effect42 (ie, how knowledge of treatment effects is related to observed outcomes), because youth, parents, educational personnel, and communities are not blind to the “treatment.” Measurement over longer periods would help to judge the degree to which the Hawthorne effect is operative.

The hypotheses for these analyses were not preregistered, and a protocol was not prepared. Data and analytic code are available by request from the first author; the data include only studies published through 2019.

Building on the strengths and limitations of the current analysis, as communities and school districts consider delaying school start times, it is important that they collect and report the youth and school characteristic data needed to estimate effect sizes for meta-analytic purposes. In particular, future SST studies should collect and report detailed youth characteristics such as age, sex, and race or ethnicity, which were under investigation in this study, as well as additional sociodemographic variables such as nativity status, language preference, and school generation status (eg, first-generation or continuing-generation). With respect to school characteristics, future SST research should consider additional variables such as school size, school SES, geographic location, and school racial or ethnic composition. Including such variables will considerably advance the scope of research on the link between SSTs and pediatric outcomes and make it applicable to a diverse range of young people.
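To illustrate why complete reporting matters for meta-analysis, the sketch below computes one common standardized effect size (Hedges' g) and its sampling variance from group means, SDs, and sample sizes. It is an illustrative example under stated assumptions, not the effect size procedure used in this study: the function name `hedges_g` and all input values are hypothetical. If a study omits any of these summary statistics for a subgroup (eg, by sex or school type), the corresponding effect size cannot be computed.

```python
import math


def hedges_g(mean_late, sd_late, n_late, mean_early, sd_early, n_early):
    """Return Hedges' g (later minus earlier SST group) and its sampling variance."""
    df = n_late + n_early - 2
    # Pooled within-group standard deviation
    pooled_sd = math.sqrt(
        ((n_late - 1) * sd_late**2 + (n_early - 1) * sd_early**2) / df
    )
    d = (mean_late - mean_early) / pooled_sd  # Cohen's d
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n_late + n_early) / (n_late * n_early) + d**2 / (2 * (n_late + n_early))
    return j * d, j**2 * var_d


# Hypothetical inputs: mean sleep duration in hours for two groups of 100 youth
g, var_g = hedges_g(7.5, 1.0, 100, 7.0, 1.0, 100)
print(f"g = {g:.3f}, variance = {var_g:.4f}")
```

The variance term is what allows each study to be weighted in a pooled estimate, which is why sample sizes and SDs need to be reported alongside means for each subgroup of interest.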

Dr Yip conceptualized and designed the study, drafted the introduction and conclusion, coded the data, interpreted the data, and reviewed and revised the manuscript; Dr Wang coded the data, conducted the analyses, drafted the results, prepared the graphs, figures, and tables, and reviewed and revised the manuscript; Dr Xie helped prepare the literature search, coded articles, prepared the figures, and reviewed and revised the manuscript; Ms Ip helped prepare and describe the literature search, coded articles, prepared the manuscript references, and reviewed and revised the manuscript; Ms Fowle coded articles and reviewed the manuscript; Dr Buckhalt interpreted the data and drafted, reviewed, and revised the manuscript; and all authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.

FUNDING: Drs Yip, Wang, and Xie were supported by grants awarded to Dr Yip: NIH R21MD011388 and NSF BCS-1354134.

CONFLICT OF INTEREST DISCLOSURES: The authors have indicated they have no conflicts of interest relevant to this article to disclose.

SST: school start time

1. Borbély AA, Achermann P. Sleep homeostasis and models of sleep regulation. J Biol Rhythms. 1999;14(6):557–568
2. Brand S, Kirov R. Sleep and its importance in adolescence and in common adolescent somatic and psychiatric conditions. Int J Gen Med. 2011;4:425–442
3. Tarokh L, Saletin JM, Carskadon MA. Sleep in adolescence: physiology, cognition and mental health. Neurosci Biobehav Rev. 2016;70:182–188
4. Moore M, Kirchner HL, Drotar D, Johnson N, Rosen C, Redline S. Correlates of adolescent sleep time and variability in sleep time: the role of individual and health related characteristics. Sleep Med. 2011;12(3):239–245
5. Billings ME, Cohen RT, Baldwin CM, et al. Disparities in sleep health and potential intervention models: a focused review. Chest. 2021;159(3):1232–1240
6. Owens JA, Belon K, Moss P. Impact of delaying school start time on adolescent sleep, mood, and behavior. Arch Pediatr Adolesc Med. 2010;164(7):608–614
7. Owens JA, Devore CD, Allison M, et al; Adolescent Sleep Working Group; Committee on Adolescence; Council on School Health. School start times for adolescents. Pediatrics. 2014;134(3):642–649
8. Flammer A, Alsaker FD. Adolescents in school. In: Jackson S, Goossens L, eds. Handbook of Adolescent Development. London: Psychology Press; 2020:223–245
9. Campbell IG, Grimm KJ, de Bie E, Feinberg I. Sex, puberty, and the timing of sleep EEG measured adolescent brain maturation. Proc Natl Acad Sci USA. 2012;109(15):5740–5743
10. Sadeh A, Dahl RE, Shahar G, Rosenblat-Stein S. Sleep and the transition to adolescence: a longitudinal study. Sleep. 2009;32(12):1602–1609
11. Louzada FM, da Silva AGT, Peixoto CAT, Menna-Barreto L. The adolescence sleep phase delay: causes, consequences and possible interventions. Sleep Sci. 2008;1(1):49–53
12. Boergers J, Gable CJ, Owens JA. Later school start time is associated with improved sleep and daytime functioning in adolescents. J Dev Behav Pediatr. 2014;35(1):11–17
13. Hirshkowitz M, Whiton K, Albert SM, et al. National Sleep Foundation’s updated sleep duration recommendations: final report. Sleep Health. 2015;1(4):233–243
14. Owens J, Drobnich D, Baylor A, Lewin D. School start time change: an in-depth examination of school districts in the United States. Mind Brain Educ. 2014;8(4):182–213
15. Owens JA, Dearth-Wesley T, Herman AN, Oakes JM, Whitaker RC. A quasi-experimental study of the impact of school start time changes on adolescent sleep. Sleep Health. 2017;3(6):437–443
16. Wahlstrom KL, Owens JA. School start time effects on adolescent learning and academic performance, emotional health and behaviour. Curr Opin Psychiatry. 2017;30(6):485–490
17. Short MA, Gradisar M, Lack LC, et al. A cross-cultural comparison of sleep duration between US and Australian adolescents: the effect of school start time, parent-set bedtimes, and extracurricular load. Health Educ Behav. 2013;40(3):323–330
18. Appleman ER, Gilbert KS, Au R. School start time changes and sleep patterns in elementary school students. Sleep Health. 2015;1(2):109–114
19. Paksarian D, Rudolph KE, He J-P, Merikangas KR. School start time and adolescent sleep patterns: results from the U.S. National Comorbidity Survey–adolescent supplement. Am J Public Health. 2015;105(7):1351–1357
20. McKeever PM. Delayed high school start times of 8:30 am or later and impact on graduation completion and attendance rates [thesis]. New Britain, CT: Central Connecticut State University; 2016
21. Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372(71):n71
22. Edwards F. Early to rise? The effect of daily start times on academic performance. Econ Educ Rev. 2012;31(6):970–983
23. Dunster GP, de la Iglesia L, Ben-Hamo M, et al. Sleepmore in Seattle: later school start times are associated with more sleep and better performance in high school students. Sci Adv. 2018;4(12):eaau6200
24. Dexter D, Bijwadia J, Schilling D, Applebaugh G. Sleep, sleepiness and school start times: a preliminary study. WMJ. 2003;102(1):44–46
25. Temkin DA, Princiotta D, Ryberg R, Lewin DS. Later start, longer sleep: implications of middle school start times. J Sch Health. 2018;88(5):370–378
26. Borenstein M, Hedges L, Higgins J, Rothstein H. Reporting the results of a meta-analysis. In: Introduction to Meta-Analysis. 1st ed. Chichester, UK: John Wiley and Sons; 2009:365–370
27. Lenhard W, Lenhard A. Calculation of effect sizes [in German]. Available at: https://www.psychometrica.de/effektstaerke.html. Accessed February 15, 2020
28. Aloe AM, Becker BJ. An effect size for regression predictors in meta-analysis. J Educ Behav Stat. 2012;37(2):278–297
29. Wells GA, Shea B, O’Connell D, et al. The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses. Oxford; 2000
30. Berkman ND, Lohr KN, Ansari MT, et al. Grading the strength of a body of evidence when assessing health care interventions: an EPC update. J Clin Epidemiol. 2015;68(11):1312–1324
31. Fisher Z, Tipton E. robumeta: An R-package for robust variance estimation in meta-analysis. arXiv:1503.02220. 2015
32. Tanner-Smith EE, Tipton E. Robust variance estimation with dependent effect sizes: practical considerations including a software tutorial in Stata and SPSS. Res Synth Methods. 2014;5(1):13–30
33. Tanner-Smith EE, Wilson SJ, Lipsey MW. The comparative effectiveness of outpatient treatment for adolescent substance abuse: a meta-analysis. J Subst Abuse Treat. 2013;44(2):145–158
34. Tipton E. Small sample adjustments for robust variance estimation with meta-regression. Psychol Methods. 2015;20(3):375–393
35. Cohen J. A power primer. Psychol Bull. 1992;112(1):155–159
36. Sterne JA, Sutton AJ, Ioannidis JP, et al. Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. BMJ. 2011;343:d4002
37. Duval S, Tweedie R. A nonparametric “trim and fill” method of accounting for publication bias in meta-analysis. J Am Stat Assoc. 2000;95(449):89–98
38. Zelinsky NAM, Shadish W. A demonstration of how to do a meta-analysis that combines single-case designs with between-groups experiments: the effects of choice making on challenging behaviors performed by people with disabilities. Dev Neurorehabil. 2018;21(4):266–278
39. Viechtbauer W, Viechtbauer M. Package ‘metafor’. The Comprehensive R Archive Network; 2017
40. Bagley EJ, Kelly RJ, Buckhalt JA, El-Sheikh M. What keeps low-SES children from sleeping well: the role of presleep worries and sleep environment. Sleep Med. 2015;16(4):496–502
41. Johnson DA, Jackson CL, Williams NJ, Alcántara C. Are sleep patterns influenced by race/ethnicity - a marker of relative advantage or disadvantage? Evidence to date. Nat Sci Sleep. 2019;11:79–95
42. Tan LP. The effects of background music on quality of sleep in elementary school children. J Music Ther. 2004;41(2):128–150
