Estimating the Effect of a Teacher Training Program on Advanced Placement® Outcomes

This study employs a potential outcomes modeling approach to estimate the effect of Code.org’s Professional Learning Program on Advanced Placement (AP) Computer Science Principles test taking and qualifying score earned for a recent cohort of 167 schools compared to a matched group of comparison schools. Results indicate substantial and significant increases in both Computer Science AP test taking and qualifying score earning for all students. In addition, the significant effects were even greater for Computer Science AP test taking and qualifying score earned by female and minority students when impact ratios are analyzed separately. This study provides evidence of a teacher training program that is having a significant and important impact on preparing more students to succeed in computer science and improve the future of computer science education in this country.


Introduction
Despite the growing need for qualified workers in STEM fields, there remains a significant under-representation of females in STEM fields (Beede et al., 2011) and specifically in Computer Science careers (Sax et al., 2017). Similar gaps exist for minority students. Research has shown that targeted training of teachers to provide Computer Science courses can increase the number of minority students enrolled in advanced Computer Science courses (Goode, 2007). Goode argues that there is a critical need to provide professional development to support and encourage minority participation in Computer Science coursework. This study employs a potential outcomes modeling approach to estimate the causal effect of Code.org's Professional Learning Program.
Code.org, a 501(c)(3) nonprofit, works across the education spectrum to expand access to computer science and increase participation by women and underrepresented minority populations in computer science coursework. Code.org believes that every student in every school should have the opportunity to learn computer science, just like biology, chemistry, or algebra. In addition to developing curricula for grades K-12, Code.org provides professional development for high school educators. The Code.org Professional Learning Program offers both in-person and online support for teachers before and during their first year teaching the Code.org curriculum. To date, several thousand teachers have completed the program, with the majority ranking it among the best professional development of their careers.
The Code.org Professional Learning Program takes a multi-pronged approach to ensuring the quality and sustainability of the program at scale. The program coordinates three major Code.org efforts (Regional Partners, Facilitator Development, and Professional Development Workshops), all built upon the foundations and principles of the Code.org curricula, which have been designed to meet learning objectives through engagement with equitable classroom practices.
Taken altogether, the Professional Learning Program can be summarized as a year of ongoing Professional Development Workshops for teachers, with agendas and activities designed specifically for the Code.org CS Principles Curriculum and teaching philosophies. Workshops are run by Code.org Professional Development Facilitators, who receive training in a separate, year-long program devoted to PD leadership development and specifically designed to support the Code.org CS Principles Curriculum. Teachers are supported from the beginning of the program to the end by Code.org Regional Partners, who collaborate with facilitators to deliver high quality workshops. Code.org Regional Partners are developed through a multi-year partnership with the aim of building local, sustainable hubs of high quality PD for computer science teachers. Teachers also have additional ongoing supports such as the Code.org Forum, an online professional learning community.

Goals
The primary goal of the Code.org Professional Learning Program is to support implementation of the Code.org CS Principles Curriculum in schools such that it leads to more students, and a more diverse group of students, taking and earning qualifying scores on the AP Computer Science Principles Exam. Other student goals include generating positive attitudes, self-efficacy, a sense of belonging in computer science classrooms, and positive expectations about computer science in their future. A residual outcome would be to increase the number and diversity of students who pursue computing-related opportunities after AP Computer Science Principles, such as taking more computer science classes or seeking employment that requires computer science skills.
The curriculum and associated professional development enable teachers with very little background knowledge in computer science to deliver the course using equitable teaching practices that engage all students. Other goals for teachers include positively affecting teachers' attitudes and self-efficacy toward teaching computer science, as well as their belief systems about equity in computer science classrooms. The theory underlying these goals is that equitable teaching practices, coupled with a curriculum rich in resources and activities that support and encourage enactment of those practices, will lead to (1) better student learning overall and (2) more equitable student engagement and learning.

Timeline & Implementation
The Code.org Professional Learning Program begins with teachers applying to the program through a Regional Partner, starting in January of the year they enter the program. Regional Partners work with Code.org to approve admission to the program based on a number of criteria, the most influential being a stated commitment from the district, or from the teacher and school principal, to offer and teach the course in the upcoming school year. It is important to note that even though the curriculum is designed to support implementation of the AP course, teachers are not required to offer it as an AP course for admission into the Professional Learning Program. In 2016-17, roughly half of the teachers in the program self-reported that they offered CS Principles as an AP course at their school.
The teacher training begins in earnest during the summer with a five-day in-person workshop in which teachers explore the Code.org curriculum and learning tools, practice and discuss classroom management and teaching strategies, and build a community of educators. Following the "five requirements of transformative learning" outlined in Loucks-Horsley, Stiles, Mundry, Love, & Hewson (2010), a major focus of the professional development program is practicing new teaching strategies as part of workshop activities. In the workshop, teachers deliver lessons from the curriculum that highlight these teaching practices to an audience of peers. Afterward, teachers debrief the lesson, allowing them time to reflect with peers about how implementation should be tailored for their own classrooms.
ISSN 2513-8359
Teachers also reflect on enacting equitable teaching practices in light of the historic inequities faced by underrepresented groups in computer science. The workshops devote time to developing strategies for computer science advocacy and student recruitment, with the goal of enrolling students in computer science classes who are representative of their school's population in terms of race, gender, and other demographic factors.
The program continues to support teachers throughout the academic school year through workshops hosted locally by Code.org Regional Partners and run by trained Code.org Professional Development Facilitators. Each academic-year workshop combines further curriculum exploration and planning with revising goals set during the summer (for example, recruiting and retaining a representative set of students, supporting student needs, and assessing student learning). The workshops focus on elements of the curriculum that are essential for effectively teaching the course, such as exploring new computer science content, developing pedagogical strategies to keep the classroom environment equitable and engaging, and AP preparation.

Data Sources
Advanced Placement test data from a total of 167 treatment schools from the most recent Code.org cohort plus 167 non-treatment schools were analyzed for this study. Data for treatment and matched comparison schools were provided by the College Board by matching the Code.org treatment schools on the state in which the school is located, total school enrollment, percent of students receiving free or reduced priced lunch, and percentage of minority students at each school. The original list of program schools included 383 schools, of which 167 were matched (43.6%). The lower than anticipated match percentage resulted from stringent matching criteria. Matching criteria required that the comparison school be located in the same state as the treatment school and be within +/-20% of the total student enrollment of the treatment school. Further, each comparison school had to be within one standard error of the mean of the target treatment school in terms of percentage of minority students and percent of students qualifying for free or reduced priced lunch. Thus, all four criteria had to be met to identify an acceptable comparison school.
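The four matching criteria amount to a simple eligibility check, sketched below in Python. This is an illustration only, not the procedure actually run for the study: the School record, its field names, and the se_minority and se_frl tolerances are hypothetical, since the text does not say how the standard errors were computed.

```python
from dataclasses import dataclass


@dataclass
class School:
    # Hypothetical record; field names are illustrative, not from the study data.
    state: str
    enrollment: int
    pct_minority: float
    pct_frl: float  # percent qualifying for free or reduced priced lunch


def is_acceptable_match(treated, candidate, se_minority, se_frl):
    """Return True only if all four matching criteria described in the text hold."""
    same_state = candidate.state == treated.state
    # Enrollment must fall within +/-20% of the treatment school's enrollment.
    enrollment_ok = abs(candidate.enrollment - treated.enrollment) <= 0.20 * treated.enrollment
    # Demographic percentages must fall within one (assumed, user-supplied) standard error.
    minority_ok = abs(candidate.pct_minority - treated.pct_minority) <= se_minority
    frl_ok = abs(candidate.pct_frl - treated.pct_frl) <= se_frl
    return same_state and enrollment_ok and minority_ok and frl_ok
```

Because every criterion must hold at once, a candidate that fails any single test is rejected, which is consistent with fewer than half of the program schools finding a match.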

Research Design
This study employs a potential outcomes modeling approach (Rubin, 2005) to estimate the causal effect of program participation on first year improvements in AP test taking and AP qualifying score earning in computer science AP subjects. The potential outcomes model, also called the Rubin Causal Model (RCM) (Holland, 1986), allows for the formal identification of causal effects. This approach estimates the average difference between observed outcomes and potential outcomes (counterfactuals) for each unit in the analysis. This is known as the causal estimand. Potential outcomes modeling has been widely used in a number of social science fields, including education, politics, and public health, to estimate causal effects of programs or policies (Glass, Goodman, Hernan, & Samet, 2013; Keele, 2015). In fact, Keele (2015) states, "The RCM is the dominant model of causality in statistics at the moment" (p. 315), while acknowledging there are many other approaches to estimating causality in a statistical framework (e.g., Dawid, 2000; Pearl, 2009).
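In standard potential outcomes notation (the symbols are introduced here for illustration and do not appear in the original study), let $T_i$ indicate whether school $i$ participated in the program, and let $Y_i(1)$ and $Y_i(0)$ denote its outcome with and without the program. Only one of the two potential outcomes is ever observed for a given school:

```latex
\begin{align}
Y_i^{\mathrm{obs}} &= T_i\,Y_i(1) + (1 - T_i)\,Y_i(0), \\
\tau_i &= Y_i(1) - Y_i(0), \\
\bar{\tau}_S &= \frac{1}{|S|} \sum_{i \in S} \tau_i .
\end{align}
```

Here $S$ is the set of schools over which the effect is averaged. Because each $\tau_i$ contains one unobserved term, the average must be estimated by borrowing information from comparable schools in the other condition.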
The goal of propensity score matching within the RCM is to construct a sample of comparison schools that are similar to the treatment schools (Rosenbaum & Rubin, 1985) in terms of their likelihood of selection into treatment. This model has gained popularity in recent years and is frequently used to make causal estimates from observational studies. Rubin (2005) has argued, "the potential outcomes formulation of causal effects, whether in randomized experiments or in observational studies, has achieved widespread acceptance" (p. 329). A propensity score is a scalar value that summarizes the likelihood for a unit to receive a treatment, often based on a large set of variables. In this study, we estimate the propensity score and causal estimands using a weighting approach implemented in the Toolkit for Weighting and Analysis of Nonequivalent Groups ("twang") package written in the R programming language (Ridgeway, McCaffrey, Morral, Burgette, & Griffin, 2015).
Previous literature suggests that propensity score models should include all confounding variables, that is, variables that are related to the treatment assignment as well as to the outcome (Rubin, 2007; Rubin & Thomas, 1996; West & Thoemmes, 2010), or all variables that are related to the outcome (Rosenbaum, 2002). Stuart (2010) also argues that one should be generous in including predictors in the propensity score model, because the cost of omitting a variable that might predict the outcome (an increase in bias) is greater than the cost of including a variable that in fact does not predict the outcome (a slight increase in the standard errors of the propensity scores). In this study, school demographic data such as total enrollment, percent minority enrollment, and percent of enrollment qualifying for free or reduced priced lunch provide ample information that may predict the outcomes of this study (i.e., number of students taking Computer Science AP tests and student performance on Computer Science AP tests). Thus, these three variables are used to balance the treatment and control conditions.

Data Analytic Approach
The twang approach to propensity score estimation uses generalized boosted models (GBMs), a multivariate nonparametric regression technique introduced in McCaffrey, Ridgeway, and Morral (2004). This approach is argued to allow for flexible, nonlinear relationships as well as a large number of variables, and has been shown to perform well under certain settings (see, e.g., Imai & Ratkovic, 2014). In the GBM approach, instead of matching, a weighting approach is used to estimate the treatment effect. One of the advantages of propensity score approaches is that once nonexperimental data are used to "design an observational study," the study achieves balance between treatment and control groups as if it were based on an experimental study (Rubin, 2007). Then, the outcome analysis can proceed in the same way as the analysis that would have been done in an experimental study.
However, note that the effects we seek to obtain can be either the average effect of the treatment on the treated (ATT) or the average treatment effect (ATE). Generally, when we use matching strategies based on the estimated propensity scores, we estimate the ATT instead of the ATE, because we intentionally select and match control group schools that are like treatment schools. However, when we use weighting strategies (as is done with the twang package), either the ATT or the ATE can be obtained, depending on the weights that are used. For this study, we estimated the effects of the program for both ATT and ATE in order to get a sense of not only what the effect of the program was for the participating schools, but also what the effect would have been had the program been provided to the control schools as well.
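The difference between the two estimands comes down to the weights. The sketch below uses the standard propensity score weights (for ATT, treated units get weight 1 and controls get p/(1-p); for ATE, treated units get 1/p and controls get 1/(1-p)) with a weighted difference in means as the effect estimate. It is a simplified Python illustration, not the twang R code used in the study; the function names and toy data are hypothetical.

```python
def att_weight(is_treated, p):
    """ATT weight: treated units count once; controls are weighted by the odds p / (1 - p)."""
    return 1.0 if is_treated else p / (1.0 - p)


def ate_weight(is_treated, p):
    """ATE weight: inverse probability of the condition the unit actually received."""
    return 1.0 / p if is_treated else 1.0 / (1.0 - p)


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_effect(outcomes, treated, propensity, weight_fn):
    """Weighted treated mean minus weighted control mean."""
    groups = {True: ([], []), False: ([], [])}
    for y, t, p in zip(outcomes, treated, propensity):
        groups[t][0].append(y)
        groups[t][1].append(weight_fn(t, p))
    return weighted_mean(*groups[True]) - weighted_mean(*groups[False])


# Hypothetical toy data: AP tests taken per school and estimated propensity scores.
outcomes = [20, 24, 4, 2]
treated = [True, True, False, False]
propensity = [0.8, 0.5, 0.5, 0.2]

att = weighted_effect(outcomes, treated, propensity, att_weight)
ate = weighted_effect(outcomes, treated, propensity, ate_weight)
```

Under ATT weights the control group is reweighted to resemble the treated schools, while under ATE weights both groups are reweighted to resemble the full sample, so the two estimates generally differ.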

Results
The first step in reviewing the results is to check the extent to which the propensity score weighting approach results in balance across the treatment and control groups in terms of the balancing variables. As mentioned earlier, several variables were used to balance the treatment and control samples. Along with the state in which the schools are located, these included total school enrollment, percentage of students receiving free or reduced priced lunch, and percentage of total student enrollment that are minority students. These variables were chosen because they are predictive of the outcomes of interest in this study. For example, a regression model using total school enrollment, percentage of total enrollment that are minority, and percentage of total enrollment eligible for free or reduced priced lunch significantly predicted total Computer Science (Computer Science A and Computer Science Principles) tests taken at the school; F(3, 333) = 25.12, p < .001, R² = .19.
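One common way to quantify balance on a covariate is the standardized mean difference between the treated group and the weighted control group. The sketch below is an assumption-laden illustration: it scales by the treated-group standard deviation (a convention often used for ATT analyses), and the function names and enrollment figures are hypothetical rather than taken from the study.

```python
import math


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def standardized_difference(treated_x, control_x, control_w):
    """(treated mean - weighted control mean) / treated-group sample SD."""
    mean_t = sum(treated_x) / len(treated_x)
    var_t = sum((x - mean_t) ** 2 for x in treated_x) / (len(treated_x) - 1)
    return (mean_t - weighted_mean(control_x, control_w)) / math.sqrt(var_t)


# Hypothetical enrollments for three treated and three control schools.
treated_enrollment = [1000, 1400, 1800]
control_enrollment = [900, 1300, 1700]

before = standardized_difference(treated_enrollment, control_enrollment, [1, 1, 1])
after = standardized_difference(treated_enrollment, control_enrollment, [1, 1, 2])
```

In this toy example, upweighting the largest control school moves the standardized difference from 0.25 to 0, which is the kind of shift a balance plot summarizes across covariates.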

Figure 3. Balance plot for ATT analyses
Treatment and control groups were fairly balanced prior to weighting on total enrollment (M = 1354.94 for treatment; M = 1221.48 for controls); t(332) = -1.53, p = .128. These minor differences were virtually eliminated through weighting (see Figure 3 for balance plot for ATT analyses). No substantial differences between treatment and control schools existed in percentage of minorities (M = 47.53% for treatment; M = 47.42% for controls), t(332) = -0.03, p = .977, or in percent of students qualifying for free or reduced price lunch (M = 49.56% for treatment; M = 49.82% for controls); t(332) = 0.09, p = .929. After propensity score weighting (ATT estimation), the treatment and control schools were comparable in terms of all three balancing variables. Specifically, the average total enrollment for the weighted samples was 1354.94 and 1315.26 for treatment and control, respectively. Likewise, the average percent minority enrollment was balanced at 47.5 for the treatment schools and 47.0 for the control schools, and the average percent qualifying for free or reduced priced lunch was 49.6 and 49.8 for treatment and control schools, respectively. Perfect balance is not to be expected. Austin cautions, "as with randomization, one should not expect that perfect balance will be achieved for all measured baseline variables between treated and untreated subjects in the matched sample" (Austin, 2008, p. 2040).
Treatment and control samples were equally well balanced using the ATE propensity score estimation procedure (see Figure 4). Specifically, for the ATE estimation, the average total enrollment for the weighted samples was 1298.89 and 1263.27 for treatment and control, respectively. Likewise, the average percent minority enrollment was balanced at 47.1 for the treatment schools and 47.3 for the control schools, and the average percent qualifying for free or reduced priced lunch was 49.4 and 49.9 for treatment and control schools, respectively. Given the adequately balanced samples with the ATE procedure, we present the causal estimates from both the ATT and ATE procedures in this report.
The results of the logistic regressions for the average treatment on the treated (ATT) effect are presented in Table 1, which shows the impact of the program on average school Computer Science, Computer Science Principles, and Computer Science A Advanced Placement test taking. Table 2 shows the impact of the Code.org program on the average number of earned qualifying scores of 3 or better on these same AP tests. Similar analyses were conducted for average treatment effects (ATE), the results of which are provided in Tables 3 and 4.
Although the standardized effect size estimates were smaller for minority student test taking than for all students or for female students only, they are nonetheless highly significant and substantial. In fact, Figure 6 shows the impact ratios for Computer Science Principles test taking by student group. This shows that the relative impact is greatest for minority students. Whereas the program effect, in essence, increases test participation for all students by a factor of more than 5, the effect is almost twice that for Black students (10.13). That is to say, the program increased the number of Black students taking Computer Science Principles more than ten-fold on average across the treatment schools. In addition, the program increased the number of Hispanic students taking Computer Science Principles tests nearly six-fold.
As was seen with the ATT estimates, the average treatment effect estimates produced a much greater impact ratio for Black student Computer Science Principles test participation than for the overall collection of students or for Female or Hispanic students. Figure 10 shows that the treatment increased Computer Science Principles test participation by more than 500% for all students and for Hispanic students in particular, while the increase exceeded 600% for Female students and 800% for Black students, relative to what would be observed without program participation.
A comparable pattern of findings was observed for Computer Science Principles qualifying scores using the average treatment effect estimates as was found with test participation using the same ATE estimand (see Table 4). Program participation would increase the average number of qualifying scores by more than 10 per school in the overall sample, t(332) = 6.05, p < .001; by an average of 2.69 for female students, t(332) = 5.46, p < .001; by an average of .39 for Black students, t(332) = 3.83, p < .001; and by an average of more than 2 qualifying scores for Hispanic students, t(332) = 4.52, p < .001 (see Figure 11). Each of these projected improvements is highly statistically significant. As with the ATT estimates, no significant improvement in Computer Science A qualifying scores is anticipated from program participation.
The impact ratios using the ATE approach, while still substantial, are lower than for the ATT estimation procedure (Figure 12). For all students, the number of qualifying scores is projected to be 4.91 times larger with the ATE approach as compared with 5.32 times larger with the ATT approach. Likewise, the ratio for females is 5.20 for ATE versus 5.32 for ATT. For minority students, the ratios are considerably lower with the average treatment effect approach compared to the average treatment on the treated approach (5.88 vs. 6.71 for Black students; 4.35 vs. 5.17 for Hispanic students). Notwithstanding these discrepancies between estimation procedures, the program effects on the number of Computer Science Principles qualifying scores remain large and significant.
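The text does not define the impact ratio formally. One plausible reading, assumed here, is the mean outcome with the program divided by the estimated mean outcome without it. The short Python sketch below uses hypothetical numbers (not values from the study's tables) and also shows how such a ratio relates to a percent increase.

```python
def impact_ratio(mean_with_program, mean_without_program):
    """Assumed definition: outcome with the program relative to the counterfactual."""
    return mean_with_program / mean_without_program


def percent_increase(ratio):
    """Convert a ratio to a percent increase over the counterfactual."""
    return (ratio - 1.0) * 100.0


# Hypothetical example: 12 qualifying scores per school with the program, 2 without.
ratio = impact_ratio(12.0, 2.0)
increase = percent_increase(ratio)
```

Under this reading, a ratio of 6 corresponds to a 500% increase, so ratios and percent increases should not be compared directly.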

Discussion
This study provides evidence that the Code.org teacher preparation program increases the number of AP tests taken and the number of AP qualifying scores earned by the students of the participating teachers. This is consistent with prior research that has shown that teacher professional development can, in certain contexts, positively impact student outcomes generally (Yoon, Duncan, Lee, Scarloss, & Shapley, 2007) and in computer science specifically (Mouza, Marzocchi, Pan, & Pollock, 2016). In and of themselves, these results are important, but these increases may lead to additional advantages for these students. Research shows that students who take AP courses have a greater likelihood of attending college (Mattern, Marini, & Shaw, 2013). Mattern et al. state, "… the odds of enrolling in a four-year institution increased by 171% for students who took one AP Exam compared with students who took no AP exams. The increase in odds was even higher for students who took more than one AP exam" (Mattern, Marini, & Shaw, 2013, p. 5). Students participating in AP classes also earn better grades in college (Shaw, Marini, & Mattern, 2013), and have a greater likelihood of persisting in and graduating from college (Dougherty, Mellor, & Jian, 2006; Hargrove, Godin, & Dodd, 2008). In addition, students who earn qualifying scores on AP tests outperform matched non-AP students on many college outcome measures (Murphy & Dodd, 2009). Future research should explore these longer term potential impacts of this training program.
This work is significant for at least two reasons. First, it demonstrates the application of propensity score potential outcomes modeling to observational data to yield meaningful and significant causal estimates of a popular professional development program's effectiveness in a context where randomized assignment to treatment condition is either infeasible or impractical. Second, this study provides evidence that Code.org's Professional Development Program for CS Principles is having significant and important impacts on preparing more students to succeed in Computer Science careers and improving the future of Computer Science education in this country. More students, notably female and minority students, are engaging in, and succeeding in, Computer Science Principles as a result of implementing this program in schools across the country. From an impact ratio perspective, the program is having a greater impact for these groups of students.

Figure 1. Code.org Professional Learning Program Logic Model

Figure 4. Balance plot for ATE analyses

Figure 10. Impact Ratios for Student Subgroups on Computer Science Principles Test Participation
As indicated in Table 1, the average number of AP tests taken for Computer Science Principles was dramatically higher for all students in the treatment schools following program implementation. Participation in the Code.org program generated an average increase of almost 18 additional AP Computer Science tests taken in the 2016-2017 school year; t(332) = 6.72, p < .001. Moreover, these effects persist when looking at student subgroups. For female students, the increase in Computer Science test taking as a result of program participation is an average of 5.28 tests per school; t(332) = 6.07, p < .001. For Black students the increase is an average of 1.53 tests, t(332) = 4.08, p < .001, and for Hispanic students it is more than 5 additional tests, t(332) = 4.95, p < .001. All of the estimates are highly statistically significant, with standardized effect sizes at or above .40 (Cohen's d), indicating a moderate to large causal effect of the program on student AP test taking in Computer Science courses.
Upon closer inspection, it is clear that virtually all of the effect on increased test participation in Computer Science courses is a function of increased participation in Computer Science Principles and not of increased participation in Computer Science A, which is consistent with the Code.org model. In fact, there was no discernible impact of program participation on Computer Science A test taking for all students, t(332) = 1.24, p = .215; female students, t(332) = 0.72, p = .470; Black students, t(332) = 1.07, p = .284; or Hispanic students, t(332) = 0.43, p = .668. In contrast, the effect of the program on Computer Science Principles (CSP) was highly significant for all students and every student subgroup analyzed. Thus the effect was not a result of generalized increases in Computer Science participation, but rather a function of targeted Computer Science Principles participation. Moreover, the Cohen's d effect sizes ranged from moderate (d = .46) to large (d = .88).
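For readers unfamiliar with Cohen's d, the sketch below shows the textbook pooled-standard-deviation version in Python. The paper does not state how d was computed for the weighted samples, so this unweighted formula and the toy data are assumptions for illustration only.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical tests-taken counts for three treatment and three control schools.
d = cohens_d([2, 4, 6], [1, 3, 5])
```

By the usual rule of thumb, values near .5 are described as moderate and values near .8 as large, which matches the labels used in the text.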
Figure 5. Effect on Computer Science Principles AP Test Participation

Figure 6. Impact Ratios for Student Subgroups on Computer Science Principles Test Participation

Table 2. ATT Estimates for Qualifying Scores Earned by Course

Similarly impressive results were found for program effects on the number of qualifying scores earned in program schools. In addition to increasing the number of students taking Computer Science AP tests, the Code.org program increased the number of qualifying scores earned by students in Computer Science AP courses. Table 2 demonstrates that program schools reported an average of 11.77 more qualifying scores in all Computer Science courses (t(332) = 5.92, p < .001) and an average of 10.41 more qualifying scores in Computer Science Principles for all students (t(332) = 6.73, p < .001), both of which were highly statistically significant. Further, as with test taking effects, the impact on qualifying scores was persistent for each student subgroup, with moderate to large effect sizes demonstrated for Computer Science Principles and no discernible effect on the number of qualifying scores earned in Computer Science A.

Figure 7. Effect of program on Computer Science Principles qualifying scores earned

Figure 7 shows the impact of participation in the Code.org program on qualifying scores earned in Computer Science Principles in the treatment schools by student subgroup relative to what would have been expected had the program not been implemented in the treatment schools. On average, the program resulted in 2.68 more qualifying scores for female students, t(332) = 5.91, p < .001; 0.40 more qualifying scores for Black students, t(332) = 4.14, p < .001; and 2.25 more qualifying scores per school for Hispanic students, t(332) = 5.37, p < .001. Although these values are smaller than the effect for all students, they are nonetheless highly significant and substantial effects. The effect sizes for these groups are all in the moderate range (d = .45 to d = .65).

Figure 8. Impact Ratios for Student Subgroups on Computer Science Principles Qualifying Scores

Further, the impact ratios for at least one minority subgroup are greater than for non-minority students. As Figure 8 shows, whereas the program results in a more than five-fold increase in the number of qualifying scores in Computer Science Principles for all students, Black students saw an increase of more than 6.7 times what would have happened without participation in the Code.org program.

Figure 9. Average Treatment Effect on Computer Science Principles AP Test Participation

These average treatment on the treated (ATT) estimates show that program participation substantially increased the number of Advanced Placement Computer Science Principles tests taken and qualifying scores earned for students in the treatment schools. In addition to these estimates, we estimated the average treatment effect (ATE), which is the expected average effect of the program if it had been presented to the control schools as well. The results of these analyses regarding test participation are presented in Table 3. Consistent with program expectations, program implementation in the full sample would significantly improve Computer Science Principles participation for all students and all student subgroups, but would not impact test participation in Computer Science A for any group. On average, program implementation in all schools in the sample would have resulted in an additional 15.76 Computer

Figure 12. Impact Ratios for Student Subgroups on Computer Science Principles Qualifying Scores