## Data Science Course Framework

Through the Launch Years initiative, the Charles A. Dana Center at The University of Texas at Austin has collaborated with stakeholders and content experts from K–12 and higher education to develop a framework for a high school data science course. In addition to Design Principles and Content Outcomes, the framework describes a course that supports students’ social, emotional, and academic development (SEAD)—an often-overlooked aspect of learning that research indicates is crucial to students’ ability to thrive in school, career, and life.

### Design Principles

**Active Learning.**

The course provides regular opportunities for students to actively engage in data explorations using a variety of instructional strategies (e.g., hands-on and technology-based activities, projects, small-group collaborative work, facilitated student discourse, interactive lectures).

**Growth Mindset.**

The course supports students in developing the tenacity, persistence, and perseverance necessary for learning data science, for using mathematics and statistics to tackle authentic problems, and for being successful in post-high school endeavors.

**Problem Solving.**

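The course provides opportunities for students to engage in the entire statistical problem-solving process, from formulating questions and collecting or considering data to analyzing data and interpreting results.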

**Authenticity.**

The course presents data explorations that allow students to address relevant questions that arise in their communities.

**Context and Interdisciplinary Connections.**

The course presents data science in context and connects data science to various disciplines and everyday experiences.

**Communication.**

The course develops students’ ability to communicate the insights and findings from their data explorations in varied ways, including with words, data visualizations, and numbers.

**Technology.**

The course introduces students to current technologies appropriate for data exploration and visualization, and prepares them to learn and use new ones.

**Assessment.**

The course uses project-based assessments both formatively, to guide learning, and summatively, to evaluate student progress.

### Social, Emotional, and Academic Development

**Use collaboration and communication.**

Students should recognize situations for which collaboration is an effective strategy, identify the features of collaborative work groups, and develop strategies for overcoming group work challenges. They should work collaboratively with students from various cultural and ethnic backgrounds while examining alternative points of view, accepting constructive criticism, and revising personal views when evidence warrants.

**Utilize resources to overcome obstacles.**

Students should engage in productive academic behaviors, including recognizing when help is needed with a task, and developing and applying a variety of strategies and sources for seeking help; monitoring and adjusting attitudes, emotions, and thoughts when facing challenging tasks or academic setbacks; and seeking and using feedback to improve performance.

**Recognize and improve individual behaviors.**

Students should maintain motivation and persistence through a variety of strategies, including identifying and adjusting habits and beliefs that have interfered with success; applying metacognitive awareness to plan, monitor, evaluate, and reflect on their learning; and setting and monitoring goals.

### Content Outcomes

**Understand the role of data in the world.**

Students need to demonstrate an understanding of data and the many types of data that exist, including nontraditional data types such as photos, text, and sounds. They should understand the ways in which data are generated and collected, recognizing that primary data are collected firsthand by researchers for their own investigation, while secondary data have already been collected, often by others, and are readily available for use. It is crucial that students recognize the extent of their digital footprint, the data created through their digital lives, while also considering issues surrounding data privacy and the ethical use of data. Examples of proficiency include the ability to distinguish between different types of data and between primary and secondary data; represent, summarize, and interpret data; recognize trends in data; and reflect on implications of data privacy and data use policies.

**Ask data-based questions.**

To make sense of the wealth of information that data can provide, students should recognize the types of questions that can be answered through data exploration and be able to formulate their own questions. Students should determine what data might be collected to answer data-based questions and consider what questions can be answered from readily available data. Performing exploratory data analysis, drawing preliminary conclusions, and using what they have learned are key to formulating new questions for further exploration. Examples of proficiency include the ability to ask new questions based on what is learned from a data exploration and to identify the relevant data to address a data-based question of interest.

**Collect and manage data.**

Sources of data are vast, so students should consider the different ways in which data might be generated, including sampling (observational studies), experimentation (statistical experiments), and simulation, and explain the role of random selection in choosing a sample and the role of random assignment in statistical experiments. To appreciate the challenges of data collection, students should collect and organize data about their own lives and communities. They should identify bias and its sources in data and describe how bias in data affects people and society. While considering data collection, students should also discuss the ethics and consequences of collecting and using data, including the bias that may be introduced by data collection or selection processes. Students should acquire data in different formats and work with large, real-world, publicly available data sets; combining two or more data sources might be necessary to investigate a question of interest. Students should become familiar with different types of data structures such as arrays, stacks, and queues, understanding that data (both traditional and nontraditional) are not always collected, shared, or received in a form that is ready for analysis and often require different digital tools to clean and prepare them for analysis (e.g., merging data sets, dealing with incomplete data, normalizing data, creating new variables). Students should explore the basics of programming as needed and be comfortable editing and documenting code, or finding the appropriate tools to transform data so that they are useful in their own analyses.
Examples of proficiency include the ability to distinguish between observational studies and statistical experiments; explain why random selection is important in observational studies and why random assignment is important in statistical experiments; use data scraping to obtain data from an online source; use transformations to create new variables for analysis; and clean data sets to address extraneous, incorrect, or missing data values in preparation for analysis.
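
Several of the skills in this outcome (merging sources, handling missing values, creating and normalizing variables, and random selection versus random assignment) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the framework does not prescribe a language or tool, the survey and grade records are invented, and the helper names (`merge_on_id`, `drop_missing`, `add_screen_hours`, `min_max_normalize`, `random_sample`, `random_assignment`) are hypothetical. In practice a library such as pandas would often handle the cleaning steps.

```python
import random

# Hypothetical example data, invented for illustration: a class survey and a
# separate grade list, linked by a student ID.
survey = [
    {"id": 1, "hours_sleep": 8.0, "screen_minutes": 120},
    {"id": 2, "hours_sleep": None, "screen_minutes": 240},  # missing value
    {"id": 3, "hours_sleep": 6.5, "screen_minutes": 300},
    {"id": 4, "hours_sleep": 7.0, "screen_minutes": 180},
    {"id": 5, "hours_sleep": 7.5, "screen_minutes": 90},
]
grades = {1: 91, 2: 88, 3: 78, 5: 85}  # student 4 is absent from this source


def merge_on_id(rows, lookup, new_key):
    """Inner join: keep only rows whose id also appears in the lookup table."""
    return [{**row, new_key: lookup[row["id"]]} for row in rows if row["id"] in lookup]


def drop_missing(rows):
    """Deal with incomplete data by removing rows that contain a None value."""
    return [row for row in rows if all(value is not None for value in row.values())]


def add_screen_hours(rows):
    """Create a new variable by transforming an existing one (minutes -> hours)."""
    return [{**row, "screen_hours": row["screen_minutes"] / 60} for row in rows]


def min_max_normalize(rows, key):
    """Rescale one numerical variable to the range [0, 1]."""
    values = [row[key] for row in rows]
    low, high = min(values), max(values)
    span = high - low
    return [
        {**row, key + "_norm": (row[key] - low) / span if span else 0.0}
        for row in rows
    ]


def random_sample(population, n, seed=None):
    """Random selection (simple random sample): every member is equally likely to be chosen."""
    return random.Random(seed).sample(list(population), n)


def random_assignment(units, seed=None):
    """Random assignment: shuffle the units, then split them into two groups."""
    shuffled = list(units)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


clean = min_max_normalize(
    add_screen_hours(drop_missing(merge_on_id(survey, grades, "grade"))), "grade"
)
```

Here student 4 is lost in the merge and student 2 is dropped for a missing value, leaving students 1, 3, and 5. Dropping rows is only one way to handle incomplete data; whether it introduces bias is exactly the kind of question this outcome asks students to consider. The two randomization helpers show the distinction in the proficiency list: `random_sample` decides *who is studied*, while `random_assignment` decides *which treatment each participant receives*.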

**Explore data to make sense of and represent the story that the data are telling.**

Interpreting and critiquing data visualizations are fundamental skills for building data acumen. Students should use technology, and programming where appropriate, to create a variety of data visualizations to explore data and to share insights based on what the data reveal. Students should be able to look for patterns, describe data distributions, and compare distributions while examining graphical displays. They should also be able to create graphical displays, data visualizations, and tables to explore relationships. Technology should be used to explore the correlation between two numerical variables visually and numerically, while tables and conditional relative frequencies should be used to explore associations between categorical variables. Interpreting graphical displays, data visualizations, and tables that involve more than two variables gives students the opportunity to demonstrate multivariable thinking. Examples of proficiency include the ability to draw insights from a data visualization and to communicate those insights to others; use and interpret graphical displays and tables to describe relationships between two variables; and create and interpret data visualizations that demonstrate multivariable thinking.
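
The two numerical summaries named above, correlation for two numerical variables and conditional relative frequencies for two categorical variables, can be computed directly. The sketch below is an assumption-laden illustration in Python rather than a prescribed tool: `pearson_r` and `conditional_relative_frequencies` are hypothetical helper names, and the commute table is invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two numerical variables."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def conditional_relative_frequencies(table):
    """Divide each row of a two-way table by its row total, giving the
    distribution of the column variable conditional on each row category."""
    result = {}
    for row_category, counts in table.items():
        total = sum(counts.values())
        result[row_category] = {col: count / total for col, count in counts.items()}
    return result


# Hypothetical two-way table: grade level vs. usual way of getting to school.
commute = {
    "9th": {"bus": 30, "car": 10, "walk": 10},
    "12th": {"bus": 10, "car": 30, "walk": 10},
}
conditional = conditional_relative_frequencies(commute)
```

If grade level and commute mode were not associated, the rows of `conditional` would be roughly the same. In this invented table, 60% of 9th graders but only 20% of 12th graders ride the bus, which suggests an association. As with any correlation coefficient, `pearson_r` measures only linear association and should be read alongside a scatterplot.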

**Analyze data to create data-based arguments and to reach data-based conclusions.**

Students need to understand the importance of communicating with data and making data-based arguments. To communicate with data effectively, students should use appropriate summary measures to describe and compare data distributions. They should understand that variability is present in data and take sampling variability into account when formulating data-based arguments or making data-based decisions. Students should combine their knowledge of probability with technology, and with programming where appropriate, to construct simulations that estimate probabilities and assess statistical significance. Because conclusions based on sample data are subject to error and misinterpretation, students should acknowledge potential errors in the data collection process and their possible consequences. Examples of proficiency include the ability to describe data distributions, including shape, center, variability, and any unusual features for numerical data distributions; compare two or more data distributions using graphical and numerical summaries; design and implement a simulation to assess statistical significance and interpret the results of the simulation; contrast statistical significance and practical significance in a given context; describe potential errors and possible consequences of a data-based argument; and interpret a margin of error in context.
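
One common way to "design and implement a simulation to assess statistical significance" is a randomization (permutation) test: if group labels do not matter, shuffling them should often produce differences as large as the one observed. The sketch below is one possible approach, not the framework's prescribed method. It assumes two groups of numerical data and a difference in means as the statistic; the function name `randomization_test` is hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def randomization_test(group_a, group_b, reps=10_000, seed=None):
    """Estimate a two-sided p-value for the difference in group means by
    repeatedly shuffling the pooled data and re-splitting it into two groups."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Adding 1 to the numerator and denominator counts the observed arrangement
    # itself, so the estimated p-value is never exactly 0.
    return (at_least_as_extreme + 1) / (reps + 1)
```

A small p-value (for example, below 0.05) suggests that a difference as large as the observed one would rarely arise from the random shuffling of labels alone. Whether the difference is large enough to matter is a separate question of practical significance.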

**Understand limitations of data sources and data-based conclusions.**

When working with publicly available data sets, students should be able to determine whether conclusions are appropriate given the study design and the way in which the data were collected. They should evaluate and critique data-based claims and arguments, understanding that data and data-based arguments may have inherent sources of bias, and they should seek to identify those sources. Examples of proficiency include the ability to evaluate whether a conclusion from a data analysis or exploration is appropriate given the data source and data collection method; evaluate a data-based argument and identify potential sources of bias; and critique a data-based argument.

**Use data to make predictions.**

A key component of using data is the ability to make accurate predictions. Students should use data to build models (including linear models, nonlinear models, and models with multiple predictor variables) that describe relationships between variables. They should evaluate the appropriateness and usefulness of prediction models and use models to make their own predictions. Examples of proficiency include the ability to use graphical displays to make informal predictions; fit linear and nonlinear models and evaluate the usefulness of models; and use fitted models to make predictions.
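
For the linear case, fitting, evaluating, and predicting can be sketched with ordinary least squares. The Python below is a minimal illustration under stated assumptions: it covers a single predictor only, the helper names (`fit_line`, `predict`, `residuals`, `r_squared`) are hypothetical, and it does not attempt nonlinear or multiple-predictor models.

```python
def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Use the fitted model to predict y for a new x."""
    return intercept + slope * x


def residuals(xs, ys, intercept, slope):
    """Observed minus predicted values; a curved pattern hints a line is a poor fit."""
    return [y - predict(intercept, slope, x) for x, y in zip(xs, ys)]


def r_squared(xs, ys, intercept, slope):
    """Proportion of the variation in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum(r ** 2 for r in residuals(xs, ys, intercept, slope))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot
```

For example, `fit_line([1, 2, 3, 4], [3, 5, 7, 9])` recovers the line y = 1 + 2x, so `predict(1.0, 2.0, 10)` gives 21. Evaluating usefulness means looking at the residuals and the context as well as the R² value, and being cautious about predicting far outside the range of the observed x values.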

**Use data to inform decision making.**

Students need to use probability, including conditional probability, to make decisions and to quantify uncertainty in real-world situations. They should also be able to understand and interpret results from classification and decision tree algorithms. Examples of proficiency include the ability to calculate and interpret probabilities, including conditional probabilities; estimate probabilities empirically and by using simulation; use relevant probabilities to inform a decision; and use a given classification or decision tree to reach a decision, describing the steps in the process of reaching that decision.
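
Two of these proficiencies, estimating a conditional probability by simulation and following a given decision tree while recording each step, can be illustrated together. The Python below is a sketch under stated assumptions. The dice scenario is a standard textbook setup in which the exact answers are P(sum ≥ 10) = 6/36 = 1/6 and P(sum ≥ 10 | first die is 6) = 3/6 = 1/2. The umbrella tree, its thresholds, and the names `estimate_probabilities`, `UMBRELLA_TREE`, and `decide` are hypothetical and were built by hand rather than learned from data.

```python
import random


def estimate_probabilities(trials=100_000, seed=None):
    """Simulate rolling two fair dice. Returns simulation estimates of
    P(sum >= 10) and P(sum >= 10 | first die is 6)."""
    rng = random.Random(seed)
    high = first_six = high_and_first_six = 0
    for _ in range(trials):
        d1, d2 = rng.randint(1, 6), rng.randint(1, 6)
        is_high = d1 + d2 >= 10
        high += is_high
        if d1 == 6:
            first_six += 1
            high_and_first_six += is_high
    # Conditional probability: only trials where the condition holds count.
    return high / trials, high_and_first_six / first_six


# A hypothetical, hand-built decision tree ("given" to students, not learned from data).
UMBRELLA_TREE = {
    "question": "Is the forecast chance of rain at least 50%",
    "test": lambda day: day["chance_of_rain"] >= 0.5,
    "yes": {"decision": "bring an umbrella"},
    "no": {
        "question": "Is it cloudy in the morning",
        "test": lambda day: day["cloudy"],
        "yes": {"decision": "bring an umbrella"},
        "no": {"decision": "leave the umbrella at home"},
    },
}


def decide(tree, day):
    """Walk from the root to a leaf, recording each question and answer."""
    steps = []
    node = tree
    while "decision" not in node:
        answer = node["test"](day)
        steps.append(f"{node['question']}? {'yes' if answer else 'no'}")
        node = node["yes"] if answer else node["no"]
    return node["decision"], steps
```

Calling `decide(UMBRELLA_TREE, {"chance_of_rain": 0.2, "cloudy": True})` returns the decision "bring an umbrella" along with the two steps that led there. That list of steps is the kind of explanation the last proficiency asks students to give.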