Recommendation 6
Hunter Glanz (chair), Michael Sullivan, Amelia McNamara, Patti Frazer Lock
Incorporate software/apps to explore concepts and work with data.
Introductory Statistics
All students in an Introductory Statistics course should be exposed to a statistical software package or statistical apps, beyond graphing calculators. We do not prescribe any specific technology, since course goals, student audiences, and institutional constraints will differ. Using technology to create graphs, provide summary statistics, and handle much of the inferential analysis allows instructors to spend significantly more time on interpretation and on helping students garner insight from data. In addition, technology can be a powerful way to develop student understanding of the key ideas of statistics and data science. Technology is integral to data science, and essential to doing any type of data analysis.
Choose the technology based on the learning outcomes of the course. Technology can also make a course more equitable: requiring extensive algebraic and hand-computation skills creates an unnecessary barrier to successfully completing an Introductory Statistics or Data Science course.
There are many things to consider when determining what software to use in your course, and we discuss some of these considerations here. We provide a list of free online statistical applets here.
Assessment can be challenging when technology is involved. We address this issue here, where we offer a variety of assessment options.
Software Selection Link
When Fisher published Statistical Methods for Research Workers, he understood the importance of facilitating calculations to make statistical methods more accessible. Fisher also understood that statistical computations would become easier with the advent of technology. The purpose of a statistical course is not to get bogged down in calculation, but to use models to develop an understanding of the world through data. Statistical software allows students to analyze data frames with many observations and variables. This presents students with a more realistic scenario than working only with data sets that have a single variable and a few observations.
Technology in the form of statistical spreadsheets and applets also allows students to develop conceptual understanding of statistical ideas quickly through data visualization, simulation, bootstrapping, and randomization. These tools reduce the burden of calculation and let students see the power of statistics.
Data Visualization
Software should be used to generate graphs that go beyond simple boxplots, supporting student engagement and the display of multiple variables. For example, students should be able to identify outliers in a data frame easily from a graph and to produce graphs that show several variables at once.
Simulation
Statistical software should be used to simulate sampling from data frames so that statistical concepts may be developed. For example, obtain 1000 simple random samples of a particular quantitative variable from a data frame, for each of several sample sizes. Compute the sample mean for each sample and describe the shape, center, and spread of the distribution of the sample means to illustrate sampling distributions.
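The sampling exercise described above can be sketched in a few lines of code. The following is a minimal illustration, not a prescribed implementation: the population is a made-up right-skewed variable, and the function name `simulate_sample_means` and the sample sizes 5, 30, and 100 are our own assumptions.

```python
import random
import statistics

def simulate_sample_means(population, sample_size, num_samples=1000, seed=0):
    """Draw repeated simple random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]

# Hypothetical right-skewed quantitative variable (for example, commute times in minutes).
population_rng = random.Random(42)
population = [population_rng.expovariate(1 / 30) for _ in range(5000)]

for n in (5, 30, 100):
    means = simulate_sample_means(population, n)
    # Center and spread of the simulated sampling distribution for this sample size.
    print(
        f"n={n:>3}: center={statistics.mean(means):.2f}, "
        f"spread={statistics.stdev(means):.2f}"
    )
```

Students can compare the printed spread across sample sizes and then plot a histogram of each list of means to discuss shape.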
Simulation can also be used to introduce the hypothesis test for a proportion.
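As one possible way to use simulation for a test about a proportion, the sketch below repeatedly generates samples under the assumption that the null hypothesis is true and counts how often the simulated result is at least as large as the observed one. The data (60 successes in 100 trials), the null value 0.5, and the function name `simulated_p_value` are hypothetical and chosen only for illustration.

```python
import random

def simulated_p_value(successes, n, p0, num_sims=10_000, seed=0):
    """Estimate P(count >= observed count) when the true proportion is p0."""
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(num_sims):
        # One simulated sample of n trials, each a success with probability p0.
        simulated = sum(rng.random() < p0 for _ in range(n))
        if simulated >= successes:
            at_least_as_extreme += 1
    return at_least_as_extreme / num_sims

# Hypothetical data: 60 successes in 100 trials; the alternative is p > 0.5.
p_value = simulated_p_value(successes=60, n=100, p0=0.5)
print(f"Estimated one-sided p-value: {p_value:.3f}")
```

Because the p-value is estimated by counting, students can see directly that it is the proportion of simulated samples at least as extreme as the one observed.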
Applets here??
Bootstrapping
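As a minimal sketch of the bootstrapping idea, assuming a percentile interval for a mean is the goal, the code below resamples a data set with replacement and recomputes the statistic each time. The exam scores, the function name `bootstrap_percentile_ci`, and the choice of 2000 resamples are hypothetical.

```python
import random
import statistics

def bootstrap_percentile_ci(data, stat=statistics.mean, num_resamples=2000,
                            level=0.95, seed=0):
    """Percentile bootstrap interval: resample with replacement, recompute stat."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(num_resamples)
    )
    tail = (1 - level) / 2
    lower = estimates[int(tail * num_resamples)]
    upper = estimates[int((1 - tail) * num_resamples) - 1]
    return lower, upper

# Hypothetical sample of 12 exam scores.
scores = [62, 71, 74, 75, 78, 80, 81, 83, 85, 88, 90, 97]
low, high = bootstrap_percentile_ci(scores)
print(f"95% bootstrap interval for the mean: ({low:.1f}, {high:.1f})")
```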
Randomization Discuss how statistical software allows for an introduction to
Statistical Applets Link
Assessment Link: Assessment is broadly categorized as either formative or summative. Here, we provide suggestions for how technology may be incorporated for each of these assessment methods.
Formative Assessment
Consider the use of personal response systems (PRS). PRS may be used to develop an active learning environment. The advantage of using PRS is that they increase engagement, encourage peer-to-peer instruction, and allow the instructor to target lectures as a result of the immediate feedback. A leading expert on peer-to-peer instruction in classrooms is Eric Mazur of Harvard University. Here is an article on using peer-to-peer instruction. Although Mazur is a professor of physics, his methods apply to an Introductory Statistics course as well. Here are some YouTube videos of Dr. Mazur’s thoughts:
Allow for more in-depth assignments that utilize large data frames. For example, provide a large data frame and ask students to determine the best explanatory variable for a given response variable based on the correlation coefficient. Then have them build a linear model using that explanatory variable.
Consider the use of course management systems that algorithmically generate homework problems for students. These systems typically provide instant feedback to students along with learning aids that help students learn how to solve statistical problems.
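The in-depth assignment suggested above (choosing an explanatory variable by its correlation with the response, then fitting a linear model) might look like the following sketch. The small data frame, its column names, and the helper functions `pearson_r` and `least_squares_line` are hypothetical; in practice students would use whatever software the course has adopted.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (
        sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x)
    )
    return mean_y - slope * mean_x, slope

# Hypothetical data frame stored as named columns.
data = {
    "hours_studied": [2, 4, 5, 7, 8, 10],
    "hours_slept": [8, 6, 7, 5, 7, 6],
    "exam_score": [55, 64, 70, 78, 85, 93],
}
response = "exam_score"

# Correlation of each candidate explanatory variable with the response.
correlations = {
    name: pearson_r(column, data[response])
    for name, column in data.items()
    if name != response
}
# "Best" here means the largest correlation in absolute value.
best = max(correlations, key=lambda name: abs(correlations[name]))
intercept, slope = least_squares_line(data[best], data[response])
print(f"Best explanatory variable: {best} (r = {correlations[best]:.3f})")
print(f"Fitted line: {response} = {intercept:.2f} + {slope:.2f} * {best}")
```

With a real data frame of many columns, the same comparison gives students a reason to look at scatterplots before trusting the largest correlation.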
Summative Assessment
This assessment method can be a challenge for instructors who do not have access to labs with computers. However, there are techniques that may be utilized in this environment even without the use of hand-held technology.
The focus should be on interpreting statistical results rather than on producing them. Use screen captures of statistical output and ask students to interpret the results. For example, provide output of a least-squares regression for a statistical problem and ask a variety of questions about it, such as (i) state the correlation coefficient, (ii) state the least-squares regression equation, (iii) make predictions, (iv) determine whether a particular observation is above or below average for a particular value of the explanatory variable, and so on. The instructor could also provide graphs such as residual plots for assessing the linear model (outliers, appropriateness of the linear model, and so on).
Provide multiple screen captures. Require the student to identify the correct output before interpreting results. For example, provide a scenario that requires a hypothesis test on a proportion. Provide output that performs a hypothesis test for each of the alternatives \(H_a: p < p_0\), \(H_a: p \neq p_0\), and \(H_a: p > p_0\). Perhaps even provide output for a misstated alternative such as \(H_a: p < \hat{p}\), and so on. Ask students to verify model requirements, identify the p-value, and state a conclusion using the correct output.
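To show how the outputs in such a question differ, the sketch below computes the p-value of a one-proportion z-test under each of the three alternatives. It assumes the usual normal approximation is appropriate for the scenario; the data (60 successes in 100 trials, \(p_0 = 0.5\)) and the function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def one_proportion_z_test(successes, n, p0):
    """Return the z statistic and the p-value for each alternative hypothesis."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return {
        "z": z,
        "less": normal_cdf(z),                       # Ha: p < p0
        "greater": 1 - normal_cdf(z),                # Ha: p > p0
        "two-sided": 2 * (1 - normal_cdf(abs(z))),   # Ha: p != p0
    }

# Hypothetical scenario: 60 successes in 100 trials, null value p0 = 0.5.
results = one_proportion_z_test(successes=60, n=100, p0=0.5)
for name, value in results.items():
    print(f"{name:>9}: {value:.4f}")
```

The same data yield three very different p-values, which is why students must match the output to the alternative stated in the scenario before interpreting it.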
Introductory Data Science
All students in an Introductory Data Science course should be exposed to a statistical software package, coding language, or statistical apps, beyond graphing calculators. We do not prescribe any specific technology, since course goals, student audiences, and institutional constraints will differ. Using technology to create graphs, provide summary statistics, and handle much of the modeling allows instructors to spend significantly more time on interpretation and on helping students garner insight from data. In addition, technology can be a powerful way to develop student understanding of the key ideas of data science.
There are many things to consider when determining what software to use in your course, and we discuss some of these considerations here. We provide a list of free online statistical applets here.
Assessment can be challenging when technology is involved. We address this issue here, where we offer a variety of assessment options.
Software Selection Link
When Fisher published Statistical Methods for Research Workers, he understood the importance of facilitating calculations to make statistical methods more accessible. Fisher also understood that statistical computations would become easier with the advent of technology. The purpose of a data science course is not to get bogged down in calculation, but to use models to develop an understanding of the world through data. Statistical software allows students to analyze data frames with many observations and variables. This presents students with a more realistic scenario than working only with data sets that have a single variable and a few observations.
Technology in the form of coding, spreadsheet software, and applets also allows students to develop conceptual understanding of data science ideas quickly through data wrangling, summarization, visualization, and modeling. These tools reduce the burden of calculation and let students see the power of data science.
Data Wrangling
Software should be used to clean and prepare data for summarization, visualization, and analysis.
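As one small illustration of data wrangling, the sketch below trims stray whitespace, standardizes category labels, converts text to numbers, and sets aside rows that cannot be used. The records, field names, and the rule that a GPA must lie between 0 and 4 are hypothetical, and dropping rows is only one of several reasonable ways to handle missing or implausible values.

```python
def clean_records(raw_rows):
    """Trim text, standardize labels, convert numbers, and drop unusable rows."""
    cleaned = []
    for row in raw_rows:
        name = row["name"].strip()
        major = row["major"].strip().lower()
        try:
            gpa = float(row["gpa"].strip())
        except ValueError:
            continue  # missing or non-numeric value such as "" or "NA"
        if not 0.0 <= gpa <= 4.0:
            continue  # outside the plausible range; likely a data-entry error
        cleaned.append({"name": name, "major": major, "gpa": gpa})
    return cleaned

# Hypothetical raw records as they might arrive from a survey export.
raw = [
    {"name": " Ana ", "major": "Statistics", "gpa": "3.7"},
    {"name": "Ben", "major": "statistics ", "gpa": "NA"},
    {"name": "Cy", "major": "BIOLOGY", "gpa": "38"},
    {"name": "Dee", "major": " biology", "gpa": " 3.1 "},
]
students = clean_records(raw)
print(students)
```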
Data Summarization
Software should be used to summarize data in univariate and multivariate ways, both to uncover relationships and to identify where more wrangling may be needed.
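A grouped summary is one simple multivariate summary. The sketch below, which assumes the data are stored as a list of records, reports the count, mean, and median of a quantitative variable within each level of a categorical variable. The records and the function name `summarize_by_group` are hypothetical.

```python
import statistics
from collections import defaultdict

def summarize_by_group(rows, group_key, value_key):
    """Return count, mean, and median of value_key within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {
        group: {
            "n": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
        }
        for group, values in groups.items()
    }

# Hypothetical cleaned records.
rows = [
    {"major": "statistics", "gpa": 3.7},
    {"major": "statistics", "gpa": 3.3},
    {"major": "biology", "gpa": 3.1},
    {"major": "biology", "gpa": 2.9},
    {"major": "biology", "gpa": 3.6},
]
summary = summarize_by_group(rows, "major", "gpa")
for group, stats in summary.items():
    print(group, stats)
```

An unexpected group label or an unusually small count in such a table is often the first sign that more wrangling is needed.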
Data Visualization
Software should be used to generate graphs that go beyond simple boxplots, supporting student engagement and the display of multiple variables. For example, students should be able to identify outliers in a data frame easily from a graph, recognize when a graph points to the need for more wrangling, and produce graphs that show several variables at once.
Data Modeling
Software should be used to model relationships between a response variable and one or more explanatory variables.
Statistical Applets Link
Assessment Link: Assessment is broadly categorized as either formative or summative. Here, we provide suggestions for how technology may be incorporated for each of these assessment methods.
Formative Assessment
Consider the use of personal response systems (PRS). PRS may be used to develop an active learning environment. The advantage of using PRS is that they increase engagement, encourage peer-to-peer instruction, and allow the instructor to target lectures as a result of the immediate feedback. A leading expert on peer-to-peer instruction in classrooms is Eric Mazur of Harvard University. Here is an article on using peer-to-peer instruction. Although Mazur is a professor of physics, his methods apply to an Introductory Data Science course as well. Here are some YouTube videos of Dr. Mazur’s thoughts:
Allow for more in-depth assignments that utilize large data frames. For example, provide a large data frame and ask students to determine the best explanatory variable for a given response variable based on visualizations. Then have them build a regression or nonparametric model (e.g., a decision tree) to model and assess the relationship.
Consider the use of course management systems that algorithmically generate homework problems for students. These systems typically provide instant feedback to students along with learning aids that help students learn how to solve statistical problems.
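For the in-depth assignment suggested above, the nonparametric option can be introduced with the simplest possible decision tree: a single split on one explanatory variable. The sketch below is an illustration under that assumption, not a full tree algorithm; it searches for the threshold that minimizes the total squared error when each side of the split is predicted by its own mean. The data and the function name `best_split` are hypothetical.

```python
def best_split(x, y):
    """Find the threshold on x that minimizes squared error of the two group means.

    Returns (threshold, total_error, left_mean, right_mean).
    Assumes x contains at least two distinct values.
    """
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    distinct = sorted(set(x))
    # Candidate thresholds are midpoints between consecutive distinct x values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [b for a, b in zip(x, y) if a <= threshold]
        right = [b for a, b in zip(x, y) if a > threshold]
        error = sse(left) + sse(right)
        if best is None or error < best[1]:
            best = (threshold, error, sum(left) / len(left), sum(right) / len(right))
    return best

# Hypothetical data with an obvious jump in the response.
x = [1, 2, 3, 4, 10, 11, 12, 13]
y = [5, 6, 5, 6, 20, 21, 19, 20]
threshold, error, left_mean, right_mean = best_split(x, y)
print(f"Split at x <= {threshold}: predict {left_mean} on the left, {right_mean} on the right")
```

Comparing this one-split model with a least-squares line on the same data gives students a concrete way to assess which model describes the relationship better.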
Summative Assessment
This assessment method can be a challenge for instructors who do not have access to labs with computers. However, there are techniques that may be utilized in this environment even without the use of hand-held technology.
Use screen captures of data science output and ask students to interpret the results. For example, provide output of a least-squares regression application and ask a variety of questions about it, such as (i) state the correlation coefficient, (ii) state the least-squares regression equation, (iii) make predictions, (iv) determine whether a particular observation is above or below average for a particular value of the explanatory variable, and so on. The instructor could also provide graphs such as residual plots for assessing the linear model (outliers, appropriateness of the linear model, and so on).
Provide multiple screen captures. Require the student to identify the correct output before interpreting results.
Additional Resources
Annotated bibliography
Examples
Assessments