Simon P. Couch

statistics, software, sociology

About Me


I'm currently a software engineer on the {tidymodels} team at RStudio, maintaining R software packages like {broom}, {stacks}, and {infer}.

I care about free and open source statistical software that reckons with sociological understandings of inequality in its design.

When I'm not working, I enjoy cooking, hanging with my dog Millie, and playing folk music.

Some Projects I Work On

A Grammar for Stacked Ensemble Modeling

Most recently, I've been working on {stacks}, an R package for stacked ensemble modeling that closely aligns with functionality from the rest of the {tidymodels} package ecosystem. It began as an internship project at RStudio and later became the focus of my undergraduate thesis. {stacks} was awarded the 2021 John M. Chambers Statistical Software Award.

learn more // source code

Convert Statistical Analysis Objects to Tidy Tibbles

I co-author and maintain an R package called {broom}, which provides methods to represent common statistical outputs in R (namely, model objects) as rectangular data.

learn more // source code

Emphasizing Intuition in Hypothesis Testing

I co-author and co-maintain an R package called {infer}, which provides a framework for hypothesis testing that aligns with tidy data principles and emphasizes the underlying intuition behind "statistical significance."

learn more // paper // source code

Methods for Forest Ecological Modeling

I've co-authored an R package, {forestecology}, providing methods for model fitting and assessment in forest ecology. Specifically, the package implements model wrappers for use on spatially mapped, repeat-censused forest plots to estimate species-specific competition coefficients.

learn more // source code

An R API for Flights Data

I maintain the R package {anyflights}, which allows users to query datasets similar to those found in the canonical {nycflights13} data package for any recent year and US airport.

learn more // source code

Making Bikeshare Data Accessible

I maintain and co-author the R package {gbfs}, providing a suite of tools to easily interface with live bikeshare data from hundreds of cities worldwide.

learn more // source code

An Alternative to Rankings-Based College Exploration

Newsmagazine college rankings like those provided by U.S. News are unhelpful for prospective students and reinforce existing status obsession. As an alternative, we developed a data-centered tool that focuses on what makes a college similar to, not better than, another college.

learn more // source code

Research Projects

Differentially Private Nonparametric Hypothesis Testing

Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security - London, United Kingdom

Our group developed a new class of privacy-preserving hypothesis tests, largely analogous to the nonparametric tests of conventional statistics, that drastically improves on all existing differentially private tests intended for the same use cases.

Simon P. Couch, Zeki Kazan, Kaiyan Shi, Andrew Bray, Adam Groce

official publication // source code

Encouraging Equitable Bikeshare

Using my R package {gbfs} to scrape the latitudes and longitudes of over 45,000 bikes across the country and joining that data with census demographic data, this project set out to determine which characteristics of a community are most salient in understanding the spatial distribution of bikes, and how these interact with different operating models of bikeshare programs. A preliminary report on this research tied for second place in the ASA-sponsored “Undergraduate Statistics Research Competition.”

Advised by Heather Kitada Smalley PhD

working paper // source code

Gendering Code Documentation

Through interviews with women in statistical computing, as well as quantitative analyses utilizing web scraping and text mining methods, I argue that code documentation in the tidyverse is written more effectively and inclusively than other R code documentation, and that this is meaningful for understanding gender representation in the user community.

Advised by Kjersten Whittington PhD

working paper // source code

Social Divisions in Data

I argue that, rather than viewing datasets as unargumentative, sterile, or “raw,” datasets express and reinforce the social conceptualizations held by their authors. More specifically, in this paper, I examine how sex, gender, race, and ethnicity are named and encoded as variables in data, and how this process is patterned by identities held by the datasets’ authors.

Advised by Kjersten Whittington PhD

working paper // source code

Race, Differential Privacy, and the 2020 U.S. Census

This project examines the racialized implications of the U.S. Census Bureau's decision to implement differentially private count estimation algorithms to estimate subpopulation sizes in the 2020 U.S. Census.

Advised by Yaejoon Kwon PhD

working paper // source code

Professional Experience

...In Industry

Software Engineer - RStudio, PBC
Remote // April 2022 - Present

developing and maintaining R software packages for statistical modeling

Software Engineer (Contract) - RStudio, PBC
Remote // May 2021 - July 2021

introduced support for multiple regression and unified hypothesis testing interfaces in the {infer} package, culminating in the first production release of the software

Intern - RStudio, PBC
Remote // May 2020 - July 2020

led a major update of the {broom} R package and began development on {stacks}, a package for {tidymodels}-aligned stacked ensemble modeling

Analytics Intern - John Deere
Olathe, Kansas // May 2019 - August 2019

automated data integrity checks, consulted on third-party data purchases, reported on survey panel research, and developed geospatial models on customer mobility

Data Analysis Intern - Kartini Clinic for Children & Families
Portland, OR // January 2018

analyzed, interpreted, and modeled financial and pharmacogenetic data, gave company-wide presentations to communicate notable findings, and advised on future data collection strategies at this adolescent eating disorder clinic


Data Science Consultant
Instructional Technology Services
January 2019 - May 2021

advised on and implemented programs in R and other data technologies to catalyze research efforts by faculty and student researchers

Teaching Assistant
Assistant Professor of Statistics Kelly McConville
August 2019 - May 2021

hosted lab sessions for courses in computational statistics and fostered a positive and inclusive environment in what is many students' first experience with coding and analytics

R Developer
Associate Professor of Statistics Andrew Bray
December 2017 - May 2021

wrote, optimized, built tests for, and maintained expressive, concise, and computationally efficient code for several R packages

House Adviser
Reed College Residence Life
August 2018 - May 2020

maintained close relationships with each of 15 to 20 residents yearly, planned times to come together as a group, and encouraged proactive conflict resolution

Peer Career Adviser
Center for Life Beyond Reed
August 2018 - May 2019

advised students one-on-one on various professional development pursuits and coordinated in-residence workshops and information sessions, leading the Research, Technology, and Innovation Community of Purpose


R (advanced)

Data Pedagogy (RStudio-certified)

Git, GitHub, & Actions (intermediate)

Python (intermediate)

SQL (intermediate)

ArcGIS Pro (intermediate)

Tableau (intermediate)

LaTeX (intermediate)

CSS (basic)

HTML (basic)

Honors & Awards

Graduate Research Fellowship
National Science Foundation // May 2021
A $138,000 grant "recogniz[ing] and support[ing] outstanding graduate students in NSF-supported STEM disciplines" that supported my time as a Biostatistics student at the Johns Hopkins Bloomberg School of Public Health.

John M. Chambers Statistical Software Award
American Statistical Association // January 2021
My R package {stacks} won this prize awarded for the "development and implementation of computational tools for the statistical profession by a graduate or undergraduate student."

Goldwater Scholar
Barry Goldwater Scholarship & Excellence in Education Foundation // April 2020
"[O]ne of the oldest and most prestigious national scholarships in the natural sciences, engineering and mathematics in the United States... [awarded to students] who show exceptional promise of becoming this Nation’s next generation of research leaders."

2nd Place (tied)—Undergraduate Statistics Research Project Competition
American Statistical Association // September 2019
A competition aiming to recognize research completed by undergraduates with exceptional significance, originality, and clarity of presentation, the second prize for which was awarded to “Encouraging Equitable Bikeshare: Implications of Docked and Dockless Models for Spatial Equity.”

Commendation for Excellence
Reed College President’s Office // June 2018, 2019
Awarded annually to the top 5 percent of each class, based on school-year GPA.

1st Place (tied)—Undergraduate Statistics Research Project Competition
American Statistical Association // October 2018
A competition aiming to recognize research completed by undergraduates with exceptional significance, originality, and clarity of presentation, the first prize for which was awarded to “A Differentially Private Wilcoxon Signed-Rank Test.”

All-State Academic Scholar
Lawrence Journal-World // June 2017
An honor “meant to recognize the most promising high school senior [in Kansas]… based on their strong academics, extracurricular involvement, and essays.”

Contact Me

I'm always glad to talk data science, statistics, and carrying more inclusive masculinities in these settings.

Feel free to reach out!

Also, connect with me. :)