
Simon P. Couch

statistics, software, sociology



About Me

I'm currently wrapping up my Bachelor's in math/statistics (minor in sociology) at Reed College and maintaining a few R packages.

Most recently, I've been focusing on developing {stacks}, an R package for stacked ensemble modeling that aligns with the {tidymodels} ecosystem. This began as an internship project at RStudio, and is now the focus of my undergraduate thesis.

I care about data science, statistics, and the thoughtful application of these practices in enabling and articulating social arguments.

When I'm not working, I enjoy gardening, cooking, and spending time with friends and family!


Some Projects I Work On

A Grammar for Stacked Ensemble Modeling

Most recently, I've been working on {stacks}, an R package for stacked ensemble modeling that closely aligns with functionality from the rest of the {tidymodels} package ecosystem. This began as an internship project at RStudio and is now the focus of my undergraduate thesis.

source code

Convert Statistical Analysis Objects to Tidy Tibbles

I co-author and co-maintain an R package called {broom}, which provides methods to represent common statistical outputs in R (namely, model objects) as rectangular data.

learn more // source code

Emphasizing Intuition in Hypothesis Testing

I co-maintain an R package called {infer}, which provides a framework for hypothesis testing that aligns with tidy data principles and emphasizes the underlying intuition behind "statistical significance."

learn more // source code

An R API for Flights Data

I maintain the R package {anyflights}, which allows users to query datasets similar to those found in the canonical {nycflights13} data package for any recent year and US airport.

learn more // source code

Making Bikeshare Data Accessible

I maintain and co-author the R package {gbfs}, which provides a suite of tools to easily interface with live bikeshare data from hundreds of cities worldwide.

learn more // source code

An Alternative to Rankings-Based College Exploration

Newsmagazine college rankings like those provided by U.S. News are unhelpful for prospective students and reinforce existing status obsession. As an alternative, we developed a data-centered tool that focuses on what makes a college similar to, not better than, another college.

learn more // source code


Research Projects

Differentially Private Nonparametric Hypothesis Testing

Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security - London, United Kingdom

Our group developed a new class of privacy-preserving hypothesis tests, largely analogous to the nonparametric tests of conventional statistics, that drastically improves on all existing differentially private tests intended for the same use cases.

Simon P. Couch, Zeki Kazan, Kaiyan Shi, Andrew Bray, Adam Groce

official publication // source code

Encouraging Equitable Bikeshare

Using my R package {gbfs} to collect the latitudes and longitudes of over 45,000 bikes across the country, and joining that data with census demographic data, this project set out to determine which characteristics of a community are most salient in understanding the spatial distribution of bikes, and how these interact with different operating models of bikeshare programs. A preliminary report on this research tied for second place in the ASA-sponsored “Undergraduate Statistics Research Project Competition.”

Advised by Heather Kitada Smalley, Ph.D.

working paper // source code

Gendering Code Documentation

Through interviews with women in statistical computing, as well as quantitative analyses utilizing web scraping and text mining methods, I argue that code documentation in the tidyverse is written more effectively and inclusively than other R code documentation, and that this is meaningful for understanding gender representation in the user community.

Advised by Kjersten Whittington, Ph.D.

working paper // source code

Social Divisions in Data

I argue that, rather than viewing datasets as unargumentative, sterile, or “raw,” datasets express and reinforce the social conceptualizations held by their authors. More specifically, in this paper, I examine how sex, gender, race, and ethnicity are named and encoded as variables in data, and how this process is patterned by identities held by the datasets’ authors.

Advised by Kjersten Whittington, Ph.D.

working paper // source code

Race, Differential Privacy, and the 2020 U.S. Census

This project examines the racialized implications of the U.S. Census Bureau's decision to implement differentially private count estimation algorithms to estimate subpopulation sizes in the 2020 U.S. Census.

Advised by Yaejoon Kwon, Ph.D.

working paper // source code


Professional Experience

...In Industry


Intern - RStudio
Remote // May 2020 - July 2020

implemented new functionality, addressed bug reports and pull requests, improved documentation, and increased unit testing coverage for {tidymodels} R software packages


Analytics Intern - John Deere
Olathe, Kansas // May 2019 - August 2019

automated data integrity checks, consulted on third-party data purchases, reported on survey panel research, and developed geospatial models on customer mobility


Data Analysis Intern - Kartini Clinic for Children & Families
Portland, OR // January 2018

analyzed, interpreted, and modeled financial and pharmacogenetic data, conducted company-wide presentations to communicate notable findings, and advised future data collection strategies at this adolescent eating disorder clinic

...On-Campus


Data Science Consultant
Instructional Technology Services
January 2019 - Present

advise on and implement programs in R and other data technologies to catalyze research efforts by faculty and student researchers


Teaching Assistant
Assistant Professor of Statistics Kelly McConville
August 2019 - Present

lecture and host lab sessions for courses in computational statistics, and foster a positive and inclusive environment in what is many students' first experience with coding and analytics


R Developer
Associate Professor of Statistics Andrew Bray
December 2017 - May 2020

wrote, optimized, built tests for, and maintained expressive, concise, and computationally efficient code to assist R users in real-world problem solving


House Adviser
Reed College Residence Life
August 2018 - May 2020

maintained close relationships with each of 15 to 20 residents yearly, planned times to come together as a group, and encouraged proactive conflict resolution


Peer Career Adviser
Center for Life Beyond Reed
August 2018 - May 2019

advised students one-on-one on various professional development pursuits and coordinated in-residence workshops and information sessions, leading the Research, Technology, and Innovation Community of Purpose

Skills


R (advanced)

Git & GitHub (intermediate)

Python (intermediate)

SQL (intermediate)

ArcGIS Pro (intermediate)

Tableau (intermediate)

LaTeX (intermediate)

CSS (basic)

HTML (basic)

Honors & Awards


Goldwater Scholar
Barry Goldwater Scholarship & Excellence in Education Foundation // April 2020
"[O]ne of the oldest and most prestigious national scholarships in the natural sciences, engineering and mathematics in the United States... [awarded to students] who show exceptional promise of becoming this Nation’s next generation of research leaders."


2nd Place (tied)—Undergraduate Statistics Research Project Competition
American Statistical Association // September 2019
A competition aiming to recognize research completed by undergraduates with exceptional significance, originality, and clarity of presentation, the second prize for which was awarded to “Encouraging Equitable Bikeshare: Implications of Docked and Dockless Models for Spatial Equity.”


Commendation for Excellence
Reed College President’s Office // June 2018, 2019
Awarded annually to the top 5 percent of each class, based on school-year GPA.


1st Place (tied)—Undergraduate Statistics Research Project Competition
American Statistical Association // October 2018
A competition aiming to recognize research completed by undergraduates with exceptional significance, originality, and clarity of presentation, the first prize for which was awarded to “A Differentially Private Wilcoxon Signed-Rank Test.”


All-State Academic Scholar
Lawrence Journal-World // June 2017
An honor “meant to recognize the most promising high school senior [in Kansas]… based on their strong academics, extracurricular involvement, and essays.”


Contact Me

I'm always glad to talk data science, statistics, and carrying more inclusive masculinities in these settings.

Feel free to reach out!

Also, connect with me. :)