Written by: Lauren Wolfe - Training Specialist
As we’ve been reassessing our work here at the Coop, we have found that there is a need to facilitate more interaction between the people who make up our community. It makes sense - we no longer have easy access to a quick catch-up in the hallways or over lunch on campus, and many of us have settled into our new socially distant lifestyle. This is why the Coop Blog is going to shift some focus back onto our community by highlighting members and their data-related work. For our first community highlight we’re showcasing Ellis and his TidyX screencast!
Ellis is a self-described bioengineer turned data scientist who works as a statistical programmer within the Statistical Center for HIV/AIDS Research and Prevention (SCHARP) here at Fred Hutch. Folks who’ve spent a lot of time around the Coop or the R community in general will likely already be familiar with him, as he’s a regular user of the Coop Slack channel and has hosted various Coop community groups such as the data visualization group. Outside of his work at Fred Hutch, Ellis is an active member of the larger R community, contributing to open-source projects and organizing the local Seattle UseR group.
To keep his skills sharp, Ellis regularly takes part in Tidy Tuesday, a project out of the R for Data Science Online Learning Community. The goal of Tidy Tuesday is to provide a constructive and supportive space for new and seasoned R users alike to practice their data wrangling and visualization skills. The premise is that each week a dataset is shared via the TidyTuesday GitHub repository. The dataset has generally been “tamed”, meaning that some cleaning has been done, but it doesn’t yet meet the standards of being “tidy”. Tidy data is a term coined by Hadley Wickham. Its principles provide a standard way of organizing the values in a dataset to facilitate analysis: each variable is a column, each observation is a row, and each value is a cell. Participants are asked to use techniques from the [R for Data Science book](https://r4ds.had.co.nz/) to explore and tidy the dataset, and finally to create a visualization to share with the R community on Twitter using the hashtag #TidyTuesday.
Having participated in Tidy Tuesday and seen the benefits, Ellis wanted to know why more R users were not taking advantage of this free, weekly event! To get some answers, he conducted a Twitter poll asking what kept folks from taking part in Tidy Tuesday. The top two answers were a lack of time and, for newcomers, feeling nervous and not knowing where to start. While he couldn’t really change the fact that some folks don’t have time to participate, he definitely could find a way to lower the barrier to entry for Tidy Tuesday. And so TidyX was born!
TidyX is a screencast with the goal of providing concrete examples of how to interact with a Tidy Tuesday dataset. In each episode Ellis and his co-host, Patrick Ward, work through the code of a Tidy Tuesday submission from a previous week. Code review is a great way to get ideas on how to explore a dataset and learn about new methods for data wrangling and visualization. Then Ellis shows viewers how the same code can be applied to a different dataset, to emphasize how code can be reused between datasets with similar features. Check out the most recent TidyX, where Ellis and Patrick work through some Tidy Tuesday code with David Robinson, a seasoned R stats user and Tidy Tuesday participant!
To stay up to date with the latest and greatest, you can follow TidyX on Twitter at @tidy_explained.
For more R stats content from Ellis, you can follow him on Twitter at @ellis_hughes. You’ll come for the Tidy Tuesday posts but stay for the GitHub poetry. You can also check out Ellis’s work on GitHub at @thebioengineer. Of specific interest might be his tidytuesdayR package, which allows you to download Tidy Tuesday datasets without ever leaving the R console.
We are big fans of Tidy Tuesday here at the Coop and highly recommend giving it a try to test your data wrangling and visualization skills in R! And now, if you need some inspiration, there is the TidyX screencast to guide the way! In the past, Coop community members took part in Tidy Tuesday through the Data Visualization community group. While that group is on hiatus due to COVID-19, we still encourage anyone taking part in Tidy Tuesday to post their work in the #data-viz channel on the Coop Slack to share it and get feedback.
If you or someone you know is working on a project that you think should be highlighted - let us know by reaching out on Slack, MS Teams, or by emailing