My rstudio::conf 2019 highlights

Event details

January 15-18, Austin, TX. A combination of 2-day workshops and 2 days of conference sessions.

The RStudio official repo with abstracts for every session, workshop, and e-poster: https://github.com/rstudio/rstudio-conf/tree/master/2019

All the conference session recordings: https://resources.rstudio.com/rstudio-conf-2019

Overall thoughts

This was my second rstudio::conf.

My biggest motivator continued to be getting exposed to different applications of R and the RStudio tool set.

I met people from all kinds of areas, including finance, education administration, insurance, life sciences, and other technology companies. It was a VERY diverse group of attendees.

Attendance was around 1700 and, based on a chart shown during the opening keynote, seems to have grown by at least 50% since last year.

The community is my favorite part of working with R and attending this conference. Everyone knew more than me but was extremely approachable and willing to help answer some of my questions.

It definitely felt like there were more and more people trying to figure out how and where to move R into production.

I do think the bulk of the value comes from the 2-day workshops. Workshop attendance is now large enough that you can meet a ton of people on breaks and during the socials. The breakout sessions during the conference are quick 20-minute overviews, which are useful but can also be viewed online.

That being said, I did purchase a "super early bird" pass for next year's conference in San Francisco.

Applied Machine Learning Course

All content (slides and code) available here: https://github.com/topepo/rstudio-conf-2019

I took the Applied Machine Learning course, led by Max Kuhn, author of Applied Predictive Modeling.

I just bought this book because, unlike much of the material out there, it appears to spend much more time on the practical aspects of the overall modeling process, and not just the FUN parts (analysis and visualization).

Max has a background in applying statistical analysis in the pharmaceutical, medical and manufacturing spaces.

Max now works for RStudio and is focused on creating packages to make modeling better and easier.

While a significant portion of the course went faster than I could absorb, it was a great use of time because I was exposed to a brand new set of tools and concepts to experiment with. Most of these are part of, or related to, the tidymodels suite of packages:

  • The parsnip package: One issue that makes using R cumbersome in general is that different functions that do the same thing often have different interfaces and arguments. Parsnip standardizes this for modeling-related functions, which should significantly reduce the amount of syntactical minutiae that needs to be memorized.

  • The tidypredict package: Translates a fitted model into SQL, so predictions can be run inside a database from R.

  • Multivariate adaptive regression splines (MARS) modeling: Allows you to apply linear models to non-linear data. Linear models are usually the easiest to understand and explain to stakeholders, but they are constrained in that they assume the thing you are modeling is linear. MARS gets around that constraint by fitting piecewise linear segments. MARS is implemented in the earth R package.

  • The rsample package: Makes it easy to bootstrap and to resample data into training and test sets.

  • The recipes package: I wasn't quite ready for this, but from what I learned it makes it easier to reuse the work done to pre-process data for future models. I expect to learn more about this when I advance my learning of feature engineering.

  • Max is working on a new book focused on the concept of feature engineering. One situation where this applies is when a model has been created and none of the predictors are especially effective at predicting the outcome. Applying transformations (log, exponential, etc.) to the predictors may improve predictive performance. The WIP version is available for FREE here: Feature Engineering and Selection: A Practical Approach for Predictive Models
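
To make a few of the packages above more concrete, here is a minimal sketch of how rsample, recipes, and parsnip might fit together. It is not taken from the course materials. The built-in mtcars data set, the choice of predictors, and the log transform are all assumptions picked purely for illustration, and the sketch assumes the rsample, recipes, parsnip, and earth packages are installed.

```r
library(rsample)
library(recipes)
library(parsnip)

set.seed(2019)

# rsample: hold out roughly 25% of the rows as a test set
split <- initial_split(mtcars, prop = 0.75)
train <- training(split)
test  <- testing(split)

# rsample can also build bootstrap resamples of the training data
boots <- bootstraps(train, times = 10)

# recipes: declare the pre-processing once, estimate it on the training
# data, then reapply exactly the same steps to any new data
# (log-transforming hp and disp is an illustrative assumption)
rec <- recipe(mpg ~ wt + hp + disp, data = train)
rec <- step_log(rec, hp, disp)
rec <- prep(rec, training = train)

train_baked <- bake(rec, new_data = train)
test_baked  <- bake(rec, new_data = test)

# parsnip: two different model types specified through the same interface
lm_spec   <- set_engine(linear_reg(), "lm")
mars_spec <- set_engine(mars(mode = "regression"), "earth")

lm_fit   <- fit(lm_spec,   mpg ~ ., data = train_baked)
mars_fit <- fit(mars_spec, mpg ~ ., data = train_baked)

# predict() returns a tibble with a .pred column for both models
lm_preds   <- predict(lm_fit,   new_data = test_baked)
mars_preds <- predict(mars_fit, new_data = test_baked)

# Root mean squared error on the held-out rows
rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))
rmse(test_baked$mpg, lm_preds$.pred)
rmse(test_baked$mpg, mars_preds$.pred)
```

The point of the sketch is that swapping a linear model for MARS only changes the model specification line. The splitting, pre-processing, fitting, and prediction calls stay the same, which is the kind of consistency parsnip is aiming for.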