R Weekly 2017 Issue 9
Highlight
-
Fitting logistic regression on 100gb dataset on a laptop - Lessons learned from “Outbrain Click Prediction” kaggle competition
-
Factoextra R Package: Easy Multivariate Data Analyses and Elegant Visualization
R in the Real World
-
Who is Alan Turing? - Last September, the British government announced its intention to pursue what has become known as the Alan Turing law.
-
Putting It All Together - The U.S. labor force participation rate (LFPR) is an oft-overlooked and under- or mis-reported economic indicator.
-
Fitting logistic regression on 100gb dataset on a laptop - Lessons learned from “Outbrain Click Prediction” kaggle competition
-
First commit or initial commit? - Today I used the gh package to get first commits of all repositories of the ropensci and ropenscilabs organizations.
-
Predicting food preferences with sparklyr (machine learning)
Insights
-
On Watering Holes, Trust, Defensible Systems and Data Science Community Security - How to install R packages securely
R Internationally
R in Organizations
-
Prophet: How Facebook operationalizes time series forecasting at scale
-
neuroconductor - Neuroconductor is an open-source platform for rapid testing and dissemination of reproducible computational imaging software.
R in Academia
Tutorials
-
The Zero Bug - Common data aggregation tools often cannot “count to zero” from examples, and this causes problems.
Videos and Podcasts
-
NSSD 33 - Big Time Bubble Time - About the interview process for data science and data analyst jobs, opinionated analysis development, and what job interviews were like back in 1999.
Resources
- RStudio Extensions - An R Markdown website that documents the various ways users can extend the RStudio IDE.
New Packages and Tools
- Prophet - Prophet is a forecasting procedure implemented in R and Python. It is fast and provides completely automated forecasts that can be tuned by hand by data scientists and analysts.
-
startup - Friendly R startup configuration with multiple files under .Rprofile.d/ and .Renviron.d/ that can be conditionally included / excluded based on their filenames and R features available.
-
ggimage - Supports aesthetic mapping of image files to be visualized in ‘ggplot2’ graphic system.
-
SentimentAnalysis - Dictionary-based sentiment analysis
-
strcode - Structure and abstract your code. The strcode package contains tools to organize your code better. It consists of an RStudio Add-in to divide code into sections and a function to get a summary of a codebase.
-
ggraph - A grammar of graphics for relational data
- RcppMLPACK2 - RcppMLPACK2 and the MLPACK Machine Learning Library
New Releases
-
future 1.3.0 - Unified Parallel and Distributed Processing in R for Everyone.
-
leaflet 1.1.0 - interactive maps for R
-
RPushbullet 0.3.1 - RPushbullet is an R client for the wonderful Pushbullet messaging / notification system.
-
padr 0.2.0 - padr::pad now does group padding
-
rmdformats 0.3.2 - The goal is to produce clean documents “out of the box”, with or without the RStudio IDE.
R Project Updates
Updates from R Core:
-
Encoding name "utf8" is mapped to "UTF-8". Many implementations of iconv accept "utf8", but not GNU libiconv (including the current version 1.15).
-
(C-level Native routine registration.) The undocumented styles field of the components of R_CMethodDef and R_FortranMethodDef is deprecated.
-
Fix for cairo_pdf() (and svg() and cairo_ps()) when replaying saved display list that contains mix of grid and graphics output. Thanks to Yihui Xie.
-
(C-level Native routine registration.) The deprecated styles component of R_CMethodDef and R_FortranMethodDef no longer does anything.
-
sessionInfo() shows the full paths to the library or executable files providing the BLAS/LAPACK implementations currently in use (not available on Windows).
-
grep(perl = TRUE) and friends can now make use of PCRE’s Just-In-Time mechanism, for PCRE >= 8.20 on platforms where JIT is supported. It is used by default whenever the pattern is studied, which by default requires an input x of length at least 10. (Based on a patch from Mikko Korpela.) This is controlled by a new option PCRE_use_JIT.
-
There is a new option PCRE_study which controls when grep(perl = TRUE) and friends ‘study’ the compiled pattern.
-
The deprecated support for PCRE versions older than 8.20 will be removed in R 3.4.1. (Versions 8.20-8.31 will still be accepted but deprecated.)
-
grep(perl = TRUE) and friends set a maximal recursion limit, taking into account R’s estimate of the remaining C stack space. This reduces the chance of C stack overflow, but because it is conservative may return a non-match with a warning in examples that succeeded before. (PR#16757)
-
The binning algorithm used by bandwidth selectors bw.ucv(), bw.bcv() and bw.SJ() switches to a version linear in the input size n for n > nb/2. (The calculations are the same, but for large n/nb it is worth doing the binning in advance.)
-
R CMD Rd2pdf had problems with packages with non-ASCII titles in .Rd files (usually the titles were omitted).
-
The internal methods of download.file() and url() now report that they cannot follow this (rather than failing silently).
-
(Unix-alike) download.file(method = "auto") (the default) re-tries with method = "libcurl".
-
(Unix-alike) url(method = "default") with an explicit open argument re-tries with method = "libcurl". This covers many of the usages, e.g. readLines() with a URL argument.
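As a rough illustration of the PCRE-related items above, the sketch below sets the two new options before calling a grep-family function with perl = TRUE. The option names PCRE_use_JIT and PCRE_study come from the update notes; the sample vector, the pattern, and the chosen option values are assumptions made up for this example. On R versions that predate these options, setting them is harmless but has no effect.

```r
# Minimal sketch: toggling the PCRE options mentioned in the R Core updates.
# The input data and pattern are illustrative, not taken from the R sources.

# Remember the previous values so they can be restored afterwards.
old <- options(
  PCRE_use_JIT = TRUE,  # request PCRE's Just-In-Time compiler where supported
  PCRE_study   = 10     # study the pattern when matching at least 10 strings
)

x <- c("apple pie", "banana split", "cherry tart")

# perl = TRUE routes the match through PCRE, where the options above apply.
hits <- grepl("^(a|b)\\w+", x, perl = TRUE)
print(hits)

# Report which PCRE build this R session is linked against.
print(extSoftVersion()["PCRE"])

# Restore the options that were in place before.
options(old)
```

Whether JIT actually speeds things up depends on the pattern and the size of the input, so it may be worth timing both settings on real data before relying on either.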
Upcoming Events
-
R/Finance 2017 May 19 and 20, 2017
From the inaugural conference in 2009, the annual R/Finance conference in Chicago has become the primary meeting for academics and practitioners interested in using R in Finance.
-
useR! 2017 July 4, 2017
The annual useR! conference is the main meeting of the international R user and developer community.
More past events at R conferences & meetups.
Quotes of the Week
Leo Tolstoy on the #tidyverse and #rstats pic.twitter.com/KGN36yaQM4
— Sean Kross (@seankross)
Max Kuhn: "At @rstudio about 30% of our work is choosing gifs and naming functions" #rstats
— Emily Robinson (@robinson_es)