R Weekly 2017 Issue 9
Highlight
- Fitting logistic regression on 100gb dataset on a laptop (dsnotes.com) - Lessons learned from the “Outbrain Click Prediction” Kaggle competition
- Finding Radiohead’s most depressing song, with R (rcharlie.com)
- How to Teach R: Common mistakes (rstudio.com)
- Factoextra R Package: Easy Multivariate Data Analyses and Elegant Visualization (sthda.com)
R in the Real World
- Who is Alan Turing? (fronkonstin.com) - Last September, the British government announced its intention to pursue what has become known as the Alan Turing law.
- Putting It All Together (rud.is) - The U.S. labor force participation rate (LFPR) is an oft-overlooked and under- or mis-reported economic indicator.
- H-1B Visa Petitions Exploratory Data Analysis (nycdatascience.com)
- #AskNASA: What’s the Optimal Time for Aliens to Invade Earth? (exactness.net)
- How Herd Immunity Works [OC] (reddit.com)
- Fitting logistic regression on 100gb dataset on a laptop (dsnotes.com) - Lessons learned from the “Outbrain Click Prediction” Kaggle competition
- Finding Radiohead’s most depressing song, with R (rcharlie.com)
- First commit or initial commit? (maelle.github.io) - Today I used the gh package to get first commits of all repositories of the ropensci and ropenscilabs organizations.
- coauthorship and citation networks (xianblog.wordpress.com)
- SatRday and visual inference of vine copulas (eighty20.co.za)
- Predicting food preferences with sparklyr (machine learning) (shiring.github.io)
Insights
- Reporting in a Repeatable, Parameterised, Transparent Way (ouseful.info)
- How to Teach R: Common mistakes (rstudio.com)
- The difference between R and Excel (revolutionanalytics.com)
- On Watering Holes, Trust, Defensible Systems and Data Science Community Security (rud.is) - How to install R packages securely
- rxNeuralNet vs. xgBoost vs. H2O (tomaztsql.wordpress.com)
R Internationally
R in Organizations
- Prophet: How Facebook operationalizes time series forecasting at scale (revolutionanalytics.com)
- neuroconductor (neuroconductor.org) - Neuroconductor is an open-source platform for rapid testing and dissemination of reproducible computational imaging software.
R in Academia
- Free DataCamp for your Classroom (datacamp.com)
Tutorials
- Training Neural Networks with MXNet (jakubglinka.com)
- Make Power Fun (Again?) (educate-r.org)
- Factor Analysis with the Principal Factor Method and R (aaronschlegel.com)
- Is my time series additive or multiplicative? (itsalocke.com)
- Factoextra R Package: Easy Multivariate Data Analyses and Elegant Visualization (sthda.com)
- The Zero Bug (win-vector.com) - Common data aggregation tools often cannot “count to zero” from examples, and this causes problems.
- Bar bar plots but not Babar plots (maelle.github.io)
- Mapping Biodiversity data on smaller than one degree scale (vijaybarve.wordpress.com)
- Quick tip: knitr Python Windows setup checklist (itsalocke.com)
- Part 3: Spatial analysis of geotagged data (seascapemodels.org)
- Raccoon Ch 2.5 – Unbalanced and Nested Anova (quantide.com)
- How to make a global map in R, step by step (sharpsightlabs.com)
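The Zero Bug item in the Tutorials list says that aggregation tools often cannot “count to zero”. Below is a small base-R illustration of the problem. It is not the post's own code. The observations are made up, and category "c" is assumed to be a valid category that simply never occurs in them.

```r
# Hypothetical observations: category "c" is valid but never observed.
obs <- c("a", "b", "a", "a")

# Counting the raw character vector only reports categories that occur,
# so "c" is silently missing rather than shown as 0.
print(table(obs))

# Declaring the full set of levels up front lets table() count to zero.
counts <- table(factor(obs, levels = c("a", "b", "c")))
print(counts)
```

The fix here is to make the set of possible categories explicit, for example as factor levels, so that empty groups survive aggregation.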
Videos and Podcasts
- R Consortium ISC Project Status Webinar (r-consortium.org)
- NSSD 33 - Big Time Bubble Time (soundcloud.com) - About the interview process for data science and data analyst jobs, opinionated analysis development, and what job interviews were like back in 1999.
Resources
- RStudio Extensions (rstudio.github.io) - An R Markdown website that documents the various ways users can extend the RStudio IDE.
New Packages and Tools
- Prophet (facebookincubator.github.io) - Prophet is a forecasting procedure implemented in R and Python. It is fast and provides completely automated forecasts that can be tuned by hand by data scientists and analysts.
- startup (cran.r-project.org) - Friendly R startup configuration with multiple files under .Rprofile.d/ and .Renviron.d/ that can be conditionally included or excluded based on their filenames and the R features available.
- ggimage (cran.r-project.org) - Supports aesthetic mapping of image files to be visualized in the ‘ggplot2’ graphic system.
- SentimentAnalysis (rblog.uni-freiburg.de) - Dictionary-based sentiment analysis
- strcode (lorenzwalthert.github.io) - Structure and abstract your code. The strcode package contains tools to organize your code better. It consists of an RStudio Add-in to divide code into sections and a function to get a summary of a codebase.
- More January Package Picks (rstudio.com)
- ggraph (data-imaginist.com) - A grammar of graphics for relational data
- RcppMLPACK2 (gallery.rcpp.org) - RcppMLPACK2 and the MLPACK Machine Learning Library
New Releases
- future 1.3.0 (jottr.org) - Unified Parallel and Distributed Processing in R for Everyone
- leaflet 1.1.0 (rstudio.org) - Interactive maps for R
- RPushbullet 0.3.1 (dirk.eddelbuettel.com) - RPushbullet is an R client for the wonderful Pushbullet messaging / notification system.
- R Tools for Visual Studio 1.0 Preview (revolutionanalytics.com)
- padr 0.2.0 (edwinth.github.io) - padr::pad now does group padding
- rmdformats 0.3.2 (github.com) - The goal is to produce clean documents “out of the box”, with or without the RStudio IDE.
R Project Updates
Updates from R Core (developer.r-project.org):
- Encoding name "utf8" is mapped to "UTF-8". Many implementations of iconv accept "utf8", but not GNU libiconv (including the current version 1.15).
- (C-level native routine registration.) The undocumented styles field of the components of R_CMethodDef and R_FortranMethodDef is deprecated.
- Fix for cairo_pdf() (and svg() and cairo_ps()) when replaying a saved display list that contains a mix of grid and graphics output. Thanks to Yihui Xie.
- (C-level native routine registration.) The deprecated styles component of R_CMethodDef and R_FortranMethodDef no longer does anything.
- sessionInfo() shows the full paths to the library or executable files providing the BLAS/LAPACK implementations currently in use (not available on Windows).
- grep(perl = TRUE) and friends can now make use of PCRE’s Just-In-Time mechanism, for PCRE >= 8.20 on platforms where JIT is supported. It is used by default whenever the pattern is studied, which by default requires an input x of length at least 10. (Based on a patch from Mikko Korpela.) This is controlled by a new option PCRE_use_JIT.
- There is a new option PCRE_study which controls when grep(perl = TRUE) and friends study the compiled pattern.
- The deprecated support for PCRE versions older than 8.20 will be removed in R 3.4.1. (Versions 8.20-8.31 will still be accepted but deprecated.)
- grep(perl = TRUE) and friends set a maximal recursion limit, taking into account R’s estimate of the remaining C stack space. This reduces the chance of C stack overflow, but because the limit is conservative it may return a non-match with a warning in examples that succeeded before. (PR#16757)
- The binning algorithm used by the bandwidth selectors bw.ucv(), bw.bcv() and bw.SJ() switches to a version linear in the input size n for n > nb/2. (The calculations are the same, but for large n/nb it is worth doing the binning in advance.)
- R CMD Rd2pdf had problems with packages with non-ASCII titles in .Rd files (usually the titles were omitted).
- The internal methods of download.file() and url() now report that they cannot follow a redirect (rather than failing silently).
- (Unix-alike) download.file(method = "auto") (the default) re-tries with method = "libcurl".
- (Unix-alike) url(method = "default") with an explicit open argument re-tries with method = "libcurl". This covers many of the usages, e.g. readLines() with a URL argument.
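The PCRE items in the list above introduce two options, PCRE_study and PCRE_use_JIT. Like any other option, they can be set with options(). The sketch below assumes R 3.4.0 or later; older versions do not consult these options, so setting them there has no effect. The vector x and the pattern "an+" are made up for illustration.

```r
# Hypothetical input: ten strings, so length(x) reaches the default
# study threshold described above (an input of length at least 10).
x <- c("apple", "banana", "cherry", "date", "elderberry",
       "fig", "grape", "honeydew", "kiwi", "lemon")

# Inspect the current settings (NULL on R versions without these options).
print(getOption("PCRE_study"))
print(getOption("PCRE_use_JIT"))

# Default settings: the pattern is studied and, where supported,
# JIT-compiled before matching.
hits_default <- grep("an+", x, perl = TRUE)

# Comparison run with studying and JIT switched off. options() returns
# the previous values, so they can be restored afterwards.
old <- options(PCRE_use_JIT = FALSE, PCRE_study = FALSE)
hits_plain <- grep("an+", x, perl = TRUE)
options(old)

# These options are meant to change speed, not which elements match.
stopifnot(identical(hits_default, hits_plain))
print(hits_default)
```

The old <- options(...); ...; options(old) pattern leaves the user's own settings untouched. On ten short strings any timing difference is negligible. Studying and JIT compilation are intended to pay off when one pattern is matched against many strings.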
Upcoming Events
- R/Finance 2017 (rinfinance.com), May 19 and 20, 2017
Since the inaugural conference in 2009, the annual R/Finance conference in Chicago has become the primary meeting for academics and practitioners interested in using R in finance.
- useR! 2017 (user2017.brussels), July 4, 2017
The annual useR! conference is the main meeting of the international R user and developer community.
More past events at R conferences & meetups (conf.rweekly.org).
Quotes of the Week
Leo Tolstoy on the #tidyverse and #rstats pic.twitter.com/KGN36yaQM4
— Sean Kross (@seankross)
Max Kuhn: "At @rstudio about 30% of our work is choosing gifs and naming functions" #rstats
— Emily Robinson (@robinson_es)