Some interesting Data Science stuff found between 2018-02-01 and 2018-02-28.


https://www.youtube.com/watch?v=atiYXm7JZv0 - (by J.J. Allaire from RStudio) - Machine Learning with R and TensorFlow - a video introduction to Deep Learning in R.

#deeplearning #rstats #tensorflow #datascience https://t.co/W4SjSTBYQq


https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 - (by James Le) - a collection of some basic ML algorithms for newbies. Pictures in the article are pretty good.

#datascience #ml https://t.co/QGOliYhXgt


https://tensorflow.rstudio.com/blog/keras-customer-churn.html - (by Matt Dancho) - predicting customer churn using deep learning in R.

#rstats #keras #deeplearning https://t.co/01MBSSNc5G


https://towardsdatascience.com/recommender-engine-under-the-hood-7869d5eab072 - (by Venkat Raman) - some intuitions behind content-based and collaborative filtering recommender systems with a simple example in Python.

#datascience #recommenders #python https://t.co/IOA7dBlw8h


https://activewizards.com/blog/top-15-scala-libraries-for-data-science/ - (by Igor Bobriakov) - a collection of useful packages for doing data science in Scala (some machine learning and datavis).

#scala #datascience https://t.co/KGvMXfDPkJ


https://rviews.rstudio.com/2018/02/14/deep-learning-rstudio-conf-2018/ - (by Andrie de Vries) - summary of Deep Learning talks from rstudio::conf 2018 (RStudio conference).

#rstats #deeplearning #datascience https://t.co/5uvWcr5w6C


http://flowingdata.com/2016/03/22/comparing-ggplot2-and-r-base-graphics/ - (by Nathan Yau) - a comparison of base R graphics and ggplot2. The author seems to prefer good old base graphics - and there’s something in it…

#datavis #rstats #base #datascience https://t.co/knCktHImNq


http://www.win-vector.com/blog/2018/02/r-tip-use-seq_len-to-avoid-the-backwards-sequence-bug/ - (by John Mount) - small R tip - it’s better to use seq_len() rather than `1:n`, which counts backwards when n is 0.

#rstats #datascience https://t.co/lIC8at6OGV
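
The backwards-sequence problem from the seq_len() tip above can be seen in a minimal R sketch. The variable names are illustrative and not taken from the article:

```r
# When n is 0, 1:n counts down from 1 to 0 instead of being empty.
n <- 0
backwards <- 1:n      # holds the two values 1 and 0
safe <- seq_len(n)    # holds an empty integer vector

# A loop over 1:n therefore runs twice, while a loop over seq_len(n) never runs.
count_colon <- 0
for (i in 1:n) count_colon <- count_colon + 1

count_seq_len <- 0
for (i in seq_len(n)) count_seq_len <- count_seq_len + 1

print(length(backwards))
print(length(safe))
print(count_colon)
print(count_seq_len)
```

This matters most for loops over inputs that may be empty, such as a data frame with zero rows.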


https://deanattali.com/blog/shinyalert-package/ - (by Dean Attali) - simple modals for shiny applications.

#rstats #shiny #datascience https://t.co/fAcnQH4IfT


http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf - (by Martin Zinkevich) - Rules of Machine Learning: Best Practices for ML Engineering - a collection of “best practices” for Machine Learning.

#datascience #ml #bestpractices


https://appsilondatascience.com/blog/rstats/2018/02/07/circleci.html - (by Marek Rogala from Appsilon Data Science) - how to set up continuous integration for your private R projects with CircleCI. CI is also a great way to deploy a model to production, so it should be familiar to all data scientists.

#rstats https://t.co/z7G6Cso7Qb


https://www.kdnuggets.com/2018/01/docker-help-become-more-effective-data-scientist.html - (by Hamel Husain) - a short introduction to Docker containers for data scientists.

#docker https://t.co/aaA55Sm0Gh


http://selbydavid.com/2018/01/09/neural-network/ - (by David Selby) - building a neural network from scratch in R.

#deeplearning #rstats https://t.co/WV4bJPMPhF


http://blog.revolutionanalytics.com/2018/02/what-does-microsoft-do-with-r.html - (by David Smith) - summary of all Microsoft’s tools for the R ecosystem (R in SQL Server, Visual Studio, Power BI, foreach package, and much more).

#rstats #microsoft #datascience https://t.co/wfgIwSwTdj


https://www.kdnuggets.com/2018/02/5-machine-learning-projects-overlook-feb-2018.html - (by Matthew Mayo) - I’m not a fan of “5 … things … you must/should …”, however, this is an interesting one. There’s something about fasttext (NLP), Bayesian tools, CatBoost (xgboost alternative) and Keras. It’s mostly Python stuff.

#python https://t.co/Lfh3n9QL3g


http://andrewgelman.com/2018/02/03/teach-statistics-course-journalists/ - some guides from Andrew Gelman on teaching statistics to journalists. However, I think this advice can be used more broadly for teaching any group interested in statistics.

#datascience https://t.co/CoAtL54OkH


https://www.kdnuggets.com/2018/02/generalists-dominate-data-science.html - (by Russell Jurney) - data scientists (and other members of the team) should have some general skills to speed up the development cycle. Sometimes being a generalist is better than being a specialist.

#agiledatascience https://t.co/6oZU3YCoTY


https://rviews.rstudio.com/2018/02/02/cost-effective-bigquery-with-r/ - (by Roland Stevenson) - bad data representation can be costly… This article reminds me of my first query on Hive - select * from table limit 10. I learned to include WHERE in every query ;)

#db #bigrquery #rstats https://t.co/Selg9OEF4z


https://www.displayr.com/r-date-conversion/ - (by Mathew McLean) - another package for working with dates in R. I think that lubridate solves 99% of my problems with dates, but AsDateTime might be even easier.

devtools::install_github("Displayr/flipTime") #copyAndInstall

#rstats #appyrds https://t.co/4Ral1huXZ0


http://www.win-vector.com/blog/2018/01/big-cdata-news/ - (by John Mount) - new ways of reshaping data with the cdata package from Win-Vector. It performs many non-trivial transformations in just one step. It seems to be very promising.

#rstats #pkg https://t.co/3YA75JVh3o


https://www.kdnuggets.com/2018/01/how-not-lie-statistics.html - (by Kevin Gray) - some thoughts on data analysis principles. How not to fool others (and yourself).

https://t.co/GRdJa1l4sH