My first package published on CRAN - DepthProc recently hit 20k downloads. library(cranlogs) library(ggplot2) downloads <- cran_downloads("DepthProc", from = "2014-08-21", to = "2018-06-10") ggplot(downloads) + geom_line(aes(x = date, y = cumsum(count))) + ylab("Downloads") + xlab("Date") + theme_bw() + ggtitle("DepthProc", "Download stats") There are some jumps on the line. I wondered if they all occurred just after the package release (old users updates to the new versions). Here’s some code to check this.
I had some time to look at some of my started, yet never finished projects. I found something which served me very well for some time, and it was quite useful. In my one project, I was working with a lot of large logs files. In the beginning, I was loading the whole file into R memory, and then I was processing it using stringi package and other tools. This was not the best solution.
I enjoyed work with Facebook’s fastText (https://github.com/facebookresearch/fastText) library and its R’s wrapper fastrtext (https://github.com/pommedeterresautee/fastrtext). However, I want to spend some more time with StarSpace library (also Facebook’s library for NLP). Unfortunately, there’s no R package for StarSpace! It’s quite surprising because I there are thousands of packages. Nevertheless, this one is missing. In the end, I decided to write my wrapper - https://github.com/zzawadz/StarSpaceR. I had some problems with compilation because of dozens of compiler flags which must be set before compilation.
My primary language for data analysis is still R. However, when it comes to the Big Data I prefer Scala because of it is the central language behind Spark, and gives more freedom than the sparklyr interface (I sometimes use sparklyr, but this is a topic for another post). When I started my journey with Scala I found, that it is possible to achieve a lot with knowing just the Spark’s API and a bit of SQL.
Quite recently someone asked about if it’s possible to use my dragulaR (https://github.com/zzawadz/dragulaR) package with renderUI. My first thought was that this might be quite hard. I knew that insertUI is not a problem because you can set ‘immediate = TRUE’ parameter, and just after that use ‘js$refreshDragulaR(“dragula”)’ to refresh the dragula container. However, with insertUI you cannot simply use refreshDragulaR, because it must be called when all the elements in the uiOutput are ready, and this is not so easy to do so.
Some interesting Data Science stuff found between 2018-02-01 and 2018-02-28. https://www.youtube.com/watch?v=atiYXm7JZv0 - (by J.J. Allaire from Rstudio) - Machine Learning with R and TensorFlow - a video introduction to Deep Learning in R. #deeplearning #rstats #tensorflow #datascience https://t.co/W4SjSTBYQq https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 - (by James Le) - a collection of some basic ML algorithms for newbies. Pictures in the article are pretty good. #datascience #ml https://t.co/QGOliYhXgt https://tensorflow.rstudio.com/blog/keras-customer-churn.html - (by Matt Dancho) - predicting customer churn using deep learning in R.
Some time ago I had to move from sparklyr to Scala for better integration with Spark, and easier collaboration with other developers in a team. Interestingly, this conversion was much easier than I thought because Spark’s DataFrame API is somewhat similar to dplyr, there’s groupBy function, agg instead of summarise, and so on. You can also use traditional, old SQL to operate on data frames. Anyway, in this post, I’ll show how to fit very simple LDA (Latent Dirichlet allocation) model, and then extract information about topic’s words.
Some interesting Data Science stuff found between 2018-01-16 and 2018-01-31. https://simplystatistics.org/2018/01/22/the-dslabs-package-provides-datasets-for-teaching-data-science/ - (by Rafael Irizarry) package dslab containing datasets for teaching data science. install.packages(“dslabs”) #CopyAndInstall #applyrds #rstats #datascience https://t.co/db0LUvCBx8 https://github.com/facebookresearch/StarSpace - a general purpose #NLP library from @fb_research. For now, it works only from a command line. However, it’s easy to build and use from command line. #FacebookResearch #applyrds https://t.co/hcyjVdLdIZ https://research.fb.com/facebook-open-sources-detectron/ - Facebook open sources Detectron, a platform for object detection running on the top of Caffe2.
Some interesting Data Science stuff found between 2018-01-16 and 2018-01-16. https://www.tidyverse.org/articles/2018/01/dbplyr-1-2/ - a new version of the database backend for dplyr. It allows using stringr functions in the mutate statements, and the operations are evaluated directly on the database. #applyrds #db #rstats https://t.co/76sX7KjIxR https://github.com/welovedatascience/stranger - new package for anomaly detection in R. #rstats #pkg #applyrds https://t.co/O1itP9YXML https://hughjonesd.github.io/huxtable/ - an alternative for xtable? I hope so:) Conditional formatting (e.g., make background red if the value is larger than 3) seems to be very easily achievable.
R has various packages to call other languages, like Rccp, rJava or sparklyr. Those tools significantly expand R’s capabilities, because the user doesn’t need to learn a lot of different stuff. Everything is nicely integrated into R. However, sometimes the problem is different - there’s an existing system written in some language, and R can be used to expand its possibilities. So in that scenario R must be called. In that post, I’ll describe how R can be integrated with C# program using Microsoft.