<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>zstat.pl - blog</title>
    <link>https://www.zstat.pl/</link>
    <description>Recent content on zstat.pl - blog</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Thu, 05 May 2016 21:48:51 -0700</lastBuildDate>
    
	<atom:link href="https://www.zstat.pl/index.xml" rel="self" type="application/rss+xml" />
    
    
    <item>
      <title>About</title>
      <link>https://www.zstat.pl/about/</link>
      <pubDate>Thu, 05 May 2016 21:48:51 -0700</pubDate>
      
      <guid>https://www.zstat.pl/about/</guid>
      <description>I really should put some description here&amp;hellip;</description>
    </item>
    
    <item>
      <title>Projects</title>
      <link>https://www.zstat.pl/projects/</link>
      <pubDate>Thu, 05 May 2016 21:48:51 -0700</pubDate>
      
      <guid>https://www.zstat.pl/projects/</guid>
      <description>R packages  https://www.depthproc.zstat.pl/ https://www.dragular.zstat.pl/ https://www.customlayout.zstat.pl/ https://mi2-warsaw.github.io/FSelectorRcpp/  CRAN Author: FSelectorRcpp https://mi2-warsaw.github.io/FSelectorRcpp/
      
&amp;lsquo;Rcpp&amp;rsquo; (free of &amp;lsquo;Java&amp;rsquo;/&amp;lsquo;Weka&amp;rsquo;) implementation of &amp;lsquo;FSelector&amp;rsquo; entropy-based feature selection algorithms based on an MDL discretization (Fayyad U. M., Irani K. B.: Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. In 13th International Joint Conference on Artificial Intelligence (IJCAI93), pages 1022-1029, Chambery, France, 1993).</description>
    </item>
    
    <item>
      <title>Data Scientist in Supply Chain</title>
      <link>https://www.zstat.pl/supply-chain/</link>
      <pubDate>Sun, 19 Apr 2020 21:48:51 -0700</pubDate>
      
      <guid>https://www.zstat.pl/supply-chain/</guid>
      <description>General knowledge about Supply Chain Basic  Clement J, Coldrick A, Sari J. Manufacturing data structures: building foundations for excellence with bills of materials and process information. Chichester: Wiley; 1995. Arnold JRT, Ph. D. Chapman SN, Clive LM. Introduction to Materials Management Casebook. Upper Saddle River, N.J; 2000. 138 p. Introduction to Materials Management. 8th edition. Boston: Pearson; 2016. 512 p. King P. Crack the Code: Understanding safety stock and mastering its equations.</description>
    </item>
    
    <item>
      <title>REST API in R with plumber</title>
      <link>https://www.zstat.pl/2023/04/04/rest-api-in-r-with-plumber/</link>
      <pubDate>Tue, 04 Apr 2023 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2023/04/04/rest-api-in-r-with-plumber/</guid>
      <description>API and R Nowadays, it’s pretty much expected that software comes with an HTTP API. Every programming language out there offers a way to expose APIs or make GET/POST/PUT requests, including R. In this post, I’ll show you how to create an API using the plumber package. Plus, I’ll give you tips on how to make it more production-ready - I’ll tackle scalability, statelessness, caching, and load balancing. You’ll even see how to consume your API with other tools like Python, curl, and R’s own httr package.</description>
    </item>
    
    <item>
      <title>Random thoughts on SQL and R</title>
      <link>https://www.zstat.pl/2023/03/23/random-thoughts-on-sql-and-r/</link>
      <pubDate>Thu, 23 Mar 2023 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2023/03/23/random-thoughts-on-sql-and-r/</guid>
      <description>The post below is a collection of useful (but kind of random) thoughts about R and SQL. It should serve as a starting point rather than an exhaustive discussion of the presented topics. Note that it mostly describes SQLite - many concepts will be similar for different databases, but they might not be identical. You’ve been warned!
R, SQL, dbplyr and ORM When we are talking about accessing SQL from a programming language, it is good to know a little bit about the ORM concept.</description>
    </item>
    
    <item>
      <title>Information gain in FSelectorRcpp</title>
      <link>https://www.zstat.pl/2020/07/28/information-gain-in-fselectorrcpp/</link>
      <pubDate>Tue, 28 Jul 2020 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2020/07/28/information-gain-in-fselectorrcpp/</guid>
      <description>Some intuitions behind the Information Gain, Gain Ratio and Symmetrical Uncertainty calculated by the FSelectorRcpp package, which can be a good proxy for correlation between unordered factors.
 I’m a big fan of using FSelectorRcpp in the exploratory phase to get an overview of the data. The main workhorse is the information_gain function which calculates… information gain. But how to interpret the output of this function?
To understand this, you need to know a bit about entropy.</description>
    </item>
    
    <item>
      <title>Active learning - part 1</title>
      <link>https://www.zstat.pl/2020/07/19/active-learning-part-1/</link>
      <pubDate>Sun, 19 Jul 2020 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2020/07/19/active-learning-part-1/</guid>
      <description>I just started exploring the ‘active learning’ topic. It’s a very handy tool when the number of data points available to build a model is limited and labelling new points is costly. It helps determine which points should be labelled next to bring the most gain in model performance. In this post I will cover some of my small experiments in this area.
Caution!
If you’re interested in ready-to-use tools for active learning, this post might not be for you - I don’t cover any framework here.</description>
    </item>
    
    <item>
      <title>Some notes on Apache Spark</title>
      <link>https://www.zstat.pl/2018/12/19/some-notes-on-apache-spark/</link>
      <pubDate>Wed, 19 Dec 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/12/19/some-notes-on-apache-spark/</guid>
      <description>Some notes based on two videos describing Apache Spark concepts.
 https://www.youtube.com/watch?v=AoVmgzontXo - Spark SQL: A Compiler from Queries to RDDs: Spark Summit East talk by Sameer Agarwal https://www.youtube.com/watch?v=vfiJQ7wg81Y - Top 5 Mistakes When Writing Spark Applications.  Spark SQL: A Compiler from Queries to RDDs: Spark Summit East talk by Sameer Agarwal  https://youtu.be/AoVmgzontXo?t=641 - an example of the optimization done by Catalyst (better to watch the whole video to get a better understanding of the whole context).</description>
    </item>
    
    <item>
      <title>Reproducible package management in R</title>
      <link>https://www.zstat.pl/2018/10/01/reproducible-package-management-in-r/</link>
      <pubDate>Mon, 01 Oct 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/10/01/reproducible-package-management-in-r/</guid>
      <description>Reproducibility is a serious issue. Writing code usually helps, because the code is like a journal of your work, especially if you combine it with literate programming techniques, which are so easy to use in the R world (Rmarkdown, knitr).
However, there’s one thing which can cause some problems - package versions. Some of the old code might not work, because there were changes in the API or in the behavior of the packages (I’m looking at you - dplyr).</description>
    </item>
    
    <item>
      <title>customLayout 0.2.0 is now on CRAN</title>
      <link>https://www.zstat.pl/2018/09/28/customlayout-0-2-0-on-cran/</link>
      <pubDate>Fri, 28 Sep 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/09/28/customlayout-0-2-0-on-cran/</guid>
      <description>The new version of my customLayout package is on CRAN. It now supports working with PowerPoint slides using layouts created in R. For more information please read the vignette here.
It also extends the idea of adjusting the font size for the flextables (see this post) - check the phl_adjust_table function.
I also created a simple roadmap which describes my next steps. Please note that this package is still under development.</description>
    </item>
    
    <item>
      <title>Functional boxplot - some intuitions.</title>
      <link>https://www.zstat.pl/2018/09/04/functional-boxplot/</link>
      <pubDate>Tue, 04 Sep 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/09/04/functional-boxplot/</guid>
      <description>Warning! This post describes some intuitions behind the idea of functional boxplots. I think that it is a very useful technique, but all statistical tools should be used with caution. Reading only one blog post might not be enough to apply them in practice. At the end of the post, I added some information about useful resources covering this topic in a more rigorous way.
A classical boxplot is an excellent tool for the quick summary of the data.</description>
    </item>
    
    <item>
      <title>Notes on tidyeval</title>
      <link>https://www.zstat.pl/2018/09/02/dplyr-vs-seplyr/</link>
      <pubDate>Sun, 02 Sep 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/09/02/dplyr-vs-seplyr/</guid>
      <description>I recently watched the “Tidy eval: Programming with dplyr, tidyr, and ggplot2” video. It’s an excellent introduction to tidy evaluation, which is the core concept for programming with dplyr and friends.
In this video, Hadley showed the grouped_mean function on a slide (12:48). An attempt to implement this function might be a good exercise in tidy evaluation, and an excellent opportunity to compare this approach with the standard evaluation rules provided by the seplyr package.</description>
    </item>
    
    <item>
      <title>Common problems with rJava</title>
      <link>https://www.zstat.pl/2018/08/25/common-problems-with-rjava/</link>
      <pubDate>Sat, 25 Aug 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/08/25/common-problems-with-rjava/</guid>
      <description>rJava is an essential package because it gives access to the rich Java world. There are at least dozens of packages on CRAN which depend on Java (e.g., the excellent rscala for calling Scala code from R). However, installing rJava can sometimes be quite problematic. In this post, I’ll focus on the pitfalls found on Linux/Ubuntu, but if you are on Windows, following the instructions from here should solve your problem.
R CMD javareconf One of the first things that you should try if you have a problem with rJava is to check if you have Java installed on your system, by running java -version in the console.</description>
    </item>
    
    <item>
      <title>FSelectorRcpp 0.2.1 release</title>
      <link>https://www.zstat.pl/2018/08/06/fselectorrcpp-0-2-1-release/</link>
      <pubDate>Mon, 06 Aug 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/08/06/fselectorrcpp-0-2-1-release/</guid>
      <description>A new release of FSelectorRcpp (0.2.1) is on CRAN. I described nearly all the new functionality here. The last thing that we added just before the release is extract_discretize_transformer. It can be used to get a small object from the result of the discretize function to transform new data using the estimated cutpoints. See the example below.
library(FSelectorRcpp) set.seed(123) idx &amp;lt;- sort(sample.int(150, 100)) iris1 &amp;lt;- iris[idx, ] iris2 &amp;lt;- iris[-idx, ] disc &amp;lt;- discretize(Species ~ .</description>
    </item>
    
    <item>
      <title>Spark Streaming and Mllib</title>
      <link>https://www.zstat.pl/2018/08/03/spark-streaming-ml/</link>
      <pubDate>Fri, 03 Aug 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/08/03/spark-streaming-ml/</guid>
      <description>In my first post on Spark Streaming, I described how to use Netcat to emulate an incoming stream. But later I found this question on StackOverflow. In one of the answers, there’s a piece of code which shows how to emulate an incoming stream programmatically, without external tools like Netcat, which makes life much more comfortable.
In this post, I describe how to fit a model using Spark’s MLlib, then use it on the incoming data, and save the result in a parquet file.</description>
    </item>
    
    <item>
      <title>Spark Streaming - basic setup</title>
      <link>https://www.zstat.pl/2018/08/01/spark-streaming-basic-setup/</link>
      <pubDate>Wed, 01 Aug 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/08/01/spark-streaming-basic-setup/</guid>
      <description>Streaming data is quite a hot topic right now, so I decided to write something about it on my blog. I’m new to this area, but I don’t think it is much different from standard batch processing. Of course, I’m more focused on building models and other ML stuff, not all the administration things, like setting up Kafka, making everything fault tolerant, etc.
In this post, I’ll describe a very basic app, not very different than the one described in the https://spark.</description>
    </item>
    
    <item>
      <title>Upcoming changes in FSelectorRcpp-0.2.0</title>
      <link>https://www.zstat.pl/2018/07/31/upcoming-changes-in-fselectorrcpp-0-2-0/</link>
      <pubDate>Tue, 31 Jul 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/07/31/upcoming-changes-in-fselectorrcpp-0-2-0/</guid>
      <description>The main purpose of the FSelectorRcpp package is feature selection based on the entropy function. However, it also contains a function to discretize continuous variables into nominal attributes, and we decided to slightly change the API related to this functionality to make it more user-friendly.
EDIT: Updated version (0.2.1) is on CRAN. It can be installed using: install.packages(&amp;quot;FSelectorRcpp&amp;quot;) The dev version can be installed using devtools:
devtools::install_github(&amp;quot;mi2-warsaw/FSelectorRcpp&amp;quot;, ref = &amp;quot;dev&amp;quot;)</description>
    </item>
    
    <item>
      <title>Partitioning in Spark</title>
      <link>https://www.zstat.pl/2018/07/29/partitioning-in-spark/</link>
      <pubDate>Sun, 29 Jul 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/07/29/partitioning-in-spark/</guid>
      <description>Spark is delightful for Big Data analysis. It allows using very high-level code to perform a large variety of operations. It also supports SQL, so you don’t need to learn a lot of new stuff to start being productive in Spark (of course assuming that you have some knowledge of SQL).
However, if you want to use Spark more efficiently, you need to learn a lot of concepts, especially about data partitioning, relations between partitions (narrow dependencies vs.</description>
    </item>
    
    <item>
      <title>Scala in knitr</title>
      <link>https://www.zstat.pl/2018/07/27/scala-in-knitr/</link>
      <pubDate>Fri, 27 Jul 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/07/27/scala-in-knitr/</guid>
      <description>I use blogdown to write my blog posts. It allows me to create an Rmarkdown file, and then execute all the code and format the output. It has great support for R (it’s R native) and Python. Some other languages are also supported, but the functionality is pretty limited. For example, each code chunk is evaluated in a separate session (I’m not sure if it’s the case for all engines, I read about this in https://yihui.</description>
    </item>
    
    <item>
      <title>Conda</title>
      <link>https://www.zstat.pl/2018/07/26/conda/</link>
      <pubDate>Thu, 26 Jul 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/07/26/conda/</guid>
      <description>One of the most important things in software development and data analysis is to manage your dependencies, to make sure that your work can be easily replicated or deployed to production. In R’s ecosystem, there are plenty of tools and materials on this topic. Here’s a short list:
 https://rstudio.github.io/packrat/ https://ropenscilabs.github.io/r-docker-tutorial/ https://mran.microsoft.com/ https://rviews.rstudio.com/2018/01/18/package-management-for-reproducible-r-code/ https://cran.r-project.org/web/views/ReproducibleResearch.html  However, I’m starting to spend a bit more time in the Python world, where I don’t have a lot of experience.</description>
    </item>
    
    <item>
      <title>ggplot2 with 2 y-axes</title>
      <link>https://www.zstat.pl/2018/07/19/ggplot2-with-2-y-axes/</link>
      <pubDate>Thu, 19 Jul 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/07/19/ggplot2-with-2-y-axes/</guid>
      <description>At one of my R workshops, someone asked me about creating a ggplot2 plot with two Y-axes. I do not use such plots, because I read somewhere that they are hard to perceive correctly. However, I committed myself to checking whether it’s possible to create such visualizations using ggplot2.
Without a lot of digging, I found this answer from the author of the ggplot2 package on StackOverflow - https://stackoverflow.com/a/3101876. He thinks that those types of plots are bad, fundamentally flawed, and you shouldn’t use them, and ggplot2 does not allow creating them.</description>
    </item>
    
    <item>
      <title>Create pptx in R using officer package</title>
      <link>https://www.zstat.pl/2018/07/12/create-pptx-in-r-using-officer-package/</link>
      <pubDate>Thu, 12 Jul 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/07/12/create-pptx-in-r-using-officer-package/</guid>
      <description>When you need to create a pptx file in R, the best way is to use the officer package. officer is quite easy to use and the documentation is quite extensive, so I won’t describe the basics (https://davidgohel.github.io/officer/articles/powerpoint.html - link to officer’s docs). However, I always have some problems with specifying the proper parameters for the ph_with_* functions, especially the type and index parameters. Of course one can use the ph_with_*_at versions, but they require manually adjusting all the coordinates, which might be even more problematic.</description>
    </item>
    
    <item>
      <title>Type-S Errors.</title>
      <link>https://www.zstat.pl/2018/07/11/type-s-errors/</link>
      <pubDate>Wed, 11 Jul 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/07/11/type-s-errors/</guid>
      <description>I’m a big fan of Andrew Gelman’s blog (http://andrewgelman.com/). I think that my statistical intuition is much better after reading it. For example, there’s a post about different types of errors in NHST, not limited to the widely known Type I and Type II errors - http://andrewgelman.com/2004/12/29/type_1_type_2_t/. You should read this before continuing because the rest of this post will be based on it, and the article which is linked in that post (http://www.</description>
    </item>
    
    <item>
      <title>Calculate the font size for the R&#39;s flextable package.</title>
      <link>https://www.zstat.pl/2018/07/07/calculate-the-font-size-for-the-r-s-flextable-package/</link>
      <pubDate>Sat, 07 Jul 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/07/07/calculate-the-font-size-for-the-r-s-flextable-package/</guid>
      <description>flextable is an excellent package for creating beautiful tables, especially if you want to export them to a pptx file. However, it might be a bit problematic to set the proper font size for a given table size.
E.g., I have a table with five rows (+ 1 header row), and I want to create a table whose height is 2 inches. What’s the best font size for this setting?</description>
    </item>
    
    <item>
      <title>Lua</title>
      <link>https://www.zstat.pl/2018/07/03/lua/</link>
      <pubDate>Tue, 03 Jul 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/07/03/lua/</guid>
      <description>In one of my projects, I had to choose which language I would use to write some small module - C, C++ or Lua. I didn’t want to use C, because the module required a lot of string handling. Another option was C++, but some other parts of the system were written in Lua, so I thought it would be much easier to integrate everything without switching languages, and I had heard some good things about Lua, so in the end, I decided to give it a shot.</description>
    </item>
    
    <item>
      <title>Some materials for learning a Deep learning and NLP</title>
      <link>https://www.zstat.pl/2018/06/22/some-materials-for-learning-a-deep-learning-and-nlp/</link>
      <pubDate>Fri, 22 Jun 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/06/22/some-materials-for-learning-a-deep-learning-and-nlp/</guid>
      <description>Here is my short list of materials related to Deep Learning and NLP that I found useful during my exploration of these topics. I don’t think that this list is exhaustive, but maybe you will find something useful;)
Videos: General introduction:  http://www.fast.ai/ - an introduction to Deep Learning focused on examples. There are a lot of Jupyter notebooks supplementing the course - https://github.com/fastai/courses/tree/master/deeplearning1/nbs. https://www.coursera.org/specializations/deep-learning - Deep Learning specialization from Andrew Ng.</description>
    </item>
    
    <item>
      <title>Deploying Shiny Apps</title>
      <link>https://www.zstat.pl/2018/06/21/deploying-shiny-apps/</link>
      <pubDate>Thu, 21 Jun 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/06/21/deploying-shiny-apps/</guid>
      <description>Shiny apps are a very convenient way of sharing your work with others, especially with non-technical co-workers. The best way is to deploy your app somewhere on the internet (or intranet), so the user won’t need to install R, packages, and other stuff, and updates are easy to roll out.
There are a few ways to host your applications, and all of them come with some pros and cons:
Shinyapps http://www.shinyapps.io/ The most natural solution is to put your app on the shinyapps.</description>
    </item>
    
    <item>
      <title>Dynamic modules in shiny - Part II</title>
      <link>https://www.zstat.pl/2018/06/19/dynamic-modules-in-shiny---part-ii/</link>
      <pubDate>Tue, 19 Jun 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/06/19/dynamic-modules-in-shiny---part-ii/</guid>
      <description>In the previous post about shiny modules, I described how to dynamically add new modules, or even select a module type from a few possibilities. The main problem was that the new module is added inside an observer function, so we cannot directly get the returned value of the new modules.
However, we can solve this problem quite easily using reactiveValues. We just add a parameter to the module’s server function in which we will send the reactive value used to communicate between the main application and the module.</description>
    </item>
    
    <item>
      <title>DepthProc hit 20k downloads.</title>
      <link>https://www.zstat.pl/2018/06/11/depthproc-hit-20k-downloads./</link>
      <pubDate>Mon, 11 Jun 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/06/11/depthproc-hit-20k-downloads./</guid>
      <description>DepthProc, my first package published on CRAN, recently hit 20k downloads.
library(cranlogs) library(ggplot2) downloads &amp;lt;- cran_downloads(&amp;quot;DepthProc&amp;quot;, from = &amp;quot;2014-08-21&amp;quot;, to = &amp;quot;2018-06-10&amp;quot;) ggplot(downloads) + geom_line(aes(x = date, y = cumsum(count))) + ylab(&amp;quot;Downloads&amp;quot;) + xlab(&amp;quot;Date&amp;quot;) + theme_bw() + ggtitle(&amp;quot;DepthProc&amp;quot;, &amp;quot;Download stats&amp;quot;) There are some jumps on the line. I wondered if they all occurred just after a package release (existing users updating to the new version). Here’s some code to check this.</description>
    </item>
    
    <item>
      <title>Caching function&#39;s result based on file modification time in R.</title>
      <link>https://www.zstat.pl/2018/06/05/caching-functions-result-based-on-file-modification-time-in-r./</link>
      <pubDate>Tue, 05 Jun 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/06/05/caching-functions-result-based-on-file-modification-time-in-r./</guid>
      <description>I had some time to look at some of my started, yet never finished projects. I found something which served me very well for some time, and it was quite useful.
In one of my projects, I was working with a lot of large log files. In the beginning, I was loading the whole file into R memory, and then I was processing it using the stringi package and other tools. This was not the best solution.</description>
    </item>
    
    <item>
      <title>Pimp My Library Pattern in Scala</title>
      <link>https://www.zstat.pl/2018/05/24/pimp-my-library-pattern-in-scala/</link>
      <pubDate>Thu, 24 May 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/05/24/pimp-my-library-pattern-in-scala/</guid>
      <description>My primary language for data analysis is still R. However, when it comes to Big Data I prefer Scala, because it is the central language behind Spark and gives more freedom than the sparklyr interface (I sometimes use sparklyr, but this is a topic for another post).
When I started my journey with Scala I found that it is possible to achieve a lot knowing just Spark’s API and a bit of SQL.</description>
    </item>
    
    <item>
      <title>StarSpace in R</title>
      <link>https://www.zstat.pl/2018/05/24/starspace-in-r/</link>
      <pubDate>Thu, 24 May 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/05/24/starspace-in-r/</guid>
      <description>I enjoyed working with Facebook’s fastText (https://github.com/facebookresearch/fastText) library and its R wrapper fastrtext (https://github.com/pommedeterresautee/fastrtext). However, I want to spend some more time with the StarSpace library (also a Facebook library for NLP). Unfortunately, there’s no R package for StarSpace!
It’s quite surprising because there are thousands of packages. Nevertheless, this one is missing. In the end, I decided to write my own wrapper - https://github.com/zzawadz/StarSpaceR.
I had some problems with compilation because of the dozens of compiler flags which must be set before compilation.</description>
    </item>
    
    <item>
      <title>dragulaR with renderUI</title>
      <link>https://www.zstat.pl/2018/05/23/dragular-renderui/</link>
      <pubDate>Wed, 23 May 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/05/23/dragular-renderui/</guid>
      <description>Quite recently someone asked if it’s possible to use my dragulaR (https://github.com/zzawadz/dragulaR) package with renderUI. My first thought was that this might be quite hard. I knew that insertUI is not a problem because you can set the ‘immediate = TRUE’ parameter, and just after that use ‘js$refreshDragulaR(“dragula”)’ to refresh the dragula container. However, with renderUI you cannot simply use refreshDragulaR, because it must be called when all the elements in the uiOutput are ready, and this is not so easy to do.</description>
    </item>
    
    <item>
      <title>Data Science - News - 2018-02-28</title>
      <link>https://www.zstat.pl/2018/02/28/news-2018-02-28rmd/</link>
      <pubDate>Wed, 28 Feb 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/02/28/news-2018-02-28rmd/</guid>
      <description>Some interesting Data Science stuff found between 2018-02-01 and 2018-02-28. https://www.youtube.com/watch?v=atiYXm7JZv0 - (by J.J. Allaire from RStudio) - Machine Learning with R and TensorFlow - a video introduction to Deep Learning in R.
#deeplearning #rstats #tensorflow #datascience https://t.co/W4SjSTBYQq
https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 - (by James Le) - a collection of some basic ML algorithms for newbies. Pictures in the article are pretty good.
#datascience #ml https://t.co/QGOliYhXgt
https://tensorflow.rstudio.com/blog/keras-customer-churn.html - (by Matt Dancho) - predicting customer churn using deep learning in R.</description>
    </item>
    
    <item>
      <title>Get topics&#39; words from the LDA model.</title>
      <link>https://www.zstat.pl/2018/02/07/scala-spark-get-topics-words-from-lda-model/</link>
      <pubDate>Wed, 07 Feb 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/02/07/scala-spark-get-topics-words-from-lda-model/</guid>
      <description>Some time ago I had to move from sparklyr to Scala for better integration with Spark, and easier collaboration with other developers in a team. Interestingly, this conversion was much easier than I thought because Spark’s DataFrame API is somewhat similar to dplyr: there’s a groupBy function, agg instead of summarise, and so on. You can also use plain old SQL to operate on data frames. Anyway, in this post, I’ll show how to fit a very simple LDA (Latent Dirichlet allocation) model, and then extract information about the topics’ words.</description>
    </item>
    
    <item>
      <title>RNews - 2018-01-31</title>
      <link>https://www.zstat.pl/2018/01/31/news-2018-01-31rmd/</link>
      <pubDate>Wed, 31 Jan 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/01/31/news-2018-01-31rmd/</guid>
      <description>Some interesting Data Science stuff found between 2018-01-16 and 2018-01-31. https://simplystatistics.org/2018/01/22/the-dslabs-package-provides-datasets-for-teaching-data-science/ - (by Rafael Irizarry) the dslabs package, containing datasets for teaching data science.
install.packages(“dslabs”) #CopyAndInstall
#applyrds #rstats #datascience https://t.co/db0LUvCBx8
https://github.com/facebookresearch/StarSpace - a general purpose #NLP library from @fb_research. For now, it works only from the command line. However, it’s easy to build and use.
#FacebookResearch #applyrds https://t.co/hcyjVdLdIZ
https://research.fb.com/facebook-open-sources-detectron/ - Facebook open sources Detectron, a platform for object detection running on top of Caffe2.</description>
    </item>
    
    <item>
      <title>RNews - 2018-01-16</title>
      <link>https://www.zstat.pl/2018/01/16/news-2018-01-16rmd/</link>
      <pubDate>Tue, 16 Jan 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2018/01/16/news-2018-01-16rmd/</guid>
      <description>Some interesting Data Science stuff found between 2018-01-16 and 2018-01-16. https://www.tidyverse.org/articles/2018/01/dbplyr-1-2/ - a new version of the database backend for dplyr. It allows using stringr functions in the mutate statements, and the operations are evaluated directly on the database. #applyrds #db #rstats https://t.co/76sX7KjIxR
https://github.com/welovedatascience/stranger - new package for anomaly detection in R. #rstats #pkg #applyrds https://t.co/O1itP9YXML
https://hughjonesd.github.io/huxtable/ - an alternative for xtable? I hope so:) Conditional formatting (e.g., make background red if the value is larger than 3) seems to be very easily achievable.</description>
    </item>
    
    <item>
      <title>Call R from C#</title>
      <link>https://www.zstat.pl/2017/12/30/call-r-from-c/</link>
      <pubDate>Sat, 30 Dec 2017 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2017/12/30/call-r-from-c/</guid>
      <description>R has various packages to call other languages, like Rcpp, rJava or sparklyr. Those tools significantly expand R’s capabilities, because the user doesn’t need to learn a lot of different stuff. Everything is nicely integrated into R.
However, sometimes the problem is different - there’s an existing system written in some language, and R can be used to expand its possibilities. In that scenario, R must be called from the other language.
In this post, I’ll describe how R can be integrated with a C# program using Microsoft.</description>
    </item>
    
    <item>
      <title>Dynamic modules in shiny - Part I</title>
      <link>https://www.zstat.pl/2017/12/22/dynamic-modules-in-shiny-part-i/</link>
      <pubDate>Fri, 22 Dec 2017 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2017/12/22/dynamic-modules-in-shiny-part-i/</guid>
      <description>The best way to organize a shiny app is to use modules. They are also an excellent choice when you need to replicate some functionality a few times. For example, when you want to compare some plots with different parameters, modules are the way to go. In basic usage, modules require a direct call to callModule in the server and a place in the UI. However, sometimes you don’t know how many instances of a given module are needed.</description>
    </item>
    
    <item>
      <title>RNews - 2017-12-19</title>
      <link>https://www.zstat.pl/2017/12/19/rnews-2017-12-19/</link>
      <pubDate>Tue, 19 Dec 2017 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2017/12/19/rnews-2017-12-19/</guid>
      <description>This is the last (planned) RNews in 2017.
This week there’s something about reproducible research (everyone should do this!) and deep learning - one tutorial and one general article about image classification in radiology.
Books  http://www.britishecologicalsociety.org/wp-content/uploads/2017/12/guide-to-reproducible-code.pdf - this book is a guide to writing reproducible code using R. The title includes “in Ecology and Evolution”, but the scope of this guide should not be limited to only those areas:)   Articles:  https://tensorflow.</description>
    </item>
    
    <item>
      <title>RNews - 2017-12-12</title>
      <link>https://www.zstat.pl/2017/12/12/rnews-2017-12-12/</link>
      <pubDate>Tue, 12 Dec 2017 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2017/12/12/rnews-2017-12-12/</guid>
      <description>This list is quite small because I hit the history limit on Twitter:( Nevertheless, there are some exciting things - e.g. videos from H2O World 2017.
Videos  https://www.youtube.com/playlist?list=PLNtMya54qvOHQs2ZmV-pPSW_etMUykE0_ - list of all videos from H2O World 2017. There are at least a few videos which seem to be worth watching. Unfortunately, I didn’t have time to do so:(.   Articles:  https://rviews.rstudio.com/2017/12/11/r-and-tensorflow/ - this article claims that installing keras (for deep learning) is as simple as calling keras::install_keras().</description>
    </item>
    
    <item>
      <title>RNews - 2017-12-05</title>
      <link>https://www.zstat.pl/2017/12/05/rnews-2017-12-05/</link>
      <pubDate>Tue, 05 Dec 2017 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2017/12/05/rnews-2017-12-05/</guid>
      <description>Another pack of news from R’s world.
Aggregators:  https://rweekly.org/ - this is a great aggregator - even better than mine;)   Articles:  http://smarterpoland.pl/index.php/2017/12/explain-explain-explain/ - a quick summary of the packages which can be used to explain the results of various models (lm, glm, xgboost, etc.). Unfortunately, there’s nothing about LIME, which is a more general-purpose package for explaining models (https://cran.r-project.org/web/packages/lime/index.html).
 https://christophm.github.io/interpretable-ml-book/ - an online book about explaining model predictions.</description>
    </item>
    
    <item>
      <title>RNews - 2017-11-27</title>
      <link>https://www.zstat.pl/2017/11/27/rnews-2017-11-27/</link>
      <pubDate>Mon, 27 Nov 2017 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2017/11/27/rnews-2017-11-27/</guid>
      <description>The next portion of exciting information from R’s world (in fact, not only R’s, but it’s the main thing here).
Aggregators:  https://trello.com/b/rbpEfMld/data-science - a huge collection of resources related to Data Science (R, Python, Big Data, and so on).   Articles:  https://drsimonj.svbtle.com/visualising-residuals - a long post about visualizing residuals. The “Multiple Regression” section is worth checking - the analysis is quite impressive.
 http://www.tandfonline.com/doi/full/10.1080/01621459.2017.1311264 - the old saga about why p-values are evil continues.</description>
    </item>
    
    <item>
      <title>Drag and Drop in Shiny</title>
      <link>https://www.zstat.pl/2017/11/24/drag-and-drop-in-shiny/</link>
      <pubDate>Fri, 24 Nov 2017 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2017/11/24/drag-and-drop-in-shiny/</guid>
      <description>A Shiny application is a great way to deliver the results of a statistical analysis, especially when it must be reproducible. I don’t know why, but people prefer to use a graphical interface rather than run scripts;)
One of my clients recently requested the ability to move elements around in the dashboard. There are some R packages that achieve this effect, and it would possibly be a bit easier to use them, but I had an unfinished project called dragulaR.</description>
    </item>
    
    <item>
      <title>RNews - 2017-11-19</title>
      <link>https://www.zstat.pl/2017/11/21/rnews-19-11-2017/</link>
      <pubDate>Tue, 21 Nov 2017 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2017/11/21/rnews-19-11-2017/</guid>
      <description>Here is a small pack of some engaging news from R’s world gathered from Twitter in the last week.
R Tips  http://www.storybench.org/convert-google-doc-rmarkdown-publish-github-pages/ - the title speaks for itself - it shows how to convert Google Docs to R Markdown format.
 http://blog.jumpingrivers.com/posts/2017/speed_package_installation/ - in short - options(Ncpus = 6) - lets install.packages use multiple cores, which can significantly speed up package installation.
   Packages  https://www.tidyverse.org/articles/2017/11/withr-2.1.0/ - allows a user to call code in a special environment with some global variables altered.</description>
    </item>
    
    <item>
      <title>RNews - 2017-11-05</title>
      <link>https://www.zstat.pl/2017/11/05/rnews-05-11-2017rmd/</link>
      <pubDate>Sun, 05 Nov 2017 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2017/11/05/rnews-05-11-2017rmd/</guid>
      <description>Here is a small pack of some engaging news from R’s world gathered from Twitter in the last week.
Articles  https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/ - the minimal amount of knowledge about Unicode. The article is targeted at developers, but it might also be useful for Data Scientists. I highly recommend other posts on that blog - the author is one of the creators of VBA in Excel, Trello, and Stack Overflow.
 https://imgur.com/gallery/GD5gi - visualization of the sorting algorithms.</description>
    </item>
    
    <item>
      <title>Another note on memory management in R</title>
      <link>https://www.zstat.pl/2017/06/18/another-note-on-memory-management-in-r/</link>
      <pubDate>Sun, 18 Jun 2017 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2017/06/18/another-note-on-memory-management-in-r/</guid>
      <description>In the last post, I described one issue related to using R&amp;rsquo;s data structures inside C++ code. The problem was caused by R&amp;rsquo;s memory management system, which allows R to store two variables in the same place in memory just after an assignment.
See the following snippet:
#include &amp;lt;Rcpp.h&amp;gt; #include &amp;lt;vector&amp;gt; // [[Rcpp::plugins(cpp11)]] // [[Rcpp::export]] void change(Rcpp::NumericVector x) { // The C++ function does not return anything (it&#39;s void), // it only modifies the first element of the vector.</description>
    </item>
    
    <item>
      <title>R&#39;s structures inside C&#43;&#43;</title>
      <link>https://www.zstat.pl/2017/06/12/r-s-structures-inside-c/</link>
      <pubDate>Mon, 12 Jun 2017 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2017/06/12/r-s-structures-inside-c/</guid>
      <description>Connecting R with C++ is very easy because nearly all the work needed to glue the code together is done by Rcpp. However, there are some very dangerous traps. C++, when used improperly, can mess up a lot of things in the R session. In this post, I want to show you how to write secure C++ code to reduce the chances of breaking anything in R.
References. When working with R, we usually do not care if the object is copied or not.</description>
    </item>
    
    <item>
      <title>The power of Progress Bar</title>
      <link>https://www.zstat.pl/2017/05/17/the-power-of-progress-bar/</link>
      <pubDate>Wed, 17 May 2017 00:00:00 +0000</pubDate>
      
      <guid>https://www.zstat.pl/2017/05/17/the-power-of-progress-bar/</guid>
      <description>In the last post, I wrote some notes about code optimization using Rcpp and C++. However, I forgot to add one main thought related to this topic:
“The First Rule of Program Optimization: Don’t do it. The Second Rule of Program Optimization (for experts only!): Don’t do it yet.”
I agree with that statement, and I have made this mistake more than a dozen times. I spent hours optimizing the code, trying to get the results faster, and in most cases I succeeded.</description>
    </item>
    
  </channel>
</rss>