Posts List

Information gain in FSelectorRcpp

Some intuitions behind the Information Gain, Gain Ratio and Symmetrical Uncertainty measures calculated by the FSelectorRcpp package, which can be a good proxy for correlation between unordered factors. I'm a big fan of using FSelectorRcpp in the exploratory phase to get an overview of the data. The main workhorse is the information_gain function, which calculates… information gain. But how should the output of this function be interpreted? To understand it, you need to know a bit about entropy.
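The link between these measures and entropy fits in a few lines. Below is a minimal Python sketch, not FSelectorRcpp's own code. It assumes the usual textbook definitions:

- IG(Y; X) = H(Y) + H(X) - H(Y, X)
- gain ratio = IG / H(X)
- symmetrical uncertainty = 2 * IG / (H(X) + H(Y))

Entropy here uses the natural logarithm. The log base the package uses is not assumed, and it only rescales information gain; the two ratios are base-independent. All function names are hypothetical.

```python
from collections import Counter
from math import log


def entropy(values):
    """Shannon entropy (natural log) of a sequence of discrete labels."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


def information_gain(x, y):
    """IG = H(y) + H(x) - H(x, y), equivalently H(y) - H(y | x)."""
    return entropy(y) + entropy(x) - entropy(list(zip(x, y)))


def gain_ratio(x, y):
    """IG normalised by the attribute's own entropy; 0.0 for a constant attribute."""
    hx = entropy(x)
    return information_gain(x, y) / hx if hx > 0 else 0.0


def symmetrical_uncertainty(x, y):
    """2 * IG / (H(x) + H(y)), a symmetric score in [0, 1]; 0.0 if both are constant."""
    denom = entropy(x) + entropy(y)
    return 2 * information_gain(x, y) / denom if denom > 0 else 0.0


if __name__ == "__main__":
    attr = ["a", "a", "b", "b"]
    dependent = ["yes", "yes", "no", "no"]    # fully determined by attr
    independent = ["yes", "no", "yes", "no"]  # carries no information about attr
    for target in (dependent, independent):
        print(information_gain(attr, target),
              gain_ratio(attr, target),
              symmetrical_uncertainty(attr, target))
```

When the class is fully determined by the attribute, both ratios reach 1. When the two factors are independent, information gain drops to 0. This bounded range is what makes gain ratio and symmetrical uncertainty convenient as correlation-like scores for unordered factors.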

Active learning - part 1

I just started exploring the topic of ‘active learning’. It's a very handy tool when the number of data points available to build a model is limited and labelling new points is costly. It helps determine which points should be labelled next to bring the biggest gain in model performance. In this post I cover some of my small experiments in this area. Caution! If you're interested in ready-to-use tools for active learning, this post might not be for you; I don't cover any framework here.
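The excerpt doesn't say which selection strategy the experiments use. As an illustration only, one common strategy is least-confidence uncertainty sampling: ask the current model for class probabilities on the unlabelled pool, then label the points it is least sure about. In this minimal Python sketch, `probas` is a hypothetical list of per-point probability vectors that any probabilistic classifier could produce.

```python
def least_confident(probas, k):
    """Pick the k unlabelled points the model is least sure about.

    probas: one list of predicted class probabilities per unlabelled point
    (hypothetical input). Returns pool indices, most uncertain first.
    """
    # Uncertainty = 1 - probability of the most likely class.
    uncertainty = [1.0 - max(p) for p in probas]
    ranked = sorted(range(len(probas)), key=lambda i: uncertainty[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    pool_probas = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3], [0.5, 0.5]]
    print(least_confident(pool_probas, 2))  # [3, 1]
```

In a full loop, the selected points would be labelled and added to the training set. The model would then be refit before the next query.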

StarSpace in R

I enjoyed working with Facebook’s fastText library and its R wrapper, fastrtext. However, I wanted to spend some more time with the StarSpace library (also Facebook’s library for NLP). Unfortunately, there’s no R package for StarSpace! That’s quite surprising, given that there are thousands of R packages. Nevertheless, this one is missing. In the end, I decided to write my own wrapper. I had some problems with compilation because of the dozens of compiler flags that must be set beforehand.