Some intuitions behind the Information Gain, Gain ratio and Symmetrical Uncertain calculated by the FSelectorRcpp package, that can be a good proxy for correlation between unordered factors. I a big fan of using FSelectorRcpp in the exploratory phase to get some overview about the data. The main workhorse is the information_gain function which calculates… information gain. But how to interpret the output of this function? To understand this, you need to know a bit about entropy.
The main purpose of the FSelectorRcpp package is the feature selection based on the entropy function. However, it also contains a function to discretize continuous variable into nominal attributes, and we decided to slightly change the API related to this functionality, to make it more user-friendly. EDIT: Updated version (0.2.1) is on CRAN. It can be installed using: install.packages("FSelectorRcpp") The dev version can be installed using devtools: devtools::install_github("mi2-warsaw/FSelectorRcpp", ref = "dev")