I enjoy working with Facebook’s fastText (https://github.com/facebookresearch/fastText) library and its R wrapper, fastrtext (https://github.com/pommedeterresautee/fastrtext). However, I want to spend some more time with the StarSpace library (another Facebook library for NLP). Unfortunately, there’s no R package for StarSpace!

It’s quite surprising, because there are thousands of R packages, yet this one is missing. In the end, I decided to write my own wrapper: https://github.com/zzawadz/StarSpaceR.

I had some problems with compilation because dozens of compiler flags must be set beforehand. I think this was the first time I had to use a custom configure script to set up everything: paths, flags, and so on. The good thing is that the configure script can simply run an R script to do all the work.
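A minimal sketch of that idea, not the actual StarSpaceR setup: the configure script could call something like "${R_HOME}/bin/Rscript" tools/configure.R, and the R script would write src/Makevars. The helper name write_makevars, the tools/configure.R path, the include directories, and the flags below are all placeholders I made up for illustration. The example writes to a temporary directory so it is safe to run anywhere.

```r
# Hypothetical helper that a configure script could run through Rscript.
# The include directories and flags are placeholders, not the ones
# StarSpaceR really uses.
write_makevars <- function(path, include_dirs, cxx_flags) {
  lines <- c(
    "CXX_STD = CXX11",
    paste("PKG_CPPFLAGS =", paste0("-I", include_dirs, collapse = " ")),
    paste("PKG_CXXFLAGS =", paste(cxx_flags, collapse = " "))
  )
  writeLines(lines, path)
  invisible(lines)
}

# In a real package the target would be "src/Makevars"; a temp file is
# used here so the sketch has no side effects.
makevars <- file.path(tempdir(), "Makevars")
write_makevars(
  makevars,
  include_dirs = c(".", "./StarSpace/src"),
  cxx_flags = c("-pthread")
)
cat(readLines(makevars), sep = "\n")
```

Keeping the logic in R means you can build paths with file.path() and query the R installation directly instead of doing string juggling in shell.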

I also use some code from fastrtext. The author had a brilliant idea: include a custom header with macros that change fastText’s behavior, such as redirecting streams to the R console and renaming the main function so the package passes R CMD check. If you are interested, check his code here: https://github.com/pommedeterresautee/fastrtext/blob/master/src/r_compliance.h.

The current version only supports loading a model into memory and extracting word embeddings for a set of words. See the example below:

# library(devtools)
# install_github("zzawadz/StarSpaceR")

# There's a simple, pretrained model included in the package.
modelPath <- system.file(package = "StarSpaceR", "exdata/model_class")

model <- ssr_load_model(modelPath) # load the model into memory
model$get_vectors(c("words", "topology")) # get word embeddings

#                 [,1]         [,2]         [,3]        [,4] ...
# words    -0.00479455 -0.002737640 -0.000592433 -0.00318651 ...
# topology  0.00753618  0.000651733 -0.012981600 -0.01609830 ...

If you have any comments or ideas, reach me on Twitter (@zzawadz) or file an issue on GitHub (https://github.com/zzawadz/StarSpaceR/issues).


Be cautious. There’s a big problem with the package: requesting a word that is not present in the model’s dictionary may cause a fatal error and crash the whole R session. I will be investigating this problem (I think it’s a great way to learn all the internals!), but you have been warned :) Do not use this in production!

# Not run
# model$get_vectors(c("wor"))
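
Until that is fixed, one possible workaround is to filter the words yourself before calling get_vectors. The sketch below is only an illustration: safe_get_vectors is a hypothetical helper, not part of StarSpaceR, and vocab is a character vector of known words that you have to supply yourself, since I am not assuming the package exposes its dictionary. A stand-in model object is used so the snippet runs without the package.

```r
# Hypothetical guard, not part of StarSpaceR. `vocab` must be provided by
# the caller; nothing here reads the dictionary from the model itself.
safe_get_vectors <- function(model, words, vocab) {
  known <- words %in% vocab
  if (!all(known)) {
    warning(
      "Skipping words not in vocab: ",
      paste(words[!known], collapse = ", ")
    )
  }
  if (!any(known)) {
    return(NULL)
  }
  model$get_vectors(words[known])
}

# Stand-in for a loaded model, so the example is self-contained.
mock_model <- list(
  get_vectors = function(words) {
    matrix(0, nrow = length(words), ncol = 2, dimnames = list(words, NULL))
  }
)

res <- suppressWarnings(
  safe_get_vectors(mock_model, c("words", "wor"), vocab = c("words", "topology"))
)
rownames(res)
```

With a real model you would pass the object returned by ssr_load_model instead of mock_model. The guard is only as good as the vocab you give it, so it reduces the risk of a crash rather than removing it.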