Posts List

Get topics' words from the LDA model.

Some time ago I had to move from sparklyr to Scala for better integration with Spark, and easier collaboration with other developers in a team. Interestingly, this conversion was much easier than I thought because Spark’s DataFrame API is somewhat similar to dplyr, there’s groupBy function, agg instead of summarise, and so on. You can also use traditional, old SQL to operate on data frames. Anyway, in this post, I’ll show how to fit very simple LDA (Latent Dirichlet allocation) model, and then extract information about topic’s words.