Topic Modeling
About
Chapter 4: Topic Modeling presents a case study on topic modeling of documents using three different tools. It is divided into three parts: 4A, 4B, and 4C. Case study 4A uses Topic-Modeling-Tool, 4B uses RapidMiner, and 4C uses the R programming language.
How to Cite
Please cite this compendium as: Lamba, Manika, & Madhusudhan, Margam. (2021). Topic Modeling of Documents using Three Different Tools (Version 1.1). https://doi.org/10.5281/zenodo.5203494
Contents
The compendium contains the data, code, and notebook associated with the case studies. It is organized as follows:
- The 4a_dataset\ folder contains the data for the 4A case study.
- The 4a_supplementary.docx file contains the supplementary data associated with the 4A case study.
- The 4b_dataset\ folder contains the data for the 4B case study.
- The 4b_supplementary.pdf file contains the supplementary data associated with the 4B case study.
- The 4c_dataset.csv file contains the data for the 4C case study.
- The stm.R file contains the R code for the 4C case study.
- The Case_Study_4C.ipynb file contains the Jupyter notebook for the 4C case study.
In addition to the provided sample data, you can use a dataset from Appendix A, Appendix B, Appendix C, Curated Datasets, or your own dataset to perform topic modeling.
How to Download or Install
There are several ways to use the compendium’s contents and reproduce the analysis:
- Download the compendium as a zip archive from the GitHub repository. After unpacking the downloaded zip archive, you can explore the files on your computer.
- Reproduce the analysis in the cloud without installing any software. The same Docker container replicating the computational environment used by the authors can be run using BinderHub on mybinder.org:
  - Click RStudio to launch an interactive RStudio session in your web browser for hands-on practice with the 4C case study. In the virtual environment, open the stm.R file to run the code.
  - Click Jupyter+R to launch an interactive Jupyter Notebook session in your web browser using the R kernel. When you execute code within the notebook, the results appear beneath the code.
  - Limitations of Binder:
    - The server has limited memory, so you cannot load large datasets or run big computations.
    - Binder is meant for interactive, ephemeral coding, so an instance is shut down after 10 minutes of inactivity.
    - An instance cannot be kept alive for more than 12 hours.
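For the zip-archive route, a quick way to confirm that the unpacked folder matches the Contents section is a short script such as the one below. This is only a minimal sketch: it is written in Python for illustration (the compendium's own code is R), the helper name `find_missing` and the folder name `compendium` are hypothetical, and it assumes the listed files sit directly at the top level of the unpacked folder, which may not match how the archive actually unpacks.

```python
from pathlib import Path

# File and folder names as listed in the Contents section.
EXPECTED_ENTRIES = [
    "4a_dataset",
    "4a_supplementary.docx",
    "4b_dataset",
    "4b_supplementary.pdf",
    "4c_dataset.csv",
    "stm.R",
    "Case_Study_4C.ipynb",
]


def find_missing(root, expected=EXPECTED_ENTRIES):
    """Return the expected entries that do not exist directly under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    # Hypothetical folder name: replace it with wherever you unpacked the archive.
    missing = find_missing("compendium")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files and folders are present.")
```

If the archive unpacks into a nested folder, point `find_missing` at that inner folder instead.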
Visualize the Results
A storyboard summarizes the visualizations for the 13 case studies performed in the book. To learn more about the case studies and the methodology used to obtain the results, read the book.
Licenses
Text and Figures: ©2021 Lamba and Madhusudhan - all rights reserved, unless stated otherwise.
Code, Data, Hex-sticker: MIT License
- Posted on: July 20, 2021
- Categories: Topic Modeling Tool, R, RapidMiner
- Tags: topic modeling