This storyboard summarizes the visualizations for the 13 case studies presented in the book “Text Mining for Information Professionals: An Uncharted Territory”. To learn more about the case studies and the methodology used to obtain the results, download the book from SpringerNature. To cite this storyboard, use: ©2021 Lamba and Madhusudhan - all rights reserved, unless stated otherwise.
Click on the hyperlink in the figure legend to open the article associated with the citation.
5A Network Text Analysis of Documents using bibliometrix package of R
5B Network Text Analysis of Documents using textnets package of R
Case_Study | Title | Virtual_RStudio_Server |
---|---|---|
1B | Clustering of Documents using R | link |
4C | Topic Modeling of Documents using R | link |
5B | Network Text Analysis of Documents using textnets package of R | link |
6B | Burst Detection of Documents using R | link |
7B | Sentiment Analysis of Documents using R | link |
9A | Build a Dashboard using R | link |
Reproduce the analysis in the cloud without having to install any software. The computational environment used by the authors runs on BinderHub. Click the hyperlink to open an interactive virtual RStudio environment for hands-on practice with the case studies that used the R programming language. In the virtual environment, open the .R or .Rmd file containing the R code for the process and run it.
Case_Study | Title | Virtual_Jupyter_Notebook |
---|---|---|
1B | Clustering of Documents using R | link |
4C | Topic Modeling of Documents using R | link |
5B | Network Text Analysis of Documents using textnets package of R | link |
6B | Burst Detection of Documents using R | link |
7B | Sentiment Analysis of Documents using R | link |
9A | Build a Dashboard using R | link |
Reproduce the analysis in the cloud without having to install any software. The computational environment used by the authors runs on BinderHub. Click the hyperlink to open an interactive virtual Jupyter Notebook for hands-on practice with the case studies that used the R programming language.
The heatmap plot shows the distances between the documents.
The clustered heatmap plot shows another way to visualize the distances between the documents.
The dendrogram presents the hierarchical clustering of documents using Ward's method.
For clustering in R, the elbow method was used to determine the number of clusters.
Euclidean distance was used to measure the distance between the documents.
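The distance step above can be illustrated with a minimal sketch. It is written in Python rather than the R used in the case study, and it assumes each document is represented as a simple term-frequency vector. The toy documents and the helper names `term_vector`, `euclidean`, and `distance_matrix` are hypothetical and are not taken from the book.

```python
import math
from collections import Counter

def term_vector(text, vocabulary):
    """Count how often each vocabulary term appears in the text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_matrix(docs):
    """Pairwise Euclidean distances between documents (hypothetical helper)."""
    # Shared vocabulary across all documents, sorted for a stable column order.
    vocabulary = sorted({word for doc in docs for word in doc.lower().split()})
    vectors = [term_vector(doc, vocabulary) for doc in docs]
    return [[euclidean(u, v) for v in vectors] for u in vectors]

# Toy corpus for illustration only; not data from the case study.
docs = ["text mining library", "text mining data", "library users survey"]
matrix = distance_matrix(docs)
```

A matrix like this is the kind of input that a heatmap or a hierarchical clustering routine would consume; documents that share more terms end up closer together.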
Hierarchical clustering with dendrograms is another way to visualize the distance between the documents.
A circular dendrogram is yet another way to visualize the distance between the documents.
A phylogenetic tree is another way of visualizing the same results from a different perspective, depending on your research problem and dataset.
The results show the topics assigned to the corpus of research articles.
The table shows the topics assigned to the corpus of ETDs.
The figure shows the topics which were identified using Structural Topic Modeling (STM).
The figure shows another way of representing the topic modeling results.
The figure shows another way of representing the topic modeling results.
The figure shows an interactive way of representing the topic modeling results.
When you click on the figure, a new window will open in your browser where you can interact with it and visualize the changes by altering various parameters.
The table presents the top five representative ETDs for each modeled topic, ranked according to their probability.
The figure shows correlation between the topics using a network graph.
The figure presents the word co-occurrence network for top words.
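As a rough illustration of how a word co-occurrence network for top words can be derived, the following Python sketch counts how often pairs of the most frequent words appear in the same document. The case study itself used R; the function name `cooccurrence_edges`, the `top_n` parameter, and the toy corpus are assumptions made for this example only.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(docs, top_n=3):
    """Weighted edges between the top_n most frequent words.

    Two words co-occur when they appear in the same document;
    the edge weight is the number of documents they share.
    """
    tokenized = [set(doc.lower().split()) for doc in docs]
    # Document frequency of every word.
    freq = Counter(word for words in tokenized for word in words)
    # Sort by frequency (descending), then alphabetically, so ties are deterministic.
    ranked = sorted(freq.items(), key=lambda item: (-item[1], item[0]))
    top_words = {word for word, _ in ranked[:top_n]}
    edges = Counter()
    for words in tokenized:
        for a, b in combinations(sorted(words & top_words), 2):
            edges[(a, b)] += 1
    return edges

# Toy corpus for illustration only; not data from the case study.
docs = ["text mining library", "text mining data", "library text users"]
edges = cooccurrence_edges(docs)
```

Each key of `edges` is a pair of words (the nodes) and each value is the edge weight, which a network plotting library could then draw.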
The figure represents the clusters/communities of words (nodes).
The figure shows the horizontal line graphs for bursts.
The figure shows the accumulation of preprints in the arXiv database.
The figure shows the bursts of preprints in the arXiv database.
The figure represents the percentage comparison between polarities for 20 different productivity facets.
The figure represents the percentage comparison between subjectivities for 20 different productivity facets.
The figure shows the percentage-based means for Amazon book reviews.
The figure shows the discrete cosine transform for Amazon book reviews.
The figure shows the emotions for Amazon book reviews.
The figure shows the confusion matrix for the SVM predictive model.
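For readers unfamiliar with how a confusion matrix is built, here is a minimal Python sketch that tallies actual against predicted labels. The labels and predictions are made-up toy values, not output from the SVM model in the case study, and `confusion_matrix` is a hypothetical helper written for this example.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Toy labels for illustration only; not results from the case study.
labels = ["negative", "positive"]
actual = ["positive", "negative", "positive", "positive", "negative"]
predicted = ["positive", "positive", "positive", "negative", "negative"]

matrix = confusion_matrix(actual, predicted, labels)
# Accuracy is the share of cases on the diagonal (correct predictions).
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(actual)
```

The diagonal cells hold the correctly classified cases, and the off-diagonal cells show which classes the model confuses with one another.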
The storyboard consists of three important sections:
Storyboard: It summarizes the visualizations for a specific case study. For this case study, 13 different storyboards were prepared to summarize the results from all the case studies;
Frame: It shows the different visualizations from a specific case study and divides them into sub-sections; and
Commentary: This section is used to explain the visualization.