This storyboard summarizes the visualizations for the 13 case studies performed in the book “Text Mining for Information Professionals: An Uncharted Territory”. To learn more about the case studies and the methodology used to obtain the results, download the book from SpringerNature. To cite this storyboard, use: ©2021 Lamba and Madhusudhan - all rights reserved, unless stated otherwise.
Click on the hyperlink in the figure legend to open the article associated with the citation.
5A Network Text Analysis of Documents using bibliometrix package of R
5B Network Text Analysis of Documents using textnets package of R
Case_Study | Title | Virtual_RStudio_Server |
---|---|---|
1B | Clustering of Documents using R | link |
4C | Topic Modeling of Documents using R | link |
5B | Network Text Analysis of Documents using Textnets package of R | link |
6B | Burst Detection of Documents using R | link |
7B | Sentiment Analysis of Documents using R | link |
9A | Build a Dashboard using R | link |
Reproduce the analysis in the cloud without having to install any software. The computational environment used by the authors runs using BinderHub. Click the hyperlink to open an interactive virtual RStudio environment for hands-on practice with the case studies that used the R programming language. In the virtual environment, open the .R or .Rmd file containing the R code for the process.
Case_Study | Title | Virtual_Jupyter_Notebook |
---|---|---|
1B | Clustering of Documents using R | link |
4C | Topic Modeling of Documents using R | link |
5B | Network Text Analysis of Documents using Textnets package of R | link |
6B | Burst Detection of Documents using R | link |
7B | Sentiment Analysis of Documents using R | link |
9A | Build a Dashboard using R | link |
Reproduce the analysis in the cloud without having to install any software. The computational environment used by the authors runs using BinderHub. Click the hyperlink to open an interactive virtual Jupyter Notebook for hands-on practice with the case studies that used the R programming language.
©2021 Lamba and Madhusudhan - all rights reserved
The heatmap plot shows the distances between the documents.
©2021 Lamba and Madhusudhan - all rights reserved
The clustered heatmap plot shows another way to visualize the distances between the documents.
©2021 Lamba and Madhusudhan - all rights reserved
The dendrogram presents the hierarchical clustering of documents using the Ward method.
©2021 Lamba and Madhusudhan - all rights reserved
For clustering in R, the elbow method was used to determine the number of clusters.
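As a rough illustration of the elbow method, the sketch below runs k-means for several values of k on a tiny, hypothetical document-term matrix and plots the total within-cluster sum of squares. The data, the choice of `kmeans()`, and the range of k are assumptions made for this example and are not taken from the case study.

```r
# Hypothetical document-term matrix: 6 documents x 3 terms (illustration only).
# The first three and last three documents form two clearly separated groups.
set.seed(42)
dtm <- rbind(
  c(5, 0, 1), c(6, 1, 0), c(5, 1, 1),
  c(0, 6, 5), c(1, 5, 6), c(0, 5, 5)
)
rownames(dtm) <- paste0("doc", 1:6)

# Total within-cluster sum of squares (WSS) for k = 1..4
ks <- 1:4
wss <- sapply(ks, function(k) {
  kmeans(dtm, centers = k, nstart = 10)$tot.withinss
})

# Plot WSS against k; the bend ("elbow") in the curve suggests a value for k
plot(ks, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```

With this toy data, the curve drops sharply from k = 1 to k = 2 and flattens afterwards, which is the pattern the elbow method looks for.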
©2021 Lamba and Madhusudhan - all rights reserved
The Euclidean distance method was used to determine the distance between the documents.
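To make the idea concrete, here is a minimal sketch that computes Euclidean distances between documents with base R's `dist()`. The three documents and their term counts are hypothetical and serve only to show the calculation.

```r
# Hypothetical term counts for three documents (illustration only)
dtm <- rbind(
  doc1 = c(text = 2, mining = 1, library = 0),
  doc2 = c(text = 2, mining = 1, library = 3),
  doc3 = c(text = 0, mining = 4, library = 3)
)

# Pairwise Euclidean distances between the rows (documents)
d <- dist(dtm, method = "euclidean")
print(d)

# doc1 and doc2 differ only in "library" (0 vs 3),
# so their distance is sqrt(3^2) = 3
as.matrix(d)["doc1", "doc2"]
```

Documents with similar term counts end up close together, which is what the heatmaps and dendrograms in this storyboard visualize.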
©2021 Lamba and Madhusudhan - all rights reserved
Hierarchical clustering with dendrograms is another way to visualise the distance between the documents.
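A minimal sketch of hierarchical clustering with a dendrogram in base R follows. The document-term matrix is hypothetical, and `method = "ward.D2"` is an assumption: the storyboard names the Ward method but does not say which of R's two Ward variants was used.

```r
# Hypothetical document-term matrix: 6 documents x 3 terms (illustration only)
dtm <- rbind(
  c(5, 0, 1), c(6, 1, 0), c(5, 1, 1),
  c(0, 6, 5), c(1, 5, 6), c(0, 5, 5)
)
rownames(dtm) <- paste0("doc", 1:6)

# Euclidean distances between documents
d <- dist(dtm, method = "euclidean")

# Agglomerative clustering; "ward.D2" is assumed here, "ward.D" also exists
hc <- hclust(d, method = "ward.D2")

# Draw the dendrogram
plot(hc, main = "Hierarchical clustering of documents", xlab = "", sub = "")

# Cut the tree into two clusters and show each document's cluster label
groups <- cutree(hc, k = 2)
print(groups)
```

In this toy example, doc1 to doc3 fall into one cluster and doc4 to doc6 into the other.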
©2021 Lamba and Madhusudhan - all rights reserved
A circular dendrogram is yet another way to visualise the distance between the documents.
©2021 Lamba and Madhusudhan - all rights reserved
A phylogenic structure is another way of visualizing the same results from a different perspective, depending on your research problem and dataset.
Timeline showing the core topics in DESIDOC Journal of Library and Information Technology from 1981 to 2018 (©2019 Springer Nature, all rights reserved – reprinted with permission from Springer Nature, published in Lamba and Madhusudhan (2019))
The results show the topics assigned to the corpus of research articles.
Latent Dirichlet Allocation Topic and Word Result for PQDT Global ETDs during 2014-2018 (©2020 Cadernos BAD, all rights reserved – reprinted under Creative Commons CC BY license, published in Lamba and Madhusudhan (2020))
The table shows the topics assigned to the corpus of ETDs.
©2021 Lamba and Madhusudhan - all rights reserved
The figure shows the topics which were identified using Structural Topic Modeling (STM).
©2021 Lamba and Madhusudhan - all rights reserved
The figure shows another way of representing the topic modeling results.
©2021 Lamba and Madhusudhan - all rights reserved
The figure shows another way of representing the topic modeling results.
The figure shows an interactive way of representing the topic modeling results.
When you click on the figure, a new window will open in your browser where you can interact with the figure and visualize the changes by altering various parameters.
©2021 Lamba and Madhusudhan - all rights reserved
The table presents the top five representative ETDs for the modeled topics, ranked according to their probability.
©2021 Lamba and Madhusudhan - all rights reserved
The figure shows the correlation between the topics using a network graph.
Word Co-Occurrence Network (©2021 Lamba and Madhusudhan - all rights reserved)
The figure presents the word co-occurrence network for the top words.
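One common way to obtain the edges of such a network is to count, for each pair of words, the number of documents in which both appear. The sketch below does this in base R on a hypothetical document-term matrix; it is only an illustration and not necessarily how the figure was produced.

```r
# Hypothetical term counts for three documents (illustration only)
dtm <- rbind(
  doc1 = c(text = 1, mining = 1, library = 0),
  doc2 = c(text = 1, mining = 0, library = 1),
  doc3 = c(text = 1, mining = 1, library = 1)
)

# Convert counts to presence/absence (1 if the word occurs in the document)
presence <- (dtm > 0) * 1

# Word-by-word co-occurrence counts: t(presence) %*% presence
cooc <- crossprod(presence)

# A word's co-occurrence with itself is not an edge, so zero the diagonal
diag(cooc) <- 0
print(cooc)
```

Each non-zero cell can then be treated as a weighted edge between two word nodes.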
Text Network (©2021 Lamba and Madhusudhan - all rights reserved)
The figure represents the clusters/communities of words (nodes).
©2021 Lamba and Madhusudhan - all rights reserved
The figure shows the horizontal line graphs for bursts.
©2021 Lamba and Madhusudhan - all rights reserved
The figure shows the accumulation of preprints in the arXiv database.
©2021 Lamba and Madhusudhan - all rights reserved
The figure shows the bursts of preprints in the arXiv database.
Polarity Percentage (©2018 Springer Nature, all rights reserved – reprinted with permission from Springer Nature, published in Lamba and Madhusudhan (2018))
The figure represents the percentage comparison between polarities for 20 different productivity facets.
Subjectivity Percentage (©2018 Springer Nature, all rights reserved – reprinted with permission from Springer Nature, published in Lamba and Madhusudhan (2018))
The figure represents the percentage comparison between subjectivities for 20 different productivity facets.
©2021 Lamba and Madhusudhan - all rights reserved
The figure shows the percentage-based means for Amazon book reviews.
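The idea behind percentage-based means can be sketched in base R: the sequence of sentiment scores is split into equally sized chunks, and each chunk is averaged so that texts of different lengths can be compared on a common scale. The scores and the number of bins below are hypothetical, and this simplified version is a stand-in rather than the function used in the case study.

```r
# Hypothetical sentence-level sentiment scores for one review (illustration only)
scores <- c(0.5, 1.0, -0.5, -1.0, 0.0, 2.0, 1.5, 0.5, -2.0, -1.0)

# Number of equally sized chunks (assumed value)
bins <- 5

# Assign each score to a bin by its position in the sequence
bin_id <- ceiling(seq_along(scores) * bins / length(scores))

# Mean sentiment per bin
bin_means <- tapply(scores, bin_id, mean)
print(bin_means)
```

Plotting `bin_means` against the bin index gives a smoothed view of how sentiment moves from the start to the end of the text.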
©2021 Lamba and Madhusudhan - all rights reserved
The figure shows the discrete cosine transformation for Amazon book reviews.
©2021 Lamba and Madhusudhan - all rights reserved
The figure shows the emotions for Amazon book reviews.
Screenshot of evaluation result (©2020 Cadernos BAD, all rights reserved – reprinted under Creative Commons CC BY license, published in Lamba and Madhusudhan (2020))
The figure shows the confusion matrix for the SVM predictive model.
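For readers unfamiliar with confusion matrices, the sketch below builds one in base R by cross-tabulating predicted and actual class labels. The labels are hypothetical and do not come from the SVM model in the figure, and the tool used by the authors to produce the screenshot may differ.

```r
# Hypothetical actual and predicted class labels (illustration only)
classes   <- c("A", "B", "C")
actual    <- factor(c("A", "A", "B", "B", "B", "C", "C", "A"), levels = classes)
predicted <- factor(c("A", "B", "B", "B", "C", "C", "C", "A"), levels = classes)

# Confusion matrix: rows are predictions, columns are true classes
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Overall accuracy: correctly classified items (the diagonal) over all items
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

Off-diagonal cells show which classes the model confuses with each other.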
Screenshot of the dashboard summarizing the case studies’ results from the book “Text Mining for Information Professionals: An Uncharted Territory” (©2021 Lamba and Madhusudhan - all rights reserved)
The storyboard consists of three important sections:
Storyboard: It summarizes the visualizations for a specific case study. For this case study, 13 different storyboards were prepared to summarize the results from all the case studies;
Frame: It shows different visualizations from a specific case study and divides it into different sub-sections; and
Commentary: This section is used to explain the visualization.