About

Introduction


This storyboard summarizes the visualizations for the 13 case studies performed in the book “Text Mining for Information Professionals: An Uncharted Territory”. To learn more about the case studies and the methodology used to obtain the results, download the book from Springer Nature. To cite this storyboard, use: ©2021 Lamba and Madhusudhan - all rights reserved, unless stated otherwise.

Click on the hyperlink in the figure legend to open the article associated with the citation.

Virtual RStudio Server

Case Study | Title | Virtual RStudio Server
1B | Clustering of Documents using R | link
4C | Topic Modeling of Documents using R | link
5B | Network Text Analysis of Documents using Textnets package of R | link
6B | Burst Detection of Documents using R | link
7B | Sentiment Analysis of Documents using R | link
9A | Build a Dashboard using R | link

Reproduce the analyses in the cloud without installing any software. The computational environment used by the authors runs on BinderHub. Click a hyperlink in the table to open an interactive virtual RStudio environment for hands-on practice with the case studies that used the R programming language. In the virtual environment, open the .R or .Rmd file containing the R code for the process and run it.

Virtual Jupyter Notebook

Case Study | Title | Virtual Jupyter Notebook
1B | Clustering of Documents using R | link
4C | Topic Modeling of Documents using R | link
5B | Network Text Analysis of Documents using Textnets package of R | link
6B | Burst Detection of Documents using R | link
7B | Sentiment Analysis of Documents using R | link
9A | Build a Dashboard using R | link

Reproduce the analyses in the cloud without installing any software. The computational environment used by the authors runs on BinderHub. Click a hyperlink in the table to open an interactive virtual Jupyter Notebook for hands-on practice with the case studies that used the R programming language.

1A

Heatmap Showing Distances Between Documents

©2021 Lamba and Madhusudhan - all rights reserved


The heatmap plot shows the distances between the documents.

Clustered Heatmap Showing Distances Between Documents

©2021 Lamba and Madhusudhan - all rights reserved


The clustered heatmap plot shows another way to visualize the distances between the documents.

Dendrogram Showing Hierarchical Clustering of Documents

©2021 Lamba and Madhusudhan - all rights reserved


The dendrogram presents the hierarchical clustering of the documents using Ward's method.
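Ward's method builds the hierarchy bottom-up: every document starts as its own cluster, and at each step the two clusters whose merger increases the total within-cluster sum of squares the least are joined. For clusters A and B with sizes n_A, n_B and centroids c_A, c_B, that increase is n_A·n_B / (n_A + n_B) · ||c_A - c_B||². The minimal Python sketch below implements this criterion in a deliberately naive way on hypothetical two-dimensional document vectors; it illustrates the idea and is not the authors' R code.

```python
def centroid(points, members):
    """Mean vector of the points whose indices are in `members`."""
    dim = len(points[0])
    return [sum(points[m][d] for m in members) / len(members) for d in range(dim)]


def ward_cost(points, a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(points, a), centroid(points, b)
    squared_gap = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_gap


def ward_clustering(points):
    """Naive agglomerative clustering with Ward's criterion.

    Returns the merge history as (cluster_a, cluster_b, cost) tuples,
    which is the information a dendrogram draws.
    """
    clusters = [[i] for i in range(len(points))]  # every document starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(points, clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), cost))
        merged = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is still valid afterwards
        clusters[i] = merged
    return merges


# Hypothetical 2-D document vectors forming two obvious groups.
docs = [[0, 0], [0, 1], [5, 5], [5, 6]]
for a, b, cost in ward_clustering(docs):
    print(a, b, round(cost, 3))
```

Because each merge cost is an increase in within-cluster sum of squares, the costs add up to the total sum of squares of the data (51 for these four points), which is a convenient sanity check.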

1B

Determining the Number of Clusters (k) using the Elbow Method

©2021 Lamba and Madhusudhan - all rights reserved


For clustering in R, the elbow method was used to determine the number of clusters.
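The elbow method runs the clustering for a range of k values, records the total within-cluster sum of squares (WSS) for each, and looks for the k after which adding clusters stops reducing WSS sharply. The minimal Python sketch below assumes k-means as the clustering step, a deterministic farthest-point start, and a simple second-difference rule to locate the elbow; all three are illustrative choices rather than the book's R procedure, where the elbow is typically read off a plot.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_vector(points):
    """Component-wise mean of a list of vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def kmeans_wss(points, k, max_iter=100):
    """Run Lloyd's k-means and return the within-cluster sum of squares (WSS)."""
    # Deterministic farthest-point start (stands in for random or k-means++ starts).
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are final
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = mean_vector(members)
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


def elbow_k(wss):
    """wss[i] is the WSS for k = i + 1; return the k where the curve bends most."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(wss) - 1):
        bend = wss[i - 1] - 2 * wss[i] + wss[i + 1]  # discrete second difference
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Hypothetical document vectors forming three tight groups.
docs = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 0), (10, 1), (11, 0), (11, 1),
        (0, 10), (0, 11), (1, 10), (1, 11)]
wss = [kmeans_wss(docs, k) for k in range(1, 5)]
print([round(w, 2) for w in wss], "elbow at k =", elbow_k(wss))
```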

Visualizing Distance Matrices

©2021 Lamba and Madhusudhan - all rights reserved


Euclidean distance was used to measure the distances between the documents.
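Euclidean distance treats each document as a vector of term counts and measures the straight-line distance between two such vectors: d(x, y) = sqrt(Σ (x_i - y_i)²). The minimal Python sketch below builds raw term-count vectors with lowercase whitespace tokenization (an assumption; the case study's preprocessing and weighting may differ) and computes the pairwise distance matrix that a distance heatmap displays.

```python
import math
from collections import Counter


def term_vectors(docs):
    """Bag-of-words count vectors over a shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    return vocab, [[c[word] for word in vocab] for c in counts]


def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(vectors):
    """Symmetric matrix of pairwise Euclidean distances (zeros on the diagonal)."""
    return [[euclidean(u, v) for v in vectors] for u in vectors]


docs = ["text mining library", "library text", "sentiment analysis"]  # hypothetical
vocab, vectors = term_vectors(docs)
for row in distance_matrix(vectors):
    print([round(d, 3) for d in row])
```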

Agglomerative Hierarchical Clustering

©2021 Lamba and Madhusudhan - all rights reserved


Hierarchical clustering with dendrograms is another way to visualize the distances between the documents.

Circular Dendrogram

©2021 Lamba and Madhusudhan - all rights reserved


A circular dendrogram is yet another way to visualize the distances between the documents.

Phylogenic Dendrogram

©2021 Lamba and Madhusudhan - all rights reserved


A phylogenic tree visualizes the same results from a different perspective; choose the view that best suits your research problem and dataset.

4A

Core Topics

Timeline showing the core topics in DESIDOC Journal of Library and Information Technology from 1981 to 2018 (©2019 Springer Nature, all rights reserved – reprinted with permission from Springer Nature, published in Lamba and Madhusudhan (2019))


The results show the topics assigned to the corpus of research articles.

4B

Core Topics

Latent Dirichlet Allocation Topic and Word Results for PQDT Global ETDs during 2014-2018 (©2020 Cadernos BAD, all rights reserved – reprinted under Creative Commons CC BY license, published in Lamba and Madhusudhan (2020))


The table shows the topics assigned to the corpus of ETDs.

4C

Method 1: Plotting Top Words using stm

©2021 Lamba and Madhusudhan - all rights reserved


The figure shows the topics which were identified using Structural Topic Modeling (STM).

Method 2: Plotting MAP Histogram using stm

©2021 Lamba and Madhusudhan - all rights reserved


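The figure shows another way of representing the topic modeling results: a histogram of the MAP (maximum a posteriori) estimates of the document-topic proportions for each topic.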

Method 3: Visualizing Topic Model using ggplot2

©2021 Lamba and Madhusudhan - all rights reserved


The figure shows another way of representing the topic modeling results.

Method 4: Interactive Visualization

©2021 Lamba and Madhusudhan - all rights reserved


The figure shows an interactive way of representing the topic modeling results.

Clicking the figure opens a new browser window in which you can interact with the visualization and see how it changes as you alter various parameters.

Understanding Topics through Top 5 Representative Documents

©2021 Lamba and Madhusudhan - all rights reserved


The table presents the top five representative ETDs for each modeled topic, ranked by their topic probability.
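Ranking representative documents amounts to sorting the documents by their estimated proportion of a given topic. In stm, these proportions live in the document-topic matrix (theta), and findThoughts() reports such documents. The minimal Python sketch below shows only the ranking step on a hypothetical matrix; it is not the authors' code.

```python
def top_documents(theta, doc_ids, topic, n=5):
    """Return the n documents with the highest proportion of `topic`.

    theta[d][t] is the estimated proportion of topic t in document d.
    """
    ranked = sorted(range(len(theta)), key=lambda d: theta[d][topic], reverse=True)
    return [(doc_ids[d], theta[d][topic]) for d in ranked[:n]]


# Hypothetical document-topic proportions for four ETDs and two topics.
theta = [[0.7, 0.3], [0.2, 0.8], [0.9, 0.1], [0.5, 0.5]]
ids = ["etd1", "etd2", "etd3", "etd4"]
for topic in range(2):
    print(topic, top_documents(theta, ids, topic, n=2))
```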

Topic Correlation

©2021 Lamba and Madhusudhan - all rights reserved


The figure shows the correlations between topics as a network graph.
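One simple way to build a topic-correlation network is to correlate the topic-proportion columns of the document-topic matrix and draw an edge wherever the correlation exceeds a cutoff. This is similar in spirit to the "simple" method of stm's topicCorr(), although the exact procedure behind the figure is not stated here. The minimal Python sketch below uses Pearson correlation and a hypothetical cutoff of 0.01 on made-up proportions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined if a topic's proportion never varies


def topic_network(theta, cutoff=0.01):
    """Edges (i, j, r) between topics whose proportions correlate above `cutoff`."""
    k = len(theta[0])
    columns = [[row[t] for row in theta] for t in range(k)]
    edges = []
    for i in range(k):
        for j in range(i + 1, k):
            r = pearson(columns[i], columns[j])
            if r > cutoff:
                edges.append((i, j, r))
    return edges


# Hypothetical document-topic proportions (4 documents x 3 topics).
theta = [[0.4, 0.4, 0.2], [0.1, 0.1, 0.8], [0.3, 0.3, 0.4], [0.2, 0.2, 0.6]]
print(topic_network(theta))
```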

5A

Network Text Analysis of Documents using bibliometrix

Word Co-Occurrence Network (©2021 Lamba and Madhusudhan - all rights reserved)


The figure presents the word co-occurrence network for top words.
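A word co-occurrence network links two words whenever they appear together in the same unit of text, weighting each edge by how often that happens. The minimal Python sketch below assumes that the unit is a whole document and that tokens are whitespace-separated words; bibliometrix handles field selection and preprocessing itself, so this only illustrates the counting idea. The `min_count` threshold is a hypothetical way to keep only the stronger edges.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(docs, min_count=1):
    """Count, for every word pair, the number of documents containing both words."""
    pair_counts = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))  # unique words, in a fixed order
        pair_counts.update(combinations(words, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


docs = ["text mining library", "library text analysis", "text analysis"]  # hypothetical
print(cooccurrence_edges(docs, min_count=2))
```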

5B

Network Text Analysis of Documents using textnets

Text Network (©2021 Lamba and Madhusudhan - all rights reserved)


The figure represents the clusters/communities of words (nodes).

6A

Horizontal Line Graph

©2021 Lamba and Madhusudhan - all rights reserved


The figure shows the horizontal line graphs for bursts.

6B

Accumulation of Submissions

©2021 Lamba and Madhusudhan - all rights reserved


The figure shows the accumulation of preprints in the arXiv database.

Bursts in Submissions

©2021 Lamba and Madhusudhan - all rights reserved


The figure shows the bursts of preprints in the arXiv database.
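Burst detection flags periods in which events (here, preprint submissions) arrive much faster than usual. A common formulation is Kleinberg's two-state automaton: each gap between consecutive events is explained either by a baseline rate or by an elevated "burst" rate, entering the burst state costs a penalty, and a Viterbi pass finds the cheapest state sequence. The minimal Python sketch below implements that two-state version with hypothetical parameter values (`s = 2`, `gamma = 1`) on made-up timestamps; it is not necessarily the algorithm or the settings used for the figure.

```python
import math


def kleinberg_two_state(times, s=2.0, gamma=1.0):
    """Label each gap between consecutive event times as 0 (baseline) or 1 (burst)."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    n = len(gaps)
    base_rate = n / sum(gaps)           # average arrival rate
    rates = [base_rate, s * base_rate]  # burst state fires s times faster
    up_cost = gamma * math.log(n)       # penalty for entering the burst state

    def emit(state, gap):
        # Negative log-likelihood of `gap` under an exponential with this state's rate.
        rate = rates[state]
        return rate * gap - math.log(rate)

    cost = [emit(0, gaps[0]), up_cost + emit(1, gaps[0])]  # start from the baseline
    back = []  # back[t] = (best predecessor of state 0, of state 1) for gap t + 1
    for gap in gaps[1:]:
        prev0 = 0 if cost[0] <= cost[1] else 1   # dropping to baseline is free
        from_base = cost[0] + up_cost
        prev1 = 0 if from_base < cost[1] else 1  # rising to burst pays up_cost
        cost = [cost[prev0] + emit(0, gap), min(from_base, cost[1]) + emit(1, gap)]
        back.append((prev0, prev1))

    state = 0 if cost[0] <= cost[1] else 1       # cheapest final state
    states = [state]
    for prev0, prev1 in reversed(back):          # walk the back-pointers
        state = prev0 if state == 0 else prev1
        states.append(state)
    return states[::-1]


# Hypothetical submission times: steady, then a rapid run, then steady again.
times = [0, 10, 20, 30, 40, 41, 42, 43, 44, 45, 46, 56, 66, 76, 86]
print(kleinberg_two_state(times))
```

With these toy timestamps, only the six one-unit gaps are labeled as a burst; shortening the rapid run or raising `gamma` makes the penalty outweigh the gain, and the burst disappears.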

7A

Percentage Comparison for Polarities

Polarity Percentage (©2018 Springer Nature, all rights reserved – reprinted with permission from Springer Nature, published in Lamba and Madhusudhan (2018))


The figure compares the percentage of each polarity across 20 different productivity facets.
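Polarity percentages can be computed by classifying each text as positive, negative, or neutral and then reporting each class's share of the total. The minimal Python sketch below uses a tiny hypothetical word list and a count-based score purely to show the arithmetic; the case study's lexicon and scoring method are not reproduced here.

```python
from collections import Counter

# Hypothetical mini-lexicon for illustration only.
POSITIVE = {"good", "great", "helpful", "productive"}
NEGATIVE = {"bad", "poor", "slow", "stressful"}


def polarity(text):
    """Classify a text by (positive word count - negative word count)."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def polarity_percentages(texts):
    """Percentage of texts falling into each polarity class."""
    counts = Counter(polarity(t) for t in texts)
    return {label: 100 * counts[label] / len(texts)
            for label in ("positive", "negative", "neutral")}


texts = ["working from home is productive",
         "meetings are slow and stressful",
         "the library is open",
         "great and helpful staff"]
print(polarity_percentages(texts))
```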

Percentage Comparison for Subjectivities

Subjectivity Percentage (©2018 Springer Nature, all rights reserved – reprinted with permission from Springer Nature, published in Lamba and Madhusudhan (2018))


The figure compares the percentage of each subjectivity category across 20 different productivity facets.

7B

Percentage-Based Means

©2021 Lamba and Madhusudhan - all rights reserved


The figure shows the percentage-based means for Amazon book reviews.
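Percentage-based means make texts of different lengths comparable: the sentence-level sentiment scores are split into a fixed number of consecutive, nearly equal segments, and each segment is replaced by its mean. R's syuzhet package offers get_percentage_values() for this; the minimal Python sketch below only illustrates the idea, with a hypothetical `bins` argument and made-up scores.

```python
def percentage_means(values, bins=10):
    """Split `values` into `bins` nearly equal consecutive chunks and average each."""
    n = len(values)
    if n < bins:
        raise ValueError("need at least as many values as bins")
    means = []
    for b in range(bins):
        chunk = values[b * n // bins:(b + 1) * n // bins]  # never empty when n >= bins
        means.append(sum(chunk) / len(chunk))
    return means


# Hypothetical sentence-level sentiment scores for one review.
scores = [0.5, 1.0, -0.25, 0.0, 0.75, 1.25, -0.5, 0.25]
print(percentage_means(scores, bins=4))
```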

Discrete Cosine Transformation (DCT)

©2021 Lamba and Madhusudhan - all rights reserved


The figure shows the discrete cosine transformation for Amazon book reviews.
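The discrete cosine transformation (DCT) smooths a sentiment trajectory by expressing it as a sum of cosine waves and keeping only the lowest-frequency terms, which reveals the overall shape of the text. The minimal Python sketch below applies a DCT-II, zeroes all but the first `keep` coefficients, and inverts the transform, optionally resampling to `out_len` points; both parameter names are hypothetical, and syuzhet's get_dct_transform() is the usual R route rather than this code.

```python
import math


def dct(x):
    """Unnormalised DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * k) for n in range(N))
            for k in range(N)]


def dct_smooth(values, keep=5, out_len=None):
    """Low-pass filter `values` by keeping only the first `keep` DCT coefficients."""
    N = len(values)
    out_len = out_len or N
    coeffs = dct(values)[:keep]  # low-pass: discard high-frequency coefficients
    smoothed = []
    for m in range(out_len):
        u = (m + 0.5) * N / out_len  # output point mapped onto the (n + 0.5) axis
        # Inverse transform (DCT-III), evaluated at position u.
        total = coeffs[0] / N
        total += 2 / N * sum(coeffs[k] * math.cos(math.pi / N * u * k)
                             for k in range(1, len(coeffs)))
        smoothed.append(total)
    return smoothed


# Hypothetical sentence-level sentiment scores.
scores = [0.5, 1.0, -0.5, -1.5, -1.0, 0.0, 0.5, 1.5, 2.0, 1.0]
print([round(v, 2) for v in dct_smooth(scores, keep=3, out_len=20)])
```

Keeping every coefficient reproduces the original series, and keeping only the first one flattens it to its mean, so `keep` directly controls how much detail survives.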

Emotion Graph

©2021 Lamba and Madhusudhan - all rights reserved


The figure shows the emotions for Amazon book reviews.

8A

Predictive Modeling of Documents using RapidMiner

Screenshot of evaluation result (©2020 Cadernos BAD, all rights reserved – reprinted under Creative Commons CC BY license, published in Lamba and Madhusudhan (2020))


The figure shows the confusion matrix for the SVM predictive model.
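A confusion matrix cross-tabulates the true class of each test document against the class the model predicted; accuracy, precision, and recall are then simple ratios of its cells. The minimal Python sketch below uses hypothetical labels "A" and "B" and places actual classes in rows and predicted classes in columns (tools differ on this orientation). It only illustrates the evaluation arithmetic, not the RapidMiner process.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Share of all predictions that fall on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


def precision_recall(matrix, i):
    """Precision and recall for the class at index i."""
    true_pos = matrix[i][i]
    predicted_i = sum(row[i] for row in matrix)  # column total
    actual_i = sum(matrix[i])                    # row total
    return true_pos / predicted_i, true_pos / actual_i


labels = ["A", "B"]  # hypothetical classes
actual = ["A", "A", "A", "B", "B", "B", "B", "A"]
predicted = ["A", "A", "B", "B", "B", "A", "A", "A"]
cm = confusion_matrix(actual, predicted, labels)
print(cm, accuracy(cm), precision_recall(cm, 0))
```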

9A

Build a Dashboard in R

Screenshot of the dashboard summarizing the case studies’ results from Text Mining for Information Professionals: An Uncharted Territory book (©2021 Lamba and Madhusudhan - all rights reserved)


The storyboard consists of three important sections:

  1. Storyboard: It summarizes the visualizations for a specific case study. For this case study, 13 different storyboards were prepared to summarize the results of all the case studies;

  2. Frame: It shows different visualizations from a specific case study and divides them into sub-sections; and

  3. Commentary: This section is used to explain the visualization.