About

Introduction

This storyboard summarizes the visualizations for the 13 case studies performed in the book “Text Mining for Information Professionals: An Uncharted Territory”. To learn more about the case studies and the methodology used to obtain the results, download the book from SpringerNature. To cite this storyboard, use: ©2021 Lamba and Madhusudhan - all rights reserved, unless stated otherwise.

Click on the hyperlink in the figure legend to open the article associated with the citation.

Virtual RStudio Server

Case_Study	Title	Virtual_RStudio_Server
1B	Clustering of Documents using R	link
4C	Topic Modeling of Documents using R	link
5B	Network Text Analysis of Documents using Textnets package of R	link
6B	Burst Detection of Documents using R	link
7B	Sentiment Analysis of Documents using R	link
9A	Build a Dashboard using R	link

Reproduce the analysis in the cloud without installing any software. The computational environment used by the authors runs on BinderHub. Click the hyperlink to open an interactive virtual RStudio environment for hands-on practice with the case studies that used the R programming language. In the virtual environment, open the .R or .Rmd file containing the R code for the process.

Virtual Jupyter Notebook

Case_Study	Title	Virtual_Jupyter_Notebook
1B	Clustering of Documents using R	link
4C	Topic Modeling of Documents using R	link
5B	Network Text Analysis of Documents using Textnets package of R	link
6B	Burst Detection of Documents using R	link
7B	Sentiment Analysis of Documents using R	link
9A	Build a Dashboard using R	link

Reproduce the analysis in the cloud without installing any software. The computational environment used by the authors runs on BinderHub. Click the hyperlink to open an interactive virtual Jupyter Notebook for hands-on practice with the case studies that used the R programming language.

1A

Heatmap Showing Distances Between Documents

The heatmap plot shows the distances between the documents.

Clustered Heatmap Showing Distances Between Documents

The clustered heatmap plot shows another way to visualize the distances between the documents.

Dendrogram Showing Hierarchical Clustering of Documents

The dendrogram presents the hierarchical clustering of the documents using the Ward method.

1B

Determine the Number of Clusters (K) using the Elbow Method

For clustering in R, the elbow method was used to determine the number of clusters (k).
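
As a rough illustration of the elbow method, the sketch below computes the within-cluster sum of squares (WSS) for several values of k and prints them so that the bend in the curve can be spotted. It is written in plain Python rather than the R used in the case study, and it is not the authors' code: the toy points, the function names, and the deterministic farthest-point initialization are all assumptions made for this example.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def farthest_point_init(points, k):
    """Deterministic start (an assumption of this sketch): begin with the
    first point, then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans_wss(points, k, max_iter=100):
    """Run a basic k-means and return the within-cluster sum of squares."""
    centers = farthest_point_init(points, k)
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members (keep it if empty).
        new_centers = [
            [sum(col) / len(members) for col in zip(*members)] if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Hypothetical 2-D "documents" forming three well-separated groups.
points = [[0, 0], [0, 1], [1, 0],
          [10, 10], [10, 11], [11, 10],
          [20, 0], [20, 1], [21, 0]]

wss = {k: kmeans_wss(points, k) for k in range(1, 6)}
for k, value in wss.items():
    print(k, round(value, 2))
```

With data like this, the WSS falls steeply up to k = 3 and changes little afterwards, which is the "elbow" one would look for in the plot.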

Visualizing Distance Matrices

The Euclidean distance was used to measure the distance between the documents.
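
A minimal sketch of a Euclidean distance matrix is given below for orientation. It uses plain Python rather than the R of the case study, and the term-count vectors and function names are hypothetical, chosen only to show the calculation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(vectors):
    """Symmetric matrix of pairwise distances; the diagonal is zero."""
    return [[euclidean(u, v) for v in vectors] for u in vectors]


# Hypothetical term counts for three documents over a three-word vocabulary.
docs = [
    [2, 0, 1],
    [2, 0, 1],  # identical to the first document
    [0, 3, 1],
]

dist = distance_matrix(docs)
for row in dist:
    print([round(d, 3) for d in row])
```

Identical documents end up at distance zero, and the matrix is symmetric, which is why heatmaps of such matrices mirror along the diagonal.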

Agglomerative Hierarchical Clustering

Hierarchical clustering with dendrograms is another way to visualize the distance between the documents.

Circular Dendrogram

A circular dendrogram is yet another way to visualize the distance between the documents.

Phylogenic Dendrogram

A phylogenic structure is another way of visualizing the same results from a different perspective, depending on your research problem and dataset.

4A

Core Topics

Timeline showing the core topics in DESIDOC Journal of Library and Information Technology from 1981 to 2018 (©2019 Springer Nature, all rights reserved – reprinted with permission from Springer Nature, published in Lamba and Madhusudhan (2019))

The results show the topics assigned to the corpus of research articles.

4B

Core Topics

Latent Dirichlet Allocation Topic and Word Result for PQDT Global ETDs during 2014-2018 (©2020 Cadernos BAD, all rights reserved – reprinted under Creative Commons CC BY license, published in Lamba and Madhusudhan (2020))

The table shows the topics assigned to the corpus of ETDs.

4C

Method 1: Plotting Top Words using stm

The figure shows the topics which were identified using Structural Topic Modeling (STM).

Method 2: Plotting MAP Histogram using stm

The figure shows the topic modeling results as a MAP histogram produced with stm.

Method 3: Visualizing Topic Model using ggplot2

The figure shows the topic modeling results plotted with ggplot2.

Method 4: Interactive Visualization

The figure shows an interactive way of representing the topic modeling results.

When you click on the figure, a new window will open in your browser where you can interact with the visualization and see how it changes as you alter various parameters.

Understanding Topics through Top 5 Representative Documents

The table presents the top five representative ETDs for the modeled topics, ranked according to their probability.

Topic Correlation

The figure shows the correlation between the topics using a network graph.

5A

Network Text Analysis of Documents using bibliometrix

The figure presents the word co-occurrence network for the top words.

5B

Network Text Analysis of Documents using textnets

The figure represents the clusters/communities of words (nodes).

6A

Horizontal Line Graph

The figure shows the horizontal line graphs for bursts.

6B

Accumulation of Submissions

The figure shows the accumulation of preprints in the arXiv database.

Bursts in Submissions

The figure shows the bursts of preprints in the arXiv database.

7A

Percentage Comparison for Polarities

The figure represents the percentage comparison between polarities for 20 different productivity facets.

Percentage Comparison for Subjectivities

The figure represents the percentage comparison between subjectivities for 20 different productivity facets.

7B

Percentage-Based Means

The figure shows the percentage-based means for Amazon book reviews.

Discrete Cosine Transformation (DCT)

The figure shows the discrete cosine transformation for Amazon book reviews.
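
To give a sense of what a discrete cosine transformation does to a sequence of sentiment scores, the sketch below applies an orthonormal DCT-II, keeps only the lowest-frequency coefficients, and inverts the result to obtain a smoothed curve. It is a generic plain-Python illustration under stated assumptions, not the procedure or package used in the case study: the scores, the function names, and the number of coefficients kept are all hypothetical.

```python
import math


def _scale(k, n):
    """Orthonormal scaling factor for coefficient k of an n-point DCT-II."""
    return math.sqrt((1 if k == 0 else 2) / n)


def dct(values):
    """Orthonormal DCT-II of a list of numbers."""
    n = len(values)
    return [
        _scale(k, n) * sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                           for i, v in enumerate(values))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct() above (an orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]


def smooth(values, keep):
    """Low-pass filter: zero every coefficient except the first `keep`."""
    coeffs = dct(values)
    filtered = [c if k < keep else 0.0 for k, c in enumerate(coeffs)]
    return idct(filtered)


# Hypothetical sentence-level sentiment scores for one review.
scores = [0.5, -0.2, 0.1, 0.8, -0.6, 0.3, 0.9, -0.1]

smoothed = smooth(scores, keep=3)
print([round(s, 3) for s in smoothed])
```

Keeping fewer coefficients yields a smoother curve; keeping only the first one reduces the curve to the mean score, while keeping all of them reproduces the original values.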

Emotion Graph

The figure shows the emotions for Amazon book reviews.

8A

Predictive Modeling of Documents using RapidMiner

The figure shows the confusion matrix for the SVM predictive model.
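
For readers unfamiliar with confusion matrices, the sketch below shows how one is tallied from actual and predicted class labels and how accuracy follows from it. It is a plain-Python illustration and not part of the RapidMiner workflow; the labels and the function name are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Return (labels, matrix) where matrix[i][j] counts items whose actual
    class is labels[i] and whose predicted class is labels[j]."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


# Hypothetical true classes and classifier output for six documents.
actual = ["pos", "pos", "neg", "neg", "pos", "neg"]
predicted = ["pos", "neg", "neg", "neg", "pos", "pos"]

labels, matrix = confusion_matrix(actual, predicted)

# Accuracy is the share of items on the diagonal (correct predictions).
correct = sum(matrix[i][i] for i in range(len(labels)))
accuracy = correct / len(actual)

print(labels)
for row in matrix:
    print(row)
print(round(accuracy, 3))
```

Rows correspond to actual classes and columns to predicted classes in this sketch; some tools display the transpose, so it is worth checking the axis labels of any given figure.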

9A

Build a Dashboard in R

The storyboard consists of three important sections:

Storyboard: It summarizes the visualizations for a specific case study. For this case study, 13 different storyboards were prepared to summarize the results from all the case studies;
Frame: It shows the different visualizations from a specific case study and divides them into sub-sections; and
Commentary: This section is used to explain the visualization.
