Part 4. Digital Tools Explained

4.4 Voyant Tools

What is Voyant Tools?


Voyant Tools is a web-based text reading and analysis application. The tool performs a range of functions to tabulate, analyze, and visualize texts. Created by Stéfan Sinclair, Geoffrey Rockwell, and their project team, Voyant Tools makes it possible to text mine, or derive information about, a single text or many different texts. Users upload documents to Voyant Tools and receive information about word counts, word usage patterns and percentages, vocabulary density, and collocates, to name only a few of the results that the application provides.


Voyant is designed to integrate into a collaborative research process, including the possibility of sharing corpora and embedding tools into web pages (as you might embed a video); overall, Voyant is ideal for combining digital tools and argumentation to produce scholarship.

Once you create a corpus, you will arrive at the default “skin,” or arrangement of tools.

 

The various tools in the interface are designed to interact with one another. For instance, if you click on a word in Cirrus, you’ll see the Trends tool update with information about the selected word. Similarly, if you click on a node in the Trends tool, the Contexts tool should update as well. Interactivity and navigation between the different scales of a corpus (from the macroscopic Cirrus overview to the microscopic individual word occurrences) are a key part of the design of Voyant Tools.

Additional tools are readily accessible by clicking the tabs in each tool pane. For instance, beside the Cirrus header label is the Corpus Terms label; clicking on that tab switches the pane to the Corpus Terms tool.

Getting Started: Distant Reading Frederick Douglass

Why is Voyant useful for performing a distant read?

Voyant helps a reader take a broad overview of a text. How the documents in a corpus are arranged matters, because Voyant reports many of its results document by document (for example, chapter by chapter). Voyant lets you measure how often specific words are used across a corpus, and it also takes account of the most frequently used words and phrases. A key feature of Voyant is its collocates feature, which allows users to identify specific words and observe their connections with key terms. Together, these features can assist in performing a broad analysis of word and sentence usage in The Narrative, so that we can assess the text based on word usage.

In this example, let’s consider how we could use Voyant to text mine and explore one of the most well-known works in American and African American literature, Frederick Douglass’s Narrative of the Life of Frederick Douglass (1845).

When I load texts into Voyant, there are four general steps I take to get a general sense of the corpus.

  1. Take an overview of your corpus using the Summary and Documents tools.
  2. Take account of the most frequently used words and see their distribution across the corpus using the Cirrus and Trends tools.
  3. Take account of key terms and how they are used most often in the corpus using the Terms tool.
  4. Find out how words are used within the corpus using the Contexts tool.

 

Step One: Summary and Documents

The first thing to do is get a general sense of the documents in the corpus. This prompts you to think numerically and make exact observations about length and word usage.

 

This image shows the summary screen in Voyant Tools
Figure 4.4.1 The Summary provides a simple, textual overview of the current corpus, including (as applicable for multiple documents) number of words, number of unique words, longest and shortest documents, highest and lowest vocabulary density, average number of words per sentence, most frequent words, notable peaks in frequency, and distinctive words.
This image shows the document home screen in Voyant Tools
Figure 4.4.2

For instance, using the Summary and Documents tools, I discovered:

  • This corpus has 11 documents (11 chapters) with 34,634 total words and 4,297 unique word forms. It’s organized in ascending order.
  • Chapter 10 is the longest chapter, with 12,880 words.
  • Chapter 6 is the shortest chapter, with 1,286 words.
  • Most of the chapters have a vocabulary density in the 30 percent range. Chapter 6 has the highest (40%) and chapter 10 has the lowest (18%).
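The vocabulary density figures above are easier to interpret once you know the ratio behind them: Voyant’s Summary describes vocabulary density as the number of unique words divided by the total number of words. The Python sketch below reproduces that arithmetic on a toy corpus. The regular-expression tokenizer and the function names (`tokenize`, `summarize`, `corpus_totals`) are illustrative assumptions, not Voyant’s code, so counts on the real Narrative will not match Voyant’s exactly.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a simplified stand-in for Voyant's own tokenizer."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower())

def summarize(documents):
    """Return (title, total words, unique words, vocabulary density) for each document."""
    rows = []
    for title, text in documents.items():
        tokens = tokenize(text)
        unique = len(set(tokens))
        density = unique / len(tokens) if tokens else 0.0
        rows.append((title, len(tokens), unique, density))
    return rows

def corpus_totals(documents):
    """Total words and unique word forms across the whole corpus."""
    all_tokens = [t for text in documents.values() for t in tokenize(text)]
    return len(all_tokens), len(set(all_tokens))

# Hypothetical toy corpus standing in for the chapters of the Narrative.
chapters = {
    "Chapter A": "the cat sat on the mat",
    "Chapter B": "a dog ran and a dog barked",
}
for title, total, unique, density in summarize(chapters):
    print(f"{title}: {total} words, {unique} unique, density {density:.0%}")
print(corpus_totals(chapters))
```

Because common words such as “the” keep repeating as a text grows, longer documents tend to have lower density, which fits chapter 10 (the longest) having the lowest density and chapter 6 (the shortest) the highest.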

Step Two: Cirrus and Trends

Next, I take a general account of the most frequently used words. I used the Cirrus and Trends tools to identify these words and observe how they are distributed across the corpus.

 

This image shows the cirrus, reader, and trends home screens in Voyant Tools
Figure 4.4.3

The word cloud provides a brief snapshot of frequently used words, sizing each in proportion to its frequency, and also offers tabulations of those terms. The display creates an opportunity to consider aspects of a document from a distinct visual and quantitative perspective.

For Douglass’s Narrative, we immediately see at least five prominent words: “Mr” (169), “slaves” (124), “master” (123), “slave” (116), and “time” (114). Hovering over a word reveals its number of mentions. Other words such as “slavery,” “covey,” “old,” “new,” “said,” “went,” and “work” also recur frequently, each appearing more than fifty times.
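Cirrus sizes each word by its count after filtering out very common function words with a stopword list. The Python sketch below shows that counting step on a short passage quoted later in this chapter. The `STOPWORDS` set is a tiny hypothetical list, far shorter than the lists Voyant applies, and `top_terms` is an illustrative name. Stripping punctuation is also why “Mr.” is counted as the term “mr.”

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; Voyant's built-in lists are far longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "he", "his",
             "i", "me", "my", "had", "that", "it"}

def top_terms(text, n=5, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword terms with their counts."""
    tokens = re.findall(r"[a-z]+", text.lower())  # "Mr." becomes "mr"
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)

sample = ("Mr. Covey was a poor man, a farm-renter. Mr. Covey had acquired "
          "a very high reputation for breaking young slaves.")
print(top_terms(sample, n=3))
```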

The Trends tool shows the distribution of a word’s occurrences across a corpus or document. The chart displays the term’s frequency in each part of the corpus, so you can see where its use rises and falls. The use of the word “Mr” is especially pronounced in chapter 4. It falls considerably but rises again in chapters 9 and 10. The word “master” is used throughout the narrative, sometimes minimally, until it peaks notably in chapters 8 and 9.
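Raw counts favor long chapters: chapter 10 has far more words than chapter 6, so it will tend to contain more of almost every term. For that reason, trend lines are often read as relative frequencies, that is, a term’s count in a document divided by that document’s length. The Python sketch below assumes that simple normalization; Voyant may scale or label its relative values differently, and `relative_frequencies` is a hypothetical helper.

```python
import re

def relative_frequencies(documents, term):
    """Map each document title to (term count / total words in that document)."""
    result = {}
    for title, text in documents.items():
        tokens = re.findall(r"[a-z]+", text.lower())
        result[title] = tokens.count(term.lower()) / len(tokens) if tokens else 0.0
    return result

# Hypothetical two-document corpus: same raw count of "mr", different lengths.
docs = {
    "short": "mr covey and mr auld",
    "long": "mr covey worked the field while mr auld watched from the porch all day",
}
for title, value in relative_frequencies(docs, "Mr").items():
    print(f"{title}: {value:.3f}")
```

Both documents mention “mr” twice, but the shorter one has the higher relative frequency.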

Step Three: Terms 

I used the Terms tool to identify collocates. Collocates are words that occur in proximity to a given term.

This will help me understand how Douglass uses words to narrate stories.

Words like “Mr” and “slave” stand out to me. “Mr.” is often used as a sign of respect. “Slave” and “slaves” are used to identify people held in bondage during a heinous period of American history. Douglass writes about enslaved people with great compassion throughout this narrative, which led me to think more about how he uses the word “Mr.” The irony is that while he usually uses the word “slave” to talk compassionately about enslaved people, he usually reserves the respectful “Mr.” for slave owners and overseers.

This image shows the expanded terms tool in Voyant Tools and the collocates, correlations, and phrases associated with the term "mr."
Figure 4.4.4
This image shows the expanded phrases for the word "mr" in the terms window on Voyant Tools
Figure 4.4.5

Switching to the “Terms” interface and clicking the “+” sign next to the word “Mr” reveals more information about how the word is used in context. “mr covey” appears in four of the top five phrases. After I expanded the list to the top ten phrases, “mr covey” appeared in seven of the ten, confirming Covey’s importance throughout the narrative.
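The phrase counts above come from repeated word sequences. As a rough approximation, the Python sketch below counts every two-word sequence (bigram) that contains a keyword. Voyant’s phrase detection also handles longer, variable-length sequences, so treat this as an illustration of the idea rather than a reimplementation; `phrases_with` and the sample sentence are hypothetical.

```python
import re
from collections import Counter

def phrases_with(text, keyword, n=2):
    """Count n-word sequences (n-grams) that contain the keyword."""
    tokens = re.findall(r"[a-z]+", text.lower())
    ngrams = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(" ".join(g) for g in ngrams if keyword in g)

sample = "Mr. Covey said that Mr. Covey was right, and Mr. Auld agreed."
print(phrases_with(sample, "mr").most_common(3))
```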

Step Four: Context

To gain more insight, I used the Contexts tool to search for instances of “covey” in the narrative. In chapter 9, Douglass describes Mr. Covey as “a poor man” and “a farm renter.” Douglass notes that Covey “had acquired a very high reputation for breaking young slaves.” The Narrative extended Covey’s infamous reputation.
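The Contexts tool is a keyword-in-context (KWIC) view: each occurrence of a term is shown with a few words on either side. A minimal Python sketch of that display follows, assuming whitespace-separated words; the `kwic` helper and its `width` parameter (the number of context words on each side) are hypothetical.

```python
def kwic(text, term, width=4):
    """Return each occurrence of term with up to `width` words on either side."""
    words = text.split()
    punctuation = ".,;:!?\"'“”‘’()"
    lines = []
    for i, word in enumerate(words):
        # Compare without surrounding punctuation and case.
        if word.strip(punctuation).lower() == term.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines

passage = "Mr. Covey was a poor man, a farm-renter."
for line in kwic(passage, "covey", width=3):
    print(line)
```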

This image shows the context window in Voyant Tools
Figure 4.4.6

With 61 occurrences, “Covey” is the eighth most frequently used term in Douglass’s book (not including pronouns). The term appears 5 times in chapter 9 and 56 times in chapter 10, the longest chapter of the Narrative. Covey is by far the most discussed subject in that chapter.

This image shows the Contexts tool expanded in Voyant Tools
Figure 4.4.7

Links Tool

What is the Links Tool (Collocates Graph)?

Links represents the collocation of terms in a corpus by depicting them as a network, drawn as a force-directed graph.

Why is the Links Tool (Collocates Graph) Useful?

The Collocates Graph represents keywords and terms that occur in close proximity as a force-directed network graph. In this graph, the frequency of a word is indicated by the relative size of the term.

How to use the Links Tool (Collocates Graph)

The Links tool displays a network graph in which keywords in blue are linked to collocates in orange. You can hover over a term to see its frequency (for keywords, it’s the corpus frequency; for collocates, it’s the frequency in the context of the linked keywords). You can drag and drop terms to move them, and you can drag terms off the canvas to remove them.
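Behind a collocates graph is a table of keyword-collocate pairs: for each keyword, count the words that fall within a few words of it and keep the most frequent as linked nodes. The Python sketch below builds that edge list. The five-word window, the `top` cutoff, the stopword set, and the `collocate_edges` name are all assumptions for illustration, and the force-directed layout itself (which only positions the nodes on screen) is left out.

```python
import re
from collections import Counter, defaultdict

def collocate_edges(text, keywords, window=5, top=3, stopwords=frozenset()):
    """Map each keyword to its `top` most frequent collocates within `window` words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    edges = defaultdict(Counter)
    for i, token in enumerate(tokens):
        if token not in keywords:
            continue
        # Look at neighbors on both sides, excluding the keyword itself.
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i and tokens[j] not in stopwords:
                edges[token][tokens[j]] += 1
    return {kw: counts.most_common(top) for kw, counts in edges.items()}

sample = "Mr. Covey whipped me. Mr. Auld sent me to Mr. Covey."
print(collocate_edges(sample, {"mr"}, window=1, stopwords={"me", "to"}))
```

Each returned pair would become an edge from a blue keyword node to an orange collocate node, weighted by its count.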

 

I used the “Links” visualization tool to explore networks between frequently used words. I began by looking at the top five words.

This image shows the links tool in Voyant Tools
Figure 4.4.8

I clicked on the word “mr” and watched the network expand to include “covey,” “hopkins,” “ruggles,” “johnson,” “hugh,” “hamilton,” “thomas,” and “freeland.” All of these names refer to principal characters and alert us to how Douglass narrates his story.

This image shows the expanded view of the links tool in Voyant Tools focusing on the word "mr"
Figure 4.4.9
This image shows the expanded view of the links tool in Voyant Tools focusing on the word "mr"
Figure 4.4.10

 

Next, I clicked on the word “slaves.” The words “boast,” “plantation,” “ability,” “farms,” “colonel,” “emancipate,” “escape,” and “home” appear. These words describe enslaved Black people as property and connect them to specific places such as “plantation” and “farms.”

This image shows the expanded view of the links tool in Voyant Tools
Figure 4.4.11
This image shows the expanded view of the links tool in Voyant Tools
Figure 4.4.12

 

I typed the word “time” into the search box to reveal the words connected with it (“spent,” “hire,” and “spring”). These words suggest that Douglass equates time with money, especially in terms of how he hires out his labor to earn money in the Narrative.

This image shows the expanded view of the links tool in Voyant Tools
Figure 4.4.13
This image shows the text entry box in Voyant Tools
Figure 4.4.14


I added the word “slave,” and the words “children,” “life,” and “whip” appeared. I was interested in the word “whip” since it had not been among the most frequently used words, so I clicked on it. The words “covey” and “cowskin” popped out. These words relate to the brutality Douglass experienced while enslaved.

 

This image shows the expanded view of the links tool in Voyant Tools
Figure 4.4.15
This image shows the expanded view of the links tool in Voyant Tools
Figure 4.4.16

 

TermsBerry

What is the TermsBerry Tool?

The TermsBerry tool provides a way of exploring high frequency terms and their collocates (words that occur in proximity).

Why is TermsBerry Useful?

The TermsBerry tool is intended to mix the power of visualizing high frequency terms with the utility of exploring how those same terms co-occur (that is, to what extent they appear in proximity with one another). In some ways it’s like Cirrus (the word cloud), but it is more useful thanks to the added collocates and corpus coverage information.

How to use the TermsBerry Tool?

When you hover over a term, it becomes the keyword, and each of the other bubbles then indicates the collocate frequency for that term (within the specified context, by default two words to the left and two words to the right). The darker the color, the higher the collocate frequency. The hovered term also displays a tooltip that provides the term frequency as well as the number of documents in which that term appears.
The highest frequency terms (or most distinct terms if you change the options) appear in the middle and in larger bubbles, with terms spiralling outwards. The darkness of the terms represents the proportion of the documents where the term appears (darker means that it appears in more documents; there will be no differentiation if there’s only one document in the corpus).
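The two quantities TermsBerry shades by can be sketched in a few lines of Python: the share of documents in which a term appears (bubble darkness in a multi-document corpus) and, for a hovered keyword, how often each other displayed term occurs within the context window (two words on each side by default, per the description above). The function names and the three toy documents are hypothetical, and the tokenizer is a simplification.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def coverage(documents, term):
    """Fraction of documents that contain the term at least once."""
    return sum(term in tokenize(text) for text in documents) / len(documents)

def hover_collocates(documents, keyword, displayed, context=2):
    """Count displayed terms occurring within `context` words of the keyword."""
    counts = Counter()
    for text in documents:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token != keyword:
                continue
            for j in range(max(0, i - context), min(len(tokens), i + context + 1)):
                if j != i and tokens[j] in displayed:
                    counts[tokens[j]] += 1
    return counts

# Hypothetical three-document corpus.
docs = ["the slave was whipped", "slaves were sold", "a slave child"]
print(coverage(docs, "slave"))
print(hover_collocates(docs, "slave", {"whipped", "child", "sold"}))
```

Note that “sold” gets no count for “slave” because it only appears next to “slaves”; exact-match counting treats the two forms as different terms, which is the kind of difference explored below.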


 

Another way to visualize the narrative is by using “TermsBerry.”

This image shows view of the termsberry tool in Voyant Tools
Figure 4.4.17

I hovered over the words “slave” and “slaves.” Even though the words are very similar and refer to the same thing, I realized that they are used in different contexts. Many more words are associated with “slaves” than with “slave.” While there is considerable overlap, this leads me to consider how Douglass uses these two terms differently.

 

This image shows the expanded view of the termsberry tool in Voyant Tools
Figure 4.4.18
This image shows the expanded view of the termsberry tool in Voyant Tools
Figure 4.4.19


I also hovered over the words “whip” as well as “whipped.”  The word “whip” was associated with “covey,” “colonel,” “slave,” and “slaves.” A notable distinction about “whipped” is that it’s also associated with “death.”

This image shows the expanded view of the termsberry tool in Voyant Tools
Figure 4.4.20
This image shows the expanded view of the termsberry tool in Voyant Tools
Figure 4.4.21


I also looked at the word “read.” Douglass’s Narrative chronicles his pursuit of literacy, so “read” is a useful word to consider. When I also added the word “write,” I noticed a connection between “read” and “write.”

Bubblelines

What is the Bubblelines Tool?

Bubblelines visualizes the frequency and distribution of terms in a corpus.

Why is Bubblelines useful?

Bubblelines is a visualization tool that helps to understand patterns of word repetition in one or more documents. Each document is represented as a horizontal line and each search term is represented as a bubble; the bubble represents the frequency of the term in the corresponding segment of text (the text is divided into segments of equal length).

How to use the Bubblelines Tool?

Each document in the corpus is represented as a horizontal line and divided into segments of equal length (50 segments by default). Each selected word is represented as a bubble with the size of the bubble indicating the word’s frequency in the corresponding segment of text. The larger the bubble the more frequently the word occurs.

You can add more terms by using the search box – simply type in a term and hit enter (see Term Searches for more advanced searching capabilities). You can also clear all existing terms. The “Segments” slider allows you to adjust how many segments are used for each document: if you use a value of 10, it means that the document will be divided into 10 equal parts (based on the number of terms in each part). The minimum value is 10 and the maximum value is 300, with incremental jumps of 10.
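The segmenting described above amounts to dividing a document’s word positions into equal-sized bins and counting a term’s occurrences in each bin. A minimal Python sketch follows; `segment_counts` is a hypothetical helper, its `segments` argument plays the role of the Segments slider (50 by default), and the tokenizer is a simplification of Voyant’s.

```python
import re

def segment_counts(text, term, segments=50):
    """Count occurrences of term in each of `segments` equal-length parts of text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = [0] * segments
    for position, token in enumerate(tokens):
        if token == term:
            # Map the word position to its segment index (0 .. segments - 1).
            counts[position * segments // len(tokens)] += 1
    return counts

text = "read a b c d read e f g h"
print(segment_counts(text, "read", segments=2))   # [1, 1]
print(segment_counts(text, "read", segments=10))
```

Each count would then be drawn as a bubble whose size grows with the count.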

 

 

This image shows the bubbles tool in Voyant Tools
Figure 4.4.22

I used Bubblelines to identify the chapters in which Douglass discusses his pursuit of literacy the most. As he matured, Douglass came to understand that keeping enslaved people from literacy was the source of the “white man’s power to enslave the black man,” after hearing his master, Mr. Auld, say that learning to read would spoil the young Douglass and “forever unfit him to be a slave.”

I decided to focus on three key terms: “read,” “write,” and “learn.” These three words signal to readers that Douglass is referencing his pursuit of literacy. I entered the three words in the text box (note: I placed an asterisk after each word, as in “read*,” so that the search would match every word beginning with that root instead of one specific term).
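A trailing asterisk works as a prefix wildcard, so “read*” should match any word that begins with “read.” The Python sketch below mimics that matching under the assumption that a trailing asterisk simply means “starts with”; the `expand` helper and the sample sentence are hypothetical. It also shows a side effect worth checking in the results: a prefix such as “read*” can match unrelated words like “ready.”

```python
import re
from collections import Counter

def expand(text, pattern):
    """Count words matching pattern; a trailing * means 'any word starting with'."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if pattern.endswith("*"):
        prefix = pattern[:-1].lower()
        return Counter(t for t in tokens if t.startswith(prefix))
    return Counter(t for t in tokens if t == pattern.lower())

sample = "I learned to read; reading was hard, but I was ready and kept learning."
print(expand(sample, "read*"))
print(expand(sample, "learn*"))
```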

This image shows the bubbles toolbar in Voyant Tools
Figure 4.4.23

The results indicate that most references to Douglass’s literacy pursuits occur after chapter 6, and that writing, in particular, appears primarily in chapter 7.

This image shows the bubbles tool in Voyant Tools
Figure 4.4.24

Dreamscape

What is the Dreamscape Tool?

Dreamscape is an experimental tool for exploring geospatial aspects of texts. A primary weakness of Dreamscape is that current tools for automatically identifying locations in texts usually produce a significant number of errors, tagging locations that aren’t a location (false positive) and not tagging locations that are a location (false negative).

Why is this tool useful?

The tool tries to identify locations (especially city names) mentioned in texts, and suggests patterns of recurring connections between locations, patterns that might help identify travel of people, ideas, goods, or anything else. The notion of travel here is to be interpreted loosely and critically: a sequence of locations may or may not signify anything at all, but Dreamscape seeks to help study them.

How to use the Dreamscape Tool

The Dreamscape display represents three things:

  • The cities mentioned in the text, represented by circles: the larger the circle, the more often the city is mentioned (hovering over a city shows its name and frequency in the corpus).
  • The frequency of connections between two locations in a document, represented by arcs. A connection is recorded when two locations are mentioned one after the other, so Paris to Montreal to London would create two connections: Paris->Montreal and Montreal->London.
  • The individual occurrences of connections, represented by the animated arc and the ticker text at the top (each connection is “read” in sequence in the text).
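The counting rules described above can be sketched in Python: circle size is the number of mentions of each place, and an arc is counted for each pair of locations mentioned one after the other. This assumes the locations have already been extracted in reading order (the error-prone step that Dreamscape automates), and `dreamscape_counts` is a hypothetical name.

```python
from collections import Counter

def dreamscape_counts(locations):
    """Return (mentions per place, connections between consecutive mentions)."""
    mentions = Counter(locations)                  # drives circle size
    arcs = Counter(zip(locations, locations[1:]))  # drives arc frequency
    return mentions, arcs

mentions, arcs = dreamscape_counts(["Paris", "Montreal", "London"])
for (start, end), count in arcs.items():
    print(f"{start}->{end}: {count}")
```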

 

 

This image shows the dreamscape tool in Voyant Tools
Figure 4.4.25

I used Dreamscape to take account of specific places mentioned across Douglass’s narrative.

As noted above, automatic location tagging produces both false positives and false negatives. Language is messy, and the computer doesn’t understand its meaning.

This image shows the dreamscape tool in Voyant Tools
Figure 4.4.26

For example, Dreamscape picks up the name “Fairbanks” and plots it in Alaska. In fact, the term refers to a person, Wright Fairbanks. I found this out after clicking “View occurrence.” I removed the term by clicking the dot and selecting “Remove Location.”

This image shows the dreamscape tool in Voyant Tools
Figure 4.4.27
This image shows the dreamscape tool in Voyant Tools
Figure 4.4.28

In another instance, the word “Delaware” is associated with a town in Ohio. This again throws off the reading of Douglass’s narrative slightly. When I clicked on the location and chose “Select Alternative Location,” it read, “No alternative location found.” So I chose to remove it from the map as well. I also removed the “Virginia” reference from the map, since Douglass mentions the word only once, in a song.

This image shows the dreamscape tool in Voyant Tools
Figure 4.4.29
This image shows the dreamscape tool in Voyant Tools
Figure 4.4.30

I selected the “Easton” reference, which had been connected to Rhode Island, and clicked “Select Alternative Location.” I chose the Easton in Maryland in order to align with Douglass’s narrative. The end result was a map that plotted crucial locations in Douglass’s narrative.

 

This image shows the dreamscape tool in Voyant Tools
Figure 4.4.31
This image shows the dreamscape tool in Voyant Tools
Figure 4.4.32

Even though this map isn’t perfect, it offers an overview of the various locales mentioned in the text. This allows us to see the range of space that Douglass covered in his escape from slavery in the South to freedom in the North.

 

Extended Reading about Voyant Tools & Distant Reading


By Kenton Rambsy


This chapter was adapted from “Voyant Tools’s Tutorial/Workshop” page.

 


License


The Data Notebook by Peace Ossom-Williamson and Kenton Rambsy is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.
