Voyant Tools Tutorial

Allen Brown, Bryan Tor | Fall, 2021

When to Use Voyant

Voyant Tools is an online application used for text analysis. Created with the digital humanities in mind, it is intended to facilitate reading and interpretive practices for its users. Voyant can generate word clouds from given documents, show word frequency or collocation, and perform other text mining functions with plain text, HTML, XML, PDF, RTF, and MS Word files. Researchers have used Voyant Tools to analyze texts in a wide range of contexts including literature, language teaching, healthcare, and system architecture.
Certain functionalities outlined in the official Voyant Tools Help documentation do not reliably work. However, these problems do not impair use of Voyant as a text analysis environment, and this tutorial takes these issues into account.

Voyant can be found here: http://voyant-tools.org

Starting with Voyant

When you first navigate to the website, Voyant needs the text it should analyze. You can paste the text directly into the text area or paste the URL of a web page to be analyzed. Alternatively, you can open an existing corpus created with Voyant, or upload a file from your computer.

Uploading a file onto Voyant-Tools

Stopwords

You can edit Voyant’s stopword list by opening the option menu, accessible by hovering briefly over the top bar of the tool and then clicking the options icon (looks like a switch) once it appears.

Options icon in the upper right corner, circled in red

Voyant - modifying stopwords

Voyant keeps a list of words, called stopwords, that are excluded from searches. Leaving stopwords in could crowd out more interesting results from text analysis, but important terms should not be classified as stopwords.

After accessing the options menu, you can select “Edit List” to directly modify what Voyant treats as a stopword, or you can change Voyant’s “Auto-detect” stopword option to a preselected list of stopwords for a specific language.
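As a rough illustration of what a stopword list does, the following Python sketch counts word frequencies with and without a small, made-up stopword list. The `STOPWORDS` set and the `count_terms` helper are hypothetical and are not part of Voyant; Voyant's own lists are much longer, and its tokenization may differ.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "of", "and", "to", "in"}


def count_terms(text, stopwords=frozenset()):
    """Count lowercase word tokens, skipping any word in `stopwords`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)


sample = "The whale and the sea: a tale of the whale in the sea."
print(count_terms(sample).most_common(3))             # dominated by "the"
print(count_terms(sample, STOPWORDS).most_common(3))  # content words only
```

Without the stopword list, "the" is the most frequent term; with it, only content words such as "whale" and "sea" remain.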

Exporting

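The Voyant export feature allows you to export the entire session of Voyant Tools or specific tools from that session. Click the Export icon to access the Export menu. You have three or four choices, depending on whether you are exporting the whole session or a specific tool:

  • a URL for this view (tools and data), the default option under “Export View (Tools and Data)”: Returns a URL for this session of Voyant Tools.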
  • an HTML snippet for embedding this view in another web page, under “Export View (Tools and Data)”: Returns a snippet of HTML code, which can be used to embed this session of Voyant Tools into a webpage.
  • a bibliographic reference for this view, under “Export View (Tools and Data)”: Returns a bibliographic reference for this session of Voyant Tools.
  • export a PNG image of this visualization, under “Export Visualization,” only accessible when exporting a specific tool: creates a PNG image of the current tool, or creates a snippet of HTML code which contains the image.

Exporting a specific tool in Voyant-Tools

Location of the Export URL option

Cirrus

Cirrus tool from Voyant-Tools

Cirrus is a word cloud that visualizes the most frequently used words of the document. The color and absolute position of the words are not significant, although words that appear more often tend to be positioned closer to the center, and words used less often towards the outer perimeter of the cloud.


Word Frequency and the Terms Bar

Hovering over a word displays the number of times that word appears in the document. Moving the bottom-left slider, labeled “Terms,” adjusts the number of words shown.
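The idea behind a word cloud can be sketched in a few lines of Python: keep the N most frequent terms (the role of the “Terms” slider) and give each one a display size that grows with its count. The `cloud_sizes` function, its parameter names, and the linear scaling rule are assumptions made for illustration; they do not describe how Cirrus itself sizes or places words.

```python
from collections import Counter


def cloud_sizes(tokens, max_terms=5, min_size=10, max_size=40):
    """Return {term: font_size} for the `max_terms` most frequent tokens.

    Sizes are scaled linearly between `min_size` and `max_size`; this
    scaling rule is an assumption, not Voyant's documented behavior.
    """
    top = Counter(tokens).most_common(max_terms)
    if not top:
        return {}
    highest = top[0][1]
    lowest = top[-1][1]
    spread = (highest - lowest) or 1  # avoid dividing by zero when all counts are equal
    return {
        term: min_size + (count - lowest) * (max_size - min_size) / spread
        for term, count in top
    }


tokens = "sea whale sea ship whale sea storm".split()
print(cloud_sizes(tokens, max_terms=3))
```

Lowering `max_terms` mirrors moving the slider to the left: fewer, more frequent words are kept.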


Options Menu
  1. Stopwords: You can manage the list of words that Cirrus will exclude from the word cloud. Refer to the Stopwords section for more information.
  2. White List: You can define a set of allowed words so that Cirrus includes only those words in the word cloud. If you decide to use this, be sure to set Stopwords to “None”.
  3. Categories: You can assign terms to different categories. Categories can be assigned certain colors and fonts to display on the word cloud. Note that this feature may not work some of the time.
  4. Font family: This changes the default font of the word cloud. Note that this feature may not work some of the time.
  5. Palette: This changes the color of the Cirrus menu, not the word cloud itself. Note that this feature may not work some of the time.

 

Document Terms

Document Tools from Voyant-Tools

Document Terms is a table view of the term frequencies in the document. The table view has seven data columns:

  1. #: Document number. If you have multiple documents, this number shows which document each term frequency belongs to.
  2. Term: This is the word itself.
  3. Count: This is the frequency at which the word appears.
  4. Relative: This is the relative frequency (per 10 million words) of the term in the document.
  5. Trend: This is a graph that displays the distribution of the term throughout the document.
  6. Significance: Significance is measured as a TF-IDF score. It is a common way of displaying how important a term is relative to the rest of the terms in the document. Not displayed by default.
  7. Z-Score: This is the Z-score (standard score) for the term’s raw frequency compared to other term frequencies in the same document. Not displayed by default. Adding Z-Score may hide the Trend column.


You can manage your columns by hovering over the title bar for a column and opening the drop down menu. Check off which columns you want and do not want.
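To make the Count, Relative, and Z-Score columns more concrete, here is a minimal Python sketch that computes them for a single tokenized document. The `term_table` function is hypothetical. The 10 million scale follows the column description above, and the use of the population standard deviation is an assumption; Voyant's exact formula may differ. TF-IDF is left out because it depends on a multi-document corpus and comes in several variants.

```python
from collections import Counter
from statistics import mean, pstdev


def term_table(tokens, per=10_000_000):
    """Build rows of (term, count, relative, z_score) for one document."""
    counts = Counter(tokens)
    if not counts:
        return []
    total = sum(counts.values())
    mu = mean(counts.values())       # average raw count over all terms
    sigma = pstdev(counts.values())  # population standard deviation (assumed)
    rows = []
    for term, count in counts.most_common():
        relative = count / total * per
        z_score = (count - mu) / sigma if sigma else 0.0
        rows.append((term, count, relative, z_score))
    return rows


tokens = "sea whale sea ship whale sea".split()
for row in term_table(tokens):
    print(row)
```

A positive Z-score marks a term that is used more often than the average term in the document, and a negative one marks a term used less often.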

Trends

Trends, also known as the Type Frequencies Chart, shows a line graph of how frequently each word is used across the document. Each word's line is assigned an arbitrary color, and a legend at the top of the tool shows which color belongs to which word. You can hide and reveal terms in Trends by clicking on the term in the legend.

The Search Bar is located towards the lower left-hand corner of the Trends tool. Use it to choose which terms Trends displays.

The Options menu is located on the top bar of the tool, towards the upper right-hand corner. From the menu, you can modify the Stopwords list (see the Stopwords section of this guide for specifics), frequency type, segment numbers, or the color palette used by the tool (this feature may not work).

The “Frequencies” option on the bottom bar of the Trends tool manages how word frequencies are generated. You can pick between two options:

  1. relative frequencies (default): term frequency in document or document segment per normalized count of 1 million terms
  2. raw frequencies: absolute count for each document or document segment

If you only have one document, that document is split into a number of segments. You can change how many segments that document is divided into. If you have multiple documents in your session of Voyant Tools, then the segments consist of each document, and the number of segments can no longer be changed.
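The segmenting described above can be sketched in Python as follows. The `segment_frequencies` function is hypothetical, and splitting the token list into equal-sized slices is an assumption about how a single document might be divided; the per-million scale follows the description of relative frequencies above.

```python
def segment_frequencies(tokens, term, segments=10, relative=True, per=1_000_000):
    """Frequency of `term` in each of `segments` roughly equal slices."""
    n = len(tokens)
    result = []
    for i in range(segments):
        # Integer arithmetic spreads any remainder across the segments.
        start = i * n // segments
        end = (i + 1) * n // segments
        chunk = tokens[start:end]
        count = chunk.count(term)
        if relative:
            result.append(count / len(chunk) * per if chunk else 0.0)
        else:
            result.append(count)
    return result


tokens = "sea sea ship whale sea whale ship ship".split()
print(segment_frequencies(tokens, "sea", segments=2, relative=False))
print(segment_frequencies(tokens, "sea", segments=2))
```

Each value in the returned list corresponds to one point on a term's line in a chart like Trends. With multiple documents, each document would take the place of a segment.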

Contexts

Contexts tool of Voyant-Tools

The Contexts tool, also known as the Document Type KWICs Grid, displays a table contextualizing a selected word with the phrases or paragraphs of text that directly precede and follow each instance of the word in the document.

The Contexts tool has five columns:

  1. Document: This is the document where the term is located.
  2. Left: These are the contextual words to the left of the term.
  3. Term: This is the term itself.
  4. Right: These are the contextual words to the right of the term.
  5. Position: This is where the term is located in the document. Not displayed by default.

Each row has the option to expand. Expanding the row shows additional context around the term. You can modify how much additional context is displayed with the sliders located on the bottom bar of the Contexts tool.

  1. Search Bar: You can select which terms are displayed by the Contexts tool. By default, the term that appears most frequently in the document is displayed.
  2. context: Increasing and decreasing this slider changes how much context is shown in the Left and Right columns.
  3. expand: Increasing and decreasing this slider changes how much context is shown when expanding a row.
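A keyword-in-context (KWIC) table like the one Contexts displays can be approximated with a short Python function. The `kwic` helper and its `context` parameter (the number of words shown on each side, playing the role of the context slider) are illustrative assumptions, not Voyant's implementation.

```python
import re


def kwic(text, term, context=3):
    """Return (left, term, right) rows for each occurrence of `term`."""
    tokens = re.findall(r"\w+", text)
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == term.lower():
            left = " ".join(tokens[max(0, i - context):i])
            right = " ".join(tokens[i + 1:i + 1 + context])
            rows.append((left, token, right))
    return rows


text = "Call me Ishmael. Some years ago, never mind how long, I went to sea."
for left, term, right in kwic(text, "years", context=2):
    print(f"{left:>20} | {term} | {right}")
```

Raising `context` widens the Left and Right columns, in the same way as moving the context slider.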

 

Phrases

Phrases tool of Voyant-Tools

The Phrases tool identifies phrases that repeat in the document and lists them in a table that can be sorted by frequency of repetition or by phrase length.

The Phrases tool has four columns:

  1. Term: This is the repeating phrase.
  2. Count: This is the number of times the phrase appears in the document.
  3. Length: This is the number of words in the phrase.
  4. Trend: This is the sparkline graph that shows the distribution of relative frequencies of the term across the document.

By default, the phrases are shown in descending order of phrase length.

The bottom bar of the Phrases tool has a search bar, length control, and additional options.

  1. Search Bar: You can search for certain terms and which phrases contain said term, or search for certain phrases altogether.
  2. Length: You can specify the upper and lower bounds of the phrase length by using the slider.
  3. Overlap: The Overlap option controls whether overlapping phrases are filtered out. For example, with filtering on, the phrase “once upon a time” will not be listed alongside its sub-phrases “once upon a,” “upon a time,” “once upon,” and so on. There are three options you can use to control overlap:
  • none (keep all): No filtering at all, overlapping phrases are displayed
  • Prioritize longest phrases: Only the longest phrase is kept. All others are filtered out. For example, only “once upon a time” would be kept, as it is the longest phrase.
  • Prioritize most frequent phrases: Phrases that appear frequently throughout the document will be displayed. For example, if “upon a time” was used frequently, it would be displayed over “once upon a time” or “twice upon a time.”
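The overlap filtering above can be illustrated with a small Python sketch that finds repeated word sequences and then keeps only the longest ones. The function names are hypothetical, and the containment rule used here (drop any phrase that appears inside a longer repeated phrase, regardless of its count) is a simplifying assumption about what “Prioritize longest phrases” does.

```python
from collections import Counter


def repeated_phrases(tokens, min_len=2, max_len=4):
    """Count every n-gram (as a tuple of words) that occurs at least twice."""
    counts = Counter(
        tuple(tokens[i:i + n])
        for n in range(min_len, max_len + 1)
        for i in range(len(tokens) - n + 1)
    )
    return {phrase: c for phrase, c in counts.items() if c > 1}


def contains(longer, shorter):
    """True if `shorter` appears as a contiguous run inside `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))


def prioritize_longest(phrases):
    """Keep only phrases that are not contained in a longer repeated phrase."""
    kept = []
    for phrase in sorted(phrases, key=len, reverse=True):
        if not any(contains(longer, phrase) for longer in kept):
            kept.append(phrase)
    return {phrase: phrases[phrase] for phrase in kept}


tokens = "once upon a time there was once upon a time".split()
all_phrases = repeated_phrases(tokens)      # comparable to "none (keep all)"
longest = prioritize_longest(all_phrases)   # comparable to "Prioritize longest phrases"
print(sorted(" ".join(p) for p in all_phrases))
print([" ".join(p) for p in longest])
```

With no filtering, the sample yields six overlapping phrases; after filtering, only “once upon a time” remains. The `min_len` and `max_len` parameters play the role of the Length slider.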

 

Additional Tools

Voyant offers a variety of additional Tools besides those explained in this guide. To change which tools you are using, click the “Windows” button located in the top bar of a tool. This will reveal a drop down menu, where you can select an alternate tool to switch out with that current tool.

Windows icon to see additional Tools in Voyant

The following lists describe the alternate tools that you can use, grouped by the menu icon under which they appear:


Corpus Tools

  • Cirrus: Word cloud that visualizes the top frequency of words
  • Terms: A table view of term frequencies in the entire corpus
  • Bubblelines: Visualization of frequency and distribution of terms in a corpus. Looks like a timeline containing bubbles corresponding to a term, varying by size and color according to certain characteristics of that term.
  • Correlations: Shows how often a term appears with other terms
  • Collocates: Represents terms that occur in close proximity
  • DreamScape: A preliminary attempt to explore how texts might be represented geo-spatially. The tool tries to identify locations (especially city names) mentioned in texts, and suggests patterns of recurring connections between locations, patterns that might help identify travel of people, ideas, goods, or anything else.
  • Mandala: A conceptual visualization that shows the relationships between terms and documents. Each search term (or magnet) pulls documents toward it based on the term's relative frequency in the corpus.
  • MicroSearch: Visualizes the frequency and distribution of terms across the document or documents
  • StreamGraph: A visualization that depicts the change in frequency of words in a corpus
  • Phrases: A table view of repeated phrases in the document
  • Documents: The assembled documents in the Voyant session, if you are analyzing more than one document
  • Summary: A simple, textual overview of the current document(s)
  • Trends: A line graph depicting the distribution of a word’s occurrence across a document
  • ScatterPlot: Visualization of how words cluster in a document
  • TermsRadio: A visualization that depicts changes in the frequency of words across one or more documents
  • Topics: Generates term clusters from the document
  • Veliza: An experimental tool for having a (limited) natural language exchange (in English) based on your corpus
  • WordTree: A tool that allows you to explore how words are used in phrases

Document Tools

  • Bubbles: Visualization of term frequencies by document. Looks like a bunch of bubbles on a screen, with a word in each bubble.
  • Cirrus: Word cloud that visualizes the top frequency of words
  • Contexts: Shows each occurrence of a keyword with its context
  • Document Terms: A table view of term frequencies for each document
  • Reader: Lets you read the document being analyzed
  • TextualArc: Visualization of terms that include a weighted centroid of terms and an arc that follows terms in document order
  • Trends: A line graph depicting the distribution of a word’s occurrence across a document
  • Knots: Visualization that represents terms as a series of twisted lines
  • Topics: Generates term clusters from the document

Visualization Tools

  • Cirrus: Word cloud that visualizes the top frequency of words
  • Bubblelines: Visualization of frequency and distribution of terms in a corpus. Looks like a timeline containing bubbles corresponding to a term, varying by size and color according to certain characteristics of that term.
  • Bubbles: Visualization of term frequencies by document. Looks like a bunch of bubbles on a screen, with a word in each bubble.
  • Links: The collocation of terms in a corpus displayed in a network
  • DreamScape: A preliminary attempt to explore how texts might be represented geo-spatially. The tool tries to identify locations (especially city names) mentioned in texts, and suggests patterns of recurring connections between locations, patterns that might help identify travel of people, ideas, goods, or anything else.
  • Knots: Visualization that represents terms as a series of twisted lines
  • Mandala: A visualization that shows the relationship between terms and documents
  • MicroSearch: Visualizes the frequency and distribution of terms across the document or documents
  • StreamGraph: A visualization that depicts the change in frequency of words in a corpus
  • ScatterPlot: Visualization of how words cluster in a document
  • TextualArc: Visualization of terms that include a weighted centroid of terms and an arc that follows terms in document order
  • Trends: A line graph depicting the distribution of a word’s occurrence across a document
  • TermsBerry: Provides a way of exploring high frequency terms and their collocates (words that occur in proximity)
  • TermsRadio: A visualization that depicts changes in the frequency of words across one or more documents
  • WordTree: A tool that allows you to explore how words are used in phrases

Grid Tools

  • Terms: A table view of term frequencies in the entire corpus
  • Collocates: Represents terms that occur in close proximity
  • Correlations: Shows how often a term appears with other terms
  • Phrases: A table view of repeated phrases in the document
  • Contexts: Shows each occurrence of a keyword with its context
  • Document Terms: A table view of term frequencies for each document
  • Documents: The assembled documents in the Voyant session, if you are analyzing more than one document
  • Topics: Generates term clusters from the document

Other Tools

  • Reader: Lets you read the document being analyzed
  • Summary: A simple, textual overview of the current document(s)
  • Veliza: An experimental tool for having a (limited) natural language exchange (in English) based on your corpus