University
Library

DSC Tutorials and Guides

Voyant Tools Tutorial

Written by: Bryan Tor
Updates provided by: Allen Brown, Bryan Tor
Last Updated: Spring Quarter 2019

When to Use Voyant

Voyant Tools is an online application used for text analysis. Created with the digital humanities in mind, it is intended to facilitate reading and interpretive practices for its users. Voyant can generate word clouds from given documents, show word frequency or collocation, and perform other text mining functions with plain text, HTML, XML, PDF, RTF, and MS Word files. Researchers have used Voyant Tools to analyze texts in a wide range of contexts including literature, language teaching, healthcare, and system architecture.
Certain functionalities outlined in the official Voyant Tools Help documentation do not reliably work. However, these problems do not impair use of Voyant as a text analysis environment, and this tutorial takes these issues into account.

Voyant can be found here: http://voyant-tools.org

Starting with Voyant

When you first navigate to the website, Voyant asks for the text it should analyze. You can paste the text directly into the text area, or paste the URL of a web page to be analyzed. Alternatively, you can open an existing corpus created with Voyant, or upload a file from your computer.

Uploading a file onto Voyant-Tools

Stopwords

Voyant keeps a list of words, called stopwords, that are excluded from analysis. Including stopwords could crowd out more interesting results from text analysis, but terms that matter to your analysis should not be classified as stopwords.

You can edit Voyant’s stopword list by opening the Options menu, accessible by hovering briefly over the top bar of the tool and then clicking the options icon (it looks like a switch) once it appears.

After accessing the Options menu, you can select “Edit List” to directly modify what Voyant treats as a stopword, or you can change Voyant’s “Auto-detect” stopword option to a preselected list of stopwords for a specific language.
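The effect of a stopword list on term counts can be illustrated outside Voyant. The following is a minimal Python sketch, not Voyant's own implementation: the `STOPWORDS` set, the `tokenize` helper, and the sample sentence are all hypothetical, and Voyant's built-in lists are far longer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def term_counts(text, stopwords=STOPWORDS):
    """Count every token that is not in the stopword list."""
    return Counter(t for t in tokenize(text) if t not in stopwords)

sample = "The whale and the sea: the whale is in the sea."
counts = term_counts(sample)                        # stopwords removed
unfiltered = term_counts(sample, stopwords=set())   # comparable to "None"
print(counts.most_common(3))
print(unfiltered.most_common(3))
```

With the stopword list applied, only “whale” and “sea” remain; without it, “the” dominates the counts, which is the clutter the stopword list is meant to prevent.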

Exporting

The Voyant export feature allows you to export the entire session of Voyant Tools or specific tools from that session. Click the Export icon to access the Export menu. Depending on what you are exporting, you have three or four choices:

  • a URL for this view (tools and data), the default option: Returns a URL that links to this session of Voyant Tools.
  • an HTML snippet for embedding this view in another web page, under “Export View (Tools and Data)”: Returns a snippet of HTML code, which can be used to embed this session of Voyant Tools into a webpage.
  • a bibliographic reference for this view, under “Export View (Tools and Data)”: Returns a bibliographic reference for this session of Voyant Tools.
  • export a PNG image of this visualization, under “Export Visualization,” only accessible when exporting a specific tool: Creates a PNG image of the current tool, or creates a snippet of HTML code which contains the image.

Exporting a specific tool in Voyant-Tools

Location of the Export URL option

Cirrus

Cirrus tool from Voyant-Tools

Cirrus is a word cloud that visualizes the most frequently used words of the document. The color and absolute position of the words are not significant, although words that appear more often are positioned approximately towards the center. Words used less often are located towards the outside perimeter of the cloud.


Word Frequency and the Terms Bar

Hovering over a word displays the number of times that word appears in the document. Moving the bottom-left slider labeled “Terms” adjusts the number of words shown.


Options Menu

  1. Stopwords: You can manage the list of words that Cirrus will exclude from the word cloud. Refer to the Stopwords section for more information.
  2. White List: You can define a set of allowed words; Cirrus will include only these words in the word cloud. If you decide to use this, be sure to set Stopwords to “None”.
  3. Categories: You can assign terms to different categories. Categories can be assigned certain colors and fonts to display on the word cloud. Note that this feature does not always work.
  4. Font family: This changes the default font of the word cloud. Note that this feature does not always work.
  5. Palette: This changes the color of the Cirrus menu, not the word cloud itself. Note that this feature does not always work.

 

Document Terms

Document Tools from Voyant-Tools

Document Terms is a table view of the term frequencies in the document. The table view has seven data columns:

  • #: Document number. If you have multiple documents, this number will change to correspond to which document has which term frequency.
  • Term: This is the word itself.
  • Count: This is the frequency at which the word appears.
  • Relative: This is the relative frequency (per 10 million words) of the term in the document.
  • Trend: This is a graph that displays the distribution of the term throughout the document.
  • Significance: Significance is measured as a TF-IDF score. It is a common way of displaying how important a term is relative to the rest of the terms in the document. Not displayed by default.
  • Z-Score: This is the Z-score (standard score) for the term’s raw frequency compared to other term frequencies in the same document. Not displayed by default. Adding Z-score may hide the Trends column.


You can manage your columns by hovering over the title bar for a column and opening the drop down menu. Check off which columns you want and do not want.
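To make the Relative and Z-Score columns more concrete, here is a minimal Python sketch of how such values can be computed from raw counts. It is an illustration under assumptions, not Voyant's code: the function names and sample counts are hypothetical, and whether Voyant uses the population or sample standard deviation is not stated in this guide (the sketch uses the population form).

```python
from collections import Counter
from statistics import mean, pstdev

def relative_frequency(count, total_tokens, per=10_000_000):
    """Scale a raw count to a rate per `per` words."""
    return count / total_tokens * per

def z_scores(counts):
    """Standard score of each term's raw count against all term counts."""
    values = list(counts.values())
    mu, sigma = mean(values), pstdev(values)  # population std dev (assumption)
    if sigma == 0:
        return {term: 0.0 for term in counts}
    return {term: (c - mu) / sigma for term, c in counts.items()}

# Hypothetical raw counts for one small document.
counts = Counter({"whale": 6, "sea": 3, "ship": 3})
total = sum(counts.values())

rel = {t: relative_frequency(c, total) for t, c in counts.items()}
z = z_scores(counts)
for term in counts:
    print(term, counts[term], rel[term], round(z[term], 3))
```

A positive Z-score marks a term that occurs more often than the average term in the document; a negative one marks a term that occurs less often.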

Trends

Trends Tool of Voyant-Tools

Trends, also known as the Type Frequencies Chart, shows a line graph of how frequently a word is used across the document. Each word's line is assigned an arbitrary color, and a legend at the top of the tool shows which color belongs to which word. You can hide and reveal terms in Trends by clicking on a term in the legend.

The “Frequencies” option on the bottom bar of the Trends tool manages how word frequencies are generated. You can pick between two options:

  1. relative frequencies (default): term frequency in document or document segment per normalized count of 1 million terms
  2. raw frequencies: absolute count for each document or document segment

The Search Bar is located towards the lower left-hand corner of the Trends tool. Use it to choose which terms Trends displays.

If you only have one document, that document is split into a number of segments. You can change how many segments that document is divided into. If you have multiple documents in your session of Voyant Tools, then the segments consist of each document, and the number of segments can no longer be changed.
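The segmenting behavior and the two frequency modes can be sketched in Python as follows. This is an illustrative approximation, assuming equal-sized contiguous segments; the function names and the sample tokens are hypothetical, and Voyant's actual segmentation may differ.

```python
def split_into_segments(tokens, n_segments):
    """Split a token list into n roughly equal, contiguous segments."""
    size, extra = divmod(len(tokens), n_segments)
    segments, start = [], 0
    for i in range(n_segments):
        # The first `extra` segments take one additional token each.
        end = start + size + (1 if i < extra else 0)
        segments.append(tokens[start:end])
        start = end
    return segments

def trend(tokens, term, n_segments=10, relative=True, per=1_000_000):
    """Return one frequency value per segment for the given term."""
    points = []
    for seg in split_into_segments(tokens, n_segments):
        raw = seg.count(term)
        if relative:
            points.append(raw / len(seg) * per if seg else 0.0)
        else:
            points.append(raw)
    return points

tokens = "a b a c a b c c".split()
print(trend(tokens, "a", n_segments=2, relative=False))  # raw counts
print(trend(tokens, "a", n_segments=2))                  # per million terms
```

Each value in the returned list corresponds to one point on a Trends line, so changing `n_segments` changes how many points the line has.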


The Options menu is located on the top bar of the tool, towards the upper right-hand corner. From the menu, you can modify the Stopwords list (see the Stopwords section of this guide for specifics), or the color palette used by the tool (this feature may not work).

 

Contexts

Contexts tool of Voyant-Tools

The Contexts tool, also known as the Document Type KWICs Grid, displays a table contextualizing a selected word with the phrases or paragraphs of text that directly precede and follow each instance of the word in the document.

The Contexts tool has five columns:

  1. Document: This is the document where the term is located.
  2. Left: These are the contextual words to the left of the term.
  3. Term: This is the term itself.
  4. Right: These are the contextual words to the right of the term.
  5. Position: This is where the term is located in the document. Not displayed by default.

Each row can be expanded to show additional context around the term. You can modify how much additional context is displayed with the sliders located on the bottom bar of the Contexts tool. The bottom bar has the following controls:

  1. Search Bar: You can select which terms are displayed by the Contexts tool. By default, the term that appears most frequently in the document is displayed.
  2. context: Increasing and decreasing this slider changes how much context is shown in the Left and Right columns.
  3. expand: Increasing and decreasing this slider changes how much context is shown when expanding a row.
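A keyword-in-context (KWIC) table like the one Contexts displays can be sketched in a few lines of Python. This is a simplified illustration, not Voyant's implementation: the `kwic` function and its `context` parameter (measured here in words) are hypothetical names chosen to mirror the columns and slider described above.

```python
def kwic(tokens, term, context=5):
    """Return (left, term, right, position) rows for each occurrence of term."""
    rows = []
    for i, tok in enumerate(tokens):
        if tok == term:
            left = " ".join(tokens[max(0, i - context):i])
            right = " ".join(tokens[i + 1:i + 1 + context])
            rows.append((left, tok, right, i))
    return rows

tokens = "call me ishmael some years ago never mind how long".split()
for left, term, right, position in kwic(tokens, "years", context=2):
    print(f"{left:>20} | {term} | {right:<20} | {position}")
```

Raising `context` widens the Left and Right columns in the same way the context slider does.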

 

Phrases

Phrases tool of Voyant-Tools

The Phrases tool identifies repeated phrases in the document and lists them in a table organized by frequency of repetition or by phrase length.

The Phrases tool has four columns:

  1. Term: This is the repeating phrase.
  2. Count: This is the number of times the phrase appears in the document.
  3. Length: This is the number of words in the phrase.
  4. Trend: This is the sparkline graph that shows the distribution of relative frequencies of the term across the document.

By default, the phrases are shown in descending order of phrase length.

The bottom bar of the Phrases tool has a search bar, length control, and additional options.

  1. Search Bar: You can search for a term to find the phrases that contain it, or search for a specific phrase.
  2. Length: You can specify the upper and lower bounds of the phrase length by using the slider.
  3. Overlap: The Overlap option controls whether overlapping phrases are filtered out. For example, with filtering on, the phrase “once upon a time” will not appear alongside its sub-phrases “once upon a,” “upon a time,” “once upon,” and so on. There are three options you can use to control overlap:
  • none (keep all): No filtering at all, overlapping phrases are displayed
  • Prioritize longest phrases: Only the longest phrase is kept. All others are filtered out. For example, only “once upon a time” would be kept, as it is the longest phrase.
  • Prioritize most frequent phrases: Phrases that appear frequently throughout the document will be displayed. For example, if “upon a time” was used frequently, it would be displayed over “once upon a time” or “twice upon a time.”
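The idea behind repeated-phrase detection and the “Prioritize longest phrases” filter can be sketched in Python. This is a rough illustration under assumptions, not Voyant's algorithm: the function names, the length bounds, and the sample text are hypothetical.

```python
from collections import Counter

def repeated_phrases(tokens, min_len=2, max_len=4):
    """Count every word sequence of min_len..max_len words; keep repeats only."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {phrase: c for phrase, c in counts.items() if c > 1}

def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))

def prioritize_longest(phrases):
    """Drop any phrase that is contained in a longer phrase already kept."""
    kept = []
    for phrase in sorted(phrases, key=len, reverse=True):
        if not any(contains(k, phrase) for k in kept):
            kept.append(phrase)
    return {p: phrases[p] for p in kept}

tokens = "once upon a time there was once upon a time".split()
all_repeats = repeated_phrases(tokens)    # comparable to "none (keep all)"
longest = prioritize_longest(all_repeats)
print(len(all_repeats), longest)
```

With no filtering, the sample yields six repeated phrases; after prioritizing the longest, only “once upon a time” remains.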

 

Additional Tools

Voyant offers a variety of additional tools besides those explained in this guide. To change which tools you are using, click the “Windows” button located in the top bar of a tool. This reveals a drop-down menu, where you can select an alternate tool to replace the current one.

Windows icon to see additional Tools in Voyant

The following lists describe the alternate tools that you can use, grouped by the icon under which they appear:


Eyeball Icon

  • Bubbles: Visualization of term frequencies by document. Looks like a bunch of bubbles on a screen, with a word in each bubble.
  • Bubblelines: Visualization of frequency and distribution of terms in a corpus. Looks like a timeline containing bubbles corresponding to a term, varying by size and color according to certain characteristics of that term.
  • Cirrus: Word cloud that visualizes the top frequency of words.
  • Knots: Visualization that represents terms as a series of twisted lines.
  • ScatterPlot: Visualization of how words cluster in a document
  • TextualArc: Visualization of terms that includes a weighted centroid of terms and an arc that follows terms in document order

Chart Icon

  • Collocates: Represents terms that occur in close proximity
  • Contexts: Shows each occurrence of a keyword with its context
  • Correlations: Shows how often a term appears with other terms
  • Documents: The assembled documents in the Voyant session, if you are analyzing more than one document
  • Document Terms: A table view of document term frequencies
  • Phrases: A table view of repeated phrases in the document
  • Terms: A table view of term frequencies across the entire corpus

Circle Icon

  • Mandala: A visualization that shows the relationship between terms and documents
  • TermsBerry: A tool exploring high frequency terms and collocates

Document and Page Icons

  • MicroSearch: Visualizes the frequency and distribution of terms across the document or documents
  • Reader: Lets you read the document being analyzed
  • Summary: A simple, textual overview of the current document(s)
  • Topics: Generates term clusters from the document

Graph Icons

  • StreamGraph: A visualization that depicts the change of frequency of words in a corpus
  • TermsRadio: A visualization that depicts the changes in the frequency of words in a document or documents
  • Trends: A line graph depicting the distribution of a word’s occurrence across a document

Tree, Branch, Speech, and Other Icons

  • Links: The collocation of terms in a corpus displayed in a network
  • Veliza: An experimental tool for having a (limited) natural language exchange (in English) based on your corpus
  • Word Tree: A tool that allows you to explore how words are used in phrases