Voyant Tools Tutorial
Allen Brown, Bryan Tor | Fall, 2021
Voyant Tools is an online application used for text analysis. Created with the digital humanities in mind, it is intended to facilitate reading and interpretive practices for its users. Voyant can generate word clouds from given documents, show word frequency or collocation, and perform other text mining functions with plain text, HTML, XML, PDF, RTF, and MS Word files. Researchers have used Voyant Tools to analyze texts in a wide range of contexts including literature, language teaching, healthcare, and system architecture.
Certain functionalities outlined in the official Voyant Tools Help documentation do not reliably work. However, these problems do not impair use of Voyant as a text analysis environment, and this tutorial takes these issues into account.
Voyant can be found here: http://voyant-tools.org
When you first open the website, Voyant asks for the text it should analyze. You can paste text directly into the text area or paste the URL of a web page to be analyzed. Alternatively, you can open an existing corpus created with Voyant or upload a file from your computer.
Voyant keeps a list of words, called stopwords, that are excluded from its results. Excluding common stopwords keeps them from crowding out more interesting results, but important terms should not be classified as stopwords.
You can edit Voyant’s stopword list from the options menu: hover briefly over the top bar of the tool, then click the options icon (it looks like a switch) once it appears.
In the options menu, select “Edit List” to directly modify which words Voyant treats as stopwords, or change the “Auto-detect” stopword setting to a preselected stopword list for a specific language.
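To illustrate what excluding stopwords does to a frequency count, here is a minimal Python sketch. It is not Voyant's code: the tiny `DEFAULT_STOPWORDS` set and the helper names `tokenize`, `edit_stopwords`, and `term_counts` are hypothetical, and the simple regular-expression tokenizer is an assumption (Voyant's own tokenization and stopword lists differ).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set; real stopword lists are far longer.
DEFAULT_STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def edit_stopwords(base: set[str], add=(), remove=()) -> set[str]:
    """Return a new stopword set with terms added and/or removed."""
    return (base | set(add)) - set(remove)


def term_counts(text: str, stopwords: set[str]) -> Counter:
    """Count every token that is not on the stopword list."""
    return Counter(t for t in tokenize(text) if t not in stopwords)


text = "The whale and the sea. The whale is white."
print(term_counts(text, DEFAULT_STOPWORDS))
# Counter({'whale': 2, 'sea': 1, 'white': 1})

custom = edit_stopwords(DEFAULT_STOPWORDS, add=["whale"])
print(term_counts(text, custom))
# Counter({'sea': 1, 'white': 1})
```

With the default set, "the", "and", and "is" drop out of the counts. Adding "whale" to the list, much as you might with “Edit List”, removes it as well, which is why an important term should stay off the list.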
The Voyant export feature allows you to export the entire session of Voyant Tools or specific tools from that session. Click the Export icon to access the Export menu. You have three to four different choices when exporting:
Cirrus is a word cloud that visualizes the most frequently used words in the document. The color and exact position of each word are not significant, although words that appear more often are placed roughly toward the center and words used less often toward the outer edge of the cloud.
Hovering over a word displays the number of times that word appears in the document. The slider labeled “Terms” in the bottom-left corner adjusts the number of words shown.
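The selection step behind a word cloud like Cirrus can be sketched as "count every word, keep the top N". This minimal Python sketch assumes a simple regular-expression tokenizer; the function name `cloud_terms` and its parameters are hypothetical, and it does not attempt Cirrus's layout.

```python
import re
from collections import Counter


def cloud_terms(text: str, n_terms: int, stopwords=frozenset()) -> list[tuple[str, int]]:
    """Return the n_terms most frequent words with their counts, most frequent first.

    n_terms stands in for the "Terms" slider; each count is what a hover would report.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n_terms)


print(cloud_terms("red blue red green red blue", 2))
# [('red', 3), ('blue', 2)]
```

In a layout like the one described above, the first entries would sit nearest the center of the cloud and the last ones near its edge.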
Document Terms is a table view of the term frequencies in the document. The table view has seven data columns:
#: Document number. If you have multiple documents, this number indicates which document each term frequency belongs to.
Term: The word itself.
Count: The number of times the word appears in the document.
Relative: The relative frequency of the term in the document (per 10 million words).
Trend: A small graph showing the distribution of the term throughout the document.
Significance: A TF-IDF score, a common way of measuring how important a term is to this document relative to the other documents in the corpus. Not displayed by default.
Z-Score: The Z-score (standard score) of the term’s raw frequency compared to the other term frequencies in the same document. Not displayed by default. Adding the Z-Score column may hide the Trend column.
You can manage your columns by hovering over the title bar for a column and opening the drop down menu. Check off which columns you want and do not want.
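The arithmetic behind the Relative, Z-Score, and Significance columns can be approximated in a short Python sketch. Voyant's exact formulas are not given here, so several choices are assumptions: the population standard deviation for the Z-score, and one common TF-IDF variant (relative term frequency times the natural log of the number of documents divided by the number of documents containing the term). The function names are hypothetical.

```python
import math
import statistics
from collections import Counter


def relative_frequency(count: int, total_tokens: int, per: int = 10_000_000) -> float:
    """Occurrences of a term per `per` tokens (10 million, as in the table above)."""
    return count * per / total_tokens


def z_scores(counts: Counter) -> dict[str, float]:
    """Standard score of each term's raw count against all term counts in one document.

    Uses the population standard deviation (an assumption); returns 0.0 when all counts are equal.
    """
    values = list(counts.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return {term: (c - mean) / sd if sd else 0.0 for term, c in counts.items()}


def tf_idf(term: str, doc: Counter, corpus: list[Counter]) -> float:
    """One common TF-IDF variant: (count / document length) * ln(N / df)."""
    tf = doc[term] / sum(doc.values())
    df = sum(1 for d in corpus if d[term] > 0)
    return tf * math.log(len(corpus) / df) if df else 0.0


doc1 = Counter({"whale": 2, "sea": 2})
doc2 = Counter({"sea": 1})
corpus = [doc1, doc2]
print(relative_frequency(5, 1000))          # 50000.0
print(z_scores(Counter({"a": 1, "b": 3})))  # {'a': -1.0, 'b': 1.0}
print(tf_idf("whale", doc1, corpus))        # 0.5 * ln(2), about 0.347
print(tf_idf("sea", doc1, corpus))          # 0.0, because every document contains "sea"
```

The last two lines show why TF-IDF is a corpus-level measure: "sea" is just as frequent as "whale" in the first document, but it scores zero because every document contains it.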
Trends, also known as the Type Frequencies Chart, shows a line graph of how often a word occurs across the document. Each line in the graph is assigned an arbitrary color, and a legend at the top of the tool shows which color belongs to which word. Click a term in the legend to hide or reveal it.
The Search Bar is located toward the lower left-hand corner of the Trends tool. Use it to select which terms Trends displays.
The Options menu is located on the top bar of the tool, toward the upper right-hand corner. From this menu, you can modify the stopword list (see the stopwords section of this guide for specifics), the frequency type, the number of segments, or the color palette used by the tool (this option may not work).
The “Frequencies” option on the bottom bar of the Trends tool manages how word frequencies are generated. You can pick between two options:
If you only have one document, that document is split into a number of segments, and you can change how many segments it is divided into. If you have multiple documents in your session of Voyant Tools, each document becomes one segment, and the number of segments can no longer be changed.
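Segmenting a single document can be sketched in Python as splitting its word tokens into N roughly equal chunks and counting a term in each chunk. This is an illustration under assumptions, not Voyant's implementation: the tokenizer, the even-split rule, and the name `segment_frequencies` are hypothetical, and the "relative" value here is a simple fraction of each segment's tokens rather than Voyant's own scaling.

```python
import re


def segment_frequencies(text: str, term: str, n_segments: int) -> tuple[list[int], list[float]]:
    """Split one document into n_segments roughly equal runs of tokens.

    Returns (raw counts, relative frequencies) of `term` for each segment, in order,
    which is the data one line of a trends-style graph would plot.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    size, extra = divmod(len(tokens), n_segments)
    raw, relative = [], []
    start = 0
    for i in range(n_segments):
        # The first `extra` segments get one additional token so no token is dropped.
        end = start + size + (1 if i < extra else 0)
        chunk = tokens[start:end]
        hits = chunk.count(term.lower())
        raw.append(hits)
        relative.append(hits / len(chunk) if chunk else 0.0)
        start = end
    return raw, relative


raw, rel = segment_frequencies("whale sea whale sea sea sea whale sea", "whale", 4)
print(raw)  # [1, 1, 0, 1]
print(rel)  # [0.5, 0.5, 0.0, 0.5]
```

With several documents, the segments are simply the documents themselves, so the equivalent would be one count per document rather than an arbitrary split of a single text.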
The Contexts tool, also known as the Document Type KWICs Grid, displays a table contextualizing a selected word with the phrases or paragraphs of text that directly precede and follow each instance of the word in the document.
The Contexts tool has five columns:
Each row can be expanded to show additional context around the term. You can adjust how much additional context is displayed with the sliders on the bottom bar of the Contexts tool.
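A keyword-in-context (KWIC) table like the one the Contexts tool shows can be sketched in a few lines of Python. The function name `kwic` and its `context` parameter (words kept on each side) are hypothetical stand-ins for the tool's context sliders; splitting on whitespace and ignoring punctuation when matching are simplifying assumptions.

```python
import re


def kwic(text: str, keyword: str, context: int = 5) -> list[tuple[str, str, str]]:
    """Return a (left, keyword, right) row for each occurrence of `keyword`.

    `context` is how many whitespace-separated words to keep on each side.
    Punctuation and case are ignored when matching, but kept in the output.
    """
    words = text.split()
    rows = []
    for i, word in enumerate(words):
        if re.sub(r"\W", "", word).lower() == keyword.lower():
            left = " ".join(words[max(0, i - context):i])
            right = " ".join(words[i + 1:i + 1 + context])
            rows.append((left, word, right))
    return rows


sample = "Call me Ishmael. Some years ago, never mind how long precisely"
print(kwic(sample, "years", context=2))
# [('Ishmael. Some', 'years', 'ago, never')]
```

Calling it again with a larger `context` is roughly what expanding a row does: the same hit, shown with more of the surrounding text.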
The Phrases tool identifies phrases that recur in the document and arranges them in a table sorted by how often they are repeated or by phrase length.
The Phrases tool has four columns:
By default, the phrases are shown in descending order of phrase length.
The bottom bar of the Phrases tool has a search bar, length control, and additional options.
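Finding repeated phrases amounts to counting word n-grams. The Python sketch below is an illustration under assumptions, not Voyant's algorithm: `repeated_phrases`, `min_len`, `max_len`, and `min_count` are hypothetical names (the length bounds loosely mirror the length control), and results are sorted longest first, then by frequency, to match the default order described above.

```python
import re
from collections import Counter


def repeated_phrases(text: str, min_len: int = 2, max_len: int = 5, min_count: int = 2):
    """Find word n-grams of min_len..max_len words that occur at least min_count times.

    Returns (phrase, count, length) tuples sorted by length (descending),
    then count (descending), then alphabetically.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    phrases = [(p, c, len(p.split())) for p, c in counts.items() if c >= min_count]
    return sorted(phrases, key=lambda x: (-x[2], -x[1], x[0]))


print(repeated_phrases("the old man and the old man"))
# [('the old man', 2, 3), ('old man', 2, 2), ('the old', 2, 2)]
```

This sketch lists "old man" even though it only ever occurs inside "the old man"; a real tool may offer ways to collapse such overlapping phrases.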
Voyant offers a variety of additional tools besides those explained in this guide. To change which tools you are using, click the “Windows” button located in the top bar of a tool. This reveals a drop-down menu where you can select an alternate tool to replace the current one.
The following pages list alternate tools you can use, grouped under the icon of their tool category:
The land on which we gather is the unceded territory of the Awaswas-speaking Uypi Tribe. The Amah Mutsun Tribal Band, comprised of the descendants of indigenous people taken to missions Santa Cruz and San Juan Bautista during Spanish colonization of the Central Coast, is today working hard to restore traditional stewardship practices on these lands and heal from historical trauma.
The land acknowledgement used at UC Santa Cruz was developed in partnership with the Amah Mutsun Tribal Band Chairman and the Amah Mutsun Relearning Program at the UCSC Arboretum.