Eric Chapdelaine
Student at Northeastern University Studying Computer Science.

Class Website

# The Class

The Canon Wars: the debate over the fact that we usually only read material written by white men

What does it mean to broaden whose voices we hear?

• The more inclusive that is, the more just the world is

The digital world holds algorithmic bias, since it reflects the biases of the past.

The 3 Units of the Class:

• Exploration of ‘Text Encoding’ (very close reading)
• Text analysis (we will be using the programming language R for machine learning)
• Digital archive creation

# The Tempest

## Text Encoding

How has the text evolved from when it was originally written to the online version?

The production of our Tempest edition

• 1611 play is performed, circulates in manuscript
• 1623 First Folio
• Print Folger editions (synthetic)
• TEI edition
• Web edition

## Context

Inspired by a true shipwreck

The Great Chain of Being: Represents the hierarchy of things during the time of Shakespeare

• God
• Angels
• Humans
• Beast
• Plant
• Flame
• Stone

Online Text

### Act i, Scene i

Class matters enormously in this society: the characters are not equals.

• It starts off with a dispute between the aristocrats and the workers

At the beginning of the play, there are competing ideas of authority

• The workers see themselves as more powerful because they have the power to save the ship
• The aristocrats see themselves as more powerful because they are higher class

### Act i, Scene ii

Magic is introduced

## Gender in The Tempest

Why was Miranda’s virginity so important throughout this play?

• Virginity – and general purity – were seen as valuable during the time
• Possibility

Marriage was more of a transaction than a matter of falling in love (as it is today)

## Summary and Themes

Act 2, Scene 1

• Prospero tries to control the narrative

Themes:

• Loyalty and Betrayal
• Visibility
• Ambition

Act 2, Scene 2

• Caliban looks for a new leader
• Is attracted to Stephano and Trinculo

Act 3, Scene 1

• Ferdinand does labor for Prospero
• Discusses how virginity is important

Act 3, Scene 2

• Caliban, Stephano, and Trinculo are drunk and planning to kill Prospero
• Objectifying Miranda

Themes:

• Loyalty
• When Prospero no longer has a grip over what Caliban does

Act 3, Scene 3

• The prisoners on the island are very hungry and a magical table of food comes down, but disappears. Ariel, in costume, acts as a god to make them feel guilt.

Act 4, Scene 1

• The engagement of Ferdinand and Miranda using spirits
• The ‘play’ of the engagement represents Prospero’s power
• Caliban, Trinculo, and Stephano are drunk and trying to act out their plans to kill Prospero

Act 5, Scene 1

• Prospero forgives all and breaks his wand and drowns his books.

# TEI Text-encoding

TEI (the Text Encoding Initiative) is a particular flavor of XML: a set of guidelines that describes how to use XML to encode texts according to a shared standard.

Stripping down the document to the bare bones

• But removes the ambiguity

Model: Useful distortion so that something can be studied (We need to cut information to make something practical)

### What is XML?

XML governs the rules for elements. It looks similar to HTML.

Elements have start tags <...> and end tags </...>

Example: <speaker name="Bob">

• Element name: speaker
• Attribute: name (here with the value "Bob")

• Examples: Publication date, title, etc.

## TEI

Has a header (holds metadata)

Has the text (has the body of the document)

• Required to have a body
<teiHeader>
...
</teiHeader>
<text>
<body>
...
</body>
</text>


Example:

    <head>EPILOGVE,</head>
<sp>
<stage type="delivery">spoken by <persName>Prospero</persName>.</stage>
<lg rhyme="abababa"><l>NOw my Charmes are all <rhyme>ore-throwne</rhyme>,</l>
<l>And what strength I <choice><orig>haue's</orig><reg>have's</reg></choice> mine owne.</l>
<l>Which is most faint: now 'tis true</l>
<l>I must be heere confinde by you,</l>
<l>Or sent to <placeName>Naples</placeName>, Let me not</l>
<l>Since I haue my Dukedome got,</l>
<l>And pardon'd the deceiuer, dwell</l>
<l>In this bare Island, by your Spell,</l>
<l>But release me from my bands</l>
<l>With the helpe of your good hands:</l>
<l>Gentle breath of yours, my Sailes</l>
<l>Must fill, or else my proiect failes,</l>
<l>Which was to please: Now I want</l>
<l>Spirits to enforce: Art to inchant,</l>
<l>And my ending is despaire,</l>
<l>Vnlesse I be relieu'd by praier</l>
<l>Which pierces so, that it assaults</l>
<l>Mercy it selfe, and frees all faults.</l>
<l>As you from crimes would pardon'd be,</l>
<l>Let your Indulgence set me free.</l></lg> <stage type="exit">Exit.</stage></sp>
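Since the next unit of the class uses R, here is a minimal sketch of what the <choice> element in the example above makes possible: the same encoded line can be read either in its original or its regularized spelling. This sketch assumes plain pattern matching with base R rather than a real XML parser, which keeps it self-contained but would not be robust for a whole TEI file. The variable and function names are hypothetical.

```r
# One line from the Epilogue example, stored as a plain character string
tei_line <- "<l>And what strength I <choice><orig>haue's</orig><reg>have's</reg></choice> mine owne.</l>"

# Keep the regularized reading: replace each <choice> with the text inside <reg>
regularized <- gsub("<choice><orig>[^<]*</orig><reg>([^<]*)</reg></choice>", "\\1", tei_line)

# Keep the original reading: replace each <choice> with the text inside <orig>
original <- gsub("<choice><orig>([^<]*)</orig><reg>[^<]*</reg></choice>", "\\1", tei_line)

# Hypothetical helper: remove any remaining tags to leave plain text
strip_tags <- function(x) gsub("<[^>]+>", "", x)

plain_regularized <- strip_tags(regularized)
plain_original <- strip_tags(original)

print(plain_regularized)
print(plain_original)
```

Only the word wrapped in <choice> changes between the two readings; spellings that were not encoded with <choice> (such as "owne") stay as they appear in the source.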


Whenever you want to tell TEI the alignment of the text, use the attribute style (CSS).

Example:

<p style="text-align:center">Some text here. </p>
<p style="text-align:right; font-style:italic">More text here.</p>


Identification

<rs type="person-group" style="font-style:italic">White men</rs>
<rs type="person">Our fearless TA</rs>
<hi style="font-style:italic">test</hi> <!-- highlight -->
...
<persName xml:id="anchor1" corresp="#note1">Alanna</persName>
...
<fw type="signature">B1</fw>
<fw type="catchword">More</fw>
<pb />
<fw type="pagenumber">2</fw>
...
<choice><orig>Liberty</orig><reg>liberty</reg></choice>
...
<back>
<div type="editorial">
<note xml:id="note1" target="#anchor1"><p>Alanna Price is a PhD student in English at Northeastern</p></note>
</div>
</back>



We aren’t restricted to early texts anymore. We can use social media.

NOTE: The following section was not created by me. It was provided to me by Professor Connell in my ENGL2150 class at Northeastern University.

# Introduction to R by Laura Johnson, Sarah Connell, and Ash Clark

R is a programming language that can be used for a wide range of both textual and statistical analyses. This introduction covers some key concepts and processes for working with R in RStudio.

RStudio is an “integrated development environment” for R. Similar to how word processing software provides an editor for writing and editing text documents, RStudio is an editor for writing and running R code. Essentially, R is the programming language and RStudio is an interface that you can use to work with R.

## RStudio environment

The RStudio workspace has several sections, each with a different purpose. On the top left, where you’re reading this now, is the Source pane, the place where you can write R scripts (a collection of code). You can open different kinds of files in the Source pane, including R Notebooks, R Markdown files (like this one), and even files using other programming languages. The Source pane is also the space where you can view any tabular data that you generate.

The bottom left section is the Console, where the code from the Source pane is run by R. To follow the code as it runs in the console, check for the > (greater-than sign) character. This prompt indicates that R has finished running the previous code; it does not by itself mean the code ran without errors. You can tell that a section of code is finished being run when there is a new line with > and the cursor, indicating that the console is “waiting” for new code.

The top right section will default to show the Environment pane, a section that stores any of the data objects that you have defined in your R session. Data objects—like data frames and variables—will be visible in this pane as you are running code.

The section on the bottom right will default to show the Files pane. In this section you can view, navigate, and access all of the files in your working directory. This section also has four other tabs for Plots, Packages, Help, and Viewer. In these tabs, you can view plots that you generate, install and load packages, and access help and documentation about R.

## R Markdown and running code

This is an R Markdown file, a format that contains both text (what you are reading now) that can be formatted for display on the web or as a PDF file, and snippets of code that you can run right from the file.

Before running any code, you should do a quick check on your preferences for how RStudio will handle R Markdown files. Go to the Tools menu above, select “Global Options”, then select “R Markdown Preferences,” and make sure that “Show output inline for all R Markdown documents” is not selected.

To run a single line of code from an R Markdown file, put your cursor anywhere in that line of code and then hit command-return or control-return. If you want to run all of the code in a code snippet, you can hit the green triangle button on the right. If you want to run a particular section of code, select the section you want to run and hit command-return or control-return.

Much of the code you’ll need will run almost instantly, but some things will take a few seconds or minutes, or even longer. You can tell code is still running if you see a red stop sign in the top-right corner of the console—if you’d like to stop a process, you can hit this stop sign. You will know that the code has been run or successfully stopped when you see a new > prompt in the bottom of the console.

If you don’t see the stop sign but want to cancel a process, you can also hit control-C.

If you are running code in a block line by line, your cursor will automatically move down to the next line, so you can move through the block by repeatedly hitting command-return or control-return.

Try this now with some simple calculations. The answers will appear in the console below.


4+10
615*2
81^2
10/2


Alternatively, you can also run code directly in the Console by typing or pasting it in and hitting return. You will get the same results, but if you want to save code that you have written, it is better to keep it in the R Markdown file, since edits there will be saved. On the other hand, if you prefer to run some code but not make changes to your file, you can just run that in the console.

Try this out by running a simple math operation in the console below.

## Projects

Projects are a way to organize your work in RStudio. If you opened the “WordVectors” project file first, then you should already be working in the “WordVectors” project space—and, as long as you have this project open, your files should be where you expect them to be.

To confirm that you have the correct project open, check the top-right corner of the RStudio screen. Going forward, when you open RStudio, it may also open this project. If it does not, however, you can open the project by going to File then Open Project... in the menu bar at the top of RStudio, or by clicking on the project file. Always check at the beginning of a session to make sure you have the project open; if you don’t, it will likely cause errors.

## Working directory and file structure

When you open RStudio to begin a new session, the first thing that you want to do is check your working directory. A working directory is, essentially, the starting location for the set of files that you are working in. Think of your file structure as a tree where the levels of nested folders are the places where branches split. If you were to climb a tree, your position on a particular branch would be your current working directory or location within the entire directory. The direction of the folder structure, or branch, is the file path.

You should check your working directory because if the working directory is not where you are expecting, then not much else in your files will work. Any time you see an error message that says a file does not exist in the current working directory, that’s a good sign your working directory isn’t where you think it is.

## Checking the working directory

If you opened this file from within the “WordVectors” R project, then your working directory should be in the right place. Just to be sure, the code below will help you confirm this.

The first line of code will allow you to check your working directory and the second will allow you to set your working directory, if you need to change it. If you run the first line of code and the results in the console show that your working directory is the “WordVectors” folder, then you don’t need to run the second line of code.

If you ever do need to change your working directory, use the setwd() function. As you can see, we’ve provided you with some template text that you can replace with a file path specific to your computer.

Navigating file paths can be a bit confusing, but fortunately there is a trick you can use. If you try deleting the text inside of the quotation marks below and then hitting tab, you should get a pop-up with a view of the folder system you’re used to navigating; you can use this to fill in file paths whenever you need to.

# How to check your working directory (this is also an example of how you add a comment to your code—by typing "#")
getwd()

# How to set your working directory (do not run this unless you actually want to change your working directory!)
setwd("path/to/your/directory")
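If you would like R to do the check for you, the idea above can be sketched roughly as follows. This is only an illustration: it assumes the project folder is literally named "WordVectors" (as in this tutorial), and the "data" folder and "example.txt" file are hypothetical names used to show how a file path is built.

```r
# Ask R where it is currently looking for files
current_dir <- getwd()
print(current_dir)

# basename() keeps only the last folder name in the path
folder_name <- basename(current_dir)

# Compare against the project folder name used in this tutorial
if (folder_name == "WordVectors") {
  message("Working directory looks right.")
} else {
  message("Working directory is ", current_dir, "; open the project or use setwd().")
}

# file.path() joins pieces of a path with the correct separator.
# "data" and "example.txt" are hypothetical names, not files from this project.
example_path <- file.path(current_dir, "data", "example.txt")

# file.exists() returns TRUE or FALSE without raising an error
print(file.exists(example_path))
```

Checking with file.exists() before reading a file is a gentle way to find out that the working directory is wrong, rather than waiting for an error message.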



## Functions and variables

While we covered how to run code above, this section will give you a chance to practice editing code directly while learning about a core concept in R: functions. Functions are code that you use to perform a specific task.

To demonstrate this, let’s use some basic mathematical functions. Run this code line by line and try changing the numbers to practice editing code.

# The sum function calculates the sum of a set of numbers in the parentheses. Make sure to separate by commas.
sum(c(90,87,65,86,40,88,90))

# The mean function calculates the mean (average) in a dataset.
mean(c(90,87,65,86,40,88,90))



Functions require a specific syntax to be run successfully. When you add a function to a block of code, RStudio will automatically add the opening and closing parentheses; these contain the arguments, or the things the function operates on. In addition, some functions require a pair of quotation marks (“”) within the parentheses.
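To make the idea of arguments a little more concrete, here is a small sketch using only functions that come with R. The numbers are the same ones used above; the variable names are hypothetical, and storing values with <- is explained in the paragraphs that follow.

```r
# The same numbers used in the sum() and mean() examples
scores <- c(90, 87, 65, 86, 40, 88, 90)

total <- sum(scores)
average <- mean(scores)

# A named argument: digits tells round() how many decimal places to keep
third <- round(10 / 3, digits = 2)

# A missing value (NA) makes mean() return NA unless na.rm = TRUE is set
scores_with_gap <- c(scores, NA)
average_with_gap <- mean(scores_with_gap)
average_ignoring_gap <- mean(scores_with_gap, na.rm = TRUE)

# Text arguments go inside quotation marks
shout <- toupper("the tempest")

print(total)
print(average)
print(third)
print(average_with_gap)
print(average_ignoring_gap)
print(shout)
```

Naming an argument (as in digits = 2 or na.rm = TRUE) is optional in many cases, but it tends to make code easier to read later.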

R has a wide array of functions that each perform specific tasks. To understand how functions work, it is important to introduce another key concept: variables.

Variables are things you define to store data for use in later processing. Whatever you choose to define as your variable, R will treat it as a data object and store it in working memory, designated by the label you assign it for later processing. In the code below, the <- assigns the data on its right to the variable on the left. In this case, we are storing a simple sentence in the variable demo_text using the paste() function.

demo_text <- paste("It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.")



Once you run this code, you will see that there is a new variable in your environment pane (the top right section). Once you have defined a variable, it is easy to use it again.

Now that we have defined the variable demo_text, we can use a simple function to manipulate it. For a very basic example, we can use the print() function to print the saved text into the console:


print(demo_text)
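Once text is stored in a variable, other functions can work on it too. The sketch below is one possible way to explore the sentence with base R; splitting on single spaces is a simplifying assumption (punctuation stays attached to words), and the new variable names are hypothetical.

```r
# Define the variable again so this block can be run on its own
demo_text <- "It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife."

# nchar() counts the characters in the text
n_characters <- nchar(demo_text)

# toupper() returns an upper-case copy; demo_text itself is unchanged
shouted <- toupper(demo_text)

# strsplit() returns a list, so [[1]] pulls out the vector of words
words <- strsplit(demo_text, " ")[[1]]
n_words <- length(words)

# head() shows the first few items of a vector
first_three <- head(words, 3)

print(n_characters)
print(shouted)
print(n_words)
print(first_three)
```

Note that none of these functions change demo_text; to keep a result, you have to assign it to a variable with <-.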



In R, the names of variables are arbitrary. You can choose whatever makes sense to you (with a few limitations; for example, you shouldn’t use the names of functions for your variables). It helps to pick something specific that you will remember. Variables can be reassigned (or, “overwritten”) at any time.

Try this by going up to the code above where you created the “demo_text” variable and typing different text inside of the quotation marks, then printing it.

There are some rules for writing variable names in R: for example, they must be a combination of letters, digits, periods, and underscores, and they must start with a letter or a period. Variables that start with a period cannot have a digit as the second character. Also, you can’t use any of the words from R’s own syntax for variables. Apart from these rules, you technically can use any variable names you like. For instance, we might have had instead:

giant_amazonian_river_otter <- paste("It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.")



However, that variable would not be very helpful to us if we were trying to remember and use it later. Generally, you want to make your variable names clear and descriptive, and use consistent systems for things like marking word boundaries and deciding how much detail to include.
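If you are unsure whether a name follows the rules above, one way to check is with make.names(), a base R function that rewrites invalid names into valid ones. A name that comes back unchanged is already valid. The list of candidate names below is made up for illustration.

```r
# Hypothetical candidate names: some valid, some not
candidate_names <- c("demo_text", "giant_amazonian_river_otter",
                     "2nd_act", ".2way", "if", "act.2")

# make.names() returns a syntactically valid version of each name
fixed_names <- make.names(candidate_names)

# A name is valid if make.names() did not need to change it
is_valid <- fixed_names == candidate_names

# Show the results side by side as a small table
name_check <- data.frame(name = candidate_names,
                         valid = is_valid,
                         suggestion = fixed_names)
print(name_check)
```

Here "2nd_act" fails because it starts with a digit, ".2way" because a digit follows the leading period, and "if" because it is one of R's reserved words.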

## Saving and quitting a session

After you are done with an RStudio session, you need to take two important steps. First, save any changes to your files you will want to access later. Next (especially when you are working on an RStudio Server), you need to quit the session. There are two options for this: 1) navigate to the “Session” tab of the menu and click “Quit Session” or 2) run the following chunk of code or enter the command in the Console. If you go with the first option, it will ask you whether you want to save your workspace image; you can say “no” to this.

q()



If you run this command instead, R will ask whether or not you want to save your workspace; you can respond ‘n’ (“no”) to this query.

On our RStudio Server instance, this step is very important because the session will continue in the browser until you end it. If there are multiple sessions ongoing, it can bog down the servers. So always remember to save and quit RStudio!

## Reminder

The next time that you want to start a new RStudio session, here is the order of basic housekeeping steps.

• Open the “WordVectors” project
• Check working directory with getwd()

## Key concepts and terminology

As a programming language, R has a set syntax and grammar. We won’t be covering all of these concepts directly, but we’re including a full list of terms that might come up in our class discussions.

• Console: the pane in RStudio that displays the code being run by R.
• Data Frame: data frames are the R version of tabular data, like CSVs or TSVs, that can contain numerical and textual data, and be created and edited by code.
• Environment: a coding environment is the context in which code is being run or executed, meaning all of the variables and working directory for a machine—whether it is local or on a server.
• Function: functions in R are a collection of code that performs a specific task.
• Pipe: a pipe is a small piece of code “%>%” that lets you chain a sequence of operations, passing the result of each step on to the next.
• Programming language: a programming language is a vocabulary and set of grammatical rules used to instruct computers to perform specific tasks.
• Project: a project in R is a file that is associated with a working directory, meaning that it preserves the layout and files within your R environment and can be saved and re-opened in a new session.
• R: a programming language and environment used for statistical computation, textual analysis, and graphics.
• R Markdown: a document format that contains both text and snippets of code that you can edit and run directly in the file.
• RStudio: RStudio is an integrated development environment for R, or an interface editor where you can access the console, terminal, environment, working directory, and source files.
• Script: a script in R is a collection of code that, usually, is in a text form and contains all of the code that you would enter in the command line.
• Stop Words: stop words are the most common words in a language, often prepositions, pronouns, and conjunctions. For example, words like: the, and, of, for, a, she, etc.
• Terminal: the direct interface with the console where you can write text-based commands, or do command-line coding.
• Tokens: in computation, “tokens” are single units of textual data, usually words, that are created through a process called tokenization. Tokens allow for textual and computational analyses of textual data.
• Variable: a variable is a character or group of characters (like x, y, z or words like “text” or “words”) that you define and that R treats as a data object, stored in working memory for later processing.
• Working Directory: the folder within a hierarchical file system that R treats as its starting location, or the set of files that you are working in.
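Several of these terms (tokens, stop words, and data frames) can be seen working together in a short sketch. Everything here uses base R; the sample sentence and the tiny stop word list are made up for illustration, and real projects usually rely on much longer, published stop word lists.

```r
# A made-up sample sentence, not a quotation from the play
sentence <- "The workers and the aristocrats argue on the deck of the ship"

# Tokenization: lower-case the text and split it into single words
tokens <- strsplit(tolower(sentence), " ")[[1]]

# A tiny hand-made stop word list (an assumption for this example)
stop_words <- c("the", "and", "of", "on", "a")

# Keep only the tokens that are not stop words
content_tokens <- tokens[!(tokens %in% stop_words)]

# Count how often each token appears and store the counts in a data frame
counts <- table(tokens)
word_counts <- data.frame(word = names(counts), n = as.vector(counts))

# Sort so that the most frequent word comes first
word_counts <- word_counts[order(-word_counts$n), ]

print(tokens)
print(content_tokens)
print(word_counts)
```

The most frequent token is a stop word, which is why stop words are often removed before counting: they tend to crowd out the words that carry the meaning.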

## References

Arnold, Taylor and Lauren Tilton. “Basic Text Processing in R.” The Programming Historian vol. 6, 2017, https://doi.org/10.46430/phen0061.

Dewar, Taryn. “R Basics with Tabular Data.” The Programming Historian vol. 5, 2016, https://doi.org/10.46430/phen0056.

## Credit and thanks

This tutorial was adapted from one developed as part of the Word Vectors for the Thoughtful Humanist series at Northeastern University. That walkthrough was informed by tutorials from The Programming Historian, and exercises created by Thanasis Kinias and Ryan Cordell for the “Humanities Data Analysis” course.