Tuning in to Soundwriting

Sound and Access: Attuned to Disability in the Writing Classroom

by Dev K. Bose, Sean Zdenek, Prairie Markussen, Heidi Wallace, & Angelia Giannone

4. Captions, Engineered

Captioning can be used in a variety of instructional contexts. In this section, we analyze data from three contexts (digital storytelling, teaching practicum, and first-year writing). Rather than going into a deep analysis of each class, however, we briefly explore how instructors integrated captioning lessons into their curricula, how students responded to captioning, and some implications for sound studies.

Transforming Audio, Image, and Caption in a Digital Storytelling Course

Digital Storytelling & Culture is a junior-level course housed within the School of Information at the University of Arizona. The purpose of the course is to explore the theory and practice of creating stories digitally. The major projects include a digital technology story, photo story, interactive narrative (in the style of choose your own adventure), audio mashup, and documentary. The documentary assignment is the course capstone and synthesizes audio and visual concepts that are covered throughout the course. The course emphasizes image description alongside closed captioning. In short, Digital Storytelling & Culture aims to center access as its core pedagogical objective while featuring reflective writing about digital artifacts.

In this audio clip, Angelia (coauthor as well as the course instructor) explains how captioning was included in the audio mashup assignment.

Teaching Audio Mashups through Captioning: Transcript

Dev: What do you do with the audio mashup portion?

Angelia: So for that, I'm having them create a song. And in this one I did give them some more stringent requirements. Like, I actually gave them a length requirement. But I think that that's really the only thing I have, other than that they have to connect their mashup to some core experience—

Dev: Oh, okay.

Angelia: —that they will articulate in one to two paragraphs.

Dev: Okay.

Angelia: And this is kind of borrowed from my experience with game design because the way I teach game design is centering games around experiences—

Dev: Okay.

Angelia: —which is pretty common in the literature. So for this mashup, I actually borrowed the assignment from an assignment that I had in undergrad. But this one's a little bit more structured. So for that, I basically just asked them to come up with a core experience, like think about something that's specific, and then think about how they can use audio tracks—whether it's songs, or just people talking, or whatever it might be, or just instrumental music—to convey that feeling, or idea, or whatever the experience is.

Dev: Oh, so like if I was going to write a playlist of some significant event in my life or something, or create a playlist of songs?

Angelia: It could be something like that. Yeah. So when I did this project, I think my partner and I, I want to say that we did—we made a mashup. And the core experience for that was kind of like the stages of grief, I believe. So it was kind of like a song that takes you through those five typical stages of grief or whatever.

Dev: Oh, okay.

[audio quickly fades out]

Sound studies has thematic potential for curricular design across an assignment arc. It is also cross-disciplinary. For example, one might teach a curriculum based around a pervasive social theme (such as food politics). Students would then create multimodal recordings about this topic and investigate the topic from their own disciplinary perspectives by integrating scholarly research from their fields. In the track above, which was recorded after she taught her students about captioning, Angelia notes that she borrows this assignment from her academic experience studying game design, which she explains as being most effective when taught through a user-centered perspective. For the audio mashup, students create a sonic design inspired by a core experience that has affected their lives. They use any kind of prerecorded audio clips, including (but not limited to) instrumental music, spoken words, or dialogue to shape and edit the mashup. Students' topics were broad, from a historical analysis of music to a mashup of motivational speeches. The instructor stressed that students should study the conventions of the mashup, especially when scaffolding towards other major projects in the course. In the interest of access, throughout the unit, the instructor required fully composed descriptions, or annotations, of any included sounds, and students were presented with the option of either text-only or text-audio annotations (although the latter option was not selected by any of the students in this particular class). Annotations served as alternatives for the overall assignment for students who could not or did not want to pursue the audio mashup option. Students articulated the purpose of their mashups through brief reflective writing.

Our approach to sound studies is multimodal. Instead of placing image, sound, and text in different categories of accommodation, we considered the extent to which access to both sound and image is grounded in rhetorical and creative acts of written inscription. To this end, we have explored how image description for the purpose of nonvisual access can provide scaffolding for other forms of description, such as the captioning of sounds. In the following track, Angelia describes how she taught image description.

Framing through Image Description: Transcript

Angelia: So the first time I did the [image] captioning exercise in class, I didn't frame it too super well, [Dev and Angelia chuckle] but I had basically kind of given them just a little bit of background about it. Like I kind of connected it to feminist theory—I connected captioning to feminist theory and disability studies a little bit.

Dev: Oh!

Angelia: And then I had them caption one of the images from their photo story project just so that they would have an image that's on hand, assumedly, for them.

So I asked them to caption something from their project and then I looked at some New York Times articles that were just there on the front page. And so the one that we focused on—what was that? The title of the article was "IBM Now has More Employees in India Than in the U.S." (Goel, 2017). So I gave them the title of the article, and then I just showed them the first picture, which is basically a bunch of people sitting at, like, this modern-looking table in an IBM office. And I had them caption just kind of generally like, what do you think this picture is about?

Dev: Oh, interesting.

Angelia: And then we kind of talked about, afterwards, what students included in the captions, how are their captions sort of functioning rhetorically or saying something about the image, what did they think was important in it, and what did they leave out.

The first time Angelia taught captioning—defined broadly—was through image description. The course considered image captioning much like alt text—in which images are described for screen-reader users—under the presumption that the practice of image description, to the extent that it is rhetorical, would prepare students for video captioning. Students discussed the rhetorical functions of image captions in terms of what was included and what was left out of their captions. The image captions exemplified creativity by articulating the stories behind the image, although in some cases students considered the stories behind an image where the description was overly literal in a way that missed an image's context or subtext. They also considered the idea that creating image captions functioned as a decision-making process in which writers must work within constraints and make decisions based upon the context and function of the images. The context provides "clues" to support decision-making.

The second time Angelia taught captioning was while screening a documentary film that had already been captioned by a previous set of students. During scenes underscored by a light piano composition, the students watching the screening debated whether or not to include a description of the music in a caption track. Next, students were shown a second documentary without captions and given the task of captioning it as part of an in-class exercise. Students began to talk amongst themselves about whether it would be appropriate to describe facial expressions in the captions. In other words, what emerged was a deeper discussion of how to create an accessible experience for a wide range of viewers. While facial expressions are not usually captioned in a film context, they may be crucial to interpreting the manner in which words are spoken. Manner of speaking (how words are pronounced) is an important aspect of nonspeech captioning.

In the following track, Angelia reflects upon the next part of the exercise.

Comparison of Different Instructional Contexts: Transcript

Angelia: The Pizza Passion documentary, I think that there was kind of a lot going on in that video, and they didn't really have enough context or interest in what was going on.

Dev: Yeah.

Angelia: That one was a little trickier, and I think that they were kind of just not really interested in that point.

Dev: Yeah, yeah. No, that makes sense. Yeah.

Angelia: Yeah.

Dev: I mean, the second one, it seems like it was taken out of context, and I think what you're saying too is that there was just a lot more happening in there.

Angelia: Yeah.

Dev: Yeah.

Angelia: Yeah. Whereas the student project was a lot more simple and kind of easier for them in terms of a barrier of entry into captioning, I think.

Dev: Oh, I see. Okay.

Angelia: Yeah.

Here, Angelia explains that students had more difficulty with the second part of the lesson, where they were asked to caption a content-rich video (without additional context). Angelia indicates that there may have been additional barriers for students in the content-rich video-captioning activity, which may have led to a loss of interest.

Considering the Captioner's Rhetorical Agency in a Graduate-Level Practicum

The practicum course in the writing program at the University of Arizona is designed to prepare first-year graduate teaching associates (GTAs). The captioning exercise took place during a unit on lesson planning and activity development. As part of this unit, students were asked to participate in an activity centered around accessibility. This activity was presented to the GTAs as a potential lesson plan and a means to acknowledge and practice accessibility within the college writing classroom. The activity is based on Chad Iwertz and Ruth Osorio's (2016) research on captioning pedagogy.

Figure 1. Image description: A woman wearing an apron is reading on her cell phone. Behind her are shelves lined with various convenience store items such as mouthwash bottles, cough syrup bottles, and batteries. A cash register sits on the table in the left foreground of the image. The image is in color. A closed caption centered at the bottom of the image reads "silence" in brackets. (Cord et al., 2017)

Dev (coauthor as well as the course instructor) asked his practicum students to analyze two scenes from an episode of the television show Master of None (Cord et al., 2017) in which audio was either (a) completely removed for the purposes of the lesson by the instructor or (b) completely removed in the show's postproduction as a directorial decision. The second scene (access Figure 1) in particular recalls H-Dirksen L. Bauman and Joseph J. Murray's (2014) Deaf gain in that the hierarchy between hearing and deafness is inverted. As coauthor Sean noted in Reading Sounds: Closed-Captioned Media and Popular Culture,

silence sometimes needs to be closed captioned. Captioners not only inscribe sounds in writing but must also account for our assumptions about the nature, production, and reception of sounds…. Sustained sounds, including sounds that are captioned as continuous or repeating … may need to be identified in the captions as stopped or terminated if it's not clear from the visual context. (Zdenek, 2015, p. 183)

When a phone has been ringing and then stops ringing, the silence needs to be captioned if it's not clear from the visual context that the ringing has stopped.

In the first scene from the episode, vocal conversation between hearing individuals is a key focus: Two doormen gossip, followed by a doorman and a tenant, the latter engaging in racially motivated language about another tenant. The scene is rich not only in terms of conversation, but in terms of movement, as the camera follows each character's path of motion from an entry desk down through a hallway. In the following recording, students respond in groups after being asked to caption the scene while watching it a few times with the sound turned off. [4] Most significantly, a student pointed out the "exasperated tone" of one of the characters and directly transcribed that into her captioning.

[4] Please note that the first few seconds are difficult to hear due to conversation in a crowded room; essentially, students brainstormed how to approach the assignment and then discussed their own captions for the scene.

Brainstorming: Transcript

Speaker: [crosstalk] Some of the things that I thought would be important to include would be, like, the sound of the taxi [crosstalk] and the traffic. And then I would probably include in the caption the exasperated tone that the protagonist has when his friend tells him there's a twist. [crosstalk] So you can understand— [audio quickly fades out]

In this audio clip, one student discusses how and whether to caption the doorman's exasperation with the racist tenant. Another student responds, discussing how the tone of the character speaking may not necessarily be reflected in the caption and how the use of quotation marks can help capture feelings that are challenging to communicate in writing. This experience is comparable to coauthor Angelia's, whose students had pondered describing facial expressions and tone. In class, students continued to engage with the complexity of the task, which is important in terms of helping them consider captioning as a function of rhetorical decision-making. As coauthors, we acknowledge that captioning should follow well-established guidelines and conventions (access, for example, Described and Captioned Media Program, 2021). At the same time, we would argue that captioning guidelines have little to say about the rhetorically situated nature of captioning that our students routinely grappled with.

In the next audio clip, a student explains the use of punctuation as further evidence of rhetorical decision-making.

Purposeful Decisions: Transcript

Speaker: Also, I just thought it was interesting the way they used quotation marks in the captions because I didn't use quotation marks, and I didn't really hear that in the dialogue. So it's kind of an interesting way, I think, for the people that actually captioned it to, like, distance themselves from that—the kind of racist conversation.

Dev: They put the word "Indians" in quotation marks.

Speaker: Yeah. Yeah, they put all of the specifically racial terms in quotation marks, so I think it's a way for—yeah, through that, I think I saw, like, an actual human captioning it and being uncomfortable with the language—

Dev: Right, right.

Speaker: —whereas that's not necessarily read in the actual scene.

Dev: What do you think in terms of— [audio quickly fades out]

Quotation marks in the captions, the student explains, are used to distance characters from remarks that carry clear racial overtones. When the term "Indian" is captioned with quotation marks, the student questions whether the captioner may have felt uncomfortable with the language being used. [5] Even though we are reluctant to speculate with this student about the captioner's motives, we highlight here how captioning, unlike speech, makes punctuation hypervisible, as well as the student's recognition of the role that punctuation can play for caption readers (for an extended discussion of em-dashes and ellipses in captioning, access Zdenek, 2015, pp. 152–162).

[5] Note that at the time of the classroom discussion, we were unaware of the exact language from the script (whether quotation marks were included in the original script, for example), and we acknowledge that captioners may be directly copying from the script, as per Sean's interviews with one captioner who stated that 60% of captions are script-replicated (Zdenek, 2015).

Students come to realize that the act of captioning is powerfully rhetorical. The caption writer possesses a sense of agency (as we define it, action-based decision-making) when deciding how and what to caption. As Sean pointed out in Reading Sounds, "we can draw upon our situated sense of how punctuation functions … to make predictions about the future. Closed captions allow us to see what is hidden or veiled in speech—to see punctuation—and to do so ahead of the speaker's natural pauses" (Zdenek, 2015, p. 154). Our classroom example seems to suggest how quotation marks can inform viewers of an additional layer of clues pointing towards the irony of the racism taking place in the story. The class further analyzes agency during this captioning lesson, showing once again the importance of decision-making when it comes to producing the "right" kinds of captions. As one student notes:

Captioning and Agency: Transcript

Speaker: Yeah. Well, and it's, like, more agency on your part because you're literally interpreting what's being said. You're running it through rather than just listening to someone.

Dev: Yeah.

Speaker: It's kind of like you have more agency because you're filtering it and then imagining someone saying it, so it, like, physically has to filter through you rather than if you're just speaking and, like, get that all at once.

Dev: Oh, I see. So the listener, you're saying, has more agency.

Speaker: Uh—well, I think when you're reading, like reading the subtitles, it feels like you're more implicated in it, I suppose.

Dev: Yeah, yeah.

The student recognizes the captioner's agency because of the captioner's conscious decision to filter the character's dialogue through quotation marks. According to the student, reading captions makes the viewer feel more "implicated" such that a duty exists for captioners to produce captions that are ethically responsible. It would be interesting to consider whether deaf or hard-of-hearing students might share this insight and whether preference for captioned speech over nonspeech (or purposeful punctuation in captions) may reflect an able-bodied understanding of human perception. Later in the class session, the students returned to their own writing and teaching practices, reflecting on the same responsibilities in their own work.

After this discussion, the lesson transitioned to the second scene from an episode of Master of None (Cord et al., 2017) in which a deaf woman and hearing man communicate with sign language, hand gestures, facial expressions, and text messages in a funny, flirtatious dialogue. Misunderstanding through language barriers is key to this scene, with the man failing to understand what the woman is communicating through sign language; eventually, she resorts to using a note-taking app on her phone. It is important to note that the scene is silent (as indicated by a "[silence]" caption; access Figure 1 above). The signed and gestured communication between the pair occurs without any captioned or subtitled interpretation.

After viewing the clip in class, one student asked Dev whether he had turned off the captions during this scene.

No Sound in Postproduction: Transcript

Speaker 1: Did you turn the caption on when you were showing it?

Dev: That's a great question. No, so I didn't turn the caption on on that. So those were actually automatic. Those were actually automatic subtitles that—I guess not automatic. They were written—they were written in this part of the original script.

Speaker 1: Oh, okay.

Dev: Yeah.

Speaker 2: It was purposeful to have no sound at all.

Dev: Right. That was purposeful it had no sound. Right? And have it—

Speaker 1: I guess the emphasis for the first scene is, like, incomprehension because they can't communicate, so there's no caption and then neither of them understood each other. And then for the second one, there's a couple of captions so that it keeps the reader sort of immersed in their communication and comprehension.

Dev: Oh, yeah. Yeah. That's—yeah.

The student's question about whether captions were turned off during this scene is particularly important because it calls attention to the claim that even silences (e.g., manufactured silences) sometimes need to be captioned. Therefore, the absence of sound, and not just the absence of captioning, becomes central to this powerful scene, which we take to be a purposeful effort on the part of the show's producers to immerse viewers in a form of communication (sign language) that is more purely visual. Viewers who don't understand the woman when she signs "I'm Deaf" and "children," or feel frustrated when the man's lip movements are not captioned or accompanied by speech sounds, may come to identify with the couple's own struggles to communicate across a language barrier. Another student noticed that, because the scene is silent, viewers become immersed in the communication between the characters.

Rhetorical agency is an integral part of composition pedagogy courses, in which instructors are often students who are learning about teaching theories while simultaneously putting these theories into practice in the courses they teach. Given the nature of this teaching practicum, students were consciously thinking of their roles as instructors of first-year writing and were asked to continuously reflect on how activities learned during practicum might be applied within their own classrooms. In this audio clip, Dev asked the students how they might adapt this activity in their own first-year writing classes.

Adapting Captioning Towards Other Lessons: Transcript

Speaker: Maybe it can be done in smaller groups with different video clips of different shows—

Dev: Oh, okay.

Speaker: —in theory, to help emphasize that, like, the genre of captioning can be used in many different settings or contexts.

Dev: In many different contexts or settings. Right? So maybe you could have— [audio quickly fades out]

As in the previous clip, we find evidence to support the idea that teaching captioning as a pedagogical tool in the graduate composition practicum can lead to instructors fostering their own undergraduate students’ rhetorical agency. The instructor in this clip discusses the possibility of exploring with students how captioning may be genre-dependent. Video clips from different movie or television genres could be assigned to different groups to compel students to reflect on the influence of genre. Considering that the textbooks and assignments used by several writing instructors at her university are genre-based, this student strategically communicates one application of this lesson in terms of the curriculum that she is responsible for teaching. More importantly, the student acknowledges captions as objects of genre analysis and the closed-captioning exercise as a useful first-year writing classroom activity. In short, this graduate student instructor positions access as a central component of her pedagogy.

Captioning and Prior Instructional Experience: Transcript

Speaker: I just know—I mean, I was a Spanish major, so we use this all the time, kind of like a way of testing ourselves. So we would watch a clip and we'd try to understand what's going on. But then my professor would play it again with the captioning so that we could kind of check, like, were we right or—

Dev: Oh, cool.

Speaker: Oh yeah. So it was kind of like a review slash assessment tool in my degree a lot. Yeah, all the time.

Dev: Absolutely. Now so when you did that— [audio quickly fades out]

In this recording, another student recalls a prior classroom experience in which captioning served as a tool for review and assessment. She discusses her experience as a Spanish major, where her instructor would play clips of scenes with and without captions and, between playing scenes, ask students to write dialogue to verify whether their transcriptions were correct. This example reminds us that captioning can serve multiple learning styles. In this instance, the student found it to be particularly useful in a language course, demonstrating a connection between closed captioning and auditory communication. A course that is universally designed would draw on multiple modes of communication in order to meet the needs of a diverse array of learners.

Reading Film Closely in a First-Year Writing Course

Foundations Writing is a first-year writing course at the University of Arizona. In this course, students read and analyze written, aural, and visual texts in order to develop close-reading skills and rhetorical awareness. During a film analysis unit, Heidi (coauthor as well as the course instructor) teaches students how visual and sonic film techniques convey specific messages to the audience. To practice closely "reading" a scene from a film, Heidi directs students to caption dialogue and sounds. The class then discusses how their choices in the captioning exercise are rhetorical: what sounds/dialogue do the students emphasize through their captioning and why?

In the following two audio clips, Heidi discusses how she teaches students the rhetoric of film through cinematic techniques. Students analyze one or two cinematic techniques as those techniques pertain to a film and consider how a director communicates visually through those specific techniques. Although the film we describe in this chapter is not silent, the featured scene contains minimal dialogue and voiceover narration, so viewers mostly rely on visual cues to understand the subtleties of the plot. This lesson effectively became an exercise in deep listening.

Cinematic Technique in First-Year Writing: Transcript

Dev: So, what was the context of the lesson? What assignment are you teaching or project are you having them do, and, like, how did it fit in?

Heidi: Okay, so it actually worked out really well for this unit because I'm doing film analysis.

Dev: Oh, cool.

Heidi: For the film analysis assignment, I'm giving them two silent films—

Dev: That's right.

Heidi: —and they have to choose one, and it's split up into two different parts. Part one is analyze one or two cinematic techniques that we've gone over in class; how does that work in the silent film? Part two: choose any movie that you like at all—any movie that you've seen, is your favorite movie or whatever—and find those same cinematic techniques in that movie.

Dev: The same cinematic techniques as used in the silent film.

Heidi: Yeah. Yeah. Let's say montage—we were talking about montage today. One silent film is full of montages. If you find a montage in one of your favorite movies, compare and contrast how the montage works in each movie. So that's the basic assignment for the essay that we're working on.

Close Reading in First-Year Writing: Transcript

Heidi: [audio fades in midstatement] —if you had some sort of hearing or visual impairment, what kind of things would you need to know in order for the narrative to fit, right?

Dev: Oh, okay.

Heidi: And what it turned into—just for the purposes of the essay that we're doing—it turned into a very good close-reading exercise. So— [audio quickly fades out]

Figure 2. Image description: A door with a window blind is partially open in the center background. On either side of the door are windows with partially open curtains. Two tables are in the image: one in the background which appears out of focus with a lamp nearby, and the other holding a vase full of flowers. The image is in black and white. A closed caption centered on the bottom of the image reads "CRASH" in brackets. (Tourneur, 1947)

The scene that students captioned was a love scene from the 1947 film Out of the Past (Tourneur, 1947; access Figure 2). One of the underlying goals of the film analysis unit, and particularly its consideration of the film noir genre, is to help students read the subtlety of older films, which are often, unbeknownst to younger students, full of risque moments. Students are presented with a chance to analyze a film genre that likely falls outside of their immediate knowledge. Because these films were not originally captioned, they demonstrate an ideal teaching moment in which the legal context of captions (and other on-screen text) can be taught alongside the practical aspects of caption production. As Heidi elaborates,

Older Films and Cultural Analysis: Transcript

Heidi: I'm trying to expose them to a lot of black-and-white movies because I feel like they're not used to it. So I showed them a Mae West movie, who—she's really saucy, and, like, sexual, and funny.

Dev: Totally.

Heidi: So I was trying—and this was, I showed them a movie that the Hays Code, the censorship, wasn't implemented until after this movie was produced. So it's very risque, like, sexual innuendos all over the place. And I wanted them to just kind of realize that old movies can be funny and really kind of off-color too, you know?

Dev: Yes.

This audio clip suggests how captioning can be used to teach cultural analysis when tied to the goals of close reading and deep listening. Teaching students how to perform close readings of texts is a common goal in first-year writing classes (access, for example, the first-year writing course descriptions at the University of Arizona). Teaching students to closely read texts allows for thoughtful analysis of a text's message. Close reading, when combined with research strategies, allows students to draw connections to context. Reading and writing are intertwined, correlational practices, and should therefore be taught together (Foster, 1993). We believe that close reading should be taught alongside writing activities and that closed captioning is an effective teaching tool because it can engage writers' reflective self-awareness.

Closed captioning allows students to understand the relationships between reading and writing because it engages them in both forms of discourse at once. Through captioning, students engage with concepts that are critical to an understanding of film characters' motivations and the cultures that shape these motivations.

Captioning and Voiceovers: Transcript

Heidi: [audio fades in midconversation] And this one particular scene I chose [from Out of the Past] after just—I probably researched for 20 minutes, looking at different clips. In this particular one, there's a voiceover as the man's walking into the room. So he's not talking, his lips aren't moving; and then it's a love scene too. So they're flirting, and at one point they just ran out of a rainstorm, and he puts a towel over her head, and she's screaming, but again, you can't see her mouth.

Dev: Okay.

Heidi: So she's screaming playfully, so she's like, "Oh, don't rustle my hair too hard" or something. [Dev chuckles] And then the wind opens the door and the lamp breaks, and they're obviously making out or something, and then they get up, and that's it.

Dev: Interesting.

Heidi: So that's why I chose it: because of all the different moments where the mouth would be covered, and you wouldn't hear the—

Dev: Where the mouths of the actors would be covered.

Heidi: Exactly.

Dev: And of course there's a voiceover. So not having a voiceover might have, like, almost—possibly a detrimental effect, or at the very least, the listener wouldn't really be able to understand, like, what's going on.

Heidi: Exactly. Yeah. And it's a memory. So the voiceover is recounting something from the past.

Dev: Oh, okay.

Heidi: So that's also something that is important for the narrative. So there are a lot of, you know, sonic things going on that you wouldn't necessarily hear if the sound wasn't on, or if you couldn't hear it, that were very important. And they did talk about the visual stuff too.

Dev: Yeah.

Heidi: But I don't really— [audio quickly fades out midstatement]

Here Heidi explains that she chose this scene because of graphic visual content that communicates far more than the dialogue and voiceover narration. In the context of the Hays Code and the beginnings of censorship in film in the 1930s, viewers must fill in the blank that the characters are making love. Heidi discusses the visual and sonic cues with her students and asks them to create captions as part of a lesson on close reading.