AI in the archives: Fondren Library explores new tech
From Grammarly and Quizlet to SparkNotes and Spotify, artificial intelligence is now a major feature of nearly every website, and the archives of Fondren Library are no exception. The use of AI has been a notoriously hot-button topic for the last few years, figuring in debates over artist exploitation and in the terms of the Writers Guild of America strike. In the Woodson Research Center, however, its role has been to make many of the center’s preservation and transcription processes easier and faster.
In the Woodson Research Center, AI has made possible a new digital collections website, which launched in May 2023. Norie Guthrie, the archivist and special collections librarian, said that Woodson staff previously used the “D space,” now the Rice Research Repository. Under that workflow, they would send audio files to an off-campus service for transcription, then edit the returned PDF and upload it. With AI, this is expedited: Transcribing hours of audio now takes only 10 minutes, when previously the process had taken days.
“We can push a button and then it creates a transcript. It is very helpful for audio because audio transcription takes time and this makes it so that the process can be a lot faster,” Guthrie said. “It’s a lot more user-friendly, and it’s a lot easier on us so we can be more efficient with our work.”
Now, the most laborious part of the audio transcription process is cleanup and correction of the completed transcription — for example, correcting the spelling of Wiess from “ei” to “ie” every time it is mentioned.
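A cleanup pass of this kind can be partly scripted. The sketch below is illustrative only and is not the Woodson's actual tooling: the `CORRECTIONS` table and the `correct_transcript` function are hypothetical names, and the script assumes that every whole-word "Weiss" in a transcript is a misspelling of Wiess.

```python
import re

# Hypothetical correction table: misspelling -> preferred spelling.
CORRECTIONS = {"Weiss": "Wiess"}


def correct_transcript(text, corrections=CORRECTIONS):
    """Replace whole-word misspellings and return (new_text, fix_count)."""
    total = 0
    for wrong, right in corrections.items():
        # \b limits the match to whole words, so "Weissman" is left alone.
        text, n = re.subn(rf"\b{re.escape(wrong)}\b", right, text)
        total += n
    return text, total
```

A person would still need to review the result, since a speaker who really is named Weiss would be changed along with the college.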
AI has also been utilized by students in Fondren-associated research projects. Zoe Katz, a Will Rice College senior, applied to one such project through Fondren Fellows, a yearlong library program that provides research opportunities for Rice undergraduate and graduate students.
“I applied because I kind of want to go to grad school and I was interested in seeing what research options they had. They had one about topic modeling, which is a subfield of computational linguistics,” Katz said. “I’m a linguistics and computer science major, so I was really interested in how both of those two topics would overlap.”
Topic modeling is a type of statistical modeling that uses unsupervised machine learning to identify clusters or groups of similar words within a body of text. This text mining method uses semantic structures in text to understand unstructured data without predefined tags or training data.
Katz chose to focus her project on using OCR, or optical character recognition, to convert old issues of the Thresher from PDFs into text-searchable versions. OCR recognizes text in scanned documents to convert a physical document into an accessible electronic text version. Due to her experience writing for the Thresher, Katz understood the need for an easily searchable database of past publications, both for student journalistic research and for its historical value. Because the project was only finished early last semester, Fondren Fellows is still in the process of publishing the data, but her work will soon be available for public access.
Steven Loyd, the processing assistant at Woodson, uses ChatGPT to write Python scripts to perform basic tasks. He combines his grasp of algorithmic thinking, and his sense of where a program would be useful, with AI’s code-writing abilities. The code he writes with ChatGPT often automates online chores, such as counting, organizing and renaming files and folders based on their contents and other variables, thousands or tens of thousands of files at a time.
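For readers curious what such a script might look like, here is a minimal sketch using only Python's standard library. It is not Loyd's code: the function names, the numbered naming scheme and the `prefix` parameter are all assumptions made for illustration. It counts files by extension, then builds a rename plan that a person can review before anything is changed.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(root: Path) -> Counter:
    """Count files under root, grouped by lowercase file extension."""
    return Counter(
        p.suffix.lower() or "(none)"
        for p in root.rglob("*")
        if p.is_file()
    )


def plan_renames(root: Path, prefix: str) -> list[tuple[Path, Path]]:
    """Pair each file with a numbered name such as prefix_0001.wav.

    Nothing is renamed here, so the plan can be checked by a person first.
    """
    files = sorted(p for p in root.rglob("*") if p.is_file())
    return [
        (old, old.with_name(f"{prefix}_{i:04d}{old.suffix.lower()}"))
        for i, old in enumerate(files, start=1)
    ]


def apply_renames(plan: list[tuple[Path, Path]]) -> None:
    """Carry out a reviewed plan, refusing to overwrite existing files."""
    for old, new in plan:
        if new == old:
            continue  # already has the target name
        if new.exists():
            raise FileExistsError(new)
        old.rename(new)
```

Separating the plan from the rename step reflects the point Loyd makes about human input: the script does the repetitive work, and a person decides whether the result is right.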
“These are tasks that humans could do with enough time, but there is no shortage of things at the Woodson that only humans can do, so having useful code on hand saves time for more challenging, provocative work,” Loyd wrote in an email to the Thresher. “Ultimately, for our purposes at the archive, ChatGPT is a very helpful tool that nonetheless requires significant human input to function usefully … As far as the future goes, I see AI as a technological advancement akin to the internet, new storage formats, etc., that help archivists process materials more efficiently and reliably.”
The Fondren Library Artificial Intelligence Task Force was created to discuss possible limits on AI use where academic judgment is involved, as well as how AI can be better utilized in research and professional life. Its meetings are open to students, who can contribute to the discussions as Rice continues to grapple with the ethical questions surrounding the use of artificial intelligence. These discussions will hopefully help find ways to use AI to support rather than supplant human experience and learning, Loyd said.
“Archives, I think, are ultimately humanistic, requiring [an] informed, passionate understanding of given materials and the cultural context surrounding them,” Loyd wrote. “I don’t foresee archival AI rising above its station as a timesaver to anything with actual responsibility.”