A Lab for the Humanities
How University Libraries Powers Digital Scholarship
By Sarah Bender
For generations, scholars of early modern Europe have faced a stubborn problem: thousands of books and pamphlets were printed without names, masking the identities of those who produced them. These anonymous works, often politically or religiously controversial, left behind few clues about their origins, creating gaps in the historical record.
Now, a long-running partnership between the University Libraries and Dietrich College of Humanities and Social Sciences has produced a new digital resource to help close those gaps. The resource, known as the CDT, uses machine learning to analyze the minute details of 17th-century printing, offering scholars a powerful new way to identify who printed what.
It's the kind of tool that could only exist today. And it's exactly the kind of work happening in the University Libraries, where librarians, researchers and technologists are building new tools for humanities scholarship.
Building a New Kind of Evidence
The CDT grew out of the Print & Probability project, which uses computational tools and methods to detect new evidence in early printed books. Situated at the intersection of book history, computer vision and machine learning, the project seeks to discover letterpress printers whose identities have eluded scholars for several hundred years.
The project is led by Professor of English and Department Head Christopher Warren, along with Taylor Berg-Kirkpatrick, an associate professor in the Department of Computer Science and Engineering at the University of California San Diego. Lemley, Curator of Special Collections and Director of the Posner Center for Special Collections, joined the team when he started working with the Libraries in 2020.
The team has worked to identify anonymous printers of controversial books and pamphlets published during an era of censorship and political unrest, investigating pieces like John Milton's 1644 pamphlet on freedom of the press, "Areopagitica," and Thomas Hobbes' 1651 exploration of social contract theory, "Leviathan." They even solved the mystery of who published Shakespeare's Fourth Folio, which was highlighted in a Libraries exhibition.
But this kind of detective work wasn't easy, and the team had to build their own tool to power their investigation.
"We learned early on that there wasn't a good resource to help us match distinctive type in an anonymously printed work with type that we know came from a particular printer or print shop," Warren said. "So for three years, the NEH provided funding so that we could produce the CDT, along with other tools and methods to aid in this research."
The process, Warren explained, could be described as "building the haystack." The team used optical character recognition to sort and segment different types of characters, inspect them, and determine which were the most distinctive images. Then, once the data was compiled, scholars could begin searching for needles in the haystack: damaged printing type that matched a known printer.
"This is an entirely new kind of evidence," Warren added. "It's rare to encounter something at this scale and magnitude that can answer the questions we have about 17th century print culture."
From Research Project to Public Resource
But building the catalog was only part of the challenge. Making it publicly accessible, and ensuring it would remain available long-term, required infrastructure, planning and institutional support. That's where the Libraries played a significant role.
The Libraries hosts the CDT site, providing server space and technical coordination to ensure it is stable and accessible. The work also goes beyond hosting: through its digital publishing services, the Libraries helped formalize how the project will be maintained over time, including agreements around updates and preservation.
"It's a real credit to the Libraries that they are willing to take on the challenge of hosting a project like this," Warren said. "The resource couldn't exist in anything other than digital form, and the fact that anyone with an internet connection can access it is enormously valuable for researchers."
The CDT was designed primarily for researchers working in bibliography, book history, and early modern studies, a relatively small but deeply engaged scholarly community. But the team hopes the project's impact will extend far beyond its immediate users. In addition to making the catalog openly accessible online, the project team has also made its methods available, allowing other scholars to adapt them for different time periods, languages and regions. What began as a tool for studying 17th-century London printing could eventually help researchers investigate print histories across cultures and centuries.
"Much of what's preserved in Special Collections remains effectively invisible to machine learning tools because it has rarely been digitized or otherwise packaged for this kind of inquiry," Lemley explained. "We're at a moment when these tools can help scholars discover patterns and evidence in historical materials that were previously crushingly difficult to see."
Even though the CDT is now available, the Print & Probability project continues. In December 2025, the group received funding from Schmidt Sciences to further explore how AI technology can be applied to the research.
"Print & Probability is a great example of how machine learning and AI can transform how scholarship in the humanities is practiced," Lemley said. "It's also a great example of how Libraries and Special Collections can be valuable partners in that kind of research."