
EBSCO and Digital Humanities: Data and Text Mining

Posted June 23, 2015 in Insights & Research


As technology continues to play an ever more central role in both education and research, we are witnessing innovations in information retrieval and analysis that were previously thought impossible. Increasingly, humanities scholars are embracing the ideas and capabilities of “big data” and are using new data analysis tools for their own research. Large research projects that would have been out of reach 10 or 20 years ago are now feasible for those with the right resources and technological know-how.

A small but growing number of academics are applying technology to their work in the relatively new field of “digital humanities.” Broadly defined, the digital humanities movement seeks to expand intellectual discourse and discovery by introducing systematic research techniques into academic fields like literature and history, which previously had little use for scientific methodologies. These technologies are enabling the humanities to embrace quantitative analysis, and there is a great deal of data to work with, as well as exciting new ways to approach it.

For the most part, digital humanists are focusing on two main approaches: data mining and text mining.

Data Mining

Data mining, as the name suggests, is the extraction of large amounts of data from existing metadata fields, such as the author, title or place of publication. This usually involves obtaining information from bibliographic records (Dublin Core, MODS and MARC); however, other associated metadata, such as abstracts or indexes, can also be analyzed using data mining techniques.
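To make the idea concrete, here is a minimal sketch in Python of pulling a few metadata fields out of Dublin Core records. The sample records, the `records`/`record` wrapper elements and the `extract_fields` helper are all invented for illustration; they are assumptions and do not reflect how EBSCO stores or delivers its metadata. Only the `dc` namespace URI is the standard Dublin Core one.

```python
import xml.etree.ElementTree as ET

# Standard namespace URI for the Dublin Core element set.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample records, invented purely for illustration.
SAMPLE_XML = """
<records xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record>
    <dc:title>Example Monthly Magazine</dc:title>
    <dc:creator>Doe, Jane</dc:creator>
    <dc:date>1851</dc:date>
  </record>
  <record>
    <dc:title>Sample Weekly Gazette</dc:title>
    <dc:creator>Roe, Richard</dc:creator>
    <dc:date>1863</dc:date>
  </record>
</records>
"""


def extract_fields(xml_text):
    """Return one dict of title/creator/date per record in the XML text."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for record in root.findall("record"):
        rows.append({
            # findtext returns None when a field is missing from a record.
            "title": record.findtext(f"{{{DC}}}title"),
            "creator": record.findtext(f"{{{DC}}}creator"),
            "date": record.findtext(f"{{{DC}}}date"),
        })
    return rows


rows = extract_fields(SAMPLE_XML)
for row in rows:
    print(row["date"], row["title"], "/", row["creator"])
```

Once the fields are in a plain list of dictionaries like this, they can be counted, grouped or exported to whatever analysis tool a scholar prefers.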

At EBSCO, the enhanced metadata we have created for our Digital Archives products is ideal for data mining. It allows scholars to analyze large groups of manuscript and printed archival sources, in turn allowing a keen researcher to uncover broader patterns of intellectual thought and change over time.

For instance, the graphics below pull data from the American Antiquarian Society Periodicals Series to show the geographical concentration of magazine publications. The data was extracted from the bibliographic records of the American Antiquarian Society and plotted on an 1865 county map of the United States.


Figure 1: AAS Periodical Publications


Figure 2: Southern New England Periodicals
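A map like the ones above starts from a simple tally: how many periodicals were published in each place. The sketch below shows one way that counting step might look in Python. The titles and the `count_by_county` helper are hypothetical, invented for illustration; they are not taken from the American Antiquarian Society records, and the mapping step itself is not shown.

```python
from collections import Counter

# Hypothetical (title, state, county) rows, invented for illustration.
RECORDS = [
    ("Example Monthly Magazine", "Massachusetts", "Suffolk"),
    ("Sample Weekly Gazette", "Massachusetts", "Suffolk"),
    ("Illustrative Review", "Connecticut", "Hartford"),
    ("Placeholder Journal", "Massachusetts", "Worcester"),
]


def count_by_county(records):
    """Count how many titles fall in each (state, county) pair."""
    return Counter((state, county) for _title, state, county in records)


counts = count_by_county(RECORDS)
for (state, county), n in counts.most_common():
    print(f"{county} County, {state}: {n}")
```

The resulting counts are the kind of table a mapping tool would need in order to shade each county by its number of publications.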

Not only can digital humanists mine standard bibliographic information, but EBSCO’s internal metadata provides unique possibilities for research as well.

For example, the Civil War Primary Source Documents collection highlights how data can be mined from EBSCO’s enhanced metadata. This data is so robust that scholars can build a virtual 19th-century social network, revealing who was writing letters to whom (and how often) over time. Previously, assembling such a network would have taken years of painstaking labor, but it can now be accomplished in a matter of minutes:


Figure 3: Civil War Primary Source Documents Social Network
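A correspondence network of this kind can be thought of as a list of sender and recipient pairs with a count attached to each. The following Python sketch illustrates the idea under stated assumptions: the letter records, the placeholder names and the helpers `build_network` and `correspondents` are hypothetical and do not describe the actual structure of EBSCO’s metadata.

```python
from collections import Counter

# Hypothetical (sender, recipient, year) rows, invented for illustration.
LETTERS = [
    ("Writer A", "Writer B", 1861),
    ("Writer A", "Writer B", 1862),
    ("Writer B", "Writer A", 1862),
    ("Writer C", "Writer A", 1863),
]


def build_network(letters):
    """Map each directed (sender, recipient) pair to its letter count."""
    return Counter((sender, recipient) for sender, recipient, _year in letters)


def correspondents(edges, person):
    """Return everyone who sent a letter to, or received one from, person."""
    people = set()
    for sender, recipient in edges:
        if sender == person:
            people.add(recipient)
        elif recipient == person:
            people.add(sender)
    return people


edges = build_network(LETTERS)
for (sender, recipient), n in edges.most_common():
    print(f"{sender} -> {recipient}: {n} letter(s)")
print(sorted(correspondents(edges, "Writer A")))
```

Keeping the year in each row means the same data could be filtered by period before counting, which is one way to show how a network changes over time.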

Text Mining

Text mining is another approach used by scholars working in the digital humanities. But unlike data mining, which looks at information about specific documents, text mining analyzes the words and images within the documents themselves. Whereas a scholar interested in data mining a magazine will examine the metadata (the title, editor, year and place of publication, and so on), one who wants to text mine will examine the entire content of that magazine.

Within the digital humanities, much of this work has traditionally focused on topic modeling or word counts, with the output being either word clouds or Ngram charts.
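The word counts behind a word cloud or an n-gram chart are straightforward to compute. Below is a minimal Python sketch that counts single words and two-word sequences in a short passage. The sample sentence and the helpers `tokenize` and `ngram_counts` are invented for illustration, the tokenizer is deliberately naive, and topic modeling is not covered here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    # zip over n staggered copies of the token list yields the n-grams.
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Hypothetical sample text, invented for illustration.
TEXT = "The war news arrived late. The war news was grim."

tokens = tokenize(TEXT)
words = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)

print(words.most_common(3))
print(bigrams.most_common(3))
```

Note that this simple version ignores sentence boundaries, so the last word of one sentence and the first word of the next are counted as a pair. A real project would likely want a more careful tokenizer.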

Increasingly, scholars working in the digital humanities are redefining what role technology will play in their work. Their innovative technological approaches are spurring us here at EBSCO to reconsider how we look at our products and deliver our data sets.

You can find much more information about EBSCO’s Historical Digital Archives on the website.

