EBSCO and Digital Humanities: Data and Text Mining
As technology continues to play an ever more central role in both education and research, we are witnessing innovations in the realm of information retrieval and analysis that were previously thought impossible. Increasingly, humanities scholars are embracing the ideas and capabilities of “big data,” and are using new data analysis tools for their own research. Large research projects that might have seemed impossible 10 or 20 years ago are now feasible for those with the right resources and technological know-how.
A small but growing number of academics are leveraging technology for their work in the relatively new field of “digital humanities.” Broadly defined, the digital humanities movement seeks to expand intellectual discourse and discovery by introducing systematic research techniques into academic fields like literature and history, which previously had little use for scientific methodologies. These technologies are enabling the humanities to embrace quantitative analysis, and there is a great deal of data to work with, as well as exciting new ways to approach it.
For the most part, digital humanists are focusing on two main approaches: data mining and text mining.
Data mining, as the name suggests, is the extraction of large amounts of data from existing metadata fields, such as the author, title, or place of publication. This usually involves obtaining information from bibliographic records (Dublin Core, MODS, and MARC); however, other associated metadata, such as abstracts or indexes, can also be analyzed using data mining techniques.
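As a rough illustration of what mining a metadata field can look like, the Python sketch below tallies records by place of publication. It is only a sketch under stated assumptions: the records, their field names (`title`, `author`, `place`, `year`), and the `count_by_field` helper are hypothetical and do not reflect any particular EBSCO, Dublin Core, MODS, or MARC schema.

```python
from collections import Counter

# Hypothetical bibliographic records. The field names and values are
# invented for illustration and do not follow a real metadata schema.
records = [
    {"title": "Example Magazine A", "author": "Editor One", "place": "Boston", "year": 1841},
    {"title": "Example Magazine B", "author": "Editor Two", "place": "Philadelphia", "year": 1850},
    {"title": "Example Magazine C", "author": "Editor Three", "place": "Boston", "year": 1862},
    {"title": "Example Magazine D", "author": "Editor Four", "place": None, "year": 1865},
]


def count_by_field(records, field):
    """Tally how often each value of a metadata field appears.

    Records where the field is missing or empty are skipped.
    """
    return Counter(r[field] for r in records if r.get(field))


place_counts = count_by_field(records, "place")

# List places from most to least frequent.
for place, n in place_counts.most_common():
    print(f"{place}: {n}")
```

Counts like these are the kind of raw material that could then be plotted on a map or charted over time.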
At EBSCO, the enhanced metadata we have created for our Digital Archives products is ideal for data mining. It allows scholars to analyze large groups of manuscript and printed archival sources, in turn allowing a keen researcher to uncover broader patterns of intellectual thought and change over time.
For instance, the graphics below pull data from the American Antiquarian Society Periodicals Series to show the geographical concentration of magazine publications. The data was extracted from the bibliographic records of the American Antiquarian Society and plotted on an 1865 county map of the United States.
Not only can digital humanists mine information from standard bibliographic information, but EBSCO’s internal metadata provides unique possibilities for research as well.
For example, the Civil War Primary Source Documents collection highlights how data can be mined from EBSCO’s enhanced metadata. This data is so robust that scholars can build a virtual 19th-century social network, revealing who was writing letters to whom (and how often) over time. Previously, this would have taken years of painstaking labor, but it can now be accomplished in a matter of minutes.
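To make the idea of a correspondence network concrete, here is a minimal Python sketch. It is an illustration only, not EBSCO's tooling: the `letters` list, the placeholder names in it, and the `build_network` helper are all hypothetical, and it assumes each letter's metadata has already been reduced to a (sender, recipient, year) triple.

```python
from collections import Counter, defaultdict

# Hypothetical letter metadata as (sender, recipient, year) triples.
# The names are placeholders, not real correspondents.
letters = [
    ("Writer A", "Writer B", 1861),
    ("Writer A", "Writer B", 1862),
    ("Writer B", "Writer A", 1862),
    ("Writer C", "Writer A", 1863),
]


def build_network(letters):
    """Return directed edge counts and an undirected contact map.

    edges[(sender, recipient)] is the number of letters sent in that
    direction; correspondents[person] is the set of people that person
    exchanged letters with in either direction.
    """
    edges = Counter((sender, recipient) for sender, recipient, _ in letters)
    correspondents = defaultdict(set)
    for sender, recipient in edges:
        correspondents[sender].add(recipient)
        correspondents[recipient].add(sender)
    return edges, correspondents


edges, correspondents = build_network(letters)

# Show who wrote to whom, most frequent pairs first.
for (sender, recipient), n in edges.most_common():
    print(f"{sender} -> {recipient}: {n} letter(s)")
```

Grouping the same triples by year before counting would show how the network changes over time.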
Text mining is another approach used by scholars working in the digital humanities. But unlike data mining, which looks at information about specific documents, text mining analyzes the words and images within the documents themselves. Whereas a scholar interested in data mining a magazine will examine the metadata (the title, editor, year and place of publication, and so on), one who wants to text mine will examine the entire content of that magazine.
Within the digital humanities, much of this work has traditionally focused on topic modeling or word counts, with the output being either word clouds or n-gram charts.
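Word counts and n-grams are simple enough to sketch directly. The Python example below is a minimal illustration using only the standard library; the sample text and the `tokenize` and `ngrams` helpers are hypothetical, the tokenizer is deliberately naive, and topic modeling is not covered here.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for the full content of a document.
text = "The war came, and the war went on. The letters came slowly."


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple.

    Sentence boundaries are ignored, so an n-gram may span two sentences.
    """
    return list(zip(*(tokens[i:] for i in range(n))))


tokens = tokenize(text)
word_counts = Counter(tokens)
bigram_counts = Counter(ngrams(tokens, 2))

# The most frequent words and word pairs are what a word cloud or
# n-gram chart would typically visualize.
print(word_counts.most_common(3))
print(bigram_counts.most_common(1))
```

A real project would usually also remove very common words such as "the" before counting, since they otherwise dominate the results.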
Increasingly, scholars working in the digital humanities are redefining what role technology will play in their work. Their innovative technological approaches are spurring us here at EBSCO to reconsider how we look at our products and deliver our data sets.
More information on EBSCO’s Historical Digital Archives is available on the EBSCO website.