Digital Humanities: Technology

Support for researchers using the Library's collections with Digital Humanities techniques.

Text and data mining

Some of the Library's collections offer a search interface that includes more advanced text searching tools. These may include:

  • proximity searching (when two words occur relatively close together).
  • clustering similar articles together in a search.
  • fuzzy matching (allowing for spelling, typing or printing variance).
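If you have downloaded texts and want to try two of these techniques yourself, the following is a minimal Python sketch of proximity searching and fuzzy matching using only the standard library. The function names, the sample sentence and the default distance and threshold values are assumptions made for illustration; they do not describe how any particular database implements these features.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def within_proximity(text, word_a, word_b, max_distance=5):
    """Return True if word_a and word_b occur within max_distance tokens of each other."""
    tokens = tokenize(text)
    positions_a = [i for i, t in enumerate(tokens) if t == word_a]
    positions_b = [i for i, t in enumerate(tokens) if t == word_b]
    return any(abs(a - b) <= max_distance for a in positions_a for b in positions_b)


def fuzzy_matches(text, target, threshold=0.8):
    """Return the tokens whose similarity ratio to target is at least threshold."""
    return [t for t in tokenize(text)
            if SequenceMatcher(None, t, target).ratio() >= threshold]


# Illustrative sentence containing a printing variant of "government".
sample = "The goverment announced that the cotton mills of Manchester would reopen."

print(within_proximity(sample, "cotton", "manchester"))
print(fuzzy_matches(sample, "government"))
```

Lowering the threshold catches more spelling variants but also more false matches, so it is worth checking the results against a sample of your own texts.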

The following links may be useful for those looking to conduct text and data mining on content that has been downloaded or is otherwise available on a hard drive. See also the Collections page of this guide for which collections are most suitable.

USC Libraries' Content Mining Research Guide

This guide provides information about text mining resources and tools at the University of Southern California and whether or not their subscription databases support content mining. The details are similar for the resources at The University of Manchester Library. 

Data analysis packages

There are many software packages to consider for holding and analysing your texts. The list below includes some of the packages available to researchers at The University of Manchester. Some may already be installed on your PC; otherwise, you can find them through the Software Centre icon on your desktop or obtain them from IT Services.

Creating your own database

If an existing data analysis package is not appropriate for your needs, you may need to create your own database. For example, if you are using a collection of XML files containing the full text of news articles, and you record your own interpretation of each text, you will need to store those interpretations in a structured way.

It is not possible to discuss the many considerations in this guide, but some of the commonly used desktop tools are Microsoft Excel (or another spreadsheet package) and databases such as MySQL (relational), MongoDB (document), Neo4j (graph) and Microsoft Access.
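As a minimal sketch of storing your own interpretations alongside the files they describe, the example below uses SQLite through Python's built-in sqlite3 module. SQLite is not one of the tools listed above; it is used here only because it needs no installation, and the same relational approach applies to MySQL or Microsoft Access. The table name, column names, file names and labels are all hypothetical.

```python
import sqlite3

# In-memory database for this sketch; pass a file path instead to keep the data.
conn = sqlite3.connect(":memory:")

# One row per interpretation, linked to the (hypothetical) XML file it describes.
conn.execute("""
    CREATE TABLE annotation (
        id INTEGER PRIMARY KEY,
        source_file TEXT NOT NULL,
        label TEXT NOT NULL,
        note TEXT
    )
""")

# Illustrative rows only; these are not real records.
rows = [
    ("article_001.xml", "labour dispute", "Strike reported at a mill"),
    ("article_002.xml", "trade", None),
    ("article_003.xml", "labour dispute", None),
]
conn.executemany(
    "INSERT INTO annotation (source_file, label, note) VALUES (?, ?, ?)", rows
)
conn.commit()


def files_with_label(connection, label):
    """Return the source files carrying the given label, sorted by name."""
    cursor = connection.execute(
        "SELECT source_file FROM annotation WHERE label = ? ORDER BY source_file",
        (label,),
    )
    return [row[0] for row in cursor]


print(files_with_label(conn, "labour dispute"))
```

Keeping the interpretations in a separate table, rather than editing the XML files, leaves the original texts untouched and makes it easy to query across the whole collection.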

Programming

If you need to perform a difficult or repetitive task on your data, or if you want to interpret machine-readable data such as XML, it might be necessary to write a script. Some common scripting and programming languages for Digital Humanities are Python, R and JavaScript.
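To give a sense of what such a script can look like, here is a minimal Python sketch that reads one XML article and counts the words in its body text. The element names (article, title, date, text) and the sample content are assumptions made for illustration; the XML supplied with a real collection will have its own structure, which you would need to check first.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical article; real collections will use their own element names.
SAMPLE_XML = """<article>
  <title>Cotton trade revives</title>
  <date>1865-06-01</date>
  <text>The cotton trade revives as cotton shipments resume.</text>
</article>"""


def word_counts(xml_string):
    """Parse one article and count the lowercase words in its <text> element."""
    root = ET.fromstring(xml_string)
    body = root.findtext("text", default="")
    words = re.findall(r"[a-z]+", body.lower())
    return Counter(words)


counts = word_counts(SAMPLE_XML)
print(counts.most_common(3))
```

To process a folder of files rather than a single string, the same function could be called in a loop after reading each file's contents.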

The following links may be useful when considering programming:

Geographic Information Systems (GIS)

You may have spatial or geographic data, for example a country of publication for each text in your collection, or your collection may consist of maps. You may be able to visualise your content using an advanced tool such as ArcGIS or the free QGIS.

Links to support resources for Geographic Information Systems (GIS) are listed below.

Visualisation

Increasingly, publishers are including visualisation tools directly in the web platforms used to access library databases, such as charts showing the rise and fall of key terms by publication date, or topic modelling.

Programming languages like Python and R have libraries which can be utilised for displaying the data you have collected in interesting and informative ways. 
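As a small illustration, the sketch below counts how often a term appears in each year and prints the result as a plain-text bar chart. It uses only the Python standard library so that it runs anywhere; in practice the same yearly counts would normally be passed to a plotting library. The documents, the years and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical (year, text) pairs standing in for a downloaded collection.
documents = [
    (1861, "cotton prices rise"),
    (1862, "cotton famine deepens as cotton imports fall"),
    (1862, "relief committees formed"),
    (1863, "cotton supply slowly returns"),
]


def term_frequency_by_year(docs, term):
    """Count occurrences of term per year, returned in year order."""
    counts = Counter()
    for year, text in docs:
        counts[year] += text.lower().split().count(term)
    return dict(sorted(counts.items()))


def text_bar_chart(series):
    """Render {year: count} as one line per year, with one '#' per occurrence."""
    return "\n".join(f"{year} {'#' * n} {n}" for year, n in series.items())


series = term_frequency_by_year(documents, "cotton")
print(text_bar_chart(series))
```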

Other tools exist with the primary purpose of visualising data, like Tableau and Gephi.

Linked Data

Linked Data is about publishing structured data on the Web so that it can be interlinked and become more useful through semantic queries. More specifically, Wikipedia defines Linked Data as "a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF".
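The idea behind RDF can be sketched in a few lines of Python: each statement is a triple of subject, predicate and object, and things are identified by URIs so that separate statements can be joined. The example below is a simplified illustration using plain tuples, not a real RDF implementation. The example.org URIs are placeholders, the helper functions are hypothetical, and real projects would normally use an RDF library and the SPARQL query language instead.

```python
# Namespace prefixes: example.org is a placeholder; the Dublin Core terms and
# FOAF vocabularies are real, widely used vocabularies.
EX = "http://example.org/"
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (EX + "book/1", DCT + "title", "Mary Barton"),
    (EX + "book/1", DCT + "creator", EX + "person/gaskell"),
    (EX + "book/2", DCT + "title", "North and South"),
    (EX + "book/2", DCT + "creator", EX + "person/gaskell"),
    (EX + "person/gaskell", FOAF + "name", "Elizabeth Gaskell"),
]


def match(graph, s=None, p=None, o=None):
    """Return the triples matching the given parts; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def titles_by(graph, creator_uri):
    """Follow creator links to books, then collect the title of each book."""
    books = [s for s, _, _ in match(graph, p=DCT + "creator", o=creator_uri)]
    return sorted(o for book in books
                  for _, _, o in match(graph, s=book, p=DCT + "title"))


print(titles_by(triples, EX + "person/gaskell"))
```

Because the author is identified by a URI rather than by a name string, any other dataset that uses the same URI can be combined with this one without ambiguity.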

As library collections emerge that offer support for Linked Data, they will be listed below.

Crowd sourcing

Citizen science projects use the efforts and ability of volunteers to help scientists and researchers deal with the flood of data that confronts them.

The best-known example is Zooniverse, which offers a tool to build your own crowd-sourced project. The British Library's crowd-sourced projects are part of the LibCrowds platform.

Computer vision

Computer vision is a term which covers automated extraction, analysis and understanding of useful information from one or more images.

Some examples of tools used or developed by groups such as the University of Oxford Visual Geometry Group follow.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International Licence.