Webinar Recap: It’s All About That Data

Technology Workflow | June 17, 2020

Arkivum recently presented a webinar on digital preservation. Read about topics discussed, including data management automation and how to streamline discovery and re-use of assets.

In April, EBSCO and Arkivum, provider of digital preservation and archiving solutions, hosted an informational webinar about digital preservation. The presenter was Sinéad McKeown, Vice President of Product Management at Arkivum. Sinéad has more than 20 years of experience bringing innovative data-centric software solutions to life.

Sinéad laid out the tasks involved in planning for long-term data use and reuse, including tips for making the process effective and more affordable. She walked through and clarified the FAIR data principles. Watch the webinar replay for more details and useful resources to assist with preserving your data and collections.  

Keep reading to see what librarians asked during this session.

Q: Is there an advantage of using an archival system that maintains your data (cloud) versus maintaining your own data?

A: There are many advantages, such as avoiding the costs of maintaining and supporting the infrastructure in-house for the long term. Additionally, with a multi-tenanted cloud deployment you can benefit from elastic processing capabilities, so you can have more resources to process data as you need it.

Q: Can you spell the name for that metadata extraction tool you mentioned during your presentation?

A: Apache Tika is embedded in the Perpetua system. Other metadata extraction tools used in Perpetua for documenting AIPs (Archival Information Packages) include ffprobe, ExifTool, MediaInfo, The Sleuth Kit, and bulk_extractor.

Q: How does "original" in ALCOA apply for born-digital content?

A: Original refers to a true copy of the content which is maintained throughout the retention period — it should be in the original format, and not something that is transcribed at a later date, regardless of whether on paper or in a digital system.

Q: How are AI and machine learning techniques playing a role in data preservation?

A: There is a lot of scope for artificial intelligence to be used in digital preservation, but for the most part, it’s still largely a research topic. The main areas where artificial intelligence and machine learning will become relevant include metadata extraction and storage calculations (for example, gauging the need for access based on previous behavior for similar content).

Q: What value do you ascribe to crowdsourced metadata for digital objects?

A: Crowdsourcing metadata can be a very scalable and cost-effective way of harnessing a user community, especially for public access archives. The challenge is how crowdsourcing of metadata can deliver high-quality metadata that can be trusted and is consistent. It may not be possible to meet the same standards and levels of quality through crowdsourcing that can be achieved through expert curators. Some form of quality control may be needed, which could also be crowdsourced, or other mechanisms used to promote good metadata such as the use of rewards/points/badges. Crowdsourced metadata can have a lot of value, but it will often complement rather than replace the need for cataloguers and curators.

Q: For archiving big data, which solution is best and reliable?

A: For big data, the answer depends on several factors: the size and types of files (large files such as long video content, or many small files such as PDF content), whether they require preservation, what the access needs are, and so on. Perpetua can handle many types of configurations.

Q: What did you mean with the suggestion to create metadata within the same system in which the data was created?

A: Many systems where data is collected and used for either initial or ongoing access will often have the capability to curate the data (e.g. institutional repository systems such as Hyku, Samvera, etc. or eTMF systems for clinical trials). It’s best if the data can be curated as close as possible to the point of data creation, so if data is being generated in these systems, it’s good to capture that metadata as soon as you can. It can then be automatically included during the ingest process to the long-term data management system. However, in many other cases this isn’t possible or practical. With the Perpetua system, in addition to capturing metadata at the point of entry, we allow metadata updates for individual data entries and bulk updates, so all data can be fully curated as and when the details are available.

Q: What are some information retrieval tools that can be used for archived data?

A: Retrieval is available through EBSCO Discovery Service™, FOLIO, Stacks, Access to Memory (AtoM), etc. All of these depend on metadata, which is why a range of techniques is needed (e.g. bulk import of metadata from catalogues, metadata extraction from files, curation and tagging by users).

Q: I manage two databases, one (mostly) consists of born digital work, through Digital Commons, and the other, digitized archival material with Access to Memory through Artifactual. Would these databases be considered good long-term storage systems?

A: Both solutions are good at what they do; for example, allowing content to be organized and shared online. However, neither has built-in support for robust data safeguarding (e.g. multiple copies of data in different geographic locations with fixity checks) or digital preservation (e.g. creating extra versions of content in preservation formats). Therefore, both should be augmented by using them in conjunction with solutions such as Perpetua, so content is safely stored and preserved, as well as made accessible. Perpetua already integrates with AtoM.
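To illustrate the fixity checks mentioned above, here is a minimal Python sketch. It is not Perpetua's implementation. It assumes a SHA-256 checksum was recorded when the content was ingested and that each stored copy can be read as a local file; the function names `sha256_of` and `check_fixity` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(copies: List[Path], expected: str) -> Dict[str, bool]:
    """Map each copy's path to True if its checksum matches the recorded one."""
    return {str(copy): sha256_of(copy) == expected for copy in copies}
```

Running `check_fixity` on a schedule against every copy, and repairing any copy that reports `False` from one that reports `True`, is one simple way to detect and recover from silent storage corruption.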

Q: What if we must preserve data but we are not “technical people?” Which platforms should we use?

A: With Perpetua, we have enabled the configuration and best practices for most common file formats as part of the automated, out-of-the-box workflow, so you don’t need to be technical to understand how it works or how it is configured. As part of your ongoing managed service, we keep a close eye out for any new community recommendations so we can guide you with future planning.

Q: Is EZproxy able to access digital content preserved on a server?

A: Perpetua supports SAML, OIDC and other standards so it can be integrated into federated identity management environments. OpenAthens is one example of this.

Q: How can we preserve the data with fewer resources?

A: It’s less about having fewer resources and more about freeing those resources to spend their valuable time in other areas by reducing the labor-intensive efforts usually handled as manual tasks. As we discussed in the webinar, this can be done in several ways with automation (e.g. the automated preservation workflow covering the majority of common file formats, integrations making data flow seamlessly with no human intervention needed). Many of these will reduce the drudge work for users and allow them to focus on other tasks which still require intervention.

Q: Do you have a linkable source for ALCOA+? All I can find online are articles/blog posts about the guidelines.

A: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/data-integrity-and-compliance-drug-cgmp-questions-and-answers-guidance-industry

Q: I work in a museum and am new to data management. We use a combination of physical backups as well as Extensis Portfolio. Do you know if there are aspects that are missing in Extensis and which direction I should start looking in?

A: Extensis provides a Digital Asset Management (DAM) solution. We can’t answer whether any functionality is missing within the system, but Perpetua does integrate with other DAM solutions to provide long-term data management. However, as you will have seen in the webinar, backups are very different from a long-term data management solution (e.g. physical storage can become corrupted, and file formats go out of date). You may want to check on your backup policies, preservation needs, etc.

Q: What is your opinion about DSpace for RDM?

A: I’m by no means a DSpace expert. It is a good tool used by several of our customers. Perpetua has an out-of-the-box integration available to provide long-term data management and preservation capabilities.

