Why don’t archivists digitize everything?

Today on the blog we’re tackling one of our most frequently asked questions: “Why don’t you digitize everything?” and its related runner-up, “When will you be putting all your records on the web?”

As archivists we like these questions because they tell us that people are eager for access to archival records. They also show that people realize that not everything is digitized. Indeed, only a tiny fraction of the world’s primary sources is available digitally. This doesn’t mean that undigitized records are inaccessible or not worth consulting, but you will need to visit us archivists to use them.

In fact, archivists and librarians themselves are behind the abundance of primary sources already available on the internet. From rare books to official records and from diaries to sound recordings, digitized resources have spread the word (literally) that the past informs our present and our future. In the meantime, both non-profit and commercial organizations whose main mission includes digitizing material (like the Internet Archive, or Ancestry.com) have raised public expectations about access to historical resources.

In this post we’ll share some of the behind-the-scenes realities of digitizing and uploading rare materials. We hope this boosts awareness about some important facets of document digitization and sharing. One is the vast army of largely anonymous labourers out there whose work makes these valuable resources available. Another is the existence of the original records behind the images, which archivists continue to steward.

We also hope that people who are informed about digitization will advocate for archives in the opportunities and challenges they face.

But first, a basic question.

Why do archivists digitize records?

It’s important to understand what digitization can and can’t do. A common assumption is that digitization preserves analogue (non-digital) archival records. In some cases – say, when the record is in imminent danger of becoming unusable – this is true, in a way. Think about a paper map disintegrating into fragments, a letter faded almost to illegibility, or a cassette tape turning brittle and unplayable.  In such cases digitization – the production of an electronic image of these records – saves information gleaned from the record.  But it doesn’t produce a clone of the record (more on this later). At best it results in a digital “surrogate,” an approximation (even if a very good one) of a dimension of the record.

Archivists commonly digitize records to facilitate access. Easily copied electronic files help people consult records at a distance in multiple locations. Of course, consulting digital files instead of originals also aids preservation by sparing originals from repeated physical handling – a vital function that was once (and still is) served by microfilming records.

Silvering is a reflective sheen that can gradually develop on certain types of old photographs. This effect can make digitization difficult, as these four attempts show. (George W. Gordon family fonds, Photographs, Region of Peel Archives)

What’s special about digitization in archives?

Archivists will often say that mass digitization in particular is costly, both in money and time (which is also money). Sometimes people are skeptical about this. After all, it’s so easy to take a picture of your high school yearbook and share it on Facebook, or to throw some old postcards on a scanner and upload them to a blog.

Below is an overview of some of the factors archivists deal with in digitizing records. Here we’ll concentrate on two-dimensional archival records like paper documents and photographs. Some of these challenges relate to the complexity of the material itself; others are due to the digitization process. All show that large-scale digitization in an institutional setting is not your average home scanning operation. And the challenges for analogue media like old sound recordings or film are even more acute (for one thing, it’s getting increasingly difficult to find equipment that will play old media).

Dealing with volume

As you read on, keep in mind the vast amount of material held by archives. Even a modestly-sized archival institution measures its holdings in kilometres of shelf space. The boxes on these shelves can each hold anywhere from 700 to 1800 individual pieces of paper and even more photographs, negatives, and slides. Digitizing even a small fonds (a type of archival collection) is a big commitment.

Dealing with dimension

Many archival record groups are not easy to scan quickly. The fastest way of scanning a stack of pages is with an automatic feeder; but feeders only work with same-sized pages in good condition. Even then, the benefits of speed have to be weighed against the risk of a one-of-a-kind document being mangled by a paper jam.

For unique or fragile records (and most archival records fall under one of these headings), manual scanning is often the only responsible option. For each scanned item, there can be dozens of associated tasks, from removing staples and positioning the item to processing images and entering metadata (see below). That adds up to a lot of work: scanning a single archival box of records can take days.
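
To give a feel for the arithmetic, here is a rough back-of-envelope sketch. The box sizes (700 and 1800 items) come from the figures mentioned earlier in this post; the two minutes of handling per item and the seven-hour working day are purely our assumptions for illustration, and real rates vary widely with the condition and variety of the material.

```python
def scanning_days(items, minutes_per_item, hours_per_day=7.0):
    """Estimate the working days needed to scan `items` pieces by hand."""
    total_hours = items * minutes_per_item / 60
    return total_hours / hours_per_day


# Box sizes are taken from the post; 2 minutes per item is an assumed,
# hypothetical handling time (positioning, scanning, checking the image).
ASSUMED_MINUTES_PER_ITEM = 2.0

for items in (700, 1800):
    days = scanning_days(items, ASSUMED_MINUTES_PER_ITEM)
    print(f"{items} items: about {days:.1f} working days")
```

Under these assumptions a single box works out to somewhere between three and nine working days, before any time is spent on description or quality checks.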

This individual file of interrelated government records contains a surprisingly varied mixture of sizes, shapes, and formats. This diversity makes digitization of the file more challenging. (Toronto Township fonds, Planning Department project files, Region of Peel Archives)

If the records in a file are various sizes and shapes, constant readjustments to scanning parameters add even more time.  If items are really large, they may have to be scanned in sections and digitally stitched together.

Sometimes the best option is to take a photograph instead, which then necessitates a high-quality photographic set-up including lighting, document holders, and a camera with an appropriate lens. Items that are torn, wrinkled, thick, or reflective will also require skilled handling and digital manipulation.

The information written on this slide won’t be captured simply by digitizing the slide image.

Capturing physical evidence

We mentioned above that scanning doesn’t produce an exact copy of a record but only an impression of certain aspects of it.  As archives researchers often find out, records speak to us in ways that go beyond their original intended use.

For example, annotations accumulated, say, in the margins of reports or on the backs of photographs often provide essential or at least illuminating information. It’s important to think about whether (and how) these additions should be digitally captured. Physical characteristics like thickness and type of paper, marks of wear, and enclosures also can be “read” for evidence of the past, but are even harder to convey in a digital file.

Capturing context

And this brings us to the most under-recognized aspect of digitally capturing an archival record: linking its digital image to crucial information which tells us what it is. We call this information “metadata” (data about the data). Some metadata is technical information about the digital capture. Other metadata is part of the archival description of the record itself.

Sticky notes make digitizing modern records very difficult. The notes may contain important information linked to other information beneath them.

Some archival metadata is short and sweet such as the date a record was created. Other information is more complex, such as the story of the person or organization that created it. Most complex of all is a description of the place the records occupy in nested groups of records.
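
One way to picture how these layers of metadata travel with a scan is as a small data structure. The sketch below is only an illustration: the class names, field names, and sample values are all our inventions, not a real archival standard or the schema used by any particular institution.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TechnicalMetadata:
    """Facts about the digital capture itself."""
    file_name: str
    file_format: str
    resolution_dpi: int


@dataclass
class ArchivalDescription:
    """Facts about the original record and where it sits."""
    title: str
    date_created: str
    creator: str
    # Nested groups, broadest first (for example fonds, then series, then file).
    hierarchy: List[str] = field(default_factory=list)


@dataclass
class DigitalSurrogate:
    """A scan linked to both kinds of metadata."""
    technical: TechnicalMetadata
    description: ArchivalDescription

    def context_path(self) -> str:
        """Join the nested groups and the title into one readable line."""
        return " > ".join(self.description.hierarchy + [self.description.title])


# Every value below is invented for illustration.
scan = DigitalSurrogate(
    technical=TechnicalMetadata("example_0001.tif", "TIFF", 600),
    description=ArchivalDescription(
        title="Site plan",
        date_created="1965",
        creator="Example Township",
        hierarchy=["Example Township fonds", "Planning Department project files"],
    ),
)
print(scan.context_path())
```

The point of the nesting is that the image file alone carries none of this: strip away the description and the hierarchy, and all that remains is a file name.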

Archivist Laura Millar gives us a great example of how context can turn the most mundane slip of paper into an important source of insight. She points out that a single sticky note scrawled with the words “Meet Joe” tells us so little that by itself, divorced from any other context, it would have little value. But if the sticky note is attached to a page in the day timer of Barack Obama’s secretary a few weeks prior to Obama’s announcement of Joe Biden as his running mate, the same simple note reveals an important moment in the history of the US government. [i]  (Incidentally, sticky notes are a digitization nightmare.)

An individual record within an archival collection does not tell us its whole story. And here lies the part of digitization that many people never see: the prior work of organizing and documenting collections to make them intelligible and searchable in the first place. Without this vital descriptive work the electronic files produced by digitization would be little more than an undifferentiated and unusable mass of thousands of files.

For other examples of the importance of context and how archivists describe it, check out our posts How do archivists organize records? and How do archivists describe records?

This bylaw was digitized with a folded corner; now essential information is missing from its digital surrogate. (Township of Caledon, Bylaws)

Ensuring quality

Because digitization involves an investment of time and resources, we need to make sure we get it right – that the electronic files we produce are adequately representing the archival originals. That means our process will need to incorporate quality control checks.

Quality results depend on a host of factors from scanning resolutions to photographic skill to typing accuracy.  Quality isn’t just a matter of aesthetics. As archivists, we’re responsible for making sure that people are getting a reliable and authentic view of records. Important decisions may (and often do) rely on the information found in them.

Maintaining digital files

It’s tempting to think of digitization as a fix-and-forget proposition: that once information is captured digitally, it’s automatically pinned down for the long-term. It isn’t. And this means digitization presents archivists with a new set of files to maintain.

Because stored digital files are in a way intangible – huge amounts can fit on tiny thumb drives and multiple identical copies can be piped through electrical wires – it’s easy to think of them as non-physical and incorruptible. In fact, digital files are physical states of physical things, and they’re subject to decay and disorder just like their analogue counterparts. Digital data fundamentally exists as millions of minute magnetic or electric charges. A tiny shift on a subatomic level can cause a cascade of errors. Even data just sitting unused on a drive is subject to “bit rot,” random degradation over time.
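
A common safeguard against silent corruption is to record a checksum for each file when it is created and to recompute it later; in digital preservation this is often called fixity checking. The sketch below is a minimal illustration using Python's standard `hashlib` module. The function names are ours, and a real preservation system would also store the recorded checksums somewhere safe and keep multiple copies of each file.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksums(folder):
    """Map each file name in `folder` to its current checksum."""
    return {
        entry.name: sha256_of(entry)
        for entry in sorted(Path(folder).iterdir())
        if entry.is_file()
    }


def changed_files(folder, recorded):
    """List files whose checksum no longer matches, or that have gone missing."""
    current = record_checksums(folder)
    return sorted(
        name for name, checksum in recorded.items()
        if current.get(name) != checksum
    )
```

In use, `record_checksums` would be run once when the scans are made, and `changed_files` at regular intervals afterwards. A check like this only detects damage; repairing it still depends on having an intact copy elsewhere.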

Besides the problem of data degradation, archivists also have to think about the future readability of current file formats. There’s no point in investing a lot of time in digitizing a body of material if no one will be able to open the files as software and hardware inevitably become obsolete.

The 19th-century letters on the left have already lasted over 170 years and will outlive all of the hardware pictured on the right. As the letters sit quietly in their archival boxes, images of them stored on this hardware will need comparatively frequent checks and migration to survive a decade. (Left, Magrath family fonds, Correspondence)

Archivists themselves are at the forefront of pushing the boundaries of digital longevity. Technologies that can neutralize errors are improving; agreed-on standards for file formats are being developed. And refreshing, migrating, and copying digital data can help protect it. Still, the average lifespan of a hard or flash drive is a fraction of that of a piece of paper stored in optimal conditions (and incidentally, digital media have temperature and humidity requirements too).

So when archivists digitize anything they commit to maintaining that file as well as the original on which it’s based. This labour needs to be factored into decisions. (For all the reasons outlined above, archivists seldom use digitization alone as a reason for disposing of originals.)

This digital file available on the Internet Archive appears to be corrupted.

Marshalling resources

Digitization depends on significant quantities of technical equipment and human labour.

High-resolution scanners and cameras that can adequately capture large materials or negative images are very expensive. Image processing software can also be costly, as can adequate secure digital storage.

Scanning negatives at the Peel Archives at PAMA.

To make a dent in an average archival collection, a scanner (or several of them) needs to be working every day, all day, sometimes for months – and often that scanner is also needed for everyday operations. Some large archives maintain digitization units staffed by specialists. Archivists at smaller institutions fit digitization in where they can amid their other duties. This is why digitization of record groups is often conducted as discrete projects funded by grants or partnerships.

In this post we’re aware that we’re representing archivists in general, so we feel it’s also appropriate to point out a reality about staffing in the 21st century. Around the world today the number of staff in many archival institutions is often no greater (and is sometimes smaller) than in the pre-digital era. This means archivists need to carefully manage their limited resources while taking care of perennial core tasks like accessioning and processing records, and helping researchers. And as populations increase, the number of incoming records continues to increase exponentially.

Sharing responsibly

Even after a set of records is digitized, responsibly sharing them on the web also calls for a process and resources.

First, archivists have to make sure that they are free to share the records in the first place. Some donors of archival records don’t want them to be available for a certain period of time; other records (such as those the government keeps about you) are kept private for legislated periods; and sensitive information about still-living people might be tucked away in personal papers. Copyright (ownership of the intellectual property in the records) may also prohibit widespread sharing.

We’ve already seen that the full meaning of records isn’t necessarily available in the image of those records. It has to be provided by archivists and linked to that image. That information will also need to be conveyed to users on the internet, so the archivist will have to arrange for software or online platforms that can manage this function, and for internet servers to mediate it.

Digitization as a process

Given the factors we’ve outlined, it’s no wonder that archivists approach digitization projects methodically. Rather than running into unexpected problems with any of these challenges (from removing paper clips to clearing copyright), we usually assess an archival collection beforehand to see whether it’s a good candidate for digitization and sharing. Of course this process also takes time, so even if we can mobilize an inexpensive pool of labour, digitization is still a big investment of time and resources.

We hope this overview helps explain why digitization within archival institutions proceeds the way it does – and why we may never, in fact, digitize everything. The triumphs and trials of digitization are themselves a constantly unfolding process. New models are being explored and old ones reconsidered. Nevertheless, access is important to archivists, so digitization is too. You can be sure that archivists in institutions large and small will continue to grapple with this immensely powerful way to broadcast the knowledge we steward.

How can you help?

Here are some ways everyone can help support digitization.

1) Whenever you share a photograph from an institutional collection, share the information that goes with it. Cite the archives, library, museum, or gallery that takes care of the original item (as well as the collection it comes from, if you know it). By doing so you’re revealing a layer of labour that is otherwise hidden: you’re spreading the word that there are people behind such images who have saved the originals for the future and made copies available. You’re also passing along an essential part of images that will help viewers understand them.

2) Be curious about what archivists, information professionals, and cultural workers do. It’s usually more than people know. Ask questions and spread the answers around. The more people understand the value of our work, the more support we’ll get and the more support we’ll be able to provide.

[i] Laura Millar, Archives: Principles and Practices (New York: Neal-Schuman, 2010), p. 7.

by Samantha Thompson, Archivist

All photographs by Peel Archives except image of Wikimedia Foundation server.

28 responses to “Why don’t archivists digitize everything?”

  2. Excellent explanation. Should be shared as counterpoint to recent CBC news piece about Canadian Government Secret Archives. Well done PAMA.

    • Thanks for spreading the word, Iona, we appreciate it. This post was already “in production” as the debate hit but we too thought it would address some of the issues that arose.

  8. Re:sharing responsibly
    Sometimes digital images are not shared on the web because for-profit image services take free images and sell them, without permission or correct source attribution and without passing any of that profit on to the institution that did all the work in the first place.

    • This is true. Every archives has to make its own tough decisions about when or how to recoup the high costs of digitization. We’ve had many debates about whether, having freely shared a public domain image, we have any say in what happens to a copy of it. Attribution is good practice but not a legal requirement for public domain images unless you can make it part of a contract. For now it certainly seems that if we don’t want an image to be used commercially it’s best not to put it online at all. As you know, some institutions attempt to control use by making low res images freely available but not high res (these come with a fee which helps support the institution). Thanks for commenting, this is an important point.

  9. Wonderful explanation and a “behind the curtain” look at the larger picture. Also an answer to the frequent question of why an archives’ collections aren’t readily available online in this digital age.

    • Thank you for reading; we do hope it’s useful for archivists, researchers, and supporters, each in their own way.

  10. Thank you for this careful, thorough description. I have had to answer similar questions for years and after reading this I am glad to discover, I was in accord with what you wrote. I always told people that there are no icebergs the above waterline portion of which are small enough to describe what has been digitized to what has not.

    • The iceberg metaphor is a great one. We too hope researchers understand the as-yet-undigitized riches that are nevertheless available to use. Thank you for taking the time to comment.

  12. Thank you for this interesting and very complete article. Sometimes I share interesting articles on Facebook. I shared yours, and some people who aren’t archivists asked me some questions about it! It was nice to see that this subject interests people.
