Why don’t archivists digitize everything?

Today on the blog we’re tackling one of our most frequently asked questions: “Why don’t you digitize everything?” and its related runner-up, “When will you be putting all your records on the web?”

As archivists we like these questions because they tell us that people are eager for access to archival records. They also show that people realize that not everything is digitized. Indeed, only a tiny fraction of the world’s primary resources are available digitally. This doesn’t mean that undigitized records are inaccessible or not worth consulting, but you will need to visit us in person to use them.

In fact, archivists and librarians themselves are behind the abundance of primary sources already available on the internet. From rare books to official records and from diaries to sound recordings, digitized resources have spread the word (literally) that the past informs our present and our future. At the same time, non-profit and commercial organizations whose main mission is digitizing material (like the Internet Archive or Ancestry.com) have raised public expectations about access to historical resources.

In this post we’ll share some of the behind-the-scenes realities of digitizing and uploading rare materials. We hope this raises awareness of some important facets of digitizing and sharing documents. One is the vast army of largely anonymous labourers whose work makes these valuable resources available. Another is that the original records behind the images still exist, and archivists continue to steward them.

We also hope that people who are informed about digitization will advocate for archives as they navigate the opportunities and challenges it brings.

But first, a basic question.

Why do archivists digitize records?

It’s important to understand what digitization can and can’t do. A common assumption is that digitization preserves analogue (non-digital) archival records. In some cases – say, when a record is in imminent danger of becoming unusable – this is true, in a way. Think about a paper map disintegrating into fragments, a letter faded almost to illegibility, or a cassette tape turning brittle and unplayable. In such cases digitization – the production of an electronic image of these records – saves information captured from the record. But it doesn’t produce a clone of the record (more on this later). At best it results in a digital “surrogate,” an approximation (even if a very good one) of certain aspects of the record.

Archivists commonly digitize records to facilitate access. Easily copied electronic files let people in many locations consult records at a distance. Of course, consulting digital files instead of originals also aids preservation by sparing originals from repeated physical handling – a vital function once served (and still served) by microfilming records.

Silvering is a reflective sheen that can gradually develop on certain types of old photographs. This effect can make digitization difficult, as these four attempts show. (George W. Gordon family fonds, Photographs, Region of Peel Archives)

What’s special about digitization in archives?

Archivists will often say that mass digitization in particular is costly, both in money and time (which is also money). Sometimes people are skeptical about this. After all, it’s so easy to take a picture of your high school yearbook and share it on Facebook, or to throw some old postcards on a scanner and upload them to a blog.

Below is an overview of some of the factors archivists deal with in digitizing records. Here we’ll concentrate on two-dimensional archival records like paper documents and photographs. Some of these challenges relate to the complexity of the material itself; others are due to the digitization process. All show that large-scale digitization in an institutional setting is not your average home scanning operation. And the challenges for analogue media like old sound recordings or film are even more acute (for one thing, it’s getting increasingly difficult to find equipment that will play old media).

Dealing with volume

As you read on, keep in mind the vast amount of material held by archives. Even a modestly sized archival institution measures its holdings in kilometres of shelf space. A single box on these shelves can hold anywhere from 700 to 1,800 individual pieces of paper, and even more photographs, negatives, and slides. Digitizing even a small fonds (a type of archival collection) is a big commitment.
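To put those numbers in perspective, here is a rough back-of-the-envelope estimate written as a small Python sketch. The range of 700 to 1,800 pages per box comes from the paragraph above; the figure of three boxes per metre of shelving is a hypothetical assumption for illustration only, since box sizes and shelving layouts vary from one institution to the next.

```python
# Rough estimate of how many pages sit on a stretch of archival shelving.
# PAGES_PER_BOX_MIN/MAX come from the post (700 to 1,800 pieces of paper per box).
# BOXES_PER_METRE is a hypothetical assumption; real box dimensions vary.

PAGES_PER_BOX_MIN = 700
PAGES_PER_BOX_MAX = 1800
BOXES_PER_METRE = 3  # hypothetical, for illustration only


def estimate_pages(shelf_metres: float,
                   boxes_per_metre: float = BOXES_PER_METRE) -> tuple[int, int]:
    """Return a (low, high) estimate of the pages held on a length of shelving."""
    boxes = round(shelf_metres * boxes_per_metre)
    return boxes * PAGES_PER_BOX_MIN, boxes * PAGES_PER_BOX_MAX


if __name__ == "__main__":
    low, high = estimate_pages(1000)  # one kilometre of shelving
    print(f"Roughly {low:,} to {high:,} pages")
```

Under that assumption, one kilometre of shelving would hold roughly 2.1 to 5.4 million pages, before counting photographs, negatives, or slides.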

Dealing with dimension

Many archival record groups are not easy to scan quickly. The fastest way of scanning a stack of pages is with an automatic feeder, but feeders only work with same-sized pages in good condition. Even then, the benefit of speed has to be weighed against the risk of a one-of-a-kind document being mangled in a paper jam.

For unique or fragile records (and most archival records fall under one of these headings), manual scanning is often the only responsible option. Each scanned item can involve dozens of associated tasks, from removing staples and positioning the item to processing images and entering metadata (see below). That adds up to a lot of work: scanning a single archival box of records can take days.
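The same kind of rough arithmetic shows how quickly per-item handling adds up. In this sketch the box sizes are the 700 to 1,800 items mentioned earlier; the three minutes per item (covering handling, scanning, image processing, and metadata entry) and the seven-hour working day are hypothetical assumptions, not measured figures.

```python
# Back-of-the-envelope scanning time for one archival box.
# Item counts come from the post; MINUTES_PER_ITEM and HOURS_PER_DAY are
# hypothetical assumptions covering all per-item tasks for one person.

MINUTES_PER_ITEM = 3.0  # hypothetical
HOURS_PER_DAY = 7.0     # hypothetical working day


def days_to_scan_box(items: int,
                     minutes_per_item: float = MINUTES_PER_ITEM,
                     hours_per_day: float = HOURS_PER_DAY) -> float:
    """Estimate the working days needed to manually scan one box of items."""
    total_hours = items * minutes_per_item / 60
    return total_hours / hours_per_day


for items in (700, 1800):
    print(f"{items} items: about {days_to_scan_box(items):.1f} working days")
```

With those assumptions, one box takes somewhere between about 5 and 13 working days of one person's time.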

This individual file of interrelated government records contains a surprisingly varied mixture of sizes, shapes, and formats. This diversity makes digitization of the file more challenging. (Toronto Township fonds, Planning Department project files, Region of Peel Archives)

If the records in a file come in various sizes and shapes, constant readjustments to scanning parameters add even more time. If items are very large, they may have to be scanned in sections and digitally stitched together.

Sometimes the best option is to take a photograph instead, which then necessitates a high-quality photographic set-up including lighting, document holders, and a camera with an appropriate lens. Items that are torn, wrinkled, thick, or reflective will also require skilled handling and digital manipulation.

The information written on this slide won’t be captured simply by digitizing the slide image.

Capturing physical evidence

We mentioned above that scanning doesn’t produce an exact copy of a record but only an impression of certain aspects of it. As researchers in archives often find out, records speak to us in ways that go beyond their original intended use.

For example, annotations accumulated, say, in the margins of reports or on the backs of photographs often provide essential or at least illuminating information. It’s important to think about whether (and how) these additions should be digitally captured. Physical characteristics like thickness and type of paper, marks of wear, and enclosures also can be “read” for evidence of the past, but are even harder to convey in a digital file.

Capturing context

And this brings us to the most under-recognized aspect of digitally capturing an archival record: linking its digital image to crucial information which tells us what it is. We call this information “metadata” (data about the data). Some metadata is technical information about the digital capture. Other metadata is part of the archival description of the record itself.

Sticky notes make digitizing modern records very difficult. The notes may contain important information linked to other information beneath them.

Some archival metadata is short and sweet, such as the date a record was created. Other information is more complex, such as the story of the person or organization that created it. Most complex of all is a description of the place the records occupy within nested groups of records (for example, an item within a file, within a series, within a fonds).
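One way to picture this is as a record that keeps both kinds of metadata side by side: technical facts about the capture, and the archival description, including where the record sits among nested groups. The Python sketch below is purely illustrative. The class and field names are invented for this example, real archives follow descriptive standards (such as ISAD(G) internationally or the Rules for Archival Description in Canada) and use dedicated software, and all example values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalMetadata:
    """Facts about the digital capture itself (illustrative field names)."""
    filename: str
    resolution_ppi: int
    capture_date: str  # ISO 8601 date of the scan


@dataclass
class ArchivalDescription:
    """Facts about the record and where it sits in its collection."""
    title: str
    date_created: str
    creator: str
    hierarchy: list[str] = field(default_factory=list)  # outermost level first

    def reference_path(self) -> str:
        """Render the nested groups plus the item title as one path."""
        return " > ".join(self.hierarchy + [self.title])


@dataclass
class DigitalSurrogate:
    """Pairs an image's technical metadata with its archival description."""
    technical: TechnicalMetadata
    description: ArchivalDescription


# Every value below is hypothetical, for illustration only.
example = DigitalSurrogate(
    technical=TechnicalMetadata(
        filename="scan_0001.tif",
        resolution_ppi=600,
        capture_date="2017-05-29",
    ),
    description=ArchivalDescription(
        title="Letter to council",
        date_created="1952-03-14",
        creator="Example Township",
        hierarchy=["Example fonds", "Series A", "File 12"],
    ),
)

print(example.description.reference_path())
# Example fonds > Series A > File 12 > Letter to council
```

Without the description half, the image file alone would say nothing about who created the record or how it relates to its neighbours.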

Archivist Laura Millar gives a great example of how context can turn the most mundane slip of paper into an important source of insight. She points out that a single sticky note scrawled with the words “Meet Joe,” divorced from any other context, tells us so little that it has almost no value. But if the sticky note is attached to a page in the day-timer of Barack Obama’s secretary a few weeks before Obama announced Joe Biden as his running mate, the same simple note reveals an important moment in the history of the US government. [i] (Incidentally, sticky notes are a digitization nightmare.)

An individual record within an archival collection does not tell us its whole story. And here lies the part of digitization that many people never see: the prior work of organizing and documenting collections to make them intelligible and searchable in the first place. Without this vital descriptive work the electronic files produced by digitization would be little more than an undifferentiated and unusable mass of thousands of files.

For other examples of the importance of context and how archivists describe it, check out our posts How do archivists organize records? and How do archivists describe records?

This bylaw was digitized with a folded corner; now essential information is missing from its digital surrogate. (Township of Caledon, Bylaws)

Ensuring quality

Because digitization involves an investment of time and resources, we need to make sure we get it right – that the electronic files we produce adequately represent the archival originals. That means our process needs to incorporate quality control checks.

Quality results depend on a host of factors, from scanning resolutions to photographic skill to typing accuracy. Quality isn’t just a matter of aesthetics. As archivists, we’re responsible for making sure that people are getting a reliable and authentic view of records. Important decisions may (and often do) rely on the information found in them.

Maintaining digital files

It’s tempting to think of digitization as a fix-and-forget proposition: that once information is captured digitally, it’s automatically pinned down for the long-term. It isn’t. And this means digitization presents archivists with a new set of files to maintain.

Because stored digital files are in a way intangible – huge amounts can fit on tiny thumb drives, and multiple identical copies can be piped through electrical wires – it’s easy to think of them as non-physical and incorruptible. In fact, digital files are physical states of physical things, and they’re subject to decay and disorder just like their analogue counterparts. Digital data fundamentally exists as millions of minute magnetic, electrical, or optical states. A change to just one of them (a single flipped bit) can corrupt part or all of a file. Even data just sitting unused on a drive is subject to “bit rot,” random degradation over time.
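A common safeguard against bit rot is a fixity check: compute a checksum (a short fingerprint of a file’s exact contents) when the digital file is created, store it, and recompute it periodically. If the two values differ, the file has changed. The sketch below uses SHA-256 from Python’s standard hashlib module; the choice of algorithm and the file name are assumptions for illustration, not a description of any particular archive’s system.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_digest: str) -> bool:
    """Return True if the file still matches the checksum recorded earlier."""
    return sha256_of(path) == recorded_digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        scan = Path(tmp) / "scan.tif"  # hypothetical file name
        scan.write_bytes(b"pretend image data")
        recorded = sha256_of(scan)            # stored when the file is created
        print(verify_fixity(scan, recorded))  # True: unchanged

        # Simulate bit rot by flipping the lowest bit of the first byte.
        data = bytearray(scan.read_bytes())
        data[0] ^= 0b00000001
        scan.write_bytes(bytes(data))
        print(verify_fixity(scan, recorded))  # False: the file has changed
```

A checksum only detects damage; it cannot repair it. Repair depends on having another, intact copy, which is one reason digital files are kept in multiple copies.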

Besides the problem of data degradation, archivists also have to think about the future readability of current file formats. There’s no point in investing a lot of time in digitizing a body of material if no one will be able to open the files once today’s software and hardware become obsolete, as they inevitably will.
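Planning for obsolescence starts with knowing exactly which formats a collection’s files are in. Preservation tools such as DROID, from The National Archives (UK), identify formats by matching byte patterns against the PRONOM registry. The toy Python sketch below shows the basic idea with a handful of well-known file signatures; it is not a substitute for such tools, which also distinguish format versions and handle container formats and ambiguous matches.

```python
# Well-known signatures ("magic numbers") found at the start of common formats.
SIGNATURES: list[tuple[bytes, str]] = [
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"%PDF-", "PDF"),
]


def identify_format(header: bytes) -> str:
    """Guess a file's format from its first few bytes; 'unknown' if no match."""
    for signature, name in SIGNATURES:
        if header.startswith(signature):
            return name
    return "unknown"


if __name__ == "__main__":
    print(identify_format(b"%PDF-1.7 ..."))       # PDF
    print(identify_format(b"\x89PNG\r\n\x1a\n"))  # PNG
    print(identify_format(b"\x00\x01\x02"))       # unknown
```

An inventory like this lets an archive see which files sit in formats at risk of obsolescence and plan migrations before they become unreadable.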

The 19th-century letters on the left have already lasted over 170 years and will outlive all of the hardware pictured on the right. As the letters sit quietly in their archival boxes, images of them stored on this hardware will need comparatively frequent checks and migration to survive a decade. (Left, Magrath family fonds, Correspondence)

Archivists themselves are at the forefront of pushing the boundaries of digital longevity. Technologies that can detect and correct errors are improving, and agreed-on standards for file formats are being developed. Refreshing, migrating, and copying digital data can also help protect it. Still, the average lifespan of a hard or flash drive is only a fraction of that of a piece of paper stored in optimal conditions (and, incidentally, digital media have temperature and humidity requirements too).

So when archivists digitize anything, they commit to maintaining the resulting file as well as the original on which it’s based. This labour needs to be factored into decisions. (For all the reasons outlined above, archivists seldom treat digitization alone as a reason for disposing of originals.)

This digital file available on the Internet Archive appears to be corrupted.

Marshalling resources

Digitization depends on significant quantities of technical equipment and human labour.

High-resolution scanners and cameras that can adequately capture large materials or negative images are very expensive. Image processing software can also be costly, as can adequate secure digital storage.

Scanning negatives at the Peel Archives at PAMA.

To make a dent in an average archival collection, a scanner (or several of them) needs to be working every day, all day, sometimes for months – and often that scanner is also needed for everyday operations. Some large archives maintain digitization units staffed by specialists. Archivists at smaller institutions fit digitization in where they can amid their other duties. This is why digitization of record groups is often conducted as discrete projects funded by grants or partnerships.

In this post we’re aware that we’re speaking for archivists in general, so we feel it’s also appropriate to point out a reality about staffing in the 21st century. Around the world today, many archival institutions have no more staff (and sometimes fewer) than they did in the pre-digital era. This means archivists need to carefully manage their limited resources while taking care of perennial core tasks like accessioning and processing records and helping researchers. And as populations grow, the volume of incoming records continues to grow rapidly.

Sharing responsibly

Even after a set of records is digitized, responsibly sharing them on the web also calls for a process and resources.

First, archivists have to make sure that they are free to share the records in the first place. Some donors of archival records don’t want them to be available for a certain period of time; other records (such as those the government keeps about you) are kept private for legislated periods; and sensitive information about still-living people might be tucked away in personal papers. Copyright (ownership of the intellectual property in the records) may also prohibit widespread sharing.

We’ve already seen that the full meaning of records isn’t necessarily available in the image of those records. It has to be provided by archivists and linked to that image. That information will also need to be conveyed to users on the internet, so the archivist will have to arrange for software or online platforms that can manage this function, and for internet servers to mediate it.

Digitization as a process

Given the factors we’ve outlined, it’s no wonder that archivists approach digitization projects methodically. Rather than running into unexpected problems with any of these challenges (from removing paper clips to clearing copyright), we usually assess an archival collection beforehand to see whether it’s a good candidate for digitization and sharing. Of course this process also takes time, so even if we can mobilize an inexpensive pool of labour, digitization is still a big investment of time and resources.

We hope this overview helps explain why digitization within archival institutions proceeds the way it does – and why we may never, in fact, digitize everything. The triumphs and trials of digitization are themselves a constantly unfolding process. New models are being explored and old ones reconsidered. Nevertheless, access is important to archivists, so digitization is too. You can be sure that archivists in institutions large and small will continue to grapple with this immensely powerful way to broadcast the knowledge we steward.

How can you help?

Here are some ways everyone can help support digitization.

1) Whenever you share a photograph from an institutional collection, share the information that goes with it. Cite the archives, library, museum, or gallery that takes care of the original item (as well as the collection it comes from, if you know it). By doing so you’re revealing a layer of labour that is otherwise hidden: you’re spreading the word that there are people behind such images who have saved the originals for the future and made copies available. You’re also passing along an essential part of images that will help viewers understand them.

2) Be curious about what archivists, information professionals, and cultural workers do. It’s usually more than people know. Ask questions and spread the answers around. The more people understand the value of our work, the more support we’ll get and the more support we’ll be able to provide.

[i] Laura Millar, Archives: Principles and Practices (New York: Neal-Schuman, 2010), p. 7.

by Samantha Thompson, Archivist

All photographs by Peel Archives except image of Wikimedia Foundation server.

57 responses to “Why don’t archivists digitize everything?”

  1. Pingback: Peel Art Gallery: Why Don’t Archivists Digitize Everything? | ResearchBuzz: Firehose

  2. Excellent explanation. Should be shared as counterpoint to recent CBC news piece about Canadian Government Secret Archives. Well done PAMA.

    • Thanks for spreading the word, Iona, we appreciate it. This post was already “in production” as the debate hit but we too thought it would address some of the issues that arose.

  3. Pingback: An archivist explains why archivists don’t digitize everything | Genealogy à la carte

  4. Pingback: NatSCA Digital Digest – June | NatSCA

  5. Pingback: Omni Magazine, ProPublica, GMail, More: Thursday Afternoon Buzz, June 1, 2017 – ResearchBuzz

  6. Pingback: Why don’t archivists digitize everything? | LibraryLady1000

  7. Pingback: Historical Highlights 092

  8. Re: sharing responsibly
    Sometimes digital images are not shared on the web because for-profit image services take free images and sell them, without permission or correct source attribution and without passing any of that profit on to the institution that did all the work in the first place.

    • This is true. Every archives has to make its own tough decisions about when or how to recoup the high costs of digitization. We’ve had many debates about whether, having freely shared a public domain image, we have any say in what happens to a copy of it. Attribution is good practice but not a legal requirement for public domain images unless you can make it part of a contract. For now it certainly seems that if we don’t want an image to be used commercially, it’s best not to put it online at all. As you know, some institutions attempt to control use by making low-res images freely available but not high-res ones (these come with a fee, which helps support the institution). Thanks for commenting; this is an important point.

  9. Wonderful explanation and a “behind the curtain” look at the larger picture. Also an answer to the frequent question of why archival collections aren’t readily available online in this digital age.

    • Thank you for reading; we do hope it’s useful for archivists, researchers, and supporters, each in their own way.

  10. Thank you for this careful, thorough description. I have had to answer similar questions for years, and after reading this I was glad to discover that I was in accord with what you wrote. I always told people that there is no iceberg whose above-waterline portion is small enough to convey how little has been digitized compared with what has not.

    • The iceberg metaphor is a great one. We too hope researchers understand the as-yet-undigitized riches that are nevertheless available to use. Thank you for taking the time to comment.

  11. Pingback: Reading Today – John Dewees

  12. Thank you for this interesting and very complete article. Sometimes I share interesting articles on Facebook; I shared yours, and some people who aren’t archivists asked me questions about it! It was nice to see that this subject interests people.

  13. Pingback: Canadian History Roundup – Weeks of May 28, 2017 and June 4, 2017 | Unwritten Histories

  14. Pingback: “It must be nice to have summers off!” | The Daily Context

  15. Pingback: Collaboration between archivists and historians: finding a middle ground – ActiveHistory.ca

  16. Pingback: Archives and Digital Expectations | INF2307: Crossing the River

    • Sorry Suzanne, that’s not a question our archives can answer, as we don’t hold the Toronto Telegram. Digitizing newspapers from microfilm versions would be time-consuming but not technically difficult; the hard part would be hosting the result with an appropriate interface to make it publicly available. For more recent (non-public domain) issues of the paper, the rights holder (Postmedia) would have to be involved.

  17. Pingback: Why don't archivists digitize everything? - Most Technology Source

  18. Pingback: Rétrospective 2017: la liste du Père-Noël | Convergence

  19. Pingback: Historical Reminiscents EP 09: Demystifying Archival Labour – Access – Krista McCracken

  20. Much like archivists dislike the language used when researchers or folks online ‘discover’ their collections, as if negating the time and effort put into making them discoverable both online and off through cataloguing and metadata, I find this article very tone deaf when it comes to the fact that a lot of the work undertaken digitising archives is not done by archivists or librarians, but by professional photographers and imaging technicians, conservators, and a wide-ranging variety of non-archivist specialists working in archives within the GLAM sector. I have to say, although there are some valid issues discussed here, you have missed a trick by making it so insular.

    • Thanks for taking the time to call attention to the diverse, highly skilled roles involved in digitization within archives and other institutions (we ourselves have a brilliant reprographics technician who vetted this article). Our posts are addressed to the general public, and as such we make choices about how many details to include; perhaps we wrongly ended up sometimes referring to staff within archives in general terms as archivists. On reflection, our choice was partially informed by the following: 1) As archivists are often on the front lines dealing with the public, we are often the ones being asked, “Why don’t you digitize everything?” We find that many citizens believe that volunteers could always do the task more efficiently and quickly, so it was partly this view we were responding to. 2) In our post titles we use the formula “Why do archivists [people] do X” as opposed to “Why do archives [institutions] do X” precisely in order to put a face to the labour involved. We hope that this article shows that the task of digitization is more complex and skilled than many people think.

  21. Very nice article. But there seems to be a general misconception about the media that archives are stored on. Everybody seems to believe that they are stored on hard drives (and arrays) and are instantly available via direct access or cloud servers.
    Less frequently requested archives are mostly kept on tapes or optical discs, which remain easily accessible through mechanical robots. These robots insert the media into available drives to respond to ongoing requests. A lot of physical space is needed to keep those additional media and the robots – and their backups, which for the most part reside in different locations – so still more space is needed.
    The truth of the matter is that, contrary to general belief, there is absolutely no gain in physical space when copying archives onto digital media. On the contrary, not only are the original archives kept, but we must constantly enlarge the physical space for the digital archives and for their backups – which makes everything seem ironic.
    Yes, there are many advantages to storing archives on digital media, and I agree that the staff needed to do so must be carefully chosen and well managed for research use.
    I remember when mainframe computers and PCs came about. The misconception then was that this new technology would diminish the consumption of paper. The truth was quite the opposite: paper use skyrocketed for the next 20 years. Only recently has it diminished.
    We constantly need to rethink how archives are kept and how accessible they will be on future technologies and software, because the work done today may partly be in vain.

    • Thanks for fleshing out some of the technical aspects. Indeed all digital media – whether drives, tapes etc. – are in essence also physical, with formats constantly changing. What a preservation challenge, and one we have not conquered.

  22. I’m working as an archivist in the Netherlands, dealing with digitizing original records, and can only say: what a wonderful blog post you wrote. All the problems and work you described are (sadly enough, in part) familiar.

    • Thanks for your perspective – it’s nice to hear from the Netherlands. Saving documentary memory truly involves an international community of archivists and their skilled partners in IT, conservation etc.

  23. Pingback: Why Don’t Archives Digitize Everything? – Museums and Art Galleries in A Digital Age

  24. Thank you for this excellent explanation of our collective challenges. I intend to assign this post as a required reading in my course session on digitization. It’s a must-read for up and coming archivists and researchers who ask this question.

    • Thank you very much, Miriam, for your encouraging words and for your work informing new archivists and researchers. We all need to work together, not only to find new ways to meet the challenges of digitization but to temper public expectations with a dose of reality.

  25. Pingback: Vintage Miscellany – February 4, 2018 | The Vintage Traveler

  26. Pingback: Why Everything is Not Digitized – Tell my story.

  27. As a historian who has run archives and museums, and has digitised hundreds of thousands of archival objects for online public access, I still find it puzzling that professional archivists regularly produce these two questions – “Why don’t you digitize everything?” and “When will you be putting all your records on the web?” – as evidence of the apparent ignorance of the general public.
    I am also puzzled by the way that professional archivists, who are responsible for material that may be centuries old, and will be planning its survival for centuries more, frequently answer these questions in terms of the fact that, to have any significant impact, they would have to work “for months”.
    For me, the interesting questions are “In a hundred years, will we have digitised everything?” and “In a hundred years, will all our records be on the web?” To answer these, we need to remember that “digitisation” as currently understood is a process with a limited lifetime, as it is overwhelmingly the transfer into digital form of data originally stored on paper. In a hundred years’ time almost all the material being dealt with by professional archivists will have been created digitally, and users will expect virtually instant access in unmediated form. The volume of digital creation will indeed be so great that any cataloguing involving the human mind will impose such restrictions of time and access that users will rightly view it as censorship. The rules for ordering digital material will either be embedded in documents at the moment of creation, or imposed after dissemination through complex algorithms and artificial intelligence. Archivists will be able to offer only contextualisation, links, and some sense of relative importance.
    So, what of our current arguments over digitisation? Well, in a world where information is defined and valued by instant unmediated access, non-digital records will simply be invisible, except to an insignificant number of passionate antiquarians. So let’s have two new questions for professional archivists: “What are you doing to get all of your analogue records digitised?” and “How can you justify not putting all of your records on the web?”

    • Hi Peter,
      Thanks for taking the time to comment and raise your concerns. We believe we addressed some of them in the post, but we welcome the opportunity to make a few points that were beyond the scope of this post. One involves the challenge of appraising and preserving born-digital records and large data sets. We touched on this point above as it applies to the digital facsimiles produced by digitization (which is otherwise a separate problem). Archivists are very aware of the challenges on this front including the redundancy of some traditional approaches. We may write a separate post on this issue.

      We also believe it’s important to point out that many, if not most, archives around the world – especially those in developing countries – are small and beleaguered, perhaps with one or two staff. As such, they are forced to make choices about what to digitize; doing it all is not something they can entertain. Most would love to be able to do more than they are.

      Lastly, many highly experienced and thoughtful archivists believe that even given infinite resources of money, staff, and equipment, it would nevertheless be imprudent to digitize all existing analogue records. The reasons can’t appropriately be debated in a brief comment here, but only one of them is that digitization involves a large carbon footprint.

      Thanks again for sharing your thoughts.

  28. Pingback: Why don’t #archivists digitize everything? | bluesyemre

  29. Why are academics so long-winded? I did several Veterans History Project interviews when David Jackson was the archivist, and sent in dozens of digital images, along with audio tape, which has a limited shelf life, no matter how well preserved.

    When Clay County had everything on microfilm, I read every stored issue of the Liberty Tribune on a fiche reader. The state’s digital archive of that record is very thin. It should all be digitized, if only as a backup.

    • Thanks for commenting. You’re right that some analogue media like audiovisual tapes of various sorts are so prone to deterioration that digitization is the only way to rescue the sound or image content.

  30. Superb explanatory article on the problems – human, mechanical and financial. Shall pass this article on to non-professional archivists who constantly enquire, ‘Why don’t you just digitise the collection!’

    • Thank you for your kind words. There’s a lot packed into the “just” in “just digitize it”, isn’t there?
