Our course has moved from the philosophical nature of knowledge and history to the more practical issues of digitizing and posting historical information and making it meaningful. I have some experience with a large digitization project, and I thought I would lay out some of my experiences (January-August 2012) and how they parallel issues raised in Digitizing History.

The Veterans Curation Program (VCP) is a contractor-executed program, initiated and overseen by the Army Corps of Engineers (ACoE), that serves two purposes: 1) to provide veterans with term full-time employment in a civilian office setting while teaching them digitization, database use, and related skills, and 2) to provide industry-standard curation for the artifact collections and associated documentary collections held by the ACoE. More information on the program is available at http://www.veteranscurationprogram.org/, the Alexandria (local) office website is at http://www.veteranscurationprogram.org/#/alexandria-va/4554896248, and their Facebook page is https://www.facebook.com/VeteransCurationProject.

While at the VCP we recorded data about the artifacts in the collection, entered that data into databases, and digitized all of the paper records (field notes, maps, forms, etc.) associated with the dig sites. Finally, all examples of certain artifact types and a representative sample of less significant artifact types were photographed with state-of-the-art equipment, resulting in publication- or museum-quality photographs.

All of this information was, as far as I am aware, in the public domain, and one of the ongoing high-level discussions was how to get this data online: how much of it should be presented, in what format, aimed at what audience, and at what expense. The paper records were scanned but not yet marked up, and the entire collection to be digitized runs to thousands of boxes. I had heard that at then-current rates (three offices with roughly a dozen technicians apiece, running several classes a year) there were still many years of work to be done. Marking up those documents, many of which were standard forms but which also included poorly written field notes, would be a monumental additional undertaking. At the time, we were simply saving the documents under the identifying dig site information. Was this sufficient? Did the notes warrant transcription or markup? To a large extent that depends on the audience, which I’ll return to in a moment.

The photographs were another issue. We took extremely high-quality images, detailed enough to fill the entire screen of a large television. That quality would allow researchers to zoom in very close, close enough to get a meaningful look at the texture of the edge of a projectile point or a thin pottery sherd. The flip side was, naturally, the enormous file size of those photographs. Storing those pictures and putting them up on the web will require an enormous amount of space, and viewing them effectively will require a very fast internet connection. The solutions to this largely come back to cost, intended use, and audience.
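To make that trade-off concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it (image dimensions, photo count, connection speed) is a hypothetical placeholder chosen for illustration, not a VCP figure, and it estimates uncompressed sizes only; real files in a compressed format would be smaller.

```python
def uncompressed_megabytes(width_px, height_px, bytes_per_pixel=3):
    """Size of one uncompressed image in megabytes (1 MB = 1,000,000 bytes).

    bytes_per_pixel=3 assumes 8-bit red, green, and blue channels.
    """
    return width_px * height_px * bytes_per_pixel / 1_000_000


def download_seconds(size_mb, connection_mbps):
    """Seconds to transfer size_mb megabytes over a connection_mbps (megabits/s) link."""
    return size_mb * 8 / connection_mbps


# Hypothetical inputs -- assumptions for illustration, not VCP data.
WIDTH_PX, HEIGHT_PX = 6000, 4000   # assumed image dimensions
PHOTO_COUNT = 100_000              # assumed number of photographs
CONNECTION_MBPS = 10               # assumed viewer connection speed

per_photo_mb = uncompressed_megabytes(WIDTH_PX, HEIGHT_PX)
total_tb = per_photo_mb * PHOTO_COUNT / 1_000_000
seconds_per_photo = download_seconds(per_photo_mb, CONNECTION_MBPS)

print(f"Per photo: {per_photo_mb:.1f} MB")
print(f"Whole set: {total_tb:.1f} TB")
print(f"Download time per photo: {seconds_per_photo:.1f} s")
```

Swapping in real dimensions and counts would show quickly whether serving full-resolution images to every visitor is practical, or whether smaller derivative images for casual viewers make more sense.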

Thirdly, the way that artifacts were described was a conscientious compromise which, as all compromises do, solved a problem but left some unhappy with the results. Experts can provide extremely detailed (and thus, to them, useful) descriptions of artifacts. To use a biology analogy, a botanist can provide the scientific taxonomy of an encountered plant down to the subspecies. To another botanist that specificity is not just useful, but fairly critical. The level of description we were providing was more analogous to the common name of that plant: useful for everyday purposes, and enough to give an expert an idea of the possible specifics, but not specific enough for detailed research.

All of these issues came back to budget, purpose, and audience. Further detail required more man-hours, and the purpose of the project precluded shipping the artifacts off to India (were that to be a practical or legal option at all). Given the enormous size of the collection, increasing the required specificity would have a significant impact on the project’s manpower cost. Similarly, changes in standards for data storage (marked-up documents, more exhaustive photography) would have massive impacts on both personnel and storage costs. The project is also inherently inefficient (on the surface) because it rotates through groups of individuals who have no prior experience with curation. As a result, approximately one to two months of the standard six-month course, roughly a sixth to a third of each term, is spent getting up to speed on the job at hand. The project could be done more efficiently if a standing cadre of full-time curators were working on it, but the program was intended to provide job experience and placement in addition to the direct curation responsibilities. That social service aspect of the project (presumably) pays off in the long run by helping the veteran participants find more stable long-term employment, which has long-term economic benefits.

With regard to purpose and audience, what is the role of the VCP, and of ACoE curation efforts more generally? As I understand it, by law the ACoE is responsible for keeping its collections preserved and available. Traditionally they have been housed in archives (or the basements of universities) and have been hard to access. The VCP is part of an effort not only to meet the minimum requirements of preserving those collections, but also to render those collections useful to the American public and to researchers. So, how much money is justified for such a purpose? Is it good enough just to store these assets, or do they need to be digitized, and if so, for whom?

The intended audience is a major issue. If these collections were to be maximally useful for researchers, a great deal of additional data would need to be recorded. As it is, the digitized collections will be of some use to researchers (certainly a great deal more than before), but trips to the archive might still be necessary. If, however, the intended audience is the general public or the school system (a much easier sell in a time of constrained budgets!), then the data provided will be sufficient. The one great caveat is that the records cannot be easily amended later. Once the work has begun, major changes to things like description standards would essentially require redoing every affected part of the project.

This post is already longer than I intended, yet it has ignored major aspects of the program, including website hosting, the social benefits of the program, the collaboration it has required and sparked, and more. I hope that it has accomplished its goal of attaching some concrete examples to some of the issues facing major historical (or, in this case, archaeological) digitization projects.