New England Historic Genealogical Society
Databases

Massachusetts Vital Records Digitization Project Status


The project to digitize the Massachusetts Vital Records 1841-1910 was begun in 2003, in cooperation with the Massachusetts State Archives. The project consisted of five main subprojects:
  1. Scanning the VR Index books.
  2. Converting the Index scans into a searchable database with links to the appropriate Record Page images.
  3. Scanning the VR Record books.
  4. Converting the Record Page images into a format that can be quickly downloaded.
  5. Finding and correcting errors introduced in the digitization process.
Scanning the VR Index Books
There are 80 Birth, 65 Marriage, and 58 Death VR Index books for the period 1841-1910. Each book contains approximately 524 pages, and each page contains approximately 80 names, for a total of approximately 8,500,000 names.
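The total quoted above can be checked with a few lines of arithmetic. This is only a sketch: the book counts come from the text, while the per-book and per-page figures are the approximations given there, so the result is an estimate rather than an exact count.

```python
# Book counts as stated in the text.
index_books = {"Birth": 80, "Marriage": 65, "Death": 58}

# Approximate figures from the text, not exact counts.
PAGES_PER_BOOK = 524
NAMES_PER_PAGE = 80

total_books = sum(index_books.values())
total_pages = total_books * PAGES_PER_BOOK
total_names = total_pages * NAMES_PER_PAGE

print(f"{total_books} books, {total_pages:,} pages, {total_names:,} names")
```

The product comes to a little over 8.5 million, which agrees with the rounded figure of approximately 8,500,000 names.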

With the exception of the ten-year period 1841-1850, indexes are grouped in five-year intervals. Each index contains records from all towns, alphabetized by surname. For example, ‘Volume 1’, ‘page 1’ of the Birth VR index contains records for the surnames ‘Abbe’, ‘Abbee’, ‘Abbey’, and ‘Abbot’. Each record in an index book consists of a Surname, Given Name, Town, Year, and Record Book Volume and Page Number.

This portion of the project was completed in 2005.

Converting the Index Book scans into a searchable database
Once the index pages were scanned, each index image was submitted to an Optical Character Recognition (OCR) software process that converted the image to a table of text values. Because the original index pages show a surname only on the first line where it occurs, the surname had to be entered manually on every following line that lacked one. In addition, when the indexes were created, repeated given names and town names were represented with double-quote (“) marks, and these marks had to be replaced manually with the actual names. Once all manual corrections were made, the tabular data from each page of each index book was added to a database. The database allows the records to be searched by surname and given name, with the optional qualifiers of year (or year range), town, and county.
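The two corrections described above follow a simple "same as the line above" rule. The project made these corrections manually; the sketch below only illustrates the rule. The field names, the function name `fill_down`, and the sample rows are hypothetical, and the given names and the town "Boston" are invented for the example.

```python
# Characters treated as ditto marks (straight and curly double quotes).
DITTO_MARKS = {'"', "\u201c", "\u201d"}


def fill_down(rows):
    """Return copies of the rows with blank surnames and ditto marks resolved.

    Each row is a dict with 'surname', 'given', and 'town' keys
    (hypothetical field names). A blank surname, or a ditto mark in the
    given-name or town column, means "same as the row above".
    """
    filled = []
    previous = {"surname": "", "given": "", "town": ""}
    for row in rows:
        current = dict(row)  # copy, so the input rows are left unchanged
        if not current["surname"].strip():
            current["surname"] = previous["surname"]
        for field in ("given", "town"):
            if current[field].strip() in DITTO_MARKS:
                current[field] = previous[field]
        filled.append(current)
        previous = current
    return filled


# Invented sample rows in the shape an OCR pass might produce.
sample_rows = [
    {"surname": "Abbe", "given": "Mary", "town": "Amherst"},
    {"surname": "", "given": '"', "town": "Boston"},
    {"surname": "", "given": "John", "town": '"'},
]

filled = fill_down(sample_rows)
for row in filled:
    print(row["surname"], row["given"], row["town"])
```

Because each resolved row becomes the reference for the next one, a run of several ditto marks in a column resolves to the last name that was actually written out.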

The index database also contains the volume and page number of each associated Record Book page, which makes it possible to jump from a name to an actual record.
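A search of this kind, returning the volume and page needed to reach the record image, might look roughly like the following. This is a minimal sketch using an in-memory SQLite table, not the project's actual database. The table name `vr_index`, its columns, the `search` function, and the sample rows are all assumptions, and the county qualifier is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE vr_index (
           surname TEXT, given TEXT, town TEXT, year INTEGER,
           record_type TEXT, volume INTEGER, page INTEGER)"""
)
# Invented sample entries; only the pairing of Birth volume 43, page 1
# with Amherst in 1850 follows an example given elsewhere in the text.
conn.executemany(
    "INSERT INTO vr_index VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Abbe", "Mary", "Amherst", 1850, "Birth", 43, 1),
        ("Abbot", "John", "Boston", 1862, "Birth", 150, 12),
    ],
)


def search(conn, surname, given, year_from=None, year_to=None, town=None):
    """Find index entries by name, with optional year range and town."""
    sql = (
        "SELECT surname, given, town, year, record_type, volume, page "
        "FROM vr_index WHERE surname = ? AND given = ?"
    )
    params = [surname, given]
    if year_from is not None:
        sql += " AND year >= ?"
        params.append(year_from)
    if year_to is not None:
        sql += " AND year <= ?"
        params.append(year_to)
    if town is not None:
        sql += " AND town = ?"
        params.append(town)
    return conn.execute(sql, params).fetchall()


hits = search(conn, "Abbe", "Mary", year_from=1841, year_to=1850)
for surname, given, town, year, record_type, volume, page in hits:
    print(f"{given} {surname}, {town}, {year}: {record_type} volume {volume}, page {page}")
```

The volume and page returned with each hit are what allow a viewer to jump from the index entry to the matching Record Book page image.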

This portion of the project was completed in 2005.

Scanning the VR Record Books
While the Index books contain records for multiple years and are alphabetically organized by surname, the VR Record Books contain records for one year and are alphabetically organized by town name. Each Record Book volume contains the records of towns in a set of counties. For example, Birth Record book ‘43’ contains records for all towns in Hampshire, Middlesex, Nantucket, Norfolk, and Plymouth counties in 1850. Page 1 of Volume ‘43’ lists births in the town of Amherst.

There are 597 Record Books for Births, Marriages and Deaths in the period 1841-1910, and an additional 805 Death Record books for the period 1903-1910. (In mid-1903, death records were converted from one page per town, per year, to one page per death certificate, per year.)

This portion of the project was completed in 2005.

Converting the Record Page images into a downloadable format
The Record Books are physically quite large (17” x 22”), the entries on each page are handwritten, and many of the pages are faded. As a result, the scanned images are very large (e.g., 3 MB each). These image files are about 50 times bigger than all other images on NewEnglandAncestors.org and would be too slow to download for the majority of users. In addition, the images are several times larger (2800 x 2500 pixels) than computer displays, making the images difficult or impossible to view.
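A rough calculation shows why files of this size were a problem. The image size, size ratio, and pixel dimensions below come from the text. The connection speed is purely an assumption (a 56 kbit/s dial-up modem, chosen for illustration) and does not come from the project, and "3 MB" is taken as 3 million bytes.

```python
IMAGE_BYTES = 3 * 1_000_000          # "3 MB each", taken as 3 million bytes
SIZE_RATIO = 50                      # "about 50 times bigger" than other site images
ASSUMED_LINK_BITS_PER_SEC = 56_000   # assumption: dial-up modem, not from the text

# Implied size of a typical image elsewhere on the site.
typical_image_bytes = IMAGE_BYTES / SIZE_RATIO

# Best-case transfer time for one record page at the assumed speed.
seconds_per_scan = IMAGE_BYTES * 8 / ASSUMED_LINK_BITS_PER_SEC

# Pixel count of one scanned record page.
pixels = 2800 * 2500

print(f"typical site image: about {typical_image_bytes / 1000:.0f} KB")
print(f"one record page at the assumed speed: about {seconds_per_scan / 60:.1f} minutes")
print(f"pixels per record page: {pixels:,}")
```

Under that assumed speed, a single uncompressed record page would take several minutes to arrive, which is consistent with the concern about download time.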

The solution selected for the ‘large image’ problem was the ‘MrSID’ viewer. This viewer is a free download that adds the capability to view MrSID images to internet browsers such as Internet Explorer. MrSID image files are much smaller than the original image files (making download much faster) and offer the capability of image zooming and panning (making it possible to see very large images).

Once the record book images were scanned, they were converted into MrSID format for download from NewEnglandAncestors.org.

This portion of the project was completed in 2006.

Finding and correcting errors introduced in the digitization process
This portion of the project was begun in 2004.

Large and complex projects are subject to a variety of errors and this project is no exception. Errors were made at each project stage and some systemic errors related to the OCR process have also been found.

The problems being addressed fall into three categories:
  1. Index pages were incorrectly scanned, misnumbered, or missed altogether. (This problem is complicated by the fact that the original index pages are not numbered.)
  2. Record pages were incorrectly scanned, misnumbered, or missed altogether.
  3. Transcription errors were introduced in the OCR process.
Currently, teams are working at both NEHGS and the Massachusetts Archives to find and correct problems. Missing or incorrectly scanned images are being rescanned where the originals are available. The OCR error correction process is being handled primarily by volunteers, who manually correct records where incorrect information has been found and the correct information can be identified. This work is being done at the Massachusetts Archives, at NEHGS, and by volunteers working from home. This is an ongoing project.