Indexing shortfalls.

By Bob McAllister.

The relationship between family history researchers and those who transcribe and/or index records is a complex one. When a previously unknown document is revealed, the selfless toilers who facilitated the discovery are lauded. On the other hand, when a cryptic squiggle has been misinterpreted, digits transposed, or entries from two records tangled, the oafs responsible are roundly condemned as well meaning, perhaps, but barely competent for the task.[1]

Yet, deep down, we recognise that these Jekyll and Hyde characters are no different from ourselves. In fact, many of them, peering at eroded monumental inscriptions in a rain-swept churchyard or poring over a PDF print of dubious quality from an indistinct microform image, are researchers themselves. They give their time, effort and expertise to help colleagues whom they never expect to meet achieve their research goals. So why are we so quick to judge?

There was a quantum shift in the relationship when family history research was commercialised and industrialised. An "index" was transformed from a slim printed volume into a product to be advertised and sold on-line by family history companies. For some researchers the on-line database was no longer seen as a finding aid but assumed the totally incorrect status of a record in its own right. Entries in an index were attached to personal trees as evidence (perhaps, even proof) of their claims.

Inevitably, generational change saw a drift away from the traditional practice of family history embodied in local societies as folk were drawn towards the big subscription sites. Plaintive cries that “It is not all online, you know” were still to be heard but largely ignored.

Rapid and extensive development in communications technology eventually made it feasible for the societies to attempt to compete in the same arena. They were able to re-purpose existing resources for digital delivery to members and to seek additional content to value-add. In many cases, this was seen not as a distinct product to generate revenue but as an additional benefit of membership in order to encourage renewal.

Whatever their underlying motivation, this places a growing number of organisations in the same commercial niche as the major subscription sites. We expect their products to be fit for purpose and to meet fundamental standards sometimes summarised as merchantable quality. What are the features that infuriate users?

Dates that are not
In essence, a vital record printed onto paper is no more than an array of dark and light shades in a pattern. But it is a pattern that we have learned to process visually and then mentally interpret as a date. When digitised, that pattern becomes data that can be tagged as representing a moment in time, but may instead be left untagged as a generic string of alphanumeric characters. When I search an on-line index for a particular name and am returned four or five results distinguished by different "dates" of birth, I expect to be able to sort them into generational order. That is only possible if the index treats them as dates: sorted as text, "12 Mar 1851" comes before "3 Jan 1849" simply because the character 1 precedes 3.
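To make the difference concrete, here is a minimal Python sketch. The names, dates and accepted formats are invented for illustration and are not taken from any real index; the point is only that text sorts character by character, while parsed dates sort in time.

```python
from datetime import datetime

# Hypothetical index entries: one name, with "dates" stored as plain text.
# The formats below are assumptions for illustration, not any real site's schema.
entries = [
    ("John Smith", "12 Mar 1851"),
    ("John Smith", "3 Jan 1849"),
    ("John Smith", "1850"),
    ("John Smith", "Feb 1823"),
]

# Formats tried in order, from most to least precise.
FORMATS = ("%d %b %Y", "%b %Y", "%Y")


def parse_date(text: str) -> datetime:
    """Turn a textual date into a real date value so it can be sorted."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # not this format; try the next, less precise one
    raise ValueError(f"Unrecognised date: {text!r}")


# Sorting the raw strings compares characters, not moments in time.
by_text = sorted(entries, key=lambda e: e[1])
# Sorting parsed dates gives generational order.
by_date = sorted(entries, key=lambda e: parse_date(e[1]))
```

With these sample entries, `by_text` comes out as 12 Mar 1851, 1850, 3 Jan 1849, Feb 1823, while `by_date` gives Feb 1823, 3 Jan 1849, 1850, 12 Mar 1851. The sketch treats a year-only entry as 1 January of that year, a simplification a real index would need to handle more carefully.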

Inappropriate authenticity
When civil registration was introduced into England and Wales in 1837, a system of numbered volumes of registers was established to associate each event uniquely with a registration district. Each volume was assigned a distinguishing Roman numeral. For example, every birth in the district of Patrington was recorded in Vol XXIII.[2] That practice continued until the end of 1851; from 1852 a revised scheme relabelled it as Volume 9d.[3] Why then does a search for an 1845 event on GRO UK (the General Register Office, the official agency responsible for these records) return the description Vol 23 (i.e. in Arabic numerals)? Because Roman numerals sit awkwardly with the way modern computers handle text. It would have been second nature for a scribe in a scriptorium to sort folios into numerical order so that VIII preceded IX; a computer sorting text character by character puts IX first (because I precedes V), and correcting that takes extra programming that is usually not done. The General Register Office recognises this shortcoming and transforms the original information into a more useful form, because the index should help users gain access to the underlying records, not hinder them. Yet many other publishers choose to preserve the archaic form of the information even though it makes the content less accessible.
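A minimal Python sketch of the sorting problem follows. The volume labels are illustrative rather than a real index extract, and `roman_to_int` is a hypothetical helper written for this example, not anything GRO is known to use. It shows why plain text sorting misplaces Roman numerals and how converting them to integers, similar in effect to the Vol 23 display, restores numerical order.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'XXIII' to an integer (23)."""
    total = 0
    largest_so_far = 0
    # Read right to left: a symbol smaller than one already seen is subtracted (IX = 9).
    for symbol in reversed(numeral.strip().upper()):
        value = ROMAN_VALUES[symbol]
        if value < largest_so_far:
            total -= value
        else:
            total += value
            largest_so_far = value
    return total


# Illustrative volume labels (not a real index extract).
volumes = ["XXIII", "IX", "XIV", "VIII", "X"]

as_text = sorted(volumes)                      # character order
as_number = sorted(volumes, key=roman_to_int)  # numerical order
label = f"Vol {roman_to_int('XXIII')}"         # the Arabic form, 'Vol 23'
```

Sorted as text the sample volumes come out IX, VIII, X, XIV, XXIII; sorted by value they come out VIII, IX, X, XIV, XXIII. Keeping the original numeral alongside the computed number would let an index display the archaic form while still sorting correctly.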

Example of a General Register Office (GRO) search

Non-standard names
From our earliest days, colonial surveyors sought to have officially designated place names used instead of informal local names. Generations of citizens and officials overturned that intent by (mis)recording every imaginable variant of the official name at different times. When every error and variant is faithfully copied into a digital index, the chance that a searcher will find all of the relevant original documents is reduced. FamilySearch leads the way in defining a single set of standardised place descriptors for every location worldwide. It does not attempt to go in and "correct" the original source, but rather ensures that searchers have the opportunity to make their own judgement of its intent.
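The principle of keeping the original while adding a standard form can be sketched in a few lines of Python. This is not FamilySearch's method or data: the gazetteer table, its variant spellings and the standard descriptor are all hypothetical, invented purely for illustration.

```python
import re

# Hypothetical gazetteer: each recorded variant maps to one standard descriptor.
# The variants and the standard form are invented for illustration.
STANDARD_PLACES = {
    "patrington": "Patrington, East Riding of Yorkshire, England",
    "patrington yorks": "Patrington, East Riding of Yorkshire, England",
    "patringtn": "Patrington, East Riding of Yorkshire, England",
}


def normalise(text: str) -> str:
    """Lower-case, turn punctuation into spaces and collapse runs of spaces."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def index_place(original: str) -> dict:
    """Keep the transcription exactly as written, plus a standard form for searching."""
    return {
        "as_written": original,
        "standard": STANDARD_PLACES.get(normalise(original)),  # None if not in the gazetteer
    }
```

Here `index_place("Patrington, Yorks.")` keeps the text exactly as written and adds the standard descriptor, so a search on the standard name finds every variant. A place missing from the gazetteer simply gets `None`, leaving the searcher to judge its intent.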

Archaic punctuation
Sixty years ago, I learned (painful) lessons on the only correct way to represent my name with initials instead of my full given names (viz. R. J. McAllister). I doubt that I had used full points as well as separating spaces in more than half a century, until a digital index insisted that a search for R J did not match a record that I knew was there. A deeper investigation revealed that every variant practice could be found within the same index: <X. Y.>, <X Y> and <XY>. Clearly, fashions in handwriting changed over time, and in order to use this particular index I was required to guess which fad applied at the moment the record I sought was created.
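The three forms found in that index can be reconciled by comparing a normalised key rather than the raw text. The minimal Python sketch below assumes the given-names field holds initials only; the records and the helper names `initials_key` and `search` are hypothetical, chosen for illustration.

```python
import re


def initials_key(given: str) -> str:
    """Reduce 'R. J.', 'R J' and 'RJ' to the same key, 'RJ'.

    Assumes the field holds initials only; full names would need other handling.
    """
    return re.sub(r"[\s.]", "", given).upper()


# Hypothetical index records: (given names as written, surname).
records = [
    ("R. J.", "McAllister"),
    ("R J", "McAllister"),
    ("RJ", "McAllister"),
    ("R. K.", "McAllister"),
]


def search(initials: str, surname: str) -> list:
    """Match on the normalised key, so punctuation fashions no longer matter."""
    key = initials_key(initials)
    return [
        r for r in records
        if initials_key(r[0]) == key and r[1].lower() == surname.lower()
    ]
```

A search for "R J" now returns all three R. J. records whatever their punctuation, while still excluding R. K.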

Abbreviations
There was a time when Thos, Geo, Chas and Jno were "names" in common written use.[4] So it is not surprising that there are many instances in historical records. Whether this justifies their literal inclusion in a digital index is another matter. While an experienced researcher (that is, one caught out before) might know to employ wildcard searches to overcome this barrier, it would be preferable to adopt standard spellings throughout.
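One way to adopt standard spellings without discarding the original is to store both. The Python sketch below is illustrative only: the lookup table holds a handful of well-known abbreviations, and the helpers `standard_given_name` and `index_entry` are hypothetical names, not any publisher's actual scheme.

```python
# A few well-known abbreviated given names; the Wiktionary appendix cited
# above lists many more. This table is illustrative, not complete.
ABBREVIATIONS = {
    "thos": "Thomas",
    "geo": "George",
    "chas": "Charles",
    "jno": "John",
    "wm": "William",
}


def standard_given_name(as_written: str) -> str:
    """Return the standard spelling of an abbreviated given name.

    Names that are not recognised abbreviations are returned unchanged.
    """
    cleaned = as_written.strip()
    return ABBREVIATIONS.get(cleaned.rstrip(".").lower(), cleaned)


def index_entry(as_written: str) -> dict:
    """Store both forms: the transcription for fidelity, the standard form for searching."""
    return {"as_written": as_written, "search_name": standard_given_name(as_written)}
```

With this approach a search for Thomas finds a record transcribed as "Thos." without the searcher needing to know about wildcards, while the transcription itself is left untouched.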

My list of gripes with on-line indexes arises from a single central concern. The underlying design principle should be to minimise the number of false negative results (that is, no record that matches the search criteria should be omitted from the returned results). False positive results (those that might, but do not actually, match the specification) should be regarded as an acceptable cost of avoiding false negatives. The reward of identifying a single great-great-grandfather outweighs any effort "wasted" in assessing and rejecting a dozen potential candidates.
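The recall-first principle can be expressed as a matching rule that errs towards inclusion. In the minimal Python sketch below, the helper `might_match`, its abbreviation table and the sample records are all hypothetical; the rule deliberately accepts a bare initial or a missing given name rather than risk omitting the right person.

```python
import re

# Illustrative abbreviation table (see the Abbreviations section above).
ABBREVIATIONS = {"thos": "thomas", "geo": "george", "chas": "charles", "jno": "john"}


def clean(name: str) -> str:
    """Lower-case and drop full points and spaces, so 'T.' becomes 't'."""
    return re.sub(r"[\s.]", "", name).lower()


def might_match(query: str, recorded: str) -> bool:
    """Recall-first test: include anything that could plausibly be the person sought."""
    q, r = clean(query), clean(recorded)
    if not r:
        return True  # given name missing from the record: cannot rule it out
    if r == q or ABBREVIATIONS.get(r) == q:
        return True  # exact match or a known abbreviation
    return len(r) == 1 and q.startswith(r)  # bare initial consistent with the query


records = ["Thomas", "Thos.", "T.", "", "Timothy", "Geo"]
candidates = [r for r in records if might_match("Thomas", r)]
```

Searching for Thomas returns Thomas, Thos., T. and the record with no given name, but not Timothy or Geo. Some of those four may prove to be false positives (T. could stand for Timothy), which is exactly the cost the principle above accepts.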

Editor's Note: Readers are reminded that this blog is the view of the writer and not necessarily that of the organisation.

[1] https://getsatisfaction.com/familysearch/topics/how_do_i_fix_online_indexing_errors
[2] Patrington, East Riding of Yorkshire
[3] https://www.genuki.org.uk/big/eng/civreg/GROIndexes
[4] https://en.wiktionary.org/wiki/Appendix:Abbreviations_for_English_given_names

GSQ Blog
