Hi, this is a really good improvement, especially the ‘confidence’ ratings and the color coding.

Thank you!

Éva

From: bslwac-bounces@mailman.xmission.com [mailto:bslwac-bounces@mailman.xmission.com] On Behalf Of Nate Cothran
Sent: Thursday, April 12, 2012 5:54 PM
To: Backstage Library Works Authority Control Listserv
Subject: [BSLWAC] Near Match Reports (R00) - Now Available

In a previous post on March 23, 2012, we talked a little about our efforts to create a more usable report for unmatched headings. We have since added functionality to the report that we hope helps clarify the results. We also plan to keep refining both the algorithm we use for the near matches and the confidence level we assign to each near match.

Here are a few examples of the report (from its current build):

ocm05472887
100 1_ $a Allen, Junius Mordecai.
99.5%	no 95045186	400 1_	$a Allen, Junius Mordecai, $d 1875-1906.
56.5%	no 00103969	100 1_	$a Allen, Junius, $d 1898-1962

ocm77567496
650 _0 $a Adventure and adventurer $v Fiction.
97.1%	sh 85001072	450 __	$a Adventure and adventurers $v Fiction
70.6%	sh2009113774	150 __	$a Adventure and adventurers $z Europe $v Biography

ocm02224738, ocm02464058, ocm02735261, ocm03462153, ocm04493529
490 0_ $a Old West
99.5%	no 96034673	130 _0	$a Old West (Alexandria, Va.)
99.5%	n 99000801	151 __	$a Old West Lawrence Historic District (Lawrence, Kan.)

Not all near matches will be ranked so high on our “confidence level percentage”, but these three should give you a better idea of the report’s results.

We match as much of the original heading to the near match as possible. Whatever part of the near match agrees with the unmatched heading is highlighted in BLUE. Parts of the near match that are potential typos or new additions not contained in the unmatched heading are offset in RED. The second near match is highlighted in the same way as the first, but in GREEN, to help distinguish between the two near matches.
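The highlighting described above can be sketched roughly as follows. This is only an illustration and not the report's actual matching algorithm, which is not described here: it assumes a plain character-level comparison using Python's standard difflib module, and the function name mark_segments and the "match"/"extra" labels are hypothetical.

```python
from difflib import SequenceMatcher


def mark_segments(unmatched, near_match):
    """Split near_match into ("match", text) and ("extra", text) pieces.

    "match" pieces also occur in the unmatched heading (BLUE in the report);
    "extra" pieces are additions or possible typos (RED in the report).
    Assumption: a simple character-level diff, not the vendor's algorithm.
    """
    matcher = SequenceMatcher(None, unmatched, near_match, autojunk=False)
    segments = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if j1 == j2:
            # Text present only in the unmatched heading; nothing to display.
            continue
        label = "match" if tag == "equal" else "extra"
        segments.append((label, near_match[j1:j2]))
    return segments


for label, text in mark_segments("Adventure and adventurer",
                                 "Adventure and adventurers"):
    print(label, repr(text))
```

Joining the text of all pieces in order always gives back the near match, so a display layer would only need to pick a color per label.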

As a next step, we are looking into the possibility of sorting this report by percentile, so that near matches in the 90th percentile are listed first (and sorted A-Z within that group). This might take some extra finagling from our programming team to implement, but we will keep you updated on our progress.
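One way to picture the proposed ordering is the sketch below. It assumes that "percentile" means ten-point bands of the confidence percentage (90 and above, 80 to 89.9, and so on) and that each heading is ranked by its best near match; both are guesses, and the name sort_by_band is hypothetical.

```python
def sort_by_band(entries):
    """Sort (heading, best_confidence_percent) pairs.

    Highest ten-point band first, then A-Z within each band.
    Assumption: bands are simple deciles of the confidence percentage.
    """
    def key(entry):
        heading, confidence = entry
        band = min(int(confidence // 10), 9)  # 100% falls into the top band
        return (-band, heading.casefold())

    return sorted(entries, key=key)


# Headings and best confidence levels taken from the examples above.
report = [
    ("Old West", 99.5),
    ("Allen, Junius Mordecai.", 99.5),
    ("Adventure and adventurer $v Fiction.", 97.1),
]
for heading, confidence in sort_by_band(report):
    print(f"{confidence:5.1f}%  {heading}")
```

All three example headings fall into the same top band, so here the order is decided by the A-Z rule alone.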

While the higher-percentile near matches are useful for letting you know what may actually be a valid match, we also want to point out that the lower-percentile matches are useful in identifying (or dismissing) headings where no near match exists. Every unmatched heading will have two near matches listed underneath it, even if those near matches are very low probability (less than 5%). This is due to how our algorithm is set up to generate these near matches for the report.
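For anyone post-processing the report, the "two near matches, however weak" behavior might be handled along these lines. This is a sketch under assumptions: the tuple layout and the names two_best and has_plausible_match are hypothetical, and the 5% cutoff is simply the "very low probability" figure mentioned above, not a documented threshold.

```python
import heapq

LOW_CONFIDENCE = 5.0  # percent; the "very low probability" figure from this post


def two_best(candidates):
    """Return the two highest-confidence near matches, best first.

    candidates: iterable of (confidence_percent, authority_id, heading).
    """
    return heapq.nlargest(2, candidates, key=lambda c: c[0])


def has_plausible_match(candidates):
    """False when even the best near match is below the low-confidence cutoff."""
    best = two_best(candidates)
    return bool(best) and best[0][0] >= LOW_CONFIDENCE


# Near matches for "Allen, Junius Mordecai." from the example above.
allen = [
    (56.5, "no 00103969", "Allen, Junius, $d 1898-1962"),
    (99.5, "no 95045186", "Allen, Junius Mordecai, $d 1875-1906."),
]
print(two_best(allen)[0][1], has_plausible_match(allen))
```

Headings for which has_plausible_match returns False are the ones that can probably be dismissed as having no real near match.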

This report is called:

R00 – Near Match Report.htm

Please feel free to contact your project managers in order to request that we start delivering this report with your Current Cataloging results (at no extra cost):

Judy Archer (email)

Stephanie Hansen (email)

We will still be delivering R07 (Unmatched Headings) and R10 (Multiple Authority Matches), so this R00 – Near Match Report won’t yet replace those. But since every unmatched heading will have two near matches listed underneath it, we do want to point out that the R00 report can be quite large, depending on the size of your Current Cataloging file (and its matching results).

We welcome your feedback!

Nate Cothran - nate@bslw.com

Product Manager, Automation

Backstage Library Works

533 E 1860 S, Provo UT 84606

(p) 801.342.5697 - (f) 801.356.8220