Hi, this is a really good improvement, especially with the ‘confidence’ ratings, and the color coding.

 

Thank you!

 

Éva

 

From: bslwac-bounces@mailman.xmission.com [mailto:bslwac-bounces@mailman.xmission.com] On Behalf Of Nate Cothran
Sent: Thursday, April 12, 2012 5:54 PM
To: Backstage Library Works Authority Control Listserv
Subject: [BSLWAC] Near Match Reports (R00) - Now Available

 

In a previous post on March 23, 2012, we talked a little about our efforts to create a more usable report for unmatched headings. We have since added more functionality to the report that we hope helps clarify the results. We also plan to keep refining the algorithm we use for the near matches, as well as the confidence level we assign to each near match.

 

Here are a few examples of the report (from its current build):

 

ocm05472887
100  1_  $a Allen, Junius Mordecai.
    99.5%   no 95045186   400  1_   $a Allen, Junius Mordecai, $d 1875-1906.
    56.5%   no 00103969   100  1_   $a Allen, Junius, $d 1898-1962

 

ocm77567496
650  _0  $a Adventure and adventurer $v Fiction.
    97.1%   sh 85001072   450  __   $a Adventure and adventurers $v Fiction
    70.6%   sh2009113774  150  __   $a Adventure and adventurers $z Europe $v Biography

 

ocm02224738, ocm02464058, ocm02735261, ocm03462153, ocm04493529
490  0_  $a Old West
    99.5%   no 96034673   130  _0   $a Old West (Alexandria, Va.)
    99.5%   n  99000801   151  __   $a Old West Lawrence Historic District (Lawrence, Kan.)

 

Not all near matches will be ranked so high on our “confidence level percentage”, but these three should give you a better idea of the report’s results.

 

We match as much of the original heading to the near match as possible. Whatever matches the unmatched heading is highlighted in BLUE. Parts of the near match that are potential typos, or new additions not contained in the unmatched heading, are set off in RED. The second near match is highlighted in the same way as the first, but in GREEN, to help distinguish between the two near matches.
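
As an illustration only, the highlighting rule can be sketched in Python with the standard library's difflib. This is an assumption about how such a comparison might work, not the algorithm Backstage uses; the function name segment_near_match and the 'match'/'extra' labels are hypothetical.

```python
from difflib import SequenceMatcher


def segment_near_match(unmatched, near_match):
    """Split near_match into runs labelled 'match' or 'extra'.

    'match' runs also occur in the unmatched heading (BLUE in the report);
    'extra' runs are possible typos or additions (RED in the report).
    Hypothetical helper: a stand-in, not the report's real algorithm.
    """
    matcher = SequenceMatcher(None, unmatched, near_match, autojunk=False)
    segments = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if j1 == j2:
            continue  # 'delete': text exists only in the unmatched heading
        kind = "match" if tag == "equal" else "extra"
        segments.append((kind, near_match[j1:j2]))
    return segments


segments = segment_near_match(
    "Adventure and adventurer", "Adventure and adventurers"
)
# segments == [("match", "Adventure and adventurer"), ("extra", "s")]
```

Here the trailing "s" is the only part of the near match that the unmatched heading lacks, so it is the only run that would be set off.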

 

As a next step, we are looking into sorting this report by percentile, so that 90th-percentile near matches are listed first (sorted A-Z within that group). This might take some extra finagling from our programming team to implement, but we will keep you updated on our progress.
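
A rough Python sketch of that ordering follows. It assumes "percentile" means 10-point bands of the confidence percentage (90 and up, 80 to 89.9, and so on), which the post does not spell out; the function name and the last sample row are hypothetical.

```python
def sort_by_confidence_band(rows):
    """Sort (heading, confidence_percent) rows.

    Highest 10-point confidence band first, then A-Z within each band.
    The banding rule is an assumption, not the report's confirmed behavior.
    """
    def key(row):
        heading, confidence = row
        band = min(int(confidence // 10), 9)  # 90-100 -> 9, 80-89.9 -> 8, ...
        return (-band, heading.lower())

    return sorted(rows, key=key)


rows = [
    ("Old West", 99.5),
    ("Allen, Junius Mordecai.", 99.5),
    ("Adventure and adventurer", 97.1),
    ("Example heading", 42.0),  # hypothetical row, not from the report
]
ordered = [heading for heading, _ in sort_by_confidence_band(rows)]
# ordered == ["Adventure and adventurer", "Allen, Junius Mordecai.",
#             "Old West", "Example heading"]
```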

 

While the higher percentile near matches are useful for letting you know what may actually be a valid match, we also want to point out that the lower percentile matches are useful for identifying (or dismissing) headings for which no near match exists. Every unmatched heading will have two near matches listed underneath it, even if those near matches are very low probability (less than 5%). This is due to how our algorithm is set up to generate these near matches for the report.
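
To show why two near matches always appear, here is a minimal sketch that takes the two best-scoring candidates with no minimum threshold. The similarity score (difflib's ratio) is a stand-in chosen for illustration, not the confidence measure the report uses, and two_nearest is a hypothetical name.

```python
import heapq
from difflib import SequenceMatcher


def two_nearest(unmatched, authority_headings):
    """Return the two best (score_percent, heading) pairs, best first.

    There is no cutoff, so two candidates come back even when nothing
    in the authority list resembles the unmatched heading.
    """
    scored = (
        (SequenceMatcher(None, unmatched, h, autojunk=False).ratio() * 100, h)
        for h in authority_headings
    )
    return heapq.nlargest(2, scored)


candidates = [
    "Adventure and adventurers",
    "Old West (Alexandria, Va.)",
    "Allen, Junius, 1898-1962",
]
good = two_nearest("Adventure and adventurer", candidates)
poor = two_nearest("Qqq", candidates)  # still two results, both scoring very low
```

A heading with no real counterpart still gets two rows, and their low scores are the signal that it can be dismissed.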

 

This report is called:

R00 – Near Match Report.htm

 

Please feel free to contact your project managers in order to request that we start delivering this report with your Current Cataloging results (at no extra cost):

Judy Archer (email)

Stephanie Hansen (email)

 

We will still be delivering R07 (Unmatched Headings) and R10 (Multiple Authority Matches), so this R00 – Near Match Report won’t yet replace those. But since every unmatched heading will have two near matches listed underneath, we do want to point out that the report can be quite large, depending on the size of your Current Cataloging file (and matching results).

 

We welcome your feedback!

 

 

Nate Cothran - nate@bslw.com

Product Manager, Automation

  Backstage Library Works

  533 E 1860 S, Provo UT 84606

(p) 801.342.5697 - (f) 801.356.8220

www.ac.bslw.com/community/blog