[BSLWAC] Near Match Reports (R00) - Now Available

12 Apr 2012

In a previous post on March 23, 2012, we talked a little about our
efforts to create a more usable report for unmatched headings. We have
since added more functionality to the report that we hope helps clarify
the results. We also plan to keep refining both the algorithm we use to
find near matches and the confidence level we assign to each near match.

Here are a few examples of the report (from its current build):

ocm05472887
100  1_  $a Allen, Junius Mordecai.

    99.5%  no 95045186   400  1_  $a Allen, Junius Mordecai, $d 1875-1906.
    56.5%  no 00103969   100  1_  $a Allen, Junius, $d 1898-1962

ocm77567496
650  _0  $a Adventure and adventurer $v Fiction.

    97.1%  sh 85001072   450  __  $a Adventure and adventurers $v Fiction
    70.6%  sh2009113774  150  __  $a Adventure and adventurers $z Europe $v Biography

ocm02224738, ocm02464058, ocm02735261, ocm03462153, ocm04493529
490  0_  $a Old West

    99.5%  no 96034673   130  _0  $a Old West (Alexandria, Va.)
    99.5%  n  99000801   151  __  $a Old West Lawrence Historic District (Lawrence, Kan.)

Not all near matches will be ranked so high on our "confidence level
percentage", but these three should give you a better idea of the
report's results.

We match as much of the original heading to the near match as possible.
Whatever matches the unmatched heading is highlighted in BLUE. Parts of
the near match that are potential typos or new additions not contained
in the unmatched heading are set off in RED. The second near match is
highlighted in the same way as the first, but in GREEN, to help
distinguish between the two near matches.
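
To make the highlighting idea concrete, here is a minimal sketch in
Python. It is not our production code: it assumes the standard-library
difflib module as a stand-in for our matching algorithm, and the
function name mark_differences is hypothetical. It splits a near match
into the parts shared with the unmatched heading and the parts that are
new.

```python
from difflib import SequenceMatcher


def mark_differences(unmatched, near_match):
    """Split near_match into (text, is_shared) segments.

    is_shared is True where the text also appears, in order, in the
    unmatched heading, and False where it is new or different
    (a potential typo or addition).
    Assumption: difflib stands in for the real matching algorithm.
    """
    matcher = SequenceMatcher(None, unmatched, near_match, autojunk=False)
    segments = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if j1 == j2:
            # 'delete' opcodes contribute no text to the near match
            continue
        segments.append((near_match[j1:j2], tag == "equal"))
    return segments


segments = mark_differences(
    "Adventure and adventurer", "Adventure and adventurers"
)
for text, is_shared in segments:
    print(("shared: " if is_shared else "new:    ") + repr(text))
```

In this sketch, the shared segments would correspond to the highlighted
text and the new segments to the text set off in RED.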

As a next step, we are looking into the possibility of sorting this
report by confidence percentage, so that near matches in the 90% range
would be listed first (and sorted A-Z within that group). This might
take some extra finagling from our programming team to implement
successfully, but we will keep you updated on our progress.
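
As a rough illustration of that ordering, the sketch below groups near
matches into 10-point confidence bands, lists the highest band first,
and sorts A-Z within each band. The band width and the function name
confidence_band are assumptions made for the example; the rows are
taken from the sample report above.

```python
# (confidence %, near-match heading) pairs from the sample report
rows = [
    (99.5, "$a Allen, Junius Mordecai, $d 1875-1906."),
    (56.5, "$a Allen, Junius, $d 1898-1962"),
    (97.1, "$a Adventure and adventurers $v Fiction"),
    (70.6, "$a Adventure and adventurers $z Europe $v Biography"),
    (99.5, "$a Old West (Alexandria, Va.)"),
    (99.5, "$a Old West Lawrence Historic District (Lawrence, Kan.)"),
]


def confidence_band(confidence):
    """Round a confidence percentage down to its 10-point band (assumed width)."""
    return int(confidence // 10) * 10


# Highest band first, then alphabetical (case-insensitive) within a band
ordered = sorted(
    rows, key=lambda row: (-confidence_band(row[0]), row[1].casefold())
)

for confidence, heading in ordered:
    print(f"{confidence_band(confidence):>3}%+  {confidence:5.1f}%  {heading}")
```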

While the higher-confidence near matches are useful for letting you know
what may actually be a valid match, the lower-confidence matches are
also useful for identifying (or dismissing) headings for which no real
near match exists. Every unmatched heading will have two near matches
listed underneath it, even if those near matches are very low
probability (less than 5%). This is due to how our algorithm is set up
to generate these near matches for the report.
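
The sketch below shows why a "best two" approach always produces two
near matches, however weak they are: the candidates are simply ranked
and the top two kept, with no minimum score. It assumes difflib's
similarity ratio as a placeholder for our confidence level, and the
function name and candidate list are hypothetical.

```python
import heapq
from difflib import SequenceMatcher


def top_two_near_matches(heading, authority_headings):
    """Return the two best-scoring (score %, heading) pairs.

    Assumption: difflib's ratio stands in for the real confidence
    level. No minimum score is applied, so two results come back
    even when neither is a plausible match.
    """
    scored = [
        (
            SequenceMatcher(None, heading.lower(), candidate.lower()).ratio() * 100,
            candidate,
        )
        for candidate in authority_headings
    ]
    return heapq.nlargest(2, scored)


# Hypothetical candidate list; only the first resembles the heading
candidates = ["Old West (Alexandria, Va.)", "Zoology", "Quantum chemistry"]
results = top_two_near_matches("Old West", candidates)
for score, candidate in results:
    print(f"{score:5.1f}%  {candidate}")
```

The second result here is unrelated to the heading but is still listed,
which is the behavior described above for very low probability near
matches.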

This report is called:

R00 - Near Match Report.htm

Please feel free to contact your project managers in order to request
that we start delivering this report with your Current Cataloging
results (at no extra cost):

Judy Archer (email
<mailto:jarcher@bslw.com?subject=R00%20-%20Near%20Match%20Report> )

Stephanie Hansen (email
<mailto:shansen@bslw.com?subject=R00%20-%20Near%20Match%20Report> )

We will still be delivering R07 (Unmatched Headings) and R10 (Multiple
Authority Matches), so the R00 - Near Match Report won't replace those
yet. But since every unmatched heading will have two near matches listed
underneath it, we do want to point out that the report can be quite
large, depending on the size of your Current Cataloging file (and its
matching results).

We welcome your feedback!

Nate Cothran - nate@bslw.com
<mailto:nate@bslw.com?subject=Automation%20Services%20-%20Query> 

Product Manager, Automation

  Backstage Library Works

  533 E 1860 S, Provo UT 84606

(p) 801.342.5697 - (f) 801.356.8220

www.ac.bslw.com/community/blog <http://ac.bslw.com/community/blog/>