INDEXIA BLOG

AI Can Create Book Indexes: Here's Why

Ben Vagle

There has never been an objective way to measure the quality of a back-of-book index. The field's most prestigious recognition, the ASI Excellence in Indexing Award, is decided by a judging panel that rates indexes on subjective criteria such as "elegance." No system yet exists for auditing index quality, reproducing an assessment, or comparing quality across books.

This gap has been a problem for us at Indexia, an AI book-indexing service, because it makes it hard to demonstrate that AI-generated indexes match the quality of human-generated ones. We know our service produces great indexes, but we needed a way to show it.

Will Dinneen, my partner in designing Indexia, and I believe the answer lies in agent benchmarks: platforms that evaluate how well AI systems perform economically valuable work. Benchmarks have been central to measuring AI progress in fields ranging from software engineering to, more recently, law.

To that end, we're announcing the Indexing Standards Benchmark (ISB), the first auditable, rule-grounded evaluation of back-of-book indexes.

The ISB measures an index's compliance with the leading industry standards for back-of-book indexes: ISO 999:1996, ANSI/NISO Z39.4-2021, and Chapter 18 of the Chicago Manual of Style, 15th edition. We extracted hundreds of atomic rules from each of these documents and turned each rule into a tailored evaluation prompt with an explicit unit of analysis. We then used Indexia to index public-domain books and evaluated those indexes against the ISB.
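To make the scoring concrete, here is a minimal Python sketch of how rule-level verdicts could roll up into per-standard compliance percentages. The `Rule` and `Verdict` types and the `compliance_by_standard` function are hypothetical. The sketch assumes each rule evaluation returns pass, fail, or not applicable, and that a standard's score is simply the share of applicable checks that pass; the ISB's actual aggregation may differ, and the evaluation prompts themselves are not modeled here.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    NOT_APPLICABLE = "n/a"  # the rule has nothing to check in this index


@dataclass(frozen=True)
class Rule:
    standard: str          # e.g. "ISO 999", "Z39.4", "CMOS 15"
    section: str           # clause the atomic rule was extracted from
    text: str              # the rule, stated as one checkable requirement
    unit_of_analysis: str  # what the evaluator inspects, e.g. "cross-reference pair"


def compliance_by_standard(results: list[tuple[Rule, Verdict]]) -> dict[str, float]:
    """Share of applicable rule evaluations that passed, per standard (assumed scheme)."""
    passed: dict[str, int] = defaultdict(int)
    applicable: dict[str, int] = defaultdict(int)
    for rule, verdict in results:
        if verdict is Verdict.NOT_APPLICABLE:
            continue  # inapplicable rules count neither for nor against
        applicable[rule.standard] += 1
        if verdict is Verdict.PASS:
            passed[rule.standard] += 1
    return {std: passed[std] / applicable[std] for std in applicable}


xref = Rule("ISO 999", "4(f)", "Link technical and common names reciprocally.",
            "cross-reference pair")
names = Rule("ISO 999", "7.3.1.2", "Invert personal names surname-first.",
             "personal-name heading")
print(compliance_by_standard([(xref, Verdict.PASS), (names, Verdict.FAIL)]))
# {'ISO 999': 0.5}
```

Excluding not-applicable verdicts keeps a book with, say, no personal names from being rewarded or penalized for rules it never exercises.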

The results

Across the indexes it generated for public-domain books, Indexia achieves 95 percent compliance with the ISB.

Scores across the three standards converge tightly: CMOS 15 = 94.7%, ISO 999 = 94.09%, Z39.4 = 95.3%, a spread of about 1.2 percentage points. In short, none of the three standards proved markedly harder for Indexia to satisfy than the others; its output meets all three publishing traditions at nearly equal rates.

Indexia ISB compliance across ISO 999, ANSI/NISO Z39.4, and CMOS 15: all three scores fall within about 1.2 percentage points of each other, near 95%.
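The per-standard figures reported above can be checked with a few lines of Python. The spread between the highest and lowest score works out to just over one percentage point, with a mean near 94.7 percent. The dictionary simply restates the reported numbers; nothing else is assumed.

```python
# Per-standard ISB compliance for Indexia, as reported (percent).
scores = {"CMOS 15": 94.7, "ISO 999": 94.09, "Z39.4": 95.3}

spread = max(scores.values()) - min(scores.values())  # highest minus lowest
mean = sum(scores.values()) / len(scores)
print(f"spread = {spread:.2f} points, mean = {mean:.2f}%")
# spread = 1.21 points, mean = 94.70%
```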

The benchmark also lets us break down Indexia's performance by category of index quality. Indexia performs especially well at producing well-formed terms, formatting headings correctly, and generating subheadings. Here is how a single rule check works, using cross-references as the example:

Rule (ISO 999 §4(f) — Indicate relationships between concepts): An index should link technical and common-language names for the same referent, reciprocally.

Example book (Lyell, Principles of Geology, Vol. II): barnacles ↔ Balani; Capuchin monkey ↔ Cebus; rock pigeon ↔ Columba livia; argillaceous strata ↔ argillaceous substratum.

Verdict: PASS — common names and Latin binomials are paired in both directions, so a reader looking up either lands at the same content.
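The reciprocity requirement in this rule is mechanical enough to sketch. The following Python is a simplified, hypothetical check, not the ISB's evaluation prompt. It assumes the index's cross-references have already been extracted into a mapping from each heading to the headings it points to, that the common-name/technical-name pairs to test are supplied by the caller, and that "reciprocal" means each heading in a pair cross-references the other (a stricter, simpler reading than "lands at the same content"). The function name `unreciprocated` is illustrative.

```python
def unreciprocated(
    xrefs: dict[str, set[str]], pairs: list[tuple[str, str]]
) -> list[tuple[str, str]]:
    """Return the (common, technical) pairs not cross-referenced in both directions."""
    missing = []
    for common, technical in pairs:
        forward = technical in xrefs.get(common, set())   # common -> technical
        backward = common in xrefs.get(technical, set())  # technical -> common
        if not (forward and backward):
            missing.append((common, technical))
    return missing


# Hypothetical extraction of two of the Lyell pairs: heading -> cross-reference targets.
xrefs = {
    "barnacles": {"Balani"},
    "Balani": {"barnacles"},
    "Capuchin monkey": {"Cebus"},
    "Cebus": {"Capuchin monkey"},
}
print(unreciprocated(xrefs, [("barnacles", "Balani"), ("Capuchin monkey", "Cebus")]))
# []
```

An empty result corresponds to a PASS verdict; any returned pair is a one-way or missing link that a reader approaching from the other name would not find.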

Indexia ISB performance by index-quality category: strongest on term form, heading formatting, and subheadings.

The benchmark has also flagged areas where Indexia falls short, for instance in correctly inverting names with particles such as "von" and "de la." It revealed patterns as well: the lowest-scoring books were concentrated among pre-1920 titles, whose period naming conventions and dense webs of related terms depart from modern practice. Even in those books, however, compliance is broad:

Rule (ISO 999 §7.3.1.2 — Form of personal name headings): Names are inverted surname-first, with particles ("von", "de", "St.") placed by national convention and royalty given a title and territorial qualifier.

Example book (Maine, Ancient Law): Austin, John; Bentham, Jeremy; Aquinas, Thomas, St.; Grotius, Hugo; Ihering, Rudolf von; Coulanges, Fustel de; Edward II, King of England.

Verdict: PASS — every sampled name is inverted, particles sit where their national tradition puts them, royalty gets the qualifier.
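Once a name's parts and the applicable convention are known, the heading forms in the Maine example can be produced mechanically. The sketch below is hypothetical: `personal_name_heading` and `royal_heading` are illustrative helpers, and whether a particle leads the heading or follows the forenames is passed in by the caller rather than inferred, because deciding the national convention is the hard part the rule describes. It does not handle saints or compound surnames.

```python
def personal_name_heading(
    forenames: str, surname: str, particle: str = "", particle_leads: bool = False
) -> str:
    """Invert a personal name surname-first.

    The caller decides, by national convention, whether the particle
    leads the heading or follows the forenames; this helper only formats.
    """
    if not particle:
        return f"{surname}, {forenames}"
    if particle_leads:
        return f"{particle} {surname}, {forenames}"
    return f"{surname}, {forenames} {particle}"


def royal_heading(regnal_name: str, title: str, territory: str) -> str:
    """Royalty is not inverted; the name takes a title and territorial qualifier."""
    return f"{regnal_name}, {title} of {territory}"


print(personal_name_heading("Jeremy", "Bentham"))                  # Bentham, Jeremy
print(personal_name_heading("Rudolf", "Ihering", particle="von"))  # Ihering, Rudolf von
print(royal_heading("Edward II", "King", "England"))               # Edward II, King of England
```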

The human comparison

The ISB is not limited to Indexia's output: the same rule set can evaluate human-generated indexes. For a sample of the public-domain books used to build the benchmark, we extracted each book's original, human-generated index and scored it against the ISB.

Notably, Indexia's indexes outscored the books' original indexes. Across the four books in our benchmark that had human-generated indexes, the original indexes scored 84 percent in aggregate, roughly ten percentage points below the Indexia-generated indexes. The most common errors were missing cross-references, improperly entered names, and inconsistent capitalization across entries.

Many of the books we tested are old (which is why they are in the public domain), so their indexers may have followed conventions that differ from the current practices the ISB encodes. Nonetheless, the fact that Indexia automatically generates indexes that comply with current best practices more closely than the original human-made indexes in our sample speaks to the promise of our system. In short, Indexia can make high-quality indexes available, at low cost, for every book.

Public release

We plan to make the ISB publicly available so that indexers and software providers can assess the quality of their own indexes. Some of the source material behind the ISB's rules is proprietary, so we are working out how to release the benchmark in a way that respects existing copyright.

Looking ahead

The ISB holds great promise for book indexing and for Indexia. Because it both validates the quality of our existing indexes and gives us a clear path for further improvement, we are confident that Indexia, aided by the ISB, will continue to provide unparalleled automated book indexing.


A version of this essay originally appeared on Medium.