April 1, 2025

Format: Pulling the Index Together

Over the last few weeks, I’ve been exploring the components of an array: main headings and subheadings, locators, and cross-references. It’s now time to look at how these are all held together.

Formatting the index, by which I mean either run-in or indented format, along with punctuation, has to do with the visual structure of the index. It is what the reader sees on the page. An index is not written like conventional prose, with complete sentences, capitalization, and closing punctuation. Headings and subheadings are instead single words or short phrases which nonetheless need to be strung together in a coherent way that indicates relationships and is easy to search.

There are two ways that indexes are typically formatted: run-in and indented. Run-in format is more space efficient and common in scholarly books, where indexes tend to be dense and space can be at a premium. Indented format is easier to scan but more spread out, so it takes up more space.

When I write an index, the software I use, Cindex, allows me to easily switch back and forth between run-in and indented format. I find this a huge advantage because I prefer to index in indented format, which is, I suppose, my implicit acknowledgment that indented format is easier to work with. Once I have finished editing the index, one of my last steps, if the press requires run-in format, is to switch the format and export.

The difference between run-in and indented looks like this:

Run-in format

coffee: café culture and, 35, 64-66, 132; comparison to tea, 52-55; fair trade, 22, 32, 58, 89-90;
    geopolitics and, 43, 78; industrialization, 33-34, 88, 91-93; roasting, 43, 53, 67; varieties, 29,
    45-47. See also caffeine; tea

Indented format

coffee
    café culture and, 35, 64-66, 132
    fair trade, 22, 32, 58, 89-90
    geopolitics and, 43, 78
    industrialization, 33-34, 88, 91-93
    roasting, 43, 53, 67
    tea, comparison to, 52-55
    varieties, 29, 45-47
    See also caffeine; tea

As you can see, in run-in format the main heading, subheadings, and cross-references are strung together, one after the other. The main heading is separated from the subheadings by a colon, and subheadings are separated from each other by semicolons, with a period between the final subheading and the cross-references. This format allows the array to utilize the full length of the line; once the line is full, the array runs down onto the next line. Subsequent lines are indented to differentiate between arrays. This is what makes run-in format more space efficient, though having all of the lines run together does mean that the reader needs to scan more carefully.

In indented format, in contrast, each element of the array has its own line. The main heading is at the top with all subheadings following below. The subheadings are indented to differentiate between the heading and subheadings, and to differentiate between arrays. The cross-references, in this example, are placed at the very end, also indented. Because each line is indented, less punctuation is needed: no colons, semicolons, or periods. With the subheadings stacked on top of each other, they are easier to scan, though they also take up more lines.

For both run-in and indented formats, commas are used to separate headings and subheadings from locators, and locators from each other.
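The punctuation rules above can be sketched in a few lines of Python. This is only an illustration of the two layouts, not how Cindex or any other indexing software works internally; the `Array` class and the `run_in` and `indented` functions are hypothetical names, and locators are kept as pre-formatted strings.

```python
from dataclasses import dataclass, field


@dataclass
class Array:
    """One index array: a main heading, its subheadings, and cross-references."""
    heading: str
    subheadings: list  # (subheading, locators) pairs, already in sorted order
    see_also: list = field(default_factory=list)


def run_in(array):
    """Colon after the heading, semicolons between subheadings, period before See also."""
    text = array.heading
    if array.subheadings:
        subs = "; ".join(f"{sub}, {locs}" for sub, locs in array.subheadings)
        text += f": {subs}"
    if array.see_also:
        text += ". See also " + "; ".join(array.see_also)
    return text


def indented(array, indent="    "):
    """One line per element; indentation replaces colons, semicolons, and periods."""
    lines = [array.heading]
    lines += [f"{indent}{sub}, {locs}" for sub, locs in array.subheadings]
    if array.see_also:
        lines.append(indent + "See also " + "; ".join(array.see_also))
    return "\n".join(lines)


coffee = Array(
    "coffee",
    [("fair trade", "22, 32, 58, 89-90"), ("roasting", "43, 53, 67")],
    ["caffeine", "tea"],
)
print(run_in(coffee))
print(indented(coffee))
```

Both functions use commas the same way, between a subheading and its locators and between the locators themselves; only the punctuation between elements changes.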

In my experience, some publishers, including scholarly presses, see the value of indented format and are committed to making sure that there is enough space. Indented format also works well for an index that doesn’t have many subheadings, as the space difference between run-in and indented will then be negligible. Otherwise, if the index contains a lot of subheadings and space is an issue, use run-in format to squeeze in as many lines as possible. Sometimes, if I know the publisher is open to either and I’m not sure how the index will fit, I’ll submit the index in both formats for the publisher to try and decide.

A few more points:

Capitalization should follow the text, and generally is only for proper nouns. I believe it used to be a convention to always capitalize headings and subheadings, as in a sentence, but that convention seems to have passed. The exceptions are the See and See also of cross-references, which are capitalized since they typically follow a period.
Adjust how subheadings are phrased, depending on the format. In run-in format, because subheadings run together, I try to ensure that the subheadings read naturally, without inversion or extra punctuation to chop up the phrase. While I still try to lead with the key term, there is already so much punctuation that I don’t want to make the array even more complicated. In indented format, in contrast, because the stacked subheadings are easier to scan, always lead with the key term. This may require inverting the phrase. For example, see the subheadings “comparison to tea/tea, comparison to” in the examples above. This difference also affects how the subheadings are sorted—another detail to pay attention to. The goal, as always, is to make the subheadings both easy to read and easy to find.
If you use dedicated indexing software, like Cindex, you don’t need to pay too much attention to formatting. The software does it for you. You do need to make sure, though, that you have chosen the correct settings. Having the software handle the formatting is a huge timesaver and frees up cognitive space to focus on analyzing the text and writing the entries. When I first started indexing, I typed everything out in a Word document and needed to manually create the format. That was an excellent way to intimately learn how format works, and I’m also thankful I don’t need to do that anymore.

The format is how the index appears on the page. It helps to pull all of the entries and arrays together in a way that is easy to read and search. Run-in and indented formats are both common and both have their pros and cons. While indexing software typically handles the formatting automatically, as the indexer, be aware of the options and of how the format affects how entries are written and how the index fits the reserved space. How the index appears on the page is also important, alongside the contents of the index, and you may find yourself called upon to advise the client on the best format for their particular index.

March 18, 2025

by Stephen

Signposts within the Index

Welcome back to the mini-series on the basic elements of an index! I previously discussed entries and arrays, main headings and subheadings, and locators. Today I am writing about cross-references.

Cross-references are a type of locator, but instead of directing readers to the text, cross-references redirect readers to a location within the index. Riffing off the metaphor of an index as a map, I like to think of cross-references as signposts within the index that ensure readers find the arrays they want.

Cross-references come in two types, See and See also. See references tell the reader, “Good effort, but the information you want is actually over there,” while See also references indicate, “If you liked this, maybe you will also like that.”

How to Use

Cross-references should always point to new information, which is to say, new locators that the reader hasn’t seen yet. If the array the cross-reference is pointing towards has identical locators, then it is a double-post, which is also good, but renders a cross-reference unnecessary.
Speaking of double-posts, cross-references and double-posts are kind of like cousins, complementary tools for making information accessible within the index. Double-posts are when identical information is placed in two or more locations in the index, such as both a main heading and subheading, to accommodate readers searching either way. They work best for shorter strings of locators that are easily duplicated. The downside to double-posts is that they can take up space, especially if subheadings are involved. In that case, a cross-reference is the quicker and more space efficient option.
Cross-references are usually from broader topics to more specific topics. For example, “computers. See also Apple; Microsoft.” The logic is that readers less clear on what they are looking for will likely begin with the broader term, whereas if they already have the specific in mind, they will skip the broader term and go straight to the specific. For those readers uncertain, the cross-reference prompts them that there is more to the topic that they may be interested in. Can cross-references also go in the opposite direction, from specific to broad? Yes, sometimes. I don’t want to create a hard-and-fast rule, but I usually find broad-to-specific to be more meaningful. Think carefully about which direction readers are likely to desire.
Building upon broad-to-specific, cross-references can also be used if a subheading needs to be hived off to create its own array. Say “Apple” is originally a subheading under “computers,” but as the index is created it accumulates a dozen locators, which you decide should be broken down into subheadings. Instead of using sub-subheadings under “computers,” turn “Apple” into its own array and use a cross-reference to redirect readers.
Cross-references can also be reciprocal. Say the book contains two or more related, but not quite identical, concepts. Cross-references are a good way to link these arrays, keeping in mind that readers should find something new in each.
Cross-references can link synonymous terms. Say the book uses the terms “film industry” and “movie industry,” and you decide to make “film industry” the preferred term in the index. A cross-reference should be used to redirect readers who search for “movie industry.” A cross-reference from “Hollywood” may also be a good idea, if the book is specifically about the American film industry.
Cross-references can also be generic. There may be a category of arrays that you want to point readers towards and there are too many to list. For example, in a work of US history that covers multiple presidencies, the following may be useful: “United States of America. See also specific presidents.” The assumption here is that readers will know the names of specific presidents, or will remember which are discussed in the book, and will be able to search accordingly.
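The first guideline above, that a cross-reference should lead to something new, lends itself to a quick automated check. The following is a rough sketch under simplifying assumptions: the index is a plain dictionary mapping headings to locator strings, ranges are compared as text rather than expanded, and the function name `check_see_also` and the sample data are made up for illustration.

```python
def check_see_also(index, source, target):
    """Classify a See also reference from the `source` array to the `target` array."""
    if target not in index:
        return "broken: target array does not exist"
    # Locators the reader would gain by following the cross-reference.
    new_locators = set(index[target]) - set(index.get(source, ()))
    if not new_locators:
        return "redundant: target adds no new locators"
    return "ok"


# Hypothetical data, not taken from a real index.
index = {
    "computers": ["12", "45-47"],
    "Apple": ["45-47", "88", "102"],
    "PCs": ["12", "45-47"],
}

print(check_see_also(index, "computers", "Apple"))      # ok
print(check_see_also(index, "computers", "PCs"))        # redundant: target adds no new locators
print(check_see_also(index, "computers", "Microsoft"))  # broken: target array does not exist
```

A "redundant" result is a prompt to look again rather than an error: the two arrays may simply be a double-post, in which case the cross-reference can be dropped.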

Formatting and Placement

Where to place the cross-reference is an interesting question. The cross-reference is usually placed at the end of the array, after the other locators and subheadings, if any. But cross-references can also be placed at the beginning, immediately after the main heading.

For example,

computers: economic advantages; history of; rare earth minerals and; semiconductors. See also Apple; Microsoft; TSMC

and

computers. See also Apple; Microsoft; TSMC: economic advantages; history of; rare earth minerals and; semiconductors

Both approaches have their advantages. Placing cross-references at the end allows the reader to first peruse the array to see if they can find what they want. If not, the cross-references are ready to offer other suggestions. Placing cross-references at the beginning allows readers to quickly see if they actually want to be elsewhere, before they dig into the subheadings. Placing cross-references at the end is more common, and I think is what most readers and publishers expect. However, for reference documents, especially, and for indexes with very long arrays, placing the cross-references at the beginning can be helpful for an audience that wants to search quickly.

Cross-references can also direct readers to specific subheadings. I rarely do this, but that may have to do with the types of books I tend to index. If you need to, the cross-reference can be phrased as either “statistics. See under economics” or “statistics. See also economics—statistics.”

Cross-references can also be attached to specific subheadings, instead of being gathered in a group at the beginning or end of the array. I didn’t use to do this, but the recent NISO indexing standard (ANSI/NISO Z39.4-2021) recommends doing so, which prompted me to give this more thought and to adjust my practice. There are two questions that guide where I place the cross-reference: 1) How specific is the cross-reference? Is it more connected to the main heading or to the subheading? 2) How long is the array? If short, I think readers will easily see the cross-references at the end. If long, then attaching the cross-reference to the subheading, if relevant, allows the reader to be redirected sooner.

In terms of formatting, See and See also are typically capitalized and in italics. The exception to the italics is when the heading being directed to is itself in italics. For example, “Austen, Jane. See Northanger Abbey; Persuasion; Pride and Prejudice.” Multiple cross-references are separated by semicolons. Cross-references are preceded by a period, but, if placed at the end of an array, no closing punctuation is needed. Cross-references attached to subheadings are in lower case and may be placed in parentheses to better differentiate them from the surrounding subheadings. For example, “literature: authorship, 34, 53, 122; figurative language, 45, 53 (see also metaphor; similes); poetry, 56-60, 132, 154; translation (see translation).”

Some indexes are thick with cross-references, an interlocking web redirecting readers. This may be due to the book using a lot of synonyms, or similar but different terms, or concepts for which the indexable term is not obvious. Other indexes contain just a handful of cross-references. Either way, the goal is to ensure that readers find the information they desire.

March 4, 2025

by Stephen

Cognitive Load and Indexing Oxford UP Titles

My original plan for today was to write about indexing Oxford University Press (OUP) titles, of which I recently indexed two. I will still reflect on OUP, but as I was writing this, I realized that my main issue with OUP’s system is its impact on cognitive load. So partway through I’m going to take a little detour to discuss the cognitive impacts of indexing.

The OUP System

Oxford University Press is unique among publishers, so far as I know, in that it uses a paragraph ID system for indexing. Each paragraph is assigned a unique ID, for example C2P34, which stands for chapter 2, paragraph 34. Each section is also assigned an ID (for example, C3S2), as is each figure (C4F5). In the index, these are used as locators instead of page numbers. When the proofs are finalized, OUP converts these IDs into the appropriate page numbers and ranges.
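To make the ID structure concrete, here is a small Python sketch that takes an ID apart. It is based only on the three example patterns above (C2P34, C3S2, C4F5); the regular expression, the `LocatorID` type, and the `parse_id` function are my own illustration and are not part of OUP's instructions or tooling.

```python
import re
from typing import NamedTuple


class LocatorID(NamedTuple):
    chapter: int
    kind: str    # "paragraph", "section", or "figure"
    number: int


# Assumed pattern: C<chapter> followed by P, S, or F and a number.
_ID_PATTERN = re.compile(r"C(\d+)([PSF])(\d+)")
_KINDS = {"P": "paragraph", "S": "section", "F": "figure"}


def parse_id(text):
    """Split an ID such as 'C2P34' into chapter, kind, and number."""
    match = _ID_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a recognized ID: {text!r}")
    chapter, letter, number = match.groups()
    return LocatorID(int(chapter), _KINDS[letter], int(number))


print(parse_id("C2P34"))  # LocatorID(chapter=2, kind='paragraph', number=34)
print(parse_id("C4F5"))   # LocatorID(chapter=4, kind='figure', number=5)
```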

For me, this system is sort of halfway between traditional back-of-the-book indexing and embedded indexing. I can use the paragraph IDs with my preferred software, Cindex, and I don’t need to fiddle around with embedding tags into the proofs or manuscript. The paragraph IDs also allow the press to output the index for both print and ebook versions.

Are paragraph IDs the best of both worlds? Depends who you ask, perhaps. I suspect some indexers already comfortable with embedding would prefer that OUP fully make that transition, and maybe embedded is better than this hybrid approach. I don’t write embedded indexes, so I can’t compare. Personally, I appreciate being able to use Cindex, though the IDs are not as easy to use as page numbers.

I found it an interesting experience indexing two OUP titles back-to-back. Despite freelancing for about twelve years, I have very little experience with OUP. I haven’t avoided them, per se, but neither have I actively sought out their books. I simply haven’t received many queries, at least until these last few months, when I probably received as many queries as I have in the previous twelve years. So it’s been a crash course for me, figuring out how best to handle the paragraph IDs.

OUP’s indexing instructions are comprehensive, explaining how they want the paragraph IDs used and formatted. So I won’t discuss all of that in detail. Instead, I want to discuss some of the challenges I had, along with some strategies that helped me through.

Impacts on Cognitive Load

The most significant challenge I had with the paragraph IDs was their impact on cognitive load.

Cognitive load is “the amount of information our working memory can process at any given time.” Working memory is the “small amount of information that can be held in mind and used in the execution of cognitive tasks.” These are the pieces of information that you are actively trying to keep in mind while performing a task.

Cognitive load and working memory are relevant concepts for indexing. When writing an index, I am identifying information in the text, deciding if it is indexable, determining how the information relates to other pieces of information, and then adding the entry to the index. All of that is happening within my working memory. Add in that I may read the entire paragraph before I make a decision, or I may read a few paragraphs ahead, and my working memory is suddenly bursting with potential entries waiting for me to decide whether or not—and how—to add to the index.

This is why I pick up entries as I see them. At most, I’ll read ahead a few pages before going back to add the entries I’ve identified. I am aware that if I read too far ahead, I begin to forget the specific details that I previously noticed. So I want to capture those entries right away and make space in my working memory for new information. If you are someone who prefers to mark up the text and type the entries later, underlining terms and making notes in the margins fulfills the same function. You are making notes about your decisions to refer back to later, to make room in your working memory.

When using page numbers for locators, I’ve gained a sense for the limits of my working memory and for when the cognitive load becomes too great. What I did not anticipate from indexing OUP titles is how much more the paragraph IDs added to my cognitive load.

The paragraph IDs added to my cognitive load in a few ways:

Paragraph IDs are longer and more complicated than page numbers. An ID which contains both numbers and letters is more to scan, remember, and type, compared to a digits-only page number.
There are far more paragraphs than pages. A typical book may have 200 pages and maybe 600 paragraphs, assuming three paragraphs per page. That is a significant increase in the number of unique locators to track and keep accurate.
Ranges are more common and lengthier. A range can occur on the same page, as in C1P34-C1P35. Or the page span 84-88 may be represented, in paragraphs, as C3P43-C3P56. Because ranges are so prevalent, I found myself constantly scanning ahead, even on the same page, to see if I needed to add a range. For ranges that spanned a few pages, I found myself more focused on identifying the correct paragraph IDs than I was on the contents of the paragraphs. Perhaps I haven’t acclimatized yet to paragraph IDs, but determining a range that spans 14 paragraphs somehow felt like more work than a range that spans 5 pages, even though both ranges are for the same amount of text.
Navigating the PDF proofs is more difficult with paragraph IDs, especially when I’m editing the index and want to refer back to the text to double-check an entry. The locator does not tell me the page, and so I can’t use my usual keystroke shortcut to jump from page to page in the PDF reader. Instead, I need to use the search function. As I mentioned above, typing the ID is more work, as it contains both letters and numbers. Searching for the ID also means that I can’t use the search function to simultaneously search for whichever name or term I want to check, which also makes searching the PDF more cumbersome.
Indexing endnotes is more tedious and time consuming. In OUP’s system, the note number is appended to the paragraph ID where the note originally appears. As in, C1P45 n.27. This means going back to the chapter and searching to find the in-text note number so I know which paragraph ID to use, while trying to remember what the note was about in the first place.
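The range arithmetic in the list above is easy to check with a short script. This sketch assumes a range is written with a plain hyphen between two paragraph IDs in the same chapter, as in the examples; the function name `paragraphs_in_range` is hypothetical.

```python
import re

_PARAGRAPH = re.compile(r"C(\d+)P(\d+)")


def paragraphs_in_range(locator):
    """Count the paragraphs covered by a range such as 'C3P43-C3P56'."""
    start_text, end_text = locator.split("-")
    start = _PARAGRAPH.fullmatch(start_text)
    end = _PARAGRAPH.fullmatch(end_text)
    if start is None or end is None:
        raise ValueError(f"not a paragraph range: {locator!r}")
    if start.group(1) != end.group(1):
        raise ValueError("this sketch only handles ranges within one chapter")
    # Both endpoints are included, hence the + 1.
    return int(end.group(2)) - int(start.group(2)) + 1


print(paragraphs_in_range("C3P43-C3P56"))  # 14
print(paragraphs_in_range("C1P34-C1P35"))  # 2
```

The same five-page discussion that is written as 84-88 with page numbers becomes a 14-paragraph span here, which is part of why the ranges feel heavier to track.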

Tips for Handling Paragraph IDs

So I’m not a huge fan of OUP’s paragraph IDs. They are more work, though is it really so much more work? Yes and no.

A single paragraph ID is not that big of a deal. It maybe adds a few seconds extra to the work. The problem is that the book contains hundreds of paragraph IDs. The index likely contains at least a thousand locators. All of these add up, to the point where I, at least, start noticing that the work is taking longer and that I’m mentally juggling more than usual.

I am still able to use Cindex, my preferred software, and my indexing process mostly remains the same. But I did have to reset my expectations for how long the work would take, and I made a couple of adjustments to how I worked.

Take a deep breath and accept that the work will take a little longer. For me, this was especially true when editing the index, due to how awkward it was to navigate the PDF proofs.
Multiple passes. When drafting, I found it helpful to make multiple passes, going over the same paragraphs or section a couple of times. One pass would be to determine the broader discussions and where ranges needed to begin and end. Occasionally, if relevant, I used a section ID for the entire section rather than fiddling with a range. Another pass would be to pick up smaller details, like names, that didn’t need a range. This isn’t a new strategy for me, as I often do this when the text is particularly dense or confusing and I want to have a better understanding of the text before I begin typing up entries. But due to how focused I was on ranges and ensuring accuracy with the paragraph IDs, I used this strategy a lot more to make sure I was picking up both the content and the locators. There was too much to focus on in one pass.
Duplicate the endnotes. It was time consuming and frustrating to flip back and forth between the endnotes and the chapter, trying to find the paragraph where the note is found while also trying to remember what the note is about. Much easier to create a duplicate PDF of the endnotes (printing the notes would also work), so that I can see and compare the notes and chapter in parallel.
Search for partial IDs. When searching the PDF for paragraph IDs, I found it usually worked just as well to omit the initial C. So, to search for 3P34 instead of C3P34. A small detail, but I felt like it was a tiny bit quicker.
Charge a little extra. For the extra time and work, I did charge more for these indexes. I was also upfront with my clients about this. I think it is fair to be compensated for extra work.

Will I index an OUP title again? Yes. I’m actually in discussions with another potential client.

Will I go out of my way to find OUP titles to index? No.

I do appreciate that OUP wants to include the index in ebooks. And the paragraph IDs are a good approach, in theory. I just wish that the IDs weren’t so awkward to use, and that there weren’t so many of them. Indexing is already cognitively taxing, and adding to that load isn’t helpful. But with some forewarning and tweaks to my approach, indexing OUP titles is very doable.

January 21, 2025

by Stephen

The Building Blocks of an Index

An index is a document that is scanned to find information. It usually spans several pages. But if you had to break an index down into its smallest part, what would that be?

An index is not like most books or documents in that it does not contain a narrative. It cannot be reduced to plot points or the components of an argument. An index doesn’t even contain proper sentences. Instead, an index is a compilation of references. Broken down, the smallest unit within an index is an entry.

An entry has two components. Basically, “what this thing is + where to find it.” Using indexing terminology, this is “main heading + locator.” Or, to add another level of specificity, “main heading + subheading + locator.” From the entry, the reader can identify what they are looking at and where to find it in the text. For example,

Foxconn, 45

semiconductors: geopolitics of, 67

The second building block is an array. I like to think of an array as containing everything that an index—and by extension the book or document—has to say about a particular subject. If you want to learn about Foxconn, you search for the Foxconn array. Want to learn about semiconductors, you search for the semiconductors array.

If there is only one mention, then a single entry can serve as a single array. But more often, there are multiple discussions throughout the book, which lead to the creation of multiple entries. Combined together, the entries create an array.

Foxconn, 45, 49, 51-52

semiconductors: fabrication techniques, 54-57; geopolitics of, 67; history of, 23-25; properties of, 34, 44

Why are entries and arrays so important? No one writes an index composed of a single entry.

But every index begins with an entry, and as the index is written, the entries and arrays accumulate. It is through knitting the entries and arrays together that an index emerges.

Step one to writing an index is to write clear, concise, and specific entries, so that “what this thing is” is clear to the reader. Step two is to combine entries into arrays which are clearly organized and easy to scan. Step three is to sort and organize the arrays—creating the structure of the index—so that the index as a whole is easy to navigate.
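For readers who think in data structures, the relationship between entries and arrays can be sketched in Python. This is an illustration only: each entry is modeled as a (main heading, subheading, locator) tuple, with `None` standing in for "no subheading", and the function names `build_arrays` and `render_run_in` are hypothetical.

```python
from collections import defaultdict


def build_arrays(entries):
    """Group individual entries into arrays keyed by main heading."""
    arrays = defaultdict(lambda: defaultdict(list))
    for heading, subheading, locator in entries:
        arrays[heading][subheading].append(locator)
    return arrays


def render_run_in(heading, subheadings):
    """Render one array in run-in style, with subheadings sorted alphabetically."""
    own_locators = subheadings.get(None, [])
    text = ", ".join([heading, *own_locators])
    named = sorted((sub, locs) for sub, locs in subheadings.items() if sub is not None)
    if named:
        text += ": " + "; ".join(f"{sub}, {', '.join(locs)}" for sub, locs in named)
    return text


entries = [
    ("Foxconn", None, "45"),
    ("semiconductors", "history of", "23-25"),
    ("semiconductors", "properties of", "34"),
    ("semiconductors", "properties of", "44"),
    ("Foxconn", None, "49"),
    ("Foxconn", None, "51-52"),
    ("semiconductors", "fabrication techniques", "54-57"),
    ("semiconductors", "geopolitics of", "67"),
]

arrays = build_arrays(entries)
for heading in sorted(arrays, key=str.casefold):
    print(render_run_in(heading, arrays[heading]))
```

The entries are picked up in the order they appear in the text; grouping them by heading is what turns a pile of entries into the two arrays shown earlier.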

These elements, the entry and the array, fit together like interlocking pieces to create a coherent whole.

A Note about Terminology

I’ve noticed that not every indexer, or every book about indexing, distinguishes between entries and arrays. I’m guilty myself of using the terms interchangeably, though I try to be clear when I’m writing.

But while terminology varies, I do think that the distinction is important. Because an index is composed of hundreds or thousands of pieces of information, it helps to know what these pieces are and how they interact with each other. An index is also easier to edit and organize if these building blocks are clearly written and well thought out.

As you index, how are these building blocks fitting together? How can you be more mindful of each piece of information and how it interacts with the entries and arrays around it? Does it make a difference to think about indexing as building up from the smallest unit to the larger whole?

November 20, 2024

by Stephen

Making the Index Invisible

So the 18th edition of the Chicago Manual of Style dropped in September. I have to admit I have not bought a copy. While I think their recommendations are solid, I find I don’t use it very much, since I only index and do not edit. But I do know some editors who are very excited about the new edition, and there has been chatter among indexers as well on the changes to the chapter on indexing.

The main change with regard to indexing is 15.66, which states:

Chicago now prefers the word-by-word system of alphabetization over the letter-by-letter system (but will accept either in a well-prepared index).

I think this change makes sense.

I personally most notice the difference in sorting when indexing Asian studies books, where I tend to see a lot of surnames like Chen, Kim, and Liu. Being so short, these names often get mixed up with other headings when sorted letter-by-letter, whereas I think the index is easier to scan if all of the surnames are sorted together. I’ve also received instructions from a scholarly press to sort the index letter-by-letter except for the names, which the press wants force-sorted word-by-word. Which raises the question: why not sort the entire index word-by-word?

For example, here is a comparison of letter-by-letter and word-by-word sorting.

Letter-by-letter sorting

Liang Ji

Liang Qichao

Libailiu (Saturday)

Li Boyuan

Li Chen

Life Weekly

Lin Meijing

Li Shirui

List, Friedrich

Liu Denghan

Liu Jiang

Liu, Jianmei

Liu, Lydia

Liushou nüshi (Those Left Behind; film)

Liuxuesheng (overseas Chinese students)

Liu Yiqing

Li Yuanhong

Word-by-word sorting

Li Boyuan

Li Chen

Li Shirui

Li Yuanhong

Liang Ji

Liang Qichao

Libailiu (Saturday)

Life Weekly

Lin Meijing

List, Friedrich

Liu Denghan

Liu Jiang

Liu, Jianmei

Liu, Lydia

Liu Yiqing

Liushou nüshi (Those Left Behind; film)

Liuxuesheng (overseas Chinese students)

The word-by-word sorting, for me, is a lot easier to scan and parse when like surnames are grouped together, and when names are sorted together above other terms. It makes me confident that I am seeing all of the names present, rather than being concerned that I am missing a name that is buried below.
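The two orderings can be reproduced with a pair of sort keys in Python. This is a simplified sketch rather than the full CMOS rules or what indexing software actually does: it drops a trailing parenthetical gloss, ignores the inversion comma, and does not handle hyphens, numbers, or other special cases. The function names are hypothetical.

```python
def _base(heading):
    """Lowercase the heading, dropping any parenthetical gloss and the comma."""
    return heading.split(" (")[0].replace(",", "").casefold()


def letter_by_letter_key(heading):
    """Ignore spaces entirely: 'Li Boyuan' sorts as 'liboyuan'."""
    return _base(heading).replace(" ", "")


def word_by_word_key(heading):
    """Compare one word at a time: a short word sorts before longer words that start with it."""
    return tuple(_base(heading).split())


headings = [
    "Liang Ji",
    "Libailiu (Saturday)",
    "Li Boyuan",
    "Life Weekly",
    "Li Shirui",
    "Liu Jiang",
    "Liu, Jianmei",
    "Liushou nüshi (Those Left Behind; film)",
    "Liu Yiqing",
    "Li Yuanhong",
]

print("Letter-by-letter:")
for heading in sorted(headings, key=letter_by_letter_key):
    print("   ", heading)

print("Word-by-word:")
for heading in sorted(headings, key=word_by_word_key):
    print("   ", heading)
```

With the word-by-word key, every heading whose first word is "Li" sorts ahead of "Liang" and "Libailiu", which is what groups the surnames together.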

Also note that the Liu names are sorted according to the clarified 15.85, which states:

When the same family name is inverted for one person but not for another (e.g., “Li Jinghan” and “Li, Lillian”), the names may be listed together and alphabetized by first names regardless of the comma.

This also makes a lot of sense and has been my practice for a long time. By ignoring the comma, the second portion of the name is treated equally for all names, whereas if the comma is taken into account, all the names with commas sort to the top and may cause some names to appear out of order. For example,

Liu, Jianmei, 48

Liu, Lydia, 91

Liu Denghan, 148n6

Liu Jiang, 105

Liu Yiqing, 27, 144n13
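The effect of the comma can also be shown with two sort keys. In this sketch, the comma-aware key treats the comma as the end of the first sort element, which is one plausible way for sorting rules to behave and is an assumption on my part; the comma-ignoring key follows the approach described in 15.85. Locators are left out, and the function names are hypothetical.

```python
def comma_aware_key(name):
    """Treat the comma as a break: 'Liu, Lydia' becomes ('liu', 'lydia')."""
    return tuple(part.strip().casefold() for part in name.split(","))


def comma_ignoring_key(name):
    """Drop the comma so inverted and uninverted names sort on equal terms."""
    return name.replace(",", "").casefold()


names = ["Liu Denghan", "Liu Jiang", "Liu, Jianmei", "Liu, Lydia", "Liu Yiqing"]

print(sorted(names, key=comma_aware_key))
# ['Liu, Jianmei', 'Liu, Lydia', 'Liu Denghan', 'Liu Jiang', 'Liu Yiqing']

print(sorted(names, key=comma_ignoring_key))
# ['Liu Denghan', 'Liu Jiang', 'Liu, Jianmei', 'Liu, Lydia', 'Liu Yiqing']
```

With the comma-aware key, the bare element "liu" sorts ahead of "liu denghan", so the inverted names float to the top, as in the example above.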

For another interesting comparison, as a colleague pointed out, try looking for the sorting differences in the indexes between the 17th and 18th editions of the CMOS. And if you’d like to see a full list of the changes to the indexing chapter in CMOS 18, see here.

So will I now unilaterally switch to word-by-word sorting for all of my clients who request that the index follows CMOS? I don’t think so, unless I think that the index will really benefit. I think it is better if I first ask my clients if they want to change, so we are both on the same page and I am not springing a surprise on them. And, to be honest, for most indexes I don’t think that the difference between word-by-word and letter-by-letter sorting will be that noticeable.

This brings me to my larger point, which is that the mechanics of a well written index should be invisible to the user. I doubt that any reader will browse the index and think, “I wonder what the alphabetical sort is?” That is not the reader’s concern. What the reader cares about is quickly finding information.

To facilitate finding information, every aspect of the index needs to work together. This includes the sorting, the structure, term selection, phrasing, and cross-references. When it works, the reader shouldn’t notice how the index works because the reader is too busy digging into the book. When the index does not work, that is the point when the reader is pulled out of the index and is frustrated at their inability to access the information they want. The reader may not be able to articulate why the index is not working, but something about the contents and mechanics of the index is wrong.

Bringing this back to sorting, for many indexes the difference will be negligible between letter-by-letter and word-by-word sorting. As CMOS states, they will accept either in a well-prepared index. For other books, like for me with Asian studies titles, the difference will be more pronounced.

When indexing, pay attention to when the difference matters. Make decisions based on what will make the user experience the most seamless. Pay attention to how the different elements of the index fit together. Striving to make the index invisible may be an odd way to think about indexing, but to be invisible means that the index works, which is what we ultimately want for our readers.

November 5, 2024

by Stephen

Paying Attention to Terminology

I am writing today about some decisions that I needed to make on a recent index. In the grand scheme of the index, these decisions only affected a few entries. I am tempted to brush these off as not very important and not worth discussing. Yet much of indexing is about paying attention to the details without getting lost in the details. And I think this is a unique situation that illustrates an important point about term selection. At least, it made me sit up and think carefully as I was working.

A good index encapsulates two different goals, which can sometimes seem like they are in opposition to each other. The index needs to be both a reflection of what the author has written and an attempt to clearly communicate with the reader. Lose one of these aspects, and the index ceases to function.

Term selection is key to achieving both of these goals. The terms used in the index need to match both the text and how the reader is likely to search. Ideally the author and the reader are in alignment, but sometimes the author uses different language than what the reader might expect. In those situations, the index may need to bridge the gap.

I recently ran into this issue when writing the index for Saint Paul the Pharisee: Jewish Apostle to All Nations, by Father Stephen De Young (Ancient Faith Publishing, 2024).

If you are familiar with Christianity, the title may be a hint that the author is taking a different tack with terminology. While Paul was a Pharisee prior to his conversion, he is now more commonly known as the Apostle Paul, or Paul the Apostle. Yet here Fr. Stephen is emphasizing Paul’s Jewishness.

In the book’s Introduction, Fr. Stephen addresses this question of terminology:

Throughout this book, I have deliberately eschewed certain language. This language is certainly acceptable and has become the usual language of the Church. However, familiar terminology can sometimes be misleading. By using the word Messiah instead of Christ, community instead of church, or Torah instead of law, I hope to unsettle commonly held notions and help the reader reassess Paul in his historical context, rather than project the experience of present-day Christians into the past.

This shift in terminology also extends to names, which is where I noticed the biggest difference with regard to the index.

In addition to “Paul the Pharisee,” Fr. Stephen also frequently refers to Paul by his former name, Saul of Tarsus. Jesus is referred to as “Jesus of Nazareth,” rather than Jesus Christ. A figure such as the Apostle John, also known as John the Evangelist, John the Theologian, or John the Divine, is here referred to as John, the son of Zebedee. None of these names are incorrect, but they are names that are less commonly used. They support the author’s focus on Paul and the early Church’s Jewish context and alert readers that the author is taking a different approach.

From an indexing standpoint, do I follow Fr. Stephen’s lead? By using these names, I would provide continuity with the text and reinforce the point that Fr. Stephen is trying to make. But will readers still recognize these names in the index, outside of the context of the text? I am not helping anyone if I include names and terms that readers are unlikely to recognize.

In the end, I decided to lean into the author’s terminology. Christians form the primary audience for this book and, I assume, are familiar enough with these Biblical figures, even if these are not the names typically used.

Paul I simply indexed as “Paul.” Since he is the subject of the book, I decided a gloss was unnecessary. I also included a See cross-reference from Saul of Tarsus, for any readers looking under Saul and to keep all discussions of Saul/Paul in a single array.

I indexed Jesus as “Jesus of Nazareth,” with a subheading for “as Messiah,” to reflect how the author discusses Jesus. I indexed the other Biblical figures as is (“Peter,” “Silas,” “Timothy,” etc.) except when a gloss or tag was needed to disambiguate (for example, “James, brother of the Lord” and “James, son of Zebedee”). This is again following the author’s approach and trusting that readers will recognize these names.

I did, however, include glosses for several of the provinces and cities discussed, such as “Achaia (province)” and “Perge (city),” especially the less well-known places (I didn’t include glosses for cities like Athens and Rome). This may not have been necessary, but I personally like knowing where things are and what things are, so as a reader I would have appreciated the differentiation.

As I wrote at the beginning, these names form a small proportion of the overall index. Was it really worth spending time considering how best to balance the author’s approach versus reader expectations? There are plenty of other discussions in the book, such as discussions about Paul’s missionary journeys, the history of the early Church, and theological issues that Paul addresses in his epistles, that I also wanted to get right.

And yet names matter and terminology matters. The index would have presented a different message if I had used more conventional names for these figures and the index would have appeared disjointed from the text. Writing a good index is often about paying attention to the details so that the entire index works together as a whole and in conjunction with both the text and readers. The trick is to see both the details and the whole. It can be easy to lose sight of the big picture.

For this book, while the author opted to shift the terminology to make a point, I decided that most readers would still be able to follow along in the index. I didn’t need to include much in the way of signposts and clarifications. But for other books, extensive use of cross-references and glosses may be necessary. While reflecting the text and the author’s intentions, the index also needs to be responsive to readers. Thankfully, we have tools to bridge that gap.

The first step, though, is paying attention to the language used by the author. The next step is considering the audience. Do the two match? From here you can select terms and write an index that is clear and recognizable to all.

October 8, 2024

by Stephen

My Index Editing Process

Last time I wrote about reading like an indexer and what it is I do and look for when reading a text and writing the rough draft of an index. Today I’d like to reflect on my editing process.

A few months ago I started tracking my time when I index. I had previously done so, but not effectively and I eventually gave up. This time, I’ve created a new system and a new spreadsheet that is much easier to use, and I am a lot happier with the results.

One of my insights so far is that I spend about an equal amount of time drafting and editing. I have to admit that this surprised me. I knew that editing took up a fair amount of time, but I didn’t realize that the time spent is often about 50/50. For some indexes, I actually spend a little more time editing, making the time split closer to 45/55 or even 40/60.

Reflecting further on my process, I tend to spread drafting the index over 3-6 days, depending on the length of the book, whereas I tend to edit within 2-3 days. When drafting, I am learning what the book is about. When editing, I am fully immersed in the index and I treat it more like a sprint. It probably also helps that by the time I get to editing, the deadline is looming.

I’m realizing that I also tend to draft quickly. I do try to write a fairly clean draft, taking into account context, clarity, and relevance, as I previously discussed. I believe in trying to set myself up for an easier edit. But I also know that this is not my final draft and that some things won’t become clear until I’ve read the whole book, and so I also try to keep moving.

Editing an index, for me, is both seeing the index as a whole and going through the index line by line. I like to give myself space between drafting and editing, which usually means sleeping on the draft and beginning to edit the next day. This helps to give me some distance so I can more clearly see the whole index with fresh eyes.

I usually begin by skimming the index, making note of the larger arrays for the metatopic and supermain discussions. This reminds me of the structure I am aiming for, and is a chance to consider if I want to make any major changes. I then start at the top of the index and work my way down, line by line. I know some indexers edit using multiple passes, each pass looking at a different element. I think I would go utterly cross-eyed and unable to make sense of the index if I tried multiple passes. Instead, my goal is to fully edit the array in front of me before I move on to the next. This may mean jumping around the index to also edit related arrays, and sometimes I will go back to re-edit an array if I change my approach, but generally speaking, I systematically move through the index.

With each array, I am first of all looking for clarity. Do the main heading and any subheadings make sense? If there are subheadings, I look to see if any can be combined or reworded, or if subheadings need to be added for unruly locators. I consider if anything needs to be double-posted, and check to make sure that is done properly. I consider and check cross-references. I investigate any notes I may have left for myself. I also spot-check a few locators to make sure I understood the text properly. I may also run a quick search of the PDF to see if I missed any references. I don’t check every locator, which I think would be very time-consuming. To a certain extent, I need to trust that my drafting process was thorough and accurate. But these spot checks do provide peace of mind and I do sometimes find errors.

Reviewing arrays with no subheadings is usually quick, unless I’ve left a note for myself or I decide to spot-check. Arrays with subheadings take more time. If an array has 20+ subheadings, I may spend twenty minutes or more making sure that the array is in order. I often find the larger the book, the larger the index, the more subheadings there will be, and the longer editing will take.

Considering my process, I do wonder if I can shave off time. I could spot-check a little less, especially for simple arrays with no subheadings, trusting that I picked up what was necessary. I could also pay more attention, when drafting, to larger arrays, so that editing them isn’t so onerous. I could also explore using more macros and patterns for batching tasks such as double-posting or removing subheadings. What I like about my process, though, is that it is thorough and I can clearly see what is completed and what is still to come. Editing line by line helps to keep my thoughts in order.

Other Approaches to Editing

My approach to editing is not the only approach, of course. I’ve mentioned making multiple passes. I also know of indexers who do a quick edit at the end of each day, while drafting, so that the draft is cleaner. I’ve also heard from indexers who say that they do such a thorough job drafting that the editing process only takes them a couple of hours. I don’t know how that works for them. I seem to need a lengthier editing process for the index to gel and come together. And that’s okay. We are all different. What matters is that you find a process that works for you.

I find it interesting to hear how others index, even if it is not something I would do myself. I hope this glimpse into my process gives you something to think about.

September 17, 2024

by Stephen

Reading Like an Indexer

So you are sitting down to write an index. You scroll to the first page in the PDF, or, if you’ve printed out the proofs, you place the first page on the desk in front of you, and then…what? What is your thought process? How do you decide what entries to extract? How do you read?

Reading to index is different than reading to edit, reading to learn, or reading for pleasure. I think of reading to index as a process of disassembly. I try to identify how the author has written and structured the text, and I then pull apart all of those pieces, big or small, and reassemble them into the form of an index. This is very much an active reading, in which I am identifying, analyzing, and making decisions.

I generally look for two types of information when I draft an index.

Specific details. These are names, places, companies, concepts, etc. that are explicitly mentioned and discussed. These are usually fairly obvious. If there are a lot of names or other such details, I may index a few pages, pick up these details, and then go back and re-read to make sure I also understand the larger discussion.
Broader topics. These range from the metatopic—what the whole book is about—to supermain and regular discussions—both themes spanning the book and what specific chapters or sections are about. It is important to have index entries which correspond to these broader discussions, and so in addition to picking up specific details, I try to also understand the big picture. These broader topics are also tied to the structure of the index, as I consider how best to reflect the book’s structure in the index, and as I anticipate that these large discussions will become large arrays, anchoring the index. Depending on the book, as mentioned, I may need to read a section two or more times to properly mine all relevant entries.

Once I have identified the large and small pieces that the book is made of, I need to decide how to translate that into the index. Here are a few tips I find helpful to keep in mind.

Understand what you are reading. This may seem obvious, but I think it is worth stating. The temptation, at least for me, is to guess if I am unsure and to create an entry anyway. And sometimes guessing is the best I can do in that moment. I flag the entry for revisiting later and I move on. What can be more effective, though, is to read ahead a few pages until I do understand, and then go back and create the entry. It’s okay to be patient. Taking the time to understand can pay off later with better understanding of what comes next in the text and with less editing due to a stronger draft.
Place the information in context. Are you looking at a specific detail or a broader topic? How does the detail or topic relate to other details or topics? Can this be turned into a subheading? Should it be double-posted? Is a cross-reference necessary? What other entries does this suggest? While subheadings, cross-references, and double-posts can all be revisited later, when editing the index, I like to start thinking about them while writing the rough draft. The information in the book is an interconnected web, which the index should reflect. So as part of your thought process, get in the habit of looking for these connections.
Filter for relevance. In addition to understanding the larger context, also pay attention to relevance. Think about the audience before you begin writing the index. Consider how much space is available for the index. What should the index focus on? Sometimes I am not sure if an entry is relevant and so I pick it up anyway, labeling it for possible deletion later. But the more I can filter out now, the less I need to cut later.
Communicate with clarity. This is especially true for subheadings. Make sure that readers understand what this entry means. Be concrete and, where relevant, link back to the larger context. You don’t want to leave readers guessing, nor do you want to leave yourself guessing when you come around again to edit.

All combined, this is a lot to do while reading and indexing. It can be difficult to identify both specific details and larger discussions, while also weighing relevance, and paying attention to the context, and thinking about related entries, and thinking about how best to phrase for clarity. Reading to index is a skill that takes practice.

Remember too that the rough draft does not need to be perfect. My drafts are certainly not perfect, and while I am thinking about all of this while drafting, I spend about an equal amount of time editing.

How you read is up to you. I tend to start reading and I type entries into Cindex, the indexing software that I use, as the entries come to mind. Other indexers prefer to first mark up the proofs, identifying what is indexable and making notes for themselves, before they go back and type up the entries. There is no right or wrong approach, so long as you are paying attention to all aspects of the text, both big and small.

If you are newer to indexing, you may find marking up the proofs to be a good way to visualize or make concrete this thought process. I marked up proofs the first 3-4 years that I indexed, which in hindsight was necessary for me to engrain this way of reading. Once indexing started to become habit, I stopped marking up, though I still read ahead sometimes to better understand what the text is about.

Writing an index is a unique way to interact with the text. It does require a shift in how you read and see the text. Once you make that shift, indexing becomes easier.

August 13, 2024

by Stephen

Is AI Indexing Nearly Here?

No surprise, publishing continues to react to and interact with artificial intelligence. A couple of colleagues recently raised AI on a couple of indexing email lists. I get the sense that many indexers are concerned about the potential for AI to replace us, or at least that publishers will believe that AI can replace a human-written, thoughtfully constructed index. I have to admit I also feel uncertain about what the future holds. I wrote about AI and indexing last year, and I think it is worth considering again.

Is Indexing by AI Nearly Here?

One colleague flagged this article from The Scholarly Kitchen, “AI-Enabled Transformation of Information Objects Into Learning Objects,” by Lisa Janicke Hinchliffe. Hinchliffe reviews three new AI tools which purport to help readers access and understand academic writings. Of particular interest to indexers is what Hinchliffe writes about Papers AI Assistant:

When exploring the functionality as a beta tester, I was curious how the results compared to my pre-AI tool practice of making heavy use of CTRL-F to locate keywords in lengthy texts. I found that, not only did the Papers AI save me a great deal of time by providing me with an overview annotated with links to specific sections of the text, it also often alerted me to places in the text where my topic of interest was conceptually discussed without the use of the specific keywords I would have searched.

Did you catch that last bit? Papers AI Assistant can apparently identify discussions of interest without the use of a keyword search. That is what a good index is supposed to do. Is this the beginning of an AI that can replace indexers? Hinchliffe also writes, “I am excited by the possibilities these AI tools offer for moving the focus from access to information to comprehension of it.”

A few thoughts: I have to admit that I am skeptical of the claim or hope that Papers AI and similar tools will help readers comprehend information. My sense is that AI works best as a tool, with the user clearly understanding its strengths and limitations, and with the user making the final decision on the quality of results and how best to use the results. That is similar to how I use the search function when indexing. Search is useful for double-checking facts and mentions, but I know that it doesn’t catch everything and isn’t good at providing context; I still need to read and understand the book. My fear is that many users will uncritically accept whatever the AI tool tells them, turning a program like Papers AI into glorified CliffsNotes and enabling an even shallower engagement with the text.

I think it is also worth pointing out that what is described here is not an index. An index is a static document that is browsable. That is very different from an AI highlighting a handful of potentially relevant passages. Browsability is key to an index because it allows the user to serendipitously find information they didn’t know they wanted to find. Being handed a few options leaves the rest of the text opaque and inaccessible. I imagine a user can keep asking the AI new questions, but that puts the onus on the user to know what they are searching for and how to ask relevant questions.

Of course, if an AI can identify concepts and discussions in the absence of clear keywords, then a logical next step could be to ask that AI to generate an index. I can see value in the ability to create an index on the fly, for any document. I don’t know how much I would trust such an index, though. Hallucinations are one issue. Another is that AI, essentially, is built upon algorithms. Answers are always going to follow a certain pattern. While indexing is built upon rules and conventions, the indexer also plays a key decision-making role as they shape the contents, phrasing, and structure. These judgement calls extend beyond the formal rules of indexing to take into account elements such as the audience and usability. I am skeptical that an AI would be able to understand and produce these nuances.

Another issue is that these AI tools are entirely digital. They will not work on a print book, though, of course, an AI-generated index could be published in print. Is the future of publishing and of engagement with texts entirely digital? Perhaps in academia and other specialized fields, in which there is so much information to access and consume. Print sales remain strong, however, and I am hopeful that there will continue to be a place for print indexes. Perhaps the future—finally arrived?—is what embedded indexing has long promised, which is one index capable of being used in multiple formats.

Besides AI replacing indexers, I think it is also worth considering how we as indexers can use AI in our own work. I am aware of one colleague who uses ChatGPT to summarize complicated books and to answer queries about the text, which helps that indexer comprehend the book more quickly. That sounds very similar to what Papers AI claims to do. I think that is a legitimate use of AI. So long as the indexer is in control (using the AI as a tool, understanding both indexing best practices and the contents of the book, and actively shaping the index), then why not use AI? I’m also open to having AI index elements which are time-consuming to pick up, such as scientific names, so long as the indexer is providing quality control. What I don’t want to see are indexers, or anyone else, passively accepting an AI-generated index, assuming that it is accurate and functional when it is actually not. That is my worst nightmare about AI, that we abdicate our critical thinking and decision-making skills, potentially leading to errors and disasters because we have lost the ability to assess what AI is telling us.

Author Pushback

In contrast to the gold rush to embed AI into publishing, another colleague pointed out that some books are beginning to be published with prohibitions against AI and machine learning listed on the copyright page. I also recently noticed this in a book I am indexing.

I’ve also heard from a trade client that their authors are starting to insist that book contracts include a clause that their books will not be uploaded or otherwise used to train AI. By extension, this means that all freelancers hired by this press, including myself, are not allowed to use AI tools while working on their manuscripts and proofs (which isn’t a problem for me, since I wasn’t doing so anyway).

Will authors and publishers win against AI? Will publishers find ways to enforce their contracts and prohibitions? Will publishers change their minds, or will AI developers sufficiently address the fears that authors have? I suspect this may be an area where the publishing industry goes in two different directions: some segments, such as academic publishing, which prize easy access to information (provided you can get behind the paywall), will embrace AI, while other segments, which care more about the author and which sell directly to readers, will reject AI.

Or, maybe AI in publishing is a bubble and these new applications will fail to live up to their hype.

I still think that someone will try to develop an AI capable of writing an index. Some publishers will probably adopt it for the sake of saving time and money, even if the resulting indexes are useless. I am also hopeful that the value of the human touch will remain. Even if AI is incorporated into our work, I think there is still a place for human guidance and discernment. Machines may be capable of generating an approximation, but only humans can create what is truly useful for other humans.

June 25, 2024

by Stephen

When Subheadings Are Not So Useful

I love subheadings. They add so much to an index, breaking down long strings of locators into smaller chunks, highlighting meaningful distinctions, and gathering related entries into lists so readers only need to search in one place. As I discussed in my last reflection, subheadings can also reflect the story that the text is telling. Well-written subheadings are clear, specific, and meaningful.

But…in indexing there is always a but. Occasionally, a project comes along that proves the exception.

This happened with a recent index I wrote, for To See What He Saw: J.E.H. MacDonald and the O’Hara Years, 1924-1932, by Stanley Munn and Patricia Cucman (Figure 1 Publishing, 2024). J.E.H. MacDonald was a Canadian painter and a member of the Group of Seven. He fell in love with the landscape around Lake O’Hara, in the Rocky Mountains, and spent several summers there painting. This book takes an interesting approach to MacDonald. Over the course of almost twenty years, the authors sought to identify the exact locations where MacDonald painted. The bulk of the book is composed of a brief discussion of each of the O’Hara paintings, alongside a photograph of what the scene looks like today. The rest of the book is composed of an introduction, an overview of each of MacDonald’s eight trips, and excerpts from MacDonald’s diaries and other writings. The result is a beautifully illustrated coffee-table book.

The instructions from the press were to only index the paintings, people, and places. While narrow in scope, there isn’t too much else discussed, and these are what readers are most likely to want to find, so I thought the instructions reasonable. Figure 1 Publishing is also very good at providing clear specifications for how long the index can be. For this book, the specs were 55-60 characters per line, for 675 lines total.

I quickly realized that the book mentions a lot of paintings and places. The book discusses 226 paintings, almost all of them by MacDonald. With each painting taking up at least a line, some of them more, the paintings alone fill up about a third of the index. The rest of the index is mostly places—mountains, lakes, creeks, trails, huts—in and around Lake O’Hara that MacDonald either painted or visited. In comparison, only a few people are mentioned.
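The "about a third" figure follows directly from the specs. As a rough check, here is a minimal Python sketch using only the two numbers given above. It assumes exactly one line per painting, so it gives a lower bound; the variable names are illustrative.

```python
# Figures taken from the text above.
TOTAL_LINES = 675  # maximum index length set by the press
PAINTINGS = 226    # paintings discussed, each needing at least one line

# Lower bound on the share of the index taken up by painting entries.
min_share = PAINTINGS / TOTAL_LINES
lines_left = TOTAL_LINES - PAINTINGS

print(f"Paintings: at least {min_share:.0%} of the index")
print(f"Lines left for places and people: at most {lines_left}")
```

Every painting entry that wraps onto a second line comes out of the remaining lines, which is why subheadings elsewhere in the index had to be weighed so carefully.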

I also realized that the book contains a lot of repetition. For example, the same mountain may appear in a couple dozen different paintings. That mountain is mentioned again in the overviews of MacDonald’s trips, and then again in MacDonald’s diaries. This kind of repetition makes sense given how the book describes the same events and paintings from different angles, but it does mean that the mentions add up. Arrays with especially long strings of locators include Cathedral Mountain (49 page references), Hungabee Mountain (39 references), Odaray Bench (34 references), Lake McArthur (32 references), and Lake Oesa (27 references).

Normally, I would add subheadings to these arrays. Asking readers to look up each page reference is a big ask. But for this index, I left those strings, for paintings and places, intact.

Not using subheadings was a conscious decision, and one I didn’t make lightly. My initial instinct was to find subheadings. But as I indexed and considered the entries, I also realized that subheadings would not be so useful in this particular index. Wanting a second opinion and to avoid surprising the press with a departure from my usual approach, I also queried the editor I was working with and got their approval.

I decided to not use subheadings for two reasons. One, I realized that too many subheadings would quickly make the index too long. Unfortunately, space constraints can sometimes mean putting aside the index that you want to write for the index that fits. In these situations, I need to be strategic about picking and choosing the subheadings that will have the biggest impact, while also being okay with other arrays not having subheadings.

More importantly, though, for this book, I couldn’t think of subheadings that I was satisfied with. For subheadings to be effective, they need to clearly articulate additional information that readers can use to narrow their search. But what if there are no clear distinctions between locators? In that case, I think the long strings of locators should be left alone. It is not helpful to introduce artificial distinctions or to get so granular that context is lost.

As I mentioned, this book contains a lot of repetition. Places either appear in MacDonald’s paintings, are places that MacDonald visited, or both. This doesn’t provide much on which to hang a wide range of subheadings.

I briefly considered listing all of the paintings that each mountain or other feature appears in, along with a subheading for MacDonald’s presence there. For example,

Cathedral Mountain: MacDonald at; in painting 1; in painting 2; in painting 3; in painting 4; in painting 5; etc…

But this approach presents a few problems. Some arrays would have been enormous, with a dozen or two subheadings for each of the paintings. Besides the space issue, I’m not convinced that listing each painting would have been meaningful to readers. Would readers remember the titles of individual paintings? In many cases, multiple paintings shared the same title. Thankfully, the authors give each painting a unique alphanumeric code, which I included in the index to differentiate. For example, “Lake O’Hara (25-1.3(S))” and “Lake O’Hara (30-3.1).” But I imagine it would still be difficult remembering which is which. Alternatively, I could have created a subheading for “in paintings,” but that would have still resulted in a long string of locators, as would the subheading “MacDonald at.” “MacDonald at” also isn’t very useful since readers can presumably assume that MacDonald was there, as that is the focus of the book.

Given the space constraint and that either way, with a couple of generic subheadings or without subheadings, the arrays would have long strings of locators, I decided it was best to keep the arrays simple and to forgo subheadings. This does mean that readers will need to search through each locator, though readers should also quickly notice the repetition, and it is all there for the dedicated searcher.

This isn’t to say that I avoided subheadings entirely. I did use them in a few places, mostly for people, though even with people I found it difficult to avoid longer strings of locators. Many of these references are brief mentions and again reflect the repetition throughout the book. For example, here are the arrays for MacDonald’s friend, George Link, and for MacDonald’s wife, Joan.

Link, George K.K.
about, 234, 343n83
Lake O’Hara Trails Club and, 340n25
MacDonald and, 82, 91, 104, 107, 113, 114, 143, 215, 224, 233, 234, 239, 243, 249, 252, 253, 254, 256, 258, 264, 294, 301, 307, 308, 310
photographs, 246, 260

MacDonald, Joan
encouragement from to travel west, 13, 202, 205
letters to, 93, 96, 115, 120, 121, 122, 131, 167, 175, 200, 203, 204, 205–6, 211–12, 229, 230–31, 236, 240, 256, 265
Links and, 341n47
MacDonald’s departure west and, 250
mentions in MacDonald’s diary, 304, 308
O’Hara trip with MacDonald, 36, 123, 191, 217, 221, 222, 223–24, 259
photo album, 224

While I highly encourage you to include subheadings and to make sure that subheadings are clear, specific, and meaningful, I think it is also worthwhile considering the exceptions to the rule. I hope that my approach to the index for To See What He Saw, about J.E.H. MacDonald’s paintings in and around Lake O’Hara, is helpful for considering when subheadings may not be useful. If there is a lot of repetition in the text, if it is difficult to find meaningful distinctions, and if there is a hard space constraint, then it is okay to have long strings of undifferentiated locators. It is not ideal, but it may still be the best solution for that particular text and index.

Stephen Ullstrom

Indexing & Writing
