I've had this nagging question in the back of my mind for a while. It started when I worked at the local library a long time ago. Some books would be sorted by "The" and others wouldn't; sometimes "The" is simply moved to the end of a title or name when sorted, preceded by a comma. Take the following example, a list of books.
Which is the right way to sort The Good Soldiers and THE AGE OF WONDER: How the Romantic Generation Discovered the Beauty and Terror of Science? Notice that even these two differ slightly in the way "The" is used. Also, for this example we're not considering the authors.
BOTH WAYS IS THE ONLY WAY I WANT IT
By Maile Meloy
By Jonathan Lethem
A GATE AT THE STAIRS
By Lorrie Moore
HALF BROKE HORSES: A True-Life Novel
By Jeannette Walls
THE AGE OF WONDER: How the Romantic Generation Discovered the Beauty and Terror of Science
By Richard Holmes
THE GOOD SOLDIERS
By David Finkel
The actual books on the shelves in a library are arranged by one of two systems: the Dewey Decimal System or the Library of Congress system; any other cataloging system for books right now is really inconsequential in the grand scheme of things. The reality is that even the Library of Congress system is not widely adopted, so the Dewey system really is the de facto standard for sorting books.
When it comes to other items, outside of books and libraries, it often seems that list sorting is a preference set by whoever is publishing the items in question. I finally decided to write a little bit about this when I looked up the membership list for the W3C. I had guessed that The Apache Software Foundation would be a member; they are. However, when you visit the list, you'll find that they're buried all the way down under the "T"s. Some might say that's where they belong; I say they should be listed as: Apache Software Foundation, The.
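For what it's worth, the "Apache Software Foundation, The" form is easy to produce in code. Here is a minimal Python sketch of how I imagine it; the function name `move_leading_article` and its `articles` parameter are my own inventions for illustration, not anything the W3C or a library catalog actually uses.

```python
def move_leading_article(name, articles=("the",)):
    """Move a leading article to the end of the name, after a comma.

    Hypothetical helper: the name and the default article list are
    assumptions made for this example.
    """
    # Split off the first word; `sep` is empty when there is only one word.
    first, sep, rest = name.partition(" ")
    if sep and rest and first.lower() in articles:
        return f"{rest}, {first}"
    # No leading article (or nothing after it): leave the name untouched.
    return name


print(move_leading_article("The Apache Software Foundation"))
```

The original capitalization of the article is kept, so an all-caps title keeps an all-caps "THE" at the end.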
By Wikipedia's own definition:
The word the is the only definite article in the English language, and the most frequently used word in English.
This means "the" is incredibly versatile and widely used. Google, for example, considers it a stop word. So why do people keep sorting things under "the"?
In the case of The Age of Wonder:... I think the title should be left as is and sorted taking "The" into consideration. In the case of The Good Soldiers, I think it should be sorted as: Good Soldiers, The. Maybe I'm old school or something.
Surely this presents a philosophical or political question when deciding how to sort a list; I don't see that it could be much different from a technical point of view. Do you consider articles like "The"? And how about other stop words: how do you sort those? Should programs and even webapps just offer this as a setting and let the user decide?
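To show how small the technical side is, here is a rough Python sketch of the "let the user decide" idea. Everything in it is an assumption on my part: the function name `title_sort_key`, the `ignore_articles` setting, and the choice of "the", "a" and "an" as the words to skip. It only changes the sort order; the displayed titles stay exactly as written.

```python
def title_sort_key(title, ignore_articles=True, articles=("the", "a", "an")):
    """Build a case-insensitive sort key, optionally skipping a leading article.

    Hypothetical helper: the article list and the setting name are
    illustrative assumptions, not an established standard.
    """
    words = title.casefold().split()
    # Only drop the article when something follows it.
    if ignore_articles and len(words) > 1 and words[0] in articles:
        words = words[1:]
    return " ".join(words)


titles = [
    "BOTH WAYS IS THE ONLY WAY I WANT IT",
    "A GATE AT THE STAIRS",
    "HALF BROKE HORSES: A True-Life Novel",
    "THE AGE OF WONDER: How the Romantic Generation Discovered the Beauty and Terror of Science",
    "THE GOOD SOLDIERS",
]

# Setting on: leading articles are ignored for ordering.
by_title = sorted(titles, key=title_sort_key)

# Setting off: titles are ordered exactly as written.
as_written = sorted(titles, key=lambda t: title_sort_key(t, ignore_articles=False))

for t in by_title:
    print(t)
```

With the setting on, The Age of Wonder comes first and The Good Soldiers lands between A Gate at the Stairs and Half Broke Horses; with it off, both "THE" titles drop to the bottom of the list, just like Apache on the W3C page.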