Let’s talk about book discovery.

I’ve been thinking about this topic a lot lately, since I was asked to speak on it at Mini-TOC Vancouver later this month. I’ve spent the last few days mulling over what I could realistically bring to the table in this conversation and what would be most relevant. Keeping with the spirit of TOC, I’ve decided to aim for that space beyond the now: really looking at what is possible and what this should be.

So, I started looking through a couple of sites that, seemingly, offer “discovery” services to their users, the objective being that we can all find new and interesting books. Unfortunately, they all fell flat. Every single one came up with an almost parroted list of the same bestsellers and top titles that we’ve all heard about many times before. On one site alone, I saw Outliers, The Lean Startup, the Jobs bio, and all three Hunger Games titles. Not only are all of these major bestsellers, they aren’t even the most recent bestsellers. How is this helpful? Or, let me phrase it this way: how is this discovery?

Discoverability in books is a real challenge. It’s about connecting readers to new and interesting titles that they wouldn’t normally have seen. This last part bears repeating: …that they wouldn’t normally have seen.

Ultimately, the problem with all these discoverability sites is this: their algorithms (if they are even using algorithms) are based on aggregate data in a one-size-fits-all model. The more people who read something, the more often it shows up in your recommendations. But that’s not discoverability. That’s the NYT bestseller list. That’s Nielsen BookScan telling you the top sales of the week. Just because most of my friends are reading bestsellers (because, duh, whose friends aren’t? In fact, that seems to just reinforce the term “bestseller”), does that mean I should only be shown these titles?

Obviously, the answer is no. But, how do we get there?

Let’s take a step back. There are two distinct operations that must be executed for discovery to take place. The first is having a pool of books or products to pull from, i.e., an inventory of sorts; the second is connecting people to the right things in that inventory. (Here I don’t use inventory to mean books sitting in a warehouse; rather, inventory as a simple list of products.)

I argue that the first operation should, and must, be accomplished by humans. A curated list of products should be offered. To make discovery work, we have to offer books that speak to a range of tastes, which is best accomplished by a group of individuals exercising judgment. This is similar to the “staff picks” section of your local bookstore.

The second operation is where algorithms come into play. This is where user behavioral data, coupled with data collected on customer preference (i.e., a predictive model), allows us to connect those selected materials to the users in the system. Here a machine is better than a human, providing efficiency and scale. Here, users get better selections based on a range of preferences and are able to truly find new things. Here, we are not talking about selections based on aggregate, “everyone is reading this so you should be too” data. We are talking about a more granular data set that makes individual connections.
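To make this concrete, here is a minimal sketch of what such a second operation could look like: a human-curated inventory tagged with metadata, a user taste profile built from ratings, and a simple matching step. All titles, tags, and weights below are hypothetical illustrations, not anyone’s actual system.

```python
# Minimal content-based recommender over a human-curated inventory.
# Curators tag each book (operation one); ratings build a taste
# profile; the algorithm matches unseen books to it (operation two).

from collections import defaultdict

# Hypothetical curated inventory: title -> metadata tags chosen by humans.
INVENTORY = {
    "Quiet Valley":    {"literary", "rural", "family"},
    "Steel Orbits":    {"sci-fi", "space", "politics"},
    "Kitchen Alchemy": {"cooking", "science", "memoir"},
    "Cold Harbors":    {"literary", "maritime", "history"},
}

def build_profile(ratings):
    """Turn {title: rating on a 1-5 scale} into a weighted tag profile."""
    profile = defaultdict(float)
    for title, rating in ratings.items():
        weight = rating - 3  # above 3 means liked, below 3 means disliked
        for tag in INVENTORY[title]:
            profile[tag] += weight
    return profile

def recommend(ratings, top_n=2):
    """Score every unread book by how well its tags match the profile."""
    profile = build_profile(ratings)
    unseen = [t for t in INVENTORY if t not in ratings]
    scored = [(sum(profile[tag] for tag in INVENTORY[t]), t) for t in unseen]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_n]]
```

A reader who loved “Quiet Valley” and disliked “Steel Orbits” would be steered toward “Cold Harbors” (shared literary tag) rather than whatever everyone else is buying; the aggregate bestseller signal never enters the calculation.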

Lately, I’ve started using the service offered by ClubW. If you are unfamiliar with ClubW, the concept is simple: it’s a new take on the wine-of-the-month club. A monthly selection of 12 wines is offered, the selection having been made by sommeliers on staff. In your first month, you answer five questions about your taste in food (do you like citrus? how do you take your coffee? etc.) and a selection of three bottles is made. You can go with their selection or choose other bottles. A box arrives with your three bottles of wine. You enjoy the wine. While or after enjoying it, you can rate each bottle individually. This informs the algorithm.

So there you have it. A human curated inventory, an opportunity to input user data, and an algorithm that is refined over time as users tell you more about what they liked and what they didn’t like. Did I mention that I have never heard of 90% of the wines offered? And I know a thing or two about wine. The point here is that I am trying and discovering things I normally wouldn’t. And they are not trying to sell me Two Buck Chuck because everyone I know is drinking it.

Back to books.

Clearly, this model implies a few things: a stronger set of metadata, employed in such a way that an algorithm of this sort can function (whispers strategic metadata); a group of individuals with a deep, or at the very least good, understanding of different markets to make selections (narrow and deep vs. broad and superficial; hopefully individuals who are readers of these specific genres or verticals themselves; hopefully individuals like librarians and booksellers, who are experts at making these connections already); and an infrastructure to accommodate all of this.

This ideal carries implications on almost every step of the publishing process. Our workflows have to change. Our strategies may change. Our business models will look different.

The point is this: without publishers changing how products are built and deployed to accommodate this shared goal, any “discoverability” tool will end up as they all are now: bestsellers across the board, while midlist titles, in many respects the foundational canon of actual discoverability, are nowhere to be found.

This is the basis for my talk at TOC. I hope to take these ideas to the next level by talking about some of the ins and outs of how to get to where we’re going. I am squarely looking toward publishing’s future and refusing to accept the way things are done. Discoverability is part of the product. It’s not a marketing thing. It should be inherent in the DNA of a book.

I leave you with this last thought: this is what’s best for the reader. It should always be about what’s best for the reader.

Update 10/10/12: Today, Amazon announced the launch of Author Rank, a system that updates hourly and shows you the top authors in different categories. It is a prime example of bringing the bestsellers to the top and making those titles readily available, and another failure of real discoverability. Please add your comments below on what you think about this new development.


15 Responses to Discover me!

  1. Peter Turner says:

    Simply the best, clearest, and most visionary description of the future of book discovery. Thank you, Brett.

  2. Erik Smartt says:

    I’ve built two book recommendation systems based on similar ideas. The first was a “wisdom of the crowds” approach that used public curation weighted by individual interest graphs; the second leveraged professional reviewers/librarians as curators, public and proprietary metadata, and weighted individual interest graphs. The latter system was extremely promising and delivered very interesting recommendations. Unfortunately, it costs real money to operate a system like this at scale. Even if you can crowdsource the curation for free, hosting and data processing aren’t free. Near-real-time crunching of dozens of fields of metadata across hundreds of thousands (to millions) of books takes a bit of CPU power. It’s doable (obviously), but you need a revenue model that supports it. This is really the problem with book recommendation business models. Numerous folks have tried, and no one seems to have the resources to take it to a full realization.

    • Brett says:

      Thanks for the comment and weighing in, Erik.

      It is unfortunate that we have yet to see an engine like this take off and get the due funding it would need to keep going. Though I suspect that is part of the implication of changing business models.

      Ultimately, what I am trying to get at is that by understanding discoverability as part of a product, publishers give value to products that are currently neglected and left to the side. So much focus and attention is put on bestsellers because they bring in the most revenue. However, if other titles started moving, started bringing in incremental revenue because they were built with customer discovery needs in mind, I suspect that these types of services would be much more valued within the ecosystem.

      3-4 years ago Mike Shatzkin was basically saying that it would take ebooks getting to around 15-20% of overall publishing revenue before they were taken seriously enough to base workflow considerations around them. At the time we were hovering around 7-9%, I believe. I think the same idea holds true here. It will take some “showing people the light,” perhaps demonstrated success, before publishers will make the changes they know they have to.

      • Erik Smartt says:

        I like the idea of the mid-list and long-tail gaining traction; however, the revenue difference between bestsellers and “the rest” is so dramatic that it’s better for publishers to cut their losses on anything that isn’t paying the bills.

        Instead of hoping the major publishers will participate in something that helps book discovery (or hoping that they will all agree on how to do it), there may be more long term viability in something that helps self-published authors participate. Unlike for a major house, a self-published mid-list author can make decent money. The exciting prospect then, is to use discovery to turn interesting self-published long-tail books into successful mid-list books. That would be life-changing for these authors.

  3. Suzanne Norman says:

    Excellent piece Brett! I am super excited to hear you speak!

  4. Carol says:

    Very very interesting! Brett, excellent insights & summary of an absolutely critical issue. Erik, considering how many (thousands of?) authors are shortchanged by the present system, I wonder if such a project couldn’t be funded partly with small entry fees &/or a % of book sales. Authors as well as librarians are horribly aware that, as Peter Brantley has observed, 21st-c. literature is being shaped not by writers or publishers but by tech companies; & given their priority of sales > brilliance, the scale is heavily tipped to thrillers & porn. I’d pay to avert that future, wouldn’t you?

  5. Pilar Wyman says:

    Bringing in the human curator, oh so important! Indexes — with all their metadata — can be awesome tools for book discovery.

    I’m looking forward to your talk in Vancouver. My colleagues at the American Society for Indexing (ASI) (www.asindexing.org) and I have similar talks coming up at TOC Frankfurt and the miniTOC in Charleston. Accessing information, making books discoverable and accessible via metadata (strategic, narrow, deep, yes!) is pivotal for discovery and sales.

    Let’s put the power of human analysis to work.

  6. Intriguing and thoughtful post. Could this idea be the basis of the business model that APA members asked ALA to define for them, with, as you suggest, librarians and booksellers providing the initial title recommendations? What if there were some means (OverDrive, 3M, Baker & Taylor?) by which librarians’ and booksellers’ recommendations could be collected and coordinated on a state, regional or national basis?

  7. Geoffrey Kidd says:

    Models may be good for discovering new-to-you authors, but nothing can replace a system which does NOT do spam, but simply allows me to say “Notify me when a book by [author X] is coming out,” with an option to add “and remind me the day it’s released.” I consider it desperately shameful that the best notification source at the moment is “the pirate channel.”

  8. Lee Pavach says:

    I’ve been thinking exactly the same thing. I’ve been working in a company that does the algorithm part of very nuanced personal recommendations very well (based on shopping behavior data), however, as you pointed out, the human curation database to support meaningful discovery doesn’t exist yet. My thought was to get independent bookstores involved in the curated metatagging for local self-published authors and to fuel discovery of the authors’ works through this discovery database. Might be a way to keep bookstores viable and relevant, attracting the younger generation of readers who rarely walk into a bookstore or library anymore.

  9. The bestseller feedback loop is a problem – but smarter algorithms are able to recommend books that not many people have read yet.

    Reddit faced this problem with people upvoting and downvoting comments – early comments came out on top of their sorting algorithms because they always got more votes, even if, proportionally, later comments were rated better.

    You can read how they fixed that on their blog: http://blog.reddit.com/2009/10/reddits-new-comment-sorting-system.html Recommendation algorithms for books could use similar sampling techniques to predict which books are likely to be popular, before many people have read them.
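    For context, the fix Reddit described is the lower bound of the Wilson score confidence interval: rank items by the bottom of a confidence interval on their positive-vote ratio rather than the raw ratio, so an item with only a handful of votes can’t leapfrog a well-tested one. A minimal sketch (z = 1.96 corresponds to a 95% interval):

```python
from math import sqrt

def wilson_lower_bound(upvotes, downvotes, z=1.96):
    """Lower bound of the Wilson score confidence interval on the
    'true' fraction of positive ratings. Small samples are pulled
    down, so few-vote items can't win on ratio alone."""
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    phat = upvotes / n
    return ((phat + z * z / (2 * n)
             - z * sqrt((phat * (1 - phat) + z * z / (4 * n)) / n))
            / (1 + z * z / n))
```

    A 90% ratio from 100 votes scores higher than the same 90% from 10 votes, which is exactly the property a discovery algorithm needs to surface promising books before the crowd has piled on.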

  10. That was a great, insightful post! As a book publishing professional for more than 10 years, I never had trouble finding books until I left and moved to Los Angeles. I felt like I was in a reading desert–which is why I am hoping to bring a new world of book discoverability through ShelfPleasure.com. This new website is for women who love to read and includes book news, author interviews, a monthly book club, interactive forums, and thoughtful commentary. We’re especially looking forward to introducing everyone to books outside of the box, just as you mention in your post.

  11. Discovery is not a business, because readers don’t wish to pay for it, and publishers don’t care much for anything that doesn’t push THEIR books (discoverability, not discovery).

    Support it with advertising and you are likely to fall flat on your nose: an ugly user interface, and simply too little money in affiliate e-commerce to support a sophisticated back-end.

    There still might be a business opportunity in understanding what people read. Brett will know how we have been constructing this at Jellybooks. ;)

    Popularity is THE most common discovery tool, and it’s called the “Amazon bestseller list.” Number of downloads, shares, buys, etc., only tells you popularity, and ratings pretty much do the same.

    Collaborative filters can be quite successful (look at last.fm), but they have a bad name in books, because too few data points are used (see Amazon’s “people who bought this also bought…”) and because music consumption is quite different from book reading.

    To teach a machine to do useful work, one must take into consideration things other than popularity. One of the things we learnt the hard way at jellybooks.com is how narrow people’s interests can be, but also how predictable, and in the latter lies the opportunity to apply machine learning. Did the reader sample the book and then buy or share it? A small number of books sampled but shared widely is a much stronger signal than a large number downloaded but not shared.
    When looking at social, there is a fallacy in using the Facebook or Twitter social graph: influence networks in books are totally different and depend on topic. I listen to different people when it comes to books on design versus, say, business.

    To build a minimum viable product (MVP) at jellybooks.com, we took a shortcut. We decided to make book samples freely available and dead-easy to share: to post on websites, include in tweets, pin to Pinterest, etc., and to see what people do with them.

    Yes, this process always starts with a human being posting something. Machines then kick in to measure how others react to it: do they act on a recommendation, sample the book, share it with others? Whom do they influence, and who influences them? That gives you a data set from which you can build something interesting, while simply mining the Twitter firehose for stuff people say about books may be a fool’s errand.
