Societies Research Libraries // November 11th, 2013

Can Scholarly Publishing Evolve Beyond the PDF?


In this video presentation, Gary Spencer, Associate Director of Product Management in Wiley’s Global Research Division, ponders the staying power of the PDF format in scholarly publishing. The presentation includes a brief history of digital publishing, and a look at how PDF and HTML have evolved. In spite of significant usability improvements, rich linking, and supporting information in HTML full-text articles, researchers still choose PDF over HTML 65% of the time.

Can new “smart” content and innovations like the iPad move researchers away from PDF? We would love to hear your thoughts on PDF vs. HTML in the comments.

  • Leslie Citrome

    I have converted my professional library (1000+ books) to digital form – PDF of course. I no longer keep paper copies of journal articles – they have either been scanned (my legacy resources) or directly downloaded in PDF format. Native PDF files are relatively small in size and hard drive disk space is very inexpensive. I can view PDF files, search through them, annotate them, etc. They are easily shared.

    On the other hand, HTML files are not as portable. In the case where I have to print a page or two for temporary use, HTML files are much less convenient given their formatting optimized for screen viewing.

    • Gary Spencer

      Hi Leslie, and thank you for your comment!

      I hear you loud and clear regarding the portability of PDF documents. I also take note of the value you place on “printability” — and HTML is often inferior. It has been possible for HTML to provide a print optimized version of content for quite some time, but adoption by content providers has been a mixed bag. I think a lot of content consumers have mostly given up trying to get a good print from HTML content, even if it exists.

      Your comment also exposes a real need to provide a conversion path for users to migrate their existing PDF libraries over to some form of portable HTML if we ever hope to see HTML as the favored format.

      Thanks again!

  • Robert Dingwall

    I haven’t gone quite as far as Leslie, although I would echo the points made in that post. However, there is also an important issue of permanence of access. Given the fluctuations in university library subscriptions and journal holdings, a pdf on the hard drive, if backed up properly, ensures independence of decisions by librarians about what they will and will not provide access to. If content is obtained through inter-library loans, it now normally comes as a pdf and is similarly permanent.

    I suspect that the supposed benefits of HTML will be slow to emerge in some humanities fields where images, for example, are likely to remain subject to copyright for a long time to come. Deep-linking to historic content may be useful – nice to check the context of quotes from 18th century books – but this is rather a niche benefit. Whatever happens with OA in the academic world, copyright is not going to go away in commercial environments and disciplines at that interface will struggle to adopt HTML.

    • Gary Spencer

      Thanks Robert, and excellent points!

      In terms of permanence, HTML in its current form doesn’t fit the bill. As you said, it is too easy for publishers to lock users out of their content at some point down the road. I think what is needed to address this is something with the flexibility of HTML that can also stand on its own in an offline environment. In this scenario, the researcher would have possession and control of a discrete file (an HTML “package”) that resides on their PC (or on their cloud service, etc.), but when opened in a connected environment, has all the benefits of external data linking.

      I don’t think this is too far-fetched, and improvements to “container” technology, like EPUB3, could help make this a reality in the not-too-distant future.

      Your other point about the usefulness of deep-linking is well taken. Some fields will see few (or no) benefits, but I do think there are fields (such as chemistry and engineering) that could see tremendous benefits.

      This isn’t likely to be a rapid transition, but it is a very exciting time to be in publishing!

  • Ken Lanfear

    As the Editor of a Wiley journal, I’m very much in favor of HTML and apps. The biggest constraint, however, may be author reluctance to move beyond the comfortable PDF. My authors are reasonably high tech, but they just are not beating on my door to provide opportunities to present scientific concepts in new, flexible, and more innovative ways. Any suggestions how to create this demand from authors?

    • Gary Spencer

      Hi Ken, and thanks for your comment!

      You’ve raised a really important point regarding authors. My view is that in order to get authors to supply new and innovative content, publishers need to provide an array of tools that make it very simple to do so.

      I think YouTube is a good example. Before YouTube, it was quite cumbersome (nearly impossible) to digitize video and make it available online in a format that was universally accessible. YouTube simplified the process by accepting any kind of video file and automatically converting it into the multitude of formats necessary for playback across many devices and internet speeds.

      Smartphone and camera manufacturers were quick to get on board to streamline the process further. Most smartphones, and many cameras, have a “send to YouTube” feature that makes video posting incredibly easy.

      Publishers need to develop similar tools for the research community. In many fields, great “artifacts” of research exist, but they are cumbersome or impossible to deliver to publishers in a broadly usable way.

      Thanks again for making this great point!

  • Guesst

    Thank you Gary, very interesting! The main downside as I see it is that articles are too scattered. A program like Mendeley works great with PDFs, but won’t index all the HTML files I would need. That’s what it would take for me…

    • Gary Spencer

      Thank you for taking the time to comment!

      I understand that Mendeley (and a few other similar tools) are very useful for researchers, and I feel that this is one of the tougher challenges to overcome for HTML. It isn’t really “portable” and can’t easily be put into a container and moved into other tools. We do have a few ideas about how to address this, but they would require participation by many of the major publishers in order to get any traction.

      This need is definitely noted!

  • TAnthonyHowell

    Gary – Impressive presentation! I think your question has been answered – scholarly publishing is in fact already evolving beyond the PDF as evidenced by the Smart Article and Elsevier’s Article of the Future introduced last year along with other publishers who continue to innovate with interactive textbooks etc.

    I believe there is a role for each of the technologies to continue to coexist and evolve together to suit the specific needs and preferences of the creator and user. I’m not a big believer in EPUB3 as a potential replacement for PDFs because so much content is not “rich” and in need of the underlying support system.

    While HTML will continue to provide more and more interactivity, such as 3D modeling, mapping, live data updates, etc., sometimes a static PDF is all that is needed, nay preferred, for the particular discipline, particularly for its permanency, printability, portability and simplicity.

    And all that being said, I did convert thousands of my music CDs to iTunes.

    • Gary Spencer

      Thanks for stopping by Tony, great to see you!

      We (and other publishers) are definitely trying to move beyond the PDF with Smart Article (and other, similar initiatives), but researchers have been slow to let go of the PDF. There are many good reasons for this; several have been stated in the other comments. I think you’re right, there will certainly be a place for both technologies for the foreseeable future.

      I have high hopes for the EPUB3 format, but it hasn’t been “formalized” yet, and support is pretty limited. From what I can tell, Apple’s iBooks is the only software that comes close at this time. I speculate that we’ll see more support for EPUB3 over the next year or so. It will be interesting to see what, if anything, Amazon does here — they have a lot of influence in this space and have their own proprietary format.

      Since EPUB is just a container format, publishers should be able to encapsulate interactive articles (i.e., the Smart Article) into that format, and researchers will be able to save them locally and have all the benefits of HTML with the permanency and portability they currently get from PDF.

      One thing is for sure, it’s an exciting time to be in publishing! Thanks again for the comment, T!

  • Konrad Hinsen

    As a reader of scientific journals, I don’t care about the technicalities. PDF, HTML, or whatever else is fine as long as it fulfills certain criteria: (1) a paper is a well-defined package that I can download, copy, and share with colleagues. (2) I can read the paper on my computer or on a mobile device (offline), seeing identical content. (3) I have reader software that just works (no special plugins etc.) (4) When reading online, my privacy is respected (no cookies etc.). Right now, PDF fulfills these criteria whereas HTML5 doesn’t. But that can change.

    From a technical point of view, the real question is whether it’s worth putting much effort into HTML for scientific publishing, considering that in the long run we want more functionality that even HTML cannot offer, in particular “executable papers” that embed raw data and code in such a way that they can be inspected in ways the authors didn’t envisage, and re-used (through a reference indexed in citation databases) in follow-up studies. This would be a much more important revolution than the minor conveniences that HTML can offer over PDF.

  • mmcpher

    There is still something familiar and visually pleasing in a pdf, but even with advances in the format, it remains stiff and relatively dumb, and ultimately frustrating and wasteful of bandwidth and time. I surmise your use of “scholarly publishing” refers mostly to scientific, medical or academic journals, which I sometimes cite to and research, but my pov, as an attorney, may have some relation to the question. In my experience, lawyers were years ahead of the medical profession in recognizing the utility of going online. The major legal research providers, Westlaw Next, Lexis Advance and Bloomberg Law, have never been as tied to the pdf format as publications in other disciplines, but where the research providers cross with court systems, pdf format becomes the norm, with all its limitations.

    All three of the big research providers have launched major revamps of their products and publications, mostly to take greater advantage of HTML5 capabilities, as Wiley has been doing for some time. It is surprising to see that, despite the fierce competition for eyeballs, the stakes, and the relatively vast resources and experience of each of these companies, they still struggle with uniformity, layout, organization and navigation. As it stands now, and IMHO FWIW, I would rank them Westlaw Next, Bloomberg Law and Lexis Advance, in order of sensible UI, consistent and intuitive searching, and intuitive access to supplemental content.

    It is, I expect, a daunting task to draw together all the varied sources under one format.