Digital archiving and publication

Introduction and rationale

Publishing scholarly material digitally

While there has always been a considerable gap between the high-ranking educational institutions and less privileged ones, both in access to scientific information and in the ability to publish research, the gap has been growing drastically in recent years.

In a recent article in Scientific American, the imbalance in publication is discussed, detailing the enormous disproportion between countries in material published in the peer-reviewed scientific press. The article cites examples of material by the same author being readily accepted when submitted from a continental US institution and refused when it originated from the author's home university, located in an emerging nation. It is also well documented that the overwhelming majority of published material originates and is published in North America and Europe. Several explanations are offered, among them the obvious one of the density of institutions and publications and the amount of research being done. An interesting reason, however, is the ease with which material is now transmitted for review in digital form and increasingly published in final and permanent form on the Internet itself. In fact, the Internet's World-Wide Web is fast becoming the medium of choice for the publication of scientific material. Edited and reviewed sites are proliferating, and the web is set to surpass print in the volume of scholarly publication, if it has not already done so.

Availability of scientific literature

Even more important than the possibility to publish is the availability and accessibility of scientific journals and literature. In the increasingly unfriendly economic climate in which vast numbers of educational institutions all over the world find themselves, libraries are invariably subject to cost cuts, both in books and in periodicals. There are innumerable examples of not only research but vital teaching being carried out on the basis of literature that is both out of date and factually wrong. With educational financing deteriorating relative to growing demands for quality education in all but the most privileged academic environments, the prospects for substantial improvements in access to and availability of scientific publications in traditional form look increasingly dismal. To compound the problem, printed materials that are acquired are often not procured in sufficient numbers, and are for all practical purposes unavailable to all but the few individuals who have secured the copies that exist.

Digital archiving and preservation

An increasing number of institutions view intranet and Internet communication technologies, supplemented by other forms of digital documentation, as a viable way out of their information access predicament. In this context "digital documentation" is understood to encompass methods for acquiring, archiving and publishing scientific and administrative information. It thus involves recording analog information in formats that are as close as possible to optimal for future retrieval, reuse and publishing. While online publishing is important, popular and cost-effective both for acquiring information and for publishing it, simple archiving to tapes and CDs, both for conservation and for duplication and backup, is a major part of any effort in methodical documentation.

Sharing knowledge

Of all the characteristics that can be attributed to the Internet, open sharing of information between institutions and individuals is definitely the most important. Exchange of knowledge and information was at the very root of the development of the Internet, and in spite of the enormous growth of commercialization on the net in recent years, it continues to be the net's most enduring advantage. While many companies have invested in the net in expectation of huge profits from trade in information, difficulties in implementing methods of payment continue to be a problem. Even when practical methods of remuneration are in common use, it is doubtful whether cultural and scientific information of serious educational use will become anything other than the responsibility of the academic community.

Similar projects

Practical considerations

The following is an overview of some characteristics of digital documentation:

  • Advantages of digital archiving

    • Size
      As an example, 100 images of large posters, each in 4 resolutions, the largest of print quality, can be stored on a standard compact disc (CD).

    • Retrieval and reuse
      Converting information to digital form ensures that each copy retrieved is exactly identical to the original recording. Thus material properly cataloged and stored on read-only digital media ensures unchanged copies.
      Cataloging of the material is also digital, providing immediate access.
      The material can also be sent digitally as a copy to any location on the Internet, or on hard media.

    • Ease of reproduction
      Apart from reproduction in any number of digital copies, digitized material is available for use in print or other media that use digital production techniques.

    • Permanence and safety in duplication
      Digital archiving provides the advantage of making duplicates on various media and storing them in separate locations, ensuring permanent, destruction-free storage.
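
The claim above that a digital copy is exactly identical to the original recording can be checked mechanically. The following is a minimal sketch in Python, assuming a cryptographic checksum (SHA-256) is an acceptable test of identity; the function names and the file names are hypothetical and chosen only for illustration.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_and_verify(source, destination):
    """Copy source to destination; return True if both checksums match."""
    shutil.copyfile(source, destination)
    return sha256_of(source) == sha256_of(destination)


def demo():
    """Write a small stand-in recording, duplicate it, and verify the copy."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "poster_scan.bin")  # hypothetical name
        duplicate = os.path.join(workdir, "poster_scan_copy.bin")
        with open(original, "wb") as handle:
            handle.write(bytes(range(256)) * 64)  # 16 KB of sample data
        return duplicate_and_verify(original, duplicate)
```

Storing the checksum in the catalog alongside each item would allow any later copy, on any medium, to be compared against the original recording.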

  • Digital publication

    • Universal availability
      Digitized information can be made universally available on the Internet (with limited personal access if necessary) and on various other readily available and common media such as CDs, digital tapes and disks.

    • Low costs
      Publication costs are low, usually involving only the authoring or editorial processes and the formatting of master documents.

    • Ease of publication
      With the emergence of standard cross-platform formats for common data types, such as HTML for digital publication and SGML for print, material can often be prepared for final publication by archivists and authors themselves. Additional work is usually limited to editorial and design modification for adaptation into larger collections such as web sites, or for marketing and packaging purposes.

    • Search-ability
      Digital formats provide excellent search functionality, especially in text, but increasingly in other data where pattern recognition is applicable. Provided common, cross-platform standards are used, searches need not be limited to isolated collections of information.

    • Collaboration
      By digitizing information and making it available universally or selectively, conditions for collaboration are greatly enhanced and limitations of time and space (and costs) proportionately reduced.

    • Immediacy of intellectual property rights
      Intellectual property rights (commonly understood as "copyright") are not limited to particular media. Consequently, the limitations previously imposed by print publication, including loss and theft between submission and publication, are alleviated.
      Digital publication, for example on the Internet, ensures the immediate and permanent intellectual property rights of the owner of the material involved, provided it otherwise complies with international copyright regulations.
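
The point under Search-ability, that searches need not be limited to isolated collections when common formats are used, can be illustrated with a small text index. This is only a sketch under the assumption that each collection can supply its documents as plain text; the collection names, document identifiers and function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(collections):
    """Map each word to the (collection, document) pairs that contain it.

    collections has the form {collection_name: {document_id: text}}.
    """
    index = defaultdict(set)
    for collection_name, documents in collections.items():
        for document_id, text in documents.items():
            for word in tokenize(text):
                index[word].add((collection_name, document_id))
    return index


def search(index, query):
    """Return the sorted documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


# Two hypothetical collections held by different institutions.
SAMPLE_COLLECTIONS = {
    "library": {"doc1": "Maps of Beirut harbour", "doc2": "Census tables"},
    "museum": {"item7": "Old harbour photographs, Beirut"},
}
SAMPLE_INDEX = build_index(SAMPLE_COLLECTIONS)
```

A single query such as search(SAMPLE_INDEX, "beirut harbour") then returns matching documents from both collections at once.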

  • Prerequisites

    • Infrastructure
      Archiving, conversion, transmission and especially publishing require sufficient infrastructure in the form of basic network facilities. The situation in 1997 is characterized by fair to excellent internal infrastructure at many educational institutions. External interconnectivity remains a severe limitation on collaboration and publication for all but the most privileged. The emergence of common cross-platform standards in both archiving and publication has brought an explosion of innovative software and of methods of presentation and transmission, all putting severe demands and strains on available bandwidth and the dynamics of academic inter-networks. Unfortunately, there does not seem to be any reason to expect effective improvements other than those that can be provided by government and academic networking organizations that clearly see the benefits of improvement for education and research.

    • Technical resources
      The important resources to note are primarily equipment for recording in standardized formats that can easily be read and transferred to new standards as these are developed. Also important is the choice of standardized, permanent recording media such as CDs and magnetic tapes.

      New file formats are developed continually, especially for display over the Internet. Original recordings should be made in formats that preserve the complete ranges of sound and color at as high a resolution as possible. Adaptations to various schemes of compression should be made only where the resulting loss of information is of no importance to either storage or display for end users.

      The success of any archival and publishing venture is dependent on the skills of the people involved. For non-skilled administrative personnel, the simplest means of judging the skills and competence of the staff involved is the measure of their reliance on proprietary commercial software and standards. The widest possible access, the best conformity for information sharing with other institutions and low-cost software are usually achieved by using standards that are freely available in the public domain. Public domain software, usually developed from research at educational institutions, is freely available on the Internet. Standards developed along the same lines are open and thus readily adaptable for use in both public and commercial software, ensuring much wider compatibility and use at reasonable cost.

    • Editorial competence
      To ensure that material be digitized in formats and ways that are compatible with future use, it is important that the venture has a certain amount of knowledge of formatting and editing data for publication on various digital media. "Publication" in this context does not necessarily mean making the digitized information globally available on the Internet, but simply assembling the archived material in such a way that it is retrievable in reasonably ordered form at any time by those who are responsible for the original analog information.

      In addition, digital archiving and especially publication demand knowledge and skills in organizing material and editing for reader interest, and some basic knowledge of data communication. This includes the technical skills necessary to run the software needed for publication at both the client and server ends. Also important is sufficient experience and knowledge of Internet communications to be able to judge delivery bandwidths and the constraints these impose on document formats and sizes in online publication.

    • Equity
      Given the various prerequisites above, which are mostly of a technical nature, the most important principle for the success of any digital archival and publication venture is the doctrine that information retained by educational institutions is in the public domain. Knowledge retained by educational institutions is developed through research and collected by acquisition, first for the benefit of students, teachers and researchers, and second for the public at large, including other institutions with similar aims.

      Attitudes of possessiveness other than those required to protect personal integrity and intellectual property rights simply negate the purpose and aims of archiving for knowledge retrieval and dissemination.
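
The remark under Editorial competence about judging delivery bandwidths and the constraints they impose on document sizes comes down to simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the efficiency factor is an assumed allowance for protocol overhead and shared links, not a measured value.

```python
def transfer_seconds(size_bytes, link_bits_per_second, efficiency=0.8):
    """Estimate the time in seconds needed to deliver one document.

    size_bytes: size of the document in bytes.
    link_bits_per_second: nominal speed of the slowest link on the path.
    efficiency: assumed fraction of the nominal speed actually achieved.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must not be negative")
    if link_bits_per_second <= 0:
        raise ValueError("link_bits_per_second must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    return (size_bytes * 8) / (link_bits_per_second * efficiency)


def fits_time_budget(size_bytes, link_bits_per_second, budget_seconds):
    """Return True if the document can be delivered within the budget."""
    return transfer_seconds(size_bytes, link_bits_per_second) <= budget_seconds
```

For instance, a 500,000 byte image over a 64,000 bit/s link takes 62.5 seconds at full nominal speed (efficiency=1.0), and longer at the default allowance. An editor could use such an estimate to decide which of several stored resolutions to publish online.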

  • Digitizing for archiving and digitizing for publication

    There is an important point to be made about the difference between digitizing for archiving purposes alone and subsequent publication of the same material. Storing information on a computer which is attached to a network does not mean that it is immediately available on the global Internet, any more than putting money in a bank makes it accessible to people on the street outside.

    Digitized material can be stored on CDs, for example, for safe physical storage. While this ensures permanence and safety, it does severely restrict retrievability. The World-Wide Web client-server software system, in combination with configuration techniques in network routing, allows for all the necessary variations in access control familiar from physical libraries and archives, with some fairly sophisticated additions. Access to archived digitized information can thus be controlled with respect to conditions such as location, area, personal identification, authentication, passwords, and so on.
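
The kinds of access control mentioned above, by location and by personal identification, can be sketched as a single decision function. This is a minimal illustration and not a description of any particular web server: the network range, the user record, the access levels and the function names are all hypothetical, and a real installation would normally rely on the facilities of its server software.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical campus network (an address range reserved for documentation).
CAMPUS_NETWORK = ipaddress.ip_network("192.0.2.0/24")


def hash_password(password, salt):
    """Derive a password hash with PBKDF2 so no plain text is stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# Hypothetical user table: name -> (salt, password hash).
USERS = {"archivist": (b"demo-salt", hash_password("correct horse", b"demo-salt"))}


def may_access(level, client_ip, username=None, password=None):
    """Decide whether a request may read an item with the given access level.

    "public"     : anyone on the Internet.
    "campus"     : only clients inside the campus network.
    "restricted" : campus clients that also present a valid password.
    """
    if level == "public":
        return True
    on_campus = ipaddress.ip_address(client_ip) in CAMPUS_NETWORK
    if level == "campus":
        return on_campus
    if level == "restricted":
        record = USERS.get(username)
        if not on_campus or record is None or password is None:
            return False
        salt, expected = record
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(hash_password(password, salt), expected)
    raise ValueError("unknown access level: " + repr(level))
```

Each archived item then needs only an access level recorded in the catalog; the same material can be moved from "restricted" to "public" without being stored again.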

  • Media shelf-life

    An often-voiced concern with respect to archiving digitized material is the potential deterioration of the "permanent" media on which it is stored, or the rapid outdating of the retrieval technology, both hardware and software. Media shelf-life is dependent on several factors, not least of which is the market penetration of common technologies. The more common a certain piece of technology, the longer it will remain in use. Ultimately, the availability of digitized information will simply depend on its perceived value on the part of its custodians. In a well-managed archival institute, material will simply be moved from one form of media about to become outdated to a more modern version as a matter of routine. The fact that the archived material is digital will ensure that it can be moved while preserving both accuracy and integrity, irrespective of the storage medium. In the case of analog information such as print and film, such movement between storage media is prohibitively expensive and causes serious deterioration in the quality of the information itself.

Børre Ludvigsen, professor of information architecture - 970805/970910

Created by the Documentation Center at AUB in collaboration with Al Mashriq of Høgskolen i Østfold, Norway.