Complete PDF/A-1b coverage now available in 0.6 release of veraPDF

The veraPDF consortium is pleased to announce the latest release of the veraPDF PDF/A validation software and test-suite currently under development.

Highlights for this release are:

  • validation of all conformance criteria for ISO 19005-1 (PDF/A-1), conformance level b;
  • a complete PDF/A-1b test corpus, including 200 new test files;
  • PDF features reporting; and
  • a cross-platform installer.

Prototype features include:

  • PDF metadata fixing;
  • validation model and rules for PDF/A-1a, PDF/A-2 & PDF/A-3;
  • reporting in XML and HTML.

Download veraPDF 0.6 at: http://downloads.verapdf.org/rel/verapdf-installer.zip

Release notes are published at: https://github.com/veraPDF/veraPDF-library/blob/release-0.6/RELEASENOTES.md

veraPDF’s goal is to deliver an industry-supported PDF/A validator for memory institutions. We invite you to download and test the software. A guide for the desktop interface is provided to help you get started. If you encounter problems, or wish to make suggestions, please add them to the project’s GitHub issue tracker.

This release marks one year since veraPDF began the design phase in November 2014. The next release will include release candidates of all eight PDF/A validation profiles. Further details are outlined in our development roadmap.

Keep up to date with the latest developments of veraPDF by subscribing to the veraPDF consortium’s newsletter.

Led by the Open Preservation Foundation and the PDF Association, the veraPDF consortium is developing industry-supported open source, file-format validation for all parts and conformance levels of ISO 19005 (PDF/A). The software is designed to meet the needs of memory institutions responsible for preserving digital content for the long term.

The veraPDF consortium is funded by the PREFORMA project. PREFORMA (PREservation FORMAts for culture information/e-archives) is a Pre-Commercial Procurement (PCP) project co-funded by the European Commission under its FP7-ICT Programme.


veraPDF webinar Tuesday 13 October at 16:00 CET

The Open Preservation Foundation is hosting a webinar exploring the progress and plans for veraPDF on Tuesday 13 October. Boris Doubrov, from our partner Dual Lab, is the session lead.

The webinar covers the work the veraPDF consortium has done so far on the PREFORMA project. This includes the initial implementation of the PDF/A-1b validation model, details of the latest 0.4 code release, and plans for upcoming releases. We will also discuss the integration points and extensibility of the framework to cover additional requirements on PDF documents.

Registration

To register visit: https://openpreservation.clickwebinar.com/from-the-toolbox-verapdf/register

The webinar will last approximately one hour.

veraPDF releases prototype validation library for PDF/A-1b

An early release of the veraPDF validation library is now available. This 0.4 release delivers:

  • a working validation model and validator;
  • an initial, unverified PDF/A-1b validation profile; and
  • prototype PDF feature reporting (characterisation).

This release comes ahead of our development schedule, enabling users to test our first implementation of PDF/A-1b validation on single files using a GUI. The latest version of the 0.4 release can be obtained from our download area: http://downloads.verapdf.org/rel/veraPDF-library-GUI-latest.zip

Version 0.4 release notes are available on GitHub: https://github.com/veraPDF/veraPDF-library/releases/tag/v0.4.10.

Instructions for unpacking the GUI zip archive, running the GUI and loading the PDF/A-1b profile can be found here: https://github.com/veraPDF/veraPDF-library#unpacking–using-the-gui-package

The PDF/A-1b profile has been tested internally against our corpora, but has not been fully assessed by the PDF Association’s PDF Validation Technical Working Group (TWG), nor has it passed the PREFORMA project’s acceptance tests.

Feedback is welcome; please use the project’s GitHub issue tracker to submit bug reports or enhancement requests.

Our next release milestone is on 31 October 2015, aligning with the end of the first design phase of the PREFORMA project. Details of the software development roadmap are published on the veraPDF website. You can also receive updates by subscribing to the veraPDF consortium’s newsletter.


veraPDF consortium issues first public software release

The first public prototype of veraPDF’s validation software has been released. The software can be downloaded at: http://downloads.verapdf.org/rel/veraPDF-library-GUI-0.2.0.zip

veraPDF is developing industry-supported open source, file-format validation for all parts and conformance levels of ISO 19005 (PDF/A). The software is designed to meet the needs of memory institutions responsible for preserving digital content for the long term. The project is led by the Open Preservation Foundation and the PDF Association and is supported by leading members of the PDF software development community through their Technical Working Group.

This initial public release of veraPDF’s software is incomplete, and is not to be used as a validator; it is currently more a proof of concept than a usable file format validator. The release notes are published at: https://github.com/veraPDF/veraPDF-library/releases/latest

veraPDF’s website is now up at https://verapdf.org/. The site contains information on the project’s software and roadmap, the team behind it, and how you can get involved.

If you’d like to keep up to date with veraPDF’s progress, and be among the first to find out when new software is available, sign up to our email list: https://verapdf.org/subscribe/

About
The veraPDF consortium is a unique collaboration, bringing together an end user community of digital preservationists and a software industry rooted in the principle of interoperability based on ISO standardised technology to develop an industry-supported conformance checker for PDF/A. veraPDF is funded by the PREFORMA project.

PREFORMA
veraPDF is funded by PREFORMA – PREservation FORMAts for culture information/e-archives – a Pre-Commercial Procurement (PCP) project co-funded by the European Commission under its FP7-ICT Programme.

April 2015 – December 2016: Prototyping phase

PREFORMA awarded three of the six suppliers a contract for the prototyping phase in April 2015. veraPDF successfully proceeded to Phase 2 as the conformance checker for the PDF/A strand.

We are still in the early stages of development. We have been busy focusing on PDF/A validation fundamentals and establishing good working practices. You can find out more on our software page.

The verapdf.org website was launched in June 2015 in advance of the first public software release on 15 July 2015.

The DPC and OPF are running a briefing day on 15 July ‘Preserving Documents Forever: When is a PDF not a PDF?’. Participants at the briefing day will have a chance to find out what veraPDF plans to deliver over the coming years. More importantly they will also have an opportunity to contribute to its design by providing requirements and feedback.


Introducing veraPDF to ISO’s WG for PDF/A

The world of ISO standards development does not proceed very quickly unless there is substantial vendor demand in the first instance. In this case, the first formal communication between the veraPDF consortium’s effort to build an industry-supported validator for PDF/A and the ISO working group responsible for PDF/A occurred during the ISO TC 171 meetings in San Jose, CA, in April 2015.

In the lead-up to this meeting the PDF Association’s PDF Validation TWG created and posted several comments for review by the ISO TC 171 SC 2 WG 5 committee responsible for PDF/A. Additional questions were also put to the WG 8 committee, responsible for ISO 32000, the PDF specification.

As the TWG studied PDF/A in detail, and as discussed in the PDF/A Competence Centre’s Application Notes for PDF/A, it became clear that there were several areas of ambiguity, especially in Part 1 of ISO 19005. During the April meeting in San Jose the PDF Validation TWG submitted questions and received direction on the following key points of interest:

  • That existing Parts of PDF/A will not be amended via corrigenda or otherwise.
  • That the PDF Validation TWG may, as a body, supply proposed revisions to working text for a new Part of PDF/A.
  • The PDF Validation TWG submitted several proposed enhancements for a new Part of PDF/A addressing ambiguities in existing specifications. The first formal ISO meeting to address PDF/A-next will be held in Basel, Switzerland in November, 2015.

The veraPDF consortium will discuss development of a proposal to ISO to request permission to establish the PDF Association’s PDF Validation TWG and the veraPDF software it approves as normative references for PDF/A-next.

The generic nature of veraPDF’s design makes it readily adaptable to validating other file formats, including other PDF subset standards, PDF itself, and non-PDF formats. We hope the community will see veraPDF as a building block for more comprehensive PDF validation.

How veraPDF does PDF/A validation

In his article on pdfa.org, veraPDF architect Boris Doubrov outlines the veraPDF model. In particular, he highlights the fact that:

  • veraPDF is a purpose-built validator, not a parser adapted to validation purposes
  • the veraPDF model is entirely generic, able to accommodate the variety of data structures possible within PDF
  • veraPDF development includes the PDF Validation TWG in its process


How veraPDF relates to the standards-development process

The veraPDF consortium project was created by, and fundamentally for, the archivist community. The PREFORMA project funds three initiatives to create open-source validators for certain key file formats: PDF/A, TIFF, and FFV1 and Matroska (for video content).

The PDF Association’s interest in the veraPDF project stems from the PDF technology industry’s collective interest in the highest possible degree of interoperability between PDF production, processing, display and other software.

PDF/A, with a strict set of requirements and limitations, provides the archival subset of PDF targeted by the PREFORMA project funding veraPDF.

The need for improved general understanding of PDF/A was established with the first Isartor Test Suite project, almost ten years ago. Isartor covered only PDF/A-1b; today there are three parts of PDF/A and eight conformance levels. veraPDF will provide a complete test suite for all of these.

How veraPDF fits into the standards process

The veraPDF consortium is a joint project that includes the PDF Association as a key member. Through the PDF Association, a Category A liaison with ISO’s TC 171, and its PDF Validation Technical Working Group, the veraPDF consortium will be able to address the ISO standards development process in several ways.

Enhanced feedback to the ISO delegates from the archivists’ perspective

  • Seek specific guidance on questions regarding interpretation of current standards
  • Ask specific questions designed to elicit commitments to address the ambiguity in any future part of the PDF/A specification
  • Highlight problematic conceptual aspects of PDF/A to encourage the ISO committee to address them in some way

The veraPDF consortium’s strategy will be to:

  • Establish a relationship between the PDF Association’s PDF Validation TWG, which is a new organization within the PDF Association, and the ISO WG (TC 171 SC 2 WG 5) responsible for PDF/A
  • Establish a routine in which the PDF Validation TWG brings matters of interpretation to the WG at each of its twice-annual meetings for review and discussion at the ISO level
  • Provide a vehicle for determining and conveying industry-supported interpretations of PDF/A
  • Provide a framework for extending veraPDF to cover the rest of ISO 32000 (the PDF specification itself)
  • Provide a framework for extending veraPDF to address validation of any 3rd party data-structure that might occur in a PDF file

If veraPDF is successfully adopted, the principal effect will be that PDF/A software tends to agree with other PDF/A software about the PDF/A files they exchange, and archivists will have a new means of understanding PDF/A and PDF files. Success will be noticeable by what is absent, namely fewer conflicts between PDF/A software, and by PDF/A-based solutions becoming more reliable and useful in a variety of implementations.

November 2014 – February 2015: Design phase

We worked closely with both memory institutions and industry to gather their requirements for the PDF/A conformance checker.

OPF and DPC consulted with their members, and held a webinar to explain the aims of veraPDF and get feedback on the following points in the functional and technical specifications.

Functional specification:

  • scope of PDF/A validation;
  • policy requirements;
  • use cases;
  • interfaces;
  • integrations (e.g. with repository software).

Technical specification:

  • implementation technologies, validation/policy profiles;
  • report formats, test framework, etc.;
  • review of existing test corpora;
  • evaluation of PDFBox.

In parallel, the PDF Association’s Technical Working Group held regular meetings to:

  • identify ambiguities in the PDF/A standards;
  • form consensus at working group level to resolve ambiguity;
  • create test files and documentation to support resolution;
  • submit the above to the ISO committee for ratification.


The functional and technical design specifications were submitted to PREFORMA alongside a project plan proposal and community engagement plan for Phase 2 (prototyping).

The proposed software architecture was presented to the PREFORMA panel in March 2015.


PREFORMA call to tender

In June 2014, the PREFORMA project issued its call to tender. The main objective of the PREFORMA project is to oversee the development and deployment of an open source, software licensed, reference implementation for file format standards. The toolsets are aimed at any memory institution or organisation with a mandate to archive and preserve digital objects. The media types covered by the tender were: texts, still images and audio-visual records. The PREFORMA call required a conformance checker that:

  • verifies whether a file has been produced according to the specifications of a standardised file format,
  • verifies whether a file matches the acceptance criteria for long-term preservation by the memory institution,
  • reports in human and machine readable format which properties deviate from the standard specification and acceptance criteria, and
  • corrects relevant metadata in the preservation file.

We submitted a response to the text strand, focussing on PDF/A, under the name “VeraPDFa Consortium”.

PREFORMA chose two suppliers to proceed to the design phase for each media type. The six successful suppliers were announced in November 2014.