Webinar: Pre-Commercial Procurement for the long-term Preservation of Digital Cultural Heritage, 14 June

Programme:

  1. Background and context, Börje Justrell (10 mins)
  2. The PCP/PPI instrument and how it is implemented in PREFORMA, Antonella Fresa (10 mins)
  3. The PREFORMA Challenge, Bert Lemmens (10 mins)
  4. How to contribute and next appointments, Claudio Prandoni (10 mins)
  5. Q&A

Outline:

Pre-Commercial Procurement (PCP) is a competition-like method designed to steer the development of innovative solutions towards concrete public sector needs. These solutions are developed by external suppliers that are awarded a contract through a phased open procurement process. In recent years the PCP instrument has become increasingly popular within the public sector, and under Horizon 2020 the European Union has increased its support for groups of public procurers working together on joint PCPs.

PREFORMA is a PCP project co-funded by the European Commission under its FP7-ICT Programme to work on one of the main challenges that memory institutions face today: the long-term preservation of digital data. In particular, the project offers memory institutions an open source conformance checker that checks whether a file complies with the standard specifications and with the acceptance criteria of the institutions, thus giving them full control of the process of conformity testing of files to be created, migrated and ingested into archives. This software development is carried out in a collaborative environment with memory institutions and experts. The aim of the webinar is to present the first results of the project and to invite the wider digital preservation community (open source community, developers, standardization bodies and memory institutions) to participate in this process.

For more information about the PREFORMA project visit: http://www.preforma-project.eu/

Time:

13:30 BST / 14:30 CET. The webinar will last approximately one hour.

Register:

http://opfwebinarpcpdigitalheritage.eventbrite.co.uk

veraPDF 0.16 released with full support for all PDF/A parts and conformance levels

The latest version of veraPDF features full support of all PDF/A-2 and PDF/A-3 requirements (all levels). Together with earlier support of PDF/A-1 validation, it represents the first full support for all PDF/A parts and conformance levels.

Features:

  • Conformance checker
    • validation of digital signature requirements
    • extraction of color space info from JPEG2000 images
    • validation of permissions dictionary
    • PDF/A-2B fix: correct implementation of CIDSystemInfo entry requirements
    • command line support for plugin execution to extend feature extraction
  • veraPDF characterisation plugins
    • first set of example pure java plugins available
    • optional sample plugin pack available through installer

Test corpus:

  • 112 new atomic test files for parts 2 and 3

Infrastructure:

Download veraPDF 0.16:

http://downloads.verapdf.org/rel/verapdf-installer.zip

Release notes:

https://github.com/veraPDF/veraPDF-library/releases/latest  

veraPDF is building an industry-supported, open source PDF/A validator. The project benefits from a high level of development resource and PDF/A expertise. Please support our efforts by downloading and testing the software. If you encounter problems, or wish to suggest improvements, please add them to the project’s GitHub issue tracker. You can expect a speedy response. Your feedback is very important: it helps to improve the software.

Keep up to date with the latest developments of veraPDF by subscribing to the veraPDF consortium’s newsletter.

Update on the Resolution of Ambiguities

In December of last year we reported the development of the PDF Validation TWG’s Resolution of Ambiguities document, with an additional 10 questions added to the 4 previously presented to the ISO committee and resolved in April, 2015 during the meetings in San Jose, California.

Since last November the veraPDF contractor has raised several more ambiguities with the PDF Validation TWG for resolution, and the TWG has addressed them, bringing the total number of ambiguities raised to 24 across all parts of PDF/A.

Since many of these questions pertain to PDF/A-next in addition to previous Parts of ISO 19005, the 10 new questions generated by the TWG between the two ISO meetings were submitted into the formal ISO process for reviewing comments against draft specifications. The ISO WG then duly considered the Resolution of Ambiguities document during its meetings in Ghent, Belgium in May, 2016.

These new questions proved somewhat more contentious than many of the questions formerly raised. To provide a flavor of the issues addressed, the most recent set of ambiguities is summarized here:

veraPDF-A015 discusses the interpretation of the corrigendum 2 to ISO 19005-1, which contains a special clause to exclude resources unreferenced from the corresponding content stream from further requirements.

At Adobe’s request, this item was parked by the WG for further study, to be resolved at the ISO meetings in Sydney, in November, 2016.

veraPDF-A016 remains a sore spot. The keys in question are deprecated in ISO 32000-2, and thus do not affect PDF/A-next. However, the requirement remains for PDF/A-2 and PDF/A-3; it will be left to an industry Application Note to provide a universal reference for relaxing these unnecessary and problematic requirements for CharSet and CIDSet entries.

veraPDF-A017 sought to clarify that XMP metadata streams in PDF/A-1 must be uncompressed. The TWG’s interpretation was accepted, and the WG added an additional clarification: that XMP packages don’t need to conform to XMP or even XML.

veraPDF-A018 refers to an ambiguity over whether the requirement pertains to the file format or to a means of comparing real values. The WG decided that non-zero values less than the minimal one are deliberately not allowed in PDF/A-2 (and PDF/A-3).
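To give a feel for what the A018 decision means for a validator, here is a minimal Python sketch of such a check applied to an already parsed real value. It is not veraPDF code: the function name is hypothetical, and the two limit values are assumptions quoted from memory of the implementation limits list, so they should be verified against clause 6.1.13 of ISO 19005-2 before being relied upon.

```python
# Hypothetical sketch, not veraPDF code.
# Both limits are assumptions to be verified against ISO 19005-2, clause 6.1.13.
MIN_NONZERO_REAL = 1.175e-38  # assumed smallest allowed non-zero magnitude
MAX_REAL = 3.403e38           # assumed largest allowed magnitude


def real_within_limits(value: float) -> bool:
    """Return True if a parsed real value respects the assumed limits."""
    magnitude = abs(value)
    if magnitude == 0.0:
        # Zero itself is acceptable; only non-zero values can be "too small".
        return True
    return MIN_NONZERO_REAL <= magnitude <= MAX_REAL
```

Under these assumptions a value such as 1e-40 is rejected rather than silently treated as zero, which is the behaviour the WG's decision points to.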

veraPDF-A019 discusses the problem that clause 6.1.13 in ISO 19005-2 copies the list of limits from ISO 32000-1 and lists them explicitly. However, the word “approximately” was dropped, so the definition of the limits differs between ISO 32000-1 and PDF/A-2, creating an untenable situation for processors encountering files that (may) exceed these limits. The WG elected to leave the matter as-is because, although it differs from the base specification for PDF, the actual requirement for PDF/A-2 is itself not ambiguous.

veraPDF-A020 concerns the “shall” requirement in all three parts of PDF/A to comply with either predefined schemas from the XMP specifications or with an extension schema. The WG accepted the PDF Validation TWG’s recommendation for PDF/A-next.

veraPDF-A021 questions the value and practicality of the requirement in PDF/A-2 and PDF/A-3 to record user actions in the xmpMM:History property. The WG accepted the PDF Validation TWG’s recommendation for PDF/A-next but highlighted that the parameters field is still required in xmpMM:History for conformance with PDF/A-2 and PDF/A-3.

veraPDF-A021a (there was a numbering error, to be corrected in a subsequent Resolutions document) points out that in PDF/A-1 it’s not clear if any Widget annotation is required to have an annotation dictionary. The WG agreed with the TWG’s interpretation that for PDF/A-1, every button field widget shall have an appearance stream or dictionary.

veraPDF-A022 affects all parts of PDF/A. The requirement for multiple appearance streams misses the case when a form field (such as a radio button) has multiple widgets associated with it and defined in the /Kids array. The TWG proposed to PASS an otherwise valid PDF/A document if it contains a Widget annotation dictionary whose Parent key refers to a parent form field of type Button, and if the value of the N key in this widget annotation dictionary refers to an appearance subdictionary. The WG agreed.
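The condition proposed for A022 can be sketched in a few lines of Python. This is an illustration only, not the veraPDF implementation: PDF dictionaries are modelled as plain Python dicts, the `Stream` class and the function name are hypothetical, and the sketch assumes that the N key is looked up inside the widget's AP (appearance) dictionary and that a Button field is identified by an FT value of Btn, as in ISO 32000-1.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Stream:
    """Hypothetical stand-in for a PDF stream object (dictionary plus data)."""
    dictionary: Dict = field(default_factory=dict)
    data: bytes = b""


def passes_proposed_check(widget: Dict) -> bool:
    """Return True if the widget matches the condition proposed for A022."""
    if widget.get("Subtype") != "Widget":
        return False
    # The Parent key must refer to a form field of type Button.
    parent = widget.get("Parent")
    if not isinstance(parent, dict) or parent.get("FT") != "Btn":
        return False
    # Assumption: N lives in the AP dictionary of the widget annotation.
    appearance = widget.get("AP")
    if not isinstance(appearance, dict):
        return False
    # N must be an appearance subdictionary, not a single appearance stream.
    return isinstance(appearance.get("N"), dict)
```

For example, a radio button widget whose N entry maps the states "On" and "Off" to streams passes, while a widget whose N entry is a single stream does not match this particular condition.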

veraPDF-A023 pointed out that some wording pertaining to ICC color spaces was imprecise, and proposed specific replacement text. The WG accepted this interpretation, and the PDF/A-next Project Leader agreed to make this change in the text of PDF/A-next.

Following the ISO meetings in Ghent the PDF Validation TWG will continue its review and test-suite development for PDF/A-2 and PDF/A-3, with its final questions to be put before the ISO WG during the November, 2016 meetings in Sydney, Australia.

The PDF Association is currently considering publication of the final Resolution of Ambiguities document as a formal PDF Association Application Note for PDF/A.

veraPDF 0.14 released with launch of demo website

We are pleased to announce the latest release of veraPDF. Version 0.14 features Transparency and Unicode character map validation in PDF/A-2 levels B and U. Other highlights include:

Conformance checker:

  • added all transparency-related validation rules in PDF/A-2 and PDF/A-3
  • added full Level U support in PDF/A-2 and PDF/A-3
  • code refactoring to synchronize GUI, API and CLI interfaces
  • PDF/A-1B fix: check both Tiling patterns used as different fill and stroke colour spaces in the same painting operations
  • added initial versions of PDF/A-2U, PDF/A-2A, PDF/A-3U, PDF/A-3A validation profiles. We now have initial validation for all PDF/A flavours.

Test corpus:

  • added a further 65 atomic test files for PDF/A-2 specific requirements

Infrastructure:

Download veraPDF 0.14:

http://downloads.verapdf.org/rel/verapdf-installer.zip

Release notes:

https://github.com/veraPDF/veraPDF-library/releases/tag/v0.14.2

This is the first release of the final design phase which began on 19 April following the PREFORMA Project EC review at their Open Source Workshop.

veraPDF 0.12 released alongside first version of wiki validation rules

The latest software release features improved PDF/A-2b and PDF/A-3b validation and the fully featured REST API.

veraPDF 0.12 has the following features:

Conformance checker:

  • PDF/A-2 and PDF/A-3 improvements: implemented checks for optional content, JPEG2000 requirements
  • full compliance with BFO test suite (PDF/A-2b)
  • PDF/A-1b fix: check for form field appearance
  • code refactoring to enable PDF model implementation via different PDF parsers
  • performance and memory optimization

Test corpus:

  • full coverage of all predefined XMP properties

Documentation:

Infrastructure:

  • veraPDF-library project refactored into multiple projects
  • PDF Box validator implementation in separate project
  • Automated source packaging with dependencies
  • Corpus test results published online

The veraPDF validation engine implements the PDF/A specification using formalisations of each requirement in PDF/A-1, PDF/A-2 and PDF/A-3. The wiki documents each rule used by the software and provides details on the error(s) triggering a failure of the rule.
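As a rough illustration of what "formalising a requirement" can look like, the following Python sketch expresses a rule as a predicate applied to every object of a given type. It does not reproduce the veraPDF validation profile syntax or its object model: the `Rule` class, the `check` function, the rule identifier and the "MetadataStream" type name are all hypothetical. The example rule encodes the idea that a metadata stream must be uncompressed, assumed here to mean that its dictionary has no Filter key.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """Hypothetical formalisation of a single requirement."""
    rule_id: str                  # made-up identifier, not a real clause number
    object_type: str              # kind of object the rule applies to
    description: str              # human-readable text, e.g. for a wiki page
    test: Callable[[Dict], bool]  # returns True when the object conforms


def check(objects: List[Dict], rules: List[Rule]) -> List[str]:
    """Return the id of each failed rule, once per failing object."""
    failures = []
    for obj in objects:
        for rule in rules:
            if obj.get("type") == rule.object_type and not rule.test(obj):
                failures.append(rule.rule_id)
    return failures


# Example rule: a metadata stream must not be compressed (no Filter key).
uncompressed_xmp = Rule(
    rule_id="example-1",
    object_type="MetadataStream",
    description="The metadata stream shall not use a Filter",
    test=lambda obj: "Filter" not in obj,
)
```

Keeping the description next to the test is what makes it possible to publish every rule, and the errors that trigger its failure, alongside the software.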

Download veraPDF 0.12 at: http://downloads.verapdf.org/rel/verapdf-installer.zip

Release notes are published at: https://github.com/veraPDF/veraPDF-library/releases/tag/v0.12.4

veraPDF is building an industry-supported, open source PDF/A validator. Please download and test the software. If you encounter problems, or wish to make suggestions, please add them to the project’s GitHub issue tracker. Your feedback is very important: it helps to improve the software.

Open Source Preservation Workshop – Serving the Cultural Heritage

Registration is now open to attend the Open Source Preservation Workshop. Organised by the PREFORMA project (http://www.preforma-project.eu/), it is the first in a series of international events planned by this international consortium of technology and content providers, working together on one of the main challenges memory institutions are facing nowadays: the long-term preservation of digital data.

The workshop will take place in Stockholm on April 7, 2016, and will be hosted at the National Library of Sweden.

Check out the full programme (http://opensourceworkshop.preforma-project.eu/programme/), the confirmed speakers (http://opensourceworkshop.preforma-project.eu/speakers/) and the list of exhibitors (http://opensourceworkshop.preforma-project.eu/exhibitors/).
 

About the event

The event is intended for anyone interested in digital preservation and cultural heritage: developers who want to contribute code to the PREFORMA tools; memory institutions or other cultural heritage organisations involved in (or planning) digital preservation initiatives; standardisation bodies maintaining the technical specifications of preservation file formats; any other person interested in cooperating with us in defining open digital preservation standards.

It will feature keynote presentations by representatives from the PREFORMA project and the open source community, live demonstrations of the three conformance checkers for electronic documents, images and AV files by the suppliers working in the project (veraPDF, Easy Innova, MediaArea) and an informal networking event where all the attendees can share experiences, meet the PREFORMA developers and learn about the tools.

Further information

The language of this workshop will be English.

Participation in the workshop is free of charge. Please register at http://opensourceworkshop.preforma-project.eu/registration/ before March 31, 2016.

If you have any further enquiries or require additional information about this event, please contact Claudio Prandoni at prandoni@promoter.it.

Event website http://opensourceworkshop.preforma-project.eu
PREFORMA project http://www.preforma-project.eu/

veraPDF 0.10 released

The latest version of veraPDF is now available. This marks the end of the PREFORMA project’s first re-design phase.

Version 0.10 has the following feature enhancements:

Conformance checker:

  • new implementation of the XMP validation
  • proper CharSet / CIDSet validation

Command line:

  • processes stdin if no file paths are supplied for use in *nix pipes;
  • directory and recursive sub-directory processing; and
  • text mode output with summarised output.
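The input handling described in this list follows a common command line pattern, sketched below in Python. This is not the veraPDF CLI (which is a Java application) and it does not reflect its actual flags: the function name, the "-" marker for stdin and the "*.pdf" file pattern are assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterable, List

STDIN_MARKER = "-"  # hypothetical placeholder meaning "read the file from stdin"


def collect_inputs(paths: Iterable[str], recursive: bool = False) -> List[str]:
    """Expand command line arguments into the list of inputs to process."""
    paths = list(paths)
    if not paths:
        # No file paths supplied: fall back to stdin so the tool works in a pipe.
        return [STDIN_MARKER]
    inputs: List[str] = []
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            # Directories are expanded; sub-directories only when recursive.
            pattern = "**/*.pdf" if recursive else "*.pdf"
            inputs.extend(sorted(str(p) for p in path.glob(pattern)))
        else:
            # Anything that is not a directory is passed through unchanged.
            inputs.append(raw)
    return inputs
```

With no arguments the function returns only the stdin marker, which is what allows a tool built this way to sit at the end of a *nix pipe.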

Test corpus:

  • initial set of PDF/A-2 test files

There are also a number of bug fixes:

Conformance checker:

  • fixed CMap / WMode validation
  • minor fixes in PDF/A-2b and PDF/A-3b validation profiles

Command line fixes:

  • all CLI output for a single file now in one XML document; and
  • error output now all goes to stderr, keeping stdout clean.

Download veraPDF 0.10 at: http://downloads.verapdf.org/rel/verapdf-installer.zip

Release notes are published at: https://github.com/veraPDF/veraPDF-library/releases/tag/v0.10.7

veraPDF workshop at PASIG 2016

veraPDF will be running a half-day workshop at the PASIG (Preservation and Archiving Special Interest Group) 2016 conference. PASIG takes place on 9-11 March at the Czech National Library of Technology in Prague. Registration for the conference is open at: http://pasig.schk.sk/wordpress/registration.

The veraPDF workshop takes place on the first morning in parallel to the ‘Digital Preservation 101’ session.

9 March 09:00 – 12 noon
veraPDF: an industry-supported, open source PDF/A validator for digital preservationists
veraPDF’s goal is to build an industry-supported validator for PDF/A. This workshop will begin with an introduction to why PDF/A is used and the importance of validation. We will have a web demonstrator with user interface that participants can try out. We’ll then be looking at metadata extraction, focusing on what the data means and why it is important in a library or archive environment.

The second part of the workshop will focus on how the software has been put together to produce a reliable web service that can be used in your institution.

The full PASIG conference agenda is published at: http://pasig.schk.sk/wordpress/agenda.

Note that the attendee cost is 200€ until the end of January and 250€ from February 1 onwards. A list of local hotels is available on the website.

veraPDF 0.8 now available

We are pleased to announce the latest release of veraPDF. Version 0.8 features a re-designed command line interface (CLI) for validation and feature extraction.

Highlights of this release are:

  • Refactored plug-in architecture;
  • Re-designed CLI for PDF/A validation and feature reporting;
  • Supporting install scripts;
  • Updated validation profile syntax;
  • Simplified machine-readable report format;
  • Synchronization with PDFBox 2.0 RC1 library.

The most important bugs fixed in this release are:

  • comparison of Info dictionary and XMP metadata (PDF/A-1);
  • support for missing resources and resource inheritance mechanism (PDF/A-1); and
  • parsing TrueType fonts with zero-length tables.

Download veraPDF 0.8 at: http://downloads.verapdf.org/rel/verapdf-installer.zip

Release notes are published at: https://github.com/veraPDF/veraPDF-library/blob/release-0.8/RELEASENOTES.md

veraPDF will deliver an industry-supported PDF/A validator. Please download and test the software. If you encounter problems, or wish to make suggestions, please add them to the project’s GitHub issue tracker. Your feedback is important: it will contribute to improving the software.

Resolution of Ambiguities – the PDF Validation TWG’s work-product

As reported earlier, the veraPDF project was introduced to the ISO committee managing PDF/A in April, 2015. In that meeting, DualLab’s Boris Doubrov, PDF Association member and the lead technical contractor on the veraPDF project, presented 4 questions generated by the PDF Validation Technical Working Group (TWG), the group responsible for approving veraPDF test-suite files and project software. The ISO committee in question, ISO TC 171 SC 2 WG 5, is developing a document currently known as “PDF/A-next”, the end-product of which is likely to become the next part of ISO 19005, the PDF/A specification. The questions generated by the TWG, therefore, are valuable for two reasons:

  • They provide an opportunity to remove the ambiguity in a future part of PDF/A, and
  • although the committee has previously stated that no corrigenda will be published for existing parts of PDF/A, in accepting these corrections, the ISO committee may be seen to be implicitly resolving the ambiguities in the most formal available sense, so
  • the ISO committee’s acceptance of a given resolution allows the industry to address the matter on their own terms, but with confidence in a fixed point of reference.

In the most recent meeting, held in Basel, Switzerland on November 16-20, 2015, the PDF Validation TWG added 10 more ambiguities to the original list of 4. Together, these 14 issues (and others to be uncovered as the TWG continues to review PDF/A test files) are now known as the PDF/A Resolution of Ambiguities document.

As the test-suite and software move deeper into PDF/A-2 and PDF/A-3 there will, doubtless, be more questions to be resolved. These will be presented at the next meeting of ISO TC 171 SC 2 WG 5, next May in Ghent, Belgium.