@epuronta
Contributor

When XMP metadata is generated, values are embedded in the XML template string without escaping. If any value contains characters that are not XML-safe, the resulting XMP is invalid and the output PDF can no longer be parsed as Factur-X.

Reading the file back then fails with an error like this:
pypdf.errors.PdfReadError: XML in XmpInformation was invalid: not well-formed (invalid token)
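
The failure can be reproduced without a PDF at all. The following is a minimal sketch, not the library's code: `XMP_TEMPLATE`, `build_xmp_unescaped`, and `is_well_formed` are hypothetical names, the template is a heavily simplified stand-in, and the standard-library parser is used in place of pypdf.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the real XMP template.
XMP_TEMPLATE = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<rdf:Description><dc:creator>{author}</dc:creator></rdf:Description>"
    "</rdf:RDF>"
)


def build_xmp_unescaped(author: str) -> str:
    # Mirrors the bug: the value is substituted into the template verbatim.
    return XMP_TEMPLATE.format(author=author)


def is_well_formed(xml_text: str) -> bool:
    # True if the standard-library parser accepts the document.
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```

With this sketch, a plain company name parses fine, while a name containing a bare ampersand such as "Smith & Sons" is rejected by the parser.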

In practice, this is easy to trigger through two metadata fields that are extracted automatically from the Factur-X payload:

  • Selling company name
  • Invoice number in ExchangedDocument/ID

All metadata going to XMP generation should be escaped.
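
As an illustration of the idea, here is a minimal sketch of escaping values before they reach the template. It assumes a simplified, hypothetical template and helper names (`XMP_TEMPLATE`, `build_xmp`, `read_back`); the actual fix in this PR may be structured differently. It relies on `xml.sax.saxutils.escape` from the standard library, which replaces `&`, `<`, and `>` in text content.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical, simplified template; the real one carries many more fields.
XMP_TEMPLATE = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<rdf:Description>"
    "<dc:creator>{seller}</dc:creator>"
    "<dc:title>{invoice_id}</dc:title>"
    "</rdf:Description>"
    "</rdf:RDF>"
)


def build_xmp(seller: str, invoice_id: str) -> str:
    # Escape every value before substitution so that &, < and > cannot
    # break the surrounding markup.
    return XMP_TEMPLATE.format(
        seller=escape(seller),
        invoice_id=escape(invoice_id),
    )


def read_back(xmp: str) -> tuple:
    # Parse the generated XMP and return the decoded field values.
    root = ET.fromstring(xmp)
    return (
        root.findtext(f".//{{{DC_NS}}}creator"),
        root.findtext(f".//{{{DC_NS}}}title"),
    )
```

Parsing the escaped output returns the original strings unchanged, so the escaping is invisible to consumers of the metadata. Note that `escape` does not touch quotes by default, so values placed inside XML attributes would need `xml.sax.saxutils.quoteattr` or an explicit entity mapping instead.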

This PR does several things:

  • Add profile autodetection for minimum and basicwl; this is just a mechanical extension of the existing mechanism
  • Run isort on pdf.py, as expected
  • Extend the roundtrip tests so that they first attach the generated XML to a PDF and then read the PDF back. This raises errors for a few existing test cases
  • Fix the XMP generation by adding escaping logic

Fixes #79

@codecov

codecov bot commented Jun 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.18%. Comparing base (5ed87a6) to head (eeda872).
Report is 35 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #80      +/-   ##
==========================================
+ Coverage   90.95%   91.18%   +0.22%     
==========================================
  Files          18       18              
  Lines        1360     1395      +35     
==========================================
+ Hits         1237     1272      +35     
  Misses        123      123              


Demonstrates issue with xmp metadata and characters
in facturx payload (ampersands) that should be escaped
@epuronta epuronta force-pushed the escape-xmp-metadata branch 2 times, most recently from 3e09985 to 112339c Compare June 16, 2025 10:57
@epuronta epuronta force-pushed the escape-xmp-metadata branch from 112339c to ac04829 Compare June 16, 2025 10:57
@raphaelm raphaelm merged commit e96e3bc into pretix:master Sep 2, 2025
3 checks passed
Development

Successfully merging this pull request may close these issues.

Unescaped data going to XMP metadata makes output documents unparseable
