Accessibility: PDFs

Why is this important?
  • Ensures screen readers can navigate and interpret content.
  • Makes scanned documents searchable and readable.
  • Preserves structure and accessibility features from source documents.

When to use it

Use accessible PDFs when sharing finalized documents, forms, reports or resources that need to be preserved in a fixed format.

How to use it

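In Microsoft Word, apply the built-in heading, list and table styles before exporting so the structure carries over into the PDF as tags. In Adobe Acrobat, add or correct tags for headings, lists and tables in the PDF itself. Convert scanned PDFs into searchable text using optical character recognition.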

Example

A scanned flyer should be run through optical character recognition and tagged with headings and alternative text before being uploaded to a website or emailed to users.

What are PDF Tags?

PDF tags are hidden markers that define the logical structure and reading order of content within a PDF document. They work much like HTML tags, identifying elements such as headings, paragraphs and lists.

Here’s a list of common PDF tag names. Most are standard structure types defined in the PDF specification (ISO 32000), which PDF/UA and the WCAG PDF techniques build on:

Document Structure Tags

  • Document – Root element of the tagged PDF.
  • Part – Major division of a document.
  • Sect – Section within a part.
  • Div – Generic container for grouping content.

Text-Level Tags

  • P – Paragraph.
  • H1, H2, H3, H4, H5, H6 – Headings (levels 1–6).
  • Lbl – Label for a list item, such as its bullet or number.
  • Span – Generic inline text container.

List and Table Tags

  • L – List.
  • LI – List item.
  • Table – Table container.
  • TR – Table row.
  • TH – Table header cell.
  • TD – Table data cell.

Specialized Tags

  • Figure – Image or graphic.
  • Caption – Caption for a figure or table.
  • Artifact – Decorative or nonessential content (not read by assistive tech).
  • Note – Footnote or endnote.
  • Reference – Citation or reference.

Form Tags

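  • Form – Interactive form field (a widget annotation), such as a text box or checkbox.
  • Field – Individual form field. This name is not among the standard structure types in ISO 32000-1, so a document that uses it needs a role mapping to a standard type such as Form.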
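
To make the idea of logical structure and reading order concrete, the sketch below models a tag tree as nested Python objects. It is only an illustration: the Tag class, the reading_order and unknown_tags helpers and the sample flyer are hypothetical and do not read a real PDF. The tag names come from the lists above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Tag names copied from the lists above; real documents may use others.
KNOWN_TAGS = {
    "Document", "Part", "Sect", "Div",
    "P", "H1", "H2", "H3", "H4", "H5", "H6", "Lbl", "Span",
    "L", "LI", "Table", "TR", "TH", "TD",
    "Figure", "Caption", "Artifact", "Note", "Reference",
    "Form", "Field",
}


@dataclass
class Tag:
    """One node in a simplified, hypothetical tag tree."""
    name: str
    text: str = ""
    children: List["Tag"] = field(default_factory=list)


def reading_order(tag: Tag) -> List[Tuple[str, str]]:
    """Walk the tree depth-first, skipping anything marked as Artifact."""
    if tag.name == "Artifact":
        return []  # decorative content is not read out
    items = [(tag.name, tag.text)] if tag.text else []
    for child in tag.children:
        items.extend(reading_order(child))
    return items


def unknown_tags(tag: Tag) -> List[str]:
    """Collect tag names that do not appear in KNOWN_TAGS."""
    found = [] if tag.name in KNOWN_TAGS else [tag.name]
    for child in tag.children:
        found.extend(unknown_tags(child))
    return found


# Made-up flyer; the Figure text stands in for its alternative text.
flyer = Tag("Document", children=[
    Tag("H1", "Community Health Fair"),
    Tag("P", "Join us on Saturday."),
    Tag("Artifact", "Page 1"),
    Tag("L", children=[Tag("LI", "Free screenings")]),
    Tag("Figure", "Map of the venue"),
])

for name, text in reading_order(flyer):
    print(f"{name}: {text}")
```

The loop prints the heading, the paragraph, the list item and the figure in that order and leaves out the "Page 1" artifact, which is roughly what a screen reader user would hear in a correctly tagged file.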

Optical Character Recognition

Optical character recognition (OCR) is a technology that converts different types of documents, such as scanned paper documents, PDFs or images captured by a camera, into machine-readable text. It’s widely used for digitizing printed materials so they can be searched, edited and processed electronically.

How OCR Works

  1. Image Preprocessing
    • Improves image quality (deskewing, noise removal, binarization).
  2. Character Detection
    • Identifies shapes that resemble letters or numbers.
  3. Pattern Recognition
    • Matches detected shapes against known character patterns.
  4. Postprocessing
    • Applies language models or dictionaries to correct errors.
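
The four steps above can be sketched as a toy pipeline in plain Python. This is a teaching sketch under heavy assumptions, not how a production OCR engine works: the 3x5 glyph templates, the 0 to 9 darkness scale, the sample scan and the two-word vocabulary are all invented for illustration. The scan spells TACO, with a smudge on the C.

```python
from difflib import get_close_matches

# Hypothetical 3x5 glyph templates; "#" is ink, "." is paper.
TEMPLATES = {
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
    "A": [".#.", "#.#", "###", "#.#", "#.#"],
    "C": ["###", "#..", "#..", "#..", "###"],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
}

# Invented scan: one digit per pixel, 0 = white paper, 9 = black ink.
# The 6s in rows 2 and 3 are a smudge on the letter C.
SCAN = [
    "989018008971989",
    "091080909160829",
    "190098909060908",
    "080091808020919",
    "090080909890899",
]


def binarize(scan, threshold=5):
    """Step 1: turn darkness levels into ink ("#") or paper (".")."""
    return ["".join("#" if int(px) >= threshold else "." for px in row)
            for row in scan]


def split_glyphs(bitmap):
    """Step 2: cut the bitmap wherever a column contains no ink."""
    glyphs, start = [], None
    width = len(bitmap[0])
    for col in range(width + 1):
        has_ink = col < width and any(row[col] == "#" for row in bitmap)
        if has_ink and start is None:
            start = col
        elif not has_ink and start is not None:
            glyphs.append([row[start:col] for row in bitmap])
            start = None
    return glyphs


def recognize(glyph):
    """Step 3: pick the template with the fewest differing pixels."""
    def distance(template):
        return sum(a != b
                   for g_row, t_row in zip(glyph, template)
                   for a, b in zip(g_row, t_row))
    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch]))


def correct(word, vocabulary):
    """Step 4: snap the raw result to the closest known word, if any."""
    matches = get_close_matches(word, vocabulary, n=1, cutoff=0.6)
    return matches[0] if matches else word


def ocr(scan, vocabulary):
    raw = "".join(recognize(g) for g in split_glyphs(binarize(scan)))
    return raw, correct(raw, vocabulary)


print(ocr(SCAN, ["TACO", "COAT"]))  # ('TAOO', 'TACO')
```

The smudge survives binarization and makes the third glyph look more like an O than a C, so the raw result is TAOO. The dictionary step then repairs it to TACO, which is why postprocessing matters for scans of imperfect quality.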

Common Uses

  • Digitizing books and archives.
  • Automating data entry from printed forms.
  • Extracting text from invoices, receipts or ID cards.
  • Making scanned PDFs accessible (tagging + searchable text).

Tip for Checking Accessibility

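Use Adobe Acrobat’s Accessibility Checker (All Tools > Prepare for Accessibility > Check for Accessibility) to scan for problems such as missing tags and missing alternative text. The checker cannot fully judge reading order or color contrast and flags them for manual review, so check those yourself and manually verify that headings, lists and tables are tagged correctly.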