Accessibility: PDFs
- Ensures screen readers can navigate and interpret content.
- Makes scanned documents searchable and readable.
- Preserves structure and accessibility features from source documents.
Use accessible PDFs when sharing finalized documents, forms, reports or resources that need to be preserved in a fixed format.
Use Adobe Acrobat or Microsoft Word to tag headings, lists and tables before exporting. Convert scanned PDFs into searchable text using optical character recognition.
Example
A scanned flyer should be run through optical character recognition and tagged with headings and alternative text before being uploaded to a website or emailed to users.
What are PDF Tags?
PDF tags are hidden markers that define the logical structure and reading order of content within a PDF document. They work like HTML tags, identifying different elements like headings, paragraphs and lists.
Here’s a list of common PDF tag names. They come from the standard structure types in the PDF specification, which PDF/UA builds on and which help documents meet WCAG:
Document Structure Tags
- Document – Root element of the tagged PDF.
- Part – Major division of a document.
- Sect – Section within a part.
- Div – Generic container for grouping content.
Text-Level Tags
- P – Paragraph.
- H1, H2, H3, H4, H5, H6 – Headings (levels 1–6).
- Lbl – Label for a list item, such as its bullet or number.
- Span – Generic inline text container.
List and Table Tags
- L – List.
- LI – List item.
- Table – Table container.
- TR – Table row.
- TH – Table header cell.
- TD – Table data cell.
Specialized Tags
- Figure – Image or graphic.
- Caption – Caption for a figure or table.
- Artifact – Decorative or nonessential content (not read by assistive tech).
- Note – Footnote or endnote.
- Reference – Citation or reference.
Form Tags
- Form – Interactive form field (one Form tag per field).
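To make the link between tags and reading order concrete, here is a minimal sketch in Python. It does not read a real PDF: the Tag class, the sample document and the reading_order function are hypothetical names used only for illustration. LBody, the body of a list item, is a standard tag that does not appear in the list above. The sketch assumes a screen reader follows the tag tree from top to bottom, depth first.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Tag:
    """One node in a simplified, hypothetical model of a PDF tag tree."""
    name: str                                   # tag name, e.g. "H1", "P", "L"
    text: str = ""                              # text carried by this node, if any
    children: list[Tag] = field(default_factory=list)


def reading_order(tag: Tag) -> list[tuple[str, str]]:
    """Walk the tree depth-first and collect (tag name, text) pairs in order."""
    items = [(tag.name, tag.text)] if tag.text else []
    for child in tag.children:
        items.extend(reading_order(child))
    return items


# Hypothetical sample document: a heading, a paragraph and a two-item list.
doc = Tag("Document", children=[
    Tag("H1", "Annual Report"),
    Tag("P", "Summary of the year."),
    Tag("L", children=[
        Tag("LI", children=[Tag("Lbl", "1."), Tag("LBody", "Revenue")]),
        Tag("LI", children=[Tag("Lbl", "2."), Tag("LBody", "Costs")]),
    ]),
])

for name, text in reading_order(doc):
    print(f"{name}: {text}")
```

The output order depends only on where each tag sits in the tree, not on where the content appears on the page, which is why a visually correct PDF can still be read aloud in the wrong order.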
Optical Character Recognition
Optical character recognition (OCR) is a technology that converts documents such as scanned paper pages, image-only PDFs or photos captured by a camera into machine-readable text. It’s widely used for digitizing printed materials so they can be searched, edited and processed electronically.
How OCR Works
- Image Preprocessing – Improves image quality (deskewing, noise removal, binarization).
- Character Detection – Identifies shapes that resemble letters or numbers.
- Pattern Recognition – Matches detected shapes against known character patterns.
- Postprocessing – Applies language models or dictionaries to correct errors.
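The four steps above can be sketched as a toy pipeline in Python. This is an illustration under strong assumptions, not how a production OCR engine works: the 3x3 glyph templates, the three-word dictionary and every function name are hypothetical, characters are assumed to be fixed-width with no gaps, and preprocessing covers only binarization (no deskewing or noise removal).

```python
# Hypothetical 3x3 glyph templates (1 = ink). Real engines use far richer models.
TEMPLATES = {
    "I": ((0, 1, 0), (0, 1, 0), (0, 1, 0)),
    "L": ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
    "T": ((1, 1, 1), (0, 1, 0), (0, 1, 0)),
}
DICTIONARY = ["LILT", "TILT", "LIT"]  # hypothetical word list for postprocessing


def binarize(gray, threshold=128):
    """Preprocessing: turn grayscale pixels (0-255) into ink (1) or paper (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment(binary, width=3):
    """Character detection: cut the text line into fixed-width glyph cells."""
    count = len(binary[0]) // width
    return [tuple(tuple(row[i * width:(i + 1) * width]) for row in binary)
            for i in range(count)]


def hamming(a, b):
    """Count the positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def recognize(cell):
    """Pattern recognition: pick the template with the fewest differing pixels."""
    flat = [px for row in cell for px in row]
    return min(TEMPLATES,
               key=lambda ch: hamming(flat, [p for r in TEMPLATES[ch] for p in r]))


def correct(word, dictionary=DICTIONARY):
    """Postprocessing: snap to the closest dictionary word of the same length."""
    candidates = [w for w in dictionary if len(w) == len(word)]
    return min(candidates, key=lambda w: hamming(w, word)) if candidates else word


def render(word, ink=30, paper=220):
    """Build a clean grayscale test image of a word from the templates."""
    return [[ink if px else paper for ch in word for px in TEMPLATES[ch][r]]
            for r in range(3)]


gray = render("TILT")
gray[0][9] = gray[0][11] = 200  # fade the top bar of the last T, as a poor scan might

raw = "".join(recognize(cell) for cell in segment(binarize(gray)))
print(raw)           # prints TILI: the faded T is misread as I
print(correct(raw))  # prints TILT: the dictionary step repairs the error
```

The faded pixels fall on the paper side of the threshold, so the last glyph matches the I template exactly. Only the dictionary step recovers the intended word, which is why postprocessing matters for low-quality scans.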
Common Uses
- Digitizing books and archives.
- Automating data entry from printed forms.
- Extracting text from invoices, receipts or ID cards.
- Making scanned PDFs accessible (tagging + searchable text).
Use Adobe Acrobat’s Accessibility Checker (All Tools > Prepare for Accessibility > Check for Accessibility) to scan for problems such as missing tags and missing alternative text. The checker flags some items, including logical reading order and color contrast, for manual review, so check those yourself and verify that headings, lists and tables are tagged correctly.
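The kind of manual checks described above can also be expressed as simple rules. The Python sketch below assumes the tags have already been exported into a flat list of dictionaries in reading order; that input format, the check_tags function and the sample data are all hypothetical, and the three rules are common accessibility guidance rather than a complete audit.

```python
def check_tags(elements):
    """Return human-readable problems found in a flat, ordered list of tags."""
    problems = []
    last_level = 0
    for i, el in enumerate(elements):
        tag = el["tag"]
        # Heading levels should not skip, e.g. H1 followed directly by H3.
        if len(tag) == 2 and tag[0] == "H" and tag[1].isdigit():
            level = int(tag[1])
            if level > last_level + 1:
                problems.append(f"element {i}: {tag} skips a heading level")
            last_level = level
        # A Figure with empty or missing alternative text cannot be described.
        if tag == "Figure" and not el.get("alt"):
            problems.append(f"element {i}: Figure has no alternative text")
    # A table without any header cells is hard to navigate with a screen reader.
    tags = [el["tag"] for el in elements]
    if "Table" in tags and "TH" not in tags:
        problems.append("Table has no TH header cells")
    return problems


# Hypothetical tags for a flyer, listed in reading order.
sample = [
    {"tag": "H1", "text": "Community Flyer"},
    {"tag": "H3", "text": "Dates"},            # skips H2
    {"tag": "Figure", "alt": ""},              # alternative text is empty
    {"tag": "P", "text": "Join us on Saturday."},
]

for problem in check_tags(sample):
    print(problem)
```

Rules like these only catch structural slips. Whether the alternative text is meaningful, or the reading order makes sense, still needs a human reviewer.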