Improve struct tree splitting docs
parent e889d137e5
commit 5e48becc95
BIN
cpdfmanual.pdf
Binary file not shown.
|
@ -5176,10 +5176,10 @@ If the drawing range is a single page, and the next page already exists, the dra
|
||||||
\noindent\verb!cpdf -print-struct-tree in.pdf!
|
\noindent\verb!cpdf -print-struct-tree in.pdf!
|
||||||
|
|
||||||
\vspace{1.5mm}
|
\vspace{1.5mm}
|
||||||
\noindent\verb!cpdf -extract-struct-tree in.pdf -o out.json]!
|
\noindent\verb!cpdf -extract-struct-tree in.pdf -o out.json!
|
||||||
|
|
||||||
\vspace{1.5mm}
|
\vspace{1.5mm}
|
||||||
\noindent\verb!cpdf -replace-struct-tree in.json in.pdf -o out.pdf]!
|
\noindent\verb!cpdf -replace-struct-tree in.json in.pdf -o out.pdf!
|
||||||
|
|
||||||
\vspace{1.5mm}
|
\vspace{1.5mm}
|
||||||
\noindent\verb!cpdf -verify "PDF/UA-1(matterhorn)" [-json] in.pdf!
|
\noindent\verb!cpdf -verify "PDF/UA-1(matterhorn)" [-json] in.pdf!
|
||||||
|
@ -5195,15 +5195,19 @@ If the drawing range is a single page, and the next page already exists, the dra
|
||||||
|
|
||||||
\end{framed}}
|
\end{framed}}
|
||||||
|
|
||||||
PDF/UA (Universal Accessibility) is a PDF subformat whose rules consist of a set of machine-checkable and human-checkable-only requirements to make PDF documents accessible for all users - for example, those using screen readers. Cpdf has some basic facilities for manipulating the extra PDF constructs which are used in (amongst others) PDF/UA, and a basic verifier for most of the machine-checkable requirements.
|
PDF/UA (Universal Accessibility) is a PDF subformat whose rules consist of a set of machine-checkable and human-checkable-only requirements to make PDF documents accessible for all users - for example, those using screen readers. Cpdf has some basic facilities for manipulating the extra PDF constructs which are used in (amongst others) PDF/UA, and a basic verifier for many of the machine-checkable requirements.
|
||||||
|
|
||||||
\section{Structure trees}
|
\section{Structure trees}
|
||||||
|
|
||||||
In a PDF document, the optional Structure Tree is a parallel construct which describes the logical structure of a document (as opposed to the information for rendering the document on the screen or printing it out, which every PDF of course contains).
|
In a PDF document, the optional Structure Tree is a parallel construct which describes the logical structure of a document (as opposed to the information for rendering the document on the screen or printing it out, which every PDF of course contains.)
|
||||||
|
|
||||||
We can print an abbreviated form of the structure tree to standard output with \texttt{cpdf -print-struct-tree in.pdf}:
|
We can print an abbreviated form of the structure tree to standard output:
|
||||||
|
|
||||||
\smallgap
|
\begin{framed}
|
||||||
|
\noindent\small\verb!cpdf -print-struct-tree in.pdf!
|
||||||
|
\end{framed}
|
||||||
|
|
||||||
|
\noindent This might yield:
|
||||||
|
|
||||||
\begin{minipage}{\linewidth}
|
\begin{minipage}{\linewidth}
|
||||||
\begin{framed}
|
\begin{framed}
|
||||||
|
@ -5230,7 +5234,13 @@ We can print an abbreviated form of the structure tree to standard output with \
|
||||||
\end{minipage}
|
\end{minipage}
|
||||||
|
|
||||||
\smallgap
|
\smallgap
|
||||||
\noindent The numbers in parentheses are the page numbers for structure elements, where present. To extract the full structure tree to JSON, we can use \texttt{cpdf -extract-struct-tree in.pdf -o out.json}:
|
\noindent The numbers in parentheses are the page numbers for structure elements, where present. We can extract the full structure tree to JSON for inspection or manipulation:
|
||||||
|
|
||||||
|
\begin{framed}
|
||||||
|
\noindent\small\verb!cpdf -extract-struct-tree in.pdf -o out.json!
|
||||||
|
\end{framed}
|
||||||
|
|
||||||
|
\noindent Here is a typical fragment:
|
||||||
|
|
||||||
{\small\begin{verbatim}
|
{\small\begin{verbatim}
|
||||||
[
|
[
|
||||||
|
@ -5272,9 +5282,19 @@ We can print an abbreviated form of the structure tree to standard output with \
|
||||||
|
|
||||||
\noindent This JSON file contains the structure tree objects from the file, using the format described in chapter \ref{chap:15}. There is a special entry in object \texttt{0} which gives the key to the page object numbers. In this example, there is one page with object number \texttt{52}.
|
\noindent This JSON file contains the structure tree objects from the file, using the format described in chapter \ref{chap:15}. There is a special entry in object \texttt{0} which gives the key to the page object numbers. In this example, there is one page with object number \texttt{52}.
|
||||||
|
|
||||||
This JSON file can be edited, for example to change text strings, and reapplied with \texttt{cpdf -replace-struct-tree out.json in.pdf -o out.pdf}. If extra objects are required, they should be introduced with negative object numbers: cpdf will renumber them on import so as not to clash with any existing numbers.
|
This JSON file can be edited, for example to change text strings, and reapplied to the same file from which it was extracted:
|
||||||
|
|
||||||
To remove a structure tree from a PDF, we can use \texttt{-remove-dict-entry} from Chapter \ref{chap:misc} i.e \texttt{cpdf -remove-dict-entry /StructTreeRoot in.pdf -o out.pdf}.
|
\begin{framed}
|
||||||
|
\noindent\small\verb!cpdf -replace-struct-tree out.json in.pdf -o out.pdf!
|
||||||
|
\end{framed}
|
||||||
|
|
||||||
|
\noindent If extra objects are required, they should be introduced with negative object numbers: cpdf will renumber them on import so as not to clash with any existing numbers.
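The renumbering idea can be illustrated with a minimal Python sketch. It is not cpdf's implementation. It assumes a simplified, hypothetical model in which the JSON is an array of [object number, object] pairs and every bare integer inside an object is a reference to another object; the real format is the one described in the chapter referenced above and may differ. The function name `renumber_negative` is invented for this illustration.

```python
def renumber_negative(entries):
    """Give each negative object number a fresh positive one.

    `entries` is a list of [object_number, object] pairs (an assumed,
    simplified model). Bare integers inside objects are treated as
    references, which is also an assumption of this sketch.
    """
    # Fresh numbers start just above the largest positive number in use.
    used = {num for num, _ in entries if num > 0}
    next_free = max(used, default=0) + 1

    # Decide the new number for every negative (newly introduced) object.
    mapping = {}
    for num, _ in entries:
        if num < 0 and num not in mapping:
            mapping[num] = next_free
            next_free += 1

    def fix(value):
        # Booleans are ints in Python, so leave them untouched explicitly.
        if isinstance(value, bool):
            return value
        if isinstance(value, int):
            return mapping.get(value, value)
        if isinstance(value, list):
            return [fix(v) for v in value]
        if isinstance(value, dict):
            return {k: fix(v) for k, v in value.items()}
        return value

    # Rewrite both the object numbers and the references to them.
    return [[mapping.get(num, num), fix(obj)] for num, obj in entries]


# Hypothetical data: object -1 is a new element added by hand.
entries = [
    [52, {"/Type": "/Page"}],
    [60, {"/S": "/Document", "/K": [-1]}],
    [-1, {"/S": "/P", "/P": 60}],
]
result = renumber_negative(entries)
```

In this sketch the fresh numbers are chosen only from what is visible in the JSON. A real import must also avoid every object number already used in the PDF itself, which is why the choice is left to cpdf.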
|
||||||
|
|
||||||
|
To remove a structure tree from a PDF, we can use \texttt{-remove-dict-entry} from Chapter \ref{chap:misc}, in other words:
|
||||||
|
|
||||||
|
\begin{framed}
|
||||||
|
\noindent\small\verb!cpdf -remove-dict-entry /StructTreeRoot in.pdf -o out.pdf!
|
||||||
|
\end{framed}
|
||||||
|
|
||||||
\section{Verifying conformance to PDF/UA}
|
\section{Verifying conformance to PDF/UA}
|
||||||
|
|
||||||
|
|