diff --git a/html_manual/cpdfmanual.html b/html_manual/cpdfmanual.html index f11688e..5cc7ddf 100644 --- a/html_manual/cpdfmanual.html +++ b/html_manual/cpdfmanual.html @@ -19,22 +19,22 @@ >
Coherent PDF
Command Line Toolkit
User Manual
Version 2.2 (March 2017)
Coherent Graphics Ltd
For bug reports, feature requests and comments, email
contact@coherentgraphics.co.uk
©2017 Coherent Graphics Limited. All rights reserved. ISBN 978-0957671140
Adobe, Acrobat, Adobe PDF, Adobe Reader and PostScript are registered trademarks of Adobe Systems Incorporated. Windows, PowerPoint and Excel are registered trademarks of Microsoft Corporation.
When describing the general form of a command, rather than a particular example, square brackets [] are used to enclose optional parts, and angled braces <> to enclose general descriptions which may be substituted for particular instances. For example,
cpdf <operation> in.pdf [<range>] -o out.pdf
describes a command line which requires an operation and, optionally, a range. An exception is that we use in.pdf and out.pdf instead of <input file> and <output file> to reduce verbosity. Under Microsoft Windows, type cpdf.exe instead of cpdf.
+
The Coherent PDF tools provide a wide range of facilities for modifying PDF files created by other means. There is a single command-line program cpdf (cpdf.exe under Microsoft Windows). The rest of this manual describes the options that may be given to this program.
The typical pattern for usage is +
+
and the simplest concrete example, assuming the existence of a file in.pdf is: +
cpdf in.pdf -o out.pdf
which copies in.pdf to out.pdf. The input and output may be the same file. Of course, we should like to do more interesting things to the PDF file than that!
Files on the command line are distinguished from other input by their containing a period. If an input file does not contain a period, it should be preceded by -i. For example:
+
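The period rule above is easy to get wrong when cpdf is driven from a script. The following is a minimal Python sketch that assembles a cpdf argument list following that rule. The helper names input_args and copy_command are hypothetical and exist only for illustration. The resulting list could be handed to subprocess.run if cpdf is installed.

```python
from __future__ import annotations


def input_args(filename: str) -> list[str]:
    """Arguments that make cpdf treat `filename` as an input file.

    Hypothetical helper: an argument containing a period is taken as a file
    name; any other input file must be preceded by -i.
    """
    return [filename] if "." in filename else ["-i", filename]


def copy_command(src: str, dst: str) -> list[str]:
    """Argument list for the plain copy `cpdf in.pdf -o out.pdf`."""
    return ["cpdf", *input_args(src), "-o", dst]


print(copy_command("in.pdf", "out.pdf"))  # ['cpdf', 'in.pdf', '-o', 'out.pdf']
print(copy_command("in", "out.pdf"))      # ['cpdf', '-i', 'in', '-o', 'out.pdf']
```

Building a list rather than a single string avoids shell quoting problems when file names contain spaces.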
A whole directory of files may be added (where a command supports multiple files) by using the -idir option:
+
The files in the directory myfiles are considered in alphabetical order. They must all be PDF files. If the names of the files are numeric, leading zeroes will be required for the order to be correct (e.g. 001.pdf, 002.pdf, etc.).
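The following Python sketch shows why leading zeroes matter: it sorts the names as plain strings. It assumes that cpdf's "alphabetical order" behaves like Python's character-by-character string comparison. That is an assumption, not a documented guarantee. The helper padded_names is hypothetical.

```python
def padded_names(numbers, width=3, suffix=".pdf"):
    """Zero-pad numeric file names so that string order matches numeric order."""
    return [f"{n:0{width}d}{suffix}" for n in numbers]


# Plain string sorting places 10.pdf between 1.pdf and 2.pdf...
print(sorted(["1.pdf", "2.pdf", "10.pdf"]))   # ['1.pdf', '10.pdf', '2.pdf']
# ...whereas zero-padded names sort in numeric order.
print(sorted(padded_names([1, 2, 10])))        # ['001.pdf', '002.pdf', '010.pdf']
```

The width must be large enough for the biggest number in the directory; three digits covers up to 999 files.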
+
An input range may be specified after each input file. This is treated differently by each operation. For instance,
cpdf in.pdf 2-5 -o out.pdf
extracts pages two, three, four and five from in.pdf, writing the result to out.pdf, assuming that in.pdf contains at least five pages. Here are the rules for building input ranges:
For example:
+
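To illustrate what a range such as 2-5 selects, here is a minimal Python sketch. It expands either a single page number or an inclusive a-b span and checks the result against the page count. It deliberately covers only these two forms; cpdf's full range syntax is richer. The name expand_range is hypothetical.

```python
from __future__ import annotations


def expand_range(spec: str, page_count: int) -> list[int]:
    """Expand 'n' or 'a-b' (inclusive, 1-based) into a list of page numbers.

    Simplified, hypothetical sketch: only these two forms are handled, and
    anything else (including descending spans) raises ValueError.
    """
    if "-" in spec:
        first, last = (int(part) for part in spec.split("-", 1))
    else:
        first = last = int(spec)
    # Mirror the caveat above: the file must contain the pages asked for.
    if not 1 <= first <= last <= page_count:
        raise ValueError(f"range {spec!r} does not fit a {page_count}-page file")
    return list(range(first, last + 1))


print(expand_range("2-5", 10))  # [2, 3, 4, 5]
```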
Many operations require encrypted input PDF files to be decrypted first. Some require the owner password; others accept either the user or the owner password. The password is supplied by writing user=<password> or owner=<password> after each input file requiring it (before or after any range). The document will not be re-encrypted upon writing. For example:
+
To re-encrypt the file with its existing encryption upon writing, which is required if only the user password was supplied, but allowed in any case, add the -recrypt option:
+
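The rule "-recrypt is required with only the user password, optional with the owner password" can be encoded in a small Python sketch. The function password_args is hypothetical. It accepts exactly one of the two passwords, which is a simplification. Placing -recrypt straight after the file is an assumption about cpdf's option ordering.

```python
from __future__ import annotations


def password_args(path: str, *, owner: str | None = None,
                  user: str | None = None, recrypt: bool = False) -> list[str]:
    """Arguments naming an encrypted input file and its password.

    Hypothetical helper; exactly one of `owner` or `user` must be given.
    Supplying only the user password makes -recrypt mandatory.
    """
    if (owner is None) == (user is None):
        raise ValueError("give exactly one of owner= or user=")
    args = [path, f"owner={owner}" if owner is not None else f"user={user}"]
    # Assumption: cpdf accepts -recrypt at this position on the command line.
    if user is not None or recrypt:
        args.append("-recrypt")
    return args


print(["cpdf", *password_args("in.pdf", user="joe"), "-o", "out.pdf"])
```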
The password required (owner or user) depends upon the operation being performed. Separate facilities are provided to decrypt and encrypt files (see Section ??).
+
Thus far, we have assumed that the input PDF will be read from a file on disk, and the output written similarly. Often it’s useful to be able to read input from stdin (Standard Input) or write output to stdout (Standard Output) instead. The typical use is to join several programs together into a pipe, passing data from one to the next without the use of intermediate files. Use -stdin to read from standard input and -stdout to write to standard output, either to pipe data between multiple programs or between multiple invocations of the same program. For example, this sequence of commands (all typed on one line)
+
extracts the last five pages of in.pdf in the correct order, writing them to out.pdf. It does this by reversing the input, taking the first five pages and then reversing the result.
To supply passwords for a file from -stdin, use -stdin-owner <password> and/or -stdin-user <password>.
Using -stdout on the final command in the pipeline to output the PDF to screen is not recommended, since PDF files often contain compressed sections which are not screen-readable.
Several cpdf operations write to standard output by default (for example, listing fonts). A useful feature of the command line (not specific to cpdf) is the ability to redirect this output to a file. This is achieved with the > operator:
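The same -stdin/-stdout chaining can be driven from a program rather than a shell. The following is a minimal Python sketch. It runs each stage to completion and feeds its output to the next, so it buffers each intermediate PDF in memory instead of streaming like a true pipe. The function run_pipeline and the stand-in stage UPPERCASE are hypothetical. A cpdf stage would be a list such as ["cpdf", "-stdin", "1-5", "-stdout"], assuming a range may follow -stdin just as it follows an input file.

```python
from __future__ import annotations

import subprocess
import sys


def run_pipeline(stages: list[list[str]], data: bytes) -> bytes:
    """Pass `data` through each command in turn, like `a | b | c` in a shell.

    Each stage's standard output becomes the next stage's standard input.
    Assumption: cpdf stages read with -stdin and write with -stdout.
    """
    for argv in stages:
        result = subprocess.run(argv, input=data, stdout=subprocess.PIPE, check=True)
        data = result.stdout
    return data


# Stand-in stage for trying run_pipeline without cpdf: upper-cases its input.
UPPERCASE = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]

print(run_pipeline([UPPERCASE, UPPERCASE], b"abc"))  # b'ABC'
```

Because check=True is set, a failing stage raises CalledProcessError, so cpdf's non-zero exit codes are not silently ignored.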
+
+
The keyword AND can be used to string together several commands in one. The advantage compared with using pipes is that the file need not be repeatedly parsed and written out, saving time.
To use AND, simply leave off the output specifier (e.g. -o) of one command, and the input specifier (e.g. the filename) of the next. For instance:
+
To specify the range for each section, use -range:
+
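Here is a minimal Python sketch of the AND rule: only the first section names the input file, and only the last writes output with -o. The function and_chain is hypothetical, and the section contents in the example are placeholder names, not real cpdf operations. Putting the first operation before the input file follows the general form given earlier. A per-section -range would presumably just be part of that section's list.

```python
from __future__ import annotations


def and_chain(input_file: str, sections: list[list[str]], output_file: str) -> list[str]:
    """Join several cpdf operations with AND so the file is parsed only once.

    Hypothetical helper: the first section is followed by the input file,
    later sections carry no input file, and only the final section gets -o.
    """
    if not sections:
        raise ValueError("at least one section is needed")
    argv = ["cpdf", *sections[0], input_file]
    for section in sections[1:]:
        argv += ["AND", *section]
    argv += ["-o", output_file]
    return argv


# "-op-a" and "-op-b" are placeholders, not real cpdf operations.
print(and_chain("in.pdf", [["-op-a"], ["-op-b", "x"]], "out.pdf"))
```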
+
When measurements are given to cpdf, they are in points (1 point = 1/72 inch). They may optionally be followed by some letters to change the unit of measurement. The following are supported:
For example, one may write 14mm or 21.6in. In addition, the following letters stand, in some operations (-scale-page, -scale-to-fit, -scale-contents, -shift, -mediabox, -crop), for various page dimensions:
+
For example, we may write PMINX PMINY to stand for the coordinate of the lower left corner of the page.
Simple arithmetic may be performed using the words add, sub, mul and div to stand for addition, subtraction, multiplication and division. For example, one may write 14insub30pt or PMINXmul2.
The -producer and -creator options may be added to any cpdf command line to set the producer and/or creator of the PDF file. If the file was converted from another format, the creator is the program producing the original, the producer the program converting it to PDF.
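To check what a measurement such as 14insub30pt works out to, here is a minimal Python sketch of the unit and arithmetic rules. It makes three assumptions. First, it handles only the units that appear in the examples above (pt, in, mm). Second, it evaluates chained operators strictly left to right; the text does not say how precedence works. Third, it ignores page-dimension words such as PMINX, which depend on the page. The names measure and to_points are hypothetical.

```python
import re

# Points per unit (1 pt = 1/72 inch). Only pt, in and mm appear in the
# examples above; any other unit cpdf supports is not handled here.
POINTS_PER_UNIT = {"pt": 1.0, "in": 72.0, "mm": 72.0 / 25.4}

OPERATORS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b,
}

_TERM = re.compile(r"(\d+(?:\.\d+)?)(pt|in|mm)?")


def to_points(term: str) -> float:
    """Convert one term such as '14mm', '21.6in' or '30' to points."""
    match = _TERM.fullmatch(term)
    if match is None:
        raise ValueError(f"unsupported measurement: {term!r}")
    number, unit = match.groups()
    return float(number) * POINTS_PER_UNIT[unit or "pt"]


def measure(expr: str) -> float:
    """Evaluate e.g. '14insub30pt'; assumption: operators apply left to right."""
    # re.split keeps the captured operators: [term, op, term, op, term, ...]
    parts = re.split(r"(add|sub|mul|div)", expr)
    total = to_points(parts[0])
    for op, term in zip(parts[1::2], parts[2::2]):
        total = OPERATORS[op](total, to_points(term))
    return total


print(measure("14insub30pt"))  # 978.0, i.e. 14 * 72 - 30
```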
+
+
When an operation uses a part of the PDF standard which was introduced in a later version than that of the input file, the PDF version of the output file is set to the later version (most PDF viewers will try to load any PDF file, even if it is marked with a later version number). However, this automatic version changing may be suppressed with the -keep-version flag.
Here is a list of Acrobat versions together with the maximum PDF version they are intended to support:
If you wish to manually alter the PDF version of a file, use the -set-version option described in Section ??.
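To see which version cpdf actually wrote (for instance, whether -keep-version took effect), read the file header. The following is a minimal Python sketch; header_version is a hypothetical name. It looks only at the %PDF-x.y header. From PDF 1.4 onwards, a /Version entry in the document catalog may declare a later version, and this sketch ignores that.

```python
from __future__ import annotations

import re


def header_version(data: bytes) -> str | None:
    """Return the version from a header such as b'%PDF-1.5', or None.

    Only the header is examined; a /Version entry in the document catalog
    (PDF 1.4 and later) could override it and is not checked here.
    """
    match = re.search(rb"%PDF-(\d+\.\d+)", data[:1024])
    return match.group(1).decode("ascii") if match else None


print(header_version(b"%PDF-1.5\n%\xe2\xe3\xcf\xd3\n"))  # 1.5
```

For a real file, pass the first bytes, e.g. `open("out.pdf", "rb").read(1024)`.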
PDF files contain an ID (consisting of two parts), used by some workflow systems to uniquely identify a file. To change the ID, use the -change-id operation. This will create a new ID for the output file.
+
+
Linearized PDF is a version of the PDF format in which the data is held in a special manner to allow content to be fetched only when needed. This means viewing a multipage PDF over a slow connection is more responsive. By default, cpdf does not linearize output files. To make it do so, add the -l option to the command line, in addition to any other command being used. For example:
+
This requires the existence of the external program cpdflin which is provided with commercial versions of cpdf. This must be installed as described in the installation documentation provided with your copy of cpdf. If you are unable to install cpdflin, you must use -cpdflin to let cpdf know where to find it:
+
In extremis, you may place cpdflin and its resources in the current working directory, though this is not recommended. For further help, refer to the installation instructions for your copy of cpdf.
To keep the existing linearization status of a file (produce linearized output if the input is linearized and the reverse), use -keep-l instead of -l.
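To confirm whether -l or -keep-l produced linearized output, a quick check is possible. The PDF specification places the linearization parameter dictionary within the first 1024 bytes of a linearized file. The following Python sketch (looks_linearized is a hypothetical name) relies only on that rule. It is a heuristic. It does not validate the hint tables, and a file updated after linearization may still contain the dictionary without being validly linearized.

```python
def looks_linearized(data: bytes) -> bool:
    """Heuristic: is there a linearization dictionary near the file start?

    Checks only the first 1024 bytes for /Linearized; hint tables and
    later incremental updates are not examined.
    """
    return b"/Linearized" in data[:1024]


print(looks_linearized(b"%PDF-1.4\n1 0 obj\n<</Linearized 1/L 1234>>"))  # True
```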
+
PDF 1.5 introduced a new mechanism for storing objects to save space: object streams. By default, cpdf will preserve object streams in input files, creating no more. To prevent the retention of existing object streams, use -no-preserve-objstm:
+
To create new object streams if none exist, or augment the existing ones, use -create-objstm:
+
To create wholly new object streams, use both options together:
+
Files written with object streams will be set to PDF 1.5 or higher, unless -keep-version is used (see above).
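The four object-stream behaviours above come from two independent flags. The following Python sketch maps the intended behaviour to those flags; objstm_flags is a hypothetical helper. Adding -keep-version to stop the PDF 1.5 version bump is left to the caller.

```python
from __future__ import annotations


def objstm_flags(keep_existing: bool = True, create_new: bool = False) -> list[str]:
    """Choose cpdf's object-stream flags for the behaviour wanted.

    keep_existing=True,  create_new=False -> []  (default: keep, add none)
    keep_existing=False, create_new=False -> ['-no-preserve-objstm']
    keep_existing=True,  create_new=True  -> ['-create-objstm']  (augment)
    keep_existing=False, create_new=True  -> both flags (wholly new streams)
    """
    flags = []
    if not keep_existing:
        flags.append("-no-preserve-objstm")
    if create_new:
        flags.append("-create-objstm")
    return flags


print(objstm_flags(keep_existing=False, create_new=True))
```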
+
There are many malformed PDF files in existence, including many produced by otherwise-reputable applications. cpdf attempts to correct these problems silently.
Grossly malformed files will be reconstructed. The reconstruction progress is shown on stderr (Standard Error):
+
Sometimes files can be technically well-formed but use inefficient PDF constructs. If you are sure the input files you are using are impeccably formed, add the -fast option to the command line (or, if using AND, to each section of the command line). This will use certain shortcuts which speed up processing, but which would fail on badly-produced files.
The -fast option may be used with:
+
If problems occur, refrain from using -fast.
+
When cpdf encounters an error, it exits with code 2. An error message is displayed on stderr (Standard Error). In normal usage, this means it’s displayed on the screen. When a bad or inappropriate password is given, the exit code is 1.
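Scripts that call cpdf can act on these exit codes. The following Python sketch shows one way to do so; describe_exit and run_cpdf are hypothetical names. It assumes that exit code 0 means success, which the text above does not state explicitly. run_cpdf needs cpdf on the PATH; if cpdf is missing, subprocess raises FileNotFoundError.

```python
from __future__ import annotations

import subprocess

# Codes 1 and 2 are documented above; 0 is assumed to mean success.
EXIT_MEANINGS = {0: "success", 1: "bad or inappropriate password", 2: "error"}


def describe_exit(code: int) -> str:
    """Human-readable meaning of a cpdf exit code."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_cpdf(args: list[str]) -> None:
    """Run cpdf, raising with its stderr message if it exits non-zero."""
    result = subprocess.run(["cpdf", *args], stderr=subprocess.PIPE, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"cpdf failed ({describe_exit(result.returncode)}): {result.stderr.strip()}"
        )
```

Separating code 1 from code 2 lets a script re-prompt for a password instead of treating every failure alike.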
+
+
Some operating systems have a limit on the length of a command line. To circumvent this, or simply for reasons of flexibility, a control file may be specified from which arguments are drawn. This file does not support the full syntax of the command line. Commands are separated by whitespace, quotation marks may be used if an argument contains a space, and the sequence \" may be used to introduce a genuine quotation mark in such an argument.
Several -control arguments may be specified, and may be mixed in with conventional command-line arguments. The commands in each control file are considered in the order in which they are given, after all conventional arguments have been processed. It is recommended to use -args in all new applications. However, -control will be supported for legacy applications.
To avoid interference between -control and AND, a new mechanism has been added. Using -args in place of -control will perform direct textual substitution of the file into the command line, prior to any other processing.
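The three control-file quoting rules above (whitespace separates arguments, quotation marks group an argument containing spaces, and \" inside quotes is a literal quotation mark) can be checked with a small tokenizer. The following is a Python sketch of those rules only. It is not cpdf's own parser, and tokenize_control is a hypothetical name.

```python
from __future__ import annotations


def tokenize_control(text: str) -> list[str]:
    """Split control-file text into arguments using the rules described above.

    Whitespace separates arguments, double quotes group an argument that
    contains spaces, and \\" inside quotes stands for a literal quote.
    """
    args: list[str] = []
    current: list[str] = []
    in_quotes = False
    have_token = False  # distinguishes an empty quoted "" from no argument
    i = 0
    while i < len(text):
        c = text[i]
        if in_quotes:
            if c == "\\" and text[i + 1:i + 2] == '"':
                current.append('"')  # escaped quotation mark
                i += 2
                continue
            if c == '"':
                in_quotes = False
            else:
                current.append(c)
        elif c == '"':
            in_quotes = True
            have_token = True
        elif c.isspace():
            if have_token:
                args.append("".join(current))
                current = []
                have_token = False
        else:
            current.append(c)
            have_token = True
        i += 1
    if in_quotes:
        raise ValueError("unterminated quotation mark")
    if have_token:
        args.append("".join(current))
    return args


print(tokenize_control('-merge "my file.pdf" b.pdf'))  # ['-merge', 'my file.pdf', 'b.pdf']
```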
+
Command lines are handled differently on each operating system. Some characters are reserved with special meanings, even when they occur inside quoted string arguments. To avoid this problem, cpdf performs processing on string arguments as they are read.
A backslash is used to indicate that a character which would otherwise be treated specially by the command line interpreter is to be treated literally. For example, Unix-like systems attribute a special meaning to the exclamation mark, so the command line
+
would fail. We must escape the exclamation mark with a backslash:
+
It follows that backslashes intended to be taken literally must themselves be escaped (i.e. written \\).
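The backslash rule, as applied to a string argument once cpdf has received it, can be modelled in a few lines of Python. This is a sketch of the rule as described, not cpdf's code. The helper unescape is hypothetical, and keeping a trailing lone backslash is an assumption.

```python
def unescape(arg: str) -> str:
    """Apply the backslash rule described above: a backslash makes the next
    character literal, so \\! becomes ! and \\\\ becomes a single backslash."""
    out = []
    chars = iter(arg)
    for c in chars:
        if c == "\\":
            # Take the following character literally. Assumption: a trailing
            # backslash with nothing after it is kept as-is.
            out.append(next(chars, "\\"))
        else:
            out.append(c)
    return "".join(out)


print(unescape(r"Hello\!"))  # Hello!
print(unescape(r"a\\b"))     # a\b
```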
+
Some cpdf commands write text to standard output, or read text from the command line or configuration files. These are:
+
There are three options to control how the text is interpreted:
+
Add -utf8 to use Unicode UTF-8, -stripped to convert to 7-bit ASCII by dropping any high characters, or -raw to perform no processing. The default is -stripped.
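The following Python sketch shows the effect of "dropping any high characters", the default -stripped behaviour, next to the UTF-8 bytes that -utf8 would preserve. strip_high is a hypothetical helper. It models only the behaviour stated above; cpdf's exact conversion may differ.

```python
def strip_high(text: str) -> str:
    """Approximate -stripped: keep 7-bit ASCII characters, drop the rest."""
    return "".join(c for c in text if ord(c) < 128)


sample = "Café 10€"
print(strip_high(sample))       # Caf 10
print(sample.encode("utf-8"))   # the bytes a UTF-8 rendering would contain
```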
+
Use the -no-embed-font option to avoid embedding the Standard 14 Font metrics when adding text with -add-text.
The -merge operation allows the merging of several files into one. Ranges can be used to select only a subset of pages from each input file in the output. The output file consists of the concatenation of all the input pages in the order specified on the command line. In fact, -merge can be omitted, since this is the default operation of cpdf.
+
Merge maintains bookmarks, named destinations, and name dictionaries.
Forms and other objects which cannot be merged are retained if they are from the document which first exhibits that feature.
The -retain-numbering option keeps the PDF page numbering labels of each document intact, rather than renumbering the output pages from 1.
The -remove-duplicate-fonts option ensures that fonts used in more than one of the inputs only appear once in the output.
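The following Python sketch assembles a merge command line, with an optional range after each input file and the two merge options described above. The function merge_command is hypothetical. It assumes every input name contains a period; otherwise -i would be needed, as explained earlier.

```python
from __future__ import annotations


def merge_command(inputs: list[tuple[str, str | None]], output: str, *,
                  retain_numbering: bool = False,
                  remove_duplicate_fonts: bool = False) -> list[str]:
    """Build a cpdf -merge command; each input may carry an optional range.

    Hypothetical helper; input names are assumed to contain a period.
    """
    argv = ["cpdf", "-merge"]
    for path, page_range in inputs:
        argv.append(path)
        if page_range:
            argv.append(page_range)  # a range follows its own input file
    if retain_numbering:
        argv.append("-retain-numbering")
    if remove_duplicate_fonts:
        argv.append("-remove-duplicate-fonts")
    argv += ["-o", output]
    return argv


print(merge_command([("a.pdf", "1-3"), ("b.pdf", None)], "out.pdf"))
```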
+
The -split operation splits a PDF file into a number of parts which are written to file, their names being generated from a format. The optional -chunk option allows the number of pages written to each output file to be set.
+
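The following Python sketch works out how the pages of a document would be grouped when split with a given -chunk size, and hence how many output files the name format must accommodate. It assumes the final file takes whatever pages remain. The function chunk_ranges is hypothetical, and the default chunk size when -chunk is omitted is not modelled.

```python
from __future__ import annotations


def chunk_ranges(page_count: int, chunk: int) -> list[tuple[int, int]]:
    """(first, last) page of each output file when splitting into chunks.

    Assumption: the last file receives the remaining pages, however few.
    """
    if chunk < 1:
        raise ValueError("chunk must be at least 1")
    return [(first, min(first + chunk - 1, page_count))
            for first in range(1, page_count + 1, chunk)]


print(chunk_ranges(10, 3))  # [(1, 3), (4, 6), (7, 9), (10, 10)]
```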
If the output format does not provide enough numbers for the files generated, the result is unspecified. The following format operators may be used:
The -split-bookmarks <level> operation splits a PDF file into a number of parts, according to the page ranges implied by the document’s bookmarks. These parts are then written to file with names generated from the given format.
Level 0 denotes the top-level bookmarks, level 1 the next level (sub-bookmarks) and so on. So -split-bookmarks 1 creates breaks on level 0 and level 1 boundaries.
+
Now, there may be many bookmarks on a single page (for instance, if paragraphs are bookmarked or there are two subsections on one page). The splits calculated by -split-bookmarks ensure that each page appears in only one of the output files. It is possible to use the @ operators above, including the operator @B, which expands to the text of the bookmark:
+
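The following Python sketch illustrates how a split level and multiple bookmarks per page could translate into non-overlapping page ranges. It follows the rules above: bookmarks at the chosen level and all shallower levels create breaks, and several bookmarks on one page produce a single break. The function bookmark_parts is hypothetical. Treating any pages before the first bookmark as a part of their own is an assumption.

```python
from __future__ import annotations


def bookmark_parts(bookmarks: list[tuple[int, int]], level: int,
                   page_count: int) -> list[tuple[int, int]]:
    """Page ranges for a split at bookmark `level` and all shallower levels.

    `bookmarks` holds (level, page) pairs. A set of start pages collapses
    several bookmarks on one page into one break, so each page lands in
    exactly one part. Assumption: page 1 always starts a part.
    """
    starts = sorted({page for lvl, page in bookmarks if lvl <= level} | {1})
    ends = [start - 1 for start in starts[1:]] + [page_count]
    return list(zip(starts, ends))


marks = [(0, 1), (1, 3), (1, 3), (2, 4), (0, 6)]
print(bookmark_parts(marks, 1, 8))  # [(1, 2), (3, 5), (6, 8)]
```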
The bookmark text used for a name is converted from Unicode to 7-bit ASCII, and the following characters are removed, in addition to any character with ASCII code less than 32: