diff --git a/html_manual/cpdfmanual.html b/html_manual/cpdfmanual.html index f11688e..5cc7ddf 100644 --- a/html_manual/cpdfmanual.html +++ b/html_manual/cpdfmanual.html @@ -19,22 +19,22 @@ >

Coherent PDF

Command Line Toolkit

User Manual

Version 2.2 (March 2017)

Coherent Graphics Ltd

For bug reports, feature requests and comments, email
contact@coherentgraphics.co.uk

©2017 Coherent Graphics Limited. All rights reserved. ISBN 978-0957671140

Adobe, Acrobat, Adobe PDF, Adobe Reader and PostScript are registered trademarks of Adobe Systems Incorporated. Windows, PowerPoint and Excel are registered trademarks of Microsoft Corporation.

@@ -47,6 +47,50 @@ Corporation.

Contents

1 Basic Usage
 1.1 Input and Output Files
 1.2 Input Ranges
 1.3 Working with Encrypted Documents
 1.4 Standard Input and Standard Output
 1.5 Doing Several Things at Once with AND
 1.6 Units
 1.7 Setting the Producer and Creator
 1.8 PDF Version Numbers
 1.9 File IDs
 1.10 Linearization
 1.11 Object Streams
 1.12 Malformed Files
 1.13 Error Handling
 1.14 Control Files
 1.15 String Arguments
 1.16 Text Encodings
 1.17 Font Embedding
2 Merging and Splitting
 2.1 Merging
 2.2 Splitting
 2.3 Splitting on Bookmarks
@@ -60,26 +104,620 @@ Corporation.

Typographical Conventions

Command lines to be typed are shown in typewriter font in a box. For example:

When describing the general form of a command, rather than a particular example, square brackets [] are used to enclose optional parts, and angle brackets <> to enclose general descriptions which may be substituted for particular instances. For example,

describes a command line which requires an operation and, optionally, a range. One exception is that we write in.pdf and out.pdf instead of <input file> and <output file> to reduce verbosity. Under Microsoft Windows, type cpdf.exe instead of cpdf.


Chapter 1
Basic Usage

The Coherent PDF tools provide a wide range of facilities for modifying PDF files created by other means. There is a single command-line program cpdf (cpdf.exe under Microsoft Windows). The rest of this manual describes the options that may be given to this program.

1.1 Input and Output Files

The typical pattern for usage is

and the simplest concrete example, assuming the existence of a file in.pdf, is:

cpdf in.pdf -o out.pdf

which copies in.pdf to out.pdf. The input and output may be the same file. Of course, we should like to do more interesting things to the PDF file than that!

Files on the command line are distinguished from other input by containing a period. If an input file name does not contain a period, it should be preceded by -i. For example:

A whole directory of files may be added (where a command supports multiple files) by using the -idir option:

The files in the directory myfiles are considered in alphabetical order. They must all be PDF files. If the file names are numeric, leading zeroes are required for the order to be correct (e.g. 001.pdf, 002.pdf, etc.).
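
The advice about leading zeroes follows from how strings sort. The short Python illustration below does not call cpdf; it assumes that "alphabetical order" means plain character-by-character string comparison.

```python
# Illustration only: assumes -idir's "alphabetical order" is plain string order.
unpadded = ["1.pdf", "2.pdf", "10.pdf"]
padded = ["001.pdf", "002.pdf", "010.pdf"]

# '1' sorts before '2', so "10.pdf" lands between "1.pdf" and "2.pdf".
print(sorted(unpadded))  # ['1.pdf', '10.pdf', '2.pdf']

# With leading zeroes, string order agrees with numeric order.
print(sorted(padded))    # ['001.pdf', '002.pdf', '010.pdf']
```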

1.2 Input Ranges

An input range may be specified after each input file. How the range is interpreted depends on the operation. For instance,

cpdf in.pdf 2-5 -o out.pdf

extracts pages two, three, four and five from in.pdf, writing the result to out.pdf, assuming that in.pdf contains at least five pages. Here are the rules for building input ranges:

For example:
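
To illustrate how a simple range such as 2-5 expands, here is a Python sketch. The function parse_range is hypothetical and not part of cpdf, and it accepts only an assumed subset of the range syntax: page numbers, commas and ascending hyphenated spans.

```python
def parse_range(spec: str, page_count: int) -> list[int]:
    """Expand a simplified page range such as "2-5" or "1,3-4".

    Hypothetical helper: it understands only page numbers, commas and
    ascending hyphenated spans, a small assumed subset of cpdf's ranges.
    """
    pages: list[int] = []
    for part in spec.split(","):
        if "-" in part:
            first, last = (int(n) for n in part.split("-", 1))
            span = range(first, last + 1)   # inclusive at both ends
        else:
            span = range(int(part), int(part) + 1)
        for page in span:
            if not 1 <= page <= page_count:
                raise ValueError(f"page {page} is not in 1-{page_count}")
            pages.append(page)
    return pages


print(parse_range("2-5", 10))    # [2, 3, 4, 5]
print(parse_range("1,3-4", 5))   # [1, 3, 4]
```

In this sketch, parse_range("2-5", 4) raises ValueError because page 5 does not exist; how cpdf itself reports such a range is not modelled here.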

1.3 Working with Encrypted Documents

To perform many operations, cpdf must decrypt encrypted input PDF files. Some operations require the owner password; others accept either the user or the owner password. A password is supplied by writing user=<password> or owner=<password> after each input file that requires it (before or after any range). The document will not be re-encrypted upon writing. For example:

To re-encrypt the file with its existing encryption upon writing, add the -recrypt option. This is required if only the user password was supplied, but is permitted in any case:

The password required (owner or user) depends upon the operation being performed. Separate facilities are provided to decrypt and encrypt files (see Section ??).

1.4 Standard Input and Standard Output

Thus far, we have assumed that the input PDF is read from a file on disk, and the output written similarly. It is often useful to read input from stdin (Standard Input) or write output to stdout (Standard Output) instead. The typical use is to join several programs together into a pipe, passing data from one to the next without intermediate files. Use -stdin to read from standard input and -stdout to write to standard output, either to pipe data between different programs or between multiple invocations of the same program. For example, this sequence of commands (all typed on one line)

extracts the last five pages of in.pdf in the correct order, writing them to out.pdf. It does this by reversing the input, taking the first five pages, and then reversing the result.
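
The logic of this pipeline can be checked with ordinary list operations. The Python sketch below stands in for the three cpdf invocations (it does not run cpdf, and the function name last_pages is hypothetical): reverse the pages, keep the first five, then reverse again.

```python
def last_pages(pages: list[int], count: int) -> list[int]:
    """Model of the reverse / take-first / reverse pipeline (hypothetical helper)."""
    reversed_pages = pages[::-1]      # first cpdf: reverse the document
    first = reversed_pages[:count]    # second cpdf: keep the first `count` pages
    return first[::-1]                # third cpdf: reverse back into reading order


document = list(range(1, 13))         # page numbers of an imaginary 12-page file
print(last_pages(document, 5))        # [8, 9, 10, 11, 12]
```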

To supply passwords for a file read with -stdin, use -stdin-owner <password> and/or -stdin-user <password>.

Using -stdout on the final command in the pipeline to output the PDF to the screen is not recommended, since PDF files often contain compressed sections which are not human-readable.

Several cpdf operations write to standard output by default (for example, listing fonts). A useful feature of the command line (not specific to cpdf) is the ability to redirect this output to a file, using the > operator:

1.5 Doing Several Things at Once with AND

The keyword AND can be used to string together several commands in one. Compared with using pipes, the advantage is that the file need not be repeatedly parsed and written out, which saves time.

To use AND, simply leave off the output specifier (e.g. -o) of one command, and the input specifier (e.g. the filename) of the next. For instance:

To specify the range for each section, use -range:
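
The rule of leaving off the output of one section and the input of the next can be expressed as a small argument builder. This is a hypothetical Python helper, not part of cpdf; OP1 and OP2 are placeholders for real operations, and the first section is assumed to carry the input file.

```python
def and_command(sections: list[list[str]], output_file: str) -> list[str]:
    """Join command sections with AND (hypothetical helper, not part of cpdf).

    The first section carries the input file; later sections omit it,
    and only the end of the whole command names the output with -o.
    """
    args = ["cpdf"]
    for index, section in enumerate(sections):
        if index > 0:
            args.append("AND")        # separates one section from the next
        args.extend(section)
    args.extend(["-o", output_file])  # a single output for the whole chain
    return args


# "OP1" and "OP2" are placeholders for real cpdf operations and their arguments.
print(and_command([["OP1", "in.pdf"], ["OP2"]], "out.pdf"))
# ['cpdf', 'OP1', 'in.pdf', 'AND', 'OP2', '-o', 'out.pdf']
```

Once the placeholders are replaced, the resulting list could be passed to subprocess.run.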

1.6 Units

Measurements given to cpdf are in points (1 point = 1/72 inch) by default. A measurement may optionally be followed by a unit suffix. The following units are supported:

pt  Points (72 points per inch). The default.
cm  Centimeters
mm  Millimeters
in  Inches

For example, one may write 14mm or 21.6in. In addition, in some operations (-scale-page, -scale-to-fit, -scale-contents, -shift, -mediabox, -crop), the following letters stand for various page dimensions:

PW     Page width
PH     Page height
PMINX  Page minimum x coordinate
PMINY  Page minimum y coordinate
PMAXX  Page maximum x coordinate
PMAXY  Page maximum y coordinate
CW     Crop box width
CH     Crop box height
CMINX  Crop box minimum x coordinate
CMINY  Crop box minimum y coordinate
CMAXX  Crop box maximum x coordinate
CMAXY  Crop box maximum y coordinate

For example, we may write PMINX PMINY to stand for the coordinates of the lower-left corner of the page.

Simple arithmetic may be performed using the words add, sub, mul and div to stand for addition, subtraction, multiplication and division. For example, one may write 14insub30pt or PMINXmul2.
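
To make the unit rules concrete, here is a Python sketch (hypothetical, not cpdf's parser) that converts a measurement to points. It assumes at most one arithmetic word per expression, which covers the two examples above; how cpdf evaluates longer chains is not described here. Page-dimension names are looked up in a dictionary supplied by the caller, standing in for the values cpdf would read from the page.

```python
import re
from typing import Optional

# Points per unit: 72 pt = 1 in = 2.54 cm = 25.4 mm.
UNITS = {"pt": 1.0, "in": 72.0, "cm": 72.0 / 2.54, "mm": 72.0 / 25.4}
OPERATORS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b,
}


def term_to_points(term: str, dims: dict[str, float]) -> float:
    """Convert one term ("14mm", "30pt", "2" or a name such as "PMINX") to points."""
    if term in dims:                      # page dimension supplied by the caller
        return dims[term]
    match = re.fullmatch(r"(-?\d+(?:\.\d+)?)(pt|in|cm|mm)?", term)
    if match is None:
        raise ValueError(f"unrecognised measurement: {term!r}")
    number, unit = match.groups()
    return float(number) * UNITS[unit or "pt"]   # no suffix means points


def to_points(expr: str, dims: Optional[dict[str, float]] = None) -> float:
    """Evaluate "<term>" or "<term><op><term>"; one operator only (an assumption)."""
    dims = dims or {}
    for word, combine in OPERATORS.items():
        left, found, right = expr.partition(word)
        if found:
            return combine(term_to_points(left, dims), term_to_points(right, dims))
    return term_to_points(expr, dims)


print(to_points("14insub30pt"))                  # 978.0
print(to_points("PMINXmul2", {"PMINX": 10.0}))   # 20.0
```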

1.7 Setting the Producer and Creator

The -producer and -creator options may be added to any cpdf command line to set the producer and/or creator of the PDF file. If the file was converted from another format, the creator is the program that produced the original, and the producer is the program that converted it to PDF.

1.8 PDF Version Numbers

When an operation uses a part of the PDF standard which was introduced in a later version than that of the input file, the PDF version of the output file is set to the later version (most PDF viewers will try to load any PDF file, even if it is marked with a later version number). This automatic version change may be suppressed with the -keep-version flag.

Here is a list of Acrobat versions together with the maximum PDF version each is intended to support:

PDF 1.2  Acrobat 3.0
PDF 1.3  Acrobat 4.0
PDF 1.4  Acrobat 5.0
PDF 1.5  Acrobat 6.0
PDF 1.6  Acrobat 7.0
PDF 1.7  Acrobat 8.0, 9.0, 10.0

If you wish to manually alter the PDF version of a file, use the -set-version option described in Section ??.

1.9 File IDs

PDF files contain an ID (consisting of two parts) used by some workflow systems to uniquely identify a file. To change the ID, use the -change-id operation. This creates a new ID for the output file.

1.10 Linearization

Linearized PDF is a form of the PDF format in which the data is arranged so that content can be fetched only when needed, which makes viewing a multipage PDF over a slow connection more responsive. By default, cpdf does not linearize output files. To make it do so, add the -l option to the command line, in addition to any other command being used. For example:

This requires the external program cpdflin, which is provided with commercial versions of cpdf and must be installed as described in the installation documentation supplied with your copy of cpdf. If you are unable to install cpdflin, use -cpdflin to tell cpdf where to find it:

In extremis, you may place cpdflin and its resources in the current working directory, though this is not recommended. For further help, refer to the installation instructions for your copy of cpdf.

To keep the existing linearization status of a file (producing linearized output if the input is linearized, and non-linearized output if it is not), use -keep-l instead of -l.

1.11 Object Streams

PDF 1.5 introduced a new mechanism for storing objects to save space: object streams. By default, cpdf preserves object streams in input files but creates no new ones. To prevent the retention of existing object streams, use -no-preserve-objstm:

To create new object streams if none exist, or augment the existing ones, use -create-objstm:

To create wholly new object streams, use both options together:

Files written with object streams will be set to PDF 1.5 or higher, unless -keep-version is used (see above).

1.12 Malformed Files

There are many malformed PDF files in existence, including many produced by otherwise reputable applications. cpdf attempts to correct these problems silently.

Grossly malformed files will be reconstructed. Reconstruction progress is shown on stderr (Standard Error):

Sometimes files are technically well-formed but use inefficient PDF constructs. If you are sure that your input files are impeccably formed, add the -fast option to the command line (or, if using AND, to each section of the command line). This uses certain shortcuts which speed up processing, but which would fail on badly produced files.

The -fast option may be used with:

If problems occur, refrain from using -fast.

1.13 Error Handling

When cpdf encounters an error, it exits with code 2. An error message is displayed on stderr (Standard Error). In normal usage, this means it’s displayed on the screen. When a bad or inappropriate password is given, the exit code is 1.
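
Scripts that call cpdf can branch on these exit codes. The Python sketch below maps a return code to a description; treating 0 as success is an assumption based on the usual convention (this section specifies only codes 1 and 2), and the hypothetical run_cpdf helper requires cpdf to be installed and on the PATH.

```python
import subprocess

# Codes 1 and 2 are documented above; 0 for success is an assumed convention.
EXIT_MEANINGS = {
    0: "success",
    1: "bad or inappropriate password",
    2: "error (details on stderr)",
}


def describe_exit(code: int) -> str:
    """Return a human-readable description of a cpdf exit code."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_cpdf(args: list[str]) -> str:
    """Run cpdf with the given arguments and describe how it exited (hypothetical)."""
    result = subprocess.run(["cpdf", *args], capture_output=True, text=True)
    if result.returncode != 0:
        print(result.stderr, end="")   # cpdf's error message arrives on stderr
    return describe_exit(result.returncode)
```

Under that assumption, run_cpdf(["in.pdf", "-o", "out.pdf"]) would be expected to return "success" when in.pdf is a readable, unencrypted file.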

1.14 Control Files

Some operating systems limit the length of a command line. To circumvent this, or simply for flexibility, a control file may be specified from which arguments are drawn. This file does not support the full syntax of the command line. Arguments are separated by whitespace, quotation marks may be used if an argument contains a space, and the sequence \" may be used to include a literal quotation mark in such an argument.
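
The quoting rules above are simple enough to sketch. The following Python function is a hypothetical tokenizer, not cpdf's implementation: it splits on whitespace, keeps quoted text together as one argument, and turns \" into a literal quotation mark. Treating a quoted empty string as an empty argument is an assumption.

```python
def split_control_text(text: str) -> list[str]:
    """Split control-file text into arguments (illustrative sketch, not cpdf's code)."""
    args: list[str] = []
    current: list[str] = []
    in_quotes = False
    has_token = False  # assumption: lets "" count as an empty argument
    i = 0
    while i < len(text):
        ch = text[i]
        if text.startswith('\\"', i):
            current.append('"')          # \" stands for a literal quotation mark
            has_token = True
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes    # quotes group text and are not kept
            has_token = True
        elif ch.isspace() and not in_quotes:
            if has_token:                # unquoted whitespace ends an argument
                args.append("".join(current))
                current, has_token = [], False
        else:
            current.append(ch)
            has_token = True
        i += 1
    if has_token:
        args.append("".join(current))
    return args


print(split_control_text('-merge "my file.pdf" b.pdf'))
# ['-merge', 'my file.pdf', 'b.pdf']
```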

Several -control arguments may be specified, and they may be mixed in with conventional command-line arguments. The commands in each control file are considered in the order in which they are given, after all conventional arguments have been processed. The -args mechanism described next is recommended for all new applications; -control remains supported for legacy applications.

To avoid interference between -control and AND, a newer mechanism is provided: using -args in place of -control performs direct textual substitution of the file into the command line, prior to any other processing.

1.15 String Arguments

Command lines are handled differently on each operating system. Some characters are reserved with special meanings, even when they occur inside quoted string arguments. To avoid this problem, cpdf processes string arguments as they are read.

A backslash is used to indicate that a character which would otherwise be treated specially by the command-line interpreter is to be treated literally. For example, Unix-like systems attribute a special meaning to the exclamation mark, so the command line

would fail. We must escape the exclamation mark with a backslash:

It follows that backslashes intended to be taken literally must themselves be escaped (i.e. written \\).
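
The escaping rule can be written as a pair of Python functions. Both are hypothetical helpers: escape_arg prefixes backslashes and the characters in SPECIAL with a backslash (only the exclamation mark here; which characters need escaping depends on your shell), and unescape_arg models the processing described above, in which a backslash makes the following character literal.

```python
SPECIAL = {"!"}  # assumption: add whatever characters your shell treats specially


def escape_arg(text: str) -> str:
    """Prefix backslashes and special characters with a backslash (hypothetical)."""
    return "".join("\\" + ch if ch == "\\" or ch in SPECIAL else ch for ch in text)


def unescape_arg(text: str) -> str:
    """Model of the described processing: a backslash makes the next character literal."""
    out: list[str] = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            ch = next(chars, "")  # keep the escaped character, drop the backslash
        out.append(ch)
    return "".join(out)


print(escape_arg("Hello!"))                     # Hello\!
print(unescape_arg(escape_arg("C:\\dir\\x!")))  # C:\dir\x!
```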

1.16 Text Encodings

Some cpdf commands write text to standard output, or read text from the command line or configuration files. These are:

There are three options to control how the text is interpreted:

Add -utf8 to use Unicode UTF-8, -stripped to convert to 7-bit ASCII by dropping any high characters, or -raw to perform no processing. The default is -stripped.
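
As a rough model of -stripped, which drops rather than transliterates high characters, consider this Python sketch. It illustrates the rule as stated and is not cpdf's code; the function name strip_to_ascii is hypothetical.

```python
def strip_to_ascii(text: str) -> str:
    """Keep only 7-bit ASCII characters (code points below 128), dropping the rest."""
    return "".join(ch for ch in text if ord(ch) < 128)


print(strip_to_ascii("Café Ünïcode"))  # Caf ncode
```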

1.17 Font Embedding

Use the -no-embed-font option to avoid embedding the Standard 14 font metrics when adding text with -add-text.

Chapter 2
Merging and Splitting

2.1 Merging

The -merge operation allows the merging of several files into one. Ranges can be used to select only a subset of pages from each input file in the output. The output file consists of the concatenation of all the input pages in the order specified on the command line. In fact, -merge can be omitted, since merging is the default operation of cpdf.

Merge maintains bookmarks, named destinations, and name dictionaries.

Forms and other objects which cannot be merged are retained if they are from the document which first exhibits that feature.

The -retain-numbering option keeps the PDF page numbering labels of each document intact, rather than renumbering the output pages from 1.

The -remove-duplicate-fonts option ensures that fonts used in more than one of the inputs appear only once in the output.

2.2 Splitting

The -split operation splits a PDF file into a number of parts, which are written to files whose names are generated from a format. The optional -chunk option sets the number of pages written to each output file.

If the output format does not provide enough numbers for the files generated, the result is unspecified. The following format operators may be used:

%, %%, %%% etc.  Sequence number padded to the number of percent signs
@F               Original filename without extension
@N               Sequence number without padding zeroes
@S               Start page of this chunk
@E               End page of this chunk
@B               Bookmark name at this page
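
The following Python sketch shows how these operators might expand for each output file when splitting with a given chunk size. The function names are hypothetical; @B is omitted because it needs bookmark data, numbering from 1 is an assumption, and any run of percent signs is treated as one zero-padded sequence number, as the table describes.

```python
import os
import re


def chunks(page_count: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return (start, end) page pairs for consecutive chunks of chunk_size pages."""
    return [
        (start, min(start + chunk_size - 1, page_count))
        for start in range(1, page_count + 1, chunk_size)
    ]


def expand_format(fmt: str, filename: str, seq: int, start: int, end: int) -> str:
    """Expand %, @N, @S, @E and @F for one output file (sketch, not cpdf's code)."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    # Each run of percent signs becomes seq padded to that many digits.
    out = re.sub(r"%+", lambda m: str(seq).zfill(len(m.group())), fmt)
    # @F is substituted last so that a filename containing "@S" is left alone.
    for op, value in (("@N", str(seq)), ("@S", str(start)), ("@E", str(end)), ("@F", stem)):
        out = out.replace(op, value)
    return out


pieces = chunks(10, 4)  # [(1, 4), (5, 8), (9, 10)]
names = [expand_format("@F-%%%-p@S-@E.pdf", "in.pdf", n, s, e)
         for n, (s, e) in enumerate(pieces, start=1)]
print(names)  # ['in-001-p1-4.pdf', 'in-002-p5-8.pdf', 'in-003-p9-10.pdf']
```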

2.3 Splitting on Bookmarks


The -split-bookmarks <level> operation splits a PDF file into a number of parts, according to the page ranges implied by the document’s bookmarks. These parts are then written to files with names generated from the given format.

Level 0 denotes the top-level bookmarks, level 1 the next level (sub-bookmarks), and so on. So -split-bookmarks 1 creates breaks on level 0 and level 1 boundaries.
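
One way to picture how the level argument selects break points is sketched below in Python. The (level, page) bookmark representation, the function name, and starting the first part at page 1 even when the first bookmark falls later are assumptions made for illustration; cpdf's own algorithm may differ.

```python
def bookmark_parts(bookmarks: list[tuple[int, int]], level: int,
                   page_count: int) -> list[tuple[int, int]]:
    """Split pages into (start, end) parts at bookmarks whose level <= level.

    `bookmarks` holds hypothetical (level, target page) pairs.
    """
    # A set removes duplicate break points, so several bookmarks on one
    # page yield a single break and each page lands in exactly one part.
    breaks = {1} | {page for lvl, page in bookmarks if lvl <= level}
    starts = sorted(breaks)
    ends = [start - 1 for start in starts[1:]] + [page_count]
    return list(zip(starts, ends))


outline = [(0, 1), (1, 3), (1, 3), (0, 6), (1, 8)]
print(bookmark_parts(outline, 0, 10))  # [(1, 5), (6, 10)]
print(bookmark_parts(outline, 1, 10))  # [(1, 2), (3, 5), (6, 7), (8, 10)]
```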

There may be many bookmarks on a single page (for instance, if paragraphs are bookmarked or there are two subsections on one page). The splits calculated by -split-bookmarks ensure that each page appears in only one of the output files. The @ operators above may be used, including @B, which expands to the text of the bookmark:

The bookmark text used for a name is converted from Unicode to 7-bit ASCII, and the following characters are removed, in addition to any character with ASCII code less than 32: