<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.54
     from manual.texi on 23 March 2000 -->

<TITLE>bzip2 and libbzip2 - Programming with libbzip2</TITLE>
<link href="manual_4.html" rel=Next>
<link href="manual_2.html" rel=Previous>
<link href="manual_toc.html" rel=ToC>

</HEAD>
<BODY>
<p>Go to the <A HREF="manual_1.html">first</A>, <A HREF="manual_2.html">previous</A>, <A HREF="manual_4.html">next</A>, <A HREF="manual_4.html">last</A> section, <A HREF="manual_toc.html">table of contents</A>.
<P><HR><P>

<H1><A NAME="SEC12" HREF="manual_toc.html#TOC12">Programming with <CODE>libbzip2</CODE></A></H1>

<P>
This chapter describes the programming interface to <CODE>libbzip2</CODE>.

</P>
<P>
For general background information, particularly about memory
use and performance aspects, you'd be well advised to read Chapter 2
as well.

</P>

<H2><A NAME="SEC13" HREF="manual_toc.html#TOC13">Top-level structure</A></H2>

<P>
<CODE>libbzip2</CODE> is a flexible library for compressing and decompressing
data in the <CODE>bzip2</CODE> data format. Although packaged as a single
entity, it helps to regard the library as three separate parts: the
low-level interface, the high-level interface, and some utility
functions.

</P>
<P>
The structure of <CODE>libbzip2</CODE>'s interfaces is similar to
that of Jean-loup Gailly's and Mark Adler's excellent <CODE>zlib</CODE>
library.

</P>
<P>
All externally visible symbols have names beginning with <CODE>BZ2_</CODE>.
This is new in version 1.0. The intention is to minimise pollution
of the namespaces of library clients.

</P>

<H3><A NAME="SEC14" HREF="manual_toc.html#TOC14">Low-level summary</A></H3>

<P>
This interface provides services for compressing and decompressing
data in memory. There's no provision for dealing with files, streams
or any other I/O mechanisms, just straight memory-to-memory work.
In fact, this part of the library can be compiled without inclusion
of <CODE>stdio.h</CODE>, which may be helpful for embedded applications.

</P>
<P>
The low-level part of the library has no global variables and
is therefore thread-safe.

</P>
<P>
Six routines make up the low-level interface:
<CODE>BZ2_bzCompressInit</CODE>, <CODE>BZ2_bzCompress</CODE> and <CODE>BZ2_bzCompressEnd</CODE>
for compression,
and a corresponding trio <CODE>BZ2_bzDecompressInit</CODE>, <CODE>BZ2_bzDecompress</CODE>
and <CODE>BZ2_bzDecompressEnd</CODE> for decompression.
The <CODE>*Init</CODE> functions allocate
memory for compression/decompression and do other
initialisations, whilst the <CODE>*End</CODE> functions close down operations
and release memory.

</P>
<P>
The real work is done by <CODE>BZ2_bzCompress</CODE> and <CODE>BZ2_bzDecompress</CODE>.
These compress and decompress data from a user-supplied input buffer
to a user-supplied output buffer. These buffers can be any size;
arbitrary quantities of data are handled by making repeated calls
to these functions. This is a flexible mechanism allowing a
consumer-pull style of activity, or producer-push, or a mixture of
both.

</P>

<H3><A NAME="SEC15" HREF="manual_toc.html#TOC15">High-level summary</A></H3>

<P>
This interface provides some handy wrappers around the low-level
interface to facilitate reading and writing <CODE>bzip2</CODE> format
files (<CODE>.bz2</CODE> files). The routines provide hooks to facilitate
reading files in which the <CODE>bzip2</CODE> data stream is embedded
within some larger-scale file structure, or where there are
multiple <CODE>bzip2</CODE> data streams concatenated end-to-end.

</P>
<P>
For reading files, <CODE>BZ2_bzReadOpen</CODE>, <CODE>BZ2_bzRead</CODE>,
<CODE>BZ2_bzReadClose</CODE> and <CODE>BZ2_bzReadGetUnused</CODE> are supplied. For
writing files, <CODE>BZ2_bzWriteOpen</CODE>, <CODE>BZ2_bzWrite</CODE> and
<CODE>BZ2_bzWriteClose</CODE> are available.

</P>
<P>
As with the low-level library, no global variables are used,
so the library is per se thread-safe. However, if I/O errors
occur whilst reading or writing the underlying compressed files,
you may have to consult <CODE>errno</CODE> to determine the cause of
the error. In that case, you'd need a C library which correctly
supports <CODE>errno</CODE> in a multithreaded environment.

</P>
<P>
To make the library a little simpler and more portable,
<CODE>BZ2_bzReadOpen</CODE> and <CODE>BZ2_bzWriteOpen</CODE> require you to pass them file
handles (<CODE>FILE*</CODE>s) which have previously been opened for reading or
writing respectively. That avoids portability problems associated with
file operations and file attributes, whilst not being much of an
imposition on the programmer.
</P>

<H3><A NAME="SEC16" HREF="manual_toc.html#TOC16">Utility functions summary</A></H3>
<P>
For very simple needs, <CODE>BZ2_bzBuffToBuffCompress</CODE> and
<CODE>BZ2_bzBuffToBuffDecompress</CODE> are provided. These compress
and decompress data in memory from one buffer to another buffer in a single
function call. You should assess whether these functions
fulfill your memory-to-memory compression/decompression
requirements before investing effort in understanding the more
general but more complex low-level interface.
</P>

<P>
Yoshioka Tsuneo (<CODE>QWF00133@niftyserve.or.jp</CODE> /
<CODE>tsuneo-y@is.aist-nara.ac.jp</CODE>) has contributed some functions to
give better <CODE>zlib</CODE> compatibility. These functions are
<CODE>BZ2_bzopen</CODE>, <CODE>BZ2_bzread</CODE>, <CODE>BZ2_bzwrite</CODE>, <CODE>BZ2_bzflush</CODE>,
<CODE>BZ2_bzclose</CODE>,
<CODE>BZ2_bzerror</CODE> and <CODE>BZ2_bzlibVersion</CODE>. You may find these functions
more convenient for simple file reading and writing than those in the
high-level interface. These functions are not (yet) officially part of
the library, and are minimally documented here. If they break, you
get to keep all the pieces. I hope to document them properly when time
permits.
</P>

<P>
Yoshioka also contributed modifications to allow the library to be
built as a Windows DLL.

</P>

<H2><A NAME="SEC17" HREF="manual_toc.html#TOC17">Error handling</A></H2>

<P>
The library is designed to recover cleanly in all situations, including
the worst-case situation of decompressing random data. I'm not
100% sure that it can always do this, so you might want to add
a signal handler to catch segmentation violations during decompression
if you are feeling especially paranoid. I would be interested in
hearing more about the robustness of the library to corrupted
compressed data.

</P>
<P>
Version 1.0 is much more robust in this respect than
0.9.0 or 0.9.5. Investigations with Checker (a tool for
detecting problems with memory management, similar to Purify)
indicate that, at least for the few files I tested, all single-bit
errors in the compressed data are caught properly, with no
segmentation faults, no reads of uninitialised data and no
out-of-range reads or writes. So it's certainly much improved,
although I wouldn't claim it to be totally bombproof.

</P>
<P>
The file <CODE>bzlib.h</CODE> contains all definitions needed to use
the library. In particular, you should definitely not include
<CODE>bzlib_private.h</CODE>.

</P>
<P>
In <CODE>bzlib.h</CODE>, the various return values are defined. The following
list is not intended as an exhaustive description of the circumstances
in which a given value may be returned -- those descriptions are given
later. Rather, it is intended to convey the rough meaning of each
return value. The first five return values are normal and are not intended to
denote an error situation.
<DL COMPACT>

<DT><CODE>BZ_OK</CODE>
<DD>
The requested action was completed successfully.
<DT><CODE>BZ_RUN_OK</CODE>
<DD>
<DT><CODE>BZ_FLUSH_OK</CODE>
<DD>
<DT><CODE>BZ_FINISH_OK</CODE>
<DD>
In <CODE>BZ2_bzCompress</CODE>, the requested flush/finish/nothing-special action
was completed successfully.
<DT><CODE>BZ_STREAM_END</CODE>
<DD>
Compression of data was completed, or the logical stream end was
detected during decompression.
</DL>

<P>
The following return values indicate an error of some kind.
<DL COMPACT>

<DT><CODE>BZ_CONFIG_ERROR</CODE>
<DD>
Indicates that the library has been improperly compiled on your
platform -- a major configuration error. Specifically, it means
that <CODE>sizeof(char)</CODE>, <CODE>sizeof(short)</CODE> and <CODE>sizeof(int)</CODE>
are not 1, 2 and 4 respectively, as they should be. Note that the
library should still work properly on 64-bit platforms which follow
the LP64 programming model -- that is, where <CODE>sizeof(long)</CODE>
and <CODE>sizeof(void*)</CODE> are 8. Under LP64, <CODE>sizeof(int)</CODE> is
still 4, so <CODE>libbzip2</CODE>, which doesn't use the <CODE>long</CODE> type,
is OK.
<DT><CODE>BZ_SEQUENCE_ERROR</CODE>
<DD>
When using the library, it is important to call the functions in the
correct sequence and with data structures (buffers etc) in the correct
states. <CODE>libbzip2</CODE> checks as much as it can to ensure this is
happening, and returns <CODE>BZ_SEQUENCE_ERROR</CODE> if not. Code which
complies precisely with the function semantics, as detailed below,
should never receive this value; such an event denotes buggy code
which you should investigate.
<DT><CODE>BZ_PARAM_ERROR</CODE>
<DD>
Returned when a parameter to a function call is out of range
or otherwise manifestly incorrect. As with <CODE>BZ_SEQUENCE_ERROR</CODE>,
this denotes a bug in the client code. The distinction between
<CODE>BZ_PARAM_ERROR</CODE> and <CODE>BZ_SEQUENCE_ERROR</CODE> is a bit hazy, but still worth
making.
<DT><CODE>BZ_MEM_ERROR</CODE>
<DD>
Returned when a request to allocate memory failed. Note that the
quantity of memory needed to decompress a stream cannot be determined
until the stream's header has been read. So <CODE>BZ2_bzDecompress</CODE> and
<CODE>BZ2_bzRead</CODE> may return <CODE>BZ_MEM_ERROR</CODE> even though some of
the compressed data has been read. The same is not true for
compression; once <CODE>BZ2_bzCompressInit</CODE> or <CODE>BZ2_bzWriteOpen</CODE> has
successfully completed, <CODE>BZ_MEM_ERROR</CODE> cannot occur.
<DT><CODE>BZ_DATA_ERROR</CODE>
<DD>
Returned when a data integrity error is detected during decompression.
Most importantly, this is returned when the stored and computed CRCs for the
data do not match. This value is also returned upon detection of any
other anomaly in the compressed data.
<DT><CODE>BZ_DATA_ERROR_MAGIC</CODE>
<DD>
As a special case of <CODE>BZ_DATA_ERROR</CODE>, it is sometimes useful to
know when the compressed stream does not start with the correct
magic bytes (<CODE>'B' 'Z' 'h'</CODE>).
<DT><CODE>BZ_IO_ERROR</CODE>
<DD>
Returned by <CODE>BZ2_bzRead</CODE> and <CODE>BZ2_bzWrite</CODE> when there is an error
reading from or writing to the compressed file, and by <CODE>BZ2_bzReadOpen</CODE>
and <CODE>BZ2_bzWriteOpen</CODE> for attempts to use a file for which the
error indicator (viz, <CODE>ferror(f)</CODE>) is set.
On receipt of <CODE>BZ_IO_ERROR</CODE>, the caller should consult
<CODE>errno</CODE> and/or <CODE>perror</CODE> to acquire operating-system-specific
information about the problem.
<DT><CODE>BZ_UNEXPECTED_EOF</CODE>
<DD>
Returned by <CODE>BZ2_bzRead</CODE> when the compressed file finishes
before the logical end of stream is detected.
<DT><CODE>BZ_OUTBUFF_FULL</CODE>
<DD>
Returned by <CODE>BZ2_bzBuffToBuffCompress</CODE> and
<CODE>BZ2_bzBuffToBuffDecompress</CODE> to indicate that the output data
will not fit into the output buffer provided.
</DL>

<H2><A NAME="SEC18" HREF="manual_toc.html#TOC18">Low-level interface</A></H2>

<H3><A NAME="SEC19" HREF="manual_toc.html#TOC19"><CODE>BZ2_bzCompressInit</CODE></A></H3>

<PRE>
typedef
   struct {
      char *next_in;
      unsigned int avail_in;
      unsigned int total_in_lo32;
      unsigned int total_in_hi32;

      char *next_out;
      unsigned int avail_out;
      unsigned int total_out_lo32;
      unsigned int total_out_hi32;

      void *state;

      void *(*bzalloc)(void *,int,int);
      void (*bzfree)(void *,void *);
      void *opaque;
   }
   bz_stream;

int BZ2_bzCompressInit ( bz_stream *strm,
                         int blockSize100k,
                         int verbosity,
                         int workFactor );
</PRE>

<P>
Prepares for compression. The <CODE>bz_stream</CODE> structure
holds all data pertaining to the compression activity.
A <CODE>bz_stream</CODE> structure should be allocated and initialised
prior to the call.
The fields of <CODE>bz_stream</CODE>
comprise the entirety of the user-visible data. <CODE>state</CODE>
is a pointer to the private data structures required for compression.

</P>
<P>
Custom memory allocators are supported, via fields <CODE>bzalloc</CODE>,
<CODE>bzfree</CODE>,
and <CODE>opaque</CODE>. The value
<CODE>opaque</CODE> is passed as the first argument to
all calls to <CODE>bzalloc</CODE> and <CODE>bzfree</CODE>, but is
otherwise ignored by the library.
The call <CODE>bzalloc ( opaque, n, m )</CODE> is expected to return a
pointer <CODE>p</CODE> to
<CODE>n * m</CODE> bytes of memory, and <CODE>bzfree ( opaque, p )</CODE>
should free
that memory.
</P>

<P>
If you don't want to use a custom memory allocator, set <CODE>bzalloc</CODE>,
<CODE>bzfree</CODE> and
<CODE>opaque</CODE> to <CODE>NULL</CODE>,
and the library will then use the standard <CODE>malloc</CODE>/<CODE>free</CODE>
routines.

</P>
<P>
Before calling <CODE>BZ2_bzCompressInit</CODE>, fields <CODE>bzalloc</CODE>,
<CODE>bzfree</CODE> and <CODE>opaque</CODE> should
be filled appropriately, as just described. Upon return, the internal
state will have been allocated and initialised, and <CODE>total_in_lo32</CODE>,
<CODE>total_in_hi32</CODE>, <CODE>total_out_lo32</CODE> and
<CODE>total_out_hi32</CODE> will have been set to zero.
These four fields are used by the library
to inform the caller of the total amount of data passed into and out of
the library, respectively. You should not try to change them.
As of version 1.0, 64-bit counts are maintained, even on 32-bit
platforms, using the <CODE>_hi32</CODE> fields to store the upper 32 bits
of the count. So, for example, the total amount of data in
is <CODE>(total_in_hi32 &lt;&lt; 32) + total_in_lo32</CODE>, where the shift
must be carried out in a 64-bit unsigned type.
</P>

<P>
Parameter <CODE>blockSize100k</CODE> specifies the block size to be used for
compression. It should be a value between 1 and 9 inclusive, and the
actual block size used is 100000 x this figure. 9 gives the best
compression but takes the most memory.

</P>
<P>
Parameter <CODE>verbosity</CODE> should be set to a number between 0 and 4
inclusive. 0 is silent, and greater numbers give increasingly verbose
monitoring/debugging output. If the library has been compiled with
<CODE>-DBZ_NO_STDIO</CODE>, no such output will appear for any verbosity
setting.

</P>
<P>
Parameter <CODE>workFactor</CODE> controls how the compression phase behaves
when presented with worst-case, highly repetitive, input data. If
compression runs into difficulties caused by repetitive data, the
library switches from the standard sorting algorithm to a fallback
algorithm. The fallback is slower than the standard algorithm by
perhaps a factor of three, but always behaves reasonably, no matter how
bad the input.

</P>
<P>
Lower values of <CODE>workFactor</CODE> reduce the amount of effort the
standard algorithm will expend before resorting to the fallback. You
should set this parameter carefully; too low, and many inputs will be
handled by the fallback algorithm and so compress rather slowly; too
high, and your average-to-worst case compression times can become very
large. The default value of 30 gives reasonable behaviour over a wide
range of circumstances.

</P>
<P>
Allowable values range from 0 to 250 inclusive. 0 is a special case,
equivalent to using the default value of 30.
</P>

<P>
Note that the compressed output generated is the same regardless of
whether or not the fallback algorithm is used.

</P>
<P>
Be aware also that this parameter may disappear entirely in future
versions of the library. In principle it should be possible to devise a
good way to automatically choose which algorithm to use. Such a
mechanism would render the parameter obsolete.

</P>
<P>
Possible return values:

<PRE>
      <CODE>BZ_CONFIG_ERROR</CODE>
         if the library has been mis-compiled
      <CODE>BZ_PARAM_ERROR</CODE>
         if <CODE>strm</CODE> is <CODE>NULL</CODE>
         or <CODE>blockSize100k</CODE> &lt; 1 or <CODE>blockSize100k</CODE> &gt; 9
         or <CODE>verbosity</CODE> &lt; 0 or <CODE>verbosity</CODE> &gt; 4
         or <CODE>workFactor</CODE> &lt; 0 or <CODE>workFactor</CODE> &gt; 250
      <CODE>BZ_MEM_ERROR</CODE>
         if not enough memory is available
      <CODE>BZ_OK</CODE>
         otherwise
</PRE>

<P>
Allowable next actions:

<PRE>
      <CODE>BZ2_bzCompress</CODE>
         if <CODE>BZ_OK</CODE> is returned
      no specific action needed in case of error
</PRE>

<H3><A NAME="SEC20" HREF="manual_toc.html#TOC20"><CODE>BZ2_bzCompress</CODE></A></H3>

<PRE>
int BZ2_bzCompress ( bz_stream *strm, int action );
</PRE>

<P>
Provides more input and/or output buffer space for the library. The
caller maintains input and output buffers, and calls <CODE>BZ2_bzCompress</CODE> to
transfer data between them.

</P>
<P>
Before each call to <CODE>BZ2_bzCompress</CODE>, <CODE>next_in</CODE> should point at
the data to be compressed, and <CODE>avail_in</CODE> should indicate how many
bytes the library may read. <CODE>BZ2_bzCompress</CODE> updates <CODE>next_in</CODE>,
<CODE>avail_in</CODE> and the <CODE>total_in_lo32</CODE>/<CODE>total_in_hi32</CODE> counters
to reflect the number of bytes it has read.

</P>
<P>
Similarly, <CODE>next_out</CODE> should point to a buffer in which the
compressed data is to be placed, with <CODE>avail_out</CODE> indicating how
much output space is available. <CODE>BZ2_bzCompress</CODE> updates
<CODE>next_out</CODE>, <CODE>avail_out</CODE> and the <CODE>total_out_lo32</CODE>/<CODE>total_out_hi32</CODE>
counters to reflect the number of bytes output.

</P>
<P>
You may provide and remove as little or as much data as you like on each
call of <CODE>BZ2_bzCompress</CODE>. In the limit, it is acceptable to supply and
remove data one byte at a time, although this would be terribly
inefficient. You should always ensure that at least one byte of output
space is available at each call.

</P>
<P>
A second purpose of <CODE>BZ2_bzCompress</CODE> is to request a change of mode of the
compressed stream.

</P>
<P>
Conceptually, a compressed stream can be in one of four states: IDLE,
RUNNING, FLUSHING and FINISHING. Before initialisation
(<CODE>BZ2_bzCompressInit</CODE>) and after termination (<CODE>BZ2_bzCompressEnd</CODE>), a
stream is regarded as IDLE.

</P>
<P>
Upon initialisation (<CODE>BZ2_bzCompressInit</CODE>), the stream is placed in the
RUNNING state. Subsequent calls to <CODE>BZ2_bzCompress</CODE> should pass
<CODE>BZ_RUN</CODE> as the requested action for as long as there is more input
to supply; the <CODE>BZ_FLUSH</CODE> and <CODE>BZ_FINISH</CODE> actions, described
below, move the stream out of the RUNNING state.

</P>
<P>
At some point, the calling program will have provided all the input data
it wants to. It will then want to finish up -- in effect, asking the
library to process any data it might have buffered internally. In this
state, <CODE>BZ2_bzCompress</CODE> will no longer accept new input from
<CODE>next_in</CODE>, but it will still want to write data to <CODE>next_out</CODE>.
Because the output buffer supplied by the user can be arbitrarily small,
the finishing-up operation cannot necessarily be done with a single call
of <CODE>BZ2_bzCompress</CODE>.

</P>
<P>
Instead, the calling program passes <CODE>BZ_FINISH</CODE> as an action to
<CODE>BZ2_bzCompress</CODE>. This changes the stream's state to FINISHING. Any
remaining input (ie, <CODE>next_in[0 .. avail_in-1]</CODE>) is compressed and
transferred to the output buffer. To do this, <CODE>BZ2_bzCompress</CODE> must be
called repeatedly until all the output has been consumed. At that
point, <CODE>BZ2_bzCompress</CODE> returns <CODE>BZ_STREAM_END</CODE>, and the stream's
state is set back to IDLE. <CODE>BZ2_bzCompressEnd</CODE> should then be
called.

</P>
<P>
Just to make sure the calling program does not cheat, the library makes
a note of <CODE>avail_in</CODE> at the time of the first call to
<CODE>BZ2_bzCompress</CODE> which has <CODE>BZ_FINISH</CODE> as an action (ie, at the
time the program has announced its intention not to supply any more
input). By comparing this value with that of <CODE>avail_in</CODE> over
subsequent calls to <CODE>BZ2_bzCompress</CODE>, the library can detect any
attempts to slip in more data to compress. Any calls for which this is
detected will return <CODE>BZ_SEQUENCE_ERROR</CODE>. This indicates a
programming mistake which should be corrected.

</P>
<P>
Instead of asking to finish, the calling program may ask
<CODE>BZ2_bzCompress</CODE> to take all the remaining input, compress it and
terminate the current (Burrows-Wheeler) compression block. This could
be useful for error control purposes. The mechanism is analogous to
that for finishing: call <CODE>BZ2_bzCompress</CODE> with an action of
<CODE>BZ_FLUSH</CODE>, remove output data, and persist with the
<CODE>BZ_FLUSH</CODE> action until the value <CODE>BZ_RUN_OK</CODE> is returned. As
with finishing, <CODE>BZ2_bzCompress</CODE> detects any attempt to provide more
input data once the flush has begun.

</P>
<P>
Once the flush is complete, the stream returns to the normal RUNNING
state.
</P>

<P>
This all sounds pretty complex, but isn't really. Here's a table
which shows which actions are allowable in each state, what action
will be taken, what the next state is, and what the non-error return
values are. Note that you can't explicitly ask what state the
stream is in, but nor do you need to -- it can be inferred from the
values returned by <CODE>BZ2_bzCompress</CODE>.

<PRE>
IDLE/<CODE>any</CODE>
      Illegal. IDLE state exists only before <CODE>BZ2_bzCompressInit</CODE>,
      after <CODE>BZ_STREAM_END</CODE> has been returned, or after
      <CODE>BZ2_bzCompressEnd</CODE>.
      Return value = <CODE>BZ_SEQUENCE_ERROR</CODE>

RUNNING/<CODE>BZ_RUN</CODE>
      Compress from <CODE>next_in</CODE> to <CODE>next_out</CODE> as much as possible.
      Next state = RUNNING
      Return value = <CODE>BZ_RUN_OK</CODE>

RUNNING/<CODE>BZ_FLUSH</CODE>
      Remember current value of <CODE>avail_in</CODE>. Compress from <CODE>next_in</CODE>
      to <CODE>next_out</CODE> as much as possible, but do not accept any more input.
      Next state = FLUSHING
      Return value = <CODE>BZ_FLUSH_OK</CODE>
      (If the flush completes within this call, the stream goes straight
      back to RUNNING and <CODE>BZ_RUN_OK</CODE> is returned.)

RUNNING/<CODE>BZ_FINISH</CODE>
      Remember current value of <CODE>avail_in</CODE>. Compress from <CODE>next_in</CODE>
      to <CODE>next_out</CODE> as much as possible, but do not accept any more input.
      Next state = FINISHING
      Return value = <CODE>BZ_FINISH_OK</CODE>
      (If all output fits within this call, the stream goes straight
      to IDLE and <CODE>BZ_STREAM_END</CODE> is returned.)

FLUSHING/<CODE>BZ_FLUSH</CODE>
      Compress from <CODE>next_in</CODE> to <CODE>next_out</CODE> as much as possible,
      but do not accept any more input.
      If all the existing input has been used up and all compressed
      output has been removed
         Next state = RUNNING; Return value = <CODE>BZ_RUN_OK</CODE>
      else
         Next state = FLUSHING; Return value = <CODE>BZ_FLUSH_OK</CODE>

FLUSHING/other
      Illegal.
      Return value = <CODE>BZ_SEQUENCE_ERROR</CODE>

FINISHING/<CODE>BZ_FINISH</CODE>
      Compress from <CODE>next_in</CODE> to <CODE>next_out</CODE> as much as possible,
      but do not accept any more input.
      If all the existing input has been used up and all compressed
      output has been removed
         Next state = IDLE; Return value = <CODE>BZ_STREAM_END</CODE>
      else
         Next state = FINISHING; Return value = <CODE>BZ_FINISH_OK</CODE>

FINISHING/other
      Illegal.
      Return value = <CODE>BZ_SEQUENCE_ERROR</CODE>
</PRE>

<P>
That still looks complicated? Well, fair enough. The usual sequence
of calls for compressing a load of data is:

<UL>
<LI>Get started with <CODE>BZ2_bzCompressInit</CODE>.

<LI>Shovel data in and shlurp out its compressed form using zero or more
calls of <CODE>BZ2_bzCompress</CODE> with action = <CODE>BZ_RUN</CODE>.

<LI>Finish up.
Repeatedly call <CODE>BZ2_bzCompress</CODE> with action = <CODE>BZ_FINISH</CODE>,
copying out the compressed output, until <CODE>BZ_STREAM_END</CODE> is returned.

<LI>Close up and go home. Call <CODE>BZ2_bzCompressEnd</CODE>.

</UL>

<P>
If the data you want to compress fits into your input buffer all
at once, you can skip the calls of <CODE>BZ2_bzCompress ( ..., BZ_RUN )</CODE> and
just do the <CODE>BZ2_bzCompress ( ..., BZ_FINISH )</CODE> calls.
</P>

<P>
All required memory is allocated by <CODE>BZ2_bzCompressInit</CODE>. The
compression library can accept any data at all (obviously). So you
shouldn't get any error return values from the <CODE>BZ2_bzCompress</CODE> calls.
If you do, they will be <CODE>BZ_SEQUENCE_ERROR</CODE>, and indicate a bug in
your programming.

</P>
<P>
Trivial other possible return values:

<PRE>
      <CODE>BZ_PARAM_ERROR</CODE>
         if <CODE>strm</CODE> is <CODE>NULL</CODE>, or <CODE>strm->state</CODE> is <CODE>NULL</CODE>
</PRE>

<H3><A NAME="SEC21" HREF="manual_toc.html#TOC21"><CODE>BZ2_bzCompressEnd</CODE></A></H3>

<PRE>
int BZ2_bzCompressEnd ( bz_stream *strm );
</PRE>

<P>
Releases all memory associated with a compression stream.

</P>
<P>
Possible return values:

<PRE>
      <CODE>BZ_PARAM_ERROR</CODE>   if <CODE>strm</CODE> is <CODE>NULL</CODE> or <CODE>strm->state</CODE> is <CODE>NULL</CODE>
      <CODE>BZ_OK</CODE>            otherwise
</PRE>

<H3><A NAME="SEC22" HREF="manual_toc.html#TOC22"><CODE>BZ2_bzDecompressInit</CODE></A></H3>

<PRE>
int BZ2_bzDecompressInit ( bz_stream *strm, int verbosity, int small );
</PRE>

<P>
Prepares for decompression. As with <CODE>BZ2_bzCompressInit</CODE>, a
<CODE>bz_stream</CODE> record should be allocated and initialised before the
call. Fields <CODE>bzalloc</CODE>, <CODE>bzfree</CODE> and <CODE>opaque</CODE> should be
set if a custom memory allocator is required, or made <CODE>NULL</CODE> for
the normal <CODE>malloc</CODE>/<CODE>free</CODE> routines. Upon return, the internal
state will have been initialised, and the <CODE>total_in_lo32</CODE>/<CODE>total_in_hi32</CODE>
and <CODE>total_out_lo32</CODE>/<CODE>total_out_hi32</CODE> counters will be zero.

</P>
<P>
For the meaning of parameter <CODE>verbosity</CODE>, see <CODE>BZ2_bzCompressInit</CODE>.

</P>
<P>
If <CODE>small</CODE> is 1, the library will use an alternative
decompression algorithm which uses less memory but at the cost of
decompressing more slowly (roughly speaking, half the speed, but the
maximum memory requirement drops to around 2300k). See Chapter 2 for
more information on memory management.

</P>
<P>
Note that the amount of memory needed to decompress
a stream cannot be determined until the stream's header has been read,
so even if <CODE>BZ2_bzDecompressInit</CODE> succeeds, a subsequent
<CODE>BZ2_bzDecompress</CODE> could fail with <CODE>BZ_MEM_ERROR</CODE>.

</P>
<P>
Possible return values:

<PRE>
      <CODE>BZ_CONFIG_ERROR</CODE>
         if the library has been mis-compiled
      <CODE>BZ_PARAM_ERROR</CODE>
         if <CODE>strm</CODE> is <CODE>NULL</CODE>
         or <CODE>(small != 0 &amp;&amp; small != 1)</CODE>
         or <CODE>(verbosity &lt; 0 || verbosity &gt; 4)</CODE>
      <CODE>BZ_MEM_ERROR</CODE>
         if insufficient memory is available
      <CODE>BZ_OK</CODE>
         otherwise
</PRE>

<P>
Allowable next actions:

<PRE>
      <CODE>BZ2_bzDecompress</CODE>
         if <CODE>BZ_OK</CODE> was returned
      no specific action required in case of error
</PRE>

<H3><A NAME="SEC23" HREF="manual_toc.html#TOC23"><CODE>BZ2_bzDecompress</CODE></A></H3>
|
||
|
|
||
|
<PRE>
|
||
|
int BZ2_bzDecompress ( bz_stream *strm );
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Provides more input and/or output buffer space for the library. The
caller maintains input and output buffers, and uses <CODE>BZ2_bzDecompress</CODE>
to transfer data between them.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
Before each call to <CODE>BZ2_bzDecompress</CODE>, <CODE>next_in</CODE>
|
||
|
should point at the compressed data,
|
||
|
and <CODE>avail_in</CODE> should indicate how many bytes the library
|
||
|
may read. <CODE>BZ2_bzDecompress</CODE> updates <CODE>next_in</CODE>, <CODE>avail_in</CODE>
|
||
|
and <CODE>total_in</CODE>
|
||
|
to reflect the number of bytes it has read.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
Similarly, <CODE>next_out</CODE> should point to a buffer in which the uncompressed
|
||
|
output is to be placed, with <CODE>avail_out</CODE> indicating how much output space
|
||
|
is available. <CODE>BZ2_bzDecompress</CODE> updates <CODE>next_out</CODE>,
|
||
|
<CODE>avail_out</CODE> and <CODE>total_out</CODE> to reflect
|
||
|
the number of bytes output.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
You may provide and remove as little or as much data as you like on
|
||
|
each call of <CODE>BZ2_bzDecompress</CODE>.
|
||
|
In the limit, it is acceptable to
|
||
|
supply and remove data one byte at a time, although this would be
|
||
|
terribly inefficient. You should always ensure that at least one
|
||
|
byte of output space is available at each call.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
Use of <CODE>BZ2_bzDecompress</CODE> is simpler than <CODE>BZ2_bzCompress</CODE>.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
You should provide input and remove output as described above, and
|
||
|
repeatedly call <CODE>BZ2_bzDecompress</CODE> until <CODE>BZ_STREAM_END</CODE> is
|
||
|
returned. Appearance of <CODE>BZ_STREAM_END</CODE> denotes that
|
||
|
<CODE>BZ2_bzDecompress</CODE> has detected the logical end of the compressed
|
||
|
stream. <CODE>BZ2_bzDecompress</CODE> will not produce <CODE>BZ_STREAM_END</CODE> until
|
||
|
all output data has been placed into the output buffer, so once
|
||
|
<CODE>BZ_STREAM_END</CODE> appears, you are guaranteed to have available all
|
||
|
the decompressed output, and <CODE>BZ2_bzDecompressEnd</CODE> can safely be
|
||
|
called.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
In case of an error return value, you should call <CODE>BZ2_bzDecompressEnd</CODE>
|
||
|
to clean up and release memory.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
Possible return values:
|
||
|
|
||
|
<PRE>
|
||
|
<CODE>BZ_PARAM_ERROR</CODE>
|
||
|
if <CODE>strm</CODE> is <CODE>NULL</CODE> or <CODE>strm->s</CODE> is <CODE>NULL</CODE>
|
||
|
or <CODE>strm->avail_out < 1</CODE>
|
||
|
<CODE>BZ_DATA_ERROR</CODE>
|
||
|
if a data integrity error is detected in the compressed stream
|
||
|
<CODE>BZ_DATA_ERROR_MAGIC</CODE>
|
||
|
if the compressed stream doesn't begin with the right magic bytes
|
||
|
<CODE>BZ_MEM_ERROR</CODE>
|
||
|
if there wasn't enough memory available
|
||
|
<CODE>BZ_STREAM_END</CODE>
|
||
|
if the logical end of the data stream was detected and all
|
||
|
output has been consumed, eg <CODE>strm->avail_out > 0</CODE>
|
||
|
<CODE>BZ_OK</CODE>
|
||
|
otherwise
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Allowable next actions:
|
||
|
|
||
|
<PRE>
|
||
|
<CODE>BZ2_bzDecompress</CODE>
|
||
|
if <CODE>BZ_OK</CODE> was returned
|
||
|
<CODE>BZ2_bzDecompressEnd</CODE>
|
||
|
otherwise
|
||
|
</PRE>
|
||
|
|
||
|
|
||
|
|
||
|
<H3><A NAME="SEC24" HREF="manual_toc.html#TOC24"><CODE>BZ2_bzDecompressEnd</CODE></A></H3>
|
||
|
|
||
|
<PRE>
|
||
|
int BZ2_bzDecompressEnd ( bz_stream *strm );
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Releases all memory associated with a decompression stream.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
Possible return values:
|
||
|
|
||
|
<PRE>
|
||
|
<CODE>BZ_PARAM_ERROR</CODE>
|
||
|
if <CODE>strm</CODE> is <CODE>NULL</CODE> or <CODE>strm->s</CODE> is <CODE>NULL</CODE>
|
||
|
<CODE>BZ_OK</CODE>
|
||
|
otherwise
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Allowable next actions:
|
||
|
|
||
|
<PRE>
|
||
|
None.
|
||
|
</PRE>
|
||
|
|
||
|
|
||
|
|
||
|
<H2><A NAME="SEC25" HREF="manual_toc.html#TOC25">High-level interface</A></H2>
|
||
|
|
||
|
<P>
|
||
|
This interface provides functions for reading and writing
|
||
|
<CODE>bzip2</CODE> format files. First, some general points.
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<UL>
|
||
|
<LI>All of the functions take an <CODE>int*</CODE> first argument,
|
||
|
|
||
|
<CODE>bzerror</CODE>.
|
||
|
After each call, <CODE>bzerror</CODE> should be consulted first to determine
|
||
|
the outcome of the call. If <CODE>bzerror</CODE> is <CODE>BZ_OK</CODE>,
|
||
|
the call completed
|
||
|
successfully, and only then should the return value of the function
|
||
|
(if any) be consulted. If <CODE>bzerror</CODE> is <CODE>BZ_IO_ERROR</CODE>,
|
||
|
there was an error
|
||
|
reading/writing the underlying compressed file, and you should
|
||
|
then consult <CODE>errno</CODE>/<CODE>perror</CODE> to determine the
|
||
|
cause of the difficulty.
|
||
|
<CODE>bzerror</CODE> may also be set to various other values; precise details are
|
||
|
given on a per-function basis below.
|
||
|
<LI>If <CODE>bzerror</CODE> indicates an error
|
||
|
|
||
|
(ie, anything except <CODE>BZ_OK</CODE> and <CODE>BZ_STREAM_END</CODE>),
|
||
|
you should immediately call <CODE>BZ2_bzReadClose</CODE> (or <CODE>BZ2_bzWriteClose</CODE>,
|
||
|
depending on whether you are attempting to read or to write)
|
||
|
to free up all resources associated
|
||
|
with the stream. Once an error has been indicated, behaviour of all calls
|
||
|
except <CODE>BZ2_bzReadClose</CODE> (<CODE>BZ2_bzWriteClose</CODE>) is undefined.
|
||
|
The implication is that (1) <CODE>bzerror</CODE> should
|
||
|
be checked after each call, and (2) if <CODE>bzerror</CODE> indicates an error,
|
||
|
<CODE>BZ2_bzReadClose</CODE> (<CODE>BZ2_bzWriteClose</CODE>) should then be called to clean up.
|
||
|
<LI>The <CODE>FILE*</CODE> arguments passed to
|
||
|
|
||
|
<CODE>BZ2_bzReadOpen</CODE>/<CODE>BZ2_bzWriteOpen</CODE>
|
||
|
should be set to binary mode.
|
||
|
Most Unix systems will do this by default, but other platforms,
|
||
|
including Windows and Mac, will not. If you omit this, you may
|
||
|
encounter problems when moving code to new platforms.
|
||
|
<LI>Memory allocation requests are handled by
|
||
|
|
||
|
<CODE>malloc</CODE>/<CODE>free</CODE>.
|
||
|
At present
|
||
|
there is no facility for user-defined memory allocators in the file I/O
|
||
|
functions (could easily be added, though).
|
||
|
</UL>
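<P>
In practice, the binary-mode advice comes down to the <CODE>b</CODE> flag in the <CODE>fopen</CODE> mode string. The wrapper below is a hypothetical convenience, not part of <CODE>libbzip2</CODE>. On Unix the flag has no effect. On other platforms it stops text-mode newline translation from corrupting compressed bytes.
</P>

```c
#include <stdio.h>

/* Hypothetical wrapper (not a libbzip2 function): open a file in binary
 * mode for later use with BZ2_bzReadOpen (for_writing == 0) or
 * BZ2_bzWriteOpen (for_writing != 0). Returns NULL if fopen fails. */
FILE *bz_fopen_binary(const char *path, int for_writing)
{
    /* "b" suppresses newline translation on platforms that perform it */
    return fopen(path, for_writing ? "wb" : "rb");
}
```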
|
||
|
|
||
|
|
||
|
|
||
|
<H3><A NAME="SEC26" HREF="manual_toc.html#TOC26"><CODE>BZ2_bzReadOpen</CODE></A></H3>
|
||
|
|
||
|
<PRE>
typedef void BZFILE;

BZFILE *BZ2_bzReadOpen ( int *bzerror, FILE *f,
                         int verbosity, int small,
                         void *unused, int nUnused );
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Prepare to read compressed data from file handle <CODE>f</CODE>. <CODE>f</CODE>
|
||
|
should refer to a file which has been opened for reading, and for which
|
||
|
the error indicator (<CODE>ferror(f)</CODE>) is not set. If <CODE>small</CODE> is 1,
|
||
|
the library will try to decompress using less memory, at the expense of
|
||
|
speed.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
For reasons explained below, <CODE>BZ2_bzRead</CODE> will decompress the
|
||
|
<CODE>nUnused</CODE> bytes starting at <CODE>unused</CODE>, before starting to read
|
||
|
from the file <CODE>f</CODE>. At most <CODE>BZ_MAX_UNUSED</CODE> bytes may be
|
||
|
supplied like this. If this facility is not required, you should pass
|
||
|
<CODE>NULL</CODE> and <CODE>0</CODE> for <CODE>unused</CODE> and <CODE>nUnused</CODE>
|
||
|
respectively.
|
||
|
|
||
|
</P>
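<P>
The constraints on <CODE>unused</CODE>/<CODE>nUnused</CODE> are easy to misread. The following sketch spells them out as a C predicate. The function name is hypothetical and not part of <CODE>libbzip2</CODE>. The parameter <CODE>max_unused</CODE> stands in for <CODE>BZ_MAX_UNUSED</CODE>, which is 5000 in <CODE>bzlib.h</CODE>.
</P>

```c
#include <stddef.h>

/* Hypothetical predicate mirroring the BZ_PARAM_ERROR conditions on the
 * unused/nUnused pair of BZ2_bzReadOpen. max_unused stands in for
 * BZ_MAX_UNUSED. Returns 1 if the pair is acceptable, 0 otherwise. */
int bz_unused_args_ok(const void *unused, int nUnused, int max_unused)
{
    if (unused == NULL)
        return nUnused == 0;                      /* NULL needs a zero count */
    return nUnused >= 0 && nUnused <= max_unused; /* 0 <= nUnused <= max    */
}
```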
|
||
|
<P>
|
||
|
For the meaning of parameters <CODE>small</CODE> and <CODE>verbosity</CODE>,
|
||
|
see <CODE>BZ2_bzDecompressInit</CODE>.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
The amount of memory needed to decompress a file cannot be determined
|
||
|
until the file's header has been read. So it is possible that
|
||
|
<CODE>BZ2_bzReadOpen</CODE> returns <CODE>BZ_OK</CODE> but a subsequent call of
|
||
|
<CODE>BZ2_bzRead</CODE> will return <CODE>BZ_MEM_ERROR</CODE>.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
Possible assignments to <CODE>bzerror</CODE>:
|
||
|
|
||
|
<PRE>
|
||
|
<CODE>BZ_CONFIG_ERROR</CODE>
|
||
|
if the library has been mis-compiled
|
||
|
<CODE>BZ_PARAM_ERROR</CODE>
|
||
|
if <CODE>f</CODE> is <CODE>NULL</CODE>
|
||
|
or <CODE>small</CODE> is neither <CODE>0</CODE> nor <CODE>1</CODE>
|
||
|
or <CODE>(unused == NULL && nUnused != 0)</CODE>
|
||
|
or <CODE>(unused != NULL && (nUnused < 0 || nUnused > BZ_MAX_UNUSED))</CODE>
|
||
|
<CODE>BZ_IO_ERROR</CODE>
|
||
|
if <CODE>ferror(f)</CODE> is nonzero
|
||
|
<CODE>BZ_MEM_ERROR</CODE>
|
||
|
if insufficient memory is available
|
||
|
<CODE>BZ_OK</CODE>
|
||
|
otherwise.
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Possible return values:
|
||
|
|
||
|
<PRE>
|
||
|
Pointer to an abstract <CODE>BZFILE</CODE>
|
||
|
if <CODE>bzerror</CODE> is <CODE>BZ_OK</CODE>
|
||
|
<CODE>NULL</CODE>
|
||
|
otherwise
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Allowable next actions:
|
||
|
|
||
|
<PRE>
|
||
|
<CODE>BZ2_bzRead</CODE>
|
||
|
if <CODE>bzerror</CODE> is <CODE>BZ_OK</CODE>
|
||
|
<CODE>BZ2_bzReadClose</CODE>
|
||
|
otherwise
|
||
|
</PRE>
|
||
|
|
||
|
|
||
|
|
||
|
<H3><A NAME="SEC27" HREF="manual_toc.html#TOC27"><CODE>BZ2_bzRead</CODE></A></H3>
|
||
|
|
||
|
<PRE>
|
||
|
int BZ2_bzRead ( int *bzerror, BZFILE *b, void *buf, int len );
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Reads up to <CODE>len</CODE> (uncompressed) bytes from the compressed file
|
||
|
<CODE>b</CODE> into
|
||
|
the buffer <CODE>buf</CODE>. If the read was successful,
|
||
|
<CODE>bzerror</CODE> is set to <CODE>BZ_OK</CODE>
|
||
|
and the number of bytes read is returned. If the logical end-of-stream
|
||
|
was detected, <CODE>bzerror</CODE> will be set to <CODE>BZ_STREAM_END</CODE>,
|
||
|
and the number
|
||
|
of bytes read is returned. All other <CODE>bzerror</CODE> values denote an error.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
<CODE>BZ2_bzRead</CODE> will supply <CODE>len</CODE> bytes,
|
||
|
unless the logical stream end is detected
|
||
|
or an error occurs. Because of this, it is possible to detect the
|
||
|
stream end by observing when the number of bytes returned is
|
||
|
less than the number
|
||
|
requested. Nevertheless, this is regarded as inadvisable; you should
|
||
|
instead check <CODE>bzerror</CODE> after every call and watch out for
|
||
|
<CODE>BZ_STREAM_END</CODE>.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
Internally, <CODE>BZ2_bzRead</CODE> copies data from the compressed file in chunks
|
||
|
of size <CODE>BZ_MAX_UNUSED</CODE> bytes
|
||
|
before decompressing it. If the file contains more bytes than strictly
|
||
|
needed to reach the logical end-of-stream, <CODE>BZ2_bzRead</CODE> will almost certainly
|
||
|
read some of the trailing data before signalling <CODE>BZ_STREAM_END</CODE>.
|
||
|
To collect the read but unused data once <CODE>BZ_STREAM_END</CODE> has
|
||
|
appeared, call <CODE>BZ2_bzReadGetUnused</CODE> immediately before <CODE>BZ2_bzReadClose</CODE>.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
Possible assignments to <CODE>bzerror</CODE>:
|
||
|
|
||
|
<PRE>
|
||
|
<CODE>BZ_PARAM_ERROR</CODE>
|
||
|
if <CODE>b</CODE> is <CODE>NULL</CODE> or <CODE>buf</CODE> is <CODE>NULL</CODE> or <CODE>len < 0</CODE>
|
||
|
<CODE>BZ_SEQUENCE_ERROR</CODE>
|
||
|
if <CODE>b</CODE> was opened with <CODE>BZ2_bzWriteOpen</CODE>
|
||
|
<CODE>BZ_IO_ERROR</CODE>
|
||
|
if there is an error reading from the compressed file
|
||
|
<CODE>BZ_UNEXPECTED_EOF</CODE>
|
||
|
if the compressed file ended before the logical end-of-stream was detected
|
||
|
<CODE>BZ_DATA_ERROR</CODE>
|
||
|
if a data integrity error was detected in the compressed stream
|
||
|
<CODE>BZ_DATA_ERROR_MAGIC</CODE>
|
||
|
if the stream does not begin with the requisite header bytes (ie, is not
|
||
|
a <CODE>bzip2</CODE> data file). This is really a special case of <CODE>BZ_DATA_ERROR</CODE>.
|
||
|
<CODE>BZ_MEM_ERROR</CODE>
|
||
|
if insufficient memory was available
|
||
|
<CODE>BZ_STREAM_END</CODE>
|
||
|
if the logical end of stream was detected.
|
||
|
<CODE>BZ_OK</CODE>
|
||
|
otherwise.
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Possible return values:
|
||
|
|
||
|
<PRE>
|
||
|
number of bytes read
|
||
|
if <CODE>bzerror</CODE> is <CODE>BZ_OK</CODE> or <CODE>BZ_STREAM_END</CODE>
|
||
|
undefined
|
||
|
otherwise
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Allowable next actions:
|
||
|
|
||
|
<PRE>
|
||
|
collect data from <CODE>buf</CODE>, then <CODE>BZ2_bzRead</CODE> or <CODE>BZ2_bzReadClose</CODE>
|
||
|
if <CODE>bzerror</CODE> is <CODE>BZ_OK</CODE>
|
||
|
collect data from <CODE>buf</CODE>, then <CODE>BZ2_bzReadClose</CODE> or <CODE>BZ2_bzReadGetUnused</CODE>
|
||
|
if <CODE>bzerror</CODE> is <CODE>BZ_STREAM_END</CODE>
|
||
|
<CODE>BZ2_bzReadClose</CODE>
|
||
|
otherwise
|
||
|
</PRE>
|
||
|
|
||
|
|
||
|
|
||
|
<H3><A NAME="SEC28" HREF="manual_toc.html#TOC28"><CODE>BZ2_bzReadGetUnused</CODE></A></H3>
|
||
|
|
||
|
<PRE>
|
||
|
void BZ2_bzReadGetUnused ( int* bzerror, BZFILE *b,
|
||
|
void** unused, int* nUnused );
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Returns data which was read from the compressed file but was not needed
|
||
|
to get to the logical end-of-stream. <CODE>*unused</CODE> is set to the address
|
||
|
of the data, and <CODE>*nUnused</CODE> to the number of bytes. <CODE>*nUnused</CODE> will
|
||
|
be set to a value between <CODE>0</CODE> and <CODE>BZ_MAX_UNUSED</CODE> inclusive.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
This function may only be called once <CODE>BZ2_bzRead</CODE> has signalled
|
||
|
<CODE>BZ_STREAM_END</CODE> but before <CODE>BZ2_bzReadClose</CODE>.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
Possible assignments to <CODE>bzerror</CODE>:
|
||
|
|
||
|
<PRE>
|
||
|
<CODE>BZ_PARAM_ERROR</CODE>
|
||
|
if <CODE>b</CODE> is <CODE>NULL</CODE>
|
||
|
or <CODE>unused</CODE> is <CODE>NULL</CODE> or <CODE>nUnused</CODE> is <CODE>NULL</CODE>
|
||
|
<CODE>BZ_SEQUENCE_ERROR</CODE>
|
||
|
if <CODE>BZ_STREAM_END</CODE> has not been signalled
|
||
|
or if <CODE>b</CODE> was opened with <CODE>BZ2_bzWriteOpen</CODE>
|
||
|
<CODE>BZ_OK</CODE>
|
||
|
otherwise
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Allowable next actions:
|
||
|
|
||
|
<PRE>
|
||
|
<CODE>BZ2_bzReadClose</CODE>
|
||
|
</PRE>
|
||
|
|
||
|
|
||
|
|
||
|
<H3><A NAME="SEC29" HREF="manual_toc.html#TOC29"><CODE>BZ2_bzReadClose</CODE></A></H3>
|
||
|
|
||
|
<PRE>
|
||
|
void BZ2_bzReadClose ( int *bzerror, BZFILE *b );
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Releases all memory pertaining to the compressed file <CODE>b</CODE>.
|
||
|
<CODE>BZ2_bzReadClose</CODE> does not call <CODE>fclose</CODE> on the underlying file
|
||
|
handle, so you should do that yourself if appropriate.
|
||
|
<CODE>BZ2_bzReadClose</CODE> should be called to clean up after all error
|
||
|
situations.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
Possible assignments to <CODE>bzerror</CODE>:
|
||
|
|
||
|
<PRE>
|
||
|
<CODE>BZ_SEQUENCE_ERROR</CODE>
|
||
|
if <CODE>b</CODE> was opened with <CODE>BZ2_bzWriteOpen</CODE>
|
||
|
<CODE>BZ_OK</CODE>
|
||
|
otherwise
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Allowable next actions:
|
||
|
|
||
|
<PRE>
|
||
|
none
|
||
|
</PRE>
|
||
|
|
||
|
|
||
|
|
||
|
<H3><A NAME="SEC30" HREF="manual_toc.html#TOC30"><CODE>BZ2_bzWriteOpen</CODE></A></H3>
|
||
|
|
||
|
<PRE>
|
||
|
BZFILE *BZ2_bzWriteOpen ( int *bzerror, FILE *f,
|
||
|
int blockSize100k, int verbosity,
|
||
|
int workFactor );
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Prepare to write compressed data to file handle <CODE>f</CODE>.
|
||
|
<CODE>f</CODE> should refer to
|
||
|
a file which has been opened for writing, and for which the error
|
||
|
indicator (<CODE>ferror(f)</CODE>) is not set.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
For the meaning of parameters <CODE>blockSize100k</CODE>,
|
||
|
<CODE>verbosity</CODE> and <CODE>workFactor</CODE>, see
|
||
|
<BR> <CODE>BZ2_bzCompressInit</CODE>.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
All required memory is allocated at this stage, so if the call
|
||
|
completes successfully, <CODE>BZ_MEM_ERROR</CODE> cannot be signalled by a
|
||
|
subsequent call to <CODE>BZ2_bzWrite</CODE>.
|
||
|
|
||
|
</P>
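<P>
Because <CODE>BZ2_bzWriteOpen</CODE> claims all of its memory up front, a caller may want to budget for it before opening. This is a rough sketch. It assumes the Chapter 2 rule of thumb of 400k + 8 x block size for compression, and the helper name is hypothetical.
</P>

```c
/* Hypothetical helper: approximate compression memory in kilobytes,
 * 400k + 8 x block size, where block size = blockSize100k * 100k.
 * Returns -1 for values BZ2_bzWriteOpen would reject with BZ_PARAM_ERROR. */
long bz_compress_mem_kb(int blockSize100k)
{
    if (blockSize100k < 1 || blockSize100k > 9)
        return -1;
    return 400 + 8L * blockSize100k * 100;
}
```

<P>
With <CODE>blockSize100k</CODE> set to 9 this gives 7600k; with 1 it gives 1200k.
</P>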
|
||
|
<P>
|
||
|
Possible assignments to <CODE>bzerror</CODE>:
|
||
|
|
||
|
<PRE>
|
||
|
<CODE>BZ_CONFIG_ERROR</CODE>
|
||
|
if the library has been mis-compiled
|
||
|
<CODE>BZ_PARAM_ERROR</CODE>
|
||
|
if <CODE>f</CODE> is <CODE>NULL</CODE>
|
||
|
or <CODE>blockSize100k < 1</CODE> or <CODE>blockSize100k > 9</CODE>
|
||
|
<CODE>BZ_IO_ERROR</CODE>
|
||
|
if <CODE>ferror(f)</CODE> is nonzero
|
||
|
<CODE>BZ_MEM_ERROR</CODE>
|
||
|
if insufficient memory is available
|
||
|
<CODE>BZ_OK</CODE>
|
||
|
otherwise
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Possible return values:
|
||
|
|
||
|
<PRE>
|
||
|
Pointer to an abstract <CODE>BZFILE</CODE>
|
||
|
if <CODE>bzerror</CODE> is <CODE>BZ_OK</CODE>
|
||
|
<CODE>NULL</CODE>
|
||
|
otherwise
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Allowable next actions:
|
||
|
|
||
|
<PRE>
|
||
|
<CODE>BZ2_bzWrite</CODE>
|
||
|
if <CODE>bzerror</CODE> is <CODE>BZ_OK</CODE>
|
||
|
(you could go directly to <CODE>BZ2_bzWriteClose</CODE>, but this would be pretty pointless)
|
||
|
<CODE>BZ2_bzWriteClose</CODE>
|
||
|
otherwise
|
||
|
</PRE>
|
||
|
|
||
|
|
||
|
|
||
|
<H3><A NAME="SEC31" HREF="manual_toc.html#TOC31"><CODE>BZ2_bzWrite</CODE></A></H3>
|
||
|
|
||
|
<PRE>
|
||
|
void BZ2_bzWrite ( int *bzerror, BZFILE *b, void *buf, int len );
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Absorbs <CODE>len</CODE> bytes from the buffer <CODE>buf</CODE>, eventually to be
|
||
|
compressed and written to the file.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
Possible assignments to <CODE>bzerror</CODE>:
|
||
|
|
||
|
<PRE>
|
||
|
<CODE>BZ_PARAM_ERROR</CODE>
|
||
|
if <CODE>b</CODE> is <CODE>NULL</CODE> or <CODE>buf</CODE> is <CODE>NULL</CODE> or <CODE>len < 0</CODE>
|
||
|
<CODE>BZ_SEQUENCE_ERROR</CODE>
|
||
|
if <CODE>b</CODE> was opened with <CODE>BZ2_bzReadOpen</CODE>
|
||
|
<CODE>BZ_IO_ERROR</CODE>
|
||
|
if there is an error writing the compressed file.
|
||
|
<CODE>BZ_OK</CODE>
|
||
|
otherwise
|
||
|
</PRE>
|
||
|
|
||
|
|
||
|
|
||
|
<H3><A NAME="SEC32" HREF="manual_toc.html#TOC32"><CODE>BZ2_bzWriteClose</CODE></A></H3>
|
||
|
|
||
|
<PRE>
void BZ2_bzWriteClose ( int *bzerror, BZFILE* b,
                        int abandon,
                        unsigned int* nbytes_in,
                        unsigned int* nbytes_out );

void BZ2_bzWriteClose64 ( int *bzerror, BZFILE* b,
                          int abandon,
                          unsigned int* nbytes_in_lo32,
                          unsigned int* nbytes_in_hi32,
                          unsigned int* nbytes_out_lo32,
                          unsigned int* nbytes_out_hi32 );
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Compresses and flushes to the compressed file all data so far supplied
|
||
|
by <CODE>BZ2_bzWrite</CODE>. The logical end-of-stream markers are also written, so
|
||
|
subsequent calls to <CODE>BZ2_bzWrite</CODE> are illegal. All memory associated
|
||
|
with the compressed file <CODE>b</CODE> is released.
|
||
|
<CODE>fflush</CODE> is called on the
|
||
|
compressed file, but it is not <CODE>fclose</CODE>'d.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
If <CODE>BZ2_bzWriteClose</CODE> is called to clean up after an error, the only
|
||
|
action is to release the memory. The library records the error codes
|
||
|
issued by previous calls, so this situation will be detected
|
||
|
automatically. There is no attempt to complete the compression
|
||
|
operation, nor to <CODE>fflush</CODE> the compressed file. You can force this
|
||
|
behaviour to happen even in the case of no error, by passing a nonzero
|
||
|
value to <CODE>abandon</CODE>.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
If <CODE>nbytes_in</CODE> is non-null, <CODE>*nbytes_in</CODE> will be set to be the
|
||
|
total volume of uncompressed data handled. Similarly, if <CODE>nbytes_out</CODE> is non-null, <CODE>*nbytes_out</CODE>
|
||
|
will be set to the total volume of compressed data written. For
|
||
|
compatibility with older versions of the library, <CODE>BZ2_bzWriteClose</CODE>
|
||
|
only yields the lower 32 bits of these counts. Use
|
||
|
<CODE>BZ2_bzWriteClose64</CODE> if you want the full 64 bit counts. These
|
||
|
two functions are otherwise absolutely identical.
|
||
|
|
||
|
</P>
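<P>
<CODE>BZ2_bzWriteClose64</CODE> reports each count as two 32-bit halves. The helper below is a hypothetical sketch, not a library function, showing how they might be combined. It assumes <CODE>unsigned int</CODE> is 32 bits wide.
</P>

```c
/* Hypothetical helper: rebuild a 64-bit byte count from the low and
 * high 32-bit halves returned by BZ2_bzWriteClose64. */
unsigned long long bz_join_count64(unsigned int lo32, unsigned int hi32)
{
    return ((unsigned long long)hi32 << 32) | (unsigned long long)lo32;
}
```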
|
||
|
|
||
|
<P>
|
||
|
Possible assignments to <CODE>bzerror</CODE>:
|
||
|
|
||
|
<PRE>
|
||
|
<CODE>BZ_SEQUENCE_ERROR</CODE>
|
||
|
if <CODE>b</CODE> was opened with <CODE>BZ2_bzReadOpen</CODE>
|
||
|
<CODE>BZ_IO_ERROR</CODE>
|
||
|
if there is an error writing the compressed file
|
||
|
<CODE>BZ_OK</CODE>
|
||
|
otherwise
|
||
|
</PRE>
|
||
|
|
||
|
|
||
|
|
||
|
<H3><A NAME="SEC33" HREF="manual_toc.html#TOC33">Handling embedded compressed data streams</A></H3>
|
||
|
|
||
|
<P>
|
||
|
The high-level library facilitates use of
|
||
|
<CODE>bzip2</CODE> data streams which form some part of a surrounding, larger
|
||
|
data stream.
|
||
|
|
||
|
<UL>
|
||
|
<LI>For writing, the library takes an open file handle, writes
|
||
|
|
||
|
compressed data to it, <CODE>fflush</CODE>es it but does not <CODE>fclose</CODE> it.
|
||
|
The calling application can write its own data before and after the
|
||
|
compressed data stream, using that same file handle.
|
||
|
<LI>Reading is more complex, and the facilities are not as general
|
||
|
|
||
|
as they could be since generality is hard to reconcile with efficiency.
|
||
|
<CODE>BZ2_bzRead</CODE> reads from the compressed file in blocks of size
|
||
|
<CODE>BZ_MAX_UNUSED</CODE> bytes, and in doing so will probably overshoot
|
||
|
the logical end of compressed stream.
|
||
|
To recover this data once decompression has
|
||
|
ended, call <CODE>BZ2_bzReadGetUnused</CODE> after the last call of <CODE>BZ2_bzRead</CODE>
|
||
|
(the one returning <CODE>BZ_STREAM_END</CODE>) but before calling
|
||
|
<CODE>BZ2_bzReadClose</CODE>.
|
||
|
</UL>
|
||
|
|
||
|
<P>
|
||
|
This mechanism makes it easy to decompress multiple <CODE>bzip2</CODE>
|
||
|
streams placed end-to-end. At the end of one stream, when <CODE>BZ2_bzRead</CODE>
|
||
|
returns <CODE>BZ_STREAM_END</CODE>, call <CODE>BZ2_bzReadGetUnused</CODE> to collect the
|
||
|
unused data (copy it into your own buffer somewhere).
|
||
|
That data forms the start of the next compressed stream.
|
||
|
To start uncompressing that next stream, call <CODE>BZ2_bzReadOpen</CODE> again,
|
||
|
feeding in the unused data via the <CODE>unused</CODE>/<CODE>nUnused</CODE>
|
||
|
parameters.
|
||
|
Keep doing this until a <CODE>BZ_STREAM_END</CODE> return coincides with the
|
||
|
physical end of file (<CODE>feof(f)</CODE>). In this situation
|
||
|
<CODE>BZ2_bzReadGetUnused</CODE>
|
||
|
will of course return no data.
|
||
|
|
||
|
</P>
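<P>
The "copy it into your own buffer" step matters because <CODE>*unused</CODE> points into memory that <CODE>BZ2_bzReadClose</CODE> releases. A minimal sketch of that copy follows. The helper name is hypothetical. A destination of <CODE>BZ_MAX_UNUSED</CODE> bytes is always large enough, since <CODE>*nUnused</CODE> never exceeds that value.
</P>

```c
#include <string.h>

/* Hypothetical helper: save the bytes reported by BZ2_bzReadGetUnused
 * into caller-owned storage, before BZ2_bzReadClose frees the memory
 * that *unused points into. Returns the number of bytes saved, or -1
 * if the destination is too small or the arguments are invalid. */
int bz_save_unused(void *dst, int dst_cap, const void *unused, int nUnused)
{
    if (nUnused < 0 || nUnused > dst_cap)
        return -1;
    if (nUnused > 0 && (unused == NULL || dst == NULL))
        return -1;
    if (nUnused > 0)
        memcpy(dst, unused, (size_t)nUnused);
    return nUnused;
}
```

<P>
The saved buffer and count are then passed as <CODE>unused</CODE> and <CODE>nUnused</CODE> to the next <CODE>BZ2_bzReadOpen</CODE>.
</P>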
|
||
|
<P>
|
||
|
This should give some feel for how the high-level interface can be used.
|
||
|
If you require extra flexibility, you'll have to bite the bullet and get
|
||
|
to grips with the low-level interface.
|
||
|
|
||
|
</P>
|
||
|
|
||
|
|
||
|
<H3><A NAME="SEC34" HREF="manual_toc.html#TOC34">Standard file-reading/writing code</A></H3>
|
||
|
<P>
|
||
|
Here's how you'd write data to a compressed file:
|
||
|
|
||
|
<PRE>
FILE*   f;
BZFILE* b;
int     nBuf;
char    buf[4096];   /* whatever size you like */
int     bzerror;

/* binary mode matters on non-Unix platforms */
f = fopen ( "myfile.bz2", "wb" );
if (!f) {
   /* handle error */
}
/* blockSize100k = 9, verbosity = 0, workFactor = 0 (use the default) */
b = BZ2_bzWriteOpen ( &bzerror, f, 9, 0, 0 );
if (bzerror != BZ_OK) {
   BZ2_bzWriteClose ( &bzerror, b, 1, NULL, NULL );
   /* handle error */
}

/* here the data comes from stdin; substitute any source */
while ( (nBuf = fread ( buf, 1, sizeof(buf), stdin )) > 0 ) {
   BZ2_bzWrite ( &bzerror, b, buf, nBuf );
   if (bzerror != BZ_OK) {
      /* abandon = 1: just release memory, no attempt to finish */
      BZ2_bzWriteClose ( &bzerror, b, 1, NULL, NULL );
      /* handle error */
   }
}

/* abandon = 0: finish the stream and fflush f */
BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL );
if (bzerror == BZ_IO_ERROR) {
   /* handle error */
}
fclose ( f );   /* the library never fcloses f itself */
</PRE>
|
||
|
|
||
|
<P>
|
||
|
And to read from a compressed file:
|
||
|
|
||
|
<PRE>
FILE*   f;
BZFILE* b;
int     nBuf;
char    buf[4096];   /* whatever size you like */
int     bzerror;

f = fopen ( "myfile.bz2", "rb" );
if (!f) {
   /* handle error */
}
/* verbosity = 0, small = 0, no unused data to prepend */
b = BZ2_bzReadOpen ( &bzerror, f, 0, 0, NULL, 0 );
if (bzerror != BZ_OK) {
   BZ2_bzReadClose ( &bzerror, b );
   /* handle error */
}

bzerror = BZ_OK;
while (bzerror == BZ_OK /* && arbitrary other conditions */) {
   nBuf = BZ2_bzRead ( &bzerror, b, buf, sizeof(buf) );
   /* the call that returns BZ_STREAM_END may also deliver data */
   if (bzerror == BZ_OK || bzerror == BZ_STREAM_END) {
      /* do something with buf[0 .. nBuf-1] */
   }
}
if (bzerror != BZ_STREAM_END) {
   BZ2_bzReadClose ( &bzerror, b );
   /* handle error */
} else {
   BZ2_bzReadClose ( &bzerror, b );
}
fclose ( f );   /* the library never fcloses f itself */
</PRE>
|
||
|
|
||
|
|
||
|
|
||
|
<H2><A NAME="SEC35" HREF="manual_toc.html#TOC35">Utility functions</A></H2>
|
||
|
|
||
|
|
||
|
<H3><A NAME="SEC36" HREF="manual_toc.html#TOC36"><CODE>BZ2_bzBuffToBuffCompress</CODE></A></H3>
|
||
|
|
||
|
<PRE>
|
||
|
int BZ2_bzBuffToBuffCompress( char* dest,
|
||
|
unsigned int* destLen,
|
||
|
char* source,
|
||
|
unsigned int sourceLen,
|
||
|
int blockSize100k,
|
||
|
int verbosity,
|
||
|
int workFactor );
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Attempts to compress the data in <CODE>source[0 .. sourceLen-1]</CODE>
|
||
|
into the destination buffer, <CODE>dest[0 .. *destLen-1]</CODE>.
|
||
|
If the destination buffer is big enough, <CODE>*destLen</CODE> is
|
||
|
set to the size of the compressed data, and <CODE>BZ_OK</CODE> is
|
||
|
returned. If the compressed data won't fit, <CODE>*destLen</CODE>
|
||
|
is unchanged, and <CODE>BZ_OUTBUFF_FULL</CODE> is returned.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
Compression in this manner is a one-shot event, done with a single call
|
||
|
to this function. The resulting compressed data is a complete
|
||
|
<CODE>bzip2</CODE> format data stream. There is no mechanism for making
|
||
|
additional calls to provide extra input data. If you want that kind of
|
||
|
mechanism, use the low-level interface.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
For the meaning of parameters <CODE>blockSize100k</CODE>, <CODE>verbosity</CODE>
|
||
|
and <CODE>workFactor</CODE>, <BR> see <CODE>BZ2_bzCompressInit</CODE>.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
To guarantee that the compressed data will fit in its buffer, allocate
|
||
|
an output buffer at least 1% larger than the uncompressed data, plus
|
||
|
six hundred extra bytes.
|
||
|
|
||
|
</P>
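<P>
The sizing rule above can be written as a one-line helper. This is a sketch: the name is hypothetical, the 1% is rounded up, and the arithmetic is done in <CODE>unsigned long long</CODE> so that large inputs cannot overflow. The caller still has to check that the result fits in the <CODE>unsigned int</CODE> passed via <CODE>destLen</CODE>.
</P>

```c
/* Hypothetical helper: a destination size that, per the rule stated
 * above, always suffices for BZ2_bzBuffToBuffCompress:
 * sourceLen + 1% (rounded up) + 600 bytes. */
unsigned long long bz_compress_bound(unsigned int sourceLen)
{
    unsigned long long n = sourceLen;
    return n + (n + 99) / 100 + 600;
}
```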
|
||
|
<P>
|
||
|
<CODE>BZ2_bzBuffToBuffCompress</CODE> will not write data at or
|
||
|
beyond <CODE>dest[*destLen]</CODE>, even in case of buffer overflow.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
Possible return values:
|
||
|
|
||
|
<PRE>
|
||
|
<CODE>BZ_CONFIG_ERROR</CODE>
|
||
|
if the library has been mis-compiled
|
||
|
<CODE>BZ_PARAM_ERROR</CODE>
|
||
|
if <CODE>dest</CODE> is <CODE>NULL</CODE> or <CODE>destLen</CODE> is <CODE>NULL</CODE>
|
||
|
or <CODE>blockSize100k < 1</CODE> or <CODE>blockSize100k > 9</CODE>
|
||
|
or <CODE>verbosity < 0</CODE> or <CODE>verbosity > 4</CODE>
|
||
|
or <CODE>workFactor < 0</CODE> or <CODE>workFactor > 250</CODE>
|
||
|
<CODE>BZ_MEM_ERROR</CODE>
|
||
|
if insufficient memory is available
|
||
|
<CODE>BZ_OUTBUFF_FULL</CODE>
|
||
|
if the size of the compressed data exceeds <CODE>*destLen</CODE>
|
||
|
<CODE>BZ_OK</CODE>
|
||
|
otherwise
|
||
|
</PRE>
|
||
|
|
||
|
|
||
|
|
||
|
<H3><A NAME="SEC37" HREF="manual_toc.html#TOC37"><CODE>BZ2_bzBuffToBuffDecompress</CODE></A></H3>
|
||
|
|
||
|
<PRE>
|
||
|
int BZ2_bzBuffToBuffDecompress ( char* dest,
|
||
|
unsigned int* destLen,
|
||
|
char* source,
|
||
|
unsigned int sourceLen,
|
||
|
int small,
|
||
|
int verbosity );
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Attempts to decompress the data in <CODE>source[0 .. sourceLen-1]</CODE>
|
||
|
into the destination buffer, <CODE>dest[0 .. *destLen-1]</CODE>.
|
||
|
If the destination buffer is big enough, <CODE>*destLen</CODE> is
|
||
|
set to the size of the uncompressed data, and <CODE>BZ_OK</CODE> is
|
||
|
returned. If the uncompressed data won't fit, <CODE>*destLen</CODE>
|
||
|
is unchanged, and <CODE>BZ_OUTBUFF_FULL</CODE> is returned.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
<CODE>source</CODE> is assumed to hold a complete <CODE>bzip2</CODE> format
|
||
|
data stream. <BR> <CODE>BZ2_bzBuffToBuffDecompress</CODE> tries to decompress
|
||
|
the entirety of the stream into the output buffer.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
For the meaning of parameters <CODE>small</CODE> and <CODE>verbosity</CODE>,
|
||
|
see <CODE>BZ2_bzDecompressInit</CODE>.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
Because the compression ratio of the compressed data cannot be known in
|
||
|
advance, there is no easy way to guarantee that the output buffer will
|
||
|
be big enough. You may of course make arrangements in your code to
|
||
|
record the size of the uncompressed data, but such a mechanism is beyond
|
||
|
the scope of this library.
|
||
|
|
||
|
</P>
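<P>
One possible arrangement, entirely outside <CODE>libbzip2</CODE>, is to store the uncompressed length next to the compressed data. The reader can then size <CODE>dest</CODE> before calling <CODE>BZ2_bzBuffToBuffDecompress</CODE>. The 4-byte little-endian framing and the helper names below are hypothetical choices for illustration only.
</P>

```c
/* Hypothetical framing helpers (not part of libbzip2): encode/decode a
 * 32-bit uncompressed length as 4 little-endian bytes, to be stored
 * alongside the compressed stream by the application. */
void bz_put_len32(unsigned char out[4], unsigned long len)
{
    out[0] = (unsigned char)(len & 0xFF);
    out[1] = (unsigned char)((len >> 8) & 0xFF);
    out[2] = (unsigned char)((len >> 16) & 0xFF);
    out[3] = (unsigned char)((len >> 24) & 0xFF);
}

unsigned long bz_get_len32(const unsigned char in[4])
{
    return (unsigned long)in[0]
         | ((unsigned long)in[1] << 8)
         | ((unsigned long)in[2] << 16)
         | ((unsigned long)in[3] << 24);
}
```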
|
||
|
<P>
|
||
|
<CODE>BZ2_bzBuffToBuffDecompress</CODE> will not write data at or
|
||
|
beyond <CODE>dest[*destLen]</CODE>, even in case of buffer overflow.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
Possible return values:
|
||
|
|
||
|
<PRE>
|
||
|
<CODE>BZ_CONFIG_ERROR</CODE>
|
||
|
if the library has been mis-compiled
|
||
|
<CODE>BZ_PARAM_ERROR</CODE>
|
||
|
if <CODE>dest</CODE> is <CODE>NULL</CODE> or <CODE>destLen</CODE> is <CODE>NULL</CODE>
|
||
|
or <CODE>small != 0 && small != 1</CODE>
|
||
|
or <CODE>verbosity < 0</CODE> or <CODE>verbosity > 4</CODE>
|
||
|
<CODE>BZ_MEM_ERROR</CODE>
|
||
|
if insufficient memory is available
|
||
|
<CODE>BZ_OUTBUFF_FULL</CODE>
|
||
|
if the size of the uncompressed data exceeds <CODE>*destLen</CODE>
|
||
|
<CODE>BZ_DATA_ERROR</CODE>
|
||
|
if a data integrity error was detected in the compressed data
|
||
|
<CODE>BZ_DATA_ERROR_MAGIC</CODE>
|
||
|
if the compressed data doesn't begin with the right magic bytes
|
||
|
<CODE>BZ_UNEXPECTED_EOF</CODE>
|
||
|
if the compressed data ends unexpectedly
|
||
|
<CODE>BZ_OK</CODE>
|
||
|
otherwise
|
||
|
</PRE>
|
||
|
|
||
|
|
||
|
|
||
|
<H2><A NAME="SEC38" HREF="manual_toc.html#TOC38"><CODE>zlib</CODE> compatibility functions</A></H2>
|
||
|
<P>
|
||
|
Yoshioka Tsuneo has contributed some functions to
|
||
|
give better <CODE>zlib</CODE> compatibility. These functions are
|
||
|
<CODE>BZ2_bzopen</CODE>, <CODE>BZ2_bzread</CODE>, <CODE>BZ2_bzwrite</CODE>, <CODE>BZ2_bzflush</CODE>,
|
||
|
<CODE>BZ2_bzclose</CODE>,
|
||
|
<CODE>BZ2_bzerror</CODE> and <CODE>BZ2_bzlibVersion</CODE>.
|
||
|
These functions are not (yet) officially part of
|
||
|
the library. If they break, you get to keep all the pieces.
|
||
|
Nevertheless, I think they work ok.
|
||
|
|
||
|
<PRE>
|
||
|
typedef void BZFILE;
|
||
|
|
||
|
const char * BZ2_bzlibVersion ( void );
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Returns a string indicating the library version.
|
||
|
|
||
|
<PRE>
|
||
|
BZFILE * BZ2_bzopen ( const char *path, const char *mode );
|
||
|
BZFILE * BZ2_bzdopen ( int fd, const char *mode );
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Opens a <CODE>.bz2</CODE> file for reading or writing, using either its name
|
||
|
or a pre-existing file descriptor.
|
||
|
Analogous to <CODE>fopen</CODE> and <CODE>fdopen</CODE>.
|
||
|
|
||
|
<PRE>
|
||
|
int BZ2_bzread ( BZFILE* b, void* buf, int len );
|
||
|
int BZ2_bzwrite ( BZFILE* b, void* buf, int len );
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Reads/writes data from/to a previously opened <CODE>BZFILE</CODE>.
|
||
|
Analogous to <CODE>fread</CODE> and <CODE>fwrite</CODE>.
|
||
|
|
||
|
<PRE>
|
||
|
int BZ2_bzflush ( BZFILE* b );
|
||
|
void BZ2_bzclose ( BZFILE* b );
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Flushes/closes a <CODE>BZFILE</CODE>. <CODE>BZ2_bzflush</CODE> doesn't actually do
anything. Analogous to <CODE>fflush</CODE> and <CODE>fclose</CODE>.

</P>

<PRE>
const char * BZ2_bzerror ( BZFILE *b, int *errnum );
</PRE>

<P>
Returns a string describing the most recent error status of
<CODE>b</CODE>, and also sets <CODE>*errnum</CODE> to its numerical value.

</P>
<H2><A NAME="SEC39" HREF="manual_toc.html#TOC39">Using the library in a <CODE>stdio</CODE>-free environment</A></H2>
<H3><A NAME="SEC40" HREF="manual_toc.html#TOC40">Getting rid of <CODE>stdio</CODE></A></H3>

<P>
In a deeply embedded application, you might want to use just
the memory-to-memory functions. You can do this conveniently
by compiling the library with the preprocessor symbol <CODE>BZ_NO_STDIO</CODE>
defined. Doing this gives you a library containing only the following
eight functions:

</P>
<P>
<CODE>BZ2_bzCompressInit</CODE>, <CODE>BZ2_bzCompress</CODE>, <CODE>BZ2_bzCompressEnd</CODE> <BR>
<CODE>BZ2_bzDecompressInit</CODE>, <CODE>BZ2_bzDecompress</CODE>, <CODE>BZ2_bzDecompressEnd</CODE> <BR>
<CODE>BZ2_bzBuffToBuffCompress</CODE>, <CODE>BZ2_bzBuffToBuffDecompress</CODE>

</P>
<P>
When compiled like this, all functions will ignore <CODE>verbosity</CODE>
settings.

</P>
<H3><A NAME="SEC41" HREF="manual_toc.html#TOC41">Critical error handling</A></H3>

<P>
<CODE>libbzip2</CODE> contains a number of internal assertion checks which
should, needless to say, never be activated. Nevertheless, if an
assertion should fail, behaviour depends on whether or not the library
was compiled with <CODE>BZ_NO_STDIO</CODE> defined.

</P>
<P>
For a normal compile, an assertion failure yields the message

<PRE>
bzip2/libbzip2: internal error number N.
This is a bug in bzip2/libbzip2, 1.0 of 21-Mar-2000.
Please report it to me at: jseward@acm.org. If this happened
when you were using some program which uses libbzip2 as a
component, you should also report this bug to the author(s)
of that program. Please make an effort to report this bug;
timely and accurate bug reports eventually lead to higher
quality software. Thanks. Julian Seward, 21 March 2000.
</PRE>

<P>
where <CODE>N</CODE> is some error code number. The library then calls
<CODE>exit(3)</CODE>, so the program terminates with exit status 3.

</P>
<P>
For a <CODE>stdio</CODE>-free library, assertion failures result
in a call to a function declared as:

<PRE>
extern void bz_internal_error ( int errcode );
</PRE>

<P>
The relevant code is passed as a parameter. You must supply
such a function yourself; the library calls it but does not define it.

</P>
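<P>
What your function does is up to you. One possibility, sketched below under the assumption that the application can afford to abandon the work in progress, is to record the code and <CODE>longjmp</CODE> back to a recovery point established with <CODE>setjmp</CODE> before calling into the library, instead of terminating. The names <CODE>bz_recovery_point</CODE> and <CODE>bz_last_internal_error</CODE> are hypothetical. Jumping out of the library like this will probably leak whatever memory it had allocated for the stream, and, as explained below, that stream must not be used again. Where there is no sensible recovery point, simply halting is the more conservative choice.
</P>

```c
#include <setjmp.h>

/* Hypothetical globals: the application calls setjmp(bz_recovery_point)
   before every call into libbzip2 that might trip an assertion. */
jmp_buf bz_recovery_point;
volatile int bz_last_internal_error = 0;

/* The hook a BZ_NO_STDIO build of libbzip2 calls when an internal
   assertion fails; errcode identifies the failed check. */
void bz_internal_error ( int errcode )
{
   bz_last_internal_error = errcode;
   /* Never return into the library: its state is no longer trustworthy.
      Calling longjmp without a prior setjmp is undefined behaviour, so
      the recovery point must always be set first. */
   longjmp(bz_recovery_point, 1);
}
```

<P>
A caller would then guard each library call along the lines of
<CODE>if (setjmp(bz_recovery_point) == 0) ret = BZ2_bzDecompress(&amp;strm); else { /* give up on strm */ }</CODE>.
</P>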

<P>
In either case, once an assertion failure has occurred, any
<CODE>bz_stream</CODE> records involved can be regarded as invalid.
You should not attempt to resume normal operation with them.

</P>
<P>
You may, of course, change critical error handling to suit
your needs. As I said above, critical errors indicate bugs
in the library and should not occur. All "normal" error
situations are indicated via error return codes from functions,
and can be recovered from.

</P>
<H2><A NAME="SEC42" HREF="manual_toc.html#TOC42">Making a Windows DLL</A></H2>

<P>
Everything related to Windows has been contributed by Yoshioka Tsuneo
<BR> (<CODE>QWF00133@niftyserve.or.jp</CODE> /
<CODE>tsuneo-y@is.aist-nara.ac.jp</CODE>), so you should send your queries to
him (but perhaps Cc: me, <CODE>jseward@acm.org</CODE>).

</P>
<P>
My vague understanding of what to do is: using Visual C++ 5.0,
open the project file <CODE>libbz2.dsp</CODE>, and build. That's all.

</P>
<P>
If you can't
open the project file for some reason, make a new one, naming these files:
<CODE>blocksort.c</CODE>, <CODE>bzlib.c</CODE>, <CODE>compress.c</CODE>,
<CODE>crctable.c</CODE>, <CODE>decompress.c</CODE>, <CODE>huffman.c</CODE>, <BR>
<CODE>randtable.c</CODE> and <CODE>libbz2.def</CODE>. You will also need
to name the header files <CODE>bzlib.h</CODE> and <CODE>bzlib_private.h</CODE>.

</P>
<P>
If you don't use VC++, you may need to define the preprocessor symbol
<CODE>_WIN32</CODE>.

</P>
<P>
Finally, <CODE>dlltest.c</CODE> is a sample program using the DLL. It has a
project file, <CODE>dlltest.dsp</CODE>.

</P>
<P>
If you just want a makefile for Visual C, have a look at
<CODE>makefile.msc</CODE>.

</P>
<P>
Be aware that if you compile <CODE>bzip2</CODE> itself on Win32, you must set
<CODE>BZ_UNIX</CODE> to 0 and <CODE>BZ_LCCWIN32</CODE> to 1, in the file
<CODE>bzip2.c</CODE>, before compiling. Otherwise the resulting binary won't
work correctly.

</P>

<P>
I haven't tried any of this stuff myself, but it all looks plausible.

</P>

<P><HR><P>
<p>Go to the <A HREF="manual_1.html">first</A>, <A HREF="manual_2.html">previous</A>, <A HREF="manual_4.html">next</A>, <A HREF="manual_4.html">last</A> section, <A HREF="manual_toc.html">table of contents</A>.
</BODY>
</HTML>