529 lines
		
	
	
		
			17 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
			
		
		
	
	
			529 lines
		
	
	
		
			17 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
<HTML>
 | 
						|
<HEAD>
 | 
						|
<!-- This HTML file has been created by texi2html 1.54
 | 
						|
     from manual.texi on 23 March 2000 -->
 | 
						|
 | 
						|
<TITLE>bzip2 and libbzip2 - Miscellanea</TITLE>
 | 
						|
<link href="manual_3.html" rel=Previous>
 | 
						|
<link href="manual_toc.html" rel=ToC>
 | 
						|
 | 
						|
</HEAD>
 | 
						|
<BODY>
 | 
						|
<p>Go to the <A HREF="manual_1.html">first</A>, <A HREF="manual_3.html">previous</A>, next, last section, <A HREF="manual_toc.html">table of contents</A>.
 | 
						|
<P><HR><P>
 | 
						|
 | 
						|
 | 
						|
<H1><A NAME="SEC43" HREF="manual_toc.html#TOC43">Miscellanea</A></H1>
 | 
						|
 | 
						|
<P>
 | 
						|
These are just some random thoughts of mine.  Your mileage may
 | 
						|
vary.
 | 
						|
 | 
						|
</P>
 | 
						|
 | 
						|
 | 
						|
<H2><A NAME="SEC44" HREF="manual_toc.html#TOC44">Limitations of the compressed file format</A></H2>
 | 
						|
<P>
 | 
						|
<CODE>bzip2-1.0</CODE>, <CODE>0.9.5</CODE> and <CODE>0.9.0</CODE>
 | 
						|
use exactly the same file format as the previous
 | 
						|
version, <CODE>bzip2-0.1</CODE>.  This decision was made in the interests of
 | 
						|
stability.  Creating yet another incompatible compressed file format
 | 
						|
would create further confusion and disruption for users.
 | 
						|
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
Nevertheless, this is not a painless decision.  Development
 | 
						|
work since the release of <CODE>bzip2-0.1</CODE> in August 1997
 | 
						|
has shown complexities in the file format which slow down
 | 
						|
decompression and, in retrospect, are unnecessary.  These are:
 | 
						|
 | 
						|
<UL>
 | 
						|
<LI>The run-length encoder, which is the first of the
 | 
						|
 | 
						|
      compression transformations, is entirely irrelevant.
 | 
						|
      The original purpose was to protect the sorting algorithm
 | 
						|
      from the very worst case input: a string of repeated
 | 
						|
      symbols.  But algorithm steps Q6a and Q6b in the original
 | 
						|
      Burrows-Wheeler technical report (SRC-124) show how
 | 
						|
      repeats can be handled without difficulty in block
 | 
						|
      sorting.
 | 
						|
<LI>The randomisation mechanism doesn't really need to be
 | 
						|
 | 
						|
      there.  Udi Manber and Gene Myers published a suffix
 | 
						|
      array construction algorithm a few years back, which
 | 
						|
      can be employed to sort any block, no matter how 
 | 
						|
      repetitive, in O(N log N) time.  Subsequent work by
 | 
						|
      Kunihiko Sadakane has produced a derivative O(N (log N)^2) 
 | 
						|
      algorithm which usually outperforms the Manber-Myers
 | 
						|
      algorithm.
 | 
						|
 | 
						|
      I could have changed to Sadakane's algorithm, but I find
 | 
						|
      it to be slower than <CODE>bzip2</CODE>'s existing algorithm for
 | 
						|
      most inputs, and the randomisation mechanism protects
 | 
						|
      adequately against bad cases.  I didn't think it was
 | 
						|
      a good tradeoff to make.  Partly this is due to the fact
 | 
						|
      that I was not flooded with email complaints about
 | 
						|
      <CODE>bzip2-0.1</CODE>'s performance on repetitive data, so
 | 
						|
      perhaps it isn't a problem for real inputs.
 | 
						|
 | 
						|
      Probably the best long-term solution,
 | 
						|
      and the one I have incorporated into 0.9.5 and above,
 | 
						|
      is to use the existing sorting
 | 
						|
      algorithm initially, and fall back to a O(N (log N)^2)
 | 
						|
      algorithm if the standard algorithm gets into difficulties.
 | 
						|
<LI>The compressed file format was never designed to be
 | 
						|
 | 
						|
      handled by a library, and I have had to jump though
 | 
						|
      some hoops to produce an efficient implementation of
 | 
						|
      decompression.  It's a bit hairy.  Try passing
 | 
						|
      <CODE>decompress.c</CODE> through the C preprocessor 
 | 
						|
      and you'll see what I mean.  Much of this complexity
 | 
						|
      could have been avoided if the compressed size of
 | 
						|
      each block of data was recorded in the data stream.
 | 
						|
<LI>An Adler-32 checksum, rather than a CRC32 checksum,
 | 
						|
 | 
						|
      would be faster to compute.
 | 
						|
</UL>
 | 
						|
 | 
						|
<P>
 | 
						|
It would be fair to say that the <CODE>bzip2</CODE> format was frozen
 | 
						|
before I properly and fully understood the performance
 | 
						|
consequences of doing so.
 | 
						|
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
Improvements which I was able to incorporate into
 | 
						|
0.9.0, despite using the same file format, are:
 | 
						|
 | 
						|
<UL>
 | 
						|
<LI>Single array implementation of the inverse BWT.  This
 | 
						|
 | 
						|
      significantly speeds up decompression, presumably
 | 
						|
      because it reduces the number of cache misses.
 | 
						|
<LI>Faster inverse MTF transform for large MTF values.  The
 | 
						|
 | 
						|
      new implementation is based on the notion of sliding blocks
 | 
						|
      of values.
 | 
						|
<LI><CODE>bzip2-0.9.0</CODE> now reads and writes files with <CODE>fread</CODE>
 | 
						|
 | 
						|
      and <CODE>fwrite</CODE>; version 0.1 used <CODE>putc</CODE> and <CODE>getc</CODE>.
 | 
						|
      Duh!  Well, you live and learn.
 | 
						|
 | 
						|
</UL>
 | 
						|
 | 
						|
<P>
 | 
						|
Further ahead, it would be nice 
 | 
						|
to be able to do random access into files.  This will 
 | 
						|
require some careful design of compressed file formats.
 | 
						|
 | 
						|
</P>
 | 
						|
 | 
						|
 | 
						|
 | 
						|
<H2><A NAME="SEC45" HREF="manual_toc.html#TOC45">Portability issues</A></H2>
 | 
						|
<P>
 | 
						|
After some consideration, I have decided not to use
 | 
						|
GNU <CODE>autoconf</CODE> to configure 0.9.5 or 1.0.
 | 
						|
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
<CODE>autoconf</CODE>, admirable and wonderful though it is, 
 | 
						|
mainly assists with portability problems between Unix-like
 | 
						|
platforms.  But <CODE>bzip2</CODE> doesn't have much in the way
 | 
						|
of portability problems on Unix; most of the difficulties appear
 | 
						|
when porting to the Mac, or to Microsoft's operating systems.
 | 
						|
<CODE>autoconf</CODE> doesn't help in those cases, and brings in a 
 | 
						|
whole load of new complexity.
 | 
						|
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
Most people should be able to compile the library and program
 | 
						|
under Unix straight out-of-the-box, so to speak, especially 
 | 
						|
if you have a version of GNU C available.
 | 
						|
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
There are a couple of <CODE>__inline__</CODE> directives in the code.  GNU C
 | 
						|
(<CODE>gcc</CODE>) should be able to handle them.  If you're not using
 | 
						|
GNU C, your C compiler shouldn't see them at all.
 | 
						|
If your compiler does, for some reason, see them and doesn't
 | 
						|
like them, just <CODE>#define</CODE> <CODE>__inline__</CODE> to be <CODE>/* */</CODE>.  One
 | 
						|
easy way to do this is to compile with the flag <CODE>-D__inline__=</CODE>, 
 | 
						|
which should be understood by most Unix compilers.
 | 
						|
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
If you still have difficulties, try compiling with the macro
 | 
						|
<CODE>BZ_STRICT_ANSI</CODE> defined.  This should enable you to build the
 | 
						|
library in a strictly ANSI compliant environment.  Building the program
 | 
						|
itself like this is dangerous and not supported, since you remove
 | 
						|
<CODE>bzip2</CODE>'s checks against compressing directories, symbolic links,
 | 
						|
devices, and other not-really-a-file entities.  This could cause
 | 
						|
filesystem corruption!
 | 
						|
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
One other thing: if you create a <CODE>bzip2</CODE> binary for public
 | 
						|
distribution, please try and link it statically (<CODE>gcc -s</CODE>).  This
 | 
						|
avoids all sorts of library-version issues that others may encounter
 | 
						|
later on.
 | 
						|
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
If you build <CODE>bzip2</CODE> on Win32, you must set <CODE>BZ_UNIX</CODE> to 0 and
 | 
						|
<CODE>BZ_LCCWIN32</CODE> to 1, in the file <CODE>bzip2.c</CODE>, before compiling.
 | 
						|
Otherwise the resulting binary won't work correctly.
 | 
						|
 | 
						|
</P>
 | 
						|
 | 
						|
 | 
						|
 | 
						|
<H2><A NAME="SEC46" HREF="manual_toc.html#TOC46">Reporting bugs</A></H2>
 | 
						|
<P>
 | 
						|
I tried pretty hard to make sure <CODE>bzip2</CODE> is
 | 
						|
bug free, both by design and by testing.  Hopefully
 | 
						|
you'll never need to read this section for real.
 | 
						|
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
Nevertheless, if <CODE>bzip2</CODE> dies with a segmentation
 | 
						|
fault, a bus error or an internal assertion failure, it
 | 
						|
will ask you to email me a bug report.  Experience with
 | 
						|
version 0.1 shows that almost all these problems can
 | 
						|
be traced to either compiler bugs or hardware problems.
 | 
						|
 | 
						|
<UL>
 | 
						|
<LI>
 | 
						|
 | 
						|
Recompile the program with no optimisation, and see if it
 | 
						|
works.  And/or try a different compiler.
 | 
						|
I heard all sorts of stories about various flavours
 | 
						|
of GNU C (and other compilers) generating bad code for
 | 
						|
<CODE>bzip2</CODE>, and I've run across two such examples myself.
 | 
						|
 | 
						|
2.7.X versions of GNU C are known to generate bad code from
 | 
						|
time to time, at high optimisation levels.  
 | 
						|
If you get problems, try using the flags
 | 
						|
<CODE>-O2</CODE> <CODE>-fomit-frame-pointer</CODE> <CODE>-fno-strength-reduce</CODE>.
 | 
						|
You should specifically <EM>not</EM> use <CODE>-funroll-loops</CODE>.
 | 
						|
 | 
						|
You may notice that the Makefile runs six tests as part of
 | 
						|
the build process.  If the program passes all of these, it's
 | 
						|
a pretty good (but not 100%) indication that the compiler has
 | 
						|
done its job correctly.
 | 
						|
<LI>
 | 
						|
 | 
						|
If <CODE>bzip2</CODE> crashes randomly, and the crashes are not
 | 
						|
repeatable, you may have a flaky memory subsystem.  <CODE>bzip2</CODE>
 | 
						|
really hammers your memory hierarchy, and if it's a bit marginal,
 | 
						|
you may get these problems.  Ditto if your disk or I/O subsystem
 | 
						|
is slowly failing.  Yup, this really does happen.
 | 
						|
 | 
						|
Try using a different machine of the same type, and see if
 | 
						|
you can repeat the problem.
 | 
						|
<LI>This isn't really a bug, but ... If <CODE>bzip2</CODE> tells
 | 
						|
 | 
						|
you your file is corrupted on decompression, and you
 | 
						|
obtained the file via FTP, there is a possibility that you
 | 
						|
forgot to tell FTP to do a binary mode transfer.  That absolutely
 | 
						|
will cause the file to be non-decompressible.  You'll have to transfer
 | 
						|
it again.
 | 
						|
</UL>
 | 
						|
 | 
						|
<P>
 | 
						|
If you've incorporated <CODE>libbzip2</CODE> into your own program
 | 
						|
and are getting problems, please, please, please, check that the 
 | 
						|
parameters you are passing in calls to the library, are
 | 
						|
correct, and in accordance with what the documentation says
 | 
						|
is allowable.  I have tried to make the library robust against
 | 
						|
such problems, but I'm sure I haven't succeeded.
 | 
						|
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
Finally, if the above comments don't help, you'll have to send
 | 
						|
me a bug report.  Now, it's just amazing how many people will 
 | 
						|
send me a bug report saying something like
 | 
						|
 | 
						|
<PRE>
 | 
						|
   bzip2 crashed with segmentation fault on my machine
 | 
						|
</PRE>
 | 
						|
 | 
						|
<P>
 | 
						|
and absolutely nothing else.  Needless to say, a such a report
 | 
						|
is <EM>totally, utterly, completely and comprehensively 100% useless; 
 | 
						|
a waste of your time, my time, and net bandwidth</EM>.
 | 
						|
With no details at all, there's no way I can possibly begin
 | 
						|
to figure out what the problem is.
 | 
						|
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
The rules of the game are: facts, facts, facts.  Don't omit
 | 
						|
them because "oh, they won't be relevant".  At the bare 
 | 
						|
minimum:
 | 
						|
 | 
						|
<PRE>
 | 
						|
   Machine type.  Operating system version.  
 | 
						|
   Exact version of <CODE>bzip2</CODE> (do <CODE>bzip2 -V</CODE>).  
 | 
						|
   Exact version of the compiler used.  
 | 
						|
   Flags passed to the compiler.
 | 
						|
</PRE>
 | 
						|
 | 
						|
<P>
 | 
						|
However, the most important single thing that will help me is
 | 
						|
the file that you were trying to compress or decompress at the
 | 
						|
time the problem happened.  Without that, my ability to do anything
 | 
						|
more than speculate about the cause, is limited.
 | 
						|
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
Please remember that I connect to the Internet with a modem, so
 | 
						|
you should contact me before mailing me huge files.
 | 
						|
 | 
						|
</P>
 | 
						|
 | 
						|
 | 
						|
 | 
						|
<H2><A NAME="SEC47" HREF="manual_toc.html#TOC47">Did you get the right package?</A></H2>
 | 
						|
 | 
						|
<P>
 | 
						|
<CODE>bzip2</CODE> is a resource hog.  It soaks up large amounts of CPU cycles
 | 
						|
and memory.  Also, it gives very large latencies.  In the worst case, you
 | 
						|
can feed many megabytes of uncompressed data into the library before
 | 
						|
getting any compressed output, so this probably rules out applications
 | 
						|
requiring interactive behaviour.
 | 
						|
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
These aren't faults of my implementation, I hope, but more
 | 
						|
an intrinsic property of the Burrows-Wheeler transform (unfortunately).  
 | 
						|
Maybe this isn't what you want.
 | 
						|
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
If you want a compressor and/or library which is faster, uses less
 | 
						|
memory but gets pretty good compression, and has minimal latency,
 | 
						|
consider Jean-loup
 | 
						|
Gailly's and Mark Adler's work, <CODE>zlib-1.1.2</CODE> and
 | 
						|
<CODE>gzip-1.2.4</CODE>.  Look for them at
 | 
						|
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
<CODE>http://www.cdrom.com/pub/infozip/zlib</CODE> and
 | 
						|
<CODE>http://www.gzip.org</CODE> respectively.
 | 
						|
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
For something faster and lighter still, you might try Markus F X J
 | 
						|
Oberhumer's <CODE>LZO</CODE> real-time compression/decompression library, at
 | 
						|
<BR> <CODE>http://wildsau.idv.uni-linz.ac.at/mfx/lzo.html</CODE>.
 | 
						|
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
If you want to use the <CODE>bzip2</CODE> algorithms to compress small blocks
 | 
						|
of data, 64k bytes or smaller, for example on an on-the-fly disk
 | 
						|
compressor, you'd be well advised not to use this library.  Instead,
 | 
						|
I've made a special library tuned for that kind of use.  It's part of
 | 
						|
<CODE>e2compr-0.40</CODE>, an on-the-fly disk compressor for the Linux
 | 
						|
<CODE>ext2</CODE> filesystem.  Look at
 | 
						|
<CODE>http://www.netspace.net.au/~reiter/e2compr</CODE>.
 | 
						|
 | 
						|
</P>
 | 
						|
 | 
						|
 | 
						|
 | 
						|
<H2><A NAME="SEC48" HREF="manual_toc.html#TOC48">Testing</A></H2>
 | 
						|
 | 
						|
<P>
 | 
						|
A record of the tests I've done.
 | 
						|
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
First, some data sets:
 | 
						|
 | 
						|
<UL>
 | 
						|
<LI>B: a directory containing 6001 files, one for every length in the
 | 
						|
 | 
						|
      range 0 to 6000 bytes.  The files contain random lowercase
 | 
						|
      letters.  18.7 megabytes.
 | 
						|
<LI>H: my home directory tree.  Documents, source code, mail files,
 | 
						|
 | 
						|
      compressed data.  H contains B, and also a directory of 
 | 
						|
      files designed as boundary cases for the sorting; mostly very
 | 
						|
      repetitive, nasty files.  565 megabytes.
 | 
						|
<LI>A: directory tree holding various applications built from source:
 | 
						|
 | 
						|
      <CODE>egcs</CODE>, <CODE>gcc-2.8.1</CODE>, KDE, GTK, Octave, etc.
 | 
						|
      2200 megabytes.
 | 
						|
</UL>
 | 
						|
 | 
						|
<P>
 | 
						|
The tests conducted are as follows.  Each test means compressing 
 | 
						|
(a copy of) each file in the data set, decompressing it and
 | 
						|
comparing it against the original.
 | 
						|
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
First, a bunch of tests with block sizes and internal buffer
 | 
						|
sizes set very small, 
 | 
						|
to detect any problems with the
 | 
						|
blocking and buffering mechanisms.  
 | 
						|
This required modifying the source code so as to try to 
 | 
						|
break it.
 | 
						|
 | 
						|
<OL>
 | 
						|
<LI>Data set H, with
 | 
						|
 | 
						|
      buffer size of 1 byte, and block size of 23 bytes.
 | 
						|
<LI>Data set B, buffer sizes 1 byte, block size 1 byte.
 | 
						|
 | 
						|
<LI>As (2) but small-mode decompression.
 | 
						|
 | 
						|
<LI>As (2) with block size 2 bytes.
 | 
						|
 | 
						|
<LI>As (2) with block size 3 bytes.
 | 
						|
 | 
						|
<LI>As (2) with block size 4 bytes.
 | 
						|
 | 
						|
<LI>As (2) with block size 5 bytes.
 | 
						|
 | 
						|
<LI>As (2) with block size 6 bytes and small-mode decompression.
 | 
						|
 | 
						|
<LI>H with buffer size of 1 byte, but normal block
 | 
						|
 | 
						|
      size (up to 900000 bytes).
 | 
						|
</OL>
 | 
						|
 | 
						|
<P>
 | 
						|
Then some tests with unmodified source code.
 | 
						|
 | 
						|
<OL>
 | 
						|
<LI>H, all settings normal.
 | 
						|
 | 
						|
<LI>As (1), with small-mode decompress.
 | 
						|
 | 
						|
<LI>H, compress with flag <CODE>-1</CODE>.
 | 
						|
 | 
						|
<LI>H, compress with flag <CODE>-s</CODE>, decompress with flag <CODE>-s</CODE>.
 | 
						|
 | 
						|
<LI>Forwards compatibility: H, <CODE>bzip2-0.1pl2</CODE> compressing,
 | 
						|
 | 
						|
      <CODE>bzip2-0.9.5</CODE> decompressing, all settings normal.
 | 
						|
<LI>Backwards compatibility:  H, <CODE>bzip2-0.9.5</CODE> compressing,
 | 
						|
 | 
						|
      <CODE>bzip2-0.1pl2</CODE> decompressing, all settings normal.
 | 
						|
<LI>Bigger tests: A, all settings normal.
 | 
						|
 | 
						|
<LI>As (7), using the fallback (Sadakane-like) sorting algorithm.
 | 
						|
 | 
						|
<LI>As (8), compress with flag <CODE>-1</CODE>, decompress with flag
 | 
						|
 | 
						|
      <CODE>-s</CODE>.
 | 
						|
<LI>H, using the fallback sorting algorithm.
 | 
						|
 | 
						|
<LI>Forwards compatibility: A, <CODE>bzip2-0.1pl2</CODE> compressing,
 | 
						|
 | 
						|
      <CODE>bzip2-0.9.5</CODE> decompressing, all settings normal.
 | 
						|
<LI>Backwards compatibility:  A, <CODE>bzip2-0.9.5</CODE> compressing,
 | 
						|
 | 
						|
      <CODE>bzip2-0.1pl2</CODE> decompressing, all settings normal.
 | 
						|
<LI>Misc test: about 400 megabytes of <CODE>.tar</CODE> files with
 | 
						|
 | 
						|
      <CODE>bzip2</CODE> compiled with Checker (a memory access error
 | 
						|
       detector, like Purify).
 | 
						|
<LI>Misc tests to make sure it builds and runs ok on non-Linux/x86
 | 
						|
 | 
						|
      platforms.
 | 
						|
</OL>
 | 
						|
 | 
						|
<P>
 | 
						|
These tests were conducted on a 225 MHz IDT WinChip machine, running
 | 
						|
Linux 2.0.36.  They represent nearly a week of continuous computation.
 | 
						|
All tests completed successfully.
 | 
						|
 | 
						|
</P>
 | 
						|
 | 
						|
 | 
						|
 | 
						|
<H2><A NAME="SEC49" HREF="manual_toc.html#TOC49">Further reading</A></H2>
 | 
						|
<P>
 | 
						|
<CODE>bzip2</CODE> is not research work, in the sense that it doesn't present
 | 
						|
any new ideas.  Rather, it's an engineering exercise based on existing
 | 
						|
ideas.
 | 
						|
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
Four documents describe essentially all the ideas behind <CODE>bzip2</CODE>:
 | 
						|
 | 
						|
<PRE>
 | 
						|
Michael Burrows and D. J. Wheeler:
 | 
						|
  "A block-sorting lossless data compression algorithm"
 | 
						|
   10th May 1994. 
 | 
						|
   Digital SRC Research Report 124.
 | 
						|
   ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz
 | 
						|
   If you have trouble finding it, try searching at the
 | 
						|
   New Zealand Digital Library, http://www.nzdl.org.
 | 
						|
 | 
						|
Daniel S. Hirschberg and Debra A. LeLewer
 | 
						|
  "Efficient Decoding of Prefix Codes"
 | 
						|
   Communications of the ACM, April 1990, Vol 33, Number 4.
 | 
						|
   You might be able to get an electronic copy of this
 | 
						|
      from the ACM Digital Library.
 | 
						|
 | 
						|
David J. Wheeler
 | 
						|
   Program bred3.c and accompanying document bred3.ps.
 | 
						|
   This contains the idea behind the multi-table Huffman
 | 
						|
   coding scheme.
 | 
						|
   ftp://ftp.cl.cam.ac.uk/users/djw3/
 | 
						|
 | 
						|
Jon L. Bentley and Robert Sedgewick
 | 
						|
  "Fast Algorithms for Sorting and Searching Strings"
 | 
						|
   Available from Sedgewick's web page,
 | 
						|
   www.cs.princeton.edu/~rs
 | 
						|
</PRE>
 | 
						|
 | 
						|
<P>
 | 
						|
The following paper gives valuable additional insights into the
 | 
						|
algorithm, but is not immediately the basis of any code
 | 
						|
used in bzip2.
 | 
						|
 | 
						|
<PRE>
 | 
						|
Peter Fenwick:
 | 
						|
   Block Sorting Text Compression
 | 
						|
   Proceedings of the 19th Australasian Computer Science Conference,
 | 
						|
     Melbourne, Australia.  Jan 31 - Feb 2, 1996.
 | 
						|
   ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps
 | 
						|
</PRE>
 | 
						|
 | 
						|
<P>
 | 
						|
Kunihiko Sadakane's sorting algorithm, mentioned above,
 | 
						|
is available from:
 | 
						|
 | 
						|
<PRE>
 | 
						|
http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada98b.ps.gz
 | 
						|
</PRE>
 | 
						|
 | 
						|
<P>
 | 
						|
The Manber-Myers suffix array construction
 | 
						|
algorithm is described in a paper
 | 
						|
available from:
 | 
						|
 | 
						|
<PRE>
 | 
						|
http://www.cs.arizona.edu/people/gene/PAPERS/suffix.ps
 | 
						|
</PRE>
 | 
						|
 | 
						|
<P>
 | 
						|
Finally, the following paper documents some recent investigations
 | 
						|
I made into the performance of sorting algorithms:
 | 
						|
 | 
						|
<PRE>
 | 
						|
Julian Seward:
 | 
						|
   On the Performance of BWT Sorting Algorithms
 | 
						|
   Proceedings of the IEEE Data Compression Conference 2000
 | 
						|
     Snowbird, Utah.  28-30 March 2000.
 | 
						|
</PRE>
 | 
						|
 | 
						|
<P><HR><P>
 | 
						|
<p>Go to the <A HREF="manual_1.html">first</A>, <A HREF="manual_3.html">previous</A>, next, last section, <A HREF="manual_toc.html">table of contents</A>.
 | 
						|
</BODY>
 | 
						|
</HTML>
 |