<?xml version="1.0" encoding='UTF-8'?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook V4.5//EN"
		"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">

<sect1 id="using-textbinary"><title>Text and Binary modes</title>

<sect2 id="textbin-issue"> <title>The Issue</title>
<para>On a UNIX system, when an application reads from a file it gets
exactly what is in the file on disk, and the converse is true for writing.
The situation is different in the DOS/Windows world, where a file can
be opened in one of two modes, binary or text.  In binary mode the
system behaves exactly as in UNIX.  In text mode, however, a
NL (\n, ^J) is transformed into the sequence CR (\r, ^M) NL on writing,
and a CR NL sequence is transformed back into a single NL on reading.
</para>
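<para>The translation can be modelled on an in-memory buffer. The following C
sketch is illustrative only: <function>text_mode_encode()</function> and
<function>text_mode_decode()</function> are hypothetical names, not Cygwin
functions. It models the NL to CR NL conversion on writing and, assuming the
usual reverse conversion on reading, CR NL back to NL; it says nothing about
how Cygwin actually implements either step.</para>

<programlisting><![CDATA[
```c
#include <stddef.h>

/* Hypothetical helper: model a text-mode write.  Every NL (\n) is
 * preceded by a CR (\r).  `out` must have room for 2 * len bytes.
 * Returns the number of bytes that would land on disk. */
size_t text_mode_encode(const char *in, size_t len, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == '\n')
            out[o++] = '\r';        /* the extra byte the application never wrote */
        out[o++] = in[i];
    }
    return o;
}

/* Hypothetical helper: model a text-mode read.  Each CR NL pair is
 * collapsed to a single NL; a CR not followed by NL is kept as is.
 * Returns the number of bytes the application would see. */
size_t text_mode_decode(const char *in, size_t len, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == '\r' && i + 1 < len && in[i + 1] == '\n')
            continue;               /* drop the CR, keep the following NL */
        out[o++] = in[i];
    }
    return o;
}
```
]]></programlisting>

<para>Encoding the 4 bytes <literal>a\nb\n</literal> yields the 6 bytes
<literal>a\r\nb\r\n</literal>, and decoding those gives back the original
4 bytes.</para>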
<para>This can wreak havoc with the seek/fseek calls, since the number
of bytes actually in the file may differ from the number seen by the
application.</para>
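<para>To see why offsets disagree, the hypothetical helper
<function>text_mode_disk_offset()</function> below (not a Cygwin API) maps an
offset counted by the application to the corresponding offset on disk,
assuming every NL written before it was stored as CR NL.</para>

<programlisting><![CDATA[
```c
#include <stddef.h>

/* Hypothetical helper: given the bytes an application wrote in text
 * mode, return the on-disk offset that corresponds to application
 * offset `app_off`.  Each NL before `app_off` takes two bytes on disk. */
size_t text_mode_disk_offset(const char *app_data, size_t app_off)
{
    size_t disk_off = app_off;
    for (size_t i = 0; i < app_off; i++)
        if (app_data[i] == '\n')
            disk_off++;             /* the CR stored in front of this NL */
    return disk_off;
}
```
]]></programlisting>

<para>For the text <literal>one\ntwo\nthree\n</literal>, application offset 8
(the start of <literal>three</literal>) corresponds to disk offset 10, so
passing 8 to <function>fseek()</function> on the file written in text mode
would land on the CR that ends the second line.</para>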
<para>The mode can be specified explicitly, as explained in the Programming
section below.  In an ideal DOS/Windows world, all programs that treat lines as
records (such as <command>bash</command>, <command>make</command>,
<command>sed</command> ...) would open files (and change the mode of their
standard input and output) in text mode.  All other programs (such as
<command>cat</command>, <command>cmp</command>, <command>tr</command> ...)
would use binary mode.  In practice, with Cygwin, programs that deal
explicitly with object files specify binary mode (this is the case for
<command>od</command>, which is helpful for diagnosing CR problems).  Most
other programs (such as <command>sed</command>, <command>cmp</command>,
<command>tr</command>) use the default mode.</para>
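<para>Diagnosing CR problems comes down to finding out which line endings a
file really contains; <command>od -c</command> shows them byte by byte. A
minimal C sketch of the same check follows. The enum and the function
<function>classify_line_endings()</function> are hypothetical, and a lone CR
without a following NL is not counted as a line ending.</para>

<programlisting><![CDATA[
```c
#include <stddef.h>

/* Hypothetical classification of the line endings found in a buffer. */
enum eol_style { EOL_NONE, EOL_LF, EOL_CRLF, EOL_MIXED };

enum eol_style classify_line_endings(const char *buf, size_t len)
{
    size_t lf = 0, crlf = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != '\n')
            continue;
        if (i > 0 && buf[i - 1] == '\r')
            crlf++;                 /* DOS/Windows style: CR NL */
        else
            lf++;                   /* UNIX style: bare NL */
    }
    if (lf == 0 && crlf == 0)
        return EOL_NONE;
    if (crlf == 0)
        return EOL_LF;
    if (lf == 0)
        return EOL_CRLF;
    return EOL_MIXED;               /* typical sign of a CR problem */
}
```
]]></programlisting>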
</sect2>

<sect2 id="textbin-default"><title>The default Cygwin behavior</title>

<para>The Cygwin system gives us some flexibility in deciding how files
are to be opened when the mode is not specified explicitly.
The rules are evolving; this section gives the design goals.</para>
<orderedlist numeration="loweralpha">
<listitem>
<para>If the filename is specified as a POSIX path and it appears to
reside on a file system that is mounted (i.e., if its pathname starts
with a directory displayed by <command>mount</command>), then the
default is specified by the mount flag.  If the file is a symbolic link,
the mode of the target file system applies.</para>
</listitem>
<listitem>
<para>If the file is specified via an MS-DOS pathname (i.e., it contains a
backslash or a colon), the default is binary.
</para>
</listitem>
<listitem>
<para>Pipes, sockets and non-file devices are opened in binary mode.
For pipes opened through the <function>pipe()</function> system call, you can use the
<function>setmode()</function> function (see <xref linkend="textbin-devel"></xref>)
to switch to text mode.  For pipes opened through <function>popen()</function>,
you can simply specify text or binary mode just as in calls to
<function>fopen()</function>.</para>
</listitem>
<listitem>
<para>Sockets and other non-file devices are always opened in binary mode.
</para>
</listitem>
<listitem>
<para>When redirecting, the Cygwin shells use rules (a-d).
Non-Cygwin shells always pipe and redirect in binary mode.  With
non-Cygwin shells, the commands <command>cat filename | program</command>
and <command>program &lt; filename</command> are not equivalent when
<filename>filename</filename> is on a text-mounted partition.</para>
<para>The programs <command>u2d</command> and <command>d2u</command> can
be used to add CRs to or remove CRs from a file.  <command>u2d</command> adds
a CR before each NL; <command>d2u</command> removes the CRs.  Use the
<option>--help</option> option of these commands for more information.
</para>
</listitem>
</orderedlist>
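<para>Rules (a) and (b) decide the default from the path name alone. The
sketch below is a simplified, hypothetical model of that decision, not
Cygwin's code: a backslash or colon marks an MS-DOS path, which defaults to
binary, and otherwise the longest mount point that is the path itself or one
of its parent directories supplies the mode. The <type>struct mount_entry</type>
table, the <function>default_open_mode()</function> function and the binary
fallback for unmatched paths are assumptions; symbolic links (rule a), pipes
and devices (rules c and d) are not modelled.</para>

<programlisting><![CDATA[
```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the two modes; not Cygwin's O_BINARY/O_TEXT. */
enum open_mode { MODE_BINARY, MODE_TEXT };

/* Hypothetical mount-table entry: a POSIX mount point and its flag. */
struct mount_entry {
    const char *posix_path;   /* e.g. "/" or "/exchange" */
    bool textmode;            /* true if mounted in text mode */
};

/* Is `mnt` the path itself or one of its parent directories? */
static bool under_mount(const char *path, const char *mnt)
{
    size_t n = strlen(mnt);
    if (n == 0 || strncmp(path, mnt, n) != 0)
        return false;
    /* Match on a directory boundary only: "/exchange" covers
     * "/exchange/x" but not "/exchangefoo".  A mount point ending in
     * '/' (such as "/") covers every path starting with it. */
    return mnt[n - 1] == '/' || path[n] == '/' || path[n] == '\0';
}

enum open_mode default_open_mode(const char *path,
                                 const struct mount_entry *table, size_t count)
{
    /* Rule (b): a backslash or colon marks an MS-DOS path -> binary. */
    if (strchr(path, '\\') != NULL || strchr(path, ':') != NULL)
        return MODE_BINARY;

    /* Rule (a): the longest matching mount point decides. */
    size_t best_len = 0;
    enum open_mode mode = MODE_BINARY;   /* assumed fallback: no match */
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(table[i].posix_path);
        if (len > best_len && under_mount(path, table[i].posix_path)) {
            best_len = len;
            mode = table[i].textmode ? MODE_TEXT : MODE_BINARY;
        }
    }
    return mode;
}
```
]]></programlisting>

<para>With a table containing <filename>/</filename> in binary mode and
<filename>/exchange</filename> in text mode, <filename>/exchange/notes.txt</filename>
gets text mode, while <filename>/exchangefoo/x</filename> and
<filename>c:\exchange\notes.txt</filename> get binary mode.</para>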
</sect2>

<sect2 id="textbin-question"><title>Binary or text?</title>

<para>UNIX programs that have been written for maximum portability
will know the difference between text and binary files and act
appropriately under Cygwin.  Most programs included in the official
Cygwin distributions should work well in the default mode.</para>
<para>Binary mode is usually the best choice, since it is faster and
easier to handle, unless you want to exchange files with native Win32
applications.  It makes most sense to keep the Cygwin distribution
and your Cygwin home directory in binmode and to generate text files in
binmode (with UNIX LF line endings).  Most Windows applications can
handle binmode files just fine.  A notable exception is the mini-editor
<command>Notepad</command>, which handles UNIX line endings incorrectly
and only produces output files with DOS CRLF line endings.</para>
<para>You can convert files between CRLF and LF line endings with
tools from the Cygwin distribution such as <command>d2u</command> and
<command>u2d</command> from the cygutils package.  You can also mount
a directory in text mode through the mount table and use that
directory for exchanging files.</para>
<para>As an application programmer, you can decide on a file-by-file basis,
or you can specify default open modes depending on the purpose for which
the application opens files.  See the next section for a description of
your choices.</para>
</sect2>

<sect2 id="textbin-devel"><title>Programming</title>

<para>In the <function>open()</function> function call, binary mode can be
specified with the flag <literal>O_BINARY</literal> and text mode with
<literal>O_TEXT</literal>.  These symbols are defined in
<filename>fcntl.h</filename>.</para>
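<para>A minimal sketch of opening files with an explicit mode. Outside Cygwin
and Windows, <filename>fcntl.h</filename> usually defines neither
<literal>O_BINARY</literal> nor <literal>O_TEXT</literal>, so the sketch falls
back to 0 (no translation) when they are missing; this is a common
portability idiom, not something Cygwin requires. The helpers
<function>write_binary_file()</function> and
<function>file_size_on_disk()</function> are hypothetical.</para>

<programlisting><![CDATA[
```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Outside Cygwin/Windows these flags usually do not exist; POSIX
 * systems never translate line endings, so 0 is a safe stand-in. */
#ifndef O_BINARY
#define O_BINARY 0
#endif
#ifndef O_TEXT
#define O_TEXT 0
#endif

/* Hypothetical helper: write `data` to `path` byte for byte, whatever
 * the mount mode.  Returns 0 on success, -1 on error. */
int write_binary_file(const char *path, const char *data)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0644);
    if (fd == -1)
        return -1;
    size_t len = strlen(data);
    int rc = (write(fd, data, len) == (ssize_t)len) ? 0 : -1;
    if (close(fd) == -1)
        rc = -1;
    return rc;
}

/* Hypothetical helper: size of `path` as stored on disk, or -1.
 * Opening with O_BINARY keeps the byte count free of translation. */
long file_size_on_disk(const char *path)
{
    int fd = open(path, O_RDONLY | O_BINARY);
    if (fd == -1)
        return -1;
    off_t end = lseek(fd, 0, SEEK_END);
    close(fd);
    return (long)end;
}
```
]]></programlisting>

<para>Writing <literal>a\nb\n</literal> this way should leave exactly 4 bytes
in the file even in a directory mounted in text mode; <literal>O_TEXT</literal>
is passed in the same position when CR NL line endings are wanted.</para>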
<para>The <function>mkstemp()</function> and <function>mkstemps()</function>
calls force binary mode.  Use <function>mkostemp()</function> or
<function>mkostemps()</function> with the same flags
as <function>open()</function> for more control over temporary files.</para>
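<para>For example, a temporary file meant to be read by a native Windows tool
could be created in text mode as sketched below. The sketch assumes that
defining <literal>_GNU_SOURCE</literal> makes <filename>stdlib.h</filename>
declare <function>mkostemp()</function>, as it does in glibc, and reuses the
0 fallback for <literal>O_TEXT</literal> on other systems;
<function>make_text_tempfile()</function> is a hypothetical name.</para>

<programlisting><![CDATA[
```c
#define _GNU_SOURCE            /* assumed: exposes mkostemp() in stdlib.h */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_TEXT
#define O_TEXT 0               /* POSIX systems never translate line endings */
#endif

/* Hypothetical helper: create a unique file from `templ` (which must end
 * in "XXXXXX" and is rewritten in place), open it with O_TEXT instead of
 * the binary mode mkstemp() would force, and write `text` to it.
 * Returns the open descriptor, or -1 on error. */
int make_text_tempfile(char *templ, const char *text)
{
    int fd = mkostemp(templ, O_TEXT);
    if (fd == -1)
        return -1;
    size_t len = strlen(text);
    if (write(fd, text, len) != (ssize_t)len) {
        close(fd);
        unlink(templ);         /* do not leave a half-written file behind */
        return -1;
    }
    return fd;
}
```
]]></programlisting>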
<para>In the <function>fopen()</function> and <function>popen()</function>
function calls, binary mode can be specified by adding a <literal>b</literal>
to the mode string.  Text mode is specified by adding a <literal>t</literal>
to the mode string.</para>
<para>The mode of a file can be changed by the call
<function>setmode(fd, mode)</function>, where <literal>fd</literal> is a file
descriptor (an integer) and <literal>mode</literal> is
<literal>O_BINARY</literal> or <literal>O_TEXT</literal>.  The function
returns <literal>O_BINARY</literal> or <literal>O_TEXT</literal> depending
on the mode before the call, and <literal>EOF</literal> on error.</para>
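<para>The following sketch switches the read end of a <function>pipe()</function>
to text mode, as rule (c) of the default behavior suggests. It assumes that
Cygwin declares <function>setmode()</function> in <filename>io.h</filename>;
on other systems the hypothetical wrapper <function>set_fd_mode()</function>
only checks the descriptor and reports <literal>O_BINARY</literal>, since POSIX
descriptors have no text mode. <function>make_text_pipe()</function> is
hypothetical as well.</para>

<programlisting><![CDATA[
```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#ifdef __CYGWIN__
#include <io.h>                /* assumed to declare setmode() on Cygwin */
#endif

#ifndef O_BINARY
#define O_BINARY 0
#endif
#ifndef O_TEXT
#define O_TEXT 0
#endif

/* Hypothetical wrapper around setmode(): returns the previous mode
 * (O_BINARY or O_TEXT), or EOF on error. */
int set_fd_mode(int fd, int mode)
{
#ifdef __CYGWIN__
    return setmode(fd, mode);
#else
    /* POSIX descriptors never translate line endings, so every valid
     * descriptor is already "binary"; only check that fd is open. */
    (void)mode;
    return fcntl(fd, F_GETFD) == -1 ? EOF : O_BINARY;
#endif
}

/* Hypothetical helper: create a pipe and switch its read end to text
 * mode.  Returns 0 on success, -1 on error. */
int make_text_pipe(int fds[2])
{
    if (pipe(fds) == -1)
        return -1;
    if (set_fd_mode(fds[0], O_TEXT) == EOF) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    return 0;
}
```
]]></programlisting>

<para>The same wrapper could be applied to <literal>fileno(stdout)</literal>
with <literal>O_BINARY</literal> before a program writes binary data to its
standard output.</para>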
<para>There is also a convenient way to set the default open modes used
in an application just by linking against various object files provided
by Cygwin.  For instance, if you want to make sure that all files are
always opened in binary mode by an application, regardless of the mode
of the underlying mount point, just add the file
<filename>/lib/binmode.o</filename> to the link stage of the application
in your project, like this:</para>
<screen>
  $ gcc my_tiny_app.c /lib/binmode.o -o my_tiny_app
</screen>

<para>Even simpler:</para>

<screen>
  $ gcc my_tiny_app.c -lbinmode -o my_tiny_app
</screen>
<para>This adds code that sets the default open mode for all files
opened by <command>my_tiny_app</command> to binary, for both reading and
writing.</para>
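<para>To check which default a program actually ends up with, one can write a
single NL through <function>fopen()</function> without <literal>b</literal> or
<literal>t</literal> and count the bytes that reach the file: one byte means
binary mode, two mean the NL was stored as CR NL. The helper
<function>default_write_is_binary()</function> is hypothetical. Built with
<literal>-lbinmode</literal>, it should report binary mode even for a file in
a text-mounted directory.</para>

<programlisting><![CDATA[
```c
#include <stdio.h>

/* Hypothetical check: returns 1 if a plain fopen(path, "w") stored "\n"
 * as one byte (binary), 0 if it stored two (text: CR NL), -1 on error. */
int default_write_is_binary(const char *path)
{
    FILE *f = fopen(path, "w");      /* no "b" or "t": the default applies */
    if (f == NULL)
        return -1;
    if (fputc('\n', f) == EOF) {
        fclose(f);
        return -1;
    }
    if (fclose(f) == EOF)
        return -1;

    f = fopen(path, "rb");           /* read back the raw bytes */
    if (f == NULL)
        return -1;
    long n = 0;
    while (getc(f) != EOF)
        n++;
    fclose(f);
    remove(path);
    return n == 1 ? 1 : (n == 2 ? 0 : -1);
}
```
]]></programlisting>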
<para>Cygwin provides the following libraries and object files to set the
default open mode just by linking an application against them:</para>

<itemizedlist mark="bullet">

<listitem>
<screen>
/lib/libautomode.a      -  Open files for reading in textmode,
/lib/automode.o            open files for writing in binary mode
</screen>
</listitem>

<listitem>
<screen>
/lib/libbinmode.a       -  Open files for reading and writing in binary mode
/lib/binmode.o
</screen>
</listitem>

<listitem>
<screen>
/lib/libtextmode.a      -  Open files for reading and writing in textmode
/lib/textmode.o
</screen>
</listitem>

<listitem>
<screen>
/lib/libtextreadmode.a  -  Open files for reading in textmode,
/lib/textreadmode.o        keep default behaviour for writing.
</screen>
</listitem>

</itemizedlist>
</sect2>

</sect1>