newlib/winsup/doc/overview2.sgml
Corinna Vinschen 8fd422fe4e * new-features.sgml (ov-new1.7-file): Add new hardlink behaviour on
filesystems not supporting hardlinks.
	* overview2.sgml (ov-hi-files): Change descripton accordingly.
2009-09-21 11:01:19 +00:00


<sect1 id="ov-ex-win">
<title>Quick Start Guide for those more experienced with Windows</title>
<para>
If you are new to the world of UNIX, you may find it difficult to
understand at first. This guide is not meant to be comprehensive,
so we recommend that you use the many available Internet resources
to become acquainted with UNIX basics (search for "UNIX basics" or
"UNIX tutorial").
</para>
<para>
To install a basic Cygwin environment, run the
<command>setup.exe</command> program and click <literal>Next</literal>
at each page. The default settings are correct for most users. If you
want to know more about what each option means, see
<xref linkend="internet-setup"></xref>. Use <command>setup.exe</command>
any time you want to update or install a Cygwin package. If you are
installing Cygwin for a specific purpose, use it to install the tools
that you need. For example, if you want to compile C++ programs, you
need the <systemitem>gcc-g++</systemitem> package and probably a text
editor like <systemitem>nano</systemitem>. When running
<command>setup.exe</command>, clicking on categories and packages in the
package installation screen will provide you with the ability to control
what is installed or updated.
</para>
<para>
Another option is to install everything by clicking on the
<literal>Default</literal> field next to the <literal>All</literal>
category. However, be advised that this will download and install
several hundred megabytes of software to your computer. The best
plan is probably to click on individual categories and install either
entire categories or packages from the categories themselves.
After installation, you can find Cygwin-specific documentation in
the <literal>/usr/share/doc/Cygwin/</literal> directory.
</para>
<para>
Developers coming from a Windows background will be able to write
console or GUI executables that rely on the Microsoft Win32 API instead
of Cygwin by using the <command>-mno-cygwin</command> option to GCC. The
<command>-shared</command> option allows you to write Windows Dynamic-Link
Libraries (DLLs). The resource compiler <command>windres</command> is also
provided.
</para>
</sect1>
<sect1 id="ov-ex-unix">
<title>Quick Start Guide for those more experienced with UNIX</title>
<para>
If you are an experienced UNIX user who misses a powerful command-line
environment, you will enjoy Cygwin.
Developers coming from a UNIX background will find a set of utilities
they are already comfortable using, including a working UNIX shell. The
compiler tools are the standard GNU compilers most people will have previously
used under UNIX, only ported to the Windows host. Programmers wishing to port
UNIX software to Windows NT will find that the Cygwin library provides
an easy way to port many UNIX packages, with only minimal source code
changes.
</para>
<para>
Note that there are some workarounds
that cause Cygwin to behave differently than most UNIX-like operating
systems; these are described in more detail in
<xref linkend="using-effectively"></xref>.
</para>
<para>
Use the graphical command <command>setup.exe</command> any time you want
to update or install a Cygwin package. This program must be run
manually every time you want to check for updated packages since Cygwin
does not currently include a mechanism for automatically detecting
package updates.
</para>
<para>
By default, <command>setup.exe</command> only installs a minimal subset of
packages. Add any other packages by clicking on the <literal>+</literal>
next to the Category name and selecting the package from the displayed
list. You may search for specific tools by using the
<ulink url="http://cygwin.com/packages/">Setup Package Search</ulink>
at the Cygwin web site.
</para>
<para>
Another option is to install everything by clicking on the
<literal>Default</literal> field next to the <literal>All</literal>
category. However, be advised that this will download and install
several hundred megabytes of software to your computer. The best
plan is probably to click on individual categories and install either
entire categories or packages from the categories themselves.
After installation, you can find Cygwin-specific documentation in
the <literal>/usr/share/doc/Cygwin/</literal> directory.
</para>
<para>
For more information about what each option in
<command>setup.exe</command> means, see <xref
linkend="internet-setup"></xref>.
</para>
</sect1>
<sect1 id="highlights"><title>Highlights of Cygwin Functionality</title>
<sect2 id="ov-hi-intro"><title>Introduction</title> <para>When a binary linked
against the library is executed, the Cygwin DLL is loaded into the
application's text segment. Because we are trying to emulate a UNIX kernel
which needs access to all processes running under it, the first Cygwin DLL to
run creates shared memory areas and global synchronization objects that other
processes using separate instances of the DLL can access. This is used to keep track of open file descriptors and to assist fork and exec, among other
purposes. Every process also has a per_process structure that contains
information such as process id, user id, signal masks, and other similar
process-specific information.</para>
<para>The DLL is implemented as a standard DLL in the Win32 subsystem. Under
the hood it's using the Win32 API, as well as the native NT API, where
appropriate.</para>
<para>Because processes run under the standard Win32 subsystem, they
can access both the UNIX compatibility calls provided by Cygwin as well as
any of the Win32 API calls. This gives the programmer complete flexibility in
designing the structure of their program in terms of the APIs used. For
example, they could write a Win32-specific GUI using Win32 API calls on top of
a UNIX back-end that uses Cygwin.</para>
<para>The native NT API is used mainly for speed, as well as to access
NT capabilities which are useful to implement certain POSIX features, but
are hidden to the Win32 API.
</para>
<para>Due to some restrictions in Windows, it's not always possible
to strictly adhere to existing UNIX standards like POSIX.1. Fortunately,
these are mostly border cases.</para>
</sect2>
<sect2 id="ov-hi-perm"><title>Permissions and Security</title>
<para>Windows NT includes a sophisticated security model based on Access
Control Lists (ACLs). Cygwin maps Win32 file ownership and permissions to
ACLs by default, on file systems supporting them (usually NTFS).
Solaris-style ACLs and accompanying function calls are also supported.
The chmod call maps UNIX-style permissions back to the Win32 equivalents.
Because many programs expect to be able to find the
<filename>/etc/passwd</filename> and
<filename>/etc/group</filename> files, we provide <ulink
url="http://cygwin.com/cygwin-ug-net/using-utils.html">utilities</ulink>
that can be used to construct them from the user and group information
provided by the operating system.</para>
<para>Users with Administrator rights are permitted to chown files.
With version 1.1.3 Cygwin introduced a mechanism for setting real and
effective UIDs. This is described in <xref linkend="ntsec"></xref>. As
of version 1.5.13, the Cygwin developers are not aware of any feature in
the Cygwin DLL that would allow users to gain privileges or to access
objects to which they have no rights under Windows. However, there is no
guarantee that Cygwin is as secure as the Windows it runs on. Cygwin
processes share some variables and are thus easier targets of
denial-of-service attacks.
</para>
</sect2>
<sect2 id="ov-hi-files"><title>File Access</title> <para>Cygwin supports
both POSIX- and Win32-style paths, using either forward or back slashes as the
directory delimiter. Paths coming into the DLL are translated from POSIX to
native NT as needed. From the application perspective, the file system is
a POSIX-compliant one. The implementation details are safely hidden in the
Cygwin DLL. UNC pathnames (starting with two slashes) are supported for
network paths.</para>
<para>Since version 1.7.0, the layout of this POSIX view of the Windows file
system space is stored in the <filename>/etc/fstab</filename> file. Actually,
there is a system-wide <filename>/etc/fstab</filename> file as well as a
user-specific fstab file <filename>/etc/fstab.d/${USER}</filename>.</para>
<para>At startup the DLL has to find out where it can find the
<filename>/etc/fstab</filename> file. The mechanism used for this is simple.
First it retrieves its own path, for instance
<filename>C:\Cygwin\bin\cygwin1.dll</filename>. From there it deduces
that the root path is <filename>C:\Cygwin</filename>. So it looks for the
<filename>fstab</filename> file in <filename>C:\Cygwin\etc\fstab</filename>.
The layout of this file is very similar to the layout of the
<filename>fstab</filename> file on Linux, except that the mount points
refer to Win32 paths instead of block devices. An installation with
<command>setup.exe</command> installs an <filename>fstab</filename> file by
default, which can easily be changed using the editor of your choice.</para>
<para>In addition to selecting the root partition, the
<filename>fstab</filename> file allows mounting arbitrary Win32 paths into
the POSIX file system space. A special case is the so-called cygdrive prefix.
It's the path under which every available drive in the system is mounted
under its drive letter. The default value is <filename>/cygdrive</filename>,
so you can access the drives as <filename>/cygdrive/c</filename>,
<filename>/cygdrive/d</filename>, and so on. The cygdrive prefix can be set to
some other value (<filename>/mnt</filename> for instance) in the
<filename>fstab</filename> file(s).</para>
<para>The library exports several Cygwin-specific functions that can be used
by external programs to convert a path or path list from Win32 to POSIX or vice
versa. Shell scripts and Makefiles cannot call these functions directly.
Instead, they can do the same path translations by executing the
<command>cygpath</command> utility program that we provide with Cygwin.</para>
<para>Win32 applications handle filenames in a case preserving, but case
insensitive manner. Cygwin supports case sensitivity on file systems
that support it. Since Windows XP, the OS only supports case
sensitivity when a specific registry value is changed. Therefore, case
sensitivity is not usually the default.</para>
<para>Symbolic links are neither present nor supported on Windows up to and
including Windows Server 2003 R2. Native symlinks are available starting
with Windows Vista. Due to their strange implementation, however,
they are not useful in a POSIX emulation layer. Cygwin recognizes
native symlinks, but does not create them.</para>
<para>Cygwin can create symbolic links in two different ways.
The file style symlinks are files containing a magic cookie followed by
the path to which the link points. They are marked with the System DOS
attribute so that only files with that attribute have to be read to
determine whether or not the file is a symbolic link. The shortcut style
symlinks are Windows shortcut files with a special header and the
Readonly DOS attribute set. The advantage of file symlinks is speed;
the advantage of shortcut symlinks is that they can be utilized
by non-Cygwin Win32 tools as well.</para>
<para>Starting with Cygwin 1.7, symbolic links use UTF-16 to encode
the filename of the target file, to better support internationalization.
Symlinks created by older Cygwin releases can be read just fine. However,
you could run into problems with them if you're now using a different
character set from the one you used when creating these symlinks
(see <xref linkend="setup-locale-problems"></xref>). Please note that this
new UTF-16 style of symlinks is not compatible with older Cygwin releases,
which can't read the target filename correctly.</para>
<para>Hard links are fully supported on NTFS and NFS file systems. On FAT
and other file systems which don't support hard links, the
<function>link</function> call returns with an error, just like on other
POSIX systems.</para>
<para>On file systems which don't support unique persistent file IDs (FAT,
older Samba shares) the inode number for a file is calculated by hashing its
full Win32 path. The inode number generated by the stat call always matches
the one returned in <literal>d_ino</literal> of the <literal>dirent</literal>
structure. It is worth noting that the number produced by this method is not
guaranteed to be unique. However, we have not found this to be a significant
problem because of the low probability of generating a duplicate inode number.
</para>
<para><function>chroot(2)</function> is supported since Cygwin 1.1.3.
However, chroot is not a concept known by Windows. This implies some
restrictions. First of all, the <function>chroot</function> call isn't a
privileged call. Any user may call it. Second, the chroot environment
isn't safe against native Windows processes. If you want to use a
chroot environment to, for example, allow anonymous ftp with restricted
access, you must take care that only native Cygwin applications
are accessible inside of the chroot environment. Since those applications
only use the Cygwin POSIX API to access the file system, their access
can be restricted as intended. This includes not only POSIX paths but also
Win32 paths containing drive letters and/or backslashes, as well as UNC paths
(<filename>//server/share</filename> or <filename>\\server\share</filename>).
</para>
</sect2>
<sect2 id="ov-hi-textvsbinary"><title>Text Mode vs. Binary Mode</title>
<para>It is often important that files created by native Windows
applications be interoperable with Cygwin applications. For example, a
file created by a native Windows text editor should be readable by a
Cygwin application, and vice versa.</para>
<para>Unfortunately, UNIX and Win32 have different end-of-line
conventions in text files. A UNIX text file will have a single newline
character (LF) whereas a Win32 text file will instead use a two
character sequence (CR+LF). Consequently, the two character sequence
must be translated on the fly by Cygwin into a single character newline
when reading in text mode.</para>
<para>This solution addresses the newline interoperability concern at
the expense of violating the POSIX requirement that text and binary mode
be identical. Consequently, processes that attempt to lseek through
text files can no longer rely on the number of bytes read to be an
accurate indicator of position within the file. For this reason, Cygwin
allows you to choose the mode in which a file is read in several ways.</para>
</sect2>
<sect2 id="ov-hi-ansiclib"><title>ANSI C Library</title>
<para>We chose to include Red Hat's own existing ANSI C library
"newlib" as part of the library, rather than write all of the libc
and math calls from scratch. Newlib is a BSD-derived ANSI C library,
previously only used by cross-compilers for embedded systems
development. Other functions which are not supported by newlib have
been added to the Cygwin sources using BSD implementations as much as
possible.</para>
<para>The reuse of existing free implementations of such things
as the glob, regexp, and getopt libraries saved us considerable
effort. In addition, Cygwin uses Doug Lea's free malloc
implementation that successfully balances speed and compactness. The
library accesses the malloc calls via an exported function pointer.
This makes it possible for a Cygwin process to provide its own
malloc if it so desires.</para>
</sect2>
<sect2 id="ov-hi-process"><title>Process Creation</title>
<para>The <function>fork</function> call in Cygwin is particularly interesting
because it does not map well on top of the Win32 API. This makes it very
difficult to implement correctly. Currently, the Cygwin fork is a
non-copy-on-write implementation similar to what was present in early
flavors of UNIX.</para>
<para>The first thing that happens when a parent process
forks a child process is that the parent initializes a space in the
Cygwin process table for the child. It then creates a suspended
child process using the Win32 CreateProcess call. Next, the parent
process calls setjmp to save its own context and sets a pointer to
this in a Cygwin shared memory area (shared among all Cygwin
tasks). It then fills in the child's .data and .bss sections by
copying from its own address space into the suspended child's address
space. After the child's address space is initialized, the child is
run while the parent waits on a mutex. The child discovers it has
been forked and longjumps using the saved jump buffer. The child then
sets the mutex the parent is waiting on and blocks on another mutex.
This is the signal for the parent to copy its stack and heap into the
child, after which it releases the mutex the child is waiting on and
returns from the fork call. Finally, the child wakes from blocking on
the last mutex, recreates any memory-mapped areas passed to it via the
shared area, and returns from fork itself.</para>
<para>While we have some
ideas as to how to speed up our fork implementation by reducing the
number of context switches between the parent and child process, fork
will almost certainly always be inefficient under Win32. Fortunately,
in most circumstances the spawn family of calls provided by Cygwin
can be substituted for a fork/exec pair with only a little effort.
These calls map cleanly on top of the Win32 API. As a result, they
are much more efficient. Changing the compiler's driver program to
call spawn instead of fork was a trivial change and increased
compilation speeds by twenty to thirty percent in our
tests.</para>
<para>However, spawn and exec present their own set of
difficulties. Because there is no way to do an actual exec under
Win32, Cygwin has to invent its own Process IDs (PIDs). As a
result, when a process performs multiple exec calls, there will be
multiple Windows PIDs associated with a single Cygwin PID. In some
cases, stubs of each of these Win32 processes may linger, waiting for
their exec'd Cygwin process to exit.</para>
</sect2>
<sect2 id="ov-hi-signals"><title>Signals</title>
<para>When
a Cygwin process starts, the library starts a secondary thread for
use in signal handling. This thread waits for Windows events used to
pass signals to the process. When a process notices it has a signal,
it scans its signal bitmask and handles the signal in the appropriate
fashion.</para>
<para>Several complications in the implementation arise from the
fact that the signal handler operates in the same address space as the
executing program. The immediate consequence is that Cygwin system
functions are interruptible unless special care is taken to avoid
this. We go to some lengths to prevent the sig_send function that
sends signals from being interrupted. In the case of a process
sending a signal to another process, we place a mutex around sig_send
such that sig_send will not be interrupted until it has completely
finished sending the signal.</para>
<para>In the case of a process sending
itself a signal, we use a separate semaphore/event pair instead of the
mutex. sig_send starts by resetting the event and incrementing the
semaphore that flags the signal handler to process the signal. After
the signal is processed, the signal handler signals the event that it
is done. This process keeps intraprocess signals synchronous, as
required by POSIX.</para>
<para>Most standard UNIX signals are provided. Job
control works as expected in shells that support
it.</para>
</sect2>
<sect2 id="ov-hi-sockets"><title>Sockets</title>
<para>Socket-related calls in Cygwin basically call the functions by the
same name in Winsock, Microsoft's implementation of Berkeley sockets, but
with lots of tweaks. All sockets are non-blocking under the hood so that
blocking calls can be interrupted by POSIX signals. Additional bookkeeping is
necessary to implement correct POSIX socket-sharing semantics, especially
for the select call. Some socket-related functions, such as socketpair, are
not implemented at all in Winsock. Starting with Windows Vista,
Microsoft removed the legacy calls <function>rcmd(3)</function>,
<function>rexec(3)</function> and <function>rresvport(3)</function>.
Recent versions of Cygwin now implement all these calls internally.</para>
<para>An especially troublesome feature of Winsock is that it must be
initialized before the first socket function is called. As a result, Cygwin
has to perform this initialization on the fly, as soon as the first
socket-related function is called by the application. In order to support
sockets across fork calls, child processes initialize Winsock if any
inherited file descriptor is a socket.</para>
<para>AF_UNIX (AF_LOCAL) sockets are not available in Winsock. They are
implemented in Cygwin by using local AF_INET sockets instead. This is
completely transparent to the application. Cygwin's implementation also
supports the getpeereid BSD extension. However, Cygwin does not yet support
descriptor passing.</para>
<para>IPv6 is supported beginning with Cygwin release 1.7.0. This
support is dependent, however, on the availability of the Windows IPv6
stack. The IPv6 stack was "experimental", i.e., not feature complete, in
Windows 2003 and earlier. Full IPv6 support became available starting
with Windows Vista and Windows Server 2008. Cygwin does not depend on
the underlying OS for the (newly implemented) <function>getaddrinfo</function>
and <function>getnameinfo</function> functions. Cygwin 1.7.0 adds
replacement functions which implement the full functionality for IPv4.</para>
</sect2>
<sect2 id="ov-hi-select"><title>Select</title>
<para>The UNIX <function>select</function> function is another
call that does not map cleanly on top of the Win32 API. Much to our
dismay, we discovered that the Win32 select in Winsock only worked on
socket handles. Our implementation allows select to function normally
when given different types of file descriptors (sockets, pipes,
handles, and a custom <filename>/dev/windows</filename> pseudo-device
for Windows messages).</para>
<para>Upon entry into the select function, the first
operation is to sort the file descriptors into the different types.
There are then two cases to consider. The simple case is when at
least one file descriptor is a type that is always known to be ready
(such as a disk file). In that case, select returns immediately as
soon as it has polled each of the other types to see if they are
ready. The more complex case involves waiting for socket or pipe file
descriptors to be ready. This is accomplished by the main thread
suspending itself, after starting one thread for each type of file
descriptor present. Each thread polls the file descriptors of its
respective type with the appropriate Win32 API call. As soon as a
thread identifies a ready descriptor, that thread signals the main
thread to wake up. This case is now the same as the first one since
we know at least one descriptor is ready. So select returns, after
polling all of the file descriptors one last time.</para>
</sect2>
</sect1>