<sect1 id="ov-ex-win">
<title>Quick Start Guide for those more experienced with Windows</title>
<para>
If you are new to the world of UNIX, you may find it difficult to
understand at first. This guide is not meant to be comprehensive,
so we recommend that you use the many available Internet resources
to become acquainted with UNIX basics (search for "UNIX basics" or
"UNIX tutorial").
</para>
<para>
To install a basic Cygwin environment, run the
<command>setup.exe</command> program and click <literal>Next</literal>
at each page. The default settings are correct for most users. If you
want to know more about what each option means, see
<xref linkend="internet-setup"></xref>. Use <command>setup.exe</command>
any time you want to update or install a Cygwin package. If you are
installing Cygwin for a specific purpose, use it to install the tools
that you need. For example, if you want to compile C++ programs, you
need the <systemitem>gcc-g++</systemitem> package and probably a text
editor like <systemitem>nano</systemitem>. When running
<command>setup.exe</command>, click on categories and packages in the
package installation screen to control what is installed or updated.
</para>
<para>
Another option is to install everything by clicking on the
<literal>Default</literal> field next to the <literal>All</literal>
category. However, be advised that this will download and install
several hundred megabytes of software to your computer. The best
plan is probably to click on individual categories and install either
entire categories or packages from the categories themselves.
After installation, you can find Cygwin-specific documentation in
the <literal>/usr/share/doc/Cygwin/</literal> directory.
</para>
<para>
Developers coming from a Windows background can write console or GUI
executables that rely on the Microsoft Win32 API instead of Cygwin by
passing the <literal>-mno-cygwin</literal> option to GCC. The
<command>-shared</command> option lets you build Windows Dynamic Link
Libraries (DLLs). The resource compiler <command>windres</command> is
also provided.
</para>
</sect1>

<sect1 id="ov-ex-unix">
<title>Quick Start Guide for those more experienced with UNIX</title>
<para>
If you are an experienced UNIX user who misses a powerful command-line
environment, you will enjoy Cygwin.
Developers coming from a UNIX background will find a set of utilities
they are already comfortable using, including a working UNIX shell. The
compiler tools are the standard GNU compilers most people will have previously
used under UNIX, only ported to the Windows host. Programmers wishing to port
UNIX software to Windows NT will find that the Cygwin library provides
an easy way to port many UNIX packages, with only minimal source code
changes.
</para>
<para>
Note that there are some workarounds
that cause Cygwin to behave differently from most UNIX-like operating
systems; these are described in more detail in
<xref linkend="using-effectively"></xref>.
</para>
<para>
Use the graphical command <command>setup.exe</command> any time you want
to update or install a Cygwin package. This program must be run
manually every time you want to check for updated packages since Cygwin
does not currently include a mechanism for automatically detecting
package updates.
</para>
<para>
By default, <command>setup.exe</command> only installs a minimal subset of
packages. Add any other packages by clicking on the <literal>+</literal>
next to the Category name and selecting the package from the displayed
list. You may search for specific tools by using the
<ulink url="http://cygwin.com/packages/">Setup Package Search</ulink>
at the Cygwin web site.
</para>
<para>
Another option is to install everything by clicking on the
<literal>Default</literal> field next to the <literal>All</literal>
category. However, be advised that this will download and install
several hundred megabytes of software to your computer. The best
plan is probably to click on individual categories and install either
entire categories or packages from the categories themselves.
After installation, you can find Cygwin-specific documentation in
the <literal>/usr/share/doc/Cygwin/</literal> directory.
</para>
<para>
For more information about what each option in
<command>setup.exe</command> means, see <xref
linkend="internet-setup"></xref>.
</para>

</sect1>

<sect1 id="highlights"><title>Highlights of Cygwin Functionality</title>

<sect2 id="ov-hi-intro"><title>Introduction</title> <para>When a binary linked
against the library is executed, the Cygwin DLL is loaded into the
application's text segment. Because we are trying to emulate a UNIX kernel
which needs access to all processes running under it, the first Cygwin DLL to
run creates shared memory areas and global synchronization objects that other
processes using separate instances of the DLL can access. These are used to
keep track of open file descriptors and to assist fork and exec, among other
purposes. Every process also has a per_process structure that contains
information such as process id, user id, signal masks, and other similar
process-specific information.</para>

<para>The DLL is implemented as a standard DLL in the Win32 subsystem. Under
the hood it's using the Win32 API, as well as the native NT API, where
appropriate.</para>

<para>Because processes run under the standard Win32 subsystem, they
can access both the UNIX compatibility calls provided by Cygwin and
any of the Win32 API calls. This gives the programmer complete flexibility in
designing the structure of their program in terms of the APIs used. For
example, they could write a Win32-specific GUI using Win32 API calls on top of
a UNIX back-end that uses Cygwin.</para>

<para>The native NT API is used mainly for speed, as well as to access
NT capabilities which are useful to implement certain POSIX features, but
are hidden from the Win32 API.
</para>

<para>Due to some restrictions in Windows, it's not always possible
to strictly adhere to existing UNIX standards like POSIX.1. Fortunately,
these are mostly border cases.</para>
</sect2>

<sect2 id="ov-hi-perm"><title>Permissions and Security</title>
<para>Windows NT includes a sophisticated security model based on Access
Control Lists (ACLs). Cygwin maps Win32 file ownership and permissions to
ACLs by default, on file systems supporting them (usually NTFS). Solaris
style ACLs and accompanying function calls are also supported.
The chmod call maps UNIX-style permissions back to the Win32 equivalents.
Because many programs expect to be able to find the
<filename>/etc/passwd</filename> and
<filename>/etc/group</filename> files, we provide <ulink
url="http://cygwin.com/cygwin-ug-net/using-utils.html">utilities</ulink>
that can be used to construct them from the user and group information
provided by the operating system.</para>
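<para>The round trip between <function>chmod</function> and
<function>stat</function> can be exercised from any POSIX program. The
following is a minimal sketch in Python, whose <literal>os.chmod</literal>
and <literal>os.stat</literal> wrap the corresponding C calls. The helper
name <literal>set_and_read_mode</literal> is ours, and the sketch assumes
that the temporary directory lives on a file system that stores
permissions (for Cygwin, one with ACL support).</para>

```python
import os
import stat
import tempfile


def set_and_read_mode(mode):
    """Create a scratch file, chmod it, and return the permission bits stat reports."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        os.chmod(path, mode)
        # S_IMODE strips the file type and keeps only the permission bits.
        return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(oct(set_and_read_mode(0o640)))
```

<para>If the permission mapping works as described, the value read back
equals the value that was set.</para>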

<para>Users with Administrator rights are permitted to chown files.
With version 1.1.3 Cygwin introduced a mechanism for setting real and
effective UIDs. This is described in <xref linkend="ntsec"></xref>. As
of version 1.5.13, the Cygwin developers are not aware of any feature in
the Cygwin DLL that would allow users to gain privileges or to access
objects to which they have no rights under Windows. However, there is no
guarantee that Cygwin is as secure as the Windows it runs on. Cygwin
processes share some variables and are thus easier targets for
denial-of-service attacks.
</para>

</sect2>

<sect2 id="ov-hi-files"><title>File Access</title> <para>Cygwin supports
both POSIX- and Win32-style paths, using either forward or back slashes as the
directory delimiter. Paths coming into the DLL are translated from POSIX to
native NT as needed. From the application perspective, the file system is
a POSIX-compliant one. The implementation details are safely hidden in the
Cygwin DLL. UNC pathnames (starting with two slashes) are supported for
network paths.</para>

<para>Since version 1.7.0, the layout of this POSIX view of the Windows file
system space is stored in the <filename>/etc/fstab</filename> file. Actually,
there is a system-wide <filename>/etc/fstab</filename> file as well as a
user-specific fstab file <filename>/etc/fstab.d/${USER}</filename>.</para>

<para>At startup the DLL has to find out where it can find the
<filename>/etc/fstab</filename> file. The mechanism used for this is simple.
First it retrieves its own path, for instance
<filename>C:\Cygwin\bin\cygwin1.dll</filename>. From there it deduces
that the root path is <filename>C:\Cygwin</filename>. So it looks for the
<filename>fstab</filename> file in <filename>C:\Cygwin\etc\fstab</filename>.
The layout of this file is very similar to the layout of the
<filename>fstab</filename> file on Linux. The only difference is that,
instead of block devices, the mount points point to Win32 paths. An
installation with <command>setup.exe</command> installs an
<filename>fstab</filename> file by default, which can easily be changed
using the editor of your choice.</para>
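<para>The deduction described above amounts to stripping the file name and
the <filename>bin</filename> directory from the DLL path. The following
Python sketch illustrates only that path arithmetic; the function name
<literal>fstab_path_from_dll</literal> is hypothetical, and the DLL itself
is of course not written in Python.</para>

```python
from pathlib import PureWindowsPath


def fstab_path_from_dll(dll_path):
    """Derive the fstab location from the DLL path: drop the file name and bin directory."""
    dll = PureWindowsPath(dll_path)
    root = dll.parent.parent          # C:\Cygwin\bin\cygwin1.dll -> C:\Cygwin
    return root / "etc" / "fstab"


if __name__ == "__main__":
    print(fstab_path_from_dll(r"C:\Cygwin\bin\cygwin1.dll"))
```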

<para>In addition to selecting the root partition, the
<filename>fstab</filename> file allows mounting arbitrary Win32 paths into
the POSIX file system space. A special case is the so-called cygdrive prefix.
It's the path under which every available drive in the system is mounted
under its drive letter. The default value is <filename>/cygdrive</filename>,
so you can access the drives as <filename>/cygdrive/c</filename>,
<filename>/cygdrive/d</filename>, etc. The cygdrive prefix can be set to
some other value (<filename>/mnt</filename> for instance) in the
<filename>fstab</filename> file(s).</para>
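<para>To make the cygdrive convention concrete, here is a simplified Python
sketch that maps a path under the cygdrive prefix to a Win32 drive path.
It is an illustration only: the function name
<literal>posix_to_win32</literal> is ours, and the real DLL consults the
whole mount table rather than just the prefix.</para>

```python
def posix_to_win32(posix_path, cygdrive_prefix="/cygdrive"):
    """Translate PREFIX/LETTER/rest into LETTER:\\rest; return None for other paths."""
    prefix = cygdrive_prefix.rstrip("/") + "/"
    if not posix_path.startswith(prefix):
        return None
    rest = posix_path[len(prefix):]
    drive, _, tail = rest.partition("/")
    # The first component after the prefix must be a single drive letter.
    if len(drive) != 1 or not drive.isalpha():
        return None
    return drive.upper() + ":\\" + tail.replace("/", "\\")


if __name__ == "__main__":
    print(posix_to_win32("/cygdrive/c/Windows/System32"))
    print(posix_to_win32("/mnt/d/data", cygdrive_prefix="/mnt"))
```

<para>Paths outside the prefix, such as <filename>/usr/bin</filename>,
yield <literal>None</literal> in this sketch because resolving them
requires the other mount points.</para>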

<para>The library exports several Cygwin-specific functions that can be used
by external programs to convert a path or path list from Win32 to POSIX or vice
versa. Shell scripts and Makefiles cannot call these functions directly.
Instead, they can do the same path translations by executing the
<command>cygpath</command> utility program that we provide with Cygwin.</para>

<para>Win32 applications handle filenames in a case preserving, but case
insensitive manner. Cygwin supports case sensitivity on file systems
that support it. Since Windows XP, the OS only supports case
sensitivity when a specific registry value is changed. Therefore, case
sensitivity is not usually the default.</para>

<para>Symbolic links are neither present nor supported on Windows up to and
including Windows Server 2003 R2. Native symlinks are available starting
with Windows Vista. Due to their strange implementation, however,
they are not useful in a POSIX emulation layer. Cygwin recognizes
native symlinks, but does not create them.</para>

<para>Symbolic links are potentially created in two different ways.
The file style symlinks are files containing a magic cookie followed by
the path to which the link points. They are marked with the System DOS
attribute so that only files with that attribute have to be read to
determine whether or not the file is a symbolic link. The shortcut style
symlinks are Windows shortcut files with a special header and the
Readonly DOS attribute set. The advantage of file symlinks is speed;
the advantage of shortcut symlinks is that they can be utilized
by non-Cygwin Win32 tools as well.</para>

<para>Starting with Cygwin 1.7, symbolic links use UTF-16 to encode
the filename of the target file, to better support internationalization.
Symlinks created by older Cygwin releases can be read just fine. However,
you could run into problems with them if you're now using a different
character set from the one you used when creating these symlinks
(see <xref linkend="setup-locale-problems"></xref>). Please note that this
new UTF-16 style of symlinks is not compatible with older Cygwin releases,
which can't read the target filename correctly.</para>

<para>Hard links are fully supported on NTFS and NFS file systems. On FAT
and other file systems which don't support hardlinks, the call returns with
an error, just like on other POSIX systems.</para>
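<para>A portable program can therefore simply attempt the link and handle
the error. The sketch below does this in Python, whose
<literal>os.link</literal> wraps the <function>link</function> call; the
function name <literal>try_hard_link</literal> and the file names are
illustrative assumptions.</para>

```python
import os
import tempfile


def try_hard_link(directory):
    """Create a file and a hard link to it; return the link count, or None if link fails."""
    source = os.path.join(directory, "original.txt")
    alias = os.path.join(directory, "alias.txt")
    with open(source, "w") as handle:
        handle.write("data\n")
    try:
        os.link(source, alias)
    except OSError as err:
        # On file systems without hard link support the call fails with an error.
        print("link failed:", err.strerror)
        return None
    # Both names now refer to the same file, so the link count covers both.
    return os.stat(source).st_nlink


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(try_hard_link(scratch))
```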

<para>On file systems which don't support unique persistent file IDs (FAT,
older Samba shares) the inode number for a file is calculated by hashing its
full Win32 path. The inode number generated by the stat call always matches
the one returned in <literal>d_ino</literal> of the <literal>dirent</literal>
structure. It is worth noting that the number produced by this method is not
guaranteed to be unique. However, we have not found this to be a significant
problem because of the low probability of generating a duplicate inode number.
</para>
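<para>The idea of deriving an inode number from a path can be sketched as
follows. Note the assumption: the text above does not say which hash
function Cygwin uses, so this example substitutes the well-known 64-bit
FNV-1a hash purely for illustration, and the name
<literal>hashed_inode</literal> is hypothetical.</para>

```python
FNV_OFFSET = 14695981039346656037   # 64-bit FNV-1a offset basis
FNV_PRIME = 1099511628211           # 64-bit FNV prime


def hashed_inode(win32_path):
    """Return a 64-bit FNV-1a hash of the path, standing in for an inode number."""
    value = FNV_OFFSET
    for byte in win32_path.encode("utf-8"):
        value ^= byte
        value = (value * FNV_PRIME) % 2**64   # keep the result within 64 bits
    return value


if __name__ == "__main__":
    print(hashed_inode(r"C:\Cygwin\etc\fstab"))
```

<para>The same path always yields the same number, which is why the values
seen through <function>stat</function> and <literal>d_ino</literal> can
agree. Because arbitrarily many paths are mapped onto a fixed number of
bits, two different paths can in principle receive the same number.</para>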

<para><function>chroot(2)</function> is supported since Cygwin 1.1.3.
However, chroot is not a concept known by Windows. This implies some
restrictions. First of all, the <function>chroot</function> call isn't a
privileged call. Any user may call it. Second, the chroot environment
isn't safe against native Windows processes. If you want to use a
chroot environment to, for example, allow anonymous ftp with restricted
access, you must make sure that only native Cygwin applications
are accessible inside of the chroot environment. Since those applications
only use the Cygwin POSIX API to access the file system, their access
can be restricted as intended. This includes not only POSIX paths but
also Win32 paths containing drive letters and/or backslashes as well as
UNC paths
(<filename>//server/share</filename> or <filename>\\server\share</filename>).
</para>
</sect2>

<sect2 id="ov-hi-textvsbinary"><title>Text Mode vs. Binary Mode</title>
<para>It is often important that files created by native Windows
applications be interoperable with Cygwin applications. For example, a
file created by a native Windows text editor should be readable by a
Cygwin application, and vice versa.</para>

<para>Unfortunately, UNIX and Win32 have different end-of-line
conventions in text files. A UNIX text file will have a single newline
character (LF) whereas a Win32 text file will instead use a two
character sequence (CR+LF). Consequently, the two character sequence
must be translated on the fly by Cygwin into a single character newline
when reading in text mode.</para>

<para>This solution addresses the newline interoperability concern at
the expense of violating the POSIX requirement that text and binary mode
be identical. Consequently, processes that attempt to lseek through
text files can no longer rely on the number of bytes read to be an
accurate indicator of position within the file. For this reason, Cygwin
offers several ways to choose the mode in which a file is read.</para>
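<para>The effect on file offsets is easy to demonstrate. The following
Python sketch models the translation on an in-memory byte string; the
function names are ours, and it is a model of the behavior described
above rather than Cygwin's code.</para>

```python
def read_text_mode(raw):
    """Return the bytes a text-mode reader would see: each CR+LF pair becomes LF."""
    return raw.replace(b"\r\n", b"\n")


def offsets_disagree(raw):
    """True when the translated length no longer matches the length on disk."""
    return len(read_text_mode(raw)) != len(raw)


if __name__ == "__main__":
    on_disk = b"one\r\ntwo\r\n"
    seen = read_text_mode(on_disk)
    print(len(on_disk), len(seen), offsets_disagree(on_disk))
```

<para>For a file with Win32 line endings the reader receives fewer bytes
than the file occupies on disk, so a byte count taken from the read side
cannot be used as a seek offset. For a file with UNIX line endings the
two counts agree.</para>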
</sect2>

<sect2 id="ov-hi-ansiclib"><title>ANSI C Library</title>
<para>We chose to include Red Hat's own existing ANSI C library
"newlib" as part of the library, rather than write all of the lib C
and math calls from scratch. Newlib is a BSD-derived ANSI C library,
previously only used by cross-compilers for embedded systems
development. Other functions that are not supported by newlib have
been added to the Cygwin sources using BSD implementations as much as
possible.</para>

<para>The reuse of existing free implementations of such things
as the glob, regexp, and getopt libraries saved us considerable
effort. In addition, Cygwin uses Doug Lea's free malloc
implementation that successfully balances speed and compactness. The
library accesses the malloc calls via an exported function pointer.
This makes it possible for a Cygwin process to provide its own
malloc if it so desires.</para>
</sect2>

<sect2 id="ov-hi-process"><title>Process Creation</title>
<para>The <function>fork</function> call in Cygwin is particularly interesting
because it does not map well on top of the Win32 API. This makes it very
difficult to implement correctly. Currently, the Cygwin fork is a
non-copy-on-write implementation similar to what was present in early
flavors of UNIX.</para>

<para>The first thing that happens when a parent process
forks a child process is that the parent initializes a space in the
Cygwin process table for the child. It then creates a suspended
child process using the Win32 CreateProcess call. Next, the parent
process calls setjmp to save its own context and sets a pointer to
this in a Cygwin shared memory area (shared among all Cygwin
tasks). It then fills in the child's .data and .bss sections by
copying from its own address space into the suspended child's address
space. After the child's address space is initialized, the child is
run while the parent waits on a mutex. The child discovers it has
been forked and longjumps using the saved jump buffer. The child then
sets the mutex the parent is waiting on and blocks on another mutex.
This is the signal for the parent to copy its stack and heap into the
child, after which it releases the mutex the child is waiting on and
returns from the fork call. Finally, the child wakes from blocking on
the last mutex, recreates any memory-mapped areas passed to it via the
shared area, and returns from fork itself.</para>
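<para>The ordering of the steps above can be replayed with a toy model.
The Python sketch below uses two threads in place of the two processes
and two <literal>threading.Event</literal> objects in place of the two
mutexes; no memory is actually copied, and all names are illustrative
assumptions. It only shows which side waits for which.</para>

```python
import threading


def simulate_fork_handshake():
    """Replay the parent/child ordering described above; return the recorded steps."""
    log = []
    child_ready = threading.Event()    # stands in for the mutex the parent waits on
    parent_done = threading.Event()    # stands in for the mutex the child blocks on

    def child():
        log.append("child: longjmp to saved context")
        child_ready.set()              # wake the parent
        parent_done.wait()             # block until stack and heap are copied
        log.append("child: recreate mmaps, return from fork")

    log.append("parent: reserve process table slot")
    log.append("parent: CreateProcess (suspended)")
    log.append("parent: setjmp, publish context pointer")
    log.append("parent: copy .data and .bss")
    worker = threading.Thread(target=child)
    worker.start()                     # "resume" the suspended child
    child_ready.wait()
    log.append("parent: copy stack and heap")
    parent_done.set()
    worker.join()                      # model only: keeps the log order deterministic
    log.append("parent: return from fork")
    return log


if __name__ == "__main__":
    for step in simulate_fork_handshake():
        print(step)
```

<para>In the description above the parent returns from fork without
waiting for the child to finish; the <literal>join</literal> call exists
in the model only so that the recorded log is complete.</para>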

<para>While we have some
ideas as to how to speed up our fork implementation by reducing the
number of context switches between the parent and child process, fork
will almost certainly always be inefficient under Win32. Fortunately,
in most circumstances the spawn family of calls provided by Cygwin
can be substituted for a fork/exec pair with only a little effort.
These calls map cleanly on top of the Win32 API. As a result, they
are much more efficient. Changing the compiler's driver program to
call spawn instead of fork was a trivial change and increased
compilation speeds by twenty to thirty percent in our
tests.</para>

<para>However, spawn and exec present their own set of
difficulties. Because there is no way to do an actual exec under
Win32, Cygwin has to invent its own Process IDs (PIDs). As a
result, when a process performs multiple exec calls, there will be
multiple Windows PIDs associated with a single Cygwin PID. In some
cases, stubs of each of these Win32 processes may linger, waiting for
their exec'd Cygwin process to exit.</para>
</sect2>

<sect2 id="ov-hi-signals"><title>Signals</title>
<para>When
a Cygwin process starts, the library starts a secondary thread for
use in signal handling. This thread waits for Windows events used to
pass signals to the process. When a process notices it has a signal,
it scans its signal bitmask and handles the signal in the appropriate
fashion.</para>

<para>Several complications in the implementation arise from the
fact that the signal handler operates in the same address space as the
executing program. The immediate consequence is that Cygwin system
functions are interruptible unless special care is taken to avoid
this. We go to some lengths to prevent the sig_send function that
sends signals from being interrupted. In the case of a process
sending a signal to another process, we place a mutex around sig_send
such that sig_send will not be interrupted until it has completely
finished sending the signal.</para>

<para>In the case of a process sending
itself a signal, we use a separate semaphore/event pair instead of the
mutex. sig_send starts by resetting the event and incrementing the
semaphore that flags the signal handler to process the signal. After
the signal is processed, the signal handler signals the event that it
is done. This process keeps intraprocess signals synchronous, as
required by POSIX.</para>
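<para>The semaphore/event pairing can be modeled in a few lines. The
Python sketch below is a toy model under our own assumptions: a worker
thread plays the signal-handling thread, "processing" a signal just
records its number, and the class and attribute names are hypothetical.
The <literal>sig_send</literal> method follows the sequence described
above.</para>

```python
import threading


class SelfSignal:
    """Toy model: a semaphore flags pending work, an event reports completion."""

    def __init__(self):
        self.pending = threading.Semaphore(0)
        self.done = threading.Event()
        self.queue = []
        self.handled = []
        self.thread = threading.Thread(target=self._handler_loop, daemon=True)
        self.thread.start()

    def _handler_loop(self):
        while True:
            self.pending.acquire()           # wait until sig_send flags a signal
            signo = self.queue.pop(0)
            if signo is None:                # shutdown marker used by stop()
                return
            self.handled.append(signo)       # "process" the signal
            self.done.set()                  # tell sig_send we are finished

    def sig_send(self, signo):
        self.done.clear()                    # reset the event
        self.queue.append(signo)
        self.pending.release()               # increment the semaphore
        self.done.wait()                     # return only after handling

    def stop(self):
        self.queue.append(None)
        self.pending.release()
        self.thread.join()


if __name__ == "__main__":
    model = SelfSignal()
    model.sig_send(10)
    print(model.handled)
    model.stop()
```

<para>Because <literal>sig_send</literal> waits on the event before
returning, the caller can rely on the signal having been handled by the
time the call completes, which is the synchronous behavior the text
refers to.</para>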

<para>Most standard UNIX signals are provided. Job
control works as expected in shells that support
it.</para>
</sect2>

<sect2 id="ov-hi-sockets"><title>Sockets</title>
<para>Socket-related calls in Cygwin basically call the functions by the
same name in Winsock, Microsoft's implementation of Berkeley sockets, but
with lots of tweaks. All sockets are non-blocking under the hood so that
blocking calls can be interrupted by POSIX signals. Additional bookkeeping
is necessary to implement correct POSIX socket-sharing semantics, especially
for the select call. Some socket-related functions, such as socketpair, are
not implemented at all in Winsock. Starting with Windows Vista,
Microsoft removed the legacy calls <function>rcmd(3)</function>,
<function>rexec(3)</function> and <function>rresvport(3)</function>.
Recent versions of Cygwin now implement all these calls internally.</para>

<para>An especially troublesome feature of Winsock is that it must be
initialized before the first socket function is called. As a result, Cygwin
has to perform this initialization on the fly, as soon as the first
socket-related function is called by the application. In order to support
sockets across fork calls, child processes initialize Winsock if any
inherited file descriptor is a socket.</para>

<para>AF_UNIX (AF_LOCAL) sockets are not available in Winsock. They are
implemented in Cygwin by using local AF_INET sockets instead. This is
completely transparent to the application. Cygwin's implementation also
supports the getpeereid BSD extension. However, Cygwin does not yet support
descriptor passing.</para>
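<para>The general technique of building a local connection out of AF_INET
sockets can be sketched as follows. This Python example is an
illustration of the idea only, not Cygwin's implementation; the function
name <literal>loopback_socketpair</literal> is ours, and the sketch
assumes that the loopback interface is available.</para>

```python
import socket


def loopback_socketpair():
    """Build a connected socket pair from local AF_INET sockets."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))      # port 0: let the system pick a free port
        listener.listen(1)
        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        client.connect(listener.getsockname())
        server, _ = listener.accept()
    finally:
        listener.close()                     # the listening socket is no longer needed
    return client, server


if __name__ == "__main__":
    left, right = loopback_socketpair()
    left.sendall(b"ping")
    print(right.recv(4))
    left.close()
    right.close()
```

<para>Binding to the loopback address keeps the connection local to the
machine, which is the property a local socket needs.</para>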

<para>IPv6 is supported beginning with Cygwin release 1.7.0. This
support is dependent, however, on the availability of the Windows IPv6
stack. The IPv6 stack was "experimental", that is, not feature complete,
in Windows 2003 and earlier. Full IPv6 support became available starting
with Windows Vista and Windows Server 2008. Cygwin does not depend on
the underlying OS for the (newly implemented) <function>getaddrinfo</function>
and <function>getnameinfo</function> functions. Cygwin 1.7.0 adds
replacement functions which implement the full functionality for IPv4.</para>

</sect2>

<sect2 id="ov-hi-select"><title>Select</title>
<para>The UNIX <function>select</function> function is another
call that does not map cleanly on top of the Win32 API. Much to our
dismay, we discovered that the Win32 select in Winsock only worked on
socket handles. Our implementation allows select to function normally
when given different types of file descriptors (sockets, pipes,
handles, and a custom <filename>/dev/windows</filename> pseudo-device
for Windows messages).</para>

<para>Upon entry into the select function, the first
operation is to sort the file descriptors into the different types.
There are then two cases to consider. The simple case is when at
least one file descriptor is a type that is always known to be ready
(such as a disk file). In that case, select returns immediately as
soon as it has polled each of the other types to see if they are
ready. The more complex case involves waiting for socket or pipe file
descriptors to be ready. This is accomplished by the main thread
suspending itself, after starting one thread for each type of file
descriptor present. Each thread polls the file descriptors of its
respective type with the appropriate Win32 API call. As soon as a
thread identifies a ready descriptor, that thread signals the main
thread to wake up. This case is now the same as the first one since
we know at least one descriptor is ready. So select returns, after
polling all of the file descriptors one last time.</para>
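<para>From the application's point of view, the result is that a single
call can watch descriptors of different types. The Python sketch below
shows such a call from the caller's side, mixing a pipe and a socket; it
does not model the internal threads described above. The helper name
<literal>ready_for_reading</literal> is ours, and
<literal>select.select</literal> wraps the <function>select</function>
call.</para>

```python
import os
import select
import socket


def ready_for_reading(descriptors, timeout=0):
    """Return the subset of descriptors that select reports as readable."""
    readable, _, _ = select.select(descriptors, [], [], timeout)
    return readable


if __name__ == "__main__":
    pipe_read, pipe_write = os.pipe()
    sock_a, sock_b = socket.socketpair()
    print(ready_for_reading([pipe_read, sock_a]))     # nothing written yet
    os.write(pipe_write, b"x")
    print(ready_for_reading([pipe_read, sock_a], timeout=1))
    for fd in (pipe_read, pipe_write):
        os.close(fd)
    sock_a.close()
    sock_b.close()
```

<para>A timeout of zero makes the call a pure poll, which mirrors the
final polling pass mentioned above.</para>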
</sect2>
</sect1>