<sect1 id="setup-env"><title>Environment Variables</title>

<para>
You may wish to specify settings of several important environment
variables that affect Cygwin's operation. Some of these settings need
to be in effect prior to launching the initial Cygwin session (before
starting your bash shell, for instance). They should therefore be set
in the Windows environment; all Windows environment variables are
imported when Cygwin starts. Such settings can be
placed in a .bat file. An initial file is named Cygwin.bat and is created
in the Cygwin root directory that you specified during setup. Note that
the "Cygwin" option of the Start Menu points to Cygwin.bat. Edit
Cygwin.bat to your liking or create your own .bat files to start
Cygwin processes.</para>

<para>
The <envar>CYGWIN</envar> variable is used to configure many global
settings for the Cygwin runtime system. Initially you can leave
<envar>CYGWIN</envar> unset or set it to <literal>tty</literal> (e.g.,
to support job control with ^Z) using a syntax like this in the
DOS shell, before launching bash:</para>

<screen>
<prompt>C:\></prompt> <userinput>set CYGWIN=tty notitle glob</userinput>
</screen>

<para>
Locale support is controlled by the <envar>LANG</envar> and
<envar>LC_xxx</envar> environment variables. You can set all of them
but Cygwin itself only honors the variables <envar>LC_ALL</envar>,
<envar>LC_CTYPE</envar>, and <envar>LANG</envar>, in this order, according
to the POSIX standard. The first one that is set takes precedence.
For a more detailed description see <xref linkend="setup-locale"></xref>.
</para>

<para>
The <envar>PATH</envar> environment variable is used by Cygwin
applications as a list of directories to search for executable files
to run. This environment variable is converted from Windows format
(e.g., <filename>C:\Windows\system32;C:\Windows</filename>) to UNIX format
(e.g., <filename>/cygdrive/c/Windows/system32:/cygdrive/c/Windows</filename>)
when a Cygwin process first starts.
Set it so that it contains at least the <filename>x:\cygwin\bin</filename>
directory, where <filename>x:\cygwin</filename> is the "root" of your
Cygwin installation, if you wish to use Cygwin tools outside of bash.
This is usually done by the batch file you're starting your shell with.
</para>
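<para>
To see the conversion described above for yourself, the following sketch
feeds a Windows-style search path to the <command>cygpath</command>
utility. It assumes the default <filename>/cygdrive</filename> prefix and
uses the example directories from this section, not your actual
<envar>PATH</envar>:
<screen>
# -u requests UNIX format, -p treats the argument as a path list
cygpath -u -p 'C:\Windows\system32;C:\Windows'

# Show the already-converted search path, one directory per line
echo "$PATH" | tr ':' '\n'
</screen>
</para>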

<para>
The <envar>HOME</envar> environment variable is used by many programs to
determine the location of your home directory and we recommend that it be
defined. This environment variable is also converted from Windows format
when a Cygwin process first starts. It's usually set in the shell
profile scripts in the /etc directory.
</para>

<para>
The <envar>TERM</envar> environment variable specifies your terminal
type. It is automatically set to <literal>cygwin</literal> if you have
not set it to something else.
</para>

<para>The <envar>LD_LIBRARY_PATH</envar> environment variable is used by
the Cygwin function <function>dlopen ()</function> as a list of
directories to search for .dll files to load. This environment variable
is converted from Windows format to UNIX format when a Cygwin process
first starts. Most Cygwin applications do not make use of the
<function>dlopen ()</function> call and do not need this variable.
</para>

<para>
In addition to <envar>PATH</envar>, <envar>HOME</envar>,
and <envar>LD_LIBRARY_PATH</envar>, there are three other environment
variables which, if they exist in the Windows environment, are
converted to UNIX format: <envar>TMPDIR</envar>, <envar>TMP</envar>,
and <envar>TEMP</envar>. The first is not set by default in the
Windows environment but the other two are, and they point to the
default Windows temporary directory. If set, these variables will be
used by some Cygwin applications, possibly with unexpected results.
You may therefore want to unset them by adding the following two lines
to your <filename>~/.bashrc</filename> file:

<screen>
unset TMP
unset TEMP
</screen>

This is done in the default <filename>~/.bashrc</filename> file.
Alternatively, you could set <envar>TMP</envar>
and <envar>TEMP</envar> to point to <filename>/tmp</filename> or to
any other temporary directory of your choice. For example:

<screen>
export TMP=/tmp
export TEMP=/tmp
</screen>
</para>

</sect1>

<sect1 id="setup-maxmem"><title>Changing Cygwin's Maximum Memory</title>

<para>
Cygwin's heap is extensible. However, it does start out at a fixed size
and attempts to extend it may run into memory which has been previously
allocated by Windows. In some cases, this problem can be solved by
adding an entry in either the <literal>HKEY_LOCAL_MACHINE</literal>
(to change the limit for all users) or
<literal>HKEY_CURRENT_USER</literal> (for just the current user) section
of the registry.</para>

<para>
Add the <literal>DWORD</literal> value <literal>heap_chunk_in_mb</literal>
and set it to the desired memory limit in decimal MB. The preferred way to
do this is from within Cygwin, using the <command>regtool</command> program
included in the Cygwin package.
(For more information about <command>regtool</command> or the other Cygwin
utilities, see <xref linkend="using-utils"></xref> or use the
<literal>--help</literal> option of each utility.) You should always be careful
when using <command>regtool</command> since damaging your system registry can
result in an unusable system. This example sets the memory limit to 1024 MB:

<screen>
regtool -i set /HKLM/Software/Cygwin/heap_chunk_in_mb 1024
regtool -v list /HKLM/Software/Cygwin
</screen>
</para>
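<para>
As a sketch of the per-user alternative mentioned above, the same value
could be stored under <literal>HKEY_CURRENT_USER</literal> instead. The
1024 MB figure is only an illustration, and the commands assume that
<command>regtool</command> accepts the <literal>/HKCU</literal>
abbreviation in the same way as <literal>/HKLM</literal>:
<screen>
# Create the key first; only needed if it does not exist yet
regtool add /HKCU/Software/Cygwin
# Set the limit for the current user only
regtool -i set /HKCU/Software/Cygwin/heap_chunk_in_mb 1024
# Read the value back to confirm it
regtool get /HKCU/Software/Cygwin/heap_chunk_in_mb
# Remove the value again to return to the default heap size
regtool unset /HKCU/Software/Cygwin/heap_chunk_in_mb
</screen>
</para>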

<para>
Exit all running Cygwin processes and restart them. Memory can be allocated up
to the size of the system swap space minus the size of any running
processes. The system swap should be at least as large as the physically
installed RAM and can be modified under the System category of the
Control Panel.
</para>

<para>
Here is a small program written by DJ Delorie that tests the
memory allocation limit on your system:

<screen>
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

int main(void)
{
  /* Start with a 1 GB request and halve it after every attempt. */
  unsigned int bit=0x40000000, sum=0;
  char *x;

  while (bit > 4096)
  {
    /* Successful allocations are never freed, so the total keeps growing. */
    x = malloc(bit);
    if (x)
      sum += bit;
    bit >>= 1;
  }
  printf("%08x bytes (%.1fMB)\n", sum, sum/1024.0/1024.0);
  return 0;
}
</screen>

You can compile this program using:
<screen>
gcc max_memory.c -o max_memory.exe
</screen>

Run the program and it will output the maximum amount of allocatable memory.
</para>

</sect1>

<sect1 id="setup-locale"><title>Internationalization</title>

<sect2 id="setup-locale-ov"><title>Overview</title>

<para>
Internationalization support is controlled by the <envar>LANG</envar> and
<envar>LC_xxx</envar> environment variables. You can set all of them
but Cygwin itself only honors the variables <envar>LC_ALL</envar>,
<envar>LC_CTYPE</envar>, and <envar>LANG</envar>, in this order, according
to the POSIX standard. The content of these variables should follow the
POSIX standard for a locale specifier. The correct form of a locale
specifier is</para>

<screen>
language[[_TERRITORY][.charset][@modifier]]
</screen>

<para>"language" is a lowercase two character string per ISO 639-1, or,
if there is no ISO 639-1 code for the language (for instance, "Lower Sorbian"),
a three character string per ISO 639-3.</para>

<para>"TERRITORY" is an uppercase two character string per ISO 3166, charset is
one of a list of supported character sets, and the modifier doesn't matter
here (though it might for some applications). If you're interested in the
exact description, you can find it in the online publication of the POSIX
manual pages on the homepage of the
<ulink url="http://www.opengroup.org/">Open Group</ulink>.</para>

<para>Typical locale specifiers are</para>

<screen>
"de_CH"          language = German,  territory = Switzerland, default charset
"fr_FR.UTF-8"    language = French,  territory = France,      charset = UTF-8
"ko_KR.eucKR"    language = Korean,  territory = South Korea, charset = eucKR
"syr_SY"         language = Syriac,  territory = Syria,       default charset
</screen>

<para>
At application startup, the application's locale is set to the default
"C" or "POSIX" locale. Under Cygwin 1.7.2 and later, this locale defaults
to the ASCII character set on the application level. If you want to stick
to the "C" locale and only change to another charset, you can define this
by setting one of the locale environment variables to "C.charset". For
instance</para>

<screen>
"C.ISO-8859-1"
</screen>

<note><para>The default locale in the absence of the aforementioned locale
environment variables is "C.UTF-8".</para></note>

<para>Windows uses the UTF-16 charset exclusively to store the names
of any object used by the Operating System. This is especially important
with filenames. Cygwin uses the setting of the locale environment variables
<envar>LC_ALL</envar>, <envar>LC_CTYPE</envar>, and <envar>LANG</envar>, to
determine how to convert Windows filenames from their UTF-16 representation
to the singlebyte or multibyte character set used by Cygwin.</para>

<para>
The setting of the locale environment variables at process startup
is effective for Cygwin's internal conversions to and from the Windows UTF-16
object names for the entire lifetime of the current process. Changing
the environment variables to another value changes the way filenames are
converted in subsequently started child processes, but not within the same
process.</para>
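<para>
The difference between the current process and its children can be
sketched in bash as follows. The locale names and the
<command>ls</command> call are arbitrary examples:
<screen>
# The running bash keeps the filename conversion it started with,
# but every process started from now on uses the new setting
export LANG=ja_JP.UTF-8
ls

# Apply a setting to a single child process only
LANG=de_DE.ISO-8859-1 ls
</screen>
</para>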

<para>
However, even if one of the locale environment variables is set to
some other value than "C", this <emphasis>only</emphasis> affects
how Cygwin itself converts filenames. As the POSIX standard requires,
it's the application's responsibility to activate that locale for its
own purposes, typically by using the call</para>

<screen>
setlocale (LC_ALL, "");
</screen>

<para>early in the application code. Again, so that this doesn't get
lost: If the application calls <function>setlocale</function> as above, and
none of the important locale variables is set in the environment, the locale
is set to the default locale, which is "C.UTF-8".</para>

<para>But what about applications which are not locale-aware? Per POSIX,
they are running in the "C" or "POSIX" locale, which implies the ASCII
charset. The Cygwin DLL itself, however, will nevertheless use the locale
set in the environment (or the "C.UTF-8" default locale) for converting
filenames etc.</para>

<para>When the locale in the environment specifies an ASCII charset,
for example "C" or "en_US.ASCII", Cygwin will still use UTF-8
under the hood to translate filenames. This allows for easier
interoperability with applications running in the default "C.UTF-8" locale.
</para>

<para>
Starting with Cygwin 1.7.2, the language and territory are used to
fetch locale-dependent information from Windows. If the language and
territory are not known to Windows, the <function>setlocale</function>
function fails.</para>

<para>The modifier is used for two cases.</para>

<itemizedlist mark="bullet">

<listitem><para>For languages which default to one of the ISO-8859 character
sets, the modifier "@euro" can be added to enforce usage of the ISO-8859-15
character set, which includes a character for the "Euro" currency sign.</para>
</listitem>

<listitem><para>There's a class of characters in the Unicode character set,
called the "CJK Ambiguous Width Character set". For these characters the width
returned by the wcwidth/wcswidth function is usually 1. This is often a
problem in East-Asian languages, which historically use character sets in
which these characters have a width of 2. By default, the wcwidth/wcswidth
functions return 1 as the width of these characters, except if the language is
specified as "ja" (Japanese), "ko" (Korean), or "zh" (Chinese). In these
languages wcwidth and wcswidth return 2 for these characters. This is not
correct in all circumstances, so the user of one of these languages can specify
the modifier "@cjknarrow", which modifies the behaviour of wcwidth/wcswidth to
return 1 for the ambiguous width characters.</para>
</listitem>

</itemizedlist>
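<para>
As an illustration, either modifier is simply appended to the locale
specifier. The languages and territories below are arbitrary choices, and
the two settings are alternatives rather than a sequence:
<screen>
# German locale using ISO-8859-15, which includes the Euro sign
export LANG=de_DE@euro

# Japanese UTF-8 locale in which ambiguous-width characters count as 1
export LANG=ja_JP.UTF-8@cjknarrow
</screen>
</para>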

</sect2>

<sect2 id="setup-locale-how"><title>How to set the locale</title>

<itemizedlist mark="bullet">

<listitem><para>
Assume that you've set one of the aforementioned environment variables to some
valid POSIX locale value, other than "C" and "POSIX". Assume further that
you're living in Japan. You might want to use the language code "ja" and the
territory "JP", thus setting, say, <envar>LANG</envar> to "ja_JP". You didn't
set a character set, so what will Cygwin use now? Starting with Cygwin 1.7.2,
the default character set is determined by the default Windows ANSI codepage
for this language and territory. Cygwin uses a character set which is the
typical Unix-equivalent to the Windows ANSI codepage. For instance:</para>

<screen>
"en_US"         ISO-8859-1
"el_GR"         ISO-8859-7
"pl_PL"         ISO-8859-2
"pl_PL@euro"    ISO-8859-15
"ja_JP"         EUCJP
"ko_KR"         EUCKR
"te_IN"         UTF-8
</screen>
</listitem>

<listitem><para>
You don't want to use the default character set? In that case you have to
specify the charset explicitly. For instance, assume you're from Japan and
don't want to use the Japanese default charset EUC-JP, but the Windows
default charset SJIS. What you can do, for instance, is to set the
<envar>LANG</envar> variable in the <filename>C:\cygwin\Cygwin.bat</filename>
file, which is the batch file that starts a Cygwin session from the "Cygwin"
desktop shortcut.</para>

<screen>
@echo off

C:
chdir C:\cygwin\bin
set LANG=ja_JP.SJIS
bash --login -i
</screen>

<note><para>For a list of locales supported by your Windows machine, use the new
<command>getlocale -a</command> command, which is part of the Cygwin package.
For a description see <xref linkend="getlocale"></xref>.</para></note>

<note><para>For a list of supported character sets, see
<xref linkend="setup-locale-charsetlist"></xref>
</para></note>
</listitem>

<listitem><para>
Last, but not least, most singlebyte or doublebyte charsets have a big
disadvantage. Windows filesystems use the Unicode character set in the
UTF-16 encoding to store filename information. Not all characters
from the Unicode character set are available in a singlebyte or doublebyte
charset. While Cygwin has a workaround to access files with unusual
characters (see <xref linkend="pathnames-unusual"></xref>), a better
workaround is to always use the UTF-8 character set.</para>

<para><emphasis>UTF-8 is the only multibyte character set which can represent
every Unicode character.</emphasis></para>

<screen>
set LANG=es_MX.UTF-8
</screen>

<para>For a description of the Unicode standard, see the homepage of the
<ulink url="http://www.unicode.org/">Unicode Consortium</ulink>.
</para></listitem>

</itemizedlist>

</sect2>

<sect2 id="setup-locale-console"><title>The Windows Console character set</title>

<para>Most of the time the Windows console is used to run Cygwin applications.
While terminal emulations like <command>xterm</command> or
<command>mintty</command> have a distinct way to set the character set
used for in- and output, the Windows console has no such way, since it's
not an application in its own right.</para>

<para>This problem is solved in Cygwin as follows. When a Cygwin
process is started in a Windows console (either explicitly from cmd.exe,
or implicitly by, for instance, clicking on the Cygwin desktop icon, or
running the Cygwin.bat file), the Console character set is determined by the
setting of the aforementioned internationalization environment variables,
the same way as described in <xref linkend="setup-locale-how"></xref>.
</para>

<para>What is that good for? Why not switch the console character set with
the application's requirements? After all, the application knows if it uses
localization or not. However, what if a non-localized application calls
a remote application which itself is localized? This can happen with
<command>ssh</command> or <command>rlogin</command>. Both commands don't
have and don't need localization and they never call
<function>setlocale</function>. Setting one of the internationalization
environment variables to the same charset as the remote machine before
starting <command>ssh</command> or <command>rlogin</command> fixes that
problem.</para>
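<para>
A minimal sketch of that fix, assuming a hypothetical remote host named
<literal>remotehost</literal> whose applications use EUC-JP:
<screen>
# Set the console charset for this one ssh session only
LANG=ja_JP.eucJP ssh remotehost
</screen>
</para>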

</sect2>

<sect2 id="setup-locale-problems"><title>Potential Problems when using Locales</title>

<para>
You can set the above internationalization variables not only in
<filename>Cygwin.bat</filename> or in the Windows environment, but also
in your Cygwin shell on the fly, even switch to yet another character
set, and yet another. In bash for instance:</para>

<screen>
<prompt>bash$</prompt> export LC_CTYPE="nl_BE.UTF-8"
</screen>

<para>However, here's a problem. At the start of the first Cygwin process
in a session, the Windows environment is converted from UTF-16 to UTF-8.
The environment is another of the system objects stored in UTF-16 in
Windows.</para>

<para>As long as the environment only contains ASCII characters, this is
no problem at all. But if it contains native characters, and you're planning
to use, say, GBK, the environment will result in invalid characters in
the GBK charset. This would be especially a problem in variables like
<envar>PATH</envar>. To circumvent the worst problems, Cygwin converts
the <envar>PATH</envar> environment variable to the charset set in the
environment, if it's different from the UTF-8 charset.</para>

<note><para>Per POSIX, the name of an environment variable should only
consist of valid ASCII characters, and only of uppercase letters, digits, and
the underscore for maximum portability.</para></note>

<para>Symbolic links, too, may pose a problem when switching charsets on
the fly. A symbolic link contains the filename of the target file the
symlink points to. When a symlink was created with older versions
of Cygwin, the current ANSI or OEM character set was used to store
the target filename, dependent on the old <envar>CYGWIN</envar>
environment variable setting <envar>codepage</envar> (see
<xref linkend="cygwinenv-removed-options"></xref>). If the target filename
contains non-ASCII characters and you use another character set than
your default ANSI/OEM charset, the target filename of the symlink is now
potentially an invalid character sequence in the new character set.
This behaviour is not different from the behaviour in other Operating
Systems. So, if you suddenly can't access a symlink anymore which
worked all these years before, maybe it's because you switched to
another character set. This doesn't occur with symlinks created with
Cygwin 1.7 or later.</para>

<para>Another problem you might encounter is that older versions of
Windows did not install all charsets by default. If you are running
Windows XP or older, you can open the "Regional and Language Options"
portion of the Control Panel, select the "Advanced" tab, and select
entries from the "Code page conversion tables" list. The following
entries are useful to Cygwin: 932/SJIS, 936/GBK, 949/EUC-KR, 950/Big5,
20932/EUC-JP.</para>

</sect2>

<sect2 id="setup-locale-missing"><title>What does not work?</title>

<para>
The environment variable and locale setting <envar>LC_MESSAGES</envar>
is ignored right now. There's no known Windows function to fetch the
regular expressions that recognize user input with the meaning of "yes"
or "no". Therefore,
<function>nl_langinfo(YESEXPR)</function> and
<function>nl_langinfo(NOEXPR)</function> always return a string
suitable only for the English language.</para>

<para>If somebody knows a simple solution to this problem, feel free
to notify us on the
<ulink url="mailto:cygwin@cygwin.com">Cygwin mailing list</ulink>.
</para>

</sect2>

<sect2 id="setup-locale-charsetlist"><title>List of supported character sets</title>

<para>Last but not least, here's the list of currently supported character
sets. The left-hand expression is the name of the charset, as you would use
it in the internationalization environment variables as outlined above.
Note that charset specifiers are case-insensitive. <literal>EUCJP</literal>
is equivalent to <literal>eucJP</literal> or <literal>eUcJp</literal>.
Writing the charset in the exact case as given in the list below is a
good convention, though.
</para>

<para>The right-hand side is the number of the equivalent Windows
codepage as well as the Windows name of the codepage. They are only
noted here for reference. Don't try to use the bare codepage number or
the Windows name of the codepage as charset in locale specifiers, unless
they happen to be identical with the left-hand side. Especially in case
of the "CPxxx" style charsets, always use them with the leading "CP".</para>

<para>This works:</para>

<screen>
set LC_ALL=en_US.CP437
</screen>

<para>This does <emphasis>not</emphasis> work:</para>

<screen>
set LC_ALL=en_US.437
</screen>

<para>You can find a full list of Windows codepages on the Microsoft MSDN page
<ulink url="http://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx">Code Page Identifiers</ulink>.</para>

<screen>
  Charset             Codepage
  ------------------- -------------------------------------------
  ASCII               20127 (US_ASCII)

  CP437               437   (OEM United States)
  CP720               720   (DOS Arabic)
  CP737               737   (OEM Greek)
  CP775               775   (OEM Baltic)
  CP850               850   (OEM Latin 1, Western European)
  CP852               852   (OEM Latin 2, Central European)
  CP855               855   (OEM Cyrillic)
  CP857               857   (OEM Turkish)
  CP858               858   (OEM Latin 1 + Euro Symbol)
  CP862               862   (OEM Hebrew)
  CP866               866   (OEM Russian)
  CP874               874   (ANSI/OEM Thai)
  CP932               932   (Shift_JIS, not exactly identical to SJIS)
  CP1125              1125  (OEM Ukraine)
  CP1250              1250  (ANSI Central European)
  CP1251              1251  (ANSI Cyrillic)
  CP1252              1252  (ANSI Latin 1, Western European)
  CP1253              1253  (ANSI Greek)
  CP1254              1254  (ANSI Turkish)
  CP1255              1255  (ANSI Hebrew)
  CP1256              1256  (ANSI Arabic)
  CP1257              1257  (ANSI Baltic)
  CP1258              1258  (ANSI/OEM Vietnamese)

  ISO-8859-1          28591 (ISO-8859-1)
  ISO-8859-2          28592 (ISO-8859-2)
  ISO-8859-3          28593 (ISO-8859-3)
  ISO-8859-4          28594 (ISO-8859-4)
  ISO-8859-5          28595 (ISO-8859-5)
  ISO-8859-6          28596 (ISO-8859-6)
  ISO-8859-7          28597 (ISO-8859-7)
  ISO-8859-8          28598 (ISO-8859-8)
  ISO-8859-9          28599 (ISO-8859-9)
  ISO-8859-10         -     (not available)
  ISO-8859-11         -     (not available)
  ISO-8859-13         28603 (ISO-8859-13)
  ISO-8859-14         -     (not available)
  ISO-8859-15         28605 (ISO-8859-15)
  ISO-8859-16         -     (not available)

  KOI8-R              20866 (KOI8-R Russian Cyrillic)
  KOI8-U              21866 (KOI8-U Ukrainian Cyrillic)
  SJIS                -     (not available, almost, but not exactly CP932)
  GBK                 936   (ANSI/OEM Simplified Chinese)
  Big5                950   (ANSI/OEM Traditional Chinese)
  EUCJP or euc-JP     20932 (EUC Japanese)
  EUCKR or euc-KR     949   (EUC Korean)
  TIS620 or TIS-620   874   (ANSI/OEM Thai)

  UTF-8 or utf8       65001 (UTF-8)
</screen>

</sect2>

</sect1>

<sect1 id="setup-files"><title>Customizing bash</title>

<para>
To set up bash so that cut and paste work properly, click on the
"Properties" button of the window, then on the "Misc" tab. Make sure
that "QuickEdit mode" and "Insert mode" are checked. These settings
will be remembered next time you run bash from that shortcut. Similarly
you can set the working directory inside the "Program" tab. The entry
"%HOME%" is valid, but requires that you set <envar>HOME</envar> in
the Windows environment.
</para>

<para>
Your home directory should contain three initialization files
that control the behavior of bash. They are
<filename>.profile</filename>, <filename>.bashrc</filename> and
<filename>.inputrc</filename>. The Cygwin base installation creates
stub files when you start bash for the first time.</para>

<para>
<filename>.profile</filename> (other names are also valid, see the bash man
page) contains bash commands. It is executed when bash is started as login
shell, e.g. from the command <command>bash --login</command>.
This is a useful place to define and
export environment variables and bash functions that will be used by bash
and the programs invoked by bash. It is a good place to redefine
<envar>PATH</envar> if needed. We recommend adding a ":." to the end of
<envar>PATH</envar> to also search the current working directory (contrary
to DOS, the local directory is not searched by default). Also to avoid
delays you should either <command>unset</command> <envar>MAILCHECK</envar>
or define <envar>MAILPATH</envar> to point to your existing mail inbox.
</para>
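<para>
A <filename>.profile</filename> fragment along these lines would apply
the suggestions above. It is only a sketch; the <envar>EDITOR</envar>
line is a hypothetical example of an exported variable:
<screen>
# Search the current working directory last
PATH="${PATH}:."
export PATH

# Example of a variable exported to programs started from bash
export EDITOR=vi

# Disable the periodic mail check to avoid delays
unset MAILCHECK
</screen>
</para>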

<para>
<filename>.bashrc</filename> is similar to
<filename>.profile</filename> but is executed each time an interactive
bash shell is launched. It serves to define elements that are not
inherited through the environment, such as aliases. If you do not use
login shells, you may want to put the contents of
<filename>.profile</filename> as discussed above in this file
instead.
</para>

<para>
For example,
<screen>
shopt -s nocaseglob
</screen>
will allow bash to glob filenames in a case-insensitive manner.
Note that <filename>.bashrc</filename> is not called automatically for login
shells. You can source it from <filename>.profile</filename>.
</para>
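<para>
One common way to do that, shown here as a sketch, is to add the
following lines to <filename>.profile</filename>:
<screen>
# Read .bashrc in login shells too, if the file exists
if [ -f "${HOME}/.bashrc" ]; then
  . "${HOME}/.bashrc"
fi
</screen>
</para>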

<para>
<filename>.inputrc</filename> controls how programs using the readline
library (including <command>bash</command>) behave. It is loaded
automatically. For full details see the <literal>Function and Variable
Index</literal> section of the GNU <systemitem>readline</systemitem> manual.
Consider the following settings:
<screen>
# Ignore case while completing
set completion-ignore-case on
# Make Bash 8bit clean
set meta-flag on
set convert-meta off
set output-meta on
</screen>
The first command makes filename completion case insensitive, which can
be convenient in a Windows environment. The next three commands allow
<command>bash</command> to display 8-bit characters, useful for
languages with accented characters. Note that tools that do not use
<systemitem>readline</systemitem> for display, such as
<command>less</command> and <command>ls</command>, require additional
settings, which could be put in your <filename>.bashrc</filename>:
<screen>
alias less='/bin/less -r'
alias ls='/bin/ls -F --color=tty --show-control-chars'
</screen>
</para>

</sect1>