Environment Variables
Before starting bash, you may set some environment variables. A .bat
file is provided where the most important ones are set before bash is
launched. This is the safest way to launch bash initially. The .bat
file is installed in the root directory that you specified during setup
and is pointed to in the Start Menu under the "Cygwin" option. You can
edit this file to your liking.
The CYGWIN variable is used to configure many global
settings for the Cygwin runtime system. Initially you can leave
CYGWIN unset or set it to tty (e.g.,
to support job control with ^Z) using a syntax like this in the
DOS shell, before launching bash:
C:\> set CYGWIN=tty notitle glob
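Once bash is running, a script can check which options were passed in. The following is a minimal sketch, assuming the options are separated by single spaces as in the example above; the function name cygwin_has_option is hypothetical and not part of Cygwin.

```shell
# Hypothetical helper: succeed if the space-separated CYGWIN variable
# contains exactly the given option word.
cygwin_has_option() {
    case " ${CYGWIN:-} " in
        *" $1 "*) return 0 ;;   # option word found
        *)        return 1 ;;   # not present (or CYGWIN unset)
    esac
}
```

With CYGWIN set to "tty notitle glob", cygwin_has_option tty succeeds, while cygwin_has_option title fails because only whole words are matched.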
Locale support is controlled by the LANG and
LC_xxx environment variables. You can set all of them
but Cygwin itself only honors the variables LC_ALL,
LC_CTYPE, and LANG, in this order, according
to the POSIX standard. The first one found rules. For a more detailed
description see the Internationalization section below.
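The lookup order can be sketched as a small shell function. This is only an illustration of the rule "the first one found rules", not Cygwin's actual code; the function name is hypothetical, and treating an empty variable like an unset one is an assumption based on POSIX conventions.

```shell
# Hypothetical helper: print the locale value that is consulted,
# checking LC_ALL, then LC_CTYPE, then LANG. Empty values are treated
# as unset (assumption). Falls back to the default "C" locale.
effective_ctype_locale() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' "C"
    fi
}
```

For example, with LANG set to de_CH and LC_CTYPE set to nl_BE.UTF-8, the function prints nl_BE.UTF-8, because LC_CTYPE is checked before LANG.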
The PATH environment variable is used by Cygwin
applications as a list of directories to search for executable files
to run. This environment variable is converted from Windows format
(e.g. C:\Windows\system32;C:\Windows) to UNIX format
(e.g., /cygdrive/c/Windows/system32:/cygdrive/c/Windows)
when a Cygwin process first starts.
Set it so that it contains at least the x:\cygwin\bin
directory, where x:\cygwin is the "root" of your
Cygwin installation, if you wish to use Cygwin tools outside of bash.
This is usually done by the batch file you're starting your shell with.
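To make the format conversion concrete, here is a rough sketch of the mapping for plain drive-letter paths. It is a simplification: the real conversion is done by the Cygwin runtime and takes the mount table into account, which this sketch ignores. Both function names are hypothetical.

```shell
# Hypothetical helper: convert one Windows directory such as
# C:\Windows to /cygdrive/c/Windows. Paths without a drive letter
# only get their backslashes turned into slashes.
win_dir_to_posix() {
    dir=$1
    case $dir in
        [A-Za-z]:*)
            drive=$(printf '%s' "$dir" | cut -c1 | tr 'A-Z' 'a-z')
            rest=$(printf '%s' "$dir" | cut -c3- | tr '\\' '/')
            printf '/cygdrive/%s%s\n' "$drive" "$rest"
            ;;
        *)
            printf '%s\n' "$dir" | tr '\\' '/'
            ;;
    esac
}

# Hypothetical helper: convert a semicolon-separated Windows PATH
# into a colon-separated UNIX-style PATH, skipping empty entries.
win_path_to_posix() {
    remaining=$1
    result=
    while [ -n "$remaining" ]; do
        case $remaining in
            *';'*) entry=${remaining%%;*}; remaining=${remaining#*;} ;;
            *)     entry=$remaining; remaining= ;;
        esac
        [ -n "$entry" ] || continue
        converted=$(win_dir_to_posix "$entry")
        result=${result:+$result:}$converted
    done
    printf '%s\n' "$result"
}
```

Calling win_path_to_posix 'C:\Windows\system32;C:\Windows' prints /cygdrive/c/Windows/system32:/cygdrive/c/Windows, matching the example given above.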
The HOME environment variable is used by many programs to
determine the location of your home directory and we recommend that it be
defined. This environment variable is also converted from Windows format
when a Cygwin process first starts. It's usually set in the shell
profile scripts in the /etc directory.
The TERM environment variable specifies your terminal
type. It is automatically set to cygwin if you have
not set it to something else.
The LD_LIBRARY_PATH environment variable is used by
the Cygwin function dlopen () as a list of
directories to search for .dll files to load. This environment variable
is converted from Windows format to UNIX format when a Cygwin process
first starts. Most Cygwin applications do not make use of the
dlopen () call and do not need this variable.
Changing Cygwin's Maximum Memory
Cygwin's heap is extensible. However, it does start out at a fixed size
and attempts to extend it may run into memory which has been previously
allocated by Windows. In some cases, this problem can be solved by
adding an entry in either the HKEY_LOCAL_MACHINE
(to change the limit for all users) or
HKEY_CURRENT_USER (for just the current user) section
of the registry.
Add the DWORD value heap_chunk_in_mb
and set it to the desired memory limit in decimal MB. It is preferred to do
this in Cygwin using the regtool program included in the
Cygwin package.
(For more information about regtool or the other Cygwin
utilities, see the documentation of the Cygwin utilities or use the
--help option of each utility.) You should always be careful
when using regtool, since damaging your system registry can
result in an unusable system. This example sets the memory limit to 1024 MB:
regtool -i set /HKLM/Software/Cygwin/heap_chunk_in_mb 1024
regtool -v list /HKLM/Software/Cygwin
Exit all running Cygwin processes and restart them. Memory can be allocated up
to the size of the system swap space minus the size of any running
processes. The system swap should be at least as large as the physically
installed RAM and can be modified under the System category of the
Control Panel.
Here is a small program written by DJ Delorie that tests the
memory allocation limit on your system:
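#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  unsigned int bit = 0x40000000, sum = 0;
  char *x;

  /* Try ever smaller blocks and add up every one that can be allocated. */
  while (bit > 4096)
    {
      x = malloc(bit);
      if (x)
        sum += bit;   /* the block is deliberately not freed */
      bit >>= 1;
    }
  printf("%08x bytes (%.1fMb)\n", sum, sum/1024.0/1024.0);
  return 0;
}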
You can compile this program using:
gcc max_memory.c -o max_memory.exe
Run the program and it will output the maximum amount of allocatable memory.
Internationalization
Overview
Internationalization support is controlled by the LANG and
LC_xxx environment variables. You can set all of them
but Cygwin itself only honors the variables LC_ALL,
LC_CTYPE, and LANG, in this order, according
to the POSIX standard. The content of these variables should follow the
POSIX standard for a locale specifier. The correct form of a locale
specifier is
language[[_TERRITORY][.charset][@modifier]]
"language" is a lowercase two character string per ISO 639-1,
"TERRITORY" is an uppercase two character string per ISO 3166, charset is
one of a list of supported character sets, and the modifier doesn't matter
here (though it might for some applications). If you're interested in the
exact description, you can find it in the online publication of the POSIX
manual pages on the homepage of the
Open Group.
Typical locale specifiers are
"de_CH"        language = German, territory = Switzerland, default charset
"fr_FR.UTF-8"  language = French, territory = France, charset = UTF-8
"ko_KR.eucKR"  language = Korean, territory = South Korea, charset = eucKR
And let's not forget the default locale called "C" or "POSIX"
which basically only supports plain ASCII code. If the aforementioned
environment variables are not set, or set to "C" or "POSIX", you get the
default ASCII-only behaviour.
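The structure of a locale specifier can be illustrated by splitting it into its parts. The following sketch uses only POSIX shell parameter expansion; the function name parse_locale and its output format are made up for illustration and are not part of Cygwin.

```shell
# Hypothetical helper: split language[_TERRITORY][.charset][@modifier]
# into its four parts. Missing parts are printed as empty strings.
parse_locale() {
    spec=$1
    modifier=
    charset=
    territory=
    # Peel the optional parts off from right to left.
    case $spec in *@*) modifier=${spec#*@};  spec=${spec%%@*} ;; esac
    case $spec in *.*) charset=${spec#*.};   spec=${spec%%.*} ;; esac
    case $spec in *_*) territory=${spec#*_}; spec=${spec%%_*} ;; esac
    printf 'language=%s territory=%s charset=%s modifier=%s\n' \
        "$spec" "$territory" "$charset" "$modifier"
}
```

For "fr_FR.UTF-8" this prints language=fr territory=FR charset=UTF-8 modifier= and for "de_CH" the charset field is empty, which corresponds to the "default charset" case.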
Right now the language and territory content is not evaluated by Cygwin any
further. The only important part so far is the character set. How does that
work?
How to set the locale
The default locale is the "C" or "POSIX" locale. In this locale, basically
only ASCII characters are supported. Even if one of the aforementioned
environment variables is set to something else, it's the application's
responsibility to call the function setlocale,
typically like this
setlocale (LC_ALL, "");
to switch to another locale according to the settings of the
internationalization environment variables.
Assume you set one of the aforementioned environment variables to some
valid POSIX locale value other than "C" or "POSIX", and that you
call an application which calls setlocale as above.
Assume further that you're living in Japan. You might want to use
the language code "ja" and the territory "JP", thus setting, say,
LANG to "ja_JP". You didn't set a character set, so
what will Cygwin use now? It will use the default Windows ANSI
codepage of your system, if it's supported by Cygwin. Hopefully Cygwin
supports all relevant default ANSI codepages...
For a list of supported character sets, see the section "List of supported character sets" below.
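The fallback rule can be sketched as follows. This is an illustration only: the function name is hypothetical, the ANSI codepage number is passed in by the caller rather than queried from Windows, and the mapping from codepage number to charset name is an assumption derived from the list of supported character sets in this document.

```shell
# Hypothetical helper: print the charset for a locale specifier.
# $1 = locale specifier, $2 = Windows ANSI codepage number (assumed
# to be known by the caller).
locale_charset() {
    spec=${1%%@*}            # drop an optional @modifier
    acp=$2
    case $spec in
        *.*)
            # An explicit charset always wins.
            printf '%s\n' "${spec#*.}"
            ;;
        *)
            # No charset given: fall back to the ANSI codepage.
            case $acp in
                932) printf '%s\n' "SJIS" ;;
                936) printf '%s\n' "GBK" ;;
                950) printf '%s\n' "Big5" ;;
                *)   printf 'CP%s\n' "$acp" ;;
            esac
            ;;
    esac
}
```

With these assumptions, locale_charset ja_JP 932 prints SJIS, while locale_charset it_IT.ISO-8859-15 1252 prints ISO-8859-15 regardless of the codepage.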
You don't want to use the default Windows codepage as character set?
In that case you have to specify the charset explicitly. For instance,
assume you're from Italy and don't want to use the default Windows codepage
1252, but the more portable ISO-8859-15 character set. What you can do is
to set the LANG variable in the
C:\cygwin\Cygwin.bat file which is the batch file
to start a Cygwin session from the "Cygwin" desktop shortcut.
@echo off
C:
chdir C:\cygwin\bin
set LANG=it_IT.ISO-8859-15
bash --login -i
Most singlebyte or doublebyte charsets have a disadvantage. Windows
filesystems use the Unicode character set in the UTF-16 encoding to store
filename information. Not all characters
from the Unicode character set are available in a singlebyte or doublebyte
charset. While Cygwin has a workaround to access files with unusual
characters, a better
workaround is to always use the UTF-8 character set. UTF-8 is the only
multibyte character set which can represent every
Unicode character.
set LANG=es_MX.UTF-8
For a description of the Unicode standard, see the homepage of the
Unicode Consortium.
Potential Problems
You can set the above internationalization variables not only in
Cygwin.bat or in the Windows environment, but also
in your Cygwin shell on the fly, even switch to yet another character
set, and yet another. In bash for instance:
bash$ export LC_CTYPE="nl_BE.UTF-8"
However, here's a problem. At the start of the first Cygwin process
in a session, the Windows environment has to be converted from UTF-16 to
some singlebyte or multibyte charset. If the internationalization environment
variable hasn't been set before starting this process,
Cygwin has to make an educated guess which charset to use to convert
the environment itself. The only reproducible way to do that in the absence
of LC_ALL, LC_CTYPE, or LANG,
is to use the current Windows ANSI codepage.
As long as the environment only contains ASCII characters, this is
no problem. But if it contains non-ASCII characters and you're planning
to use, say, UTF-8, the converted environment will contain character
sequences that are invalid in the UTF-8 charset.
This would be especially a problem in variables like PATH.
Per POSIX, the name of an environment variable should consist only
of valid ASCII characters and, for maximum portability, only of
uppercase letters, digits, and the underscore.
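The naming rule is easy to check in a script. The sketch below is a hypothetical helper, not something Cygwin provides. It additionally rejects names that start with a digit, which POSIX also disallows. The allowed characters are spelled out explicitly so that the check does not depend on how the current locale interprets ranges such as A-Z.

```shell
# Hypothetical helper: succeed if the name uses only uppercase ASCII
# letters, digits, and the underscore, and does not start with a digit.
is_portable_env_name() {
    case $1 in
        ''|[0123456789]*)
            return 1 ;;      # empty, or starts with a digit
        *[!ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_]*)
            return 1 ;;      # contains some other character
        *)
            return 0 ;;
    esac
}
```

PATH and LC_ALL pass this check, while names such as Path or 1ST_VAR do not.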
Symbolic links are another problem when switching charsets on the fly.
A symbolic link contains the filename of the target
file the symlink points to. When a symlink is created, the current
character set is used to store the target filename. If the target
filename contains non-ASCII characters and you switch to another
character set, the target filename of the symlink is now potentially
an invalid character sequence in the new character set. This behaviour
is no different from the behaviour in other operating systems. So,
if you suddenly can't access a symlink anymore, maybe it's because you
switched to another character set.
What does not work?
Except for LC_ALL, LC_CTYPE,
and LANG, all other LC_xxx environment variables,
LC_COLLATE, LC_MESSAGES,
LC_MONETARY, LC_NUMERIC,
and LC_TIME, are ignored right now. This means, while Cygwin
supports different character sets, it does not support
real localization so far. There's no support for locale-specific monetary
symbols, no support for a decimal point other than '.', no support for native time
formats, and no support for native language sorting orders.
However, internationalization is work in progress and we would be glad
for coding help in this area.
List of supported character sets
Last but not least, here's the list of currently supported character
sets. The left-hand expression is the name of the charset, as you would use
it in the internationalization environment variables as outlined above.
The right-hand side is the number of the equivalent Windows
codepage as well as the Windows name of the codepage. They are only
noted here for reference. Don't try to use the bare codepage number or
the Windows name of the codepage as charset in locale specifiers, unless
they happen to be identical with the left-hand side. Especially in the case
of the "CPxxx" style charsets, always use them with the leading "CP".
This works:
set LC_ALL=en_US.CP437
This does not work:
set LC_ALL=en_US.437
You can find a full list of Windows codepages on the Microsoft MSDN page
Code Page Identifiers.
Charset        Codepage
CP437          437    (OEM United States)
CP720          720    (DOS Arabic)
CP737          737    (OEM Greek)
CP775          775    (OEM Baltic)
CP850          850    (OEM Latin 1, Western European)
CP852          852    (OEM Latin 2, Central European)
CP855          855    (OEM Cyrillic)
CP857          857    (OEM Turkish)
CP858          858    (OEM Latin 1 + Euro Symbol)
CP862          862    (OEM Hebrew)
CP866          866    (OEM Russian)
CP874          874    (ANSI/OEM Thai)
CP1125         1125   (OEM Ukraine)
CP1250         1250   (ANSI Central European)
CP1251         1251   (ANSI Cyrillic)
CP1252         1252   (ANSI Latin 1, Western European)
CP1253         1253   (ANSI Greek)
CP1254         1254   (ANSI Turkish)
CP1255         1255   (ANSI Hebrew)
CP1256         1256   (ANSI Arabic)
CP1257         1257   (ANSI Baltic)
CP1258         1258   (ANSI/OEM Vietnamese)
ISO-8859-1     28591  (ISO-8859-1)
ISO-8859-2     28592  (ISO-8859-2)
ISO-8859-3     28593  (ISO-8859-3)
ISO-8859-4     28594  (ISO-8859-4)
ISO-8859-5     28595  (ISO-8859-5)
ISO-8859-6     28596  (ISO-8859-6)
ISO-8859-7     28597  (ISO-8859-7)
ISO-8859-8     28598  (ISO-8859-8)
ISO-8859-9     28599  (ISO-8859-9)
ISO-8859-10    -      (not available)
ISO-8859-11    -      (not available)
ISO-8859-13    28603  (ISO-8859-13)
ISO-8859-14    -      (not available)
ISO-8859-15    28605  (ISO-8859-15)
ISO-8859-16    -      (not available)
SJIS           932    (ANSI/OEM Japanese)
GBK            936    (ANSI/OEM Simplified Chinese)
Big5           950    (ANSI/OEM Traditional Chinese)
JIS            50220  (ISO2022 Japanese w/o halfwidth Katakana)
eucJP          51932  (EUC Japanese)
eucKR          51949  (EUC Korean)
UTF-8          65001  (UTF-8)
Customizing bash
To set bash up so that cut and paste work properly, click on the
"Properties" button of the window, then on the "Misc" tab. Make sure
that "QuickEdit mode" and "Insert mode" are checked. These settings
will be remembered next time you run bash from that shortcut. Similarly
you can set the working directory inside the "Program" tab. The entry
"%HOME%" is valid, but requires that you set HOME in
the Windows environment.
Your home directory should contain three initialization files
that control the behavior of bash. They are
.profile, .bashrc and
.inputrc. The Cygwin base installation creates
stub files when you start bash for the first time.
.profile (other names are also valid, see the bash man
page) contains bash commands. It is executed when bash is started as a login
shell, e.g. from the command bash --login.
This is a useful place to define and
export environment variables and bash functions that will be used by bash
and the programs invoked by bash. It is a good place to redefine
PATH if needed. We recommend adding a ":." to the end of
PATH to also search the current working directory (contrary
to DOS, the local directory is not searched by default). Also to avoid
delays you should either unset MAILCHECK
or define MAILPATH to point to your existing mail inbox.
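As a sketch of what such a .profile fragment might look like, the following appends the current directory to PATH only if it is not already listed, and disables the mail check. The helper name append_to_path is hypothetical.

```shell
# Hypothetical helper: append a directory to PATH unless it is
# already one of the colon-separated entries.
append_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                        # already present
        *) PATH="${PATH:+$PATH:}$1" ;;      # append at the end
    esac
}

append_to_path .     # search the current directory last
export PATH
unset MAILCHECK      # avoid delays caused by mail checking
```

Guarding against duplicates keeps PATH from growing each time .profile is sourced again, for instance in nested login shells.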
.bashrc is similar to
.profile but is executed each time an interactive
bash shell is launched. It serves to define elements that are not
inherited through the environment, such as aliases. If you do not use
login shells, you may want to put the contents of
.profile as discussed above in this file
instead.
shopt -s nocaseglob
will allow bash to glob filenames in a case-insensitive manner.
Note that .bashrc is not called automatically for login
shells. You can source it from .profile.
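One way to do this is sketched below. The helper name source_if_readable is hypothetical; it simply guards the source command so that a missing file does not produce an error.

```shell
# Hypothetical helper for .profile: source a file only if it exists
# and is readable, so a missing file is silently skipped.
source_if_readable() {
    if [ -r "$1" ]; then
        . "$1"
    fi
}
```

In .profile, the call would then be source_if_readable "$HOME/.bashrc", placed after any environment variables that .bashrc relies on have been set.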
.inputrc controls how programs using the readline
library (including bash) behave. It is loaded
automatically. For full details see the Function and Variable
Index section of the GNU readline manual.
Consider the following settings:
# Ignore case while completing
set completion-ignore-case on
# Make Bash 8bit clean
set meta-flag on
set convert-meta off
set output-meta on
The first command makes filename completion case insensitive, which can
be convenient in a Windows environment. The next three commands allow
bash to display 8-bit characters, useful for
languages with accented characters. Note that tools that do not use
readline for display, such as
less and ls, require additional
settings, which could be put in your .bashrc:
alias less='/bin/less -r'
alias ls='/bin/ls -F --color=tty --show-control-chars'