* new-features.sgml (ov-new1.7.2): Add chapter for news in 1.7.2.

* setup2.sgml (setup-locale-ov): Describe how valid locales are
	determined by Windows locale support.  Change description for modifiers
	in locale environment variables.
	(setup-locale-how): Describe new charset behaviour.  Mention new
	getlocale tool to fetch valid locale information from Windows.
	(setup-locale-missing): Drop now implemented LC_foo options.
	Explain missing LC_MESSAGES in more detail.
Corinna Vinschen 2010-01-22 22:32:42 +00:00
parent be822de2a1
commit ff0056d45e
3 changed files with 115 additions and 46 deletions

@@ -1,3 +1,14 @@
2010-01-22 Corinna Vinschen <corinna@vinschen.de>
* new-features.sgml (ov-new1.7.2): Add chapter for news in 1.7.2.
* setup2.sgml (setup-locale-ov): Describe how valid locales are
determined by Windows locale support. Change description for modifiers
in locale environment variables.
(setup-locale-how): Describe new charset behaviour. Mention new
getlocale tool to fetch valid locale information from Windows.
(setup-locale-missing): Drop now implemented LC_foo options.
Explain missing LC_MESSAGES in more detail.
2010-01-17 Corinna Vinschen <corinna@vinschen.de>
* setup2.sgml (setup-locale): Mention three character codes per

@@ -1,5 +1,43 @@
<sect1 id="ov-new1.7"><title>What's new and what changed in Cygwin 1.7</title>
<sect2 id="ov-new1.7.2"><title>What's new and what changed from 1.7.1 to 1.7.2</title>
<screen>
- Localization support has been much improved.
- Cygwin now handles locales using the underlying Windows locale support.
The locale must exist in Windows to be recognized.
- New tool "getlocale" to fetch valid locale values from Windows.
- Default charset for locales without explicit charset is now chosen
from a list of Linux-compatible charsets. For instance en_US -> ISO-8859-1,
ja_JP -> EUC-JP.
- Support for the @euro locale modifier to switch to the ISO-8859-15
charset.
- Default charset in the "C" or "POSIX" locale has been changed back from
UTF-8 to ASCII, to circumvent problems with applications expecting a
singlebyte charset in the "C"/"POSIX" locale. Still use UTF-8 internally
for filename conversion in this case.
- LC_COLLATE, LC_MONETARY, LC_NUMERIC, and LC_TIME localization is enabled
via Windows locale support.
- New strfmon(3) call.
- Support open(2) flags O_CLOEXEC and O_TTY_INIT. Support
fcntl flag F_DUPFD_CLOEXEC. Support socket flags SOCK_CLOEXEC and
SOCK_NONBLOCK.
- Add new Linux-compatible API calls accept4(2), dup3(2), and pipe2(2).
- fnmatch(3) call is now multibyte-aware.
</screen>
</sect2>
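The close-on-exec flags listed above can be tried out with a few lines of C. The following is a minimal sketch, not code from the Cygwin sources: the helper names `open_cloexec`, `dup_cloexec` and `has_cloexec` are made up for illustration, and only the POSIX calls open(2) and fcntl(2) are assumed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for O_CLOEXEC and F_DUPFD_CLOEXEC */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: open a file read-only with the close-on-exec
   flag set atomically at open time.  Returns the descriptor or -1. */
int open_cloexec(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Hypothetical helper: duplicate a descriptor so that the copy has
   close-on-exec set.  Returns the new descriptor or -1. */
int dup_cloexec(int fd)
{
    return fcntl(fd, F_DUPFD_CLOEXEC, 0);
}

/* Hypothetical helper: returns 1 if FD_CLOEXEC is set on fd,
   0 if it is not set, and -1 if fcntl fails. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

A descriptor obtained from plain dup(2) does not carry the flag, so `has_cloexec` can be used to compare both variants. Setting the flag at creation time avoids the window between open and a later fcntl(F_SETFD) call in multi-threaded programs.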
<sect2 id="ov-new1.7-os"><title>OS related changes</title>
<screen>

@@ -255,35 +255,41 @@ charset. The Cygwin DLL itself, however, will nevertheless use the locale
set in the environment (or the "C.UTF-8" default locale) for converting
filenames etc.</para>
<para>When the locale set in the environment specifies an ASCII charset, <para>When the locale in the environment specifies an ASCII charset,
for example "C" or "en_US.ASCII", Cygwin will still use UTF-8
under the hood to translate filenames. This allows for easier
interoperability with applications running in the default "C.UTF-8" locale.
</para>
<para>
Right now the language and territory, as well as the modifier, are not Starting with Cygwin 1.7.2, the language and territory are used to
important to Cygwin, except to fix a single problem. There's a class of fetch locale-dependent information from Windows. If the language and
characters in the Unicode character set, called the "CJK Ambiguous Width territory are not known to Windows, the <function>setlocale</function>
Character set". For these characters the width returned by the function fails.</para>
wcwidth/wcswidth function is usually 1. This is often a problem in
East-Asian languages, which historically use character sets in which
these characters have a width of 2. Kind of explains why they are
called "ambiguous"...</para>
<para> <para>The modifier is used for two cases.</para>
The problem has been fixed like this. wcwidth/wcswidth usually
return 1 as the width of these characters. However, if the language is
specified as "ja" (Japanese), "ko" (Korean), or "zh" (Chinese), wcwidth
returns 2 for these characters. Unfortunately this isn't correct in
all circumstances, so the user can specify the modifier "@cjknarrow",
which modifies the behaviour of wcwidth/wcswidth to return 1 for the
ambiguous width characters even in those languages.</para>
<para> <itemizedlist mark="bullet">
Other than that, the only important part so far is the character set.
How does that work?</para> <listitem><para>For languages which default to one of the ISO-8859 character
sets, the modifier "@euro" can be added to enforce usage of the ISO-8859-15
character set, which includes a character for the "Euro" currency sign.</para>
</listitem>
<listitem><para>There's a class of characters in the Unicode character set,
called the "CJK Ambiguous Width Character set". For these characters the width
returned by the wcwidth/wcswidth function is usually 1. This is often a
problem in East-Asian languages, which historically use character sets in
which these characters have a width of 2. By default, the wcwidth/wcswidth
functions return 1 as the width of these characters, except if the language is
specified as "ja" (Japanese), "ko" (Korean), or "zh" (Chinese). In these
languages wcwidth and wcswidth return 2 for these characters. This is not
correct in all circumstances, so the user of one of these languages can specify
the modifier "@cjknarrow", which modifies the behaviour of wcwidth/wcswidth to
return 1 for the ambiguous width characters.</para>
</listitem>
</itemizedlist>
</sect2>
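The behaviour described in this section can be observed from a small C program. The sketch below is only an illustration: the function name `probe_locale` is hypothetical, and U+2460 (CIRCLED DIGIT ONE) is picked here as one sample of an East Asian ambiguous-width character. It relies only on the standard calls setlocale, nl_langinfo and wcwidth. Note that it changes the locale of the calling process and does not restore it.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700  /* for nl_langinfo() and wcwidth() */
#endif
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: switch the process to the locale `name`, copy
   the name of its character set into `codeset`, and store the column
   width reported for the sample ambiguous-width character U+2460 in
   `*amb_width`.  Returns 0 on success, or -1 if setlocale rejects the
   locale (for example because language and territory are unknown). */
int probe_locale(const char *name, char *codeset, size_t len, int *amb_width)
{
    assert(name != NULL && codeset != NULL && len > 0 && amb_width != NULL);

    if (setlocale(LC_ALL, name) == NULL)
        return -1;

    /* CODESET is the charset actually selected for this locale. */
    snprintf(codeset, len, "%s", nl_langinfo(CODESET));

    /* wcwidth returns -1 if the character is not printable here. */
    *amb_width = wcwidth((wchar_t)0x2460);
    return 0;
}
```

Going by the text above, one would expect a call with "ja_JP" to report a width of 2 and a call with "ja_JP@cjknarrow" to report 1, while a language and territory unknown to Windows should make the function return -1. The exact charset names and widths depend on the system the program runs on.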
@@ -296,32 +302,47 @@ Assume that you've set one of the aforementioned environment variables to some
valid POSIX locale value, other than "C" and "POSIX". Assume further that
you're living in Japan. You might want to use the language code "ja" and the
territory "JP", thus setting, say, <envar>LANG</envar> to "ja_JP". You didn't
set a character set, so what will Cygwin use now? Easy! It will use the set a character set, so what will Cygwin use now? Starting with Cygwin 1.7.2,
default Windows ANSI codepage of your system, if it's supported by Cygwin. the default character set is determined by the default Windows ANSI codepage
Hopefully Cygwin supports all relevant default ANSI codepages...</para> for this language and territory. Cygwin uses a character set which is the
typical Unix-equivalent to the Windows ANSI codepage. For instance:</para>
<note><para>For a list of supported character sets, see <screen>
<xref linkend="setup-locale-charsetlist"></xref> "en_US" ISO-8859-1
</para></note> "el_GR" ISO-8859-7
"pl_PL" ISO-8859-2
"pl_PL@euro" ISO-8859-15
"ja_JP" EUCJP
"ko_KR" EUCKR
"te_IN" UTF-8
</screen>
</listitem>
<listitem><para>
You don't want to use the default Windows codepage as character set? You don't want to use the default character set? In that case you have to
In that case you have to specify the charset explicitly. For instance, specify the charset explicitly. For instance, assume you're from Japan and
assume you're from Italy and don't want to use the Italian default Windows don't want to use the Japanese default charset EUC-JP, but the Windows
ANSI codepage 1252, but the more portable ISO-8859-15 character set. default charset SJIS. What you can do, for instance, is to set the
What you can do, for instance, is to set the <envar>LANG</envar> variable <envar>LANG</envar> variable in the <filename>C:\cygwin\Cygwin.bat</filename>
in the <filename>C:\cygwin\Cygwin.bat</filename> file which is the batch file file which is the batch file to start a Cygwin session from the "Cygwin"
to start a Cygwin session from the "Cygwin" desktop shortcut.</para> desktop shortcut.</para>
<screen>
@echo off
C:
chdir C:\cygwin\bin
set LANG=it_IT.ISO-8859-15 set LANG=ja_JP.SJIS
bash --login -i
</screen>
<note><para>For a list of locales supported by your Windows machine, use the new
<command>getlocale -a</command> command, which is part of the Cygwin package.
For a description see <xref linkend="getlocale"></xref>.</para></note>
<note><para>For a list of supported character sets, see
<xref linkend="setup-locale-charsetlist"></xref>
</para></note>
</listitem>
<listitem><para>
@@ -435,19 +456,18 @@ entries are useful to cygwin: 932/SJIS, 936/GBK, 949/EUC-KR, 950/Big5,
<sect2 id="setup-locale-missing"><title>What does not work?</title>
<para>
Except for <envar>LC_ALL</envar>, <envar>LC_CTYPE</envar>, The environment variable and locale setting <envar>LC_MESSAGES</envar>
and <envar>LANG</envar>, all other LC_xxx environment variables, is ignored right now. There's no known Windows function to fetch the
<envar>LC_COLLATE</envar>, <envar>LC_MESSAGES</envar>, regular expressions to recognize user input with the meaning of "yes"
<envar>LC_MONETARY</envar>, <envar>LC_NUMERIC</envar>, or "no". Therefore,
and <envar>LC_TIME</envar>, are ignored right now. This means, while Cygwin <function>nl_langinfo(YESEXPR)</function> and
supports different character sets, it does <emphasis>not</emphasis> support <function>nl_langinfo(NOEXPR)</function> always return a string
real localization so far. There's no support for locale-specific monetary suitable only for the English language.</para>
symbols, for a decimal point other than '.', no support for native time
formats, and no support for native language sorting orders.
</para>
<para>Cygwin's internationalization support is work in progress and we would <para>If somebody knows a simple solution to this problem, feel free
be glad for coding help in this area.</para> to notify us on the
<ulink url="mailto:cygwin@cygwin.com">Cygwin mailing list</ulink>.
</para>
</sect2>
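To see what the English-only YESEXPR and NOEXPR values mean for an application, here is a minimal sketch of how a program typically uses them. The function name `classify_answer` is hypothetical. The code assumes only the standard calls nl_langinfo, regcomp, regexec and regfree, and it treats both expressions as POSIX extended regular expressions.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700  /* for nl_langinfo(), YESEXPR and NOEXPR */
#endif
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `answer` matches the locale's
   "yes" expression, 0 if it matches the "no" expression, and -1 if it
   matches neither or an expression fails to compile. */
int classify_answer(const char *answer)
{
    regex_t re;
    int result = -1;

    assert(answer != NULL);

    /* Try the affirmative pattern first. */
    if (regcomp(&re, nl_langinfo(YESEXPR), REG_EXTENDED | REG_NOSUB) == 0) {
        if (regexec(&re, answer, 0, NULL, 0) == 0)
            result = 1;
        regfree(&re);
    }

    /* Only consult the negative pattern if "yes" did not match. */
    if (result == -1 &&
        regcomp(&re, nl_langinfo(NOEXPR), REG_EXTENDED | REG_NOSUB) == 0) {
        if (regexec(&re, answer, 0, NULL, 0) == 0)
            result = 0;
        regfree(&re);
    }
    return result;
}
```

Because the strings returned for YESEXPR and NOEXPR are suitable only for English, a program using this pattern would accept answers such as "yes" or "no" regardless of what <envar>LC_MESSAGES</envar> is set to, and would not recognize, say, a German "ja".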