* new-features.sgml (ov-new1.7.2): Add chapter for news in 1.7.2.

* setup2.sgml (setup-locale-ov): Describe how valid locales are determined by Windows locale support. Change description for modifiers in locale environment variables. (setup-locale-how): Describe new charset behaviour. Mention new getlocale tool to fetch valid locale information from Windows. (setup-locale-missing): Drop now implemented LC_foo options. Explain missing LC_MESSAGES in more detail.
2010-01-22 22:32:42 +00:00
parent be822de2a1
commit ff0056d45e
3 changed files with 115 additions and 46 deletions
--- a/winsup/doc/ChangeLog
+++ b/winsup/doc/ChangeLog
@@ -1,3 +1,14 @@
 2010-01-22  Corinna Vinschen  <corinna@vinschen.de>
 	* new-features.sgml (ov-new1.7.2): Add chapter for news in 1.7.2.
 	* setup2.sgml (setup-locale-ov): Describe how valid locales are
 	determined by Windows locale support.  Change description for modifiers
 	in locale environment variables.
 	(setup-locale-how): Describe new charset behaviour.  Mention new
 	getlocale tool to fetch valid locale information from Windows.
 	(setup-locale-missing): Drop now implemented LC_foo options.
 	Explain missing LC_MESSAGES in more detail.
 2010-01-17  Corinna Vinschen  <corinna@vinschen.de>
 	* setup2.sgml (setup-locale): Mention three character codes per
--- a/winsup/doc/new-features.sgml
+++ b/winsup/doc/new-features.sgml
@@ -1,5 +1,43 @@
 <sect1 id="ov-new1.7"><title>What's new and what changed in Cygwin 1.7</title>
 <sect2 id="ov-new1.7.2"><title>What's new and what changed from 1.7.1 to 1.7.2</title>
 <screen>
 - Localization support has been much improved.
  - Cygwin now handles locales using the underlying Windows locale support.
    The locale must exist in Windows to be recognized.
  - New tool "getlocale" to fetch valid locale values from Windows.
  - The default charset for locales without an explicit charset is now chosen
    from a list of Linux-compatible charsets.  For instance en_US -> ISO-8859-1,
    ja_JP -> EUC-JP.
  - Support for the @euro locale modifier to switch to the ISO-8859-15
    charset.
  - The default charset in the "C" or "POSIX" locale has been changed back
    from UTF-8 to ASCII, to circumvent problems with applications expecting a
    single-byte charset in the "C"/"POSIX" locale.  UTF-8 is still used
    internally for filename conversion in this case.
  - LC_COLLATE, LC_MONETARY, LC_NUMERIC, and LC_TIME localization is enabled
    via Windows locale support.
  - New strfmon(3) call.
 - Support open(2) flags O_CLOEXEC and O_TTY_INIT.  Support fcntl(2)
   command F_DUPFD_CLOEXEC.  Support socket(2) flags SOCK_CLOEXEC and
   SOCK_NONBLOCK.
 - Add new Linux-compatible API calls accept4(2), dup3(2), and pipe2(2).
 - fnmatch(3) call is now multibyte-aware.
 </screen>
 </sect2>
 <sect2 id="ov-new1.7-os"><title>OS related changes</title>
 <screen>
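The close-on-exec flags and the new Linux-compatible dup3(2) and pipe2(2) calls listed above exist so that FD_CLOEXEC can be set atomically when a descriptor is created, instead of with a separate fcntl(F_SETFD) call that another thread could race with fork/exec. The following C sketch shows typical use. It assumes a POSIX.1-2008 system that also provides these Linux extensions (glibc needs _GNU_SOURCE for them); the helper names are hypothetical, and accept4(2) with SOCK_CLOEXEC follows the same pattern but is not shown.

```c
/* Assumption: Linux-compatible pipe2()/dup3() are available.  glibc only
   declares them when _GNU_SOURCE is defined before any header. */
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Return 1 if FD_CLOEXEC is set on fd, 0 if it is not, -1 on error. */
static int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}

/* Create a pipe whose two ends are close-on-exec from the start. */
static int cloexec_pipe(int fds[2])
{
    return pipe2(fds, O_CLOEXEC);
}

/* Duplicate fd to the lowest free descriptor >= minfd, with FD_CLOEXEC
   already set on the copy (fcntl command F_DUPFD_CLOEXEC). */
static int dup_cloexec(int fd, int minfd)
{
    return fcntl(fd, F_DUPFD_CLOEXEC, minfd);
}

/* Duplicate fd onto exactly newfd (closing newfd first if it is open)
   and set FD_CLOEXEC on it in the same call. */
static int dup_to_cloexec(int fd, int newfd)
{
    /* dup3() fails with EINVAL when fd == newfd; catch that in debug builds. */
    assert(fd != newfd);
    return dup3(fd, newfd, O_CLOEXEC);
}
```

Each helper returns -1 and sets errno on failure, exactly like the underlying call it wraps.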
--- a/winsup/doc/setup2.sgml
+++ b/winsup/doc/setup2.sgml
@@ -255,35 +255,41 @@ charset.  The Cygwin DLL itself, however, will nevertheless use the locale
 set in the environment (or the "C.UTF-8" default locale) for converting
 filenames etc.</para>
-<para>When the locale set in the environment specifies an ASCII charset,
+<para>When the locale in the environment specifies an ASCII charset,
 for example "C" or "en_US.ASCII", Cygwin will still use UTF-8
 under the hood to translate filenames.  This allows for easier
 interoperability with applications running in the default "C.UTF-8" locale.
 </para>
 <para>
-Right now the language and territory, as well as the modifier, are not
-important to Cygwin, except to fix a single problem.  There's a class of
-characters in the Unicode character set, called the "CJK Ambiguous Width
-Character set".  For these characters the width returned by the
+Starting with Cygwin 1.7.2, the language and territory are used to
+fetch locale-dependent information from Windows.  If the language and
+territory are not known to Windows, the <function>setlocale</function>
+function fails.</para>
 wcwidth/wcswidth function is usually 1.  This is often a problem in
 East-Asian languages, which historically use character sets in which
 these characters have a width of 2.  Kind of explains why they are
 called "ambiguous"...</para>
-<para>
+<para>The modifier is used for two cases.</para>
 The problem has been fixed like this.  wcwidth/wcswidth usually
 return 1 as the width of these characters.  However, if the language is
 specified as "ja" (Japanese), "ko" (Korean), or "zh" (Chinese), wcwidth
 returns 2 for these characters.  Unfortunately this isn't correct in
 all circumstances, so the user can specify the modifier "@cjknarrow",
 which modifies the behaviour of wcwidth/wcswidth to return 1 for the
 ambiguous width characters even in those languages.</para>
-<para>
+<itemizedlist mark="bullet">
 Other than that, the only important part so far is the character set.
-How does that work?</para>
+<listitem><para>For languages which default to one of the ISO-8859 character
 sets, the modifier "@euro" can be added to enforce usage of the ISO-8859-15
 character set, which includes a character for the "Euro" currency sign.</para>
 </listitem>
 <listitem><para>There's a class of characters in the Unicode character set,
 called the "CJK Ambiguous Width Character set".  For these characters the width
 returned by the wcwidth/wcswidth function is usually 1.  This is often a
 problem in East-Asian languages, which historically use character sets in
 which these characters have a width of 2.  By default, the wcwidth/wcswidth
 functions return 1 as the width of these characters, except if the language is
 specified as "ja" (Japanese), "ko" (Korean), or "zh" (Chinese).  In these
 languages wcwidth and wcswidth return 2 for these characters.  This is not
 correct in all circumstances, so the user of one of these languages can specify
 the modifier "@cjknarrow", which modifies the behaviour of wcwidth/wcswidth to
 return 1 for the ambiguous width characters.</para>
 </listitem>
 </itemizedlist>
 </sect2>
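Since <function>setlocale</function> now fails when Windows does not know the requested language and territory, programs should check its return value rather than assume a locale name is usable. A minimal C sketch using only the standard locale API; the helper name locale_is_valid is hypothetical, and _POSIX_C_SOURCE is defined only so that strdup() is declared.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() */
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if setlocale() accepts `name` for LC_ALL, 0 if it rejects it
   (for instance because the language/territory pair is unknown), or -1
   if memory for saving the current locale could not be allocated.
   The caller's locale is restored before returning. */
static int locale_is_valid(const char *name)
{
    const char *cur = setlocale(LC_ALL, NULL);  /* query only, no change */
    char *saved;
    int ok;

    assert(cur != NULL);
    saved = strdup(cur);          /* later setlocale() calls may overwrite cur */
    if (saved == NULL)
        return -1;
    ok = setlocale(LC_ALL, name) != NULL;
    setlocale(LC_ALL, saved);     /* put the caller's locale back */
    free(saved);
    return ok;
}
```

For example, locale_is_valid("ja_JP") reports whether the running system accepts that locale; the <command>getlocale -a</command> tool lists the names Windows knows about.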
@@ -296,32 +302,47 @@ Assume that you've set one of the aforementioned environment variables to some
 valid POSIX locale value, other than "C" and "POSIX".  Assume further that
 you're living in Japan.  You might want to use the language code "ja" and the
 territory "JP", thus setting, say, <envar>LANG</envar> to "ja_JP".  You didn't
-set a character set, so what will Cygwin use now?  Easy!  It will use the
-default Windows ANSI codepage of your system, if it's supported by Cygwin.
-Hopefully Cygwin supports all relevant default ANSI codepages...</para>
+set a character set, so what will Cygwin use now?  Starting with Cygwin 1.7.2,
+the default character set is determined by the default Windows ANSI codepage
+for this language and territory.  Cygwin uses a character set which is the
 typical Unix equivalent of the Windows ANSI codepage.  For instance:</para>
-<note><para>For a list of supported character sets, see
-<xref linkend="setup-locale-charsetlist"></xref>
-</para></note>
+<screen>
+  "en_US"		ISO-8859-1
+  "el_GR"		ISO-8859-7
  "pl_PL"		ISO-8859-2
  "pl_PL@euro"		ISO-8859-15
  "ja_JP"		EUCJP
  "ko_KR"		EUCKR
  "te_IN"		UTF-8
 </screen>
 </listitem>
 <listitem><para>
-You don't want to use the default Windows codepage as character set?
-In that case you have to specify the charset explicitly.  For instance,
-assume you're from Italy and don't want to use the Italian default Windows
-ANSI codepage 1252, but the more portable ISO-8859-15 character set.
-What you can do, for instance, is to set the <envar>LANG</envar> variable
-in the <filename>C:\cygwin\Cygwin.bat</filename> file which is the batch file
-to start a Cygwin session from the "Cygwin" desktop shortcut.</para>
+You don't want to use the default character set?  In that case you have to
+specify the charset explicitly.  For instance, assume you're from Japan and
+don't want to use the Japanese default charset EUC-JP, but the Windows
+default charset SJIS.  One way to do this is to set the
+<envar>LANG</envar> variable in the <filename>C:\cygwin\Cygwin.bat</filename>
+file, which is the batch file used to start a Cygwin session from the
+"Cygwin" desktop shortcut.</para>
 <screen>
  @echo off
  C:
  chdir C:\cygwin\bin
-  set LANG=it_IT.ISO-8859-15
+  set LANG=ja_JP.SJIS
  bash --login -i
 </screen>
 <note><para>For a list of locales supported by your Windows machine, use the new
 <command>getlocale -a</command> command, which is part of the Cygwin package.
 For a description, see <xref linkend="getlocale"></xref>.</para></note>
 <note><para>For a list of supported character sets, see
 <xref linkend="setup-locale-charsetlist"></xref>
 </para></note>
 </listitem>
 <listitem><para>
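To see which character set the C library actually picked for a locale such as "ja_JP" or "ja_JP.SJIS", a program can switch LC_CTYPE and query <function>nl_langinfo(CODESET)</function>. The C sketch below assumes only POSIX <filename>langinfo.h</filename>; the helper name charset_of_locale is hypothetical, and the exact spelling of the returned charset name is implementation-specific, so it may not match the table above character for character.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() and nl_langinfo() */
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'ed copy of the charset name used for the LC_CTYPE
   category of locale `name`, or NULL if the locale is not supported
   (or memory runs out).  The caller's LC_CTYPE setting is restored. */
static char *charset_of_locale(const char *name)
{
    const char *cur = setlocale(LC_CTYPE, NULL);  /* query only */
    char *saved, *charset = NULL;

    assert(cur != NULL);
    if ((saved = strdup(cur)) == NULL)
        return NULL;
    if (setlocale(LC_CTYPE, name) != NULL)
        charset = strdup(nl_langinfo(CODESET));   /* copy before restoring */
    setlocale(LC_CTYPE, saved);
    free(saved);
    return charset;
}
```

The caller must free() the returned string.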
@@ -435,19 +456,18 @@ entries are useful to cygwin: 932/SJIS, 936/GBK, 949/EUC-KR, 950/Big5,
 <sect2 id="setup-locale-missing"><title>What does not work?</title>
 <para>
-Except for <envar>LC_ALL</envar>, <envar>LC_CTYPE</envar>,
-and <envar>LANG</envar>, all other LC_xxx environment variables,
-<envar>LC_COLLATE</envar>, <envar>LC_MESSAGES</envar>,
-<envar>LC_MONETARY</envar>, <envar>LC_NUMERIC</envar>,
-and <envar>LC_TIME</envar>, are ignored right now.  This means, while Cygwin
-supports different character sets, it does <emphasis>not</emphasis> support
-real localization so far.  There's no support for locale-specific monetary
+The environment variable and locale setting <envar>LC_MESSAGES</envar>
+is ignored right now.  There's no known Windows function which returns
+the regular expressions used to recognize user input meaning "yes"
+or "no".  Therefore,
+<function>nl_langinfo(YESEXPR)</function> and
+<function>nl_langinfo(NOEXPR)</function> always return a string
+suitable only for the English language.</para>
 symbols, for a decimal point other than '.', no support for native time
 formats, and no support for native language sorting orders.
 </para>
-<para>Cygwin's internationalization support is work in progress and we would
-be glad for coding help in this area.</para>
+<para>If somebody knows a simple solution to this problem, feel free
+to notify us on the
 <ulink url="mailto:cygwin@cygwin.com">Cygwin mailing list</ulink>.
 </para>
 </sect2>