From 1c6743b74d5dc40545daa4b18577ae304340a446 Mon Sep 17 00:00:00 2001
From: Corinna Vinschen
Date: Tue, 24 Mar 2009 12:37:02 +0000
Subject: [PATCH] * cygwinenv.sgml: Move "codepage:xxx" to the removed options
 section. Change text accordingly. * new-features.sgml: Try to explain new way
 to define character sets.

---
 winsup/doc/ChangeLog         |  6 ++++++
 winsup/doc/cygwinenv.sgml    | 33 ++++++++++-----------------------
 winsup/doc/new-features.sgml | 28 +++++++++++++++++++++++-----
 3 files changed, 39 insertions(+), 28 deletions(-)

diff --git a/winsup/doc/ChangeLog b/winsup/doc/ChangeLog
index 281c5107d..4ead25882 100644
--- a/winsup/doc/ChangeLog
+++ b/winsup/doc/ChangeLog
@@ -1,3 +1,9 @@
+2009-03-24 Corinna Vinschen
+
+ * cygwinenv.sgml: Move "codepage:xxx" to the removed options section.
+ Change text accordingly.
+ * new-features.sgml: Try to explain new way to define character sets.
+
 2009-03-18 Corinna Vinschen

 * cygwin-ug-net.in.sgml: Update date.
diff --git a/winsup/doc/cygwinenv.sgml b/winsup/doc/cygwinenv.sgml
index 48cb5a6c8..c7f1e98ff 100644
--- a/winsup/doc/cygwinenv.sgml
+++ b/winsup/doc/cygwinenv.sgml
@@ -11,29 +11,6 @@ by prefixing with no.
-
-codepage:[ansi|oem|utf8] - This option controls
-which single- or multibyte character set is used for file and console
-operations. Windows is using UTF-16 characters internally and this
-option specifies how 8-byte character sets are converted to UTF-16 and
-vice versa. The default setting is ansi which means,
-conversion is based on the current ANSI codepage, typically 1252 in
-many Western language versions of Windows. The name originates from the
-ANSI Latin1 (ISO 8859-1) standard, used in Windows 1.0, though the
-character sets have since diverged from any standard. The second
-setting selects an older, DOS-based character set, containing various
-line drawing and special characters. It is called oem
-since it was originally encoded in the firmware of IBM PCs by original
-equipment manufacturers (OEMs).
-If you find that some characters (especially non-US or 'graphical' ones)
-do not display correctly in Cygwin, you can use this option to select an
-appropriate codepage. Finally, utf8 treats all file names
-and console characters as UTF-8 chars. Please note that, for correct
-operation, you have to set the environment variable LANG or LC_ALL to
-somthing like "en_US.UTF-8", otherwise many applications will not be
-able to recognize UTF-8 strings correctly.
-
-
 (no)dosfilewarning - If set, Cygwin will warn the
 first time a user uses an "MS-DOS" style path name rather than a POSIX-style
@@ -194,6 +171,16 @@ information, read the documentation in and .
+
+codepage:[ansi|oem] - This option controled
+which character set is used for file and console operations. Since Cygwin
+is now doing all character conversion by itself, depending on the
+application call to the setlocale() function, and in
+turn by the setting of the environment variables $LANG,
+$LC_ALL, or $LC_CTYPE, this setting
+got useless.
+
+
 (no)ntea - This option has been removed since it only fakes security
 which is considered dangerous and useless. It also
diff --git a/winsup/doc/new-features.sgml b/winsup/doc/new-features.sgml
index 57bac4f44..4f8db0f02 100644
--- a/winsup/doc/new-features.sgml
+++ b/winsup/doc/new-features.sgml
@@ -17,13 +17,18 @@
   are only local to the current session and disappear when the last Cygwin
   process in the session exits.

+- If a filename cannot be represented in the current character set,
+  the character will be converted to a sequence Ctrl-N + UTF-8 representation
+  of the character. This allows to access all files, even those not
+  having a valid representation of their filename in the current character
+  set (codepage). To have always a valid string, use the UTF-8 charset
+  by setting the environment variable $LANG, $LC_ALL, or $LC_CTYPE to a
+  valid POSIX value, for instance in Cygwin.bat like this:
+
+    set LC_CTYPE=en_US.UTF-8
+
 - PATH_MAX is now 4096.
  Internally, path names can be as long as the underlying OS can handle (32K).
-
-- UTF-8 filenames are supported now. So far, this requires to set
-  the environment variable CYGWIN to contain "codepage:utf8". but this
-  will likely disappear at one point. The setting of $LANG or $LC_CTYPE
-  will be used instead.

 - struct dirent now supports d_type, filled out with DT_REG or DT_DIR.
   All other file types return as DT_UNKNOWN for performance reasons.
@@ -176,6 +181,19 @@ Other POSIX related changes
+- A lot of character sets are supported now via a call to setlocale().
+  The setting of the environment variables $LANG, $LC_ALL or $LC_CTYPE will
+  be used. For instance, setting $LANG to "de_DE.ISO-8859-15" before
+  starting a Cygwin session will use the ISO-8859-15 character set in
+  the entire session. UTF-8 is supported as well, as in "en_US.UTF-8".
+
+  The full list of supported character sets: "ASCII", "ISO-8859-x" with x
+  in 1-16, except 12, "UTF-8", Windows codepages "CPxxx", with xxx in
+  (437, 720, 737, 775, 850, 852, 855, 857, 858, 862, 866, 874, 1125,
+  1250, 1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258), "JIS", "SJIS",
+  "eucJP", "Big5". The leading language and territory part (en_US) is not
+  used by Cygwin yet, but is required for POSIX compatibility.
+
 - Allow multiple concurrent read locks per thread for pthread_rwlock_t.

 - Implement pthread_kill(thread, 0) as per POSIX.