- Remove charset parameter from low level __foo_wctomb/__foo_mbtowc calls.
- Instead, create array of function for ISO and Windows codepages to point
to function which does not require to evaluate the charset string on
each call. Create matching helper functions. I.e., __iso_wctomb,
__iso_mbtowc, __cp_wctomb and __cp_mbtowc are functions returning the
right function pointer now.
- Create __WCTOMB/__MBTOWC macros utilizing per-reent locale and replace
calls to __wctomb/__mbtowc with calls to __WCTOMB/__MBTOWC.
- Drop global __wctomb/__mbtowc vars.
- Utilize aforementioned changes in Cygwin to get rid of charset in other,
calling functions and simplify the code.
- In Cygwin restrict global cygheap locale info to the job performed
by internal_setlocale. Use UTF-8 instead of ASCII on the fly in
internal conversion functions.
- In Cygwin dll_entry, make sure to initialize a TLS area with a NULL
_REENT->_locale pointer. Add comment to explain why.
Signed-off by: Corinna Vinschen <corinna@vinschen.de>
Bump GPLv2+ to GPLv3+ for some files, clarify BSD 2-clause.
Everything else stays under GPLv3+.
New Linking Exception exempts resulting executables from LGPLv3 section 4.
Add CONTRIBUTORS file to keep track of licensing.
Remove 'Copyright Red Hat Inc' comments.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* dcrt0.cc (dll_crt0_1), environ.cc (environ_init, getwinenveq,
build_env), strfuncs.cc (sys_wcstombs, sys_wcstombs_alloc),
wchar.c (sys_wcstombs, sys_wcstombs_alloc): avoid mis-conversions
of text that does not, actually, refer to a path or file name
Detailed explanation:
Our WCS -> UTF conversion handles the private Unicode page specially
to allow for otherwise invalid file names. However, this handling makes
no sense for command-lines, nor environment variables, which we would
rather convert verbatim.
As a stop-gap solution, let's just introduce a version of the
sys_wcstombs() function that specifically excludes that file name
conversion magic.
The proper solution is to change sys_wcstombs() to assume that it is not
a path that wants to be converted, and introduce sys_wcstombs_path()
that does, but that is a bigger task which we leave for another patch.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
(mainly in fhandler*) start fixing gcc 4.7.2 mismatch between regparm
definitions and declarations.
* gendef: Define some functions to take @ declaration to accommodate _regN
defines which use __stdcall.
* gentls_offsets: Define __regN macros as empty.
* autoload.cc (wsock_init): Remove unneeded regparm attribute.
* winsup.h (__reg1): Define.
(__reg2): Define.
(__reg3): Define.
* advapi32.cc (DuplicateTokenEx): Coerce some initializers to avoid warnings
from gcc 4.7.2.
* exceptions.cc (status_info): Declare struct to use NTSTATUS.
(cygwin_exception::dump_exception): Coerce e->ExceptionCode to NTSTATUS.
* fhandler_clipboard.cc (cygnativeformat): Redefine as UINT to avoid gcc 4.7.2
warnings.
(fhandler_dev_clipboard::read): Ditto.
* Makefile.in (install-headers): Remove extra command to install
regex.h.
(uninstall-headers): Remove extra command to uninstall regex.h.
* nlsfuncs.cc (collate_lcid): Make externally available to allow
access to collation internals from regex functions.
(collate_charset): Ditto.
* wchar.h: Add __cplusplus guards to make C-clean.
* include/regex.h: New file, replacing regex/regex.h. Remove UCB
advertising clause.
* regex/COPYRIGHT: Accommodate BSD license. Remove UCB advertising
clause.
* regex/cclass.h: Remove.
* regex/cname.h: New file from FreeBSD.
* regex/engine.c: Ditto.
(NONCHAR): Tweak for Cygwin.
* regex/engine.ih: Remove.
* regex/mkh: Remove.
* regex/regcomp.c: New file from FreeBSD. Tweak slightly for Cygwin.
Import required collate internals from nlsfunc.cc.
(p_ere_exp): Add GNU-specific \< and \> handling for word boundaries.
(p_simp_re): Ditto.
(__collate_range_cmp): Define.
(p_b_term): Use Cygwin-specific collate internals.
(findmust): Ditto.
* regex/regcomp.ih: Remove.
* regex/regerror.c: New file from FreeBSD. Fix a few compiler warnings.
* regex/regerror.ih: Remove.
* regex/regex.7: New file from FreeBSD. Remove UCB advertising clause.
* regex/regex.h: Remove. Replaced by include/regex.h.
* regex/regexec.c: New file from FreeBSD. Fix a few compiler warnings.
* regex/regfree.c: New file from FreeBSD.
* regex/tests: Remove.
* regex/utils.h: New file from FreeBSD.
* autoload.cc (LocaleNameToLCID): Define.
* cygwin.din (strfmon): Export.
* nlsfuncs.cc: New file. Define a lot of internal functions called
from setlocale.
(wcscoll): Implement locale-aware here, using CompareStringW function.
(strcoll): Ditto.
(wcsxfrm): Implement locale-aware here, usingLCMapStringW function.
(strxfrm): Ditto.
(__set_charset_from_locale): Replace __set_charset_from_codepage.
Return Linux-compatible charset.
* strfuncs.cc (__set_charset_from_codepage): Remove.
* wchar.h (__set_charset_from_codepage): Drop definition.
* wincap.h (wincaps::has_localenames): New element.
* wincap.cc: Implement above element throughout.
* libc/strfmon.c: New file.
* libc/strptime.cc: Remove locale constant strings in favor of
access to locale-specifc data.
(strptime): Point _CurrentTimeLocale to locale-specific data.
Throughout use correct locale-specific format fields for all
locale-specific formats.
* include/monetary.h: New file.
* include/cygwin/version.h (CYGWIN_VERSION_API_MINOR): Bump.
str_to_con.
* fhandler_console.cc (dev_console::con_to_str): Simplify. Always
default to the current internal locale.
(dev_console::get_console_cp): Always use codepage 437 for alternate
charset.
(dev_console::str_to_con): Constify charset parameter.
(fhandler_console::write_normal): Always use codepage 437 for alternate
charset. Otherwise always default to the current internal locale.
Replace ASCII SO with ASCII CAN.
* strfuncs.cc: Tweka comments according to below changes.
(sys_cp_wcstombs): Constify charset parameter. Convert all wchar_t
values in the Unicode private use area U+F0xx to the singlebyte
counterpart. Drop special handling creating ASCII SO sequence from
U+DCxx value. Rearrange for performance. Replace ASCII SO with
ASCII CAN.
(sys_cp_mbstowcs): Constify charset parameter. Replace ASCII SO with
ASCII CAN. Drop special case for U+DCxx ASCII SO sequences. Always
create a replacement from the Unicode private use area U+F0xx for
invalid byte values in a multibyte sequence. Do the same for wchar_t
values from the U+F0xx range to make them roundtrip safe.
* wchar.h (sys_cp_wcstombs): Constify charset parameter.
(sys_cp_mbstowcs): Ditto.
* wincap.cc: Implement above element throughout.
* wchar.h (__sjis_mbtowc): Declare.
(__eucjp_mbtowc): Ditto.
(__gbk_mbtowc): Ditto.
(__kr_mbtowc): Ditto.
(__big5_mbtowc): Ditto.
* syscalls.cc (internal_setlocale): Convert to char * function.
Return parameter by default. Return NULL if request to use a
charset can't be satisfied due to missing codepage support in the
underlying OS. Fix comment.
(setlocale): Store original locale. Restore to original locale if
internal_setlocale returns NULL.
* cygheap.h (struct cygheap_locale): New structure.
(struct user_heap_info): Add cygheap_locale member locale.
* dcrt0.cc (dll_crt0_1): Revert to calling _setlocale_r so that only
the applications locale is reverted to "C".
* environ.cc (environ_init): Remove unused got_lc variable.
* fhandler.h (class dev_console): Remove now unsed locale variables.
* fhandler_console.cc (fhandler_console::get_tty_stuff): Remove
setting dev_console's locale members.
(dev_console::con_to_str): Use internal locale settings. Default to
__ascii_wctomb if charset is "ASCII".
(fhandler_console::write_normal): Ditto.
* strfuncs.cc (__ascii_wctomb): Drop declaration.
(__db_wctomb): Use fixed value 2 instead of not
necessarily matching MB_CUR_MAX.
(__eucjp_wctomb): Use 3 instead of MB_CUR_MAX.
(sys_cp_wcstombs): Remove special case for "C" locale.
(sys_wcstombs): Implement here. Use internal locale data stored on
cygheap.
(sys_cp_mbstowcs): Remove special case for "C" locale.
(sys_mbstowcs): Implement here. Use internal locale data stored on
cygheap.
* syscalls.cc (internal_setlocale): New function to set cygheap locale
data and to reset CWD posix path.
(setlocale): Just call internal_setlocale from here if necessary.
* wchar.h (__ascii_wctomb): Declare.
(sys_wcstombs): Don't define inline, just declare.
(sys_mbstowcs): Ditto.
and con_charset.
(dev_console::str_to_con): Take mbtowc function pointer and charset
as additional parameters.
* fhandler_console.cc (fhandler_console::get_tty_stuff): Initialize
aforementioned new members. Explain why.
(dev_console::con_to_str): Remove useless comment. Call new
sys_cp_wcstombs function rather than sys_wcstombs.
(dev_console::str_to_con): Take mbtowc function pointer and charset
as additional parameters. Call sys_cp_mbstowcs accordingly.
(fhandler_console::write_normal): Only initialize f_mbtowc and charset
once. Accommodate changed str_to_con.
* strfuncs.cc (sys_cp_wcstombs): Renamed from sys_wcstombs. Take
wctomb function pointer and charset as parameters. Use throughout.
(sys_cp_mbstowcs): Take wctomb function pointer and charset as
parameters instead of codepage. Remove matching local variables and
their initialization.
* wchar.h (ENCODING_LEN): Define as in newlib.
(__mbtowc): Use mbtowc_p typedef for declaration.
(wctomb_f): New type.
(wctomb_p): New type.
(__wctomb): Declare.
(__utf8_wctomb): Use wctomb_f typedef for declaration.
(sys_cp_wcstombs): Move declaration from winsup.h here.
(sys_wcstombs): Ditto.
(sys_wcstombs_alloc): Ditto.
(sys_cp_mbstowcs): Ditto.
(sys_mbstowcs): Ditto.
(sys_mbstowcs_alloc): Ditto.
* winsup.h: Move declaration of sys_FOO functions to wchar.h. Include
wchar.h instead.
(__ctype_default): New character class array for default ASCII
character set.
(__ctype_iso): New array of character class array for ISO charsets.
(__ctype_cp): Ditto for singlebyte Windows codepages.
(tolower): Implement as distinct function to support any singlebyte
charset.
(toupper): Ditto.
(__set_ctype): New function to copy singlebyte character classes
corresponding to current charset to ctype_b array.
Align copyright text to upstream.
* dcrt0.cc (dll_crt0_1): Reset current locale to "C" per POSIX.
* environ.cc (set_file_api_mode): Remove.
(codepage_init): Remove.
(parse_thing): Remove "codepage" setting.
(environ_init): Set locale according to environment settings, or
to current codepage, before converting environment to multibyte.
* fhandler.h (fhandler_console::write_replacement_char): Drop argument.
* fhandler_console.cc (dev_console::str_to_con): Call sys_cp_mbstowcs
rather than MultiByteToWideChar.
(fhandler_console::write_replacement_char): Always print a funny
half filled square if a character isn't in the current charset.
(fhandler_console::write_normal): Convert to using __mbtowc
rather than next_char.
* fork.cc (frok::child): Drop call to set_file_api_mode.
* globals.cc (enum codepage_type) Remove.
(current_codepage): Remove.
* miscfuncs.cc (cygwin_wcslwr): Unused, dangerous. Remove.
(cygwin_wcsupr): Ditto.
(is_cp_multibyte): Remove.
(next_char): Remove.
* miscfuncs.h (is_cp_multibyte): Drop declaration.
(next_char): Ditto.
* strfuncs.cc (get_cp): Remove.
(__db_wctomb): New function to implement _wctomb_r functionality for
doublebyte charsets using WideCharToMultiByte.
(__sjis_wctomb): New function to replace unusable newlib function.
(__jis_wctomb): Ditto.
(__eucjp_wctomb): Ditto.
(__gbk_wctomb): New function.
(__kr_wctomb): Ditto.
(__big5_wctomb): Ditto.
(__db_mbtowc): New function to implement _mbtowc_r functionality for
doublebyte charsets using MultiByteToWideChar.
(__sjis_mbtowc): New function to replace unusable newlib function.
(__jis_mbtowc): Ditto.
(__eucjp_mbtowc): Ditto.
(__gbk_mbtowc): New function.
(__kr_mbtowc): New function
(__big5_mbtowc): New function
(__set_charset_from_codepage): New function.
(sys_wcstombs): Reimplement, basically using same wide char to multibyte
conversion as newlib's application level functions. Plus extras.
Add lengthy comment to explain. Change return type to size_t.
(sys_wcstombs_alloc): Just use sys_wcstombs. Change return type to
size_t.
(sys_cp_mbstowcs): Replace sys_mbstowcs, take additional codepage
argument. Explain why. Change return type to size_t.
(sys_mbstowcs_alloc): Just use sys_mbstowcs. Change return type to
size_t.
* wchar.h: Declare internal functions implemented in strfuncs.cc.
(wcscasecmp): Remove.
(wcsncasecmp): Remove.
(wcslwr): Remove.
(wcsupr): Remove.
* winsup.h (codepage_init): Remove declaration.
(get_cp): Ditto.
(sys_wcstombs): Align declaration to new implementation.
(sys_wcstombs_alloc): Ditto.
(sys_cp_mbstowcs): Add declaration.
(sys_mbstowcs): Define as inline function.
(sys_mbstowcs_alloc): Align declaration to new implementation.
(set_file_api_mode): Remove declaration.
* include/ctype.h (isblank): Redefine to use _B character class.
(toupper): Remove ASCII-only definition.
(tolower): Ditto.
(initial_env): Use small_printf's %P specifier.
* dll_init.cc (dll_list::alloc): Use PATH_MAX instead of CYG_MAX_PATH
for path name buffer size.
* dll_init.h (struct dll): Ditto.
* environ.cc: Include string.h.
(win_env::add_cache): Use temporary local buffer for path conversion.
(posify): Ditto.
* exceptions.cc (try_to_debug): Use CreateProcessW to allow long path
names.
* miscfuncs.cc: Drop unused implementations of strcasematch and
strncasematch.
(ch_case_eq): Drop.
(strcasestr): Drop.
(cygwin_wcscasecmp): New function.
(cygwin_wcsncasecmp): New function.
(cygwin_strcasecmp): New function.
(cygwin_strncasecmp): New function.
(cygwin_wcslwr): New function.
(cygwin_wcsupr): New function.
(cygwin_strlwr): New function.
(cygwin_strupr): New function.
* ntdll.h (RtlDowncaseUnicodeString): Declare.
(RtlUpcaseUnicodeString): Declare.
(RtlInt64ToHexUnicodeString): Fix typo in comment.
* string.h: Disable not NLS aware implementations of strcasematch
and strncasematch.
(cygwin_strcasecmp): Declare.
(strcasecmp): Define as cygwin_strcasecmp.
(cygwin_strncasecmp): Declare.
(strncasecmp): Define as cygwin_strncasecmp.
(strcasematch):Define using cygwin_strcasecmp.
(strncasematch):Define using cygwin_strncasecmp.
(cygwin_strlwr): Declare.
(strlwr): Define as cygwin_strlwr.
(cygwin_strupr): Declare.
(strupr): Define as cygwin_strupr.
* wchar.h: New file.
* wincap.cc (wincapc::init): Use "NT" as fix OS string.
* winsup.h (strcasematch): Drop declaration.
(strncasematch): Ditto.
(strcasestr): Ditto.