a bit more with POSIX and the other shells
I considered http://austingroupbugs.net/view.php?id=253 but the use
of bi_errorf() is interesting, especially as it’s often enough a
noreturn function, and funnily enough, 'cd -P /foo' returns 0 while
'chdir -P /foo' fails (so idk where to put -e)…
and switches to the TARGET_OS=Linux
• introduce android as regression test suite category
• add an android specific standard alias
• clean up redundant ‘-o sh’ arg in a few checks
UTF-8 BOM instead (UTFMODE now has a separate value for “activated
during BOM skipping”)
• parsing a COMSUB now skips UTF-8 BOM, too, but only temporarily
• PIPESTATUS is now supported (like bash 2); its last member
may actually differ from $? since the latter may not be the
result of a pipeline partial command
• add regression tests, documentation, etc.
• in interactive mode, always look up {LC_{ALL,CTYPE},LANG} environment
variables if setlocale/nl_langinfo(CODESET) doesn’t suffice
• add the ability to call any builtin (some don't make sense or wouldn't
work) directly by analysing argv[0]
• for direct builtin calls, the {LC_{ALL,CTYPE},LANG} environment
variables determine utf8-mode, even if MKSH_ASSUME_UTF8 was set
• when called as builtin, echo behaves POSIXish
• add domainname as alias for true on MirBSD only, to be able to link it
• sync mksh Makefiles with Build.sh output
• adjust manpage wrt release plans
• link some things to mksh now that we have callable builtins:
bin/echo bin/kill bin/pwd bin/sleep (exact matches)
bin/test bin/[ (were scripts before)
bin/domainname=usr/bin/true usr/bin/false (move to /bin/ now)
• drop linked utilities and, except for echo and kill, their manpages
• adjust instbin and link a few more there as well
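A minimal sketch of selecting a builtin from argv[0], as the entries above
describe. The table contents and the names find_direct_builtin, c_true and
c_false are assumptions made for illustration; this is not mksh’s actual code:

```c
#include <stddef.h>
#include <string.h>

typedef int (*builtin_fn)(int, char **);

/* stand-ins for real builtin implementations (hypothetical) */
static int
c_true(int argc, char **argv)
{
	(void)argc;
	(void)argv;
	return (0);
}

static int
c_false(int argc, char **argv)
{
	(void)argc;
	(void)argv;
	return (1);
}

/* names under which the binary may be linked (illustrative subset) */
static const struct {
	const char *name;
	builtin_fn fn;
} direct_builtins[] = {
	{ "true", c_true },
	{ "false", c_false },
	{ "domainname", c_true }	/* alias for true, as on MirBSD */
};

/* return the builtin selected by the basename of argv[0], or NULL */
static builtin_fn
find_direct_builtin(const char *argv0)
{
	const char *cp;
	size_t i;

	/* strip the directory part, if any */
	if ((cp = strrchr(argv0, '/')) != NULL)
		argv0 = cp + 1;
	for (i = 0; i < sizeof(direct_builtins) / sizeof(direct_builtins[0]); ++i)
		if (strcmp(argv0, direct_builtins[i].name) == 0)
			return (direct_builtins[i].fn);
	/* not a direct builtin call: start up as a regular shell */
	return (NULL);
}
```

A caller would try find_direct_builtin(argv[0]) early in main() and, on a
non-NULL result, run that function with the original argument vector.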
– possible integer overflows in memory allocation, mostly
‣ multiplication: all are checked now
‣ addition: reviewed them; most were “proven” or guessed to be
“almost” impossible to overflow (e.g. when the length of a
string is taken, it is assumed to stay more than a few bytes
below SIZE_MAX, since code and stack have to fit); some are
checked now (e.g. when one of the summands is an off_t); most
of the unchecked ones are annotated now
⇒ cost (MirBSD/i386 static): +76 .text
⇒ cost (Debian sid/i386): +779 .text -4 .data
– on Linux targets, setuid(), setresuid() and setresgid() can fail
with EAGAIN; check for that and, if so, warn once and retry
infinitely (other targets to be added later once we know that
they are “insane”)
⇒ cost (Debian sid/i386): +192 .text (includes .rodata)
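The “warn once, retry infinitely” handling of EAGAIN could look roughly like
the sketch below. The wrapper retry_on_eagain, the callback signature and the
fake_setuid stub (which stands in for the real system call so that the
example runs unprivileged) are all assumptions, not the code that was
committed:

```c
#include <errno.h>
#include <stdio.h>

/*
 * Call fn(arg) until it succeeds; on EAGAIN warn once, then keep
 * retrying. Any other error is passed back to the caller.
 */
static int
retry_on_eagain(int (*fn)(void *), void *arg, const char *what)
{
	int warned = 0;

	while (fn(arg) != 0) {
		if (errno != EAGAIN)
			return (-1);
		if (!warned) {
			fprintf(stderr, "warning: %s failed with EAGAIN,"
			    " retrying\n", what);
			warned = 1;
		}
	}
	return (0);
}

/* hypothetical stand-in for setuid(): fails a few times, then works */
struct fake_call {
	int failures_left;
	int calls;
};

static int
fake_setuid(void *arg)
{
	struct fake_call *fc = arg;

	++fc->calls;
	if (fc->failures_left > 0) {
		--fc->failures_left;
		errno = EAGAIN;
		return (-1);
	}
	return (0);
}

/* number of calls needed when the first "failures" attempts fail */
static int
retry_demo(int failures)
{
	struct fake_call fc;

	fc.failures_left = failures;
	fc.calls = 0;
	if (retry_on_eagain(fake_setuid, &fc, "setuid") != 0)
		return (-1);
	return (fc.calls);
}
```

In real use the callback would wrap setuid(2), setresuid(2) or setresgid(2)
with the desired IDs passed through arg.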
• setmode.c: Do overflow checking for realloc() too; switch back
from calloc() to a checked malloc() for simplification while there
• define -DIN_MKSH and let setmode.c look a tad nicer while here
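The multiplication and addition checks described above can be sketched as
follows. The helper names are made up for this example, and returning NULL on
overflow is an assumption; the shell itself may well treat such a request as
a fatal error instead:

```c
#include <stdint.h>
#include <stdlib.h>

/* nonzero if a * b would not fit into a size_t */
static int
mul_overflows(size_t a, size_t b)
{
	return (b != 0 && a > SIZE_MAX / b);
}

/* nonzero if a + b would not fit into a size_t */
static int
add_overflows(size_t a, size_t b)
{
	return (a > SIZE_MAX - b);
}

/* malloc(nmemb * size), refusing requests whose size would wrap */
static void *
checked_malloc2(size_t nmemb, size_t size)
{
	if (mul_overflows(nmemb, size))
		return (NULL);
	return (malloc(nmemb * size));
}

/* realloc(ptr, nmemb * size) with the same check */
static void *
checked_realloc2(void *ptr, size_t nmemb, size_t size)
{
	if (mul_overflows(nmemb, size))
		return (NULL);	/* ptr is left untouched, as with realloc(3) */
	return (realloc(ptr, nmemb * size));
}
```

The division-based test avoids performing the overflowing operation itself,
so it needs no wider integer type and no compiler extensions.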
│Don't alias 'stop' to 'kill -STOP'
│
│Android already has a stop command used
│to stop the main runtime and the alias
│interferes with testing tools that expect
│stop to kill the runtime.
│
│Change-Id: I02b7efb9203dc39e97f63eb702a54ff79935b316
Although this is closer to his first patchset and only takes
care of the alias (using #ifdef ANDROID), not the testsuite
(which doesn’t run nicely anyway, at least not out of the box).
We certainly want a more flexible testsuite…
and vendor pdksh versions, re-introduce FPOSIX alongside FSH. The semantics
are now:
‣ set -o posix ⇒
• disable brace expansion and FSH when triggered
• use Debian Policy 10.4 compliant non-XSI “echo” builtin
• do not keep file descriptors > 2 private to ksh
‣ set -o sh ⇒
• set automatically #ifdef MKSH_BINSHREDUCED
• disable brace expansion and FPOSIX when triggered
• use Debian Policy 10.4 compliant non-XSI “echo” builtin
• do not keep file descriptors > 2 private to ksh
• trigger MKSH_MIDNIGHTBSD01ASH_COMPAT mode if compiled in
• make “set -- $(getopt ab:c "$@")” construct work
Note that the set/getopt one used to behave POSIXly only with FSH or
FPOSIX (depending on the mksh version) set and Bourne-ish with it not
set, so this changes default mksh behaviour to POSIX!
• merge the rest of branch tg-wcswidth-behaviour
• enhance test cases for wcswidth-like behaviour
• switch hash table collision resolution algorithm to Python’s as announced
• bump vsn
• use a combination of the one-at-a-time hash and an LCG for handling
the $RANDOM special if !HAVE_ARC4RANDOM instead of rand(3)/srand(3)
and get rid of time(3) usage to reduce import footprint
• raise entropy state (mostly in the !HAVE_ARC4RANDOM case though…)
• simplify handling of the $RANDOM_SPECIAL generally
• tweak hash() to save a temp var for non-optimising compilers
• some int → mksh_ari_t and other type fixes
• general tweaking of code and comments
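For illustration, a one-at-a-time hash feeding an LCG might be combined as in
the sketch below. The hash is Bob Jenkins’ well-known one-at-a-time function;
the LCG constants (1103515245, 12345), the way the hash is folded into the
state and the names rnd_seed and rnd_get are assumptions, not necessarily
what this commit uses:

```c
#include <stddef.h>
#include <stdint.h>

/* Bob Jenkins' one-at-a-time hash over a byte buffer */
static uint32_t
oaat_hash(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t h = 0;

	while (len--) {
		h += *p++;
		h += h << 10;
		h ^= h >> 6;
	}
	/* final avalanche */
	h += h << 3;
	h ^= h >> 11;
	h += h << 15;
	return (h);
}

/* generator state; assumption: one global word is enough here */
static uint32_t lcg_state;

/* stir caller-supplied bytes into the generator state */
static void
rnd_seed(const void *buf, size_t len)
{
	lcg_state ^= oaat_hash(buf, len);
}

/* advance the LCG and return a value in the $RANDOM range 0..32767 */
static unsigned int
rnd_get(void)
{
	lcg_state = lcg_state * 1103515245U + 12345U;
	return ((lcg_state >> 16) & 0x7FFF);
}
```

Neither function needs rand(3), srand(3) or time(3), which matches the stated
goal of a smaller import footprint.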
just a "somewhat more POSIX" but also a "/bin/sh legacy kludge" mode
* consistently capitalise POSIX and SUSv3/SUSv4 (same as AT&T ksh) and
Bourne shell
call it only if $RANDOM is indeed set (although pool extension would be a
possibility, we do have arc4random_atexit which does it nicely too)
• avoid calling setspec for int→str conversion just before execve()
to it are now either arc4random or rand/srand, but srand retains the old
state; set +o arc4random is no longer possible, but if it's there we use
arc4random(3), if not, we use rand(3) for $RANDOM reads; optimise special
variable handling too and fix a few consts and other minor things
MKSH_S_EDIT for small (Emacs) editing mode, MKSH_S_FEAT for all the
disabled language features), which can be set to 0 despite MKSH_SMALL being
defined to re-enable the Vi command line editing mode (which I wouldn't,
but fits into the general mastermind scheme)
some GNU bash extensions (suggested by cnuke@) and bind macros
* make the random cache more efficient (and the code potentially
smaller, although we have a new implementation of the oaat hash
function, alongside the old one, now) and pushb only if needed
(i.e. state has changed or user has set $RANDOM, but not onfork)
• shell flags are now handled in one single place (sh_flags.h)
• sync comments (between enum and array) and manpage with reality
• FMONITOR is now no longer needed for Hartz IV shells
• we must not set the item pointer to NULL, since subsequent ktscan()
would stop there and not find any later occurrences
possible resolution strategies:
‣ still keep tablep; store a dummy value (either (void *)-1 or, probably
more portable, &ktenter or something like that) as is-free marker
⇒ retains benefit of keeping count of actually used entries
⇒ see below for further discussion
‣ don't keep tablep; revert to setting entry->flag = 0
⇒ need to ktwalk() or ktsort() for getting number of entries
⇒ simplest code
‣ same but with a twist: make ktscan() set pp to the first one with
!(entry->flag & DEFINED)¹ so that it can subsequently be re-used,
or, more accurately, free’d and the entry pointer re-used
⇒ less chance of texpand()ing when not needed
‣ similar (from kabelaffe@): in ktsearch(), move the one we DID find
to the first unused one
⇒ doesn’t need tablep or something, but has the overall best
memory use
⇒ more complicated ktscan(): needs to check pointer for NULL, for
dummyval, then entry->flag
⇒ makes lookup more expensive
⇒ benefit: self-optimising hash tables
⇒ loss: still need ktwalk() or ktsort()
• when afree()ing in ktremove(), …
① need to take FINUSE into account
• Python-2.5.4/Objects/dictnotes.txt talks about cache lines
‣ linear backward scan is much worse than linear forward scan
(even if we have to calculate the upper C-array bound)
‣ dereferencing the entry pointer in ktscan() is a penalty
• Python-2.5.4/Objects/dictobject.c has a lot of comments and
a rather interesting collision resolution algorithm, which
seems to de-cluster better than linear search at not much
more cost
• clib and libobjfw have unusable (for looking-at-for-ideas)
hash table implementations
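The collision resolution in Python’s dictobject.c follows the recurrence
i = 5*i + perturb + 1, with perturb starting out as the full hash value and
being shifted right by 5 bits after each probe. The sketch below counts how
many probes it takes to visit every slot of a small power-of-two table; the
function name and the 256-slot limit are choices made for this example only:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PERTURB_SHIFT 5

/*
 * Number of probes until every slot of a table with 2^log2size
 * slots has been visited, or 0 if that does not happen within
 * size + 16 probes (or if log2size is larger than 8).
 */
static unsigned int
probes_to_cover(uint32_t hash, unsigned int log2size)
{
	unsigned char seen[256];
	size_t size, mask, i, nseen = 0;
	uint32_t perturb = hash;
	unsigned int nprobes = 0;

	if (log2size > 8)
		return (0);
	size = (size_t)1 << log2size;
	mask = size - 1;
	memset(seen, 0, sizeof(seen));

	i = hash & mask;
	for (;;) {
		++nprobes;
		if (!seen[i & mask]) {
			seen[i & mask] = 1;
			if (++nseen == size)
				return (nprobes);
		}
		if (nprobes >= size + 16)
			return (0);
		/* the recurrence: i = 5 * i + perturb + 1 */
		i = (i << 2) + i + perturb + 1;
		perturb >>= PERTURB_SHIFT;
	}
}
```

Once perturb has been shifted down to zero, the recurrence degenerates to
i = 5*i + 1 modulo the table size, which cycles through every slot; the upper
hash bits only influence the first few probes, which is what breaks up
clusters.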
this is a no-op change breaking ifdef’d-out code; the most likely
outcome is a switch to the following scheme:
• keep tablep in struct tbl
• use a magic pointer value for ktremove’d entries, deallocate
the struct tbl as soon as possible – if not FINUSE, immediately
inside ktremove()
‣ memory gain, despite needing to have tablep around
• nuke ktdelete, so that all ops go through kt{enter,remove}
‣ gains us accurate fill information
‣ speed gain: ktscan() no longer needs to dereference removed entries
‣ memory (ktsort) and speed (ktwalk) gain: removed entries are now
ignored right from the beginning, so tstate->left and the size
of the sorted array are accurate
‣ removed entries no longer can cause texpand() to be invoked
⇒ this does not give us self-optimising tables, but a speed and
memory benefit plus, probably, simplicity of code; we accurately
know how many non-deleted entries are in a keytab so we can
calculate if we need to expand, how much space ktsort() is going to
need, and, for when indexed arrays will be converted to use
keytabs instead of singly linked linear lists, ${#foo[*]} is fast
(although ${!foo[*]}² and ${foo[*]}³ will need some tweaking and
may run a little less quickly)
• shuffle code around, so that things like search/scan and garbage
collection can be re-used
• use Python’s collision resolution algorithm instead of linear search
② the list of keys needs to be sorted, at least for indexed arrays⁴
③ this needs to be sorted by keys, at least for indexed arrays⁴
④ … but this is a nice-to-have for associative arrays⁵ as well
⑤ which we however do not have
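A rough sketch of the “magic pointer value” scheme outlined above: removed
slots are overwritten with a marker address, the scan recognises the marker
without dereferencing it, and the table keeps an exact count of live entries.
All names are hypothetical, the table has a fixed size (no texpand()
analogue), and linear probing is used only to keep the example short:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TSIZE 8		/* power of two; fixed for this sketch */

struct entry {
	const char *name;
};

struct table {
	struct entry *slot[TSIZE];
	size_t nused;	/* live entries only, removed ones excluded */
};

/* the address of this object is the is-removed marker */
static struct entry removed_marker;
#define REMOVED (&removed_marker)

static size_t
hash_name(const char *s)
{
	size_t h = 0;

	while (*s)
		h = h * 31 + (unsigned char)*s++;
	return (h);
}

/* find name; optionally report the first reusable slot via freep */
static struct entry **
tbl_scan(struct table *tp, const char *name, struct entry ***freep)
{
	size_t i = hash_name(name) & (TSIZE - 1), n;

	if (freep != NULL)
		*freep = NULL;
	for (n = 0; n < TSIZE; ++n, i = (i + 1) & (TSIZE - 1)) {
		struct entry **pp = &tp->slot[i];

		if (*pp == NULL || *pp == REMOVED) {
			/* never dereferenced: both are reusable */
			if (freep != NULL && *freep == NULL)
				*freep = pp;
			if (*pp == NULL)
				return (NULL);
			continue;
		}
		if (strcmp((*pp)->name, name) == 0)
			return (pp);
	}
	return (NULL);
}

/* returns 1 on success (or if already present), 0 if the table is full */
static int
tbl_enter(struct table *tp, struct entry *ep)
{
	struct entry **freep;

	assert(ep != NULL && ep != REMOVED);
	if (tbl_scan(tp, ep->name, &freep) != NULL)
		return (1);
	if (freep == NULL)
		return (0);
	*freep = ep;
	++tp->nused;
	return (1);
}

static void
tbl_remove(struct table *tp, const char *name)
{
	struct entry **pp = tbl_scan(tp, name, NULL);

	if (pp != NULL) {
		*pp = REMOVED;	/* the entry itself could be freed here */
		--tp->nused;
	}
}

/* returns 1 if counts and lookups behave as described above */
static int
tombstone_demo(void)
{
	static struct entry foo = { "foo" }, bar = { "bar" }, baz = { "baz" };
	struct table t = { { NULL }, 0 };

	if (!tbl_enter(&t, &foo) || !tbl_enter(&t, &bar) ||
	    !tbl_enter(&t, &baz))
		return (0);
	tbl_remove(&t, "bar");
	if (t.nused != 2 || tbl_scan(&t, "bar", NULL) != NULL ||
	    tbl_scan(&t, "baz", NULL) == NULL)
		return (0);
	return (tbl_enter(&t, &bar) && t.nused == 3);
}
```

Because nused never counts removed slots, a sort buffer or an expansion
decision based on it is sized for the entries that actually exist.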
it with the array index; var.c says that
│ 1244 /* The table entry is always [0] */
so that we can have a special flag and a union which stores hval for
the table index, the array index otherwise (coïncidentally *hint hint*
they have the same size)
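The flag-plus-union idea might look like the following sketch. The structure,
flag and field names are invented for this example, and using uint32_t for
both members is an assumption based on the remark that they have the same
size:

```c
#include <stdint.h>

/* hypothetical flag: set on array elements, clear on the table entry */
#define F_ARRAY_ELEM 0x01

struct tbl_sketch {
	unsigned int flag;
	union {
		uint32_t hval;	/* table entry: cached hash value */
		uint32_t index;	/* any other element: array index */
	} ua;
};

/* array index of an entry; the table entry is always [0] */
static uint32_t
tbl_index(const struct tbl_sketch *vp)
{
	return ((vp->flag & F_ARRAY_ELEM) ? vp->ua.index : 0);
}

/* build an entry with the given flag and raw union value, query it */
static uint32_t
index_demo(unsigned int flag, uint32_t raw)
{
	struct tbl_sketch v;

	v.flag = flag;
	v.ua.index = raw;
	return (tbl_index(&v));
}
```

Since the table entry is always element [0], its index never has to be
stored, which frees the same storage for the hash value.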
return information needed to do a real ktremove instead of the pseudo
ktdelete operation which merely unsets the DEFINED flag to mark it as
eligible for texpand garbage collection (even worse, !DEFINED entries
are still counted)