a bit more with POSIX and the other shells
I considered http://austingroupbugs.net/view.php?id=253 but the use
of bi_errorf() is interesting, especially as it’s often enough a
noreturn function, and funnily enough, 'cd -P /foo' returns 0 while
'chdir -P /foo' fails (so idk where to put -e)…
and switches to the TARGET_OS=Linux
• introduce android as regression test suite category
• add an android specific standard alias
• clean up redundant ‘-o sh’ arg in a few checks
UTF-8 BOM instead (UTFMODE has a separate value now for activated
during BOM skipping)
• parsing a COMSUB now skips UTF-8 BOM, too, but only temporarily
• PIPESTATUS now supported (like bash 2) whose last member
may actually differ from $? since the latter may not be the
result of a pipeline partial command
• add regression tests, documentation, etc.
• in interactive mode, always look up {LC_{ALL,CTYPE},LANG} environment
variables if setlocale/nl_langinfo(CODESET) doesn’t suffice
• add the ability to call any builtin (some don't make sense or wouldn't
work) directly by analysing argv[0]
• for direct builtin calls, the {LC_{ALL,CTYPE},LANG} environment
variables determine utf8-mode, even if MKSH_ASSUME_UTF8 was set
• when called as builtin, echo behaves POSIXish
• add domainname as alias for true on MirBSD only, to be able to link it
• sync mksh Makefiles with Build.sh output
• adjust manpage wrt release plans
• link some things to mksh now that we have callable builtins:
bin/echo bin/kill bin/pwd bin/sleep (exact matches)
bin/test bin/[ (were scripts before)
bin/domainname=usr/bin/true usr/bin/false (move to /bin/ now)
• drop linked utilities and, except for echo and kill, their manpages
• adjust instbin and link a few more there as well
– possible integer overflows in memory allocation, mostly
‣ multiplication: all are checked now
‣ addition: reviewed them, most were “proven” or guessed to be
“almost” impossible to run over (e.g. when we have a string
whose length is taken it is assumed that the length will be
more than only a few bytes below SIZE_MAX, since code and
stack have to fit); some are checked now (e.g. when one of
the summands is an off_t); most of the unchecked ones are
annotated now
⇒ cost (MirBSD/i386 static): +76 .text
⇒ cost (Debian sid/i386): +779 .text -4 .data
– on Linux targets, setuid() setresuid() setresgid() can fail
with EAGAIN; check for that and, if so, warn once and retry
infinitely (other targets to be added later once we know that
they are “insane”)
⇒ cost (Debian sid/i386): +192 .text (includes .rodata)
• setmode.c: Do overflow checking for realloc() too; switch back
from calloc() to a checked malloc() for simplification while there
• define -DIN_MKSH and let setmode.c look a tad nicer while here
│Don't alias 'stop' to 'kill -STOP'
│
│Android has already has a stop command used
│to stop the main runtime and the alias
│interferes with testing tools that expect
│stop to kill the runtime.
│
│Change-Id: I02b7efb9203dc39e97f63eb702a54ff79935b316
Although, this is closer to his first patchset and only takes
care of the alias, not the testsuite (which doesn’t run, at
least not out-of-the-box, nicely anyway) using #ifdef ANDROID.
We certainly want a more flexible testsuite…
and vendor pdksh versions, re-introduce FPOSIX alongside FSH. The semantics
are now:
‣ set -o posix ⇒
• disable brace expansion and FSH when triggered
• use Debian Policy 10.4 compliant non-XSI “echo” builtin
• do not keep file descriptors > 2 to ksh
‣ set -o sh ⇒
• set automatically #ifdef MKSH_BINSHREDUCED
• disable brace expansion and FPOSIX when triggered
• use Debian Policy 10.4 compliant non-XSI “echo” builtin
• do not keep file descriptors > 2 to ksh
• trigger MKSH_MIDNIGHTBSD01ASH_COMPAT mode if compiled in
• make “set -- $(getopt ab:c "$@")” construct work
Note that the set/getopt one used to behave POSIXly only with FSH or
FPOSIX (depending on the mksh version) set and Bourne-ish with it not
set, so this changes default mksh behaviour to POSIX!
• merge the rest of branch tg-wcswidth-behaviour
• enhance test cases for wcswidth-like behaviour
• switch hash table collision resolution algorithm to Python’s as announced
• bump vsn
• use a combination of the one-at-a-time hash and an LCG for handling
the $RANDOM special if !HAVE_ARC4RANDOM instead of rand(3)/srand(3)
and get rid of time(3) usage to reduce import footprint
• raise entropy state (mostly in the !HAVE_ARC4RANDOM case though…)
• simplify handling of the $RANDOM_SPECIAL generally
• tweak hash() to save a temp var for non-optimising compilers
• some int → mksh_ari_t and other type fixes
• general tweaking of code and comments
just a "somewhat more POSIX" but also a "/bin/sh legacy kludge" mode
* consistently capitalise POSIX and SUSv3/SUSv4 (same as AT&T ksh) and
Bourne shell
call it only if $RANDOM is indeed set (although pool extension would be a
possibility we do have arc4random_atexit which does it nicely too)
• avoid calling setspec for int→str conversion just before execve()
to it are now either arc4random or rand/srand, but srand retains the old
state; set +o arc4random is no longer possible, but if it's there we use
arc4random(3), if not, we use rand(3) for $RANDOM reads; optimise special
variable handling too and fix a few consts and other minor things
MKSH_S_EDIT for small (Emacs) editing mode, MKSH_S_FEAT for all the dis-
abled language features), which can be set to 0 despite MKSH_SMALL being
defined to re-enable the Vi command line editing mode (which I wouldn't,
but fits into the general mastermind scheme)
some GNU bash extensions (suggested by cnuke@) and bind macros
* make the random cache more efficient (and the code potentially
smaller, although we have a new implementation of the oaat hash
function, alongside the old one, now) and pushb only if needed
(i.e. state has changed or user has set $RANDOM, but not onfork)
• shell flags are now handled in one single place (sh_flags.h)
• sync comments (between enum and array) and manpage with reality
• FMONITOR is now no longer needed for Hartz IV shells
• we must not set the item pointer to NULL, since subsequent ktscan()
would stop there and not find any later occurrences
possible resolution strategies:
‣ still keep tablep; store a dummy value (either (void *)-1 or, probably
more portable, &ktenter or something like that) as is-free marker
⇒ retains benefit of keeping count of actually used entries
⇒ see below for further discussion
‣ don't keep tablep; revert back to setting entry->flag = 0
⇒ need to ktwalk() or ktsort() for getting number of entries
⇒ most simple code
‣ same but with a twist: make ktscan() set pp to the first one with
!(entry->flag & DEFINED)¹ so that it can subsequently be re-used,
or, more accurate, free’d and the entry pointer re-used
⇒ less chance of texpand()ing when not needed
‣ similar (from kabelaffe@): in ktsearch(), move the one we DID find
to the first unused one
⇒ doesn’t need tablep or something, but has the overall best
memory use
⇒ more complicated ktscan(): needs to check pointer for NULL, for
dummyval, then entry->flag
⇒ makes lookup more expensive
⇒ benefit: self-optimising hash tables
⇒ loss: still need ktwalk() or ktsort()
• when afree()ing in ktremove(), …
① need to take FINUSE into account
• Python-2.5.4/Objects/dictnotes.txt talks about cache lines
‣ linear backward scan is much worse than linear forward scan
(even if we have to calculate the upper C-array bound)
‣ dereferencing the entry pointer in ktscan() is a penalty
• Python-2.5.4/Objects/dictobject.c has a lot of comments and
a rather interesting collision resolution algorithm, which
seems to de-cluster better than linear search at not much
more cost
• clib and libobjfw have unusable (for looking-at-for-ideas)
hash table implementations
this is a no-op change breaking ifdef-out-d code; the most likely
to happen is to switch to the following scheme:
• keep tablep in struct tbl
• use a magic pointer value for ktremove’d entries, deallocate
the struct tbl as soon as possible – if not FINUSE, immediately
inside ktremove()
‣ memory gain, despite needing to have tablep around
• nuke ktdelete, so that all ops go through kt{enter,remove}
‣ gains us accurate fill information
‣ speed gain: ktscan() needs no longer dereference removed entries
‣ memory (ktsort) and speed (ktwalk) gain: removed entries are now
ignored right from the beginning, so tstate->left and the size
of the sorted array are accurate
‣ removed entries no longer can cause texpand() to be invoked
⇒ this does not give us self-optimising tables, but a speed and
memory benefit plus, probably, simplicity of code; we accurately
know how many non-deleted entries are in a keytab so we can cal-
culate if we need to expand, how much space ktsort() is going to
need, and, for when indexed arrays will be converted to use key-
tabs instead of singly linked linear lists, ${#foo[*]} is fast
(although ${!foo[*]}² and ${foo[*]}³ will need some tweaking and
may run a little less quickly)
• shuffle code around, so that things like search/scan and garbage
collection can be re-used
• use Python’s collision resolution algorithm ipv linear search
② the list of keys needs to be sorted, at least for indexed arrays⁴
③ this needs to be sorted by keys, at least for indexed arrays⁴
④ … but this is a nice-to-have for associative arrays⁵ as well
⑤ which we however do not have
it with the array index; var.c says that
│ 1244 /* The table entry is always [0] */
so that we can have a special flag and a union which stores hval for
the table index, the array index otherwise (coïncidentally *hint hint*
they have the same size)
return information needed to do a real ktremove instead of the pseudo
ktdelete operation which merely unsets the DEFINED flag to mark it as
eligible for texpand garbage collection (even worse, !DEFINED entries
are still counted)
much better avalanche and no known funnels
• improve comments
• fix some types (uint32_t for hash, size_t for sizes)
• optimise ktsort()
no functional change, I think
Build.sh but use 'if defined(PRECOND) && !defined(TOBEDEFINED)'if possible
* for all of the source code, drop annotations "imake style" (if we check
for specific OSes, bad, instead of using mirtoconf checks proper) and
"conditions correct?" (if I'm not entirely sure if that #if catches all
cases and no false positives) where I can see it by grepping immediately
* bump mksh patchlevel
* refresh Makefiles
starting with an ‘!’ exclamation mark at the beginning of a com-
mand (PS1 not PS2), shall have the same effect as the predefined
“r” alias, to be compatible with csh and GNU bash’s “!string” to
«Execute last used command starting with string» – documentation
and feature request provided by wbx@ (Waldemar Brodkorb)
• expose “#ifdef MKSH_MIDNIGHTBSD01ASH_COMPAT” just in case they decide to
require it and show it in the ksh version automatically
• sync the use of non-ASCII characters over files (unification)
fix the regression test’s results while here, which have been
broken since cid 10049D9BE5254CE65B8
• get rid of separate copyright file which was intended for De-
bian; track down commits in all files of oksh-mirbsd and mksh
to get correct copyright years per-file, as is BSD custom
struct env (other structures defined have no "foreign type with pos-
sible alignment constraints" members) and take care of it while dea-
ling in a struct env instance
pass "xerrok" status across the execution call stack to more closely
match what both POSIX and [18]ksh.1 already describe in regards to set
-e/errexit's behavior in determining when to exit from nonzero return
values.
specifically, the truth values tested as operands to &&' and ||', as
well as the resulting compound expression itself, along with the truth
value resulting from a negated command (i.e. a pipeline prefixed !'),
should not make the shell exit when -e is in effect.
issue reported by matthieu.
testing matthieu, naddy.
ok miod (earlier version), otto.
man page ok jmc.
on Debian Lenny/amd64 (XXX need more verification; this
can be used for 64 bit arithmetics later too)
PPID, PGRP, RANDOM, USER_ID are now unsigned by default
was hard to type and hard to fix, galloc is also hard to fix, and some
things I learned will probably improve things more but make me use the
original form as base (especially for space savings)
* let sizeofN die though, remove even more casts
* optimise, polish
* regen Makefiles
* sprinkle a few /* CONSTCOND */ while here
allocator using malloc and free, with mmap malloc and omalloc in mind,
not counterfeiting its security measures such as guard pages, and having
some of our own, e.g. XOR random cookies, optional mprotect, etc.
zero cost (for we have arc4random())
“-sh” if -DMKSH_BINSHREDUCED was passed during compilation, for example
for Debian, but d̲e̲f̲i̲n̲i̲t̲i̲v̲e̲l̲y̲ n̲̲o̲̲t̲̲ for MirBSD™
• split up regression test to force this behaviour
• remove the gunk from our MirBSD™ startup scripts again
• mention arc4random.c changes on website, sync clog, warn packagers
and .data instead of another initialisation; this was prompted by a bug
in scan-build (the value can never be NULL, but it doesn’t realise it),
although this doesn’t fix it, but less stack usage is always good
‣ only if !MKSH_SMALL
‣ add appropriate regression test
• if FPOSIX is set, do not close fds > 2 on exec, Debian #499139
• add appropriate regression tests for keeping fds private or not
‣ macro afreechk() is superfluous
• get rid of macro afreechv() by re-doing the “don’t leak that much” code
• some KNF (mostly, whitespace and 80c) while here
* initialise the integers PPID, OPTIND, RANDOM, SECONDS, and TMOUT to base-10
* bring back PGRP as base-10 integer to the process group via getpgrp(2)
* initialise USER_ID as base-10 integer to the effective user id as retrieved
from geteuid(2) = $(id -u)
* use $USER_ID in dot.mkshrc instead of spawning an id(1) process
-> dot.mkshrc,v 1.34 now requires mksh R34
* convert more int to bool where appropriate
* remove dead code - getpgrp(2) cannot fail
* sync manual page to reality
* bump to mksh R34(beta) - feature freeze
XXX check if our_pgrp in jobs.c is still really needed, the setpgid call
XXX probably just makes us our own pgrp leader, and we might have to use
XXX and update kshpgrp accordingly - need feedback/help here but I think
XXX this simplification should be possible if I grok the code correctly.
etc/profile:
* adjust to $USER_ID changes in mksh (speed-up here, too)
mksh.hts:
* sync changelog
• apply diff from mirbsdksh-1.11:
#ifdef DUP2_BROKEN
/* Ultrix systems like to preserve the close-on-exec flag */
‣ XXX we do #ifdef __ultrix here (imake-style) instead of mirtoconfing it
(but does anyone know of any other OS with the same problem? plus we’d
see it as we now know the symptoms)
• remove ultrix Build.hs warn=' but might work…' in the hope it DOES
‣ I/O redirection seems broken:
$ (date; date >/dev/null; date) | wc -l
1 (expected: 2)
‣ other than that: working fine
‣ -YBSD (default) and -YSYSTEM_FIVE don’t work, just -YPOSIX, somehow
• Fix $(…) to `…` for OSF/1 V2.0 /bin/sh
‣ this compiler is FUBAR though:
$ cat >t.c
main() { return (foo()); }
$ cc t.c
ld:
Unresolved :
foo
$ echo $?
0
$ ls -l a.out
-rwxr-xr-x 1 mirbsd users 10835 Jul 21 17:12 a.out
‣ it seems to have ucode, but man is not installed
• new mirtoconf check: mkstemp(3)
• if !HAVE_MKSTEMP (Ultrix), use tempnam(3)
• only use printf(1) if it exists (it doesn’t on Ultrix)
• a few more signals
• add S_ISLNK if the OS doesn’t define it
• add strcasecmp(3) proto for Ultrix (it _is_ in <portability.h>, but
only for -YBSD I think)
• fgrep(1) on Ultrix doesn’t do “-e ① -e ②”
10x DEChengst:#UnixNL for giving access
In contrast to AT&T ksh93, its semantics are like GNU bash in that it ap-
pends the current working directory to the search path; it is implemented
as a shell alias instead of enhancing funcs.c:shbuiltins[] like in ksh93.
is a compromise anyway; these lunox people will have to live with that, too
many existing korn shell alike scripts depend on it even if not on the full
korn shell syntax availability (note: this doesn't mean using these in some
script with #!/bin/sh is ok)
Analysis:
internal_errorf(int, fmt, ...) was only a __dead function if the int argument
was non-0, which the Prevent probably was unable to follow. Change all uses of
internal_errorf(0, fmt, ...) to internal_warningf(fmt, ...); change the pro-
totype of internal_errorf() to internal_errorf(fmt, ...) and all remaining
uses remove the non-0 int argument; add __dead to internal_errorf() proto;
flesh out guts of internal_errorf() and internal_warningf() into a new local
function for optimisation purposes.
Some whitespace cleanup and dead code removal (return after internal_errorf(1))
given to execute, standard input (interactive or not), via -c command line
argument, or after “eval”, but not for $(…) comsubs, at the beginning of a
subsequent line, or within a line, etc.); regression test for it
idea during my “week off” (despite the pain), bsiegert@ thinks it's good –
and utf-8 capable tools ought to be able to do this anyway
and have it return an API-correct const char *
• enhance and stylify comments
• a little KNF and simplifications
• #ifdef DEBUG: replace strchr and strstr with ucstrchr and ucstrstr
that take and return a non-const char *, and fix the violations
• new cstrchr, cstrstr (take and give const char *)
• new vstrchr, vstrstr (take const or not, give boolean value)
• new afreechk(x) = afreechv(x,x) = if (x1) afree(x2, ATEMP)
• new ksh_isdash(str) = (str != NULL) && !strcmp(str, "-")
• replace the only use of strrchr with inlined code to shrink
• minor man page fixes
• Minix 3 signames are autogenerated with gcc
• rename strlfun.c to strlcpy.c since we don't do strlcat(3) anyway,
only strlcpy(3), and shorten it
• dot.mkshrc: move MKSH=… down to the export line
to not disturb the PS1 visual impression ☺
• dot.mkshrc: Lstripcom(): optimise
• bump version
¹) side effect from creating API-correct cstrchr, cstrstr, etc.
uses goto so it must be better ☻
tested on mirbsd-current via both Makefile and Build.sh
to strcasestr, it was used in a wrong way (reverse logic error in
checking its return value), turning to mis-detection of UTF-8 locale.
* sh.h, check.t: bump version
* copyright: bump year
main.c: In function 'main':
main.c:208: warning: cast discards qualifiers from pointer target type
main.c:329: warning: cast discards qualifiers from pointer target type
no warnings at autoconf time left either; will take care of these two later
(might revisit changes from this commit), maybe change declararion for the
builtins to have their argv[] be const strings, and go through strict type
and qualifier checking again. this'll further improve stability.
XXX these changes might have introduced (more?) memory leaks,
XXX someone who knows about these tools should verify with
XXX automatic memory usage analysers (valgrind?)
still passes testsuite
* edit.c: remove debug stuff again; next time better use shl.c functions ;)
* sh.h: add format attributes to a few shf functions
* histrap.c, var.c: fix format string mistakes
* main.c, sh.h: error_prefix and warningf take bool not int
* misc.c: make chvt() stuff use shf_* functions
* misc.c: rewrite the TIOCSTTY stuff to be better integrated in mksh,
since it originally was an external patch
* misc.c: chvt() no longer fails if e.g. chown fails due to e.g. R/O / fs
* var.c: fix typeset padding for right-justified zero-filled
* integrate compat.h, version.h into sh.h (dependency trick didn't work anyway)
* mention #ksh in mksh(1) since the founder (twkm) said it's on topic too
(don't remove mention of #mksh despite it's usually empty because of control)