<feed xmlns="http://www.w3.org/2005/Atom">
<title>research!rsc</title>
<id>tag:research.swtch.com,2012:research.swtch.com</id>
<link rel="self" href="http://research.swtch.com/feed.atom"></link>
<updated>2019-03-01T11:01:00-05:00</updated>
<author>
<name>Russ Cox</name>
<uri>https://swtch.com/~rsc</uri>
<email>rsc@swtch.com</email>
</author>
<entry>
<title>Transparent Logs for Skeptical Clients</title>
<id>tag:research.swtch.com,2012:research.swtch.com/tlog</id>
<link rel="alternate" href="http://research.swtch.com/tlog"></link>
<published>2019-03-01T11:00:00-05:00</published>
<updated>2019-03-01T11:02:00-05:00</updated>
<summary type="text">How an untrusted server can publish a verifiably append-only log.</summary>
<content type="html">

<p>
Suppose we want to maintain and publish a public, append-only log of data.
Suppose also that clients are skeptical about our correct implementation
and operation of the log:
it might be to our advantage to leave things out of the log,
or to enter something in the log today and then remove it tomorrow.
How can we convince the client we are behaving?

<p>
This post is about an elegant data structure we can use to publish
a log of <i>N</i> records with these three properties:
<ol>
<li>
For any specific record <i>R</i> in a log of length <i>N</i>,
we can construct a proof of length
<i>O</i>(lg <i>N</i>) allowing the client to verify that <i>R</i> is in the log.
<li>
For any earlier log observed and remembered by the client,
we can construct a proof of length
<i>O</i>(lg <i>N</i>) allowing the client to verify that the earlier log
is a prefix of the current log.
<li>
An auditor can efficiently iterate over the records in the log.</ol>


<p>
(In this post, “lg <i>N</i>” denotes the base-2 logarithm of <i>N</i>,
reserving the word “log” to mean only “a sequence of records.”)

<p>
The
<a href="https://www.certificate-transparency.org/">Certificate Transparency</a>
project publishes TLS certificates in this kind of log.
Google Chrome uses property (1) to verify that
an <a href="https://en.wikipedia.org/wiki/Extended_Validation_Certificate">extended validation certificate</a>
is recorded in a known log before accepting the certificate.
Property (2) ensures that an accepted certificate cannot later disappear from the log undetected.
Property (3) allows an auditor to scan the entire certificate log
at any later time to detect misissued or stolen certificates.
All this happens without blindly trusting that
the log itself is operating correctly.
Instead, the clients of the log—Chrome and any auditors—verify
correct operation of the log as part of accessing it.

<p>
This post explains the design and implementation
of this verifiably tamper-evident log,
also called a <i>transparent log</i>.
To start, we need some cryptographic building blocks.
<a class=anchor href="#cryptographic_hashes_authentication_and_commitments"><h2 id="cryptographic_hashes_authentication_and_commitments">Cryptographic Hashes, Authentication, and Commitments</h2></a>


<p>
A <i>cryptographic hash function</i> is a deterministic
function H that maps an arbitrary-size message <i>M</i>
to a small fixed-size output H(<i>M</i>),
with the property that it is infeasible in practice to produce
any pair of distinct messages <i>M<sub>1</sub></i> ≠ <i>M<sub>2</sub></i> with
identical hashes H(<i>M<sub>1</sub></i>) = H(<i>M<sub>2</sub></i>).
Of course, what is feasible in practice changes.
In 1995, SHA-1 was a reasonable cryptographic hash function.
In 2017, SHA-1 became a <i>broken</i> cryptographic hash function,
when researchers identified and demonstrated
a <a href="https://shattered.io/">practical way to generate colliding messages</a>.
Today, SHA-256 is believed to be a reasonable cryptographic hash function.
Eventually it too will be broken.

<p>
A (non-broken) cryptographic hash function provides
a way to bootstrap a small amount of trusted data into
a much larger amount of data.
Suppose I want to share a very large file with you,
but I am concerned that the data may not arrive intact,

</content>
</entry>
<entry>
<title>An Encoded Tree Traversal</title>
<id>tag:research.swtch.com,2012:research.swtch.com/treenum</id>
<link rel="alternate" href="http://research.swtch.com/treenum"></link>
<published>2019-02-25T12:00:00-05:00</published>
<updated>2019-02-25T12:02:00-05:00</updated>
<summary type="text">An unexpected tree traversal ordering.</summary>
<content type="html">

<p>
Every basic data structures course identifies three ways to traverse a binary tree.
It’s not entirely clear how to generalize them to <i>k</i>-ary trees,
and I recently noticed an unexpected ordering that I’d like to know more about.
If you know of references to this ordering, please leave a comment
or email me (<i>rsc@swtch.com</i>).
<a class=anchor href="#binary_tree_orderings"><h2 id="binary_tree_orderings">Binary Tree Orderings</h2></a>


<p>
First a refresher about binary-tree orderings
to set up an analogy to <i>k</i>-ary trees.

<p>
Preorder visits a node before its left and right subtrees:

<p>
<img name="treenum-b2-pre" class="center pad" width=625 height=138 src="treenum-b2-pre.png" srcset="treenum-b2-pre.png 1x, treenum-b2-pre@1.5x.png 1.5x, treenum-b2-pre@2x.png 2x, treenum-b2-pre@3x.png 3x, treenum-b2-pre@4x.png 4x">

<p>
Inorder visits a node between its left and right subtrees:

<p>
<img name="treenum-b2-in" class="center pad" width=625 height=138 src="treenum-b2-in.png" srcset="treenum-b2-in.png 1x, treenum-b2-in@1.5x.png 1.5x, treenum-b2-in@2x.png 2x, treenum-b2-in@3x.png 3x, treenum-b2-in@4x.png 4x">

<p>
Postorder visits a node after its left and right subtrees:

<p>
<img name="treenum-b2-post" class="center pad" width=625 height=138 src="treenum-b2-post.png" srcset="treenum-b2-post.png 1x, treenum-b2-post@1.5x.png 1.5x, treenum-b2-post@2x.png 2x, treenum-b2-post@3x.png 3x, treenum-b2-post@4x.png 4x">

<p>
Each picture shows the same 16-leaf, 31-node binary tree, with the nodes
numbered and also placed horizontally using the order
visited in the given traversal.

<p>
It was observed long ago that one way to represent a tree
in linear storage is to record the nodes in a fixed order
(such as one of these), along with a separate array giving
the number of children of each node.
In the pictures, the trees are complete, balanced trees, so the
number of children of each node can be derived from
the number of total leaves.
(For more, see Knuth Volume 1 §2.3.1;
for references, see §2.3.1.6 and §2.6.)

<p>
It is convenient to refer to nodes in a tree by
a two-dimensional coordinate (<i>l</i>, <i>n</i>), consisting of the level of
the node (with 0 being the leaves) and its sequence number at that level.
For example, the root of the 16-leaf tree has coordinate (4, 0),
while the leaves are (0, 0) through (0, 15).

<p>
When storing a tree using a linearized ordering such as these,
it is often necessary to be able to convert a two-dimensional
coordinate to its index in the linear ordering.
For example,
the right child of the root—node (3, 1)—has
number 16, 23, and 29
in the three different orderings.

<p>
The linearized pre-ordering of (<i>l</i>, <i>n</i>) is given by:<blockquote>

<p>
seq(<i>L</i>, 0) = 0 (<i>L</i> is height of tree)<br>
seq(<i>l</i>, <i>n</i>) = seq(<i>l</i>+1, <i>n</i>/2) + 1 (<i>n</i> even)<br>
seq(<i>l</i>, <i>n</i>) = seq(<i>l</i>+1, <i>n</i>/2) + 2<sup><i>l</i>+1</sup> (<i>n</i> odd)</blockquote>

<p>
This ordering is awkward because it changes depending on the height of the tree.
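
<p>
The recurrence translates directly into code. A small sketch (the function and parameter names are mine, not from the post):

```go
package main

import "fmt"

// seqPre returns the linearized preorder index of node (l, n)
// in a complete binary tree whose root is at level L.
func seqPre(L, l, n int) int {
	if l == L {
		return 0 // the root comes first in preorder
	}
	p := seqPre(L, l+1, n/2)
	if n%2 == 0 {
		return p + 1 // a left child immediately follows its parent
	}
	return p + 1<<(l+1) // a right child follows the entire left subtree
}

func main() {
	// The right child of the root of the 16-leaf tree, node (3, 1):
	fmt.Println(seqPre(4, 3, 1)) // 16, matching the preorder picture
}
```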

<p>
The linearized post-ordering of (<i>l</i>, <i>n</i>) is given by:<blockquote>

<p>
seq(0, <i>n
</content>
</entry>
<entry>
<title>Our Software Dependency Problem</title>
<id>tag:research.swtch.com,2012:research.swtch.com/deps</id>
<link rel="alternate" href="http://research.swtch.com/deps"></link>
<published>2019-01-23T11:00:00-05:00</published>
<updated>2019-01-23T11:02:00-05:00</updated>
<summary type="text">Download and run code from strangers on the internet. What could go wrong?</summary>
<content type="html">

<p>
For decades, discussion of software reuse was far more common than actual software reuse.
Today, the situation is reversed: developers reuse software written by others every day,
in the form of software dependencies,
and the situation goes mostly unexamined.

<p>
My own background includes a decade of working with
Google’s internal source code system,
which treats software dependencies as a first-class concept,<a class=footnote id=body1 href="#note1"><sup>1</sup></a>
and also developing support for
dependencies in the Go programming language.<a class=footnote id=body2 href="#note2"><sup>2</sup></a>

<p>
Software dependencies carry with them
serious risks that are too often overlooked.
The shift to easy, fine-grained software reuse has happened so quickly
that we do not yet understand the best practices for choosing
and using dependencies effectively,
or even for deciding when they are appropriate and when not.
My purpose in writing this article is to raise awareness of the risks
and encourage more investigation of solutions.
<a class=anchor href="#what_is_a_dependency"><h2 id="what_is_a_dependency">What is a dependency?</h2></a>


<p>
In today’s software development world,
a <i>dependency</i> is additional code that you want to call from your program.
Adding a dependency avoids repeating work already done:
designing, writing, testing, debugging, and maintaining a specific
unit of code.
In this article we’ll call that unit of code a <i>package</i>;
some systems use terms like library or module instead of package.

<p>
Taking on externally-written dependencies is an old practice:
most programmers have at one point in their careers
had to go through the steps of manually downloading and installing
a required library, like C’s PCRE or zlib, or C++’s Boost or Qt,
or Java’s JodaTime or JUnit.
These packages contain high-quality, debugged code
that required significant expertise to develop.
For a program that needs the functionality provided by one of these packages,
the tedious work of manually downloading, installing, and updating
the package
is easier than the work of redeveloping that functionality from scratch.
But the high fixed costs of reuse
mean that manually-reused packages tend to be big:
a tiny package would be easier to reimplement.

<p>
A <i>dependency manager</i>
(sometimes called a package manager)
automates the downloading and installation of dependency packages.
As dependency managers
make individual packages easier to download and install,
the lower fixed costs make
smaller packages economical to publish and reuse.

<p>
For example, the Node.js dependency manager NPM provides
access to over 750,000 packages.
One of them, <code>escape-string-regexp</code>,
provides a single function that escapes regular expression
operators in its input.
The entire implementation is:
<pre>var matchOperatorsRe = /[|\\{}()[\]^$+*?.]/g;

module.exports = function (str) {
	if (typeof str !== 'string') {
		throw new TypeError('Expected a string');
	}
	return str.replace(matchOperatorsRe, '\\$&amp;');
};
</pre>


<p>
Before dependency managers, publishing an eight-line code library
would have been unthinkable: too much overhead for too little benefit.
But NPM has driven the overhead approximately to zero,
with the result that nearly-trivial functionality
can be packaged and reused.
In late January 2019, the <code>escape-string-regexp</code> package
is explicitly depended upon by almost
</content>
</entry>
<entry>
<title>Why Add Versions To Go?</title>
<id>tag:research.swtch.com,2012:research.swtch.com/vgo-why-versions</id>
<link rel="alternate" href="http://research.swtch.com/vgo-why-versions"></link>
<published>2018-06-07T10:20:00-04:00</published>
<updated>2018-06-07T10:22:00-04:00</updated>
<summary type="text">Why should Go understand package versions at all? (Go & Versioning, Part 10)</summary>
<content type="html">

<p>
People sometimes ask me why we should add package versions to Go at all.
Isn't Go doing well enough without versions?
Usually these people have had a bad experience with versions
in another language, and they associate versions with breaking changes.
In this post, I want to talk a little about why we do need to add support for package versions to Go.
Later posts will address why we won't encourage breaking changes.

<p>
The <code>go</code> <code>get</code> command has two failure modes caused by ignorance of versions:
it can use code that is too old, and it can use code that is too new.
For example, suppose we want to use a package D, so we run <code>go</code> <code>get</code> <code>D</code>
with no packages installed yet.
The <code>go</code> <code>get</code> command will download the latest copy of D
(whatever <code>git</code> <code>clone</code> brings down),
which builds successfully.
To make our discussion easier, let's call that D version 1.0
and keep D's dependency requirements in mind
(and in our diagrams).
But remember that while we understand the idea of versions
and dependency requirements, <code>go</code> <code>get</code> does not.
<pre>$ go get D
</pre>


<p>
<img name="vgo-why-1" class="center pad" width=200 height=39 src="vgo-why-1.png" srcset="vgo-why-1.png 1x, vgo-why-1@1.5x.png 1.5x, vgo-why-1@2x.png 2x, vgo-why-1@3x.png 3x, vgo-why-1@4x.png 4x">

<p>
Now suppose that a month later, we want to use C, which happens to import D.
We run <code>go</code> <code>get</code> <code>C</code>.
The <code>go</code> <code>get</code> command downloads the latest copy of C,
which happens to be C 1.8 and imports D.
Since <code>go</code> <code>get</code> already has a downloaded copy of D, it uses that one
instead of incurring the cost of a fresh download.
Unfortunately, the build of C fails:
C is using a new feature from D introduced in D 1.4,
and <code>go</code> <code>get</code> is reusing D 1.0.
The code is too old.
<pre>$ go get C
</pre>


<p>
<img name="vgo-why-2" class="center pad" width=201 height=96 src="vgo-why-2.png" srcset="vgo-why-2.png 1x, vgo-why-2@1.5x.png 1.5x, vgo-why-2@2x.png 2x, vgo-why-2@3x.png 3x, vgo-why-2@4x.png 4x">

<p>
Next we try running <code>go</code> <code>get</code> <code>-u</code>, which downloads the latest
copy of all the code involved, including code already downloaded.
<pre>$ go get -u C
</pre>


<p>
<img name="vgo-why-3" class="center pad" width=201 height=104 src="vgo-why-3.png" srcset="vgo-why-3.png 1x, vgo-why-3@1.5x.png 1.5x, vgo-why-3@2x.png 2x, vgo-why-3@3x.png 3x, vgo-why-3@4x.png 4x">

<p>
Unfortunately, D 1.6 was released an hour ago and
contains a bug that breaks C.
Now the code is too new.
Watching this play out from above, we know what <code>go</code> <code>get</code>
needs to do: use D ≥ 1.4 but not D 1.6, so maybe D 1.4 or D 1.5.
It's very difficult to tell <code>go</code> <code>get</code> that today,
since it doesn't understand the concept of a package version.
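
<p>
(A hypothetical sketch of my own, using this post's one-letter module names: with version support, C could state its requirement directly in a module file, so the tool itself could rule out D 1.0 as too old without jumping blindly to the just-released D 1.6.)
<pre>module C

require D v1.4.0
</pre>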

<p>
Getting back to the original question in the post, <i>why add versions to Go?</i>

<p>
Because agreeing on a versioning system—a syntax for version identifiers,
along with rules for how to order and interpret them—establishes
a
</content>
</entry>
<entry>
<title>What is Software Engineering?</title>
<id>tag:research.swtch.com,2012:research.swtch.com/vgo-eng</id>
<link rel="alternate" href="http://research.swtch.com/vgo-eng"></link>
<published>2018-05-30T10:00:00-04:00</published>
<updated>2018-05-30T10:02:00-04:00</updated>
<summary type="text">What is software engineering and what does Go mean by it? (Go & Versioning, Part 9)</summary>
<content type="html">

<p>
Nearly all of Go’s distinctive design decisions
were aimed at making software engineering simpler and easier.
We've said this often.
The canonical reference is Rob Pike's 2012 article,
“<a href="https://talks.golang.org/2012/splash.article">Go at Google: Language Design in the Service of Software Engineering</a>.”
But what is software engineering?<blockquote>

<p>
<i>Software engineering is what happens to programming
<br>when you add time and other programmers.</i></blockquote>

<p>
Programming means getting a program working.
You have a problem to solve, you write some Go code,
you run it, you get your answer, you’re done.
That’s programming,
and that's difficult enough by itself.
But what if that code has to keep working, day after day?
What if five other programmers need to work on the code too?
Then you start to think about version control systems,
to track how the code changes over time
and to coordinate with the other programmers.
You add unit tests,
to make sure bugs you fix are not reintroduced over time,
not by you six months from now,
and not by that new team member who’s unfamiliar with the code.
You think about modularity and design patterns,
to divide the program into parts that team members
can work on mostly independently.
You use tools to help you find bugs earlier.
You look for ways to make programs as clear as possible,
so that bugs are less likely.
You make sure that small changes can be tested quickly,
even in large programs.
You're doing all of this because your programming
has turned into software engineering.

<p>
(This definition and explanation of software engineering
is my riff on an original theme by my Google colleague Titus Winters,
whose preferred phrasing is “software engineering is programming integrated over time.”
It's worth seven minutes of your time to see
<a href="https://www.youtube.com/watch?v=tISy7EJQPzI&t=8m17s">his presentation of this idea at CppCon 2017</a>,
from 8:17 to 15:00 in the video.)

<p>
As I said earlier,
nearly all of Go’s distinctive design decisions
have been motivated by concerns about software engineering,
by trying to accommodate time and other programmers
into the daily practice of programming.

<p>
For example, most people think that we format Go code with <code>gofmt</code>
to make code look nicer or to end debates among
team members about program layout.
But the <a href="https://groups.google.com/forum/#!msg/golang-nuts/HC2sDhrZW5Y/7iuKxdbLExkJ">most important reason for <code>gofmt</code></a>
is that if an algorithm defines how Go source code is formatted,
then programs, like <code>goimports</code> or <code>gorename</code> or <code>go</code> <code>fix</code>,
can edit the source code more easily,
without introducing spurious formatting changes when writing the code back.
This helps you maintain code over time.

<p>
As another example, Go import paths are URLs.
If code said <code>import</code> <code>"uuid"</code>,
you’d have to ask which <code>uuid</code> package.
Searching for <code>uuid</code> on <a href="https://godoc.org">godoc.org</a> turns up dozens of packages.
If instead the code says <code>import</code> <code>"github.com/pborman/uuid"</code>,
now it’s clear which package we mean.
Using URLs avoids ambiguity
and also reuses an existing mechanism for giving out names,
making it simpler and easier to coordinate with other programmers.

<p>
Continuing the ex
</content>
</entry>
<entry>
<title>The vgo proposal is accepted. Now what?</title>
<id>tag:research.swtch.com,2012:research.swtch.com/vgo-accepted</id>
<link rel="alternate" href="http://research.swtch.com/vgo-accepted"></link>
<published>2018-05-29T16:45:00-04:00</published>
<updated>2018-05-29T16:47:00-04:00</updated>
<summary type="text">What is the state of vgo? (Go & Versioning, Part 8)</summary>
<content type="html">

<p>
Last week, the proposal review committee accepted the “vgo approach” elaborated
on this blog in February and then summarized as <a href="https://golang.org/issue/24301">proposal #24301</a>.
There has been some confusion about exactly what that means
and what happens next.

<p>
In general, <a href="https://golang.org/s/proposal">a Go proposal</a> is a discussion about whether to adopt a particular
approach and move on to writing, reviewing, and releasing a production implementation. Accepting a proposal
does not mean the implementation is complete. (In some cases there is no
implementation yet at all!) Accepting a proposal only means that we believe
the design is appropriate and that the production implementation can proceed
and be committed and released.
Inevitably we find details that need adjustment during that process.

<p>
Vgo as it exists today is not the final implementation.
It is a prototype to make the ideas concrete
and to make it possible to experiment with the approach.
Bugs and design flaws will necessarily be found and fixed as we move toward making it the official
approach in the go command.
For example, the original vgo prototype downloaded code from sites
like GitHub using their APIs, for better efficiency and to avoid requiring
users to have every possible version control system installed.
Unfortunately, the GitHub API is far more restrictively rate-limited than
plain <code>git</code> access, so the current vgo implementation has gone back
to invoking <code>git</code>.
Although we'd still <a href="https://blogs.msdn.microsoft.com/devops/2018/05/29/announcing-the-may-2018-git-security-vulnerability/">like to move away</a>
from version control as the default
mechanism for obtaining open source code, we won't do that until we have a viable
replacement ready, to make any transition
as smooth as possible.

<p>
More generally, the key reason for the vgo proposal is to add a common vocabulary
and semantics around versions of Go code, so that
developers and all kinds of tools can be precise
when talking to each other about exactly which program should be built, run, or analyzed.
Accepting the proposal is the beginning, not the end.

<p>
One thing I've heard from many people is that they want to start
using vgo in their company or project but are held back by not having
support for it in the toolchains their developers are using.
The fact that vgo is integrated deeply into the go command,
instead of being a separate vendor directory-writer,
introduces a chicken-and-egg problem.
To address that problem and make it as easy as possible
for developers to try the vgo approach,
we plan to include vgo functionality as an experimental opt-in feature in Go 1.11,
with the hope of incorporating feedback and finalizing the feature for Go 1.12.
(This rollout is analogous to how we included vendor directory functionality
as an experimental opt-in feature in Go 1.5 and turned it on by default in Go 1.6.)
We also plan to make <a href="https://golang.org/issue/25069">minimal changes to legacy <code>go</code> <code>get</code></a> so that
it can obtain and understand code written using vgo conventions.
Those changes will be included in the next point release for Go 1.9 and Go 1.10.

<p>
One thing I've heard from zero people is that
<a href="https://research.swtch.com/vgo">they wish my blog posts were longer</a>.
The original posts are quite dense and a number of important points
are more buried than they should be.
This post is the first of a series of much shorter posts to try to make
focused points about specific details of the vgo design, ap
</content>
</entry>
<entry>
<title>Versioned Go Commands</title>
<id>tag:research.swtch.com,2012:research.swtch.com/vgo-cmd</id>
<link rel="alternate" href="http://research.swtch.com/vgo-cmd"></link>
<published>2018-02-23T10:09:00-05:00</published>
<updated>2018-02-23T10:11:00-05:00</updated>
<summary type="text">What does it mean to add versioning to the go command? (Go & Versioning, Part 7)</summary>
<content type="html">

<p>
What does it mean to add versioning to the <code>go</code> command?
The <a href="vgo-intro">overview post</a> gave a preview,
but the followup posts focused mainly on underlying
details: <a href="vgo-import">the import compatibility rule</a>,
<a href="vgo-mvs">minimal version selection</a>,
and <a href="vgo-module">defining go modules</a>.
With those better understood, this post examines the
details of how versioning affects the <code>go</code> command line
and the reasons for those changes.

<p>
The major changes are:
<ul>
<li>


<p>
All commands (<code>go</code> <code>build</code>, <code>go</code> <code>run</code>, and so on)
will download imported source code automatically,
if the necessary version is not already present
in the download cache on the local system.
<li>


<p>
The <code>go</code> <code>get</code> command will serve mainly to change
which version of a package should be used in future
build commands.
<li>


<p>
The <code>go</code> <code>list</code> command will add access to module
information.
<li>


<p>
A new <code>go</code> <code>release</code> command will automate some of the
work a module author should do when tagging a new release,
such as checking API compatibility.
<li>


<p>
The <code>all</code> pattern is redefined to make sense in the
world of modules.
<li>


<p>
Developers can and will be encouraged to work in
directories outside the GOPATH tree.</ul>


<p>
All these changes are implemented in the <code>vgo</code> prototype.

<p>
Deciding exactly how a build system should work is hard.
The introduction of new build caching in Go 1.10 prompted some
important, difficult decisions about the meaning of <code>go</code> commands,
and the introduction of versioning does too.
Before I explain some of the decisions, I want to start by
explaining a guiding principle that I've found helpful recently,
which I call the isolation rule:<blockquote>

<p>
<i>The result of a build command should depend only on the
source files that are its logical inputs, never on
hidden state left behind by previous build commands.</i>

<p>
<i>That is, what a command does in isolation—on a
clean system loaded with only the relevant input
source files—is what it should do all the time,
no matter what else has happened on the system recently.</i></blockquote>

<p>
To see the wisdom of this rule, let me retell an old build story
and show how the isolation rule explains what happened.
<a class=anchor href="#old_build_story"><h2 id="old_build_story">An Old Build Story</h2></a>


<p>
Long ago, when compilers and computers were very slow,
developers had scripts to build their whole programs from scratch,
but if they were just modifying one source file,
they might save time by manually recompiling just that file
and then relinking the overall program,
avoiding the cost of recompiling all the source files that
hadn't changed.
These manual incremental builds were fast but error-prone:
if you forgot to recompile a source file that you'd modified,
the link of the final executable would use an out-of-date object file,
the executable would demonstrate buggy behavior,
and you might spend a long time staring at the (correct!) source code
looki
</content>
</entry>
<entry>
<title>Defining Go Modules</title>
<id>tag:research.swtch.com,2012:research.swtch.com/vgo-module</id>
<link rel="alternate" href="http://research.swtch.com/vgo-module"></link>
<published>2018-02-22T17:00:00-05:00</published>
<updated>2018-02-22T17:02:00-05:00</updated>
<summary type="text">How to specify what's in a module. (Go & Versioning, Part 6)</summary>
<content type="html">

<p>
As introduced in the <a href="vgo-intro">overview post</a>, a Go <i>module</i>
is a collection of packages versioned as a unit,
along with a <code>go.mod</code> file listing other required modules.
The move to modules is an opportunity for us to revisit and fix
many details of how the <code>go</code> command manages source code.
The current <code>go</code> <code>get</code> model will be about ten years old when we
retire it in favor of modules.
We need to make sure that the module design will serve us
well for the next decade. In particular:
<ul>
<li>


<p>
We want to encourage more developers to tag releases of their
packages, instead of expecting that
users will just pick a commit hash that looks good to them.
Tagging explicit releases makes clear what is expected to be
useful to others and what is still under development.
At the same time, it must still be possible—although maybe not convenient—to request specific
commits.
<li>


<p>
We want to move away from invoking version control
tools such as <code>bzr</code>, <code>fossil</code>, <code>git</code>, <code>hg</code>, and <code>svn</code> to download source code.
These fragment the ecosystem: packages developed using Bazaar or
Fossil, for example, are effectively unavailable to users who cannot
or choose not to install these tools.
The version control tools have also been a source of <a href="https://golang.org/issue/22131">exciting</a> <a href="https://www.mercurial-scm.org/wiki/WhatsNew/Archive#Mercurial_3.2.3_.282014-12-18.29">security</a> <a href="https://git-blame.blogspot.com/2014/12/git-1856-195-205-214-and-221-and.html">problems</a>.
It would be good to move them outside the security perimeter.
<li>


<p>
We want to allow multiple modules to be developed in a single
source code repository but versioned independently.
While most developers will likely keep working with one module per repo,
larger projects might benefit from having multiple modules in a single repo.
For example, we'd like to keep <code>golang.org/x/text</code> a single repository
but be able to version experimental new packages separately
from established packages.
<li>


<p>
We want to make it easy for individuals and companies to
put caching proxies in front of <code>go</code> <code>get</code> downloads, whether for availability
(use a local copy to ensure the download works tomorrow)
or security
(vet packages before they can be used inside a company).
<li>


<p>
We want to make it possible, at some future point, to introduce
a shared proxy for use by the Go community, similar in spirit
to those used by Rust, Node, and other languages.
At the same time, the design must work well without assuming
such a proxy or registry.
<li>


<p>
We want to eliminate vendor directories. They were introduced
for reproducibility and availability, but we now have better
mechanisms.
Reproducibility is handled by proper versioning, and availability
is handled by caching proxies.</ul>


<p>
This post presents the parts of the <code>vgo</code> design that address
these issues.
Everything here is preliminary: we will change the design
if we find that it is not right.
<a class=anchor href="#versioned_releases"><h2 id="versioned_releases">Versioned Releases</h2></a>


<p>
Abstraction boundaries let projects scale.
Originally, all Go packages could be imported b
</content>
</entry>
<entry>
<title>Reproducible, Verifiable, Verified Builds</title>
<id>tag:research.swtch.com,2012:research.swtch.com/vgo-repro</id>
<link rel="alternate" href="http://research.swtch.com/vgo-repro"></link>
<published>2018-02-21T21:28:00-05:00</published>
<updated>2018-02-21T21:30:00-05:00</updated>
<summary type="text">Consistent builds in versioned Go. (Go & Versioning, Part 5)</summary>
<content type="html">

<p>
Once both Go developers and tools share a vocabulary around package versions,
it's relatively straightforward to add support in the toolchain for
reproducible, verifiable, and verified builds.
In fact, the basics are already in the <code>vgo</code> prototype.

<p>
Since people sometimes disagree about the exact definitions
of these terms, let's establish some basic terminology.
For this post:
<ul>
<li>
A <i>reproducible build</i> is one that,
when repeated, produces the same result.
<li>
A <i>verifiable build</i> is one that records enough
information to be precise about exactly how to repeat it.
<li>
A <i>verified build</i> is one that checks that it is using
the expected source code.</ul>


<p>
<code>Vgo</code> delivers reproducible builds by default.
The resulting binaries are verifiable, in that
they record versions of the exact source code that went into the build.
And it is possible to configure your repository so that
users rebuilding your software verify that their builds
match yours, using cryptographic hashes,
no matter how they obtain the dependencies.
<a class=anchor href="#reproducible_builds"><h2 id="reproducible_builds">Reproducible Builds</h2></a>


<p>
At the very least, we want to make sure that when you build my program,
the build system decides to use the same versions of the code.
<a href="vgo-mvs">Minimal version selection</a> delivers this property by default.
The <code>go.mod</code> file alone is enough to uniquely determine which
module versions should be used for the build
(assuming dependencies are available),
and that decision is stable even as new versions of a module
are introduced into the ecosystem.
This differs from most other systems, which adopt new versions
automatically and need to be constrained to yield
reproducible builds.
I covered this in the minimal version selection post,
but it's an important, subtle detail, so I'll try to give a short reprise here.
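<p>
To make the shape of this concrete (a made-up module, not an example from this post), a <code>go.mod</code> file in the current syntax is nothing more than a list of minimum versions:

```
module example.com/hello

require (
	example.com/serde v1.0.0
	example.com/toml v0.4.1
)
```

<p>
Because each requirement is a minimum, not a range, this file alone pins the build list: a new release of a dependency changes nothing here until someone explicitly edits or upgrades the requirement.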

<p>
To make this concrete, let's look at a few real packages from Cargo,
Rust's package manager.
To be clear, I am not picking on Cargo.
I think Cargo is an example of the current
state of the art in package managers,
and there's much to learn from it.
If we can make Go package management as smooth as Cargo's, I'll be happy.
But I also think that it is worth exploring whether we would
benefit from choosing a different default when it comes to
version selection.

<p>
Cargo prefers maximum versions in the following sense.
Over at crates.io, the latest <a href="https://crates.io/crates/toml"><code>toml</code></a> is 0.4.5
as I write this post.
It lists a dependency on <a href="https://crates.io/crates/serde"><code>serde</code></a> 1.0 or later;
the latest <code>serde</code> is 1.0.27.
If you start a new project and add a dependency on
<code>toml</code> 0.4.1 or later, Cargo has a choice to make.
According to the constraints, any of 0.4.1, 0.4.2, 0.4.3, 0.4.4, or 0.4.5 would be acceptable.
All other things being equal, Cargo prefers the <a href="cargo-newest.html">newest acceptable version</a>, 0.4.5.
Similarly, any of <code>serde</code> 1.0.0 through 1.0.27 are acceptable,
and Cargo chooses 1.0.27.
These choices change as new versions are introduced.
If <code>serde</code> 1.0.28 is released tonight and I add toml 0.4.5
to a project tomorrow, I'll get 1.0.28 instead of 1.0.27.
As described so far, Cargo's builds are not repeatable.&#x
</entry>
<entry>
<title>Minimal Version Selection</title>
<id>tag:research.swtch.com,2012:research.swtch.com/vgo-mvs</id>
<link rel="alternate" href="http://research.swtch.com/vgo-mvs"></link>
<published>2018-02-21T16:41:00-05:00</published>
<updated>2018-02-21T16:43:00-05:00</updated>
<summary type="text">How do builds select which versions to use? (Go &amp; Versioning, Part 4)</summary>
<content type="html">

<p>
A <a href="vgo-intro">versioned Go command</a> must decide which module versions to use in each build.
I call this list of modules and versions for use in a given build the <i>build list</i>.
For stable development, today's build list must also be tomorrow's build list.
But then developers must also be allowed to change the build list: to upgrade all modules, to upgrade one module, or to downgrade one module.

<p>
The <i>version selection</i> problem therefore is to define the meaning of, and to give algorithms implementing, these four operations on build lists:
<ol>
<li>
Construct the current build list.
<li>
Upgrade all modules to their latest versions.
<li>
Upgrade one module to a specific newer version.
<li>
Downgrade one module to a specific older version.</ol>


<p>
The last two operations specify one module to upgrade or downgrade, but doing so may require upgrading, downgrading, adding, or removing other modules, ideally as few as possible, to satisfy dependencies.

<p>
This post presents <i>minimal version selection</i>, a new, simple approach to the version selection problem.
Minimal version selection is easy to understand and predict,
which should make it easy to work with.
It also produces <i>high-fidelity builds</i>, in which the dependencies a user builds are as close as possible to the ones the author developed against.
It is also efficient to implement, using nothing more complex than recursive graph traversals,
so that a complete minimal version selection implementation in Go is only a few hundred lines of code.

<p>
Minimal version selection assumes that each module declares its own dependency requirements: a list of minimum versions of other modules. Modules are assumed to follow the <a href="vgo-import">import compatibility rule</a>—packages in any newer version should work as well as older ones—so a dependency requirement gives only a minimum version, never a maximum version or a list of incompatible later versions.

<p>
Then the definitions of the four operations are:
<ol>
<li>
To construct the build list for a given target: start the list with the target itself, and then append each requirement's own build list. If a module appears in the list multiple times, keep only the newest version.
<li>
To upgrade all modules to their latest versions: construct the build list, but read each requirement as if it requested the latest module version.
<li>
To upgrade one module to a specific newer version: construct the non-upgraded build list and then append the new module's build list. If a module appears in the list multiple times, keep only the newest version.
<li>
To downgrade one module to a specific older version: rewind the required version of each top-level requirement until that requirement's build list no longer refers to newer versions of the downgraded module.</ol>


<p>
These operations are simple, efficient, and easy to implement.
<a class=anchor href="#example"><h2 id="example">Example</h2></a>


<p>
Before we examine minimal version selection in more detail, let's look at why a new approach is necessary. We'll use the following set of modules as a running example throughout the post:

<p>
<img name="version-select-1" class="center pad" width=463 height=272 src="version-select-1.png" srcset="version-select-1.png 1x, version-select-1@1.5x.png 1.5x, version-select-1@2x.png 2x, version-select-1@3x.png 3x, version-select-1@4x.png 4x">

<p>
The diagram shows the module requirement graph for seven modules (dotte
</entry>
<entry>
<title>Go and Dogma</title>
<id>tag:research.swtch.com,2012:research.swtch.com/dogma</id>
<link rel="alternate" href="http://research.swtch.com/dogma"></link>
<published>2017-01-09T09:00:00-05:00</published>
<updated>2017-01-09T09:02:00-05:00</updated>
<summary type="text">Programming language dogmatics.</summary>
<content type="html">

<p>
[<i>Cross-posting from last year’s <a href="https://www.reddit.com/r/golang/comments/46bd5h/ama_we_are_the_go_contributors_ask_us_anything/d05yyde/?context=3&st=ixq5hjko&sh=7affd469">Go contributors AMA</a> on Reddit, because it’s still important to remember.</i>]

<p>
One of the perks of working on Go these past years has been the chance to have many great discussions with other language designers and implementers, for example about how well various design decisions worked out or the common problems of implementing what look like very different languages (for example both Go and Haskell need some kind of “green threads”, so there are more shared runtime challenges than you might expect). In one such conversation, when I was talking to a group of early Lisp hackers, one of them pointed out that these discussions are basically never dogmatic. Designers and implementers remember working through the good arguments on both sides of a particular decision, and they’re often eager to hear about someone else’s experience with what happens when you make that decision differently. Contrast that kind of discussion with the heated arguments or overly zealous statements you sometimes see from users of the same languages. There’s a real disconnect, possibly because the users don’t have the experience of weighing the arguments on both sides and don’t realize how easily a particular decision might have gone the other way.

<p>
Language design and implementation is engineering. We make decisions using evaluations of costs and benefits or, if we must, using predictions of those based on past experience. I think we have an important responsibility to explain both sides of a particular decision, to make clear that the arguments for an alternate decision are actually good ones that we weighed and balanced, and to avoid the suggestion that particular design decisions approach dogma. I hope <a href="https://www.reddit.com/r/golang/comments/46bd5h/ama_we_are_the_go_contributors_ask_us_anything/d05yyde/?context=3&st=ixq5hjko&sh=7affd469">the Reddit AMA</a> as well as discussion on <a href="https://groups.google.com/group/golang-nuts">golang-nuts</a> or <a href="http://stackoverflow.com/questions/tagged/go">StackOverflow</a> or the <a href="https://forum.golangbridge.org/">Go Forum</a> or at <a href="https://golang.org/wiki/Conferences">conferences</a> help with that.

<p>
But we need help from everyone. Remember that none of the decisions in Go are infallible; they’re just our best attempts at the time we made them, not wisdom received on stone tablets. If someone asks why Go does X instead of Y, please try to present the engineering reasons fairly, including for Y, and avoid argument solely by appeal to authority. It’s too easy to fall into the “well that’s just not how it’s done here” trap. And now that I know about and watch for that trap, I see it in nearly every technical community, although some more than others.
</content>
</entry>
<entry>
<title>A Tour of Acme</title>
<id>tag:research.swtch.com,2012:research.swtch.com/acme</id>
<link rel="alternate" href="http://research.swtch.com/acme"></link>
<published>2012-09-17T11:00:00-04:00</published>
<updated>2012-09-17T11:00:00-04:00</updated>
<summary type="text">A video introduction to Acme, the Plan 9 text editor</summary>
<content type="html">
<p class="lp">
People I work with recognize my computer easily:
it's the one with nothing but yellow windows and blue bars on the screen.
That's the text editor acme, written by Rob Pike for Plan 9 in the early 1990s.
Acme focuses entirely on the idea of text as user interface.
It's difficult to explain acme without seeing it, though, so I've put together
a screencast explaining the basics of acme and showing a brief programming session.
Remember as you watch the video that the 854x480 screen is quite cramped.
Usually you'd run acme on a larger screen: even my MacBook Air has almost four times
as much screen real estate.
</p>

<center>
<div style="border: 1px solid black; width: 853px; height: 480px;"><iframe width="853" height="480" src="https://www.youtube.com/embed/dP1xVpMPn8M?rel=0" frameborder="0" allowfullscreen></iframe></div>
</center>

<p class=pp>
The video doesn't show everything acme can do, nor does it show all the ways you can use it.
Even small idioms like where you type text to be loaded or executed vary from user to user.
To learn more about acme, read Rob Pike's paper &ldquo;<a href="/acme.pdf">Acme: A User Interface for Programmers</a>&rdquo; and then try it.
</p>

<p class=pp>
Acme runs on most operating systems.
If you use <a href="http://plan9.bell-labs.com/plan9/">Plan 9 from Bell Labs</a>, you already have it.
If you use FreeBSD, Linux, OS X, or most other Unix clones, you can get it as part of <a href="http://swtch.com/plan9port/">Plan 9 from User Space</a>.
If you use Windows, I suggest trying acme as packaged in <a href="http://code.google.com/p/acme-sac/">acme stand alone complex</a>, which is based on the Inferno programming environment.
</p>

<p class=lp><b>Mini-FAQ</b>:
<ul>
<li><i>Q. Can I use scalable fonts?</i> A. On the Mac, yes. If you run <code>acme -f /mnt/font/Monaco/16a/font</code> you get 16-point anti-aliased Monaco as your font, served via <a href="http://swtch.com/plan9port/man/man4/fontsrv.html">fontsrv</a>. If you'd like to add X11 support to fontsrv, I'd be happy to apply the patch.
<li><i>Q. Do I need X11 to build on the Mac?</i> A. No. The build will complain that it cannot build &lsquo;snarfer&rsquo; but it should complete otherwise. You probably don't need snarfer.
</ul>

<p class=pp>
If you're interested in history, the predecessor to acme was called help. Rob Pike's paper &ldquo;<a href="/help.pdf">A Minimalist Global User Interface</a>&rdquo; describes it. See also &ldquo;<a href="/sam.pdf">The Text Editor sam</a>.&rdquo;
</p>

<p class=pp>
<i>Correction</i>: the smiley program in the video was written by Ken Thompson.
I got it from Dennis Ritchie, the more meticulous archivist of the pair.
</p>

</content>
</entry>
<entry>
<title>Minimal Boolean Formulas</title>
<id>tag:research.swtch.com,2012:research.swtch.com/boolean</id>
<link rel="alternate" href="http://research.swtch.com/boolean"></link>
<published>2011-05-18T00:00:00-04:00</published>
<updated>2011-05-18T00:00:00-04:00</updated>
<summary type="text">Simplify equations with God</summary>
<content type="html">
<p><style type="text/css">
p { line-height: 150%; }
blockquote { text-align: left; }
pre.alg { font-family: sans-serif; font-size: 100%; margin-left: 60px; }
td, th { padding-left: 5px; padding-right: 5px; vertical-align: top; }
#times td { text-align: right; }
table { padding-top: 1em; padding-bottom: 1em; }
#find td { text-align: center; }
</style>

<p class=lp>
<a href="http://oeis.org/A056287">28</a>. 
That's the minimum number of AND or OR operators
you need in order to write any Boolean function of five variables.
<a href="http://alexhealy.net/">Alex Healy</a> and I computed that in April 2010. Until then,
I believe no one had ever known that little fact.
This post describes how we computed it
and how we almost got scooped by <a href="http://research.swtch.com/2011/01/knuth-volume-4a.html">Knuth's Volume 4A</a>
which considers the problem for AND, OR, and XOR.
</p>

<h3>A Naive Brute Force Approach</h3>

<p class=pp>
Any Boolean function of two variables
can be written with at most 3 AND or OR operators: the parity function
on two variables X XOR Y is (X AND Y') OR (X' AND Y), where X' denotes
&ldquo;not X.&rdquo; We can shorten the notation by writing AND and OR
like multiplication and addition: X XOR Y = X*Y' + X'*Y.
</p>

<p class=pp>
For three variables, parity is also a hardest function, requiring 9 operators:
X XOR Y XOR Z = (X*Z'+X'*Z+Y')*(X*Z+X'*Z'+Y).
</p>

<p class=pp>
For four variables, parity is still a hardest function, requiring 15 operators:
W XOR X XOR Y XOR Z = (X*Z'+X'*Z+W'*Y+W*Y')*(X*Z+X'*Z'+W*Y+W'*Y').
</p>
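<p class=pp>
These identities are easy to check by brute force; for example, the four-variable formula can be verified over all 16 inputs:
</p>

```go
package main

import "fmt"

func main() {
	not := func(b int) int { return 1 - b }
	ok := true
	// Try all 16 assignments of W, X, Y, Z.
	for i := 0; i < 16; i++ {
		w, x, y, z := i>>3&1, i>>2&1, i>>1&1, i&1
		lhs := w ^ x ^ y ^ z
		// (X*Z'+X'*Z+W'*Y+W*Y') * (X*Z+X'*Z'+W*Y+W'*Y')
		rhs := (x&not(z) | not(x)&z | not(w)&y | w&not(y)) &
			(x&z | not(x)&not(z) | w&y | not(w)&not(y))
		if lhs != rhs {
			ok = false
		}
	}
	fmt.Println(ok) // true: the formula computes parity
}
```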

<p class=pp>
The sequence so far prompts a few questions. Is parity always a hardest function?
Does the minimum number of operators alternate between 2<sup>n</sup>&#8722;1 and 2<sup>n</sup>+1?
</p>

<p class=pp>
I computed these results in January 2001 after hearing
the problem from Neil Sloane, who suggested it as a variant
of a similar problem first studied by Claude Shannon.
</p>

<p class=pp>
The program I wrote to compute a(4) computes the minimum number of
operators for every Boolean function of n variables
in order to find the largest minimum over all functions.
There are 2<sup>4</sup> = 16 settings of four variables, and each function
can pick its own value for each setting, so there are 2<sup>16</sup> different
functions. To make matters worse, you build new functions
by taking pairs of old functions and joining them with AND or OR.
2<sup>16</sup> different functions means 2<sup>16</sup>&#183;2<sup>16</sup> = 2<sup>32</sup> pairs of functions.
</p>

<p class=pp>
The program I wrote was a mangling of the Floyd-Warshall
all-pairs shortest paths algorithm. That algorithm is:
</p>

<pre class="indent alg">
// Floyd-Warshall all pairs shortest path
func compute():
    for each node i
        for each node j
            dist[i][j] = direct distance, or &#8734;

    for each node k
        for each node i
            for each node j
                d = dist[i][k] + dist[k][j]
                if d &lt; dist[i][j]
                    dist[i][j] = d
    return
</pre>
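<p class=pp>
The triple loop runs as written. Here is a literal Go translation, using a small hypothetical four-node graph with a large constant standing in for &#8734;:
</p>

```go
package main

import "fmt"

const inf = 1 << 30 // stands in for infinity (no direct edge)

// floydWarshall updates dist in place so that dist[i][j]
// becomes the length of the shortest path from i to j.
func floydWarshall(dist [][]int) {
	n := len(dist)
	for k := 0; k < n; k++ {
		for i := 0; i < n; i++ {
			for j := 0; j < n; j++ {
				if d := dist[i][k] + dist[k][j]; d < dist[i][j] {
					dist[i][j] = d
				}
			}
		}
	}
}

func main() {
	// Direct distances for a four-node graph.
	dist := [][]int{
		{0, 5, inf, 10},
		{inf, 0, 3, inf},
		{inf, inf, 0, 1},
		{inf, inf, inf, 0},
	}
	floydWarshall(dist)
	fmt.Println(dist[0][3]) // 9: the path 0→1→2→3 beats the direct edge of 10
}
```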

<p class=lp>
The algorithm begins with the distance table dist[i][j] set to
an actual distance if i is connected to j and infinity otherwise.
Then each round updates the table to account for paths
going through the node k: if it&#
</entry>
<entry>
<title>Zip Files All The Way Down</title>
<id>tag:research.swtch.com,2012:research.swtch.com/zip</id>
<link rel="alternate" href="http://research.swtch.com/zip"></link>
<published>2010-03-18T00:00:00-04:00</published>
<updated>2010-03-18T00:00:00-04:00</updated>
<summary type="text">Did you think it was turtles?</summary>
<content type="html">
<p><p class=lp>
Stephen Hawking begins <i><a href="http://www.amazon.com/-/dp/0553380168">A Brief History of Time</a></i> with this story:
</p>

<blockquote>
<p class=pp>
A well-known scientist (some say it was Bertrand Russell) once gave a public lecture on astronomy. He described how the earth orbits around the sun and how the sun, in turn, orbits around the center of a vast collection of stars called our galaxy. At the end of the lecture, a little old lady at the back of the room got up and said: &ldquo;What you have told us is rubbish. The world is really a flat plate supported on the back of a giant tortoise.&rdquo; The scientist gave a superior smile before replying, &ldquo;What is the tortoise standing on?&rdquo; &ldquo;You're very clever, young man, very clever,&rdquo; said the old lady. &ldquo;But it's turtles all the way down!&rdquo;
</p>
</blockquote>

<p class=lp>
Scientists today are pretty sure that the universe is not actually turtles all the way down,
but we can create that kind of situation in other contexts.
For example, here we have <a href="http://www.youtube.com/watch?v=Y-gqMTt3IUg">video monitors all the way down</a>
and <a href="http://www.amazon.com/gp/customer-media/product-gallery/0387900926/ref=cm_ciu_pdp_images_all">set theory books all the way down</a>,
and <a href="http://blog.makezine.com/archive/2009/01/thousands_of_shopping_carts_stake_o.html">shopping carts all the way down</a>.
</p>

<p class=pp>
And here's a computer storage equivalent: 
look inside <a href="http://swtch.com/r.zip"><code>r.zip</code></a>.
It's zip files all the way down:
each one contains another zip file under the name <code>r/r.zip</code>.
(For the die-hard Unix fans, <a href="http://swtch.com/r.tar.gz"><code>r.tar.gz</code></a> is
gzipped tar files all the way down.)
Like the line of shopping carts, it never ends,
because it loops back onto itself: the zip file contains itself!
And it's probably less work to put together a self-reproducing zip file
than to put together all those shopping carts,
at least if you're the kind of person who would read this blog.
This post explains how.
</p>

<p class=pp>
Before we get to self-reproducing zip files, though,
we need to take a brief detour into self-reproducing programs.
</p>

<h3>Self-reproducing programs</h3>

<p class=pp>
The idea of self-reproducing programs dates back to the 1960s.
My favorite statement of the problem is the one Ken Thompson gave in his 1983 Turing Award address:
</p>

<blockquote>
<p class=pp>
In college, before video games, we would amuse ourselves by posing programming exercises. One of the favorites was to write the shortest self-reproducing program. Since this is an exercise divorced from reality, the usual vehicle was FORTRAN. Actually, FORTRAN was the language of choice for the same reason that three-legged races are popular.
</p>

<p class=pp>
More precisely stated, the problem is to write a source program that, when compiled and executed, will produce as output an exact copy of its source. If you have never done this, I urge you to try it on your own. The discovery of how to do it is a revelation that far surpasses any benefit obtained by being told how to do it. The part about &ldquo;shortest&rdquo; was just an incentive to demonstrate skill and determine a winner.
</p>
</blockquote>

<p class=lp>
<b>Spoiler alert!</b>
I agree: if you have never done this, I urge you to try it on your own.
The internet
</entry>
<entry>
<title>UTF-8: Bits, Bytes, and Benefits</title>
<id>tag:research.swtch.com,2012:research.swtch.com/utf8</id>
<link rel="alternate" href="http://research.swtch.com/utf8"></link>
<published>2010-03-05T00:00:00-05:00</published>
<updated>2010-03-05T00:00:00-05:00</updated>
<summary type="text">The reasons to switch to UTF-8</summary>
<content type="html">
<p><p class=pp>
UTF-8 is a way to encode Unicode code points&#8212;integer values from
0 through 10FFFF&#8212;into a byte stream,
and it is far simpler than many people realize.
The easiest way to make it confusing or complicated
is to treat it as a black box, never looking inside.
So let's start by looking inside. Here it is:
</p>

<center>
<table cellspacing=5 cellpadding=0 border=0>
<tr height=10><th colspan=4></th></tr>
<tr><th align=center colspan=2>Unicode code points</th><th width=10><th align=center>UTF-8 encoding (binary)</th></tr>
<tr height=10><td colspan=4></td></tr>
<tr><td align=right>00-7F</td><td>(7 bits)</td><td></td><td align=right>0<i>tuvwxyz</i></td></tr>
<tr><td align=right>0080-07FF</td><td>(11 bits)</td><td></td><td align=right>110<i>pqrst</i>&nbsp;10<i>uvwxyz</i></td></tr>
<tr><td align=right>0800-FFFF</td><td>(16 bits)</td><td></td><td align=right>1110<i>jklm</i>&nbsp;10<i>npqrst</i>&nbsp;10<i>uvwxyz</i></td></tr>
<tr><td align=right valign=top>010000-10FFFF</td><td>(21 bits)</td><td></td><td align=right valign=top>11110<i>efg</i>&nbsp;10<i>hijklm</i>&nbsp;10<i>npqrst</i>&nbsp;10<i>uvwxyz</i></td></tr>
<tr height=10><td colspan=4></td></tr>
</table>
</center>
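<p class=pp>
The table translates directly into code. Here is a minimal encoder sketch in Go, reading the bit patterns straight off the table (no validity checks for surrogates or out-of-range values):
</p>

```go
package main

import "fmt"

// encode returns the UTF-8 bytes for code point r.
func encode(r rune) []byte {
	switch {
	case r < 0x80: // 7 bits: 0tuvwxyz
		return []byte{byte(r)}
	case r < 0x800: // 11 bits: 110pqrst 10uvwxyz
		return []byte{0xC0 | byte(r>>6), 0x80 | byte(r&0x3F)}
	case r < 0x10000: // 16 bits: 1110jklm 10npqrst 10uvwxyz
		return []byte{0xE0 | byte(r>>12), 0x80 | byte(r>>6&0x3F), 0x80 | byte(r&0x3F)}
	default: // 21 bits: 11110efg 10hijklm 10npqrst 10uvwxyz
		return []byte{0xF0 | byte(r>>18), 0x80 | byte(r>>12&0x3F), 0x80 | byte(r>>6&0x3F), 0x80 | byte(r&0x3F)}
	}
}

func main() {
	for _, r := range []rune{'z', 0xE9, 0x4E16, 0x10FFFF} {
		fmt.Printf("U+%04X -> % X\n", r, encode(r))
	}
	// U+007A -> 7A
	// U+00E9 -> C3 A9
	// U+4E16 -> E4 B8 96
	// U+10FFFF -> F4 8F BF BF
}
```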

<p class=lp>
The convenient properties of UTF-8 are all consequences of the choice of encoding.
</p>

<ol>
<li><i>All ASCII files are already UTF-8 files.</i><br>
The first 128 Unicode code points are the 7-bit ASCII character set,
and UTF-8 preserves their one-byte encoding.
</li>

<li><i>ASCII bytes always represent themselves in UTF-8 files. They never appear as part of other UTF-8 sequences.</i><br>
All the non-ASCII UTF-8 sequences consist of bytes
with the high bit set, so if you see the byte 0x7A in a UTF-8 file,
you can be sure it represents the character <code>z</code>.
</li>

<li><i>ASCII bytes are always represented as themselves in UTF-8 files. They cannot be hidden inside multibyte UTF-8 sequences.</i><br>
The ASCII <code>z</code> 01111010 cannot be encoded as a two-byte UTF-8 sequence
<code>11000001 10111010</code>. Code points must be encoded using the shortest
possible sequence.
A corollary is that decoders must detect long-winded sequences as invalid.
In practice, it is useful for a decoder to use the Unicode replacement
character, code point FFFD, as the decoding of an invalid UTF-8 sequence
rather than stop processing the text.
</li>

<li><i>UTF-8 is self-synchronizing.</i><br>
Let's call a byte of the form 10<i>xxxxxx</i>
a continuation byte.
Every UTF-8 sequence is a byte that is not a continuation byte
followed by zero or more continuation bytes.
If you start processing a UTF-8 file at an arbitrary point,
you might not be at the beginning of a UTF-8 encoding,
but you can easily find one: skip over
continuation bytes until you find a non-continuation byte.
(The same applies to scanning backward.)
</li>

<li><i>Substring search is just byte string search.</i><br>
Properties 2, 3, and 4 imply that given a string
of correctly encoded UTF-8, the only way those bytes
can appear in a larger UTF-8 text is when they represent
</entry>
<entry>
<title>Computing History at Bell Labs</title>
<id>tag:research.swtch.com,2012:research.swtch.com/bell-labs</id>
<link rel="alternate" href="http://research.swtch.com/bell-labs"></link>
<published>2008-04-09T00:00:00-04:00</published>
<updated>2008-04-09T00:00:00-04:00</updated>
<summary type="text">Doug McIlroy's remembrances</summary>
<content type="html">
<p><p class=pp>
In 1997, on his retirement from Bell Labs, <a href="http://www.cs.dartmouth.edu/~doug/">Doug McIlroy</a> gave a
fascinating talk about the &ldquo;<a href="https://web.archive.org/web/20081022192943/http://cm.bell-labs.com/cm/cs/doug97.html"><b>History of Computing at Bell Labs</b></a>.&rdquo;
Almost ten years ago I transcribed the audio but never did anything with it.
The transcript is below.
</p>

<p class=pp>
My favorite parts of the talk are the description of the bi-quinary decimal relay calculator
and the description of a team that spent over a year tracking down a race condition bug in
a missile detector (reliability was king: today you'd just stamp
&ldquo;cannot reproduce&rdquo; and send the report back).
But the whole thing contains many fantastic stories.
It's well worth the read or listen.
I also like his recollection of programming using cards: &ldquo;It's the kind of thing you can be nostalgic about, but it wasn't actually fun.&rdquo;
</p>


<p class=pp>
For more information, Bernard D. Holbrook and W. Stanley Brown's 1982
technical report

&ldquo;<a href="cstr99.pdf">A History of Computing Research at Bell Laboratories (1937-1975)</a>&rdquo;
covers the earlier history in more detail.
</p>

<p><i>Corrections added August 19, 2009. Links updated May 16, 2018.</i></p>

<br>
<br>

<p class=lp><i>Transcript of &ldquo;<a href="https://web.archive.org/web/20081022192943/http://cm.bell-labs.com/cm/cs/doug97.html">History of Computing at Bell Labs:</a>&rdquo;</i></p>

<p class=pp>
Computing at Bell Labs is certainly an outgrowth of the
<a href="https://web.archive.org/web/20080622172015/http://cm.bell-labs.com/cm/ms/history/history.html">mathematics department</a>, which grew from that first hiring
in 1897, G A Campbell. When Bell Labs was formally founded
in 1925, what it had been was the engineering department
of Western Electric.
When it was formally founded in 1925,
almost from the beginning there was a math department with Thornton Fry as the department head, and if you look at some of Fry's work, it turns out that
he was fussing around in 1929 with trying to discover
information theory. It didn't actually gel until twenty years later with Shannon.</p>

<p class=pp><span style="font-size: 0.7em;">1:10</span>
Of course, most of the mathematics at that time was continuous.
One was interested in analyzing circuits and propagation. And indeed, this is what led to the growth of computing in Bell Laboratories. The computations could not all be done symbolically. There were not closed form solutions. There was lots of numerical computation done.
The math department had a fair stable of computers,
which in those days meant people. [laughter]</p>

<p class=pp><span style="font-size: 0.7em;">2:00</span>
And in the late '30s, <a href="http://en.wikipedia.org/wiki/George_Stibitz">George Stibitz</a> had an idea that some of
the work that they were doing on hand calculators might be
automated by using some of the equipment that the Bell System
was installing in central offices, namely relay circuits.
He went home, and on his kitchen table, he built out of relays
a binary arithmetic circuit. He decided that binary was really
the right way to compute.
However, when he finally came to build some equipment,
he determined that binary to decimal conversion and
decimal to binary conversion was a drag, and he didn't
want to put it in the equipment, and so he finally bu
</entry>
<entry>
<title>Using Uninitialized Memory for Fun and Profit</title>
<id>tag:research.swtch.com,2012:research.swtch.com/sparse</id>
<link rel="alternate" href="http://research.swtch.com/sparse"></link>
<published>2008-03-14T00:00:00-04:00</published>
<updated>2008-03-14T00:00:00-04:00</updated>
<summary type="text">An unusual but very useful data structure</summary>
<content type="html">
<p><p class=lp>
This is the story of a clever trick that's been around for
at least 35 years, in which array values can be left
uninitialized and then read during normal operations,
yet the code behaves correctly no matter what garbage
is sitting in the array.
Like the best programming tricks, this one is the right tool for the 
job in certain situations.
The sleaziness of uninitialized data
access is offset by performance improvements:
some important operations change from linear 
to constant time.
</p>

<p class=pp>
Alfred Aho, John Hopcroft, and Jeffrey Ullman's 1974 book 
<i>The Design and Analysis of Computer Algorithms</i>
hints at the trick in an exercise (Chapter 2, exercise 2.12):
</p>

<blockquote>
Develop a technique to initialize an entry of a matrix to zero
the first time it is accessed, thereby eliminating the <i>O</i>(||<i>V</i>||<sup>2</sup>) time
to initialize an adjacency matrix.
</blockquote>

<p class=lp>
Jon Bentley's 1986 book <a href="http://www.cs.bell-labs.com/cm/cs/pearls/"><i>Programming Pearls</i></a> expands
on the exercise (Column 1, exercise 8; <a href="http://www.cs.bell-labs.com/cm/cs/pearls/sec016.html">exercise 9</a> in the Second Edition):
</p>

<blockquote>
One problem with trading more space for less time is that 
initializing the space can itself take a great deal of time.
Show how to circumvent this problem by designing a technique
to initialize an entry of a vector to zero the first time it is
accessed. Your scheme should use constant time for initialization
and each vector access; you may use extra space proportional
to the size of the vector. Because this method reduces 
initialization time by using even more space, it should be
considered only when space is cheap, time is dear, and 
the vector is sparse.
</blockquote>

<p class=lp>
Aho, Hopcroft, and Ullman's exercise talks about a matrix and 
Bentley's exercise talks about a vector, but for now let's consider
just a simple set of integers.
</p>

<p class=pp>
One popular representation of a set of <i>n</i> integers ranging
from 0 to <i>m</i> is a bit vector, with 1 bits at the
positions corresponding to the integers in the set.
Adding a new integer to the set, removing an integer
from the set, and checking whether a particular integer
is in the set are all very fast constant-time operations
(just a few bit operations each).
Unfortunately, two important operations are slow:
iterating over all the elements in the set 
takes time <i>O</i>(<i>m</i>), as does clearing the set.
If the common case is that 
<i>m</i> is much larger than <i>n</i>
(that is, the set is only sparsely
populated) and iterating or clearing the set 
happens frequently, then it could be better to
use a representation that makes those operations
more efficient. That's where the trick comes in.
</p>

<p class=pp>
Preston Briggs and Linda Torczon's 1993 paper,
&ldquo;<a href="http://citeseer.ist.psu.edu/briggs93efficient.html"><b>An Efficient Representation for Sparse Sets</b></a>,&rdquo;
describes the trick in detail.
Their solution represents the sparse set using an integer
array named <code>dense</code> and an integer <code>n</code>
that counts the number of elements in <code>dense</code>.
The <i>dense</i> array is simply a packed list of the elements in the
set, stored in order of insertion.
If the set contains the elements 5, 1, and 4,
then <code>n</code> is 3 and <code>dense</code> holds 5, 1, 4, in that order.
</p>
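<p class=pp>
Sketched in Python (the companion <code>sparse</code> array, which maps
each value back to its index in <code>dense</code>, is the other half of
Briggs and Torczon's representation):
</p>

```python
# Briggs-Torczon sparse set: n counts the elements, dense is a
# packed list of them in insertion order, and sparse maps each
# value to its position in dense.  add, remove, contains, and
# clear are all O(1); iteration is O(n), scanning only dense[0:n].

class SparseSet:
    def __init__(self, m):
        self.n = 0
        self.dense = [0] * m
        self.sparse = [0] * m  # entries may be stale; contains() checks

    def contains(self, x):
        i = self.sparse[x]
        return i < self.n and self.dense[i] == x

    def add(self, x):
        if not self.contains(x):
            self.dense[self.n] = x
            self.sparse[x] = self.n
            self.n += 1

    def remove(self, x):
        if self.contains(x):
            i = self.sparse[x]
            last = self.dense[self.n - 1]  # move last element into the hole
            self.dense[i] = last
            self.sparse[last] = i
            self.n -= 1

    def clear(self):
        # O(1): just forget the count; stale entries are harmless.
        self.n = 0

    def __iter__(self):
        # O(n): only live elements are visited.
        return iter(self.dense[:self.n])
```

<p class=pp>
Because membership is cross-checked against <code>dense</code>, a stale
or never-written <code>sparse</code> entry can do no harm, which is what
makes constant-time initialization possible at all.
</p>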
</content>
</entry>
<entry>
<title>Play Tic-Tac-Toe with Knuth</title>
<id>tag:research.swtch.com,2012:research.swtch.com/tictactoe</id>
<link rel="alternate" href="http://research.swtch.com/tictactoe"></link>
<published>2008-01-25T00:00:00-05:00</published>
<updated>2008-01-25T00:00:00-05:00</updated>
<summary type="text">The only winning move is not to play.</summary>
<content type="html">
<p><p class=lp>Section 7.1.2 of the <b><a href="http://www-cs-faculty.stanford.edu/~knuth/taocp.html#vol4">Volume 4 pre-fascicle 0A</a></b> of Donald Knuth's <i>The Art of Computer Programming</i> is titled &#8220;Boolean Evaluation.&#8221; In it, Knuth considers the construction of a set of nine boolean functions telling the correct next move in an optimal game of tic-tac-toe. In a footnote, Knuth tells this story:</p>

<blockquote><p class=lp>This setup is based on an exhibit from the early 1950s at the Museum of Science and Industry in Chicago, where the author was first introduced to the magic of switching circuits. The machine in Chicago, designed by researchers at Bell Telephone Laboratories, allowed me to go first; yet I soon discovered there was no way to defeat it. Therefore I decided to move as stupidly as possible, hoping that the designers had not anticipated such bizarre behavior. In fact I allowed the machine to reach a position where it had two winning moves; and it seized <i>both</i> of them! Moving twice is of course a flagrant violation of the rules, so I had won a moral victory even though the machine had announced that I had lost.</p></blockquote>

<p class=lp>
That story alone is fairly amusing. But turning the page, the reader finds a quotation from Charles Babbage's <i><a href="http://onlinebooks.library.upenn.edu/webbin/book/lookupid?key=olbp36384">Passages from the Life of a Philosopher</a></i>, published in 1864:</p>

<blockquote><p class=lp>I commenced an examination of a game called &#8220;tit-tat-to&#8221; ... to ascertain what number of combinations were required for all the possible variety of moves and situations. I found this to be comparatively insignificant. ... A difficulty, however, arose of a novel kind. When the automaton had to move, it might occur that there were two different moves, each equally conducive to his winning the game. ... Unless, also, some provision were made, the machine would attempt two contradictory motions.</p></blockquote>

<p class=lp>
The only real winning move is not to play.</p></p>






</content>
</entry>
<entry>
<title>Crabs, the bitmap terror!</title>
<id>tag:research.swtch.com,2012:research.swtch.com/crabs</id>
<link rel="alternate" href="http://research.swtch.com/crabs"></link>
<published>2008-01-09T00:00:00-05:00</published>
<updated>2008-01-09T00:00:00-05:00</updated>
<summary type="text">A destructive, pointless violation of the rules</summary>
<content type="html">
<p><p class=lp>Today, window systems seem as inevitable as hierarchical file systems, a fundamental building block of computer systems. But it wasn't always that way. This paper could only have been written in the beginning, when everything about user interfaces was up for grabs.</p>

<blockquote><p class=lp>A bitmap screen is a graphic universe where windows, cursors and icons live in harmony, cooperating with each other to achieve functionality and esthetics. A lot of effort goes into making this universe consistent, the basic law being that every window is a self contained, protected world. In particular, (1) a window shall not be affected by the internal activities of another window. (2) A window shall not be affected by activities of the window system not concerning it directly, i.e. (2.1) it shall not notice being obscured (partially or totally) by other windows or obscuring (partially or totally) other windows, (2.2) it shall not see the <i>image</i> of the cursor sliding on its surface (it can only ask for its position).</p>

<p class=pp>
Of course it is difficult to resist the temptation to break these rules. Violations can be destructive or non-destructive, useful or pointless. Useful non-destructive violations include programs printing out an image of the screen, or magnifying part of the screen in a <i>lens</i> window. Useful destructive violations are represented by the <i>pen</i> program, which allows one to scribble on the screen. Pointless non-destructive violations include a magnet program, where a moving picture of a magnet attracts the cursor, so that one has to continuously pull away from it to keep working. The first pointless, destructive program we wrote was <i>crabs</i>.</p>
</blockquote>

<p class=lp>As the crabs walk over the screen, they leave gray behind, &#8220;erasing&#8221; the apps underfoot:</p>
<blockquote><img src="http://research.swtch.com/crabs1.png">
</blockquote>
<p class=lp>
For the rest of the story, see Luca Cardelli's &#8220;<a style="font-weight: bold;" href="http://lucacardelli.name/Papers/Crabs.pdf">Crabs: the bitmap terror!</a>&#8221; (6.7MB). Additional details in &#8220;<a href="http://lucacardelli.name/Papers/Crabs%20%28History%20and%20Screen%20Dumps%29.pdf">Crabs (History and Screen Dumps)</a>&#8221; (57.1MB).</p></p>






</content>
</entry>
</feed>