Published by marco
As Microsoft did a couple of years ago, Apple’s language designers are also designing the next version of Swift in public.[1] One example of the new design is the discussion of String Processing For Swift 4 (GitHub). If you read through the relatively long document, you can at least see that they’re giving the API design a tremendous amount of thought.
There are so many factors to weigh when building the API, especially for a low-level construct like String.
- […] String API with a bunch of overloads? (E.g. the discussion of storage for sub-strings.)
- […] (Strings are actually structs rather than classes.)
- […] Array?
- Should String be a Collection? If so, what is the default item-type?
- Should Character have the same or a similar API as a String? (E.g. why can’t you get the sub-structure of the grapheme cluster for a character without first converting it to a String?)

A good example is the discussion of how to represent string slices: should there be a separate type, called Substring, analogous to the ArraySlice that already exists for an Array?
“Long-term storage of Substring instances is discouraged. A substring holds a reference to the entire storage of a larger string, not just to the portion it presents, even after the original string’s lifetime ends.
“[…]
“The downside of having two types is the inconvenience of sometimes having a Substring when you need a String, and vice-versa. It is likely this would be a significantly bigger problem than with Array and ArraySlice, as slicing of String is such a common operation. It is especially relevant to existing code that assumes String is the currency type – that is, the default string type used for everyday exchange between APIs. To ease the pain of type mismatches, Substring should be a subtype of String in the same way that Int is a subtype of Optional<Int>.”
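To make the two-type trade-off concrete, here is a minimal sketch of what slicing looks like with a separate Substring type. It assumes Swift 4.2 or later; the function `shout` is made up for the example. Note that, as far as I know, the Swift releases that followed require the conversion to be written out explicitly rather than applying it implicitly as the quoted passage proposes.

```swift
let sentence = "Hello, world"

// Slicing a String yields a Substring that shares the original's storage.
let comma = sentence.firstIndex(of: ",")!
let head: Substring = sentence[..<comma]

// A hypothetical API that expects the currency type, String.
func shout(_ text: String) -> String {
    return text.uppercased() + "!"
}

// The conversion is spelled out here; this is the point where the copy happens.
let owned = String(head)
print(shout(owned))  // prints "HELLO!"
```

Passing `head` directly to `shout` is a type mismatch, which is exactly the inconvenience the quote describes.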
Collection or not?
Those who watch the Swift API evolve from one major version to the next, with each release introducing backward-incompatible changes, should be reassured by this document that the changes are not made lightly. It may seem like the designers don’t have a plan but, over the years, designers and opinions change. Witness, for example, the discussion of what the default representation of a string should be.
“[…] in Swift 1.0, String was a collection of Character (extended grapheme clusters). […] In Swift 2.0, String’s Collection conformance was dropped, because we convinced ourselves that its semantics differed from those of Collection too significantly.”
After listing several reasons why the change in Swift 2.0 was not a good direction, they conclude that in 4.0, they should revert to the original behavior.
“It would be much better to legitimize the conformance to Collection and simply document the oddity of any concatenation corner-cases, than to deny users the benefits on the grounds that a few cases are confusing.”
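A short sketch of what the conformance buys, and of the kind of concatenation corner-case the quote refers to. It assumes Swift 4 or later, where String is a Collection of Character; the sample strings are my own.

```swift
let word = "caf\u{E9}"  // "café" with a precomposed é

// With Collection conformance, generic algorithms work directly on the string.
print(word.count)               // 4
print(String(word.reversed()))  // éfac

// The concatenation oddity: counts do not add up when a combining
// mark attaches itself to the preceding character.
let e = "e"
let accent = "\u{301}"          // COMBINING ACUTE ACCENT
print(e.count + accent.count)   // 2
print((e + accent).count)       // 1
```

Most collections guarantee that the count of a concatenation is the sum of the counts; for grapheme clusters, that guarantee does not hold.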
Again, the discussion is open and public and, despite the claims of some who think that they’re just a bunch of cowboys changing stuff willy-nilly, they have a documented plan.
It’s unfortunate that it took them so long to get there, but this kind of design isn’t always easy.
Because Swift uses Unicode grapheme clusters as the default “items” view for strings, the discussion of string indices might seem unnecessarily abstract for developers coming from other languages, where the index is always an int into the bytes.
“String currently has four views–characters, unicodeScalars, utf8, and utf16 […]”
Because of these different views, it’s necessary to discuss how to reduce API surface by consolidating the various index types used to refer to individual elements in these different “views” on a String.
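The difference between the views is easiest to see with a string whose last character is built from two scalars. This is a sketch assuming Swift 4 or later, where the character view is the String itself rather than a separate `characters` property.

```swift
let word = "cafe\u{301}"  // "café" spelled with a combining accent

print(word.count)                 // 4 characters (grapheme clusters)
print(word.unicodeScalars.count)  // 5 Unicode scalars
print(word.utf16.count)           // 5 UTF-16 code units
print(word.utf8.count)            // 6 UTF-8 code units

// Indices are opaque String.Index values rather than Ints.
let last = word.index(word.startIndex, offsetBy: 3)
print(word[last])                               // é
print(String(word[last]).unicodeScalars.count)  // 2
```

The same text has four different lengths, which is why a plain integer cannot serve as an index for all of the views at once.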
It’s not as if C# and most other mainstream languages have anything to brag about with their string handling. In that respect, even Swift 1 and 2 are light-years ahead in Unicode correctness with their focus on grapheme clusters rather than the utterly nonsensical 90s-era bytes still used in those other languages.
The Guidance for API Designers shows how they try to build the API so that it makes sense for callers.
“A Substring passed where String is expected will be implicitly copied. When compared to the “same type, copied storage” model, we have effectively deferred the cost of copying from the point where a substring is created until it must be converted to String for use with an API.
“A user who needs to optimize away copies altogether should use this guideline: if for performance reasons you are tempted to add a Range argument to your method as well as a String to avoid unnecessary copies, you should instead use Substring.”
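The guideline in the second quote can be sketched as follows. Instead of a signature that takes a String plus a Range, the function takes a Substring, and the caller slices. The function `wordCount(in:)` is a made-up example, and the code assumes Swift 4.2 or later.

```swift
// Hypothetical helper: takes a slice instead of a String plus a Range.
func wordCount(in text: Substring) -> Int {
    return text.split(separator: " ").count
}

let line = "one two three four"
let firstSpace = line.firstIndex(of: " ")!

// Slicing copies no character data; the Substring refers to line's storage.
let rest = line[line.index(after: firstSpace)...]
print(wordCount(in: rest))       // 3

// The whole string can be passed as a slice as well.
print(wordCount(in: line[...]))  // 4
```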
Their goal is noble, though it’s unclear to what degree the vision can be realized. The following citation could serve as the high-level goal of any API.
“We should represent these aspects as orthogonal, composable components, abstracting pattern matchers into a protocol like this one, that can allow us to define logical operations once, without introducing overloads, and massively reducing API surface area.”
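To illustrate the idea of defining operations once against a protocol, here is a minimal sketch. Everything in it (`SimplePattern`, `Literal`, `Either`, `rangeOfFirst`, `hasMatch`) is a name I made up for the example; it is not the protocol from the document, only an illustration of the composable style it describes.

```swift
// Hypothetical protocol for illustration; not the one sketched in the document.
protocol SimplePattern {
    /// Returns the index just past a match beginning at `start`, or nil.
    func matchEnd(in text: Substring, at start: String.Index) -> String.Index?
}

// Matches a fixed piece of text.
struct Literal: SimplePattern {
    let text: String
    func matchEnd(in haystack: Substring, at start: String.Index) -> String.Index? {
        guard haystack[start...].starts(with: text) else { return nil }
        return haystack.index(start, offsetBy: text.count)
    }
}

// Matches either of two patterns, trying the first one first.
struct Either<A: SimplePattern, B: SimplePattern>: SimplePattern {
    let first: A
    let second: B
    func matchEnd(in haystack: Substring, at start: String.Index) -> String.Index? {
        return first.matchEnd(in: haystack, at: start)
            ?? second.matchEnd(in: haystack, at: start)
    }
}

// The search operations are written once and work for every pattern type.
extension Substring {
    func rangeOfFirst<P: SimplePattern>(_ pattern: P) -> Range<String.Index>? {
        var position = startIndex
        while true {
            if let end = pattern.matchEnd(in: self, at: position) {
                return position..<end
            }
            if position == endIndex { return nil }
            position = index(after: position)
        }
    }

    func hasMatch<P: SimplePattern>(of pattern: P) -> Bool {
        return rangeOfFirst(pattern) != nil
    }
}

let source = "grapheme clusters"
let text = source[...]
let pattern = Either(first: Literal(text: "byte"), second: Literal(text: "cluster"))

print(text.hasMatch(of: pattern))                 // true
print(text.hasMatch(of: Literal(text: "utf16")))  // false
if let range = text.rangeOfFirst(pattern) {
    print(text[range])                            // cluster
}
```

Adding a new kind of pattern means adding one conforming type; neither `rangeOfFirst` nor `hasMatch` needs a new overload.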