While implementing a type inferencing algorithm, I ran into a situation where I wanted to perform some operation on two sub-structures, returning the error if one or the other failed, combining and returning both errors if they both failed, or performing another operation if both succeeded (and returning its result).
This is too simple and common a situation for someone not to have named it and found best practices for it already, so I sketched the outline and tweeted about it. To my surprise, while there were a lot of responses along the lines of “it’s sort of like x,” there wasn’t a clear indication that e.g. the thing I am doing “is called y, look it up on hoogle.”
The thing is, it is sort of like x, for a number of different values of x. Combining two values representing success and failure into a third value representing success and failure is common; so what is it?
The group intuition was correct: it does have a familiar pattern. For example, it’s basically the same as &&! Sure enough, if you look at the truth table for logical conjunction (which && implements), it’s exactly what I just described:
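a	b	a && b
👎	👎	👎
👎	👍	👎
👍	👎	👎
👍	👍	👍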
However, the result I want isn’t just pass/fail: I want the errors in the 👎 case, and the result in the 👍 case. Still, the similarity with && doesn’t end with the truth table: it turns out that it’s evident in the types, as well.
Ignoring Swift’s @autoclosure trick for short-circuiting the right-hand side, &&’s type looks like this:
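(Bool, Bool) -> Bool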
That is, it’s a closed binary function over the Bools. That sounds like what I want, too: it should take two success/failure results and return a single success/failure result.
I’m using Either to represent success/failure. In this use, it’s equivalent to Result, which you might be more familiar with. Using Either to describe the operation’s type, we get:
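(Either<Error, T>, Either<Error, T>) -> Either<Error, T>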
That is, it takes two results, and returns a single result, combined… somehow. Knowing what goes in and comes out doesn’t make implementing it exactly trivial, but having a point of comparison in the form of && gives us a hint.
If we had to implement && ourselves, we might use a switch statement to pattern match against the operands as a pair:
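func && (a: Bool, b: Bool) -> Bool {
    switch (a, b) {
    case (false, false): return false
    case (false, true): return false
    case (true, false): return false
    case (true, true): return true
    }
}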
Simple: in the first three cases, we just return false; in the last, we return true.
We can use the same approach with Either, with some adjustments to account for the differences between Either and Bool. Let’s call this function over Eithers combine, for the moment, for lack of a better name:
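func combine<T>(a: Either<Error, T>, b: Either<Error, T>) -> Either<Error, T> {
    switch (a, b) {
    case (.Left, .Right): return a                                  // a failed
    case (.Right, .Left): return b                                  // b failed
    case let (.Left(e1), .Left(e2)): return .Left(failure(e1, e2))  // both failed
    case let (.Right(x), .Right(y)): return .Right(success(x, y))   // both succeeded
    }
}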
(Note that when using Either to represent an operation’s result, the convention is that “Right is right”—that is, failure is represented by the Left case, and success by the Right case.)
There are indeed a few small changes from what we had for Bool:
We return the failure, a and b respectively, in the first two cases. Either contains more interesting values than Bool does, so this makes sense: we want to return the specific error that occurred.
In the third case, we use a hypothetical failure function to combine two Errors into a new Error and pack it up into a Left.
In the fourth case, we use a hypothetical success function to combine two Ts into a new T and pack it up into a Right.
Note that both the success and failure functions have the same exact shape as &&—(Error, Error) -> Error and (T, T) -> T respectively. These closed binary functions are catching on!
Now, let’s fill in the blanks. The failure function is pretty easy: we defined the Error type, so we get to say what can be done with it. We could arbitrarily pick one error or the other, but I want to combine the two into a new error containing all the data of the original two.
My Errors are represented as a tree, making it a simple instance of Composite Pattern:
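Something like this, with placeholder payloads:

enum Error {
    case Leaf(String)
    case Branch([Error])
}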
We can implement the failure function combining two Errors together by adding leaves to branches and concatenating branches:
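func failure(e1: Error, e2: Error) -> Error {
    switch (e1, e2) {
    case (.Leaf, .Leaf): return .Branch([e1, e2])
    case let (.Leaf, .Branch(errors)): return .Branch([e1] + errors)
    case let (.Branch(errors), .Leaf): return .Branch(errors + [e2])
    case let (.Branch(errors1), .Branch(errors2)): return .Branch(errors1 + errors2)
    }
}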
Look familiar? It’s the same basic shape as && and combine.
What about success?
success is more difficult. We don’t know what type T is, so we require a generic way of producing a T given (T, T). There are two and only two generic ways to produce a T given (T, T): we could return the first one, or we could return the second.
That’s unfortunate: we wanted to use both values, not discard one or the other of them arbitrarily. At the time, my solution was to add a parameter to combine, a function of type (T, T) -> T:
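func combine<T>(a: Either<Error, T>, b: Either<Error, T>, success: (T, T) -> T) -> Either<Error, T> {
    switch (a, b) {
    case (.Left, .Right): return a
    case (.Right, .Left): return b
    case let (.Left(e1), .Left(e2)): return .Left(failure(e1, e2))
    case let (.Right(x), .Right(y)): return .Right(success(x, y))
    }
}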
Ultimately, the type of T is specified by the caller, so it’s free to pass in whatever function it wants. (For example, if T were Int it could use +.) There! Solved.
It’s somewhat dissatisfying, though. The same basic shape of combine is duplicated in failure, and looking more closely, we see it in ==, too:
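func == (a: Error, b: Error) -> Bool {
    switch (a, b) {
    case let (.Leaf(x), .Leaf(y)): return x == y
    case let (.Branch(xs), .Branch(ys)): return xs == ys
    default: return false
    }
}

extension Error: Equatable {}  // so that arrays of Errors can themselves be compared with ==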
We’re using default here to abbreviate things a little, but nonetheless, it’s the same basic shape: pattern matching over two enumerations at the same time. Every time I write == for an enum it looks almost exactly like this. Maybe there’s something more to that.
Stepping back for a moment, pattern matching in Swift leaves me wanting more. It’s syntactically noisy, it’s easy to make trivial mistakes with (oops, I forgot let again!), and it’s a statement, not an expression, which means a bunch of return statements or assignments to shove values through it.
When confronted with a distasteful idiom, I rebel. And, truly, there is a problem here: switch is the standard way to get values into and out of an enum, but it’s so syntactically heavy that it obscures the intent. If a programmer needs to extract the error from Either<Error, T> (if any) and pass it to a function, she shouldn’t have to resort to this:
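switch result {
case let .Left(error):
    didFailWithError(error)
case .Right:
    break
}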
Extracting the success or error value from an Either is pretty common work, so we want to factor it out. Either’s left and right properties do just that:
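// within Either<T, U>
var left: T? {
    switch self {
    case let .Left(x): return x
    default: return nil
    }
}

var right: U? {
    switch self {
    case let .Right(y): return y
    default: return nil
    }
}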
That simplifies our caller’s task to something much more palatable:
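if let error = result.left {
    didFailWithError(error)
}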
Convenient though this is, it’s lost some of the power of a fully-operational switch statement. If our caller wants to pass errors off to didFailWithError and pass success values to didSucceedWithValue, she’d have to write:
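if let error = result.left {
    didFailWithError(error)
} else if let value = result.right {
    didSucceedWithValue(value)
}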
That’s not much of an improvement on writing out the full switch statement:
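switch result {
case let .Left(error):
    didFailWithError(error)
case let .Right(value):
    didSucceedWithValue(value)
}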
Haskell’s Either type has an answer for this, in the form of the imaginatively-named either function. either takes two functions, applying the first to the value of the Left constructor, or applying the second to the value of the Right constructor, whichever is applicable. Both of these functions must return the same type, therefore, and either will return that type as well.
In Swift, an either method would have the following signature:
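// within Either<T, U>; note the unlabelled parameters
func either<V>(_ ifLeft: (T) -> V, _ ifRight: (U) -> V) -> V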
When reading about Lenses or Prisms or even Continuations, one sometimes encounters a discussion of some value or computation “with a hole in it,” with the Lens, Prism, or Continuation filling that hole by providing a value or computation to fill in the blank.
Our either method has that sort of a pattern to it, too: the caller plugs the “holes” by providing functions which fill in the blanks. The full implementation looks like this:
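// within Either<T, U> (a sketch)
func either<V>(_ ifLeft: (T) -> V, _ ifRight: (U) -> V) -> V {
    switch self {
    case let .Left(x): return ifLeft(x)
    case let .Right(y): return ifRight(y)
    }
}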
Now we can simplify our left and right properties:
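var left: T? {
    return either({ x in x }, { _ in nil })
}

var right: U? {
    return either({ _ in nil }, { y in y })
}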
(We could also write id for { x in x } and const(nil) for { _ in nil }, but I wanted to emphasize the triviality of the functions themselves. Much like other complex-sounding functional programming terms, you could have invented these, and maybe you already have.)
Even better, our caller’s job is simplified to the extent that she doesn’t even have to write an if statement:
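return result.either(didFailWithError, didSucceedWithValue)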
We’ve recovered all of the power of switch for this purpose, with much less boilerplate, and proportionately greater clarity of intent: errors and successes are routed to didFailWithError and didSucceedWithValue respectively, and the result is returned.
Even better, either avoids a little white lie which I’ve been telling you all along: as of Swift 1.2, enums with parameterized types can’t store the values directly; you have to wrap them in a reference type. If you’ve ever encountered this error, you’ll recognize what I’m talking about:
error: unimplemented IR generation feature non-fixed multi-payload enum layout
(This is Swift-speak for “we haven’t got to this yet.”)
Therefore, Either’s constructors both wrap the types in Box, a class which simply forms a reference to another type. Either actually looks like this:
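enum Either<T, U> {
    case Left(Box<T>)
    case Right(Box<U>)
}

where Box is just about the smallest possible class (a sketch):

final class Box<T> {
    let value: T
    init(_ value: T) { self.value = value }
}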
Therefore, every time you use switch to pattern match against Either, you also have to extract the boxed value by calling .value. Don’t ask me how many times I’ve forgotten it, only to be rewarded by some meditation of the compiler’s:
error: cannot invoke 'f' with an argument list of type '((Box<T>))'
Or worse:
error: could not find an overload for '==' that accepts the supplied arguments
either takes care of all of that: the functions you pass it receive T and U directly, not boxes containing it. Our caller needn’t be aware that Box is involved at all, which makes it a much more palatable implementation detail than it would otherwise have been.
I’ve referred to Swift’s switch statements as “pattern matching” a couple of times now. Another term for this particular application of it—extracting the values of an enum’s cases—is “case analysis.” Every time I write an enum—and I try to write enums as often as possible—I now give it an analysis method modelled on either.
For example, my type inferencing algorithm has an enum named Type, with four cases and an analysis method:
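Modulo the real payloads (sketched here as placeholders), it looks something like this:

enum Type {
    case Base(String)
    case Variable(Int)
    case Function(Box<Type>, Box<Type>)
    case Universal(Set<Int>, Box<Type>)

    func analysis<T>(ifBase: (String) -> T, ifVariable: (Int) -> T, ifFunction: (Type, Type) -> T, ifUniversal: (Set<Int>, Type) -> T) -> T {
        switch self {
        case let .Base(name): return ifBase(name)
        case let .Variable(index): return ifVariable(index)
        case let .Function(parameter, result): return ifFunction(parameter.value, result.value)
        case let .Universal(variables, quantified): return ifUniversal(variables, quantified.value)
        }
    }
}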
(It has a lot of other methods and properties too, but it’s telling that every other one calls analysis!)
Reflecting on either, I wondered if I could simplify the combine function by using case analysis. We’d have to nest the analysis of one Either inside the analysis of the other, but at least we wouldn’t need to deal with switch’s shortcomings. Here’s what I came up with:
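It went something like this (a sketch):

func combine<T, U>(a: Either<Error, T>, b: Either<Error, U>) -> Either<Error, (T, U)> {
    return a.either({ errorA in
        b.either({ errorB in Either.left(failure(errorA, errorB)) },
                 { _ in Either.left(errorA) })
    }, { valueA in
        b.either({ errorB in Either.left(errorB) },
                 { valueB in Either.right((valueA, valueB)) })
    })
}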
(Either.left and Either.right—note the use of lowercase!—are static functions which construct values without having to Box them first. Note also that I’m returning a tuple of the values instead of passing them to any sort of success function; and as such, we no longer require the success values to be of the same type.)
This really isn’t what I’d hoped for. There’s still the same four branches, only now there’s that much less context for which line applies to which. (This is largely due to an error in either’s design—unlike Type.analysis(), above, Either.either doesn’t have external labels for its parameters, leaving them tragically inscrutable.) There’s less boilerplate, but not by much.
In hindsight, I shouldn’t have been surprised that it would have turned out as little an improvement over switch (a, b) {…} as result.left and result.right individually were over switch result {…}. Just like result.left and result.right capture only part of the generality of switch as applied to a single Either, the either method captures only part of the generality of switch as applied to two of them.
That was the last time I thought about it for a couple of weeks; the yak being thoroughly shaven, I moved on to other tasks.
This evening, implementing unification of equality constraints over types, I ran into an eerily familiar situation: given two types, I wanted to descend through them side by side insofar as their structure allows.
For example, if I have a function type on the left, and a function type on the right, then I wanted to do some work with the function types themselves, then do the same process with their parameter types, and then do the same process with their return types, recursively.
If, on the other hand, I have a type variable on one side and a function type on the other, I want to operate on just the two of them: there’s no structural equivalence to recurse through.
That makes this operation a little bit like a zip, and more than a little bit like ==. The obvious implementation is to write a switch statement over all of the possible combinations of cases. For Either that was only four; not too bad, right?
Swift’s enums are an example of “sum” types, while its tuples and structs are “product” types. The short explanation for “sum” and “product” is a symmetry between them and the + and × operators in everyday arithmetic: if you think of a type as being occupied by a set of distinct values, then sums and products of that type will add and multiply the number of distinct values possible.
Case in point, Bool has a mere two values: true and false. (Unlike objects, it makes no sense to ask which true or which false one might be referring to; Bools, like all value types, are fungible, i.e. completely interchangeable with no notion of identity or distinction, like two dollars in a bank account; if you withdraw one dollar, “which dollar?” is a meaningless question.) Void has only one: ().
You could represent Swift’s Bool type with an enum. (Swift itself doesn’t, but that’s an implementation detail.)
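enum Bool {
    case True
    case False
}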
A case with no values is equivalent to the same case with a Void value associated with it. Void has only one value, and enums sum the possible values of their constructors. Therefore this Bool, just like Swift’s own, has only two values: it can be True, or it can be False.
A tuple, like the ones we used to pattern match against in combine, failure, and ==, is the cartesian product of its elements. Instead of being either true or false, (Bool, Bool) is the product of true or false and true or false. Therefore, (Bool, Bool) can have four distinct values:
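(true, true)
(true, false)
(false, true)
(false, false)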
Either, much like Bool and Error, has two cases. Each case corresponds to a single pattern to be matched in a switch statement, or a single function passed to the case analysis method. Therefore, a switch statement over a pair of Either values will require the four branches which we’ve encountered again and again:
(Left, Right)
(Right, Left)
(Left, Left)
(Right, Right)

That’s the product of two cases (Left and Right) and two cases (Left and Right): four.
Moving back to Type, the == function is deceptively simple:
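In terms of the sketch of Type above:

func == (t1: Type, t2: Type) -> Bool {
    switch (t1, t2) {
    case let (.Base(name1), .Base(name2)): return name1 == name2
    case let (.Variable(index1), .Variable(index2)): return index1 == index2
    case let (.Function(parameter1, result1), .Function(parameter2, result2)):
        return parameter1.value == parameter2.value && result1.value == result2.value
    case let (.Universal(variables1, type1), .Universal(variables2, type2)):
        return variables1 == variables2 && type1.value == type2.value
    default: return false
    }
}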
Why is it “deceptively” simple? There are a lot of branches unaccounted for, tucked away behind that nice convenient default. We don’t explicitly handle (.Base, .Variable), (.Function, .Universal), or any of the other combinations of cases.
This is another aspect of switch’s power and generality which isn’t captured by straightforward case analysis methods. With four cases on each side of the tuple, there’s a full sixteen we’d need to tackle.
Suddenly the side-by-side recursion problem looks more daunting. I was so overwhelmed with the branchiness of the problem that I couldn’t deal with the orthogonal problem of how to combine the values to return them, and so my first draft is a stateful function, pairwise:
This code is bad, and I feel bad. Even with the ifBase:, ifVariable:, etc. labels for the functions, this is much, much worse than the corresponding switch statement. I want an n + m solution, but ended up with an n × m one. That comes out in the wash when n = m = 2, but the more cases, the worse it gets—and my Type doesn’t have sum or product types yet!
If only I could turn analysis inside out somehow, or have it only apply to one branch. I don’t want to handle every combination of t1 and t2, I only want to handle the equal ones. I considered overloading analysis with one variant per case; that would yield something like:
But why go to all the trouble of writing a function if I’m just going to ignore its parameters? Why not just return the values, wrapped in Optional, of course, and combine those somehow? Optional, after all, has a mere two cases.
This whole time, I’d been unable to shake this feeling that there’s some symmetry between Either and Bool that perhaps Haskell knows something about. After some discussion with Patrick Thomson, I realized that Justin Spahr-Summers hit the nail on the head with my original question by drawing a parallel between my original combine function and Haskell’s &&& operator on Arrows.
Ignoring the details of Arrow, &&& implements “fanout”—it applies two different computations to an input value, and combines their results. On the surface, that doesn’t sound like what we want at all—we have two input values, and we want to apply a single computation!—but values and computations are not really all that distinct.
Haskell’s version of Optional is Maybe. Maybe is a Monad, Arrow’s less enlightened cousin, but while there are many tragic analogies for what Monads are, the core detail is that Monads abstract a computation. If Maybe abstracts a computation—even a trivial one, like “is this nil or not?”—then so does Optional.
Even better, we already have a way to get Optionals representing our branches: properties returning the cases’ values wrapped in Optional. The left and right properties we saw above are examples of this. Type’s properties look like this:
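Using the placeholder payloads from the sketch of Type above:

// within Type
var variable: Int? {
    return analysis(
        ifBase: { _ in nil },
        ifVariable: { $0 },
        ifFunction: { _, _ in nil },
        ifUniversal: { _, _ in nil })
}

var function: (Type, Type)? {
    return analysis(
        ifBase: { _ in nil },
        ifVariable: { _ in nil },
        ifFunction: { ($0, $1) },
        ifUniversal: { _, _ in nil })
}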
Now, assuming we can find an &&& operator for Optional, we can combine these cases arbitrarily. We can combine them between arbitrary types arbitrarily—there’s no requirement that the left and right operands be the same type! We can even recover the flexibility of default by using Swift’s ?? operator to provide fallback values.
So what does &&& look like? It really couldn’t be simpler:
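Something like this:

infix operator &&&

func &&& <T, U> (left: T?, right: U?) -> (T, U)? {
    // If both sides hold values, pair them up; otherwise, nil.
    return left.flatMap { x in right.map { y in (x, y) } }
}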
x &&& y combines the map methods of each Optional, producing an Optional combining both values together in a tuple. This has the same truth table as && and combine, while also recovering the values; that makes it every bit as flexible, general, and powerful as switch, and without the boilerplate.
Here’s what combine looks like through this lens:
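It came out something like this (a sketch; failure is the error-combining function from earlier, and Either.left and Either.right are the unboxed constructors we met before):

func combine<T, U>(a: Either<Error, T>, b: Either<Error, U>) -> Either<Error, (T, U)> {
    return (a.left &&& b.left).map { Either.left(failure($0.0, $0.1)) }  // both failed
        ?? (a.right &&& b.right).map { Either.right($0) }                // both succeeded
        ?? a.left.map { Either.left($0) }                                // only a failed
        ?? b.left.map { Either.left($0) }!                               // only b failed; note the !
}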
There are a couple of warts left:
As written, it takes ten seconds for Swift to compile it, most of which is spent typechecking. Adding explicit type parameters to Either.left and Either.right helps quite a lot, but it’s something to be aware of. (This, by the way, is the house that subtyping built. I hope you’re happy.)
Swift has no way to tell that we’ve covered all our bases here, so we have to add that ! on the end. I like to read the last line as “or the left value of b wrapped into an Either, dammit!”
The repetition between a.left.map and b.left.map could be cleaned up by implementing the analogous ||| operator.
Of course, Either is the easy case; the n × m = n + m = 4 case. The results are more impressive as n and m grow. What about structural recursion over Type?
My second draft, this time named structural (I was too excited about &&& to think up a better name), uses &&& to combine Optionals:
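Sketching just the function case (using the hypothetical function property from above):

func structural(_ t1: Type, _ t2: Type, _ f: (Type, Type) -> ()) {
    f(t1, t2)  // operate on the pair itself
    if let (function1, function2) = t1.function &&& t2.function {
        structural(function1.0, function2.0, f)  // descend into the parameter types
        structural(function1.1, function2.1, f)  // descend into the return types
    }
}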
That’s a lot better than the 16-way, deeply-nested analyses-of-analyses function I had before. And now, with the problem of branches entirely sorted out and all the boilerplate out of the way, I was able to notice that the operation I’m attempting is really a sort of a fold, and so I should provide an initial value and combine it with the pairs of types using f. (And at the same time, I can also fix a bug: type variables should be unified with every other type, not just with other type variables!)
Now if you’ll excuse me, I’m going to go rewrite all of my == functions.
Value types (struct, enum, and tuples) in Swift have some key advantages over reference types (primarily class) when it comes to expressiveness—which we’ll define for the purposes of this discussion as the ability of the compiler, and more importantly the programmer, to reason on them.
One of the biggest is the constraints placed on them by let vs. var: let defines a constant, whose value cannot be changed, whereas var defines a variable, whose value can, indeed, vary. Noting that a value is bound in a constant allows the compiler, and more importantly the programmer, to reason on the behaviour of that value over time.
These semantics are partially enforced by the language; attempting to assign a new value to a constant will fail. However, the reasoning compiler and programmer are also at the mercy of the constant’s type: the language requires that the value of a constant not change, but this is a weak guarantee for reference types, since the constant’s value is a reference to the object in question, and not that object’s value.
Value types, on the other hand, have a stronger guarantee—the constant’s value is whatever gets assigned to it, not merely a reference. This is why the mutating keyword for value type methods exists—a method marked as mutating will implicitly assign the new value to the variable holding the receiver. From there we can see directly why we cannot call mutating methods on value types bound in constants instead of variables: assignment to bound constants is forbidden.
Of course, if reference types can defeat the semantics of let in local variables, the same is true in properties: a reference to an object held by a value type may be constant, but that does not imply that the object referenced is.
Most of the time, this is undesirable. Andy Matuschak noted this recently in A Warm Welcome to Structs and Value Types:
Value types containing references are not necessarily isolated and should generally be avoided: they carry a dependency on all other owners of that referent. These value types are also not readily interchangeable, since that external reference might be connected to the rest of your system in some complex way.
This advice is sound: in general, a reference type held by a value type opens the door to unsemantic use and consequently a reduction in our ability to reason soundly on the value type.
On the other hand, sometimes copying is not what you intend; sometimes performance or correctness in the presence of side effects requires state to be shared. That’s when you reach for reference types; but it would be a shame to lose value semantics entirely.
For example, I recently had occasion to implement a Stream. A first approximation to the type looked something like this:
enum Stream<T> {
case Nil
case Cons(Box<T>, () -> Stream<T>)
}
This is pleasantly direct, but has a significant flaw—since the rest of the stream is an arbitrary function producing a Stream, we can’t make any guarantees about when and how it is evaluated. This becomes very important in two cases:
When the function is expensive. We don’t want to pay that cost any more than necessary.
When the function has side-effects. If the function isn’t referentially transparent, then neither is the stream.
Unfortunately, Swift’s type system doesn’t allow us to constrain the function type to only accept pure functions. Even if it did, that would address correctness, but not performance—and performance is a feature.
Instead, we’ll employ memoization—the function will still be evaluated lazily, but the first call will cache the returned value and simply return it for every future call.
It’s fairly trivial to write a memoizing combinator which would wrap the function; but that would require that every caller remember to memoize. Instead, we’d like to express this constraint with a type: Memo<T>.
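A sketch of such a combinator:

func memoize<T>(_ f: @escaping () -> T) -> () -> T {
    var memo: T? = nil
    return {
        if let value = memo { return value }  // already evaluated; return the cached value
        let value = f()
        memo = value
        return value
    }
}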
Memoization has two cases to consider: evaluated, and not yet evaluated. This is a great use case for an enum!
enum Memo<T> {
case Evaluated(Box<T>)
case Unevaluated(() -> T)
}
The distinction in semantics between reference and value types has surfaced already. The Evaluated case employs Box<T> to work around Swift’s regrettable lack of support for enums with multiple cases with values, where at least one case is of indeterminate size. Memo does not constrain T, and so its size cannot be known ahead of time, and the compiler doesn’t yet know how to account for the potential variance when there are other cases present.
Box<T> is a class, and therefore a reference type. Thus, since the value of a reference type is the reference and not the referenced object, its size is fixed—presumably, the size of a pointer.
Unevaluated, meanwhile, does not require this wrapping—function types are reference types already.
We can now change Stream to use Memo instead of a function type:
enum Stream<T> {
case Nil
case Cons(Box<T>, Memo<Stream<T>>)
}
That’s the types sorted out, but we still need to handle evaluation—an Unevaluated instance of Memo should become Evaluated, caching the value to prevent further evaluation. We will do this with a method, value().
// within Memo<T>
mutating func value() -> T {
switch self {
case let .Evaluated(x): return x.value
case let .Unevaluated(f):
let value = f()
self = .Evaluated(Box(value))
return value
}
}
We immediately hit a barrier: Memo is, as all enums are, a value type, and as we saw before, this means that it is inherently immutable, but if bound to a variable, can be replaced or updated. value() does this in its assignment to self, and thus the method must be marked as mutating. (This is also why we employed a method instead of the property we would have preferred—the mutating keyword only applies to methods.)
While technically accurate, this is misleading as far as the semantics of the type go: the whole point of Memo is that the value won’t change, since it’s evaluated at most once! In fact, Memo is semantically immutable in precisely the same way as let: it is bound exactly once, and thereafter constant; and it provides relatively weak guarantees when parameterized by reference types.
It seems that we’ll be forced to use reference types for this, meaning class, and consequently a much less clear implementation—class instances do not support pattern matching, so we’ll have to give up our nice clean switch for an intention-clouding if/else statement.
Worse yet, writing Memo as a class would cloud our intent with regard to its immutability; seeing that it is a reference type, the programmer using this class has no hint of how the type will behave when bound to a constant. Programmer and compiler both must treat the type more conservatively—that is, defensively!—than they might otherwise have to, because weaker guarantees make for fewer valid optimizing assumptions.
Earlier, we saw that the guarantee of immutability provided by let has three categories of strength:
When the type is a pure value type, the guarantee is absolute for the entire value of the binding—as we saw earlier, mutating a value type replaces the value in the binding, which let forbids. Note that this guarantee cannot be provided for value types with members of parameterized types, since those parameters could be reference types.
When the type is a value type with a type parameter or some non-parameterized reference type as a member, then the guarantee is still absolute for the value of the binding, including any references—but excluding the referenced objects. We may think of these objects as being members of the containing value type, but this intent is reflected in neither the types nor the behaviour.
When the type is a reference type, the only guarantee is that the reference bound to the constant will not change—it will always point to the same object, once bound. But that object could be pure or impure, and we have no way to tell.
The second category sounds like exactly what we’re looking for: we wish to declare our intent that Memo be understood as a value, while still allowing the memoized evaluation to be implemented. Therefore, let’s bridge the gap by building a value type veneer wrapping a theoretically immutable reference to a practically mutable enum:
struct Memo<T> {
/// A mutable reference to our private state.
private let state: MutableBox<MemoState<T>>
init(_ unevaluated: () -> T) {
state = MutableBox(.Unevaluated(unevaluated))
}
init(evaluated: T) {
state = MutableBox(.Evaluated(Box(evaluated)))
}
public var value: T {
switch state.value {
case let .Evaluated(x): return x.value
case let .Unevaluated(f):
let value = f()
state.value = .Evaluated(Box(value))
return value
}
}
}
/// Private state for memoization.
private enum MemoState<T> {
case Evaluated(Box<T>)
case Unevaluated(() -> T)
}
MutableBox<T> is a counterpart to Box<T> whose value property is a variable, not a constant; this allows us to update its value. At the same time, the reference to this box is held in a constant, so we know that any copies will always share the same reference: once evaluated, always evaluated, for all copies.
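In full, it’s about as small as Box itself (a sketch):

final class MutableBox<T> {
    var value: T
    init(_ value: T) { self.value = value }
}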
Privately, we are still afforded the luxury of expressing the state with an enum; publicly, Memo still enjoys the semantics of a value type; Memo, while employing mutability as an implementation detail, does not have to bother its consumers with this; and finally, Memo has the API we wanted all along—a property for value instead of a method.
Swift’s decision not to surface effects via the type system results in a compromise. It is regrettably convenient for a developer to employ poor judgement in their treatment of value types. However, by the same token, we are enabled to hide unobservable effects—for example, the evaluation of a memoized type; equivalently, lazy properties; or the assignment of values during init()—and provide consumers of our API with better guarantees, behaviours, and clearer documentation of intent.
It’s on us to make sure that we do so.
Today, I made a mistake. I complained on Twitter.
My blood pressure spikes as I see a method pulling a property’s value into a local variable to avoid the block capturing self strongly.
— Rob Rix (@rob_rix) May 13, 2014
It’s obvious that Twitter is a poor venue for this subject. A series of 140c missives (minus @-mention bookkeeping) is really not the ideal medium to express which particular layer of frustration is drawing my angst at the moment. I did a terrible job of disambiguating things, too.
My point was easily missed due to these mistakes, so I’ll try to express it more clearly. Here’s the method in question, from ReactiveCocoa’s RACStream:
- (instancetype)map:(id (^)(id value))block {
NSCParameterAssert(block != nil);
Class class = self.class;
return [[self flattenMap:^(id value) {
return [class return:block(value)];
}] setNameWithFormat:@"[%@] -map:", self.name];
}
This is a perfectly acceptable method, and it does its job perfectly well. I’m not frustrated with it, or with its author, in the slightest. So what’s wrong? This is the offending line:
Class class = self.class;
By pulling the receiver’s class into a local variable, this method avoids an unnecessary capture of self by the block which follows. It’s not immediately clear to me whether that would constitute a reference cycle or not; even if it didn’t, this could be a good example of defensive programming, avoiding a change in assumptions silently and poisonously causing -map: to produce disconnected subgraphs.
The problem isn’t the code, therefore; it’s the fact that the code is necessary.
In the small, this is a complaint about the fact that even with ARC, it is all too easy to end up with reference cycles when using blocks; this is perhaps the greatest flaw of blocks (unpopular syntax notwithstanding). But that’s not really the problem either.
The problem is that this is just another example of how I am forced to conflate what with how.
The definition given is already quite declarative, being defined in terms of -flattenMap: and +return: rather than specifying e.g. some fold or, worse, a loop over the stream. But the necessity of mixing implementation details like reference cycles and lifetimes into it breaks the spell; we’re back to thinking about how this occurs, rather than what the definition is; what -map: means.
More ideal would be the freedom to ignore details like every variable being effectively a weak form of Maybe with nil (I’m looking at you, NSCParameterAssert) and like the reference-capturing semantics of blocks. Or like returning from a function. Or like these being instructions, at their core, rather than a definition at all—because this method has not been defined; it has only been implemented.
We don’t have the luxury of defining until we can cut away the irrelevant chaff and talk about the problem we care about, rather than the problems in the way of solving it.
I like solving problems, but this has been done; we are simply doing it wrong.
Inscribe the orbit of Neptune in a square.
Now, take a pair of integers as x and y coordinates across this square. Their size in bits determines the resolution at which they can measure this square.
An integer of n bits can hold any of 2ⁿ distinct values. 32-bit integers, therefore, would divide the square into a grid of 2³² points on a side.
At 32 bits of resolution, adjacent coordinates, e.g. …0101 and …0110, are about a kilometre apart on our square.
If we double the size of our integers, we now divide the square into a grid of 2⁶⁴ points on a side.
At 64 bits of resolution, still covering the entire span of the orbit of Neptune, adjacent coordinates are about 0.24µm apart, or about 1⁄400th the width of an average human hair.
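(For the arithmetic: taking the square to be roughly 4.5×10⁹ km across, 4.5×10¹² m ÷ 2³² ≈ 1 km, and 4.5×10¹² m ÷ 2⁶⁴ ≈ 0.24 µm.)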
And famously, populating a 128-bit address space would require us to boil the oceans.
I recently wrote a small library for working with Objective-C blocks. It’s quite limited thus far—it can apply an array of arguments to a block, and not much else. In order to do even that, it first tests that the array is long enough to apply; to do that, it gets the arity; and to do that, it parses (scans, really) the block’s signature string.
Not all blocks have signature strings—they were added to the ABI eons ago, but there are, hypothetically, still some pieces of software out there built with an old enough compiler that they won’t have it. But it’s easy to test for, and once you have it, you’re in for a treat.
Objective-C’s type encoding strings are a well-known feature of the language whereby the compiler and runtime conspire to provide you almost-but-not-quite enough information to perform some truly admirable hacks. They’re fairly ubiquitous in the runtime: variants of them are used by NSMethodSignature, methods, properties, @encode(), distributed objects (one of the more sophisticated uses of their encoding of type qualifiers), and, of course, blocks.
However, not all type signature strings are created equal. The documentation lists many simple types, and the compound types constructible with them (structs, unions, arrays). But, true to the documentation, a little experimentation shows that object types are always just @ (lacking information about classes, protocols, etc.):
(lldb) po @((char *)@encode(id))
@
The documentation doesn’t even mention blocks, which might lead you to believe that they aren’t encoded at all; but as is so often the case, a little experimentation leads to a lot of surprise:
(lldb) po @((char *)@encode(dispatch_block_t))
^{__block_literal_generic=^vii^v^{__block_descriptor}}
Going by the documentation, this means “a pointer to a __block_literal_generic structure whose members are void *, int, int, void *, and a pointer to a __block_descriptor structure whose contents are anonymous”—that’s the ABI we know and love! Still, this isn’t a very useful string from a metaprogramming perspective; it says how the memory representing the block is structured, but nothing about its return or argument types. And of course, this is all just a flat string—parsing is left as an exercise to the reader.
Even for simpler types, it doesn’t take much experimentation with the language’s forwarding mechanism—the means by which the delivery of an arbitrary message to an object can be overloaded to forward it on to some other object, interact with its arguments parametrically, or otherwise customize dispatch—to learn that NSMethodSignature strings aren’t quite as boring as @encode would make them seem. For example, here’s the -description of the method signature for one of Obstruct’s unit test methods:
(lldb) po [self methodSignatureForSelector:_cmd]
<NSMethodSignature: 0x100307e20>
number of arguments = 2
frame size = 224
is special struct return? NO
return value: -------- -------- -------- --------
type encoding (v) 'v'
flags {}
modifiers {}
frame {offset = 0, offset adjust = 0, size = 0, size adjust = 0}
memory {offset = 0, size = 0}
argument 0: -------- -------- -------- --------
type encoding (@) '@'
flags {isObject}
modifiers {}
frame {offset = 0, offset adjust = 0, size = 8, size adjust = 0}
memory {offset = 0, size = 8}
argument 1: -------- -------- -------- --------
type encoding (:) ':'
flags {}
modifiers {}
frame {offset = 8, offset adjust = 0, size = 8, size adjust = 0}
memory {offset = 0, size = 8}
Those “type encoding” lines are simple enough, but we see frame/memory offsets and sizes that aren’t in the documentation. Where do those come from? A little introspection with the runtime provides the answer:
(lldb) p (char *)method_getTypeEncoding(
(struct objc_method *)class_getInstanceMethod([self class], _cmd))
(char *) $25 = 0x00000001000e1f3a "v16@0:8"
Aha! I’m no expert, but those look like integers to me. But what do they mean? Apparently, not much these days—they’re a nod to the past, primarily, maintained for binary compatibility.
This is interesting enough, but void is the boringest thing to return. What does the type signature of a method returning an object look like?
(lldb) p (char *)method_getTypeEncoding(
(struct objc_method *)class_getInstanceMethod([self class], @selector(description)))
(char *) $26 = 0x00000001000ef3ad "@16@0:8"
True to the documentation, the return type is @. That’s a shame—the method returns NSString *, and it would be nice to be able to capture that fact at runtime.
As it turns out, properties have strings encoding their attributes, which include their types. It’s even documented, complete with useful examples (although sadly lacking object types). Given this property:
@property NSString *string;
We can inspect its attributes string like so:
(lldb) p (char *)property_getAttributes(
(struct objc_property *)class_getProperty([self class], "string"))
(char *) $69 = 0x00000001000e1e09 "T@"NSString",&,V_string"
Huh. The comma-delimited list of attributes includes the type, starting with T, and we can see that it’s encoded as not just @ but @"NSString"—it encodes the class name!
Still, that’s not quite the same as the types of the methods (although you can use it to infer those types). What if we wanted to add methods implementing this property to the runtime?
This is, in fact, one of the things that these type encoding strings are used for:
BOOL class_addMethod(Class cls, SEL name, IMP imp, const char *types)
Typically you might use e.g. method_getTypeEncoding to get the encoding of an existing method if you wanted to add or replace a method. But what if you want to parameterize it arbitrarily? Producing these strings by hand is doable, albeit error-prone, but that quickly becomes arduous when adding arbitrary methods. And while that is a somewhat obscure task to undertake, Mac OS X 10.5’s addition of +resolveClassMethod: and +resolveInstanceMethod: to the runtime makes it a reasonable one to consider: producing methods at runtime is a valuable metaprogramming tool (with apologies to Jon Sterling) to have at your disposal.
Further, following the addition of blocks in Mac OS X 10.6 & iOS 4, 10.7/4.3 added this little gem to the runtime:
IMP imp_implementationWithBlock(id block)
Even now, years later, seeing this makes my eyes light up—simultaneously with glee at the possibilities it affords, sorrow at the type system being unable to express it better than id, and some mix of alarm and exuberance at how this works in the runtime (thanks, bbum!). But how would you add a method with this if you didn’t want to specify the type string by hand?
You can’t ask the runtime—there is no public C API to introspect blocks for types, arity, or anything else, unfortunately. But the information is there, and an ABI is just another sort of API—and that’s where Obstruct comes in.
Taking a step back for a moment, if you wanted to add a -description method using a block, you would start like this:
NSString *(^block)(id) = ^(id self){ return @"🔥"; };
IMP implementation = imp_implementationWithBlock(block);
class_addMethod([self class], @selector(description), implementation,
Now, we need the types. We could ask the runtime for the types for -description, but if we were adding a method for some other selector we mightn’t have a prototype; so we could instead ask the block using Obstruct:
const char *types = obstr_block_get_signature((obstr_block_t)block);
class_addMethod([self class], @selector(description), implementation, types);
From now on, sending -description to instances of our class will invoke our block. And since it pulls that encoding of the block out at runtime (not compile-time as with @encode), we can apply this to any arbitrary block instance, even one typed as id, and retrieve the full type information that the ABI has available to it.
Going by method_getTypeEncoding, we’d expect that the signature string for the block would be something dull, like this:
@16@0:8
Breaking that down, we have:
@16 — the return type, an object that could be of any class, and its make-believe offset
@0 — an object which is the implicit self parameter, and its make-believe offset
:8 — a selector which is the implicit _cmd parameter, and its make-believe offset

Right away we know that something will be different. The docs for imp_implementationWithBlock say this:
The selector is not available as a parameter to this block.
And, again per bbum, we know we can expect that _cmd will be replaced with the implicit block parameter. But enough speculation; what does the string actually look like?
@"NSString"16@?0@8
Well now! Just like property attribute strings, block signature strings encode classes! Breaking that down, we have:
@"NSString"16 — the return type, an object of class NSString (!)@?0 — the implicit block parameter@8 — selfWhat else do block signature strings encode? Here are some examples:
protocols:
obstr_block_get_type_signature((obstr_block_t)^id<NSCopying>(id self){ return nil; });
// @"<NSCopying>"16@?0@8
classes and protocols combined:
obstr_block_get_type_signature((obstr_block_t)^NSString<NSCopying> *(id self){ return nil; });
// @"NSString<NSCopying>"16@?0@8
and blocks!
obstr_block_get_type_signature((obstr_block_t)^dispatch_block_t(id self){ return nil; });
// @?<v@?>16@?0@8
This is worth breaking down a little further. We can see that the return type is encoded as @?<v@?>. What does it mean?
@? — a block
< — whose
v — return type is void,
@? — taking the implicit block parameter
> — and no other arguments

That’s dispatch_block_t (void(^)(void)), alright. You’ll note that the encoding of this type doesn’t include offsets—perhaps the bincompat issues don’t apply, since it was added so long after those binaries could possibly have been created.
We now know how to read the type signatures produced by @encode for objects and block types, the attribute strings of properties, the type signatures of methods in the runtime, and now those of blocks. So now let’s practice with a complex example. I got this signature from a block while testing some improvements to Obstruct’s type signature parsing:
@?<@?<v@?>@?@"NSDictionary"@"NSString<NSCoding>"@?<v@?>>16
@?0
@?<@?<v@?>@?@"NSDictionary"@"NSString<NSCoding>"@?<v@?>>8
(Linebreaks added.) No surprise: it’s an unreadable mess. Let’s break it down and see if we can figure out its type.
This is a block for which:
@? — the return type is a block type
< — which
    @? — returns a block
    < — which
        v — returns void
        @? — and takes the implicit block parameter
    > — and
    @? — takes the block parameter implicitly
    @"NSDictionary" — an NSDictionary * as its first explicit parameter
    @"NSString<NSCoding>" — an NSString<NSCoding> * as its second
    @? — and a block as its third
    < — which
        v — returns void
        @? — and takes the implicit block parameter
    > — and
>16 — which
@?0 — takes the block parameter implicitly
@? — and a block as its first parameter
< — which
    @? — returns a block
    < — which
        v — returns void
        @? — and takes the implicit block parameter
    > — and
    @? — takes the block parameter implicitly
    @"NSDictionary" — an NSDictionary * as its first explicit parameter
    @"NSString<NSCoding>" — an NSString<NSCoding> * as its second
    @? — and a block as its third
    < — which
        v — returns void
        @? — and takes the implicit block parameter
    > — and
>8 — that is all

Whew! And sure enough, the block this was taken from is defined as:
^(dispatch_block_t(^_)(NSDictionary *, NSString<NSCoding> *, dispatch_block_t)){ return _; }
That’s the identity block for a higher-order block. It uses inference for the return type, but if we were to write it out in full, it would be like so:
typedef dispatch_block_t(^inner_block_t)(NSDictionary *, NSString<NSCoding> *, dispatch_block_t);
inner_block_t(^block)(inner_block_t) = …;
All that remains now is to parse it—an exercise left for the writer.
Here’s the first thing you probably see any time you look at your source files:
//
// APointlessWasteOfTime.h
// ProjectName
//
// Created by You on 4/15/14.
// Copyright (c) 2014 Somebody. All rights reserved.
//
Why do we do this? Because it’s the default: the templates have it.
But why? Because they had it last year. And the year before that. And the one before that.
Every time you open a source file, you have to scroll past this.
Every time you change the name of the file, class, or project, you either catch this or it’s out of date.
It duplicates the name of the file, the version control metadata, potentially even your LICENSE file. It makes a mess of your diffs. And maybe you think: hey, it’s 2015 now; should I update all my files’ comment headers…?
I have my own set of templates, because I don’t use the code style that Xcode’s templates use in any of the projects I work on. My header looks like this:
// Copyright (c) 2014 Somebody. All rights reserved.
I only do that because somebody on Twitter convinced me it’s a reasonable minimum for investigating copyright violations/licensing issues, otherwise I’d do without entirely.
Edit: apparently I can drop “All rights reserved”. h/t Kristopher Johnson.
Edit 2: joke’s on me:
//___COPYRIGHT___
Looks like Xcode is cargo-culting “All rights reserved” in there.
As I’ve previously discussed, I had at one point implemented the map function, RXMap, such that it took three parameters: the collection to map over, a collection to fold the results into, and the block to apply to each element.
This let you conveniently build a set from an array while mapping, or an array from a dictionary, or any sort of collection from any other sort, and if you passed nil for the second parameter it would simply make a new collection of the initial collection’s type (which is generally what people want). I was unhappy with this, however, since it conflated the map (producing some result by applying the block to each element) with a fold (producing a concrete collection from that abstract series of produced values) and was a little surprising if you were at all used to maps in other languages/frameworks that generally only take a collection and a block.
Maps and folds were (and are) done over anything implementing NSFastEnumeration, allowing you to conveniently map over anything you can do a for(in) loop over; it occurred to me, then, that I could take advantage of this to separate the two out: a map would return an object conforming to NSFastEnumeration, rather than a concrete collection; it can then be folded into any result, and additionally allows you to terminate the map early (since you can just stop enumerating at any time as with a break statement in a for(in) loop).
This started me thinking more about lazy evaluation, which is always a good thing, but it had some downsides: NSFastEnumeration is difficult to implement correctly, and particularly so when the resulting objects are temporaries (and you are explicitly not collecting the results into an array, after all). In fact, doing so without leaks is only possible when you document some caveats that the caller must obey, and these caveats preclude useful things like fetching each consecutive batch of objects on a serial queue due to the details of when autorelease pools get popped.
Perhaps more damningly, it’s particularly difficult to compose NSFastEnumeration, e.g. to produce objects based on the enumeration of some other object, as with a map, or worse, a convolution function (e.g. zip, zipWith).
This led me to decide that NSFastEnumeration is not the correct interface for these tasks; it’s convenient to be able to do a loop over a map:
for (id x in RXMap(y, ^(id each){ return …; }))
…but not at the cost of a huge implementation and maintenance burden and several caveats that make this phrasing of lazy evaluation impossible to use for common idioms like retrieving chunks of data from a socket.
Therefore I set about designing a better interface for lazy evaluation, with several goals in mind:
It should be efficient to traverse the results; maximizing locality of reference was deemed best for this purpose, so, as with NSFastEnumeration, it was decided to produce and traverse the results in batches.
As with NSFastEnumeration, an array or other object with an interior C array of objects should be able to just return a pointer to that array and its cardinality and be finished.
It should be safe to produce temporaries, which means strong references rather than the __unsafe_unretained-pointer-in-a-struct approach NSFastEnumeration uses.
Corollary to which, it should be safe to use regardless of the state of the surrounding autorelease pools: you can call it in the rain, you can call it on a train, you can call it on a queue, won’t you call it, Sam-are-you?
It should be straightforward for a caller to use; minimal setup required if any, fire and forget, etc.
It should be straightforward to implement; sane defaults, flexibility via specialization, difficult to go wrong, no undocumented gotchas like “make sure to set state->mutationsPtr to a valid reference because for(in) will absolutely dereference it without checking even if you’re an immutable collection, herp derp.”
It should be straightforward to compose; a map should not be twenty lines of code for the key method!
It should be suitable for sockets, maps, folds, parsers, generators, filters, convolution, and so forth; anything idiomatically producing or consuming a series of objects in some sequence.
It should allow you to decouple the traversal of the objects in a collection from the collection’s structure, as with NSArray’s -reverseObjectEnumerator; consider a depth-first, pre-order traversal of a tree, for example.
Two naïve options occurred to me:
something like NSEnumerator, returning one object at a time. This is trivial to implement, but fails the efficiency test.
something like NSFastEnumerationState, but reified as a class with a proper, strong id context property. This is straightforward to implement, but still difficult (or at least obnoxious) to compose; the 20 lines of code for a map are reduced to (perhaps) 12 or 13, which is still a lot of noise.
This provided an opportunity to think about this problem in the abstract. Assuming we want a batched implementation for the sake of efficiency of traversal (and which also harmonizes nicely with the buffering one desires of a socket or file handle reader), we can describe a traversal as follows: while the traversal is not yet complete, check whether the current batch is exhausted; if it is, refill it; otherwise, consume the next object from the batch.
This provided the insight that the ideal interface for this abstracts all of that except for refilling the batch—producing the objects—away from the implementor and the caller. The batch knows if it is exhausted, and can go refill itself. The resulting RXTraversal class is trivial to use:
a simple collection like RXTuple can return a traversal with an interior pointer and count and a strong reference to itself.
a more complex traversal like a map can return a traversal parameterized with a source; the traversal calls out to the source when it needs to be refilled, and the source refills it by consuming objects from the traversal it is itself parameterized with, and returning a boolean flag saying whether or not there might be more things to produce.
refilling the traversal is done by a block called by the traversal; the traversal doesn’t know or care how full the traversal is, or about refilling its parameter traversal.
All of that is a little opaque in the abstract. A better exposition is a diff, showing the implementation with NSFastEnumeration vs. RXTraversal:
https://github.com/robrix/RXCollections/commit/2c330875c0d83c2bc43b2dc2f92859c9e27ffab5
This is a particularly evocative example because of the effort that was required to maintain the set of source enumeration states that the convolution is using. All of that code is completely unnecessary when all you need to care about is what objects go in what order! In short, you’re writing the code you want to write, which is explicitly my goal.
And the best part? Traversals conform to NSFastEnumeration (which was trivial to implement!), and can be made with id objects.
The framework as a whole has now been migrated to this new model; the API may be refined to cover new uses, but is considered basically stable.
There are some open questions still:
Finally, there’s more I want to do with RXCollections in terms of usage of it:
My RXCollections framework has been getting some attention lately, what with mentions by both Wolf Rentzsch and Michael Tsai, as well as my own incessant prattling on the subject1, 2.
At the same time, it’s been undergoing some major shifts. Since Wolf and Michael’s posts, the old RXMap that took a second argument as the collection to fold the mapped results into is gone. In its place is an RXMap which returns an object conforming to NSFastEnumeration instead of an instance of a collection class. RXMap is dead; long live RXMap!
This change was made to isolate and clarify the responsibilities present: applying the block to each object in the collection, and collecting the results into some concrete data structure.
The former is straightforward: the block passed to RXMap is evaluated once per object in the collection, as with any map.
The latter requires a little more exposition. When you implement a map-style loop by hand, it typically goes something like this:
NSMutableArray *secretAgenda = [NSMutableArray new];
for (Conspirator *conspirator in conspirators) {
[secretAgenda addObject:[conspirator sinisterPlots]];
}
At the end of this loop, we have an array containing all of the conspirators’ sinister plots in the order in which the conspirators are enumerated using NSFastEnumeration. But if conspirators were a set, we would likely want to build up a set instead of an array; the array’s semantics include preservation of ordering, which is surplus to the requirements of the original collection being enumerated.
RXMap was originally designed to take an object which the results would be collected into; if it were nil, it would instead ask the collection (these objects conformed to an RXCollection protocol) for an empty mutable collection that it could use instead.
Given the collection we’re filing into, destination, and the object in the collection currently being operated upon, element, the logic is something like this:
[destination appendObject:element];
Mutable state! (-appendObject: would probably be a synonym for -addObject: on both NSMutableArray and NSMutableSet). Well that’s alright, we could rephrase it thusly:
return [destination collectionByAppendingObject:element];
There, no more mutable state. (NSMutableArray would probably use -arrayByAddingObject: here.) But what’s being done with the returned collection? It would have to be used with the next element of the original collection. This is sounding suspiciously like a fold! Sure enough, you can implement a map function like this:
NSArray *array = RXFold(collection,
[NSArray new],
^(id accumulator, id each){
return [accumulator arrayByAddingObject:each];
});
(You could also use an NSMutableArray as the second argument, call -addObject: with each object, and return the accumulator; it would be interesting to benchmark these approaches against one another.)
This is a perfectly serviceable map, but you still don’t have a way to use NSSet instead of NSArray without either writing a new fold or parameterizing that one by the class you’re accumulating into.
An alternative solution, which I opted for in RXCollections, is to instead return an enumeration. (Python aficionados will probably recognize this as an iterator or even a generator.) This is a more complex endeavour than the above-described fold, given the frustrating complexity of the NSFastEnumeration protocol, but it is not without its benefits:
Because NSFastEnumeration does not require all the objects to be returned at once, there’s technically no reason for them to even exist until they’re requested. The results are calculated and provided on-demand; if you never request them, they’re never built. (NB: If you rely on your map having side effects, you’ll want to move those into a for(in) loop over the enumeration instead, as they won’t be evaluated unless the enumeration is traversed!)
Because NSFastEnumeration specifies an interface for doing a single pass over a collection, there’s no need for the objects produced by the mapping block to be kept around any longer than is necessary for them to be used by the caller. You can keep them around by folding them into a collection, of course; but if you don’t need them, they won’t take up space.
These two benefits can be summed up as lazy evaluation and streaming. Enumerating a long chain of maps with a for(in) loop will only take as much CPU time and memory as is required for each stage to complete, and the results will be processed as soon as they’re available, rather than waiting for each stage to complete before going on to the next. All this without threads or asynchrony!
There are some important caveats here:
NSFastEnumeration is potentially quite dangerous in the wrong hands. In particular, it does not provide a safe mechanism for you to return temporary objects directly, and it does not provide any guarantee that you will be called when the enumeration has completed; therefore if you generate temporaries and store them in an instance variable, you will not necessarily be able to clean them up! As testament I offer the documentation for RXEnumerationTraversal, the NSFastEnumeration-conformant class behind RXMap; note that it specifically forbids you from calling it like this:
NSAutoreleasePool *pool = [NSAutoreleasePool new];
for (id element in enumeration) {
    [pool drain];
    pool = [NSAutoreleasePool new];
}
Or like this:
NSUInteger count = 0;
@autoreleasepool {
    count = [enumeration countByEnumeratingWithState:&state
                                             objects:buffer
                                               count:bufferCount];
    // … do something with the objects in state …
}
@autoreleasepool {
    count = [enumeration countByEnumeratingWithState:&state
                                             objects:buffer
                                               count:bufferCount];
    // … do something with the next batch of objects in state …
}
Why not? Because it stores references to objects in the state struct by casting the itemsPtr to __autoreleasing id *. If you pop the autorelease pool between calls to -countByEnumeratingWithState:objects:count:, it will die in a messy explosion of pointer dereferences and EXC_BAD_ACCESS. This is why we can’t have nice things, Cocoa.
Still and all, the former construct is illegal under ARC and the latter is trivial to avoid; notably, for(in) abides by these restrictions. This is not such a high price to pay to be able to build lazily-evaluated streams of objects and maps over them.
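In other words, the safe shape is a single pool wrapped around the entire traversal:

// One autorelease pool for the whole enumeration: any temporaries the
// enumeration vends survive until every batch has been consumed.
@autoreleasepool {
    for (id element in enumeration) {
        NSLog(@"%@", element);
    }
}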
As mentioned before, the implementation is not as direct as the fold we like to think in terms of. Why is that?
- NSFastEnumeration has you return your objects via a pointer in the passed-in NSFastEnumerationState struct, and ARC forbids object pointers in structs unless they are marked __unsafe_unretained; this causes some syntactic wrinkles at best. (Turning off ARC for this implementation is an option, but not one I was interested in pursuing.)
- The caller can abandon the enumeration at any time, e.g. by break;ing out of a for(in) loop—you have to save other objects into the state if you want to return temporaries.
- An implementation of -countByEnumeratingWithState:objects:count: that in turn enumerates some other object(s) is going to be further complicated—you have to store the NSFastEnumerationState you pass to the delegate enumeration, as well as the resulting objects.
- for(in) requires the mutations pointer be set to a valid pointer to an unsigned long, since it dereferences it without checking that it isn’t NULL.
- Implementations of NSFastEnumeration are expected to return batches of objects (thus, NSEnumerator’s one-at-a-time implementation is not ideal!), which means that a chaining implementation has to account for the possibility that the delegate will return fewer or more objects than it itself wants to. This amounts to blocking and buffering, which can be complex to think about and is full of pitfalls.

For this reason, having a framework like RXCollections which provides abstractions above these complexities can be quite valuable if you want to use lazy evaluation in your own systems. RXGenerator can be used to calculate values on demand, and can be used such that it does not mutate state; RXMap is a great way to enumerate collections or generated values and take some action on each; and RXFold lets you collapse the lazily evaluated objects into a single concrete object.
The more esoteric constructs like RXTraversalArray (a lazy NSArray subclass) and RXIntervalTraversal (a straightforward traversal of a closed interval of real numbers backwards or forwards by a given stride or count; sort of like Ruby’s 1.upto(10) or Python’s range(10), but also great for e.g. calculating integrals) may also be interesting, even if only as exposition of what an implementation of NSFastEnumeration can accomplish, or how it can be used.
Despite the above warnings and alternatives, NSFastEnumeration is still worth getting to know a little better. Try implementing a fast enumeration of a sequence of coin flips. Did you generate them and cache them in an array, and then enumerate that? Can you generate them temporarily one by one or in batches and enumerate those? Try implementing a fast enumeration over a tree. Did you do it breadth-first? Depth-first? As this implies, there are both mechanical and algorithmic problems to be solved here, and both are interesting. Here’s my coin toss enumeration:
@implementation CoinToss

- (NSUInteger)
countByEnumeratingWithState:(NSFastEnumerationState *)state
                    objects:(__unsafe_unretained id [])buffer
                      count:(NSUInteger)len {
    state->mutationsPtr = state->extra; // we never mutate; any valid pointer will do
    state->itemsPtr = buffer;
    buffer[0] = @1; // rigged: always heads
    return 1; // one toss per batch, forever
}

@end
(Terrible line breaks brought to you by my desire to stay within a fixed column width.) This coin toss might be rigged! And it never finishes! And that memory management is sketchy at best—if it works, it’s only because NSNumber is a tagged pointer. Maybe it should limit the number of tosses:
state->state++;
return (state->state > 10) ? 0 : 1; // stop after ten tosses
And autorelease its temporaries, fixing the problems with its memory management:
__autoreleasing id *temporaries =
    (__autoreleasing id *)(void *)buffer;
*temporaries = @1;
That cast is ugly, but at least the assignment is straightforward.
And maybe it should return batches:
NSUInteger initial = state->state;
for (;
     state->state < MIN(initial + len, max); // max is the toss limit (the 10 from above)
     state->state++) {
    *temporaries++ = @1;
}
return state->state - initial;
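Assembled, the whole method might read like this (a sketch; max stands in for whatever limit you choose):

static const NSUInteger max = 10; // hypothetical toss limit

- (NSUInteger)
countByEnumeratingWithState:(NSFastEnumerationState *)state
                    objects:(__unsafe_unretained id [])buffer
                      count:(NSUInteger)len {
    state->mutationsPtr = state->extra; // never mutated; any valid pointer will do
    state->itemsPtr = buffer;
    __autoreleasing id *temporaries = (__autoreleasing id *)(void *)buffer;
    NSUInteger initial = state->state;
    for (; state->state < MIN(initial + len, max); state->state++) {
        *temporaries++ = @1; // still rigged
    }
    return state->state - initial; // 0 once max is reached, ending the enumeration
}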
I’ll leave the rigged coin toss in, though. As others have observed, you never really know, with randomness.
What does yours look like?
On my short list—a tremendous misnomer for the vague probability cloud of hopes and intentions I drift through day by day—is a similar set of nouns and verbs for implementing NSFastEnumeration. Abstracting the common task of storing contextual information or autoreleased temporaries into the state struct would be a good start.
- Adapting streams to NSFastEnumeration brings some complications, not least of which is the difference between (some formulations of) a stream’s push semantics and NSFastEnumeration’s pull semantics—i.e. a dispatch_io_t will push data to you as it becomes available, but you have to explicitly call -countByEnumeratingWithState:objects:count: to get data out of an enumeration, which comes back to blocking and buffering—but it seems like that would be a good component in this style of lazy evaluation.
- If signals implemented -countBy…, then you could use the same objects and blocks both as signals and enumerations.

There’s a lot of potential here beyond my immediate interests of implementing convolution (aka zip/zipWith) and the like (sketched below). What would you do with these tools?
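For instance, a convolution might read something like this. RXZipWith here is entirely hypothetical, a function I’m imagining rather than anything RXCollections provides:

// Hypothetical RXZipWith: traverses two enumerations in lockstep, combining
// elements pairwise with the block, lazily, until either side is exhausted.
// `xs` and `ys` are any enumerations of NSNumbers.
id<NSFastEnumeration> sums = RXZipWith(xs, ys, ^(id x, id y) {
    return @([x integerValue] + [y integerValue]);
});
for (NSNumber *sum in sums) {
    NSLog(@"%@", sum);
}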
Graphs, by nature, can be recursive, which means that you can’t build a graph immutably with just initializers:
Foo *bar = [Foo fooWithFoo:quux];
Foo *quux = [Foo fooWithFoo:bar];
Obviously that won’t work—quux isn’t defined when bar is initialized. But if these are value objects, we don’t really want them to be mutable. What do we do?
Well, I don’t know, but here’s what we don’t do:
Foo *bar = [Foo alloc];
Foo *quux = [[Foo alloc] initWithFoo:bar];
[bar initWithFoo:quux];
Just because immutable objects aren’t really, finally immutable in idiomatic Objective-C doesn’t mean you should indulge in this sort of ugliness, even if you can expect it to work (assuming +alloc and -initWithFoo: return the same pointer, and that -initWithFoo: doesn’t do anything requiring an initialized Foo).
Thanks to Doug Russell.