Intersections |
Intersections@rob_rix, mostly on code. |
As I’ve previously discussed, I had at one point implemented the map function, RXMap, such that it took three parameters:
This let you conveniently build a set from an array while mapping, or an array from a dictionary, or any sort of collection from any other sort, and if you passed nil for the second parameter it would simply make a new collection of the initial collection’s type (which is generally what people want). I was unhappy with this, however, since it conflated the map (producing some result by applying the block to each element) with a fold (producing a concrete collection from that abstract series of produced values) and was a little surprising if you were at all used to maps in other languages/frameworks that generally only take a collection and a block.
Maps and folds were (and are) done over anything implementing NSFastEnumeration, allowing you to conveniently map over anything you can do a for(in) loop over; it occurred to me, then, that I could take advantage of this to separate the two out: a map would return an object conforming to NSFastEnumeration, rather than a concrete collection; it can then be folded into any result, and additionally allows you to terminate the map early (since you can just stop enumerating at any time as with a break statement in a for(in) loop).
This started me thinking more about lazy evaluation, which is always a good thing, but it had some downsides: NSFastEnumeration is difficult to implement correctly, and particularly so when the resulting objects are temporaries (and you are explicitly not collecting the results into an array, after all). In fact, doing so without leaks is only possible when you document some caveats that the caller must obey, and these caveats preclude useful things like fetching each consecutive batch of objects on a serial queue due to the details of when autorelease pools get popped.
Perhaps more damningly, it’s particularly difficult to compose NSFastEnumeration, e.g. to produce objects based on the enumeration of some other object, as with a map, or worse, a convolution function (e.g. zip, zipWith).
This led me to decide that NSFastEnumeration is not the correct interface for these tasks; it’s convenient to be able to do a loop over a map:
for (id x in RXMap(y, ^(id each){ return …; }))
…but not at the cost of a huge implementation and maintenance burden and several caveats that make this phrasing of lazy evaluation impossible to use for common idioms like retrieving chunks of data from a socket.
Therefore I set about designing a better interface for lazy evaluation, with several goals in mind:
it should be efficient to traverse the results; maximizing locality of reference was deemed best for this purpose, so, as with NSFastEnumeration, it was decided to produce and traverse the results in batches.
As with NSFastEnumeration, an array or other object with an interior C array of objects should be able to just return a pointer to that array and its cardinality and be finished.
it should be safe to produce temporaries, which means strong references rather than the __unsafe_unretained-pointer-in-a-struct approach NSFastEnumeration uses.
Corollary to which, it should be safe to use regardless of the state of the surrounding autorelease pools: you can call it in the rain, you can call it on a train, you can call it on a queue, won’t you call it, Sam-are-you?
it should be straightforward for a caller to use; minimal setup required if any, fire and forget, etc.
it should be straightforward to implement; sane defaults, flexibility via specialization, difficult to go wrong, no undocumented gotchas like “make sure to set state->mutationsPtr to a valid reference because for(in) will absolutely dereference it without checking even if you’re an immutable collection, herp derp.”
It should be straightforward to compose; a map should not be twenty lines of code for the key method!
it should be suitable for sockets, maps, folds, parsers, generators, filters, convolution, and so forth; anything idiomatically producing or consuming a series of objects in some sequence.
it should allow you to decouple the traversal of the objects in a collection from the collection’s structure, as with NSArray’s -reverseObjectEnumerator; consider a depth-first, pre-order traversal of a tree, for example.
Two naïve options occurred:
something like NSEnumerator, returning one object at a time. This is trivial to implement, but fails the efficiency test.
something like NSFastEnumerationState, but reified as a class with a proper, strong id context property. This is straightforward to implement, but still difficult (or at least obnoxious) to compose; the 20 lines of code for a map are reduced to (perhaps) 12 or 13, which is still a lot of noise.
This provided an opportunity to think about this problem in the abstract. Assuming we want a batched implementation for the sake of efficiency of traversal (and which also harmonizes nicely with the buffering one desires of a socket or file handle reader), we can describe a traversal as follows:
This provided the insight that the ideal interface for this abstracts all of that except for refilling the batch—producing the objects—away from the implementor and the caller. The batch knows if it is exhausted, and can go refill itself. The resulting RXTraversal class is trivial to use:
a simple collection like RXTuple can return a traversal with an interior pointer and count and a strong reference to itself.
a more complex traversal like a map can return a traversal parameterized with a source; the traversal calls out to the source when it needs to be refilled, and the source refills it by consuming objects from the traversal it is itself parameterized with, and returning a boolean flag saying whether or not there might be more things to produce.
refilling the traversal is done by a block called by the traversal; the traversal doesn’t know or care how full the traversal is, or about refilling its parameter traversal.
All of that is a little opaque in the abstract. A better exposition is a diff, showing the implementation with NSFastEnumeration vs. RXTraversal:
https://github.com/robrix/RXCollections/commit/2c330875c0d83c2bc43b2dc2f92859c9e27ffab5
This is a particularly evocative example because of the effort that was required to maintain the set of source enumeration states that the convolution is using. All of that code is completely unnecessary when all you need to care about is what objects go in what order! In short, you’re writing the code you want to write, which is explicitly my goal.
And the best part? Traversals conform to NSFastEnumeration (which was trivial to implement!), and can be made with id objects.
The framework as a whole has now been migrated to this new model; the API may be refined to cover new uses, but is considered basically stable.
There are some open questions still:
Finally, there’s more I want to do with RXCollections in terms of usage of it: