Intersections |
Intersections@rob_rix, mostly on code. |
My RXCollections framework has been getting some attention lately, what with mentions by both Wolf Rentzsch and Michael Tsai, as well as my own incessant prattling on the subject1, 2.
At the same time, it’s been undergoing some major shifts. Since Wolf and Michael’s posts, the old RXMap that took a second argument as the collection to fold the mapped results into is gone. In its place is an RXMap which returns an object conforming to NSFastEnumeration instead of an instance of a collection class. RXMap is dead; long live RXMap!
This change was made to isolate and clarify the responsibilities present:
The former is straightforward: the block passed to RXMap is evaluated once per object in the collection, as with any map.
The latter requires a little more exposition. When you implement a map-style loop by hand, it typically goes something like this:
NSMutableArray *secretAgenda = [NSMutableArray new];
for (Conspirator *conspirator in conspirators) {
[secretAgenda addObject:[conspirator sinisterPlots]];
}
At the end of this loop, we have an array containing all of the conspirators’ sinister plots in the order in which the conspirators are enumerated using NSFastEnumeration. But if conspirators were a set, we would likely want to build up a set instead of an array; the array’s semantics include preservation of ordering, which is surplus to the requirements of the original collection being enumerated.
RXMap was originally designed to take an object which the results would be collected into; if it were nil, it would instead ask the collection (these objects conformed to an RXCollection protocol) for an empty mutable collection that it could use instead.
Given the collection we’re filing into, destination, and the object in the collection currently being operated upon, element, the logic is something like this:
[destination appendObject:element];
Mutable state! (-appendObject: would probably be a synonym for -addObject: on both NSMutableArray and NSMutableSet). Well that’s alright, we could rephrase it thusly:
return [destination collectionByAppendingObject:element];
There, no more mutable state. (NSMutableArray would probably use -arrayByAddingObject: here.) But what’s being done with the returned collection? It would have to be used with the next element of the original collection. This is sounding suspiciously like a fold! Sure enough, you can implement a map function like this:
NSArray *array = RXFold(collection,
[NSArray new],
^(id accumulator, id each){
return [accumulator arrayByAddingObject:each];
});
(You could also use an NSMutableArray as the second argument, call -addObject: with each object, and return the accumulator; it would be interesting to benchmark these approaches against one another.)
This is a perfectly serviceable map, but you still don’t have a way to use NSSet instead of NSArray without either writing a new fold or parameterizing that one by the class you’re accumulating into.
An alternative solution, which I opted for in RXCollections, is to instead return an enumeration. (Python aficionados will probably recognize this as an iterator or even a generator.) This is a more complex endeavour than the above-described fold, given the frustrating complexity of the NSFastEnumeration protocol, but it is not without its benefits:
NSFastEnumeration does not require all the objects to be returned at once, there’s technically no reason for them to even exist until they’re requested. The results are calculated and provided on-demand; if you never request them, they’re never built. (NB: If you rely on your map having side effects, you’ll want to move those into a for(in) loop over the enumeration instead, as they won’t be evaluated unless the enumeration is traversed!)NSFastEnumeration specifies an interface for doing a single pass over a collection, there’s no need for the objects produced by the mapping block to be kept around any longer than is necessary for them to be used by the caller. You can keep them around by folding them into a collection, of course; but if you don’t need them, they won’t take up space.The latter two benefits can be summed up as lazy evaluation and streaming. Enumerating a long chain of maps with a for(in) loop will only take as much CPU time and memory as is required for each stage to complete, and the results will be processed as soon as they’re available, rather than waiting for each stage to complete before going on to the next. All this without threads or asynchrony!
There are some important caveats, here:
NSFastEnumeration is potentially quite dangerous in the wrong hands. In particular, it does not provide a safe mechanism for you to return temporary objects directly, and it does not provide any guarantee that you will be called when the enumeration has completed; therefore if you generate temporaries and store them in an instance variable, you will not necessarily be able to clean them up! As testament I offer the documentation for RXEnumerationTraversal, the NSFastEnumeration-conformant class behind RXMap and RXMap; note that it specifically forbids you from calling it like this:
NSAutoreleasePool *pool = [NSAutoreleasePool new];
for (id element in enumeration) {
[pool drain];
pool = [NSAutoreleasePool new];
}
Or like this:
NSUInteger count = 0;
@autoreleasepool {
count = [enumeration countByEnumeratingWithState:&state
objects:buffer
count:bufferCount];
… do something with the objects in state …
}
@autoreleasepool {
count = [enumeration countByEnumeratingWithState:&state
objects:buffer
count:bufferCount];
… do something with the next batch of objects in state …
}
Why not? Because it stores references to objects in the state struct by casting the itemsPtr to __autoreleasing id *. If you pop the autorelease pool between calls to -countByEnumeratingWithState:objects:count:, it will die in a messy explosion of pointer dereferences and EXC_BAD_ACCESS. This is why we can’t have nice things, Cocoa.
Still and all, the former construct is illegal under ARC and the latter is trivial to avoid; notably, for(in) abides by these restrictions. These is not such a high price to pay to be able to build lazily-evaluated streams of objects and maps over them.
As mentioned before, the implementation is not as direct as the fold we like to think in terms of. Why is that?
NSFastEnumeration has you return your objects via a pointer in the passed-in NSFastEnumerationState struct. ARC forbids object pointers in structs unless they are marked __unsafe_unretained; this causes some syntactic wrinkles at best. (Turning off ARC for this implementation is an option, but not one I was interested in pursuing.)break; in a for(in) loop—you have to save other objects into the state if you want to return temporaries.-countByEnumeratingWithState:objects:count: that in turn enumerates some other object(s) is going to be further complicated—you have to store the NSFastEnumerationState you pass to the delegate enumeration, as well as the resulting objects.for(in) requires the mutations pointer be set to a valid pointer to an unsigned long, since it dereferences it without checking that it isn’t NULL.NSFastEnumeration return batches of objects (thus, NSEnumerator’s one-at-a-time implementation is not ideal!), which means that a chaining implementation has to account for the possibility that the delegate will return fewer or more objects than it itself wants to. This amounts to blocking and buffering, which can be complex to think about and is full of pitfalls.For this reason, having a framework like RXCollections which provides abstractions above these complexities can be quite valuable if you’re wanting to use lazy evaluation in your own systems. RXGenerator can be used to calculate values on demand, and can be used such that it does not mutate state; RXMap is a great way to enumerate collections or generated values and take some action on each; and RXFold lets you collapse the lazily evaluated objects into a single concrete object.
The more esoteric constructs like RXTraversalArray (a lazy NSArray subclass) and RXIntervalTraversal (a straightforward traversal of a closed interval of real numbers backwards or forwards by a given stride or count; sort of like Ruby’s 1.upto(10) or Python’s range(10), but also great for e.g. calculating integrals) may also be interesting, even if only as exposition of what an implementation of NSFastEnumeration can accomplish, or how it can be used.
Despite the above warnings and alternatives, NSFastEnumeration is still worth getting to know a little better. Try implementing a fast enumeration of a sequence of coin flips. Did you generate them and cache them in an array, and then enumerate that? Can you generate them temporarily one by one or in batches and enumerate those? Try implementing a fast enumeration over a tree. Did you do it breadth-first? Depth-first? As this implies, there are both mechanical and algorithmic problems to be solved here, and both are interesting. Here’s my coin toss enumeration:
@implementation CoinToss
-(NSUInteger)
countByEnumeratingWithState:(NSFastEnumerationState *)state
objects:(__unsafe_unretained id [])buffer
count:(NSUInteger)len {
state->mutationsPtr = state->extra;
state->itemsPtr = buffer;
buffer[0] = @1;
return 1;
}
@end
(Terrible line breaks brought to you by my desire to stay within a fixed column width.) This coin toss might be rigged! And it never finishes! And that memory management is sketchy at best—if it works, it’s only because NSNumber is a tagged pointer. Maybe it should limit the number of tosses:
state->state++;
return (state->state++ > 10)? 0 : 1;
And autorelease its temporaries, fixing the problems with its memory management:
__autoreleasing id *temporaries =
(__autoreleasing id *)(void *)buffer;
*temporaries = @1;
That cast is ugly, but at least the assignment is straightforward.
And maybe it should return batches:
NSUInteger initial = state->state;
for (;
state->state < MIN(initial + len, max - initial);
state->state++) {
*temporaries++ = @1;
}
return state->state - initial;
I’ll leave the rigged coin toss in, though. As others have observed, you never really know, with randomness.
What does yours look like?
On my short list—a tremendous misnomer for the vague probability cloud of hopes and intentions I drift through day by day—is a similar set of nouns and verbs for implementing NSFastEnumeration. Abstracting the common task of storing contextual information or autoreleased temporaries into the state struct would be a good start.
NSFastEnumeration, not least of which is the difference between (some formulations of) a stream’s push semantics and NSFastEnumeration’s pull semantics—i.e. a dispatch_io_t will push data to you as it becomes available, but you have to explicitly call -countByEnumeratingWithState:objects:count: to get data out of the enumeration, which comes back to blocking and buffering—but it seems like that would be a good component in this style of lazy evaluation.-countBy…, then you could use the same objects and blocks both as signals and enumerations.There’s a lot of potential here beyond my immediate interests of implementing convolution (aka zip/zipWith) and the like. What would you do with these tools?