PHP 5.5 Generators
Among the new features that will be introduced in PHP 5.5, the probably most exiting one is the concept of generators.
What are Generators?
Lets first look at what Wikipedia has to say about generators:
In computer science, a generator is a special routine that can be used to control the iteration behaviour of a loop.
and
A generator is very similar to a function that returns an array, in that a generator has parameters, can be called, and generates a sequence of values.
So basically a generator is a special function that creates a sequence of values and is used in loops. What then is the difference to a regular function returning an array?
There are cases where you cannot generate all values at once (think of infinite sequences or memory limitations) and this is where generators come in very handy as they produce the values on demand.
Generators in other languages
The concept of generators is not new. Implementations have long existed in other languages such as Ruby (called enumerators), Python, JavaScript and C#.
def fib():
last = 0
current = 1
yield 1
while True:
current = last + current
last = current - last
yield current
for i in fib():
print(i)
Generators first appeared back in 1975 in a language called CLU which was created at MIT. CLU also introduced other features such as multiple return values and multiple assignment that influenced modern languages like Ruby, Python and Go.
PHP
First lets start off by converting above Python script into its PHP equivalent:
function fib() {
$last = 0;
$current = 1;
yield 1;
while (true) {
$current = $last + $current;
$last = $current - $last;
yield $current;
}
}
foreach (fib() as $number) {
echo $number . "\n";
}
The fib()
function implements a generator that produces the fibonacci numbers one by one. For each computed value it uses the yield
keyword with the value as parameter. PHP 5.5 automatically treats every function that contains a yield
call as a generator.
Generators are implemented as a subclass of the Iterator
class.
final class Generator implements Iterator {
void rewind();
bool valid();
mixed current();
mixed key();
void next();
mixed send(mixed $value);
}
The initial call to a generator function will return an object of the Generator class. This can easily be tested:
function gen() {
yield 'hello';
}
$gen = gen();
echo get_class($gen);
// => "Generator"
Iterators have been available in PHP for quite some time and implementing Generators in the same manner, makes them automatically usable in places where you earlier would have used an Iterator. So we now know how a Generator is implemented, but what about that yield
keyword?
Yield
The yield
keyword behaves much like return
. It passes a value back to the caller of the function, but instead of completely removing the function from the stack, its state is saved for re-entry. Whenever the generator function is called again (in this case a call to next()
on the Generator object - remember, it implements the Iterator interface) the execution is passed back to the point of the last yield
call.
Yielding Keys
In PHP, unlike other languages, Iterators always consist of a key and a value. Yield has support for yielding keys using a similar syntax to that used in foreach loops and associative arrays: yield $key => $value;
. Internally generators need to generate key is none way explicitly yielded. The behaviour of arrays was adapted here: By default the keys start with the integer value 0 and auto-increment by one. If an explicitly yielded key is larger than the current key, it is used as the new starting-point for auto-generated keys.
function gen() {
yield 'a';
yield 'b';
yield 'key' => 'c';
yield 'd';
yield 10 => 'e';
yield 'f';
}
foreach (gen() as $key => $value) {
echo $key, ' => ', $value, "\n";
}
// outputs:
0 => a
1 => b
key => c
2 => d
10 => e
11 => f
Other keys, like strings, do not affect the mechanism of auto-incremented keys.
Benchmark
To test the performance of generators against other common methods of iteration Nikita Popov wrote a micro benchmark that tested the execution time of a simple loop with one hundred, ten thousand and one million iterations. The methods tested are a iterator implementation, the range function, a self-generated array and generators (here xrange).
His initial tests showed that the generators consistently outperform all other methods by at least factor 1.5 given a high number of iterations (>= 10000).
I ran his benchmark on my Thinkpad x61s with a self-compiled PHP 5.5 Alpha 1 on Ubuntu 12.04 and used memory_get_peak_usage()
to get the maximum amount of memory used for each test. The tests were separated into multiple files to ensure that the memory usage really comes from a single method.
xrange (100) took 0.00011491775512695 seconds.
xrange (100) used 246080 bytes of memory.
xrange (10000) took 0.0027370452880859 seconds.
xrange (10000) used 246080 bytes of memory.
xrange (1000000) took 0.24474191665649 seconds.
xrange (1000000) used 246080 bytes of memory.
range (100) took 0.006105899810791 seconds.
range (100) used 250840 bytes of memory.
range (10000) took 0.0058670043945312 seconds.
range (10000) used 1727528 bytes of memory.
range (1000000) took 0.50974011421204 seconds.
range (1000000) used 144624904 bytes of memory.
urange (100) took 0.00012612342834473 seconds.
urange (100) used 252208 bytes of memory.
urange (10000) took 0.0051400661468506 seconds.
urange (10000) used 1728808 bytes of memory.
urange (1000000) took 0.6002779006958 seconds.
urange (1000000) used 144626280 bytes of memory.
RangeIterator (100) took 0.00048303604125977 seconds.
RangeIterator (100) used 249160 bytes of memory.
RangeIterator (10000) took 0.0093441009521484 seconds.
RangeIterator (10000) used 249160 bytes of memory.
RangeIterator (1000000) took 0.90639901161194 seconds.
RangeIterator (1000000) used 249160 bytes of memory.
The tests were run multiple times which showed that the results obtained are consistent over time. The results in the listing above come from a single run and confirm the numbers obtained by Nikita Popov. They also show, that generators are memory efficient in that their memory usage is in O(1). The memory usage for range and urange scale linearly and quickly let PHP run into its memory limit.
Note: The benchmark done here is still very simple and results in real use cases can vary (e.g. iterating over objects instead of numbers).
Conclusion
Generators seem to outperform Iterators and arrays for big data sets in both, speed and memory efficiency. They are also much faster to implement, since you do not have to extend the Iterator class. Personally, this is one of the few features in PHP 5.5 I am really excited about, but given the slow adoption rate of new PHP versions by most web hosts, I fear that it will take a long time until we see them used in popular projects. PHP 5.5 itself is roughly scheduled for March 2013 (one year after PHP 5.4 was released), so implementation details might change until then.