Introduction to SPL

Standard PHP Library

Kevin Waterson

http://www.phPro.org

During the next half hour we will endeavour to introduce some of the basic concepts of SPL. SPL has been described as "simple in a complex way", and this is quite true. To the uninitiated it can seem quite daunting; however, once a few common concepts are grasped, the flow is quite intuitive. To aid the explanation, I have opted to show these concepts by way of example.

What is SPL

Aggregate Structures

SPL provides a standard set of classes and interfaces for PHP 5 and later. Its aim is to implement efficient data access interfaces and classes for PHP. Functionally, it is designed to traverse aggregate structures (anything you want to loop over). These may include arrays, database result sets, XML trees, directory listings or any list at all.

Reflection

<?php

  echo new ReflectionClass('DirectoryIterator');

?>

Class [ class DirectoryIterator extends SplFileInfo implements Iterator, Traversable ] {
...

Using the Reflection API we can inspect any of the iterator classes and see its full list of methods. (Reflection::export() was removed in PHP 8; echoing a ReflectionClass object prints the same report.) Note that the SPL iterators all implement the Iterator and Traversable interfaces. Interfaces cannot be instantiated on their own; attempting to do so produces a fatal error: "Cannot instantiate interface Iterator".
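A quick way to confirm the point about interfaces is to test an iterator with instanceof and class_implements(), and to watch what happens when we try to instantiate one. This is a small sketch assuming PHP 7 or later, where the failed instantiation surfaces as a catchable Error rather than an unrecoverable fatal error.

```php
<?php
// Iterator and Traversable are interfaces: we can test for them,
// but we cannot create instances of them.
$it = new ArrayIterator(array('koala', 'dingo'));

$isIterator    = $it instanceof Iterator;     // true
$isTraversable = $it instanceof Traversable;  // true

// class_implements() returns the interface names as array keys.
$dirInterfaces = class_implements('DirectoryIterator');

// Since PHP 7, instantiating an interface throws a catchable Error.
$message = '';
try {
    $bad = new Iterator();
} catch (Error $e) {
    $message = $e->getMessage();
}
echo $message, "\n"; // Cannot instantiate interface Iterator
```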

How it works

Explicit use

<?php

$array = array('koala', 'dingo', 'kiwi', 'wombat');

$obj = new ArrayIterator( $array );

$obj->rewind();

while($obj->valid()){
    echo $obj->key() . ' => ' . $obj->current() . '<br />';
    $obj->next();
}
?>
Iterators may be used explicitly or implicitly. Here we see the ArrayIterator iterating explicitly: a simple array of native animals is passed to the ArrayIterator constructor when the object is instantiated, rewind() sets the internal pointer to the beginning, and we loop with valid(), key(), current() and next() until valid() returns false.

How it works

Implicit use

<?php

$obj = new ArrayIterator( array('koala', 'dingo', 'kiwi', 'wombat') );

foreach($obj as $key=>$value){
  // get funky here
}

?>
This snippet shows how an iterator can be used implicitly. When foreach is used on the iterator object, PHP drives the iterator's rewind, valid, current, key and next cycle behind the scenes, so the loop works through the object itself rather than through a copy of an array. For iterators that produce their elements on demand this can save a good deal of memory; if you imagine 300 hits per second on a web site, you can imagine how helpful this might be.
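To see what foreach does behind the scenes, here is a minimal hand-written iterator that records each method call as it happens. The class name NativeList is hypothetical, and the typed method signatures assume PHP 8.

```php
<?php
// Hypothetical class: a hand-written Iterator that logs which of its
// methods foreach calls, and in what order.
class NativeList implements Iterator
{
    public $calls = array();
    private $items;
    private $position = 0;

    public function __construct(array $items)
    {
        $this->items = array_values($items);
    }

    public function rewind(): void
    {
        $this->calls[] = 'rewind';
        $this->position = 0;
    }

    public function valid(): bool
    {
        $this->calls[] = 'valid';
        return $this->position < count($this->items);
    }

    public function current(): mixed
    {
        $this->calls[] = 'current';
        return $this->items[$this->position];
    }

    public function key(): mixed
    {
        $this->calls[] = 'key';
        return $this->position;
    }

    public function next(): void
    {
        $this->calls[] = 'next';
        $this->position++;
    }
}

$list = new NativeList(array('koala', 'dingo', 'kiwi', 'wombat'));
$seen = array();
foreach ($list as $key => $value) {
    $seen[] = $key . '=' . $value;
}
echo implode(', ', $seen), "\n";
echo implode(' ', array_slice($list->calls, 0, 6)), "\n"; // rewind valid current key next valid
```

The loop ends on a final valid() call that returns false, exactly like the explicit while loop in the previous example.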

So? What's the diff?

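In PHP, an array is a hash table held in a zval. A foreach over a plain array works on a copy of that array (modern PHP only duplicates it if the array is modified during the loop), and the whole array must already be in memory before the loop starts. An SPL iterator such as DirectoryIterator or SplFileObject fetches its elements on demand and only needs to hold the current one, so memory use stays low however large the source is. Access is via method calls, as we saw in the first example.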

Directory Iterator

The bad old days

$listing = array();
$hdl = opendir('./');
while (($dirEntry = readdir($hdl)) !== false){
    if (substr($dirEntry, 0, 1) != '.'){
        if(!is_file($dirEntry)){
            continue;
            }
        $listing[] = $dirEntry;
        }
    }
closedir($hdl);
foreach($listing as $my_file){
    echo $my_file.'<br />';
    }
In the bad old days, to get a directory listing we would open the directory into a handle, read each entry, skip the dot files and, if the entry was a file, add it to an array to be printed or passed elsewhere. Remember, too, that building that intermediate array and then looping over it with foreach adds extra overhead.

Directory Iterator

A better way

try{
foreach ( new DirectoryIterator('./does-not-exist') as $Item ){
    echo $Item.'<br />';
    }
}
catch(Exception $e){
    echo 'No files Found!<br />';
}
Here we see a better way. The directory is read internally and the looping is performed at a lower level. As you can see, an exception is thrown on failure, giving you the ability to handle errors in any way you wish.

Directory Iterator

Filter out directories.

$it = new DirectoryIterator($path);

while($it->valid()){
    if(!$it->isDir()){
        echo $it->current().'<br />';
    }
    $it->next();
}
To filter out the directories within the directory listing, SPL provides the isDir() method. Similarly, the isDot() method tells us whether the current entry is one of the special "." or ".." entries.
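Putting isDot() and isDir() together, here is a small sketch that lists only the plain files in a directory. To keep it self-contained it creates a throwaway directory under sys_get_temp_dir() holding two files and a subdirectory (the names are made up for illustration) and removes it again afterwards. Directory order is not guaranteed, so the names are sorted before printing.

```php
<?php
// Build a throwaway directory so the example is self-contained.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'spl_demo_' . uniqid();
mkdir($dir);
mkdir($dir . DIRECTORY_SEPARATOR . 'photos');
file_put_contents($dir . DIRECTORY_SEPARATOR . 'koala.txt', 'koala');
file_put_contents($dir . DIRECTORY_SEPARATOR . 'wombat.txt', 'wombat');

$files = array();
foreach (new DirectoryIterator($dir) as $item) {
    // isDot() is true only for "." and ".."; isDir() catches real subdirectories.
    if ($item->isDot() || $item->isDir()) {
        continue;
    }
    $files[] = $item->getFilename();
}
sort($files, SORT_STRING); // directory order is not guaranteed
echo implode(', ', $files), "\n"; // koala.txt, wombat.txt

// Clean up the throwaway directory.
unlink($dir . DIRECTORY_SEPARATOR . 'koala.txt');
unlink($dir . DIRECTORY_SEPARATOR . 'wombat.txt');
rmdir($dir . DIRECTORY_SEPARATOR . 'photos');
rmdir($dir);
```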

Extending the DirectoryIterator

Iterators can be extended as any other class

class directoryReader extends DirectoryIterator{

function __construct($path){
  /*** pass the $path off to the parent class constructor ***/
  parent::__construct($path);
}

}
Here the directory path is passed to the parent class constructor, and our class inherits all of DirectoryIterator's iterator methods.

Extending the DirectoryIterator


function valid(){
  if(parent::valid())
    {
    if (!parent::isDir())
        {
        parent::next();
        return $this->valid();
        }
    return TRUE;
    }
  return FALSE;
}
By overriding the valid() method and calling the parent (DirectoryIterator) method isDir(), we can check whether a member is a directory and return true only for directories; any other entry is skipped by moving on with next(). The benefit of this is code re-use: we now have a class we can use over and over.
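For reference, here are the two fragments above assembled into one runnable class, with a throwaway directory to read. This is a sketch assuming PHP 7 or later (for the bool return type); the directory and file names are invented. We drive the loop by hand with valid() and next(), as in the earlier filtering example, so our overridden valid() is certainly the one being called. Note that "." and ".." are directories too, so they are reported alongside the real subdirectory.

```php
<?php
// The slide's class, complete: valid() skips every entry that is not a directory.
class directoryReader extends DirectoryIterator
{
    public function __construct($path)
    {
        // pass the $path off to the parent class constructor
        parent::__construct($path);
    }

    public function valid(): bool
    {
        if (parent::valid()) {
            if (!parent::isDir()) {
                parent::next();
                return $this->valid();
            }
            return true;
        }
        return false;
    }
}

// Hypothetical test directory: one subdirectory and one file.
$base = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'spl_reader_' . uniqid();
mkdir($base);
mkdir($base . DIRECTORY_SEPARATOR . 'marsupials');
file_put_contents($base . DIRECTORY_SEPARATOR . 'notes.txt', 'not a directory');

$dirs = array();
$reader = new directoryReader($base);
while ($reader->valid()) {
    $dirs[] = $reader->getFilename();
    $reader->next();
}
sort($dirs, SORT_STRING);
echo implode(', ', $dirs), "\n"; // ., .., marsupials

// Clean up.
$reader = null;
unlink($base . DIRECTORY_SEPARATOR . 'notes.txt');
rmdir($base . DIRECTORY_SEPARATOR . 'marsupials');
rmdir($base);
```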

Filter Iterator

$it = new DirectoryIterator($path);

class my_filter extends FilterIterator{
 ...
}

$filtered = new my_filter($it);
The previous example works well and can be made to do many things to determine what a valid member is. But for filtering data, the FilterIterator is the right tool. Here we create a DirectoryIterator object and then pass it to our FilterIterator subclass, which decides what to keep in its accept() method. As all operations are performed on the inner iterator (the DirectoryIterator), speed and memory use stay good, since the work is handled at a lower level.
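The slide leaves the body of accept() open. As one possible filled-in version, here is a hypothetical TextFileFilter that keeps only regular files ending in .txt. The class name, the extension test and the temporary files are our own assumptions rather than part of the original example, and the typed signature assumes PHP 7 or later.

```php
<?php
// Hypothetical filter: keep only regular files with a ".txt" extension.
class TextFileFilter extends FilterIterator
{
    public function accept(): bool
    {
        // For a DirectoryIterator, current() is the iterator itself,
        // positioned on the entry being tested.
        $entry = $this->current();
        return $entry->isFile() && $entry->getExtension() === 'txt';
    }
}

$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'spl_filter_' . uniqid();
mkdir($dir);
mkdir($dir . DIRECTORY_SEPARATOR . 'archive.txt'); // a directory, despite the name
file_put_contents($dir . DIRECTORY_SEPARATOR . 'koala.txt', 'koala');
file_put_contents($dir . DIRECTORY_SEPARATOR . 'emu.jpg', 'emu');

$kept = array();
foreach (new TextFileFilter(new DirectoryIterator($dir)) as $entry) {
    $kept[] = $entry->getFilename();
}
echo implode(', ', $kept), "\n"; // koala.txt

// Clean up.
unlink($dir . DIRECTORY_SEPARATOR . 'koala.txt');
unlink($dir . DIRECTORY_SEPARATOR . 'emu.jpg');
rmdir($dir . DIRECTORY_SEPARATOR . 'archive.txt');
rmdir($dir);
```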

Filter Iterator

$animals = array('koala', 'kangaroo', 'wombat', 'wallaby', 'emu', 'NZ'=>'kiwi', 'kookaburra', 'platypus');

class CullingIterator extends FilterIterator{

public function __construct( Iterator $it ){
  parent::__construct( $it );
}

function accept(){
  return is_numeric($this->key());
}
}
$cull = new CullingIterator(new ArrayIterator($animals));

foreach($cull as $key=>$value){
    echo $key.' == '.$value.'<br />';
    }
This simple snippet shows an array of Australian native animals. However, one of the array members does not belong. Note that the kiwi is not only native to New Zealand, but that its array key ('NZ') is not numeric. By implementing the accept() method to call PHP's is_numeric() function on each key, we filter out all members with non-numeric keys. Now only array members with a numeric key will be accepted by the filter iterator, thus ridding us of the kiwi peril.

Another Filter

function accept(){
$n = $this->current();
// 2 is the only even prime; anything below 2 is not prime
if($n == 2){
    return true;
    }
if($n < 2 || $n % 2 != 1){
    return false;
    }
$d = 3;
$x = sqrt($n);
while ($n % $d != 0 && $d < $x){
    $d += 2;
    }
// prime unless we stopped on a divisor smaller than $n itself
return !($n % $d == 0 && $n != $d);
}
To further demonstrate how a filter iterator works, we can simply change the accept() method to whatever we wish to filter. Here we change accept() to accept only prime numbers. If anybody has a better prime checker, please feel free to submit.
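As a complete, runnable version of the idea, here is a prime test wrapped in a FilterIterator subclass and applied to the numbers 1 to 30. The class name PrimeFilter is hypothetical, and this sketch swaps the while loop above for a plain trial-division loop (odd divisors up to the square root) that also handles 1 and 2; it assumes PHP 7 or later for the bool return type.

```php
<?php
// Hypothetical filter: accept only prime numbers, by trial division.
class PrimeFilter extends FilterIterator
{
    public function accept(): bool
    {
        $n = $this->current();
        if ($n < 2) {
            return false;           // 1, 0 and negatives are not prime
        }
        if ($n % 2 == 0) {
            return $n == 2;         // 2 is the only even prime
        }
        for ($d = 3; $d * $d <= $n; $d += 2) {
            if ($n % $d == 0) {
                return false;       // found a divisor
            }
        }
        return true;
    }
}

// false: discard the original keys and keep the values in order.
$primes = iterator_to_array(new PrimeFilter(new ArrayIterator(range(1, 30))), false);
echo implode(' ', $primes), "\n"; // 2 3 5 7 11 13 17 19 23 29
```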

A moment to review

Limit Iterator

The LimitIterator, as the name implies, is a limiting iterator. Functionally, it limits the number of results returned from the object. Those familiar with SQL will find it disturbingly similar to the SQL LIMIT clause and, like LIMIT, it lets you set an offset as well. The LimitIterator is an outer iterator which, like the FilterIterator we saw earlier, takes another iterator as a parameter to its constructor. I can already hear the gears grinding as you think how this might be ideal for paginating an aggregate structure such as an XML hierarchy.
<?xml version = "1.0" encoding="UTF-8" standalone="yes"?>
<document>
  <animal>koala</animal>
  <animal>kangaroo</animal>
  <animal>wombat</animal>
  <animal>wallaby</animal>
  <animal>emu</animal>
  <animal>kiwi</animal>
  <animal>kookaburra</animal>
  <animal>platypus</animal>
</document>
Let's change the array of animals to an XML tree. Please be sure your XML is well formed.
try{
$it = new LimitIterator(new SimpleXMLIterator($xmlstring), 3, 2);
foreach($it as $r)
    {
    // output the key and current array value
    echo $it->key().' -- '.$it->current().'<br />';
    }
}
catch(Exception $e)
    {
    echo $e->getMessage();
    }

animal -- wallaby
animal -- emu

Using the offset and limit arguments, we have extracted the XML data by skipping the first three items in the tree (an offset of 3, counted from zero) and fetching the next two (a limit of 2). The offset and limit values could come from POST or GET to aid in your application's pagination. Here we used the SimpleXMLIterator as our inner iterator; the SimpleXMLIterator can do much more and will be discussed later.
$offset = 3;

$limit = 2;

$array = array('koala', 'kangaroo', 'wombat', 'wallaby', 'emu', 'kiwi', 'kookaburra', 'platypus');

$it = new LimitIterator(new ArrayIterator($array), $offset, $limit);

foreach($it as $k=>$v)
    {
    echo $it->getPosition().'<br />';
    }
Here we see the same concept, this time with an ArrayIterator as the inner iterator. We have used the getPosition() method to show the position of the current element, which prints 3 and then 4.
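Following the pagination idea, here is a small sketch of a hypothetical paginate() helper that turns a 1-based page number into the 0-based offset LimitIterator expects. The function name and signature are our own, and the scalar type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: return one page of any iterator as a plain array.
// Pages are numbered from 1; LimitIterator's offset counts from 0.
function paginate(Iterator $it, int $page, int $perPage): array
{
    $offset = ($page - 1) * $perPage;
    return iterator_to_array(new LimitIterator($it, $offset, $perPage), false);
}

$animals = new ArrayIterator(array('koala', 'kangaroo', 'wombat', 'wallaby',
    'emu', 'kiwi', 'kookaburra', 'platypus'));

print_r(paginate($animals, 2, 3)); // wallaby, emu, kiwi
print_r(paginate($animals, 3, 3)); // kookaburra, platypus
```

Because ArrayIterator is seekable, LimitIterator asks it to seek straight to the offset, so a page that starts past the end can raise an OutOfBoundsException; a real pager would check the page number against count() first.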

Seeking

$array = array('koala', 'kangaroo', 'wombat', 'wallaby', 'emu', 'kiwi', 'kookaburra', 'platypus');

$it = new LimitIterator(new ArrayIterator($array));

try
    {
    $it->seek(5);
    echo $it->current();
    }
catch(OutOfBoundsException $e)
    {
    echo $e->getMessage() . "<br />";
    }
This time we have removed the offset and the limit, so the whole inner iterator is exposed. The seek() method moves the internal pointer to position 5, and it is then a simple matter to echo the current() value, in this case kiwi. Like all outer iterators, the LimitIterator gives us access to its inner iterator with the getInnerIterator() method. If an invalid seek position is given, that is, a position at or beyond the end of the array, an OutOfBoundsException is thrown and needs to be caught.
$offset = 2;

$array = array('koala', 'kangaroo', 'wombat', 'wallaby', 'emu', 'kiwi', 'kookaburra', 'platypus');

$it = new LimitIterator(new ArrayIterator($array), $offset);

$it->getInnerIterator()->offsetUnset(5);

try {
    foreach($it as $value){
        echo $value.'<br />';
        }
    }
catch(OutOfBoundsException $e){
    echo $e->getMessage() . "<br />";
    }
This time we have omitted the limit parameter and used only the offset. This will allow the LimitIterator to begin at the offset and traverse to the end of the structure. We have also accessed the inner iterator (ArrayIterator) with the getInnerIterator() method and called ArrayIterator::offsetUnset() method to remove the kiwi from our lives.

LimitIterator Revision

Array Object

$array = array('koala', 'kangaroo', 'wombat', 'wallaby', 'emu', 'kiwi', 'kookaburra', 'platypus');

$arrayObj = new ArrayObject($array);

for($iterator = $arrayObj->getIterator();
   $iterator->valid();
   $iterator->next())
    {
    echo $iterator->key() . ' => ' . $iterator->current() . '<br />';
    }
In the above script we have externally traversed the array object with getIterator(). We could have used a foreach on the array object, and the getIterator() method would be called implicitly. The key() and current() methods belong to the ArrayIterator instance that getIterator() returns.

Append to the object

$arrayObj = new ArrayObject($array);

$arrayObj->append('dingo');
Of course, we can append to the array object simply by using the append() method.

Sorting

$arrayObj = new ArrayObject($array);

$arrayObj->natcasesort();
The ArrayObject gives us access to some of the sorting functions typically used with plain PHP arrays. Here the natcasesort() method sorts the values in case-insensitive "natural" order, keeping each value's original key.
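To see what "natural" order means in practice, here is a small sketch with made-up file names, where numbers inside the strings are compared by value (2 before 10) and case is ignored. The result keeps the original keys, since natcasesort() preserves them.

```php
<?php
$files = new ArrayObject(array('img12.png', 'img10.png', 'IMG2.png', 'img1.png'));

// Case-insensitive natural sort; keys stay attached to their values.
$files->natcasesort();

print_r($files->getArrayCopy());
// Array ( [3] => img1.png [2] => IMG2.png [1] => img10.png [0] => img12.png )
```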

Count

$array = array('koala', 'kangaroo', 'wombat', 'wallaby', 'emu', 'kiwi', 'kookaburra', 'platypus');

$arrayObj = new ArrayObject($array);

echo $arrayObj->count();
And what array access class would be complete without a method to count its members? With ArrayObject we use the count() method (or simply pass the object to count(), since ArrayObject implements Countable). Genius!

Unset

$arrayObj = new ArrayObject($array);

$arrayObj->offsetUnset(5);
As with normal array access, we can unset a member of the array object, here with offsetUnset() (equivalent to unset($arrayObj[5])).

offsetSet

$array = array('koala', 'kangaroo', 'wombat', 'wallaby', 'emu', 'kiwi', 'kookaburra', 'platypus');

$arrayObj = new ArrayObject($array);

$arrayObj->offsetSet(5, "galah");
Rather than removing the array member, we could simply change the value at a given key. The kiwi, having a key of 5, can be set to a new value using offsetSet(). Similarly, we read the value of a member with offsetGet().
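Because ArrayObject implements ArrayAccess and Countable, the offset methods above can also be reached with ordinary array syntax. A short sketch using the same animals; the comments show which method each line maps to.

```php
<?php
$animals = new ArrayObject(array('koala', 'kangaroo', 'wombat', 'wallaby',
    'emu', 'kiwi', 'kookaburra', 'platypus'));

$animals[5] = 'galah';   // same as $animals->offsetSet(5, 'galah')
$fifth = $animals[5];    // same as $animals->offsetGet(5)
unset($animals[0]);      // same as $animals->offsetUnset(0)
$animals[] = 'dingo';    // same as $animals->append('dingo'); gets key 8

$hasKiwi = in_array('kiwi', $animals->getArrayCopy(), true);

echo $fifth, ' ', count($animals), "\n"; // galah 8
```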

Caching Iterator

<?php

  echo new ReflectionClass('CachingIterator');

?>

Class [ class CachingIterator extends IteratorIterator implements OuterIterator, Traversable, Iterator, ArrayAccess, Countable ]

The CachingIterator is kind of the Swiss Army chainsaw of iterators. A quick look using the Reflection API shows us that it extends IteratorIterator (an iterator that wraps another iterator) and implements OuterIterator, Traversable, Iterator, ArrayAccess and Countable. That is quite an array of interfaces, giving access to many of the methods we have used previously. The most notable addition is the hasNext() method.
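The ArrayAccess and Countable side of CachingIterator only works when it is told to keep a full cache of everything it has seen. A small sketch, assuming the FULL_CACHE flag; without it, count() and the offset methods refuse to work and throw a BadMethodCallException.

```php
<?php
$animals = array('koala', 'kangaroo', 'wombat', 'wallaby', 'emu', 'kiwi', 'kookaburra', 'platypus');

// FULL_CACHE: remember every element passed over, keyed by the inner key.
$cache = new CachingIterator(new ArrayIterator($animals), CachingIterator::FULL_CACHE);

foreach ($cache as $animal) {
    // a single pass fills the cache
}

echo count($cache), "\n"; // 8
echo $cache[5], "\n";     // kiwi
```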

hasNext()

$array = array('koala', 'kangaroo', 'wombat', 'wallaby', 'emu', 'kiwi', 'kookaburra', 'platypus');

try {
    $object = new CachingIterator(new ArrayIterator($array));
    foreach($object as $value){
        echo $value;
        echo $object->hasNext() ? ',' : null;
    }
}
catch (Exception $e)
    {
    echo $e->getMessage();
    }
As the name implies, the hasNext() method determines whether there is another member after the current() member. This is amazingly useful when we need to do something like making a CSV line, where commas are appended to each member except the last. No longer do we need to set a counter, loop over the array with foreach and increment the counter on each loop.

SimpleXML Iterator

<?php

  echo new ReflectionClass('SimpleXMLIterator');

?>
The SimpleXMLIterator is, as the name suggests, quite simple in its implementation. This iterator takes a well-formed XML structure (document) and can be traversed like any other aggregate structure. Using Reflection, we see that SimpleXMLIterator extends SimpleXMLElement and is a recursive iterator, as it implements the RecursiveIterator interface.

SimpleXML Iterator

try {
    $sxi = simplexml_load_string($xmlstring, 'SimpleXMLIterator');
    foreach(new RecursiveIteratorIterator($sxi, RecursiveIteratorIterator::SELF_FIRST) as $name => $data)
        {
        echo $name.' -- '.$data.'<br />';
        }
    }
catch(Exception $e)
    {
    echo $e->getMessage();
    }
In order to traverse an entire XML tree, the SimpleXMLIterator needs an iterator that will itself iterate over a recursive iterator; to this end, we use the RecursiveIteratorIterator, here in SELF_FIRST mode so that each parent element is listed before its children. With this in place, we can traverse the entire structure to produce a list of all the elements and their values. The simplexml_load_string() function loads the XML, and its second argument, 'SimpleXMLIterator', tells it to return a SimpleXMLIterator object instead of a plain SimpleXMLElement.
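To see the shape of the tree that RecursiveIteratorIterator walks, here is a sketch using the nested "enlightenment" document from later in this talk, indenting each element name by getDepth(). Text content is left out of the listing to keep the output simple.

```php
<?php
$xmlstring = <<<XML
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<document>
  <ignorance>I dont want to know
    <discovery>Whats this then?
      <awareness>This could be useful
        <acceptance>I must use this
          <enlightenment>SPL</enlightenment>
        </acceptance>
      </awareness>
    </discovery>
  </ignorance>
</document>
XML;

$sxi = new SimpleXMLIterator($xmlstring);
$rii = new RecursiveIteratorIterator($sxi, RecursiveIteratorIterator::SELF_FIRST);

$outline = array();
foreach ($rii as $name => $node) {
    // getDepth() is 0 for children of <document>, 1 for grandchildren, and so on.
    $outline[] = str_repeat('  ', $rii->getDepth()) . $name;
}
echo implode("\n", $outline), "\n";
```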

SimpleXML Iterator

try {
    $sxi =  new SimpleXMLIterator($xmlstring);

    foreach ( $sxi as $node )
        {
        foreach($node as $k=>$v)
            {
            echo $v->node_name.'<br />';
            }
        }
    }
catch(Exception $e)
    {
    echo $e->getMessage();
    }
Getting a list of XML nodes could be done as we see here: we instantiate a new SimpleXMLIterator object and walk two levels down with nested foreach loops, echoing the node_name child of each element. This, to me, looks rather cumbersome, so let's try another way.

SimpleXML Iterator

try {
$sxi = simplexml_load_string($xmlstring, 'SimpleXMLIterator');

for ($sxi->rewind(); $sxi->valid(); $sxi->next())
    {
    if($sxi->hasChildren())
        {
        foreach($sxi->getChildren() as $element=>$value)
          {
          echo $value->node_name.'<br />';
          }
        }
     }
   }
catch(Exception $e)
   {
   echo $e->getMessage();
   }
This time we have used the simplexml_load_string() function to load the XML string directly into the SimpleXMLIterator class. We have also used a conditional statement with SimpleXMLIterator::hasChildren() to check for the existence of child nodes. If child nodes are found, the SimpleXMLIterator::getChildren() method gives us an iterator over them. I benchmarked both of these methods and found the second to be more efficient. The reality, though, is that both methods suffer, and a better way is to be had with XPath.

SimpleXML Iterator xPath

The SimpleXMLIterator::xpath() method lets us go straight to the nodes we want in an XML tree by specifying the XPath to them; it returns an array of the matching elements. Similar to a file system path, the XPath of a node in an XML document looks like this:

path/to/node_name
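With the animals document from the LimitIterator example, a couple of XPath queries show the path syntax in action. Note that XPath positions start at 1, whereas the LimitIterator offset counted from 0.

```php
<?php
$xmlstring = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<document><animal>koala</animal><animal>kangaroo</animal>'
    . '<animal>wombat</animal><animal>wallaby</animal><animal>emu</animal>'
    . '<animal>kiwi</animal><animal>kookaburra</animal><animal>platypus</animal></document>';

$sxi = new SimpleXMLIterator($xmlstring);

// Relative path: evaluated from the <document> element we loaded.
$all = $sxi->xpath('animal');
// Absolute path with a positional predicate: the 4th <animal>.
$fourth = $sxi->xpath('/document/animal[4]');

echo count($all), ' ', (string) $fourth[0], "\n"; // 8 wallaby
```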

SimpleXML Iterator xPath

try {
    $sxi =  new SimpleXMLIterator($xmlstring);

    /*** set the xpath ***/
    $foo = $sxi->xpath('path/to/node_name');

    foreach ($foo as $k=>$v){
        echo $v.'<br />';
        }
    }
catch(Exception $e)
    {
    echo $e->getMessage();
    }
This method will produce the same results as the previous two methods with much cleaner code. However, it is not strictly reusable, whereas the previous example gave us better options in this regard, since it could be wrapped in a custom function. Depending on your situation, any of the above methods will work. Let's look at a little more.
<?xml version = "1.0" encoding="UTF-8" standalone="yes"?>
<document>
    <ignorance>
    I dont want to know
        <discovery>
        Whats this then?
            <awareness>
            This could be useful
                <acceptance>
                I must use this
                    <enlightenment>
                    SPL
                    </enlightenment>
                </acceptance>
            </awareness>
        </discovery>
    </ignorance>
</document>
Here is a simple xml tree we can use for our next example...

SimpleXML Iterator xPath

try {
    $sxi =  new SimpleXMLIterator($xmlstring);

    $foo = $sxi->xpath('ignorance/discovery/awareness/acceptance/enlightenment');

    foreach ($foo as $k=>$v){
        echo $v.'<br />';
        }
    }
catch(Exception $e)
    {
    echo $e->getMessage();
    }
The XML tree above has five levels below the document root. Using XPath, we have traversed the tree directly to the node we seek. The saving in iterations here is obvious, as we no longer need to set up a hierarchy of foreach loops to walk the tree. The XPath to enlightenment just got easier.

SimpleXML Iterator Revision

SplFileObject

$lines = file('myfile.txt');

// Loop through our array, show line numbers and line.
foreach ($lines as $line_num => $line) {
    echo 'Line: '. $line_num .' -> ' . $line. '<br />'."\n";
}
If you have followed along so far, this part should be easy. In the bad old days, to read the lines in a text file you would do something like this. The worst of this method is that file() reads the whole file into memory at once, and a log file could easily be 100 MB or more. Should the script be accessed many times, it will quickly eat up available memory. Remember also that the foreach loop then works over that whole array.

SplFileObject

try{
        $file = new SplFileObject("/usr/local/apache/logs/access_log");

        while($file->valid()){
                echo $file->current().'<br />';
                $file->next();
                }
        }
catch (Exception $e)
        {
        echo $e->getMessage();
        }
Let's assume our text file is an Apache log file. We have created an object-oriented approach that should be familiar if you have followed along so far, as it follows the same standard pattern, reading the file one line at a time. This is the true power of SPL: it gives us flexibility within a constrained and standardised development environment.
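SplFileObject also has flags that tidy up line reading. This sketch writes a small temporary stand-in for a log file (the contents are invented) and combines READ_AHEAD, SKIP_EMPTY and DROP_NEW_LINE so that only the non-empty lines come back, without their trailing newlines.

```php
<?php
// A tiny stand-in for a log file, including a blank line.
$path = tempnam(sys_get_temp_dir(), 'spl');
file_put_contents($path, "first\nsecond\n\nthird\n");

$file = new SplFileObject($path);
// Strip "\n" from each line and skip blank lines, including the empty
// "line" after the final newline.
$file->setFlags(SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY | SplFileObject::DROP_NEW_LINE);

$lines = array();
foreach ($file as $lineNumber => $line) {
    $lines[] = $line;
}

$file = null; // release the handle before deleting the file
unlink($path);

echo implode(', ', $lines), "\n"; // first, second, third
```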

SplFileObject

try{
    $file = new SplFileObject("/usr/local/apache/logs/access_log");

    $file->seek( 3 );

    echo $file->current();
    }
catch (Exception $e)
    {
    echo $e->getMessage();
    }
Of course, the SplFileObject still gives us the kind of control we have become accustomed to with the filesystem functions; like them, it offers great flexibility in how we manipulate our file data. Here we see an example using the seek() method. Lines are counted from zero, so seek(3) moves to the fourth line of the file.
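A self-contained version of the seek() example, using a small temporary file with invented contents instead of the Apache log, shows that line numbers count from zero.

```php
<?php
$path = tempnam(sys_get_temp_dir(), 'spl');
file_put_contents($path, "alpha\nbeta\ngamma\ndelta\n");

$file = new SplFileObject($path);
$file->setFlags(SplFileObject::DROP_NEW_LINE); // return lines without "\n"

// Line numbers start at 0, so seek(2) lands on the third line.
$file->seek(2);
$third = $file->current();

$file = null; // release the handle before deleting the file
unlink($path);

echo $third, "\n"; // gamma
```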

SPL and PDO

$db = new PDO($dsn);

foreach($db->query("SELECT * FROM table") as $row){
    // do stuff here
}
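The PDO extension's PDOStatement class implements the Traversable interface internally, so a result set can be looped over directly with foreach, one row at a time. This can give real memory benefits when you need to iterate over large SQL result sets, particularly when the driver is not buffering the whole result. The new Imagick (ImageMagick) extension also makes use of SPL for iterating over pixels and more.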

Summary

Thank You, Questions?

As shown in these examples, SPL provides us with a toolkit to iterate over almost any aggregate structure. The benefit of this is that it is a standard, and so can be used by all to encourage better coding practices. You will often see in your code that much of it is based on creating and traversing aggregate structures such as those shown above. We hope this gets you started on your way to implementing SPL in your code base. The PHP manual is a little sparse on SPL documentation, so I have gathered some together at phpro.org.