PHP 5.3: Speed Optimizations for Strings and Arrays

January 27, 2012

updated 2020-02-03

str, ctype, preg, explode, etc.

example: switch y-m-d to m/d/y
list($YY,$mm,$dd) = explode("-",$date_stored);
$displaydate = $mm."/".$dd."/".$YY;

Real-World Experience

Buffering
A customer I contracted with saw rare cases where the server would start processing a visitor's page request and then set it aside while the CPU worked on other visitors' page requests, sometimes for as long as a minute and a half before the page finally finished!
The pages intermingled function calls and database reads and writes with echo statements to the browser, so the job (database updates and page construction) was put on hold while the processor handled other jobs that did the same thing.

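PHP provides output buffering to avoid this: call ob_start() before producing any output, and everything the script prints is held in an internal buffer instead of being sent piece by piece.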

“To output what is stored in the internal buffer, use ob_end_flush()
Output buffers are stackable, that is, you may call ob_start() while another ob_start() is active. Just make sure that you call ob_end_flush() the appropriate number of times.”
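
A minimal sketch of the buffering idea: build the fragment in memory, then send it in one piece once the slow work is done. The helper name renderGreeting() is hypothetical and not from the customer's site.

```php
<?php
// Hypothetical helper: build a page fragment entirely in memory.
function renderGreeting($name)
{
    ob_start();                       // start capturing everything that is echoed
    echo "<h1>Hello, ";
    echo htmlspecialchars($name);     // escape the visitor-supplied value
    echo "</h1>\n";
    return ob_get_clean();            // return the buffer contents and stop buffering
}

// Do the slow work (database calls, etc.) first, then send one block.
$html = renderGreeting('Ann & Bob');
echo $html;
```

ob_get_clean() is used here instead of ob_end_flush() so the caller decides when the output is actually sent.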

Caching
“One of the secrets of high performance is not to write faster PHP code, but to avoid re-executing the same PHP code by caching a generated HTML page in a file. The PHP script is only run once and the HTML is captured, and future invocations of the script will load the cached HTML.”

For example, two popular open-source caching plugins for WordPress are Hyper Cache and WP Super Cache.
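
The caching idea quoted above can be sketched in a few lines. This is only an illustration under assumptions: the helper cachedPage(), the 300-second lifetime, and the temp-file location are all made up for the demo, and real plugins do considerably more (invalidation, per-user pages, and so on).

```php
<?php
// Hypothetical helper: return cached HTML if it is fresh, otherwise rebuild it.
function cachedPage($cacheFile, $maxAgeSeconds, $buildPage)
{
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAgeSeconds) {
        return file_get_contents($cacheFile);    // cache hit: the page code does not run
    }
    ob_start();
    call_user_func($buildPage);                  // cache miss: run the expensive code once
    $html = ob_get_clean();
    file_put_contents($cacheFile, $html, LOCK_EX);
    return $html;
}

// Demo: the "expensive" page builder counts how often it runs.
$builds = 0;
$build = function () use (&$builds) {
    $builds++;
    echo "<p>Expensive page</p>";
};

$file   = sys_get_temp_dir() . '/demo_page_cache_' . uniqid() . '.html';
$first  = cachedPage($file, 300, $build);   // builds the page and writes the cache file
$second = cachedPage($file, 300, $build);   // served from the cache file
unlink($file);                              // clean up the demo file
```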


ctype

It should be noted that ctype functions are always preferred over regular expressions, and even to some equivalent str_* and is_* functions. This is because of the fact that ctype uses a native C library and thus processes significantly faster.

ctype_alnum — Check for alphanumeric character(s)
Returns TRUE if every character in text is either a letter or a digit,
Returns FALSE otherwise.
ctype_alpha — Check for alphabetic character(s)
ctype_cntrl — Check for control character(s)
ctype_digit — Check for numeric character(s)
Returns TRUE if every character in text is a decimal digit: [0-9]
ctype_graph — Check for any printable character(s) except space
ctype_lower — Check for lowercase character(s)
ctype_print — Check for printable character(s)
ctype_punct — Check for any printable character which is not whitespace
or an alphanumeric character
ctype_space — Check for whitespace character(s)
ctype_upper — Check for uppercase character(s)
ctype_xdigit — Check for character(s) representing a hexadecimal digit
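
A short sketch of how a few of these behave. The sample values are made up; note that the ctype functions are meant to be given strings, which is what form input is anyway.

```php
<?php
// Each check looks at every character of the string.
$results = array(
    'digits'   => ctype_digit('20120127'),   // true: every character is 0-9
    'decimal'  => ctype_digit('12.5'),       // false: '.' is not a digit
    'negative' => ctype_digit('-12'),        // false: '-' is not a digit
    'empty'    => ctype_digit(''),           // false: an empty string never passes
    'alnum'    => ctype_alnum('abc123'),     // true
    'hex'      => ctype_xdigit('DEADbeef'),  // true
    'space'    => ctype_space(" \t\n"),      // true
);
var_dump($results);
```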


 

ereg()

Warning: The ereg() function was DEPRECATED in PHP 5.3.0, and REMOVED in PHP 7.0.0.

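replace
if ( ereg(' ', $string) )
with
if ( strpos($string, ' ') !== false )

replace
if ( eregi('sub', $string) )
with
if ( stripos($string, 'sub') !== false )

(compare against false with !==, because a match at position 0 would otherwise be treated as "not found")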
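
One thing to watch with strpos() and stripos() as replacements: they return the position of the match, and position 0 is "falsy" in PHP. A small sketch with a made-up sample string:

```php
<?php
$string = ' leading space';

$pos         = strpos($string, ' ');   // int(0): found at the very first character
$looseCheck  = (bool) $pos;            // false, although the space IS there
$strictCheck = ($pos !== false);       // true: the reliable test

$missing = strpos('nospace', ' ');     // false: not found at all
var_dump($pos, $looseCheck, $strictCheck, $missing);
```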

 


example: append ".php" when the file name does not already end with it
if (substr($fileName, -4) !== '.php') {
$fileName .= '.php';
}

strpos() – Find position of first occurrence of a string
substr() – Return part of a string
strstr() – Find first occurrence of a string
substr_count() – Count the number of sub-string occurrences

php.net:
Try to avoid preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead, as they will be faster.
“If you don’t need fancy replacing rules (like regular expressions), you should always use str_replace() or str_ireplace() instead of preg_replace() (with or without the /i modifier).” … The str_ functions are much faster.
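
As a quick illustration with a made-up sample string, a plain replacement needs no regex at all:

```php
<?php
$text = 'Color, color, COLOR';

$plain  = str_replace('color', 'colour', $text);      // case-sensitive: 'Color, colour, COLOR'
$plainI = str_ireplace('color', 'colour', $text);     // case-insensitive: 'colour, colour, colour'
$regex  = preg_replace('/color/i', 'colour', $text);  // same result as str_ireplace(), via the regex engine

var_dump($plain, $plainI, $regex);
```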

However, if you do need fancy replacing rules …

Metacharacter Matches:

[:alnum:] Alphabetic and numeric characters: 0-9, A-Z, and a-z.
[:alpha:] Alphabetic characters: A-Z or a-z.
[:blank:] Space or tab.
[:cntrl:] Control characters, usually from the command line, including ^x, ^c, ^h, etc.
[:digit:] Digits only: 0-9.
[:graph:] Non-blank (not spaces, control characters, or the like).
[:lower:] Lowercase alphabetics: a-z.
[:print:] Like [:graph:], but includes the space character.
[:punct:] Punctuation characters: , ” ‘ ? ! ; : . etc.
[:space:] All whitespace characters ([:blank:], newline, carriage return, and the like).
[:upper:] Uppercase alphabetics: A-Z.
[:xdigit:] Characters allowed in a hexadecimal number (i.e., 0-9a-fA-F).

\b A word boundary, the spot between word (\w) and non-word (\W) characters
\B A non-word boundary.
\d A single digit character.
\D A single non-digit character.
\n The newline character. (ASCII 10)
\r The carriage return character. (ASCII 13)
\s A single whitespace character.
\S A single non-whitespace character.
\t The tab character. (ASCII 9)
\w A single word character – alphanumeric and underscore.
\W A single non-word character.
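
A few of the classes and escapes above in action. The sample line is invented; note that a POSIX class such as [:alpha:] only works inside a bracket expression, hence the doubled brackets.

```php
<?php
$line = "Order 66 shipped\ton 2012-01-27";

preg_match_all('/\d+/', $line, $m);                      // every run of digits
$numbers = $m[0];                                        // array('66', '2012', '01', '27')

$fields = preg_split('/\s+/', $line);                    // split on any whitespace (space or tab)

$onlyLetters = preg_match('/^[[:alpha:]]+$/', 'Hello');  // 1: letters only
$hasNonWord  = preg_match('/\W/', 'Hello!');             // 1: '!' is a non-word character

$swapped = preg_replace('/\bon\b/', 'at', $line);        // whole word "on" only

var_dump($numbers, $fields, $onlyLetters, $hasNonWord, $swapped);
```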

Many scripts rely on regular expressions to validate user input. While validating input is a superb idea, doing so via regular expressions can be quite slow. In many cases validation merely involves checking the source string against a certain character list such as A-Z or 0-9. In many of these instances you can use the ctype extension (enabled by default since PHP 4.2.0) instead of a regex. The ctype extension offers a series of function wrappers around C’s is*() functions, which check whether a particular character is within a certain range. Unlike the C functions, which work on one character at a time, the PHP functions operate on entire strings and are far faster than the equivalent regular expressions.

Examples:

preg_match('!^[0-9]+$!D', $foo); // anchored, so the whole string must be digits
vs ctype_digit($foo);

// true if any character isn't 0-9, A-Z or a-z
if (preg_match('/[^0-9A-Za-z]/', $test_string)) { // unlike ereg(), preg_match() requires the / delimiters

To deal with UTF-8 text without having to recompile PHP with the PCRE UTF-8 flag enabled, you can add the following sequence at the start of your pattern: (*UTF8)

For instance, '#(*UTF8)[[:alnum:]]#' will match 'é' where '#[[:alnum:]]#' will not.

if (preg_match("/^http/", $url)) {
vs.
if (strpos($url, "http") === 0) {

strpos() is faster (about 2x) for short strings like a URL. For very long strings of several paragraphs (e.g. a block of XML) that do not start with the needle, however, preg_match() is twice as fast as strpos(), because the anchored pattern gives up at the first character while strpos() scans the entire string.
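
A third option, not part of the comparison above and not benchmarked here, is strncmp(), which compares only the first few bytes and so never scans the rest of the string. A sketch, with startsWithHttp() as a hypothetical helper name:

```php
<?php
// Hypothetical helper: prefix test that looks at the first 4 bytes only.
function startsWithHttp($url)
{
    return strncmp($url, 'http', 4) === 0;
}

$a = startsWithHttp('http://example.com/');   // true
$b = startsWithHttp('ftp://example.com/');    // false
$c = startsWithHttp('see http://example');    // false: "http" is not at the start
var_dump($a, $b, $c);
```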

“seems to work:”
$regex .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP
$regex .= "(\:[0-9]{2,5})?"; // Port
$regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path

neat function to highlight search words

function highlight($words, $content) {
    $split_subject = explode(" ", $content);
    $split_word = explode(" ", $words);

    foreach ($split_subject as $k => $v) {
        foreach ($split_word as $v2) {
            if ($v2 === $v) {
                // The wrapper markup was lost from this listing; <b> is an assumption.
                $split_subject[$k] = "<b>" . $v . "</b>";
            }
        }
    }
    return implode(' ', $split_subject);
}

possible test for valid UTF-8 character range compatibility:
$invalid = preg_match('@[^\x9\xA\xD\x20-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]@u', $text);


PHP Optimization Tricks

1) When working with strings and you need to check that the string is of a certain length:
Ex.
if (strlen($foo) < 5) { echo "Foo is too short"; }
vs.
if (!isset($foo[4])) { echo "Foo is too short"; }

Calling isset() happens to be faster than strlen() because, unlike strlen(), isset() is a language construct and not a function, meaning that its execution does not require a function lookup and lowercasing of the function name. This means you have virtually no overhead on top of the actual code that determines the string's length. (Offsets start at 0, so a string shorter than 5 characters has no offset 4. The older $foo{4} brace syntax was removed in PHP 8.0.)

Another common operation in PHP scripts is array searching.
Ex.
$keys = array("apples", "oranges", "mangoes", "tomatoes");
if (in_array('mangoes', $keys)) { ... }
vs.
$keys = array("apples" => 1, "oranges" => 1, "mangoes" => 1, "tomatoes" => 1);
if (isset($keys['mangoes'])) { ... }

The “isset” mechanism is roughly 3 times faster!
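
If the data already exists as a plain list, one way to get the keyed form is array_flip(). This is a sketch with made-up variable names; it assumes the values are unique strings or integers, since they become array keys.

```php
<?php
$list = array('apples', 'oranges', 'mangoes', 'tomatoes');

// Turn the values into keys once, then do cheap key lookups.
$lookup = array_flip($list);                 // array('apples' => 0, 'oranges' => 1, ...)

$hasMangoes = isset($lookup['mangoes']);     // true (a value of 0 still counts as set)
$hasPears   = isset($lookup['pears']);       // false
$viaInArray = in_array('mangoes', $list);    // true: same answer, found by scanning the list

var_dump($hasMangoes, $hasPears, $viaInArray);
```

The flip itself walks the whole array, so it only pays off when the same list is searched more than once.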


foreach vs. while or for

With foreach, the entire array is read with no array-index checking, so it is faster.
In one test on a 1.5 million element array, foreach took roughly a third less time. Elapsed time:
0.344 seconds – foreach()
0.506 seconds – while()
0.530 seconds – for()

foreach ($array as $a)
   {    }

$i = 0;  $length = count($array);
while ($i < $length)
   {  ++$i;  }

or ...

do
   {  ++$i;  }
while ($i < $length);

#(the condition is evaluated after each iteration)

$length = count($array);
for ($i = 0; $i < $length; ++$i)
   {    }

note: evaluate count($array) once, before the loop, like
$length = count($array);
for($i=0;$i < $length;++$i)

Otherwise, this
for($i=0;$i < count($array);++$i)
or
while($i < count($array))
results in the count() function being called on every iteration, which makes the for or while loop needlessly slow.
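
Put together as something runnable (the array contents are arbitrary, and no timings are claimed here), the two recommended forms look like this:

```php
<?php
$array = range(1, 1000);

// count() is evaluated once, in the loop initializer, not on every pass.
$sumFor = 0;
for ($i = 0, $length = count($array); $i < $length; ++$i) {
    $sumFor += $array[$i];
}

// foreach needs neither an index nor a length.
$sumForeach = 0;
foreach ($array as $value) {
    $sumForeach += $value;
}

var_dump($sumFor, $sumForeach);
```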


When incrementing or decrementing a variable, $i++ happens to be a tad slower than ++$i. This is PHP-specific and does not apply to other languages, so don't go modifying your C or Java code thinking it'll suddenly become faster; it won't. ++$i happens to be faster in PHP because, instead of the 4 opcodes used for $i++, you only need 3. Post-incrementation actually causes the creation of a temporary variable that is then incremented, while pre-incrementation increases the original value directly. This is one of the optimizations that opcode optimizers such as Zend Optimizer perform. It is still a good idea to keep in mind, since not all opcode optimizers perform this optimization and there are plenty of ISPs and servers running without an opcode optimizer.

When it comes to printing text to the screen, PHP has so many ways to do it that not many users know all of them. This tends to result in people using the output methods they are already familiar with from other languages. While this is certainly an understandable approach, it is often not the best one as far as performance is concerned.

print vs echo

Even though both of these output mechanisms are language constructs, if you benchmark the two you will quickly discover that print is slower than echo. The reason is quite simple: print returns a value (always 1), while echo simply prints the text and nothing more. Since this return value is almost never used (I haven't seen a case yet), it simply adds unnecessary overhead.

printf

Using printf() is slow for a multitude of reasons, and I would strongly discourage its usage unless you absolutely need the functionality it offers. Unlike print and echo, printf() is a function, with the associated function execution overhead. Moreover, printf() is designed to support various formatting schemes that for the most part are not needed in a language that is loosely typed and will automatically do the necessary type conversions. To handle formatting, printf() needs to scan the specified string for special formatting codes that are to be replaced with variables. As you can probably imagine, that is quite slow and rather inefficient.

example of printf:

$heading1 = "Label 1";
$heading2 = "Label 2";

$value1 = "31298";
$value2 = "98";

printf ("%'.-15.15s%'.6.6s\n", $heading1, $value1);
printf ("%'.-15.15s%'.6.6s\n", $heading2, $value2);
see also: number_format()
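
For reference, here is what that format string does, using sprintf() so the result can be inspected instead of printed. The format is the one from the example above; the number_format() calls are an added illustration with made-up values.

```php
<?php
// %'.-15.15s : pad with '.', left-justify, width 15, cut off after 15 characters
// %'.6.6s    : pad with '.', right-justify (the default), width 6, cut off after 6 characters
$line1 = sprintf("%'.-15.15s%'.6.6s", "Label 1", "31298");   // Label 1.........31298
$line2 = sprintf("%'.-15.15s%'.6.6s", "Label 2", "98");      // Label 2............98

// number_format() covers the common "thousands separator" case without a format string.
$whole   = number_format(31298);         // "31,298"
$rounded = number_format(1234.5678, 2);  // "1,234.57"

echo $line1, "\n", $line2, "\n", $whole, "\n", $rounded, "\n";
```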
