Why PHP sucks?

If you'd like to learn the reasons behind certain T-Regx features, and see how it manages to supersede PHP regular expressions, read on.

What's wrong with PHP Regular Expressions

The PHP regular expressions API is far from perfect. Here is only a handful of the things that are wrong with it:

PHP is Implicit

You are probably a PHP developer. I would like to get 'Robert likes apples'. Can you tell me which is the correct signature for this task?

preg_replace('/Bob/', 'Robert', 'Bob likes apples'); // pattern, replacement, subject
// or
preg_replace('/Bob/', 'Bob likes apples', 'Robert'); // pattern, subject, replacement
// ??

PHP is Unintuitive

Programming languages are tools created to solve problems. An experienced programmer should be able to look at the code and tell what it does.

  • The whole family of PHP regular expression functions raises all kinds of notices, warnings, errors and fatal errors, and also silently ignores invalid data.
  • The matching API has two functions: preg_match() (first match only) and preg_match_all().
  • The replacing API has four functions: preg_replace(), preg_replace_callback(), preg_replace_callback_array() and preg_filter().
  • preg_replace() and the other replacing functions have two optional int parameters, and I never know which is $limit and which is &$count.
  • A function that does replacing is named preg_filter().
  • Matching returns an array of arrays, which contain either a string, null, or an array of nulls, strings and ints. What type exactly is returned depends on the runtime subject and the order of the values.
  • Some functions take 4, 5 or 6 parameters (3-4 of which are optional).
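To make the $limit / &$count confusion concrete, here is a small sketch in plain PHP. The subject and variable names are made up for illustration; the parameter order is the one preg_replace() documents.

```php
<?php
// preg_replace($pattern, $replacement, $subject, $limit = -1, &$count = null)
$subject = 'aaaa';

// Fourth argument: $limit, the maximum number of replacements to perform.
// Fifth argument: $count, filled by reference with how many were performed.
$result = preg_replace('/a/', 'b', $subject, 2, $count);

// $result is 'bbaa' (only the first two "a" were replaced)
// $count is 2
```

Both arguments are plain integers at the call site, so nothing in the call itself tells the reader which one is the input and which one is the output.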

PHP is Messy

  • PREG_OFFSET_CAPTURE is a nightmare! It changes return type from "an array of arrays" to "an array of arrays of arrays".
  • PREG_SET_ORDER / PREG_PATTERN_ORDER change return values. It's either "groups of matches" or "matches of groups", depending on the flag.

The worst part? You find yourself looking at this code:

return $match[1][0];

having no idea what. it. does. You have to see whether you're using preg_match() or preg_match_all() and whether any of PREG_SET_ORDER/PREG_PATTERN_ORDER/PREG_OFFSET_CAPTURE were used.

And to refactor it, later? Replace $match[1] with array_map($match, ...). Good luck. With that.
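As an illustration of how much the meaning of [1][0] depends on context, the sketch below runs the same pattern four ways. The pattern, subject and variable names are invented for the example; only the preg_* functions and flags are PHP's own.

```php
<?php
$pattern = '/([a-z])(\d)/';
$subject = 'a1 b2';

// preg_match_all(), default PREG_PATTERN_ORDER: indexed as [group][match]
preg_match_all($pattern, $subject, $byGroup);
$groupFirst = $byGroup[1][0];   // 'a': group 1 of the first match

// PREG_SET_ORDER: indexed as [match][group]
preg_match_all($pattern, $subject, $byMatch, PREG_SET_ORDER);
$matchFirst = $byMatch[1][0];   // 'b2': whole text of the second match

// PREG_SET_ORDER | PREG_OFFSET_CAPTURE: [match][group] holds [text, offset]
preg_match_all($pattern, $subject, $withOffsets, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);
$pair = $withOffsets[1][0];     // ['b2', 3]: text and byte offset of the second match

// preg_match() with PREG_OFFSET_CAPTURE: [group] holds [text, offset]
preg_match($pattern, $subject, $single, PREG_OFFSET_CAPTURE);
$text = $single[1][0];          // 'a': text of group 1
```

The same two subscripts yield a group, a whole match, a text/offset pair, or a group's text, depending only on which function and flags were used several lines earlier.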

PHP is Inconsistent

PHP is Deliberately buggy

  • preg_match() and preg_match_all() return either:

    • (int) x - a number of matches, if a match is found
    • (int) 0 - if no matches are found
    • (bool) false - if a runtime error occurred

    So if you do just this:

    if (preg_match('//', '')) {

    there's no way of knowing whether your pattern is incorrect or whether it's correct, but your subject isn't matched by your pattern.

    You need to remember to add an explicit !== false check each time you use it.

  • All preg_* functions only return false/null/[] on error. You have to remember to call preg_last_error() to get some insight into the nature of your error. Of course, it only returns an int! So you have to look up that 4 is "invalid utf8 sequence" and 2 is "backtrack limit exceeded".

  • However, the false-check and preg_last_error() can only save you from runtime errors. So-called compile errors don't work that way, and require either setting a custom error handler (bad idea) or reading and clearing just one of those errors (good luck with errors in preg_replace_callback(), for example).

  • preg_filter() for arrays returns [] if an error occurred, even though [] is a perfectly valid result for this function. For example, it could have filtered out all values, or its input could have been an empty array right from the beginning.

  • For certain parameter types, some PCRE functions (e.g. preg_filter()) raise fatal errors, terminating the application.

  • preg_quote() does not quote whitespace, even though whitespace needs quoting when the pattern is used with the x flag.
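A few of the points above can be reproduced in plain PHP. This is only a sketch: the patterns and subjects are made up for illustration, and an invalid UTF-8 subject is used as one easy way to trigger a runtime error.

```php
<?php
// 1. No match: the pattern is fine, the subject just doesn't match.
$noMatch = preg_match('/\d/', 'abc');              // int 0

// 2. Runtime error: the u flag with a subject that is not valid UTF-8.
$failed = preg_match('/./u', "\xff");              // bool false
$errorCode = preg_last_error();                    // 4, i.e. PREG_BAD_UTF8_ERROR

// In a plain if (...) both results are falsy, so they look identical.
$looksTheSame = (!$noMatch) === (!$failed);        // true

// 3. preg_quote() leaves whitespace untouched...
$quoted = preg_quote('a b');                       // 'a b'

// ...so under the x flag the "quoted" space is skipped by the pattern.
$withSpace = preg_match('/^' . $quoted . '$/x', 'a b');    // 0
$withoutSpace = preg_match('/^' . $quoted . '$/x', 'ab');  // 1
```

On PHP 8.0 and later, preg_last_error_msg() returns a readable message for the same error, but it still has to be called explicitly after every preg_* call.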

PHP silently ignores invalid arguments

T-Regx showcase

That's why T-Regx happened. It addresses all of these flaws in PHP regular expressions.

T-Regx eliminates gotchas

The PHP PCRE API is full of false negatives and false positives. For example, a missing group in the preg_match() result doesn't necessarily mean the group doesn't exist or wasn't matched. It's just a "gotcha" set for you by PHP.

T-Regx performs all the necessary ifology and checks to verify that methods that return true and false are really true or false :)

If, because of reasons, there isn't a way to determine something with absolute certainty (like the index of a group with the J modifier), then the T-Regx API simply doesn't have an index() method for usingDuplicateName().group().
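The missing-group gotcha mentioned above can be seen in plain PHP. The patterns and variable names below are made up for illustration:

```php
<?php
// A trailing optional group that took no part in the match
// is dropped from the result array altogether.
preg_match('/(a)(b)?/', 'a', $trailing);
// $trailing is ['a', 'a']: index 2 is missing, although group 2 exists

// An optional group in the middle that took no part in the match
// is reported as an empty string.
preg_match('/(a)(b)?(c)/', 'ac', $middle);
// $middle is ['ac', 'a', '', 'c']

// A group that did match, but matched an empty string, looks exactly the same.
preg_match('/(a)(b*)(c)/', 'ac', $empty);
// $empty is ['ac', 'a', '', 'c']
```

So neither a missing index nor an empty string is enough, on its own, to tell "not matched" apart from "matched nothing" or "does not exist".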

T-Regx maps warnings and errors to exceptions

If you try to use an invalid regular expression in Java or JavaScript, you would probably get a SyntaxError exception, so you'd be forced to handle it. Such things don't happen in PHP regular expressions.

T-Regx always throws an exception and never issues any warnings, fatal errors, errors or notices.

try {
    return pattern('Foo')->match('Bar')->all();
}
catch (PatternException $exception) {
    // handle the error
}

Furthermore, T-Regx throws different exceptions for different errors:

  • SubjectNotMatchedException
  • MalformedPatternException
  • FlagNotAllowedException
  • GroupNotMatchedException
  • NonexistentGroupException
  • InvalidReplacementException
  • InvalidReturnValueException
  • CatastrophicBacktrackingPregException
  • RecursionLimitPregException
  • Utf8OffsetPregException

They all extend PatternException though.

Further, furthermore, if you pass an invalid data type to any of the T-Regx methods, \InvalidArgumentException is thrown.
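To give a feel for what mapping warnings and error codes to exceptions involves, here is a minimal sketch in plain PHP. It is not how T-Regx is implemented; matchOrThrow() is a hypothetical helper, and the choice of exception classes is an assumption made for the example.

```php
<?php
// Hypothetical helper, not part of T-Regx or PHP.
function matchOrThrow(string $pattern, string $subject): bool
{
    // Turn any warning raised during the call (e.g. "Compilation failed")
    // into an exception.
    set_error_handler(function (int $severity, string $message): bool {
        throw new InvalidArgumentException($message);
    });
    try {
        $result = preg_match($pattern, $subject);
    } finally {
        restore_error_handler(); // always put the previous handler back
    }
    // Runtime errors raise no warning here; they only show up as false.
    if ($result === false) {
        throw new RuntimeException('preg_match() failed with error code ' . preg_last_error());
    }
    return $result === 1;
}

$matched = matchOrThrow('/a/', 'a');     // true
$notMatched = matchOrThrow('/a/', 'b');  // false
```

Even this small wrapper needs two separate mechanisms, an error handler for compile errors and a false-check for runtime errors, which is the kind of bookkeeping a library can take off your hands.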

T-Regx is clean and simple

You will not find arrays, of arrays, of arrays in the T-Regx API. Each feature has a dedicated set of methods.

pattern($pattern)->match($subject)->first(function (Detail $detail) {
    $detail->offset();           // offset of a matched occurrence
    $detail->group(2)->offset(); // offset of a matched capturing group
    $detail->group(-3);          // throws \InvalidArgumentException
});

T-Regx unifies the differences between matching and replacing

Matching:

pattern($pattern)->match($subject)->first(function (Detail $detail) {
    $detail->offset(); // exactly the same interface
    $detail->group(2)->offset();
    $detail->group(-3);
});

Replacing:

pattern($pattern)->replace($subject)->first()->callback(function (Detail $detail) {
    $detail->offset(); // exactly the same interface
    $detail->group(2)->offset();
    $detail->group(-3);
});

Read more about Detail.

T-Regx provides a rich API for building patterns

Thanks to Pattern::prepare(), Pattern::inject(), Pattern::bind(), Pattern::compose(), Pattern::format() and Pattern::template(), there is never a need to call preg_quote() yourself.

For example, to build a pattern from unsafe data, instead of quoting it by hand with preg_quote(), simply use:

Pattern::prepare(["(My|Our) (dog|cat) names are ", [$dog], ' and ', [$cat], '!']);

or

Pattern::inject("(My|Our) (dog|cat) names are @ and @!", [$dog, $cat]);
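For comparison, this is roughly what the manual preg_quote() approach looks like in plain PHP. It is only a sketch of the do-it-yourself alternative, not of how T-Regx builds patterns; the values of $dog and $cat are invented for the example.

```php
<?php
// Illustrative values that happen to contain regex metacharacters.
$dog = 'Rex (Jr.)';
$cat = 'Tom+Jerry';

// Every value has to be quoted separately, passing the delimiter each time.
$pattern = '/(My|Our) (dog|cat) names are '
    . preg_quote($dog, '/') . ' and ' . preg_quote($cat, '/') . '!/';

$matched = preg_match($pattern, 'Our dog names are Rex (Jr.) and Tom+Jerry!'); // 1

// Forgetting to quote changes the meaning: "+" becomes a quantifier.
$unquoted = preg_match('/' . $cat . '/', 'Tom+Jerry'); // 0
```

Nothing warns you when one of the preg_quote() calls, or its delimiter argument, is left out; the pattern simply matches something else.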

T-Regx is really smart with its exceptions

We really did put a lot of thought into making T-Regx safe, so, for example, code like this isn't a big deal:

pattern('\w+')->replace($subject)->all()->callback(function (Detail $detail) {
    try {
        return pattern('intentionally (( invalid {{ pattern ')->match('Foo')->first();
    }
    catch (MalformedPatternException $ex) {
        // it's all good and dandy
        // this exception $ex here, won't interfere with the pattern "outside"
        return $detail;
    }
});

In other words, warnings and error flags raised by the invalid inner pattern()->match() call are represented as a MalformedPatternException, and won't interfere with the outer pattern()->replace().
