Release 0.9.11

Hey there!

Quick summary of changes in this release:

  • Every exception extending PregException (so MalformedPatternException, CatastrophicBacktrackingPregException, etc.) has received a new method, getPregPattern():

    try {
        pattern('foo')->...
    } catch (\TRegx\SafeRegex\Exception\PregException $exception) {
        $exception->getPregPattern(); // '/foo/'
    }

    Some methods still throw \InvalidArgumentException, and of course that exception is unchanged.

  • We brought back Pattern::prepare() (see ChangeLog.md)

  • We added a tail() method to Match, which works like offset() but returns the position of the end of the occurrence in the subject (not the start, as offset() does).

  • tail() also works for MatchGroup and ReplaceMatch.

  • There's also byteTail(), which returns the position in bytes, instead of characters (like byteOffset()).

  • Fixed inconsistencies

    • The message of the duplicated pattern exception changed its offset after PHP 7.3. From now on, the messages are identical on every PHP version.
  • Added null-safety to some replace methods. Returning null from any of those methods:

    • replace()->callback()
    • replace()->otherwise()
    • replace()->by()->group()->orElse()

    throws InvalidReturnValueException.

  • Renamed some or...() methods. Previously, the methods used to handle a missing first value (the result of findFirst()) were also used to specify the replacement for an optional, unmatched group. Sorry to say, we made a bad decision in unifying this interface, since it turns out the two are not even remotely connected. What fooled us was that we referred to each as "optional" (even though one was an "optional first match" and the other was a "replacement of an optional group").

    In this release, we separate the interfaces and give new, better names to the methods that handle unmatched, optional groups:

    • Renamed pattern()->replace()->by()->group() methods:
      • Renamed orThrow(string) to orElseThrow(string).
      • Renamed orIgnore() to orElseIgnore().
      • Renamed orEmpty() to orElseEmpty().
      • Renamed orReturn(string) to orElseWith(string).
      • Renamed orElse(callable) to orElseCalling(callable).
    • Renamed and added pattern()->replace()->by()->group()->map() methods:
      • Renamed orThrow(string) to orElseThrow(string).
      • Added orElseIgnore().
      • Added orElseEmpty().
      • Renamed orReturn(string) to orElseWith(string).
      • Renamed orElse(callable) to orElseCalling(callable).
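
To make the difference between offset(), tail() and their byte variants more tangible, here is a rough sketch in vanilla PHP. It is not how T-Regx implements them; the helpers charLength() and positions() are names made up for this illustration, and the subject is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: number of UTF-8 characters in a string.
function charLength(string $text): int
{
    // With the "u" flag, "." consumes one whole character, not one byte.
    return (int) preg_match_all('/./su', $text);
}

// Hypothetical helper: start and end of the first occurrence,
// measured both in bytes and in characters.
function positions(string $pattern, string $subject): ?array
{
    if (preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE) !== 1) {
        return null; // not matched (or an error occurred)
    }
    [$text, $byteOffset] = $m[0];
    $byteTail = $byteOffset + strlen($text);
    return [
        'byteOffset' => $byteOffset,
        'byteTail'   => $byteTail,
        // character positions: count the characters before each point
        'offset'     => charLength(substr($subject, 0, $byteOffset)),
        'tail'       => charLength(substr($subject, 0, $byteTail)),
    ];
}

$p = positions('/wór/u', 'Cześć, twórco');
// byteOffset: 10, byteTail: 14, offset: 8, tail: 11
```

The byte and character positions differ here only because "ś", "ć" and "ó" each take two bytes in UTF-8; for a pure ASCII subject they would be the same.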

As always, everything is described in ChangeLog.md on GitHub.

Release 0.9.10

We've released T-Regx 0.9.10, in which we delivered what we described in the previous blog post.

There are some renames to make some methods clearer. We also added pattern()->match()->tuple() and pattern()->match()->triple() helper methods.

As always, everything is described in ChangeLog.md on GitHub.

PS: Pattern::prepare() is removed in this release, but it is restored in 0.9.11.

Removal of Pattern::prepare()

Update

In T-Regx 0.9.10, we decided to remove Pattern::prepare() from T-Regx.

Rationale

Originally, the idea behind this function was quite simple. We wanted to enable quick and readable parameter binding:

Pattern::prepare(['^http://', [$domain], '/index\.php'])->test($link);

Fairly easy, at first glance.

Its messiness becomes visible when regular expressions are few and short, and texts are many and long, sometimes with alternation:

Pattern::prepare([[[$http, $https]], '://', [$domain]])->test($link);
// vs.
Pattern::prepare([[[$http, $https], '://', $domain]])->test($link);

As you can see, the outer array combined with the [$domain] array and the array for the scheme becomes quite hard to read. Alternation makes it even less readable! It was very easy to mistake the parameter array for the alternation array, or even a regular expression for text.

The solution

So at first, we decided to remove it from the library, but then, because of a user's comment in GitHub issues, we decided that perhaps it would be better not to remove the method, but to fix it.

In this case, fixing it means disallowing confusing constructs such as alternation in prepared patterns. So we decided to bring back Pattern::prepare(), but without alternation, so the messy queries won't appear in the source code.

Alternation is still available for Pattern::inject() and Pattern::bind():

Pattern::inject('@://@', [[$http, $https], $domain])->test($link);
Pattern::bind('@scheme://@domain', ['scheme' => [$http, $https], 'domain' => $domain])->test($link);

PHP Quiz for you!

Quiz

I've prepared a small quiz for you, which you can try right on the main page: t-regx.com :) The quiz is about vanilla PHP PCRE regular expressions. Its main purpose, really, is to illustrate the messiness and inconsistencies in the API.

Give it a try :)

Release 0.9.8 - foreach baby, foreach!

Iterables

Up to this point, to iterate over matches you could either use T-Regx methods that return an array, or use one of the collection methods: map(), forEach(), filter(), etc.

Right now, any chainable method in T-Regx is also iterable:

foreach (pattern('\d+')->match('127.0.0.1') as $match) {}
foreach (pattern('\d+')->match('127.0.0.1')->asInt() as $digit) {}
foreach (pattern('\d+')->match('127.0.0.1')->all() as $text) {}

Shorthand method

Following the previous release, in which we added the asArray() method, we also added a shorthand get() method for capturing groups.

pattern('(origin/)?master')->match('master')->first(function (Match $match) {
    $match->get(1); // same as $match->group(1)->text();
});

Release 0.9.7 - Match as vanilla array!

There were a lot of changes in the code, so I reckon we could release twice in the same week, because why not :)

So what are the changes?

Bear with me.

The concept

Capturing groups in T-Regx have a really rich API (probably the richest out there), with a lot of variables. Most importantly T-Regx handles:

  • Invalid groups (e.g. negative index -1 or malformed group !@#$), which always throw an exception
  • Missing groups (e.g. group 4 used with a pattern that only has 2 groups; same for named groups), which conditionally throw exceptions
  • Optional groups (e.g. (origin/)?master), which are really tricky to distinguish with PCRE
  • Matched groups (which can be tricky, if the matched group is an empty string "")
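
Here is a small vanilla PHP sketch of why optional groups are tricky. It is only an illustration, not T-Regx code: the patterns are slightly modified (a second group is added, so the optional group isn't the last one), and the PREG_UNMATCHED_AS_NULL flag requires PHP 7.2 or newer.

```php
<?php
// Group 1 is optional and does not take part in this match.
preg_match('/(origin\/)?(master)/', 'master', $plain);
// $plain[1] is "", just as if the group had matched an empty string.

// Since PHP 7.2, PREG_UNMATCHED_AS_NULL reports unmatched groups as null.
preg_match('/(origin\/)?(master)/', 'master', $nulls, PREG_UNMATCHED_AS_NULL);

// Here group 1 does match, but it matches the empty string.
preg_match('/(x*)(master)/', 'master', $empty, PREG_UNMATCHED_AS_NULL);

var_dump($plain[1] === '');   // bool(true)
var_dump($nulls[1] === null); // bool(true)
var_dump($empty[1] === '');   // bool(true)
```

Without the flag, an unmatched group and a group matched with "" look exactly the same in the result array.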

Because of that, the syntax for groups is not the shortest:

pattern('(origin/)?master')->match('master')->first(function (Match $match) {
    $match->group(1)->text(); // for example
});

But we know that T-Regx users mostly care about the last kind, matched groups, so they would like to use them with as simple a syntax as possible. That makes sense.

The idea

At first, there was the idea of having Match details implement \ArrayAccess, so this syntax would be possible:

pattern('(origin/)?master')->match('master')->first(function (Match $match) {
    $match[1]; // same as above
});

Well, that syntax does look good, at first, but it comes at a price. A high price.

Why we ditched the \ArrayAccess idea:

  • Unnecessary set and unset methods
  • Functions that work for arrays (array_key_exists()) won't work with \ArrayAccess
  • empty($match[1]) returns true, even if group 1 was matched (both "" and "0" are falsy)
  • isset($match[-2]) couldn't throw an exception for a malformed group
  • There's a bug in PHP, that causes $match['100'] to be treated as $match[100] (cast to int any numeric value).
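
The empty() pitfall from the list above can be reproduced with a tiny stand-in class. FakeMatch is made up for this sketch and has nothing to do with the real Match details; it only shows how PHP treats any \ArrayAccess object.

```php
<?php
// Hypothetical stand-in for a match object; not T-Regx's Match class.
final class FakeMatch implements ArrayAccess
{
    private $groups;

    public function __construct(array $groups)
    {
        $this->groups = $groups;
    }

    public function offsetExists($offset): bool
    {
        return isset($this->groups[$offset]);
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        return $this->groups[$offset];
    }

    // Forced by the interface, although a match should be read-only.
    public function offsetSet($offset, $value): void
    {
        throw new LogicException('read-only');
    }

    public function offsetUnset($offset): void
    {
        throw new LogicException('read-only');
    }
}

$match = new FakeMatch([0 => 'v0', 1 => '0']);
$isset = isset($match[1]); // true: group 1 was matched
$empty = empty($match[1]); // also true, because "0" is falsy
```

So a group that was matched with "0" (or "") is reported as both set and empty, which is easy to get wrong in calling code.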

The solution

So, instead, we got an idea: what if $match were a real PHP array? Then every function or notation that works for arrays would also work.

pattern('(origin/)?master')->match('master')->asArray()->first(function (array $match) {
    $match[1]; // same as above
});

The structure of the array is perfectly identical to what preg_match() would return :)

Release 0.9.6 - First/all changes!

Another release ahead of us. This one is about the T-Regx chainable interface. We made sure that first() chained with anything always uses preg_match() instead of preg_match_all().

For example:

pattern($pattern)->match($subject)
    ->fluent()
    ->flatMap(function (Match $match) {
        return [$match->text() => $match->offset()];
    })
    ->keys()
    ->groupByCallback(function (string $text) {
        return $text[0];
    })
    ->first();

This code will now use preg_match(). Really, this whole release was about ensuring that.

The only exception is filter(), for which it would be really wasteful to call preg_match() first, and then, if the filter has failed, call preg_match_all().

In future releases we'll make sure that the exceptions for an unmatched first() are also uniform (probably SubjectNotMatchedException, instead of NoSuchFluentElementException).

Boy, design patterns are so cool for this kind of job ;D

Toss a coin to your T-Regx!

Hello, back again! :) We've added a "Sponsor" button on github.com/T-Regx.

If you think T-Regx is going in the right direction, you now have the opportunity to throw us a buck or two.

Next release

And a heads-up: in the new 0.9.6 release, we'll make asInt() and fluent() really smart. Both methods are already present, but they will get an upgrade.

You see, match()->first() calls preg_match() and that makes sense. Also match()->fluent() calls preg_match_all(), because later fluent()->map() or fluent()->filter()->first() can be called, for example. And that also, sorta makes sense. But, unfortunately match()->fluent()->first() and match()->asInt()->first() also call preg_match_all(), and that's a bit wasteful.

So now we're introducing a change (similar to Java 8 Streams) that will call preg_match() for fluent()->first() and asInt()->first().
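
In vanilla PHP terms, the difference looks roughly like this. It's only a sketch of the two strategies with plain preg functions, not the actual T-Regx internals:

```php
<?php
$subject = '127.0.0.1';

// Eager: find every occurrence, then keep only the first one.
preg_match_all('/\d+/', $subject, $all);
$firstEager = (int) $all[0][0];

// Lazy: stop searching as soon as the first occurrence is found.
preg_match('/\d+/', $subject, $one);
$firstLazy = (int) $one[0];

var_dump($firstEager === $firstLazy); // bool(true)
var_dump(count($all[0]));             // int(4), three of them found for nothing
```

Both give the same first value; the lazy version simply doesn't scan the rest of the subject.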

Release 0.9.5 - Alternation in prepared patterns!

This release brings alternation in prepared patterns!

Up to this point, there was no reasonable way to create a pattern from a variable number of inputs. For example, you allow your users to input 0, 1 or more tags, which should later be used in a pattern. In the procedural world, array_map() with preg::quote() would probably do the job, but wait! You don't have to code it, it's already here:

Pattern::bind('^user:@id/findBy:@tags/all$', [
    'id' => $user->id,
    'tags' => $_GET['tags']
]);

In other words:

Pattern::bind('My tag is: "@tags"', ['tags' => ['one', 'two', 'three']]);

creates a pattern:

/My tag is: "(one|two|three)"/

Rest assured:

  • the values are quoted with preg::quote(), to protect you from malicious code
  • the group is non-capturing (use 'My tag is: "(@tags)"' for a capturing group, to be used with group())
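
For comparison, the procedural approach mentioned earlier could look more or less like the sketch below. It uses vanilla preg_quote() rather than preg::quote(), and the alternation() helper is a made-up name; this is not T-Regx's implementation, just the general idea of quoting each value and joining them in a non-capturing group.

```php
<?php
// Hypothetical helper, not part of T-Regx.
function alternation(array $values, string $delimiter = '/'): string
{
    $quoted = array_map(function (string $value) use ($delimiter) {
        // Escape regex special characters and the delimiter itself.
        return preg_quote($value, $delimiter);
    }, $values);
    return '(?:' . implode('|', $quoted) . ')';
}

$tags = ['one', 'two', 'c++'];
$pattern = '/My tag is: "' . alternation($tags) . '"/';
// $pattern is now: /My tag is: "(?:one|two|c\+\+)"/

var_dump(preg_match($pattern, 'My tag is: "c++"')); // int(1)
```

Note that an empty list of values would need separate handling here, which is one more reason to let the library do it.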

The alternation is really smart too - if you use the i or u flag, T-Regx will perform certain optimizations, for example:

Pattern::inject('Find: @ :)', [['foo', 'bar', 'FOO']], 'i');

then it will collapse foo and FOO, since the i flag is used:

/Find: (foo|bar) :)/

That's it in this release! Stay tuned :)

Release 0.9.4 - Exception changes and groupBy()

This release brings updates in exceptions (namespaces, new detailed exceptions) and a groupBy() method.

Exceptions

In the previous release we renamed SafeRegexException to PregException. In this one, we're renaming CleanRegexException to PatternException. So now, those two general exceptions sync nicely with their base methods:

try {
    return preg::match('/Foo/', $subject);
} catch (PregException $e) {
}

try {
    return pattern('Foo')->test($subject);
} catch (PatternException $e) {
}

They both extend RegexException - base for all exceptions thrown by T-Regx. So that's the first thing.

The second exception update - previously, every exception thrown based on the preg_last_error() function was RuntimePregException. Now, each error has a dedicated exception, which can be caught separately:

try {
    return preg::match($pattern, $subject);
} catch (BacktrackLimitPregException $exception) {
} catch (Utf8OffsetPregException $exception) {
}
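
The general idea of "one exception per preg_last_error() code" can be sketched in vanilla PHP like this. The exception classes and the checkedMatch() function below are invented for the example, and they are not the T-Regx classes; the real names and the full list are in ChangeLog.md.

```php
<?php
// Hypothetical exception classes, named for this sketch only.
class BacktrackLimitError extends RuntimeException {}
class BadUtf8Error extends RuntimeException {}
class BadUtf8OffsetError extends RuntimeException {}

// Hypothetical wrapper: runs preg_match() and turns error codes into exceptions.
function checkedMatch(string $pattern, string $subject, int $offset = 0): bool
{
    // "@" silences the warning that a malformed pattern would raise.
    $result = @preg_match($pattern, $subject, $unused, 0, $offset);
    switch (preg_last_error()) {
        case PREG_NO_ERROR:
            break;
        case PREG_BACKTRACK_LIMIT_ERROR:
            throw new BacktrackLimitError('backtrack limit exhausted');
        case PREG_BAD_UTF8_ERROR:
            throw new BadUtf8Error('subject is not valid UTF-8');
        case PREG_BAD_UTF8_OFFSET_ERROR:
            throw new BadUtf8OffsetError('offset is inside a UTF-8 character');
        default:
            throw new RuntimeException('preg error code ' . preg_last_error());
    }
    if ($result === false) {
        throw new RuntimeException('preg_match() failed');
    }
    return $result === 1;
}

try {
    checkedMatch('/./u', "\xff"); // "\xff" is not valid UTF-8
    $caught = 'none';
} catch (BadUtf8Error $e) {
    $caught = 'bad utf8';
}
```

With dedicated classes, the calling code can react to a bad subject differently than to, say, an exhausted backtrack limit, without inspecting error codes by hand.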

The detailed list of changes is in ChangeLog.md.

New method groupBy()

This release also comes with a brand new method, groupBy(), which groups matches by a capturing group (name or index). It can match strings, offsets and also map them with map() and flatMap(). Additionally, it can be chained with filter() to leave out unwanted matches:

return pattern('(\d)(?<unit>cm|mm)')->match($strings)
    ->filter(function (Match $match) {
        return $match->group(1)->toInt() % 2 == 0;
    })
    ->groupBy('unit')
    ->map(function (Match $match) {
        return $match->group(1)->toInt() * 100;
    });