Advanced replace details

Introduction

When using pattern()->match(), every callback receives a single parameter: a Detail object. You can learn more about it on the Detail page.

However, when using pattern()->replace(), the callback receives a ReplaceMatch details object. It extends Detail, so it exposes the same interface.

Additionally, ReplaceMatch has two methods of its own:

  • ReplaceMatch.modifiedSubject(): string
  • ReplaceMatch.modifiedOffset(): int

They work similarly to the offset() and subject() methods, but they take into account the results of previous callbacks. In effect, you can look into the process of the new string being built.

  • modifiedSubject() - the current state of the subject being built.
  • modifiedOffset() - the occurrence's offset, but relative to the current modifiedSubject().

Examples

modifiedSubject() example

Given a pattern that matches capitalized words:

$subject = 'Me, Rihanna and my Mom really like Sweden';
$result = pattern("[A-Z][a-z]+")->replace($subject)->all()->callback(function ($detail) {
    $detail->subject(); // Me, Rihanna and my Mom really like Sweden
    return '____';
});

While iterating the subject looking for [A-Z][a-z]+, the result of Detail.subject() is the same for every Detail. The pattern matches 4 occurrences, so the callback is invoked 4 times, and each time $detail->subject() is equal to:

Me, Rihanna and my Mom really like Sweden

However, the results of ReplaceMatch.modifiedSubject() also include the results of previous replacements. On the four consecutive calls it returns:

Me, Rihanna and my Mom really like Sweden
____, Rihanna and my Mom really like Sweden
____, ____ and my Mom really like Sweden
____, ____ and my ____ really like Sweden

And the $result would be equal to:

____, ____ and my ____ really like ____
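
To make the progression above concrete, here is a minimal sketch in plain PHP that reproduces the same intermediate strings. It is not T-Regx's implementation, only an illustration of the idea. It assumes PHP 7.4 or newer, where preg_replace_callback() accepts the PREG_OFFSET_CAPTURE flag; the variable names $built, $consumed and $states are made up for this example.

```php
<?php
// Illustrative sketch only: rebuilds the "modified subject" seen by each callback.
$subject = 'Me, Rihanna and my Mom really like Sweden';

$built = '';     // already-processed prefix, with replacements applied
$consumed = 0;   // number of bytes of the original subject already processed
$states = [];    // what a "modified subject" would look like on each call

$result = preg_replace_callback(
    '/[A-Z][a-z]+/',
    function (array $match) use (&$built, &$consumed, &$states, $subject) {
        [$text, $offset] = $match[0];
        // Copy the untouched text between the previous match and this one.
        $built .= substr($subject, $consumed, $offset - $consumed);
        // Current state: replaced prefix + the not-yet-replaced remainder.
        $states[] = $built . substr($subject, $offset);
        // Record this replacement for the next call.
        $built .= '____';
        $consumed = $offset + strlen($text);
        return '____';
    },
    $subject,
    -1,
    $count,
    PREG_OFFSET_CAPTURE
);

foreach ($states as $state) {
    echo $state, PHP_EOL;
}
echo $result, PHP_EOL;
```

Each entry of $states corresponds to one callback invocation, so the first entry is the untouched subject and the last one still contains "Sweden".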

modifiedOffset() example

Had you iterated the subject looking for [A-Z][a-z]+, these would be the results of the Detail.offset() method:

Me, Rihanna and my Mom really like Sweden
↑
offset() // 0
Me, Rihanna and my Mom really like Sweden
    ↑
    offset() // 4
Me, Rihanna and my Mom really like Sweden
                   ↑
                   offset() // 19
Me, Rihanna and my Mom really like Sweden
                                   ↑
                                   offset() // 35

But if, instead of ReplaceMatch.offset(), you use ReplaceMatch.modifiedOffset(), these are the results:

Me, Rihanna and my Mom really like Sweden
↑
modifiedOffset() // 0
offset() // 0
____, Rihanna and my Mom really like Sweden
      ↑
      modifiedOffset() // 6
    ↑
    offset() // 4
____, ____ and my Mom really like Sweden
                  ↑
                  modifiedOffset() // 18
                   ↑
                   offset() // 19
____, ____ and my ____ really like Sweden
                                   ↑
                                   modifiedOffset() // 35
                                   offset() // 35

Capturing groups

The methods modifiedOffset() and modifiedSubject() are also available for groups (which, when replacing, are of type ReplaceDetailGroup extends DetailGroup).

$subject = 'Me, Rihanna and my Mom really like Sweden';
$result = pattern("[A-Z]([a-z]+)")->replace($subject)->all()->callback(function ($detail) {
    $group = $detail->group(1);
    $group->modifiedSubject(); // same value as $detail->modifiedSubject()
    $group->modifiedOffset();  // offset of group 1 in the modified subject
    return '____';
});

When used on a group, modifiedOffset() returns the offset at which the captured group is present in the modified subject, not the offset at which the whole match was captured.

modifiedSubject() for groups returns exactly the same value as modifiedSubject() for ReplaceDetail.
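
As a rough illustration of the group behaviour described above, the sketch below uses plain PHP rather than T-Regx. It assumes PHP 7.4 or newer (for the PREG_OFFSET_CAPTURE flag of preg_replace_callback()), and the names $shift and $groupOffsets are invented for this example. The group's offset in the modified subject is taken to be its original offset plus the net length change of the earlier replacements.

```php
<?php
// Illustrative sketch only: not how T-Regx is implemented internally.
$subject = 'Me, Rihanna and my Mom really like Sweden';

$shift = 0;          // net length change caused by earlier replacements
$groupOffsets = [];  // original and shifted offsets of group 1, per match

$result = preg_replace_callback(
    '/[A-Z]([a-z]+)/',
    function (array $match) use (&$shift, &$groupOffsets) {
        [$wholeText] = $match[0];      // text of the whole match
        [, $groupOffset] = $match[1];  // offset of group 1 in the original subject
        $groupOffsets[] = [
            'offset'         => $groupOffset,
            'modifiedOffset' => $groupOffset + $shift,
        ];
        // The whole match is replaced, so the shift depends on its length.
        $shift += strlen('____') - strlen($wholeText);
        return '____';
    },
    $subject,
    -1,
    $count,
    PREG_OFFSET_CAPTURE
);

print_r($groupOffsets);
```

Note that the group offsets differ from the whole-match offsets by one, because group 1 starts after the leading capital letter.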

Performance

Rest assured, each of those examples uses only one call to preg_replace_callback(). T-Regx simply remembers the length of each replacement returned from callback() and takes it into account in modifiedOffset() when it is called.
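
The bookkeeping described here can be sketched in plain PHP. This is only an approximation of the idea, not T-Regx's actual code. It assumes PHP 7.4 or newer (for the PREG_OFFSET_CAPTURE flag of preg_replace_callback()); replaceTrackingOffsets() is a hypothetical helper name, and the offsets it records are in bytes.

```php
<?php
/**
 * Hypothetical helper (not part of T-Regx): replaces every match with
 * $replacement in a single preg_replace_callback() call and records, for
 * each match, its original offset and its offset in the string being built.
 */
function replaceTrackingOffsets(string $pattern, string $subject, string $replacement): array
{
    $shift = 0;  // net length change caused by earlier replacements
    $log = [];

    $result = preg_replace_callback(
        $pattern,
        function (array $match) use (&$shift, &$log, $replacement) {
            [$text, $offset] = $match[0];
            $log[] = ['offset' => $offset, 'modifiedOffset' => $offset + $shift];
            // Remember how much this replacement grows or shrinks the string.
            $shift += strlen($replacement) - strlen($text);
            return $replacement;
        },
        $subject,
        -1,
        $count,
        PREG_OFFSET_CAPTURE
    );

    return [$result, $log];
}

[$result, $log] = replaceTrackingOffsets(
    '/[A-Z][a-z]+/',
    'Me, Rihanna and my Mom really like Sweden',
    '____'
);
print_r($log);
```

Because only a running integer is kept between callbacks, the extra cost per match is constant and no second pass over the subject is needed.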

Bytes vs. characters

When used on ReplaceDetail (whole match) or ReplaceDetailGroup (capturing group), the modifiedOffset() method returns a position in characters.

To read the position in bytes, use byteModifiedOffset():

$subject = 'Fóó, Lęę, Śćć';
$result = pattern("(\w+)", 'u')->replace($subject)->all()->callback(function (ReplaceDetail $detail) {
    $matchOffset = $detail->byteModifiedOffset();
    $groupOffset = $detail->group(1)->byteModifiedOffset();
    return 'ę';
});
Note: Use modifiedOffset() with multibyte-safe functions like mb_substr(), and byteModifiedOffset() with byte-oriented functions like substr().
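
The difference between byte and character positions can be seen with plain PHP functions. The sketch below does not use T-Regx; it assumes a UTF-8 encoded source file and the mbstring extension, uses \p{L}+ instead of \w+ to match letters, and the variable names are chosen for illustration only. It works on original (unmodified) offsets, but the same pairing of functions applies to the modified ones.

```php
<?php
// Illustrative sketch only: byte offsets vs. character offsets in UTF-8.
$subject = 'Fóó, Lęę, Śćć';

// PREG_OFFSET_CAPTURE reports offsets in bytes, even with the "u" modifier.
preg_match_all('/\p{L}+/u', $subject, $matches, PREG_OFFSET_CAPTURE);

$byteOffsets = [];
$charOffsets = [];
$readBack = [];

foreach ($matches[0] as [$text, $byteOffset]) {
    // Count the characters that precede the match to get a character offset.
    $charOffset = mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8');

    $byteOffsets[] = $byteOffset;
    $charOffsets[] = $charOffset;

    // Byte offsets pair with substr(), character offsets with mb_substr().
    $readBack[] = [
        substr($subject, $byteOffset, strlen($text)),
        mb_substr($subject, $charOffset, mb_strlen($text, 'UTF-8'), 'UTF-8'),
    ];
}

print_r($byteOffsets);
print_r($charOffsets);
```

Mixing the two, for example passing a byte offset to mb_substr(), would read from the wrong position as soon as a multibyte character precedes the match.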
