TransWikia.com

A regex to match what's inside the start and end chars of a PHP class

Stack Overflow Asked on December 11, 2021

I’m having trouble finding a regex that matches the start and end chars of a PHP class, which are { and } respectively.
The regex should not match the { and } if they are inside PHP comments; in other words, it should not match if the { or } is preceded by any char but whitespace.

I suppose I should use a negative lookbehind, but I’m a little rusty on regex, and so far I haven’t found the solution.

Here is my test string:

<?php


namespace LingLight_TaskSchedulerService;


/**
 * The LightTaskSchedulerService class. :{
 */
class LightTaskSchedulerService
{

    /**
     *
     * This method IS the task manager.
     * See the @page(Light_TaskScheduler conception notes) for more details.
     *
     */
    public function run()
    {
        $executionMode = $this->options['executionMode'] ?? "lastOnly";
        $this->logDebug("Executing run method with execution mode \"$executionMode\".");


    }


}


// this can happen in comments: }, why
// more stuff







And my pattern, which doesn’t work at the moment, is this:

    if(preg_match('!^\s*{\s*(.*)(?<![^\s]*)}!ms', $c, $match)){
        a($match);
    }

So, I used the multiline modifier "m", since we need to parse a multiline string,
then I used the "s" modifier so that the dot matches line breaks,
but the negative lookbehind part (?<![^\s]*) doesn’t seem to work.
I’m basically trying to say: don’t match the "}" char if it’s preceded by anything but whitespace.

@Wiktor Stribiżew: I tried this pattern but it still doesn’t work: !^\s*{\s*(.*)(?<!\S)}!ms

Considering Tim Biegeleisen’s comment, I’ll probably take a simpler approach: remove the comments first, and then use the simpler regex !^\s*{\s*(.*)}!ms, which I know will work.
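The "remove the comments first" idea could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: stripPhpComments() is a hypothetical helper name, it relies on PHP's tokenizer extension (token_get_all()) being available, and the Demo class is a made-up sample, not the real file.

```php
<?php
// Hypothetical helper: drops every comment token but keeps the line breaks
// the comments contained, so line numbers stay aligned with the original file.
function stripPhpComments(string $code): string
{
    $out = '';
    foreach (token_get_all($code) as $token) {
        if (!is_array($token)) {
            $out .= $token; // single-character token such as "{" or ";"
            continue;
        }
        [$id, $text] = $token;
        if ($id === T_COMMENT || $id === T_DOC_COMMENT) {
            $out .= str_repeat("\n", substr_count($text, "\n"));
            continue;
        }
        $out .= $text;
    }
    return $out;
}

// Made-up sample with braces hidden in comments.
$source = <<<'EEE'
<?php
/**
 * Demo class. :{
 */
class Demo
{
    public function run()
    {
        // a brace in a comment: }
        return 1;
    }
}

// this can happen in comments: }, why
EEE;

$clean = stripPhpComments($source);

// With the comments gone, the simple greedy pattern from the question
// runs from the first "{" at the start of a line to the last "}" in the file.
if (preg_match('!^\s*{\s*(.*)}!ms', $clean, $match)) {
    echo $match[1];
}
```

Like the rest of this approach, the sketch assumes the file contains a single class; braces inside string literals are not comments and are left in place.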

However, if somebody knows a regex that does it, I would be interested in seeing it.

Problem solved for now, I’m out, thanks guys.

@Wiktor Stribiżew

The weird thing is that your regex works on the regex101 website, but it doesn’t work in my version of php (PHP 7.2.31).

So I mean: this doesn’t work in my php world:

$c = <<<'EEE'
<?php

/**
 * The LightTaskSchedulerService class. :{
 */
class LightTaskSchedulerService
{

    /**
     *
     * This method IS the task manager.
     * See the @page(Light_TaskScheduler conception notes) for more details.
     *
     */
    public function run()
    {
        $executionMode = $this->options['executionMode'] ?? "lastOnly";
        $this->logDebug("Executing run method with execution mode \"$executionMode\".");


    }


}


// this can happen in comments: }, why
// more stuff


EEE;



if(preg_match('/^\s*{\s*(.*)(?<!\S)}$/gms', $c, $match)){
    echo "a match was found"; // is never displayed
}
exit;

So I don’t know what regex101 is using under the hood, but it doesn’t work for me.

UPDATE

As Tim suggested, regex might not be the most appropriate tool for this job.

I ended up using a very simple solution to find the end character, and something similar can be applied to find the start character:

    /**
     * Returns an array containing information related to the end of the class.
     *
     * Important note, this method assumes that:
     *
     * - the parsed php file contains valid php code
     * - the parsed php file contains only one class
     *
     * If either of the above assumptions is not true, then this method won't work properly.
     *
     *
     *
     * The returned array has the following structure:
     *
     *
     * - endLine: int, the number of the line containing the class declaration's last char
     * - lastLineContent: string, the content of the last line being part of the class declaration
     *
     *
     * @return array
     */
    public function getClassLastLineInfo(): array
    {

        $lastLineNumber = null;
        $lastLineContent = null;


        $lines = file($this->file);
        $reversedLines = array_reverse($lines);
        foreach ($reversedLines as $k => $line) {
            if ('}' === trim($line)) {
                $n = count($lines);
                $lastLineNumber = $n - $k;
                $lastLineContent = $line;
                break;
            }
        }

        return [
            "endLine" => $lastLineNumber,
            "lastLineContent" => $lastLineContent,
        ];
    }

With something similar for the start char, we basically can obtain the line numbers of the start and end characters of the class, and armed with those, we can simply get all the lines of the string as an array, and use a combination of array_slice/implode to "recompile" the content of the class.
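The slice-and-implode step described above might look like the sketch below. extractClassBlock() is a hypothetical name, and it makes the same assumptions as the method above (valid PHP, a single class) plus one more: the class's opening and closing braces each sit alone on their own line.

```php
<?php
// Hypothetical helper: $lines is expected in the shape returned by file(),
// i.e. each element still carries its trailing line break.
function extractClassBlock(array $lines): ?string
{
    $start = null;
    $end = null;

    // First line that contains nothing but "{".
    foreach ($lines as $i => $line) {
        if ('{' === trim($line)) {
            $start = $i;
            break;
        }
    }

    // Last line that contains nothing but "}".
    for ($i = count($lines) - 1; $i >= 0; $i--) {
        if ('}' === trim($lines[$i])) {
            $end = $i;
            break;
        }
    }

    if (null === $start || null === $end || $end < $start) {
        return null;
    }

    // "Recompile" the class block from the lines between both braces.
    return implode('', array_slice($lines, $start, $end - $start + 1));
}

$lines = [
    "<?php\n",
    "// stray brace in a comment: {\n",
    "class Demo\n",
    "{\n",
    "    public function run()\n",
    "    {\n",
    "    }\n",
    "}\n",
    "// this can happen in comments: }, why\n",
];

$block = extractClassBlock($lines);
echo $block;
```

A multi-line comment that has a lone } on its own line after the class would still fool this check, so it is only as reliable as the assumptions listed.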

Anyway, thanks for the comments.

One Answer

UPDATE

As people have already stated in the comment section: Regex might not be the best solution to do this. Anyway, you asked for it and I tested it with the class below.

// 1) without class check -> this does not work when there is code on the same line as the opening {
preg_match('/(?:^{(?!\r?\n?\s*\*\/)|{\s*$(?!\r?\n?\s*\*\/)).+^\s*}(?!\r?\n?\s*\*\/)/ms', $c, $match);

// 2) with class check -> this should always work
preg_match('/^[\s\w]+?(?:{(?!\r?\n?\s*\*\/)|{\s*$(?!\r?\n?\s*\*\/)).+^\s*}(?!\r?\n?\s*\*\/)/ms', $c, $match);

// 3) with class check and capturing the second part (non-class-definition) separately -> this should always work
preg_match('/^[\s\w]+?((?:{(?!\r?\n?\s*\*\/)|{\s*$(?!\r?\n?\s*\*\/)).+^\s*}(?!\r?\n?\s*\*\/))/ms', $c, $match);

I recommend using 3).

/**
 * The LightTaskSchedulerService class. :{
 */
class LightTaskSchedulerService implements TaskSchedulerService {
{
    /**
     *
     * This method IS the task manager.
     * See the @page(Light_TaskScheduler conception notes) for more details.
     *
     */
    public function run()
    {
        $executionMode = $this->options['executionMode'] ?? "lastOnly";
        $this->logDebug("Executing run method with execution mode \"$executionMode\".");
        if ($foo) {
            doBar($foo);
        }
        /* multiline */
        // simple one line comment
        // simple one line comment { }
        # another comment
        # another comment}} {
        # another comment{/*}*/
//}
#}
/*}*/
/*{*/
/*
}*/
/*
}
*/
    }
}


// this can happen in comments:}, why
// more stuff
/* multiline hello} hello{
}*/
# singleline{
#}
//}
/*}*/
/**
}*/

Output:

Array
(
    [0] => {
{
    /**
     *
     * This method IS the task manager.
     * See the @page(Light_TaskScheduler conception notes) for more details.
     *
     */
    public function run()
    {
        $executionMode = $this->options['executionMode'] ?? "lastOnly";
        $this->logDebug("Executing run method with execution mode \"$executionMode\".");
        if ($foo) {
            doBar($foo);
        }
        /* multiline */
        // simple one line comment
        // simple one line comment { }
        # another comment
        # another comment}} {
        # another comment{/*}*/
//}
#}
/*}*/
/*{*/
/*
}*/
/*
}
*/
    }
}
)

Your code does not work because it has errors:

  1. Unknown modifier g (for preg_match) => use preg_match_all instead.
  2. $c in your code does not work, since it is not in the PHP scope; write <?php $c = <<<'EEE' ... instead.
  3. The lookbehind in your case did not work, since you can't use the +, * or ? quantifiers inside a lookbehind.
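Points 1 and 3 can be checked in isolation with a small sketch like the one below. The subject string is made up, and the @ operator is only there to silence the compile warnings so the return values are easy to inspect.

```php
<?php
$subject = "a}\n}";

// Point 1: "g" is not a valid modifier for PHP's preg functions, so the
// pattern fails to compile and preg_match() returns false.
$withG = @preg_match('/}/g', $subject);

// preg_match_all() is the "global" variant; it returns the number of matches.
$count = preg_match_all('/}/', $subject, $all);

// Point 3: an unbounded quantifier inside a lookbehind also fails to compile.
$unbounded = @preg_match('/(?<![^\s]*)}/', $subject);

// A single-character lookbehind is fine: this finds a "}" that is not
// preceded by a non-whitespace char, skipping the one right after "a".
$fixed = preg_match('/(?<!\S)}/', $subject, $m, PREG_OFFSET_CAPTURE);

var_dump($withG, $count, $unbounded, $fixed, $m[0][1]);
```

Without the @, PHP emits a warning for both failing patterns, which is usually the quickest way to spot this kind of problem.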

References:

On php.net 'g' is not listed as an option.
Modifier 'g': preg_match_all

I don't think that you even need preg_match_all; a simple preg_match should work, since you only need this one match anyway.

This should work (tested with PHP 7.0.1). It does for me:

preg_match('/^class\s+\w+\s*({.+(?<! )})/ms', $c, $match);
// or:
preg_match('/^class[^{]+({.+(?<! )})/ms', $c, $match);
// or even:
preg_match('/^{.+\r?\n}(?<! )/ms', $c, $match);

print_r($match);

The negative lookbehind in my regex rejects any } that is preceded by a space, so the closing brace has to sit at the very start of its line. This will work unless you want it to behave differently. You need some kind of delimiter anyway, and you also don't want the closing curly brace of an if statement inside your run() method to end the search.
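To illustrate how the greedy .+ and the (?<! ) lookbehind interact, here is a small sketch on a made-up Demo class (not the original file):

```php
<?php
$c = <<<'EEE'
class Demo
{
    public function run()
    {
        if ($foo) {
            doBar($foo);
        }
    }
}

// this can happen in comments: }, why
EEE;

// .+ first runs to the end of the string, then backtracks to the last "}"
// that is not directly preceded by a space. The "}" in the trailing comment
// follows a space and is rejected; the class's own "}" follows a line break.
preg_match('/^class\s+\w+\s*({.+(?<! )})/ms', $c, $match);

echo $match[1];
```

The lookbehind only looks at the single character before the brace, so a comment such as //} at the start of a line after the class would still be picked up.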

print_r output of $match for the first preg_match statement above:

Array
(
    [0] => class LightTaskSchedulerService
{

    /**
     *
     * This method IS the task manager.
     * See the @page(Light_TaskScheduler conception notes) for more details.
     *
     */
    public function run()
    {
        $executionMode = $this->options['executionMode'] ?? "lastOnly";
        $this->logDebug("Executing run method with execution mode \"$executionMode\".");
        if ($foo) {
            doBar($foo);
        }
    }
}
    [1] => {

    /**
     *
     * This method IS the task manager.
     * See the @page(Light_TaskScheduler conception notes) for more details.
     *
     */
    public function run()
    {
        $executionMode = $this->options['executionMode'] ?? "lastOnly";
        $this->logDebug("Executing run method with execution mode \"$executionMode\".");
        if ($foo) {
            doBar($foo);
        }
    }
}
)

Answered by F. Müller on December 11, 2021
