How can I delete all characters falling under /* .... */ including /* & */?

Question

I tried sed and awk, but it's not working because the pattern involves /, which is already used in the command as the delimiter.
Please let me know how I can achieve this.
Below is a sample. We want to remove the commented sections, i.e. /*.....*/
/*This is to print the output
data*/
proc print data=sashelp.cars;
run;
/*Creating dataset*/
data abc;
set xyz;
run;

loudness · Answer

Quasímodo · Answer

The GNU awk manual provides an example with getline that does just that, which I copy here verbatim.
# Remove text between /* and */, inclusive
{
    while ((start = index($0, "/*")) != 0) {
        out = substr($0, 1, start - 1)  # leading part of the string
        rest = substr($0, start + 2)    # ... */ ...    
        while ((end = index(rest, "*/")) == 0) {  # is */ in trailing part?
            # get more text
            if (getline <= 0) {
                print("unexpected EOF or error:", ERRNO) > "/dev/stderr"
                exit
            }
            # build up the line using string concatenation
            rest = rest $0
        }
        rest = substr(rest, end + 2)  # remove comment
        # build up the output line using string concatenation
        $0 = out rest
    }
    print $0
}

Bear in mind that it joins mon/*comment*/key into monkey. As Stéphane Chazelas mentions in his answer, this may change the meaning of the code, so consider changing $0 = out rest to $0 = out " " rest.
Save that in a file, say commentRemove.awk, and run it on an input file:
awk -f commentRemove.awk inputfile
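To see the effect of the suggested " " change, here is a minimal sketch that inlines the same algorithm in a shell function. The name strip_comments_awk is made up for this illustration, and the error message from the manual's version is dropped for brevity.

```shell
# Hypothetical wrapper: the gawk manual's algorithm, except that each
# comment is replaced by a single space instead of being removed outright.
strip_comments_awk() {
    awk '
    {
        while ((start = index($0, "/*")) != 0) {
            out = substr($0, 1, start - 1)   # text before the comment
            rest = substr($0, start + 2)     # text after the opening /*
            while ((end = index(rest, "*/")) == 0) {
                if ((getline) <= 0)          # comment was never closed
                    exit
                rest = rest $0               # pull in the next line
            }
            rest = substr(rest, end + 2)     # drop up to and including */
            $0 = out " " rest                # the space keeps tokens apart
        }
        print $0
    }' "$@"
}

printf '%s\n' 'mon/*comment*/key' | strip_comments_awk
```

With the space in place this should print mon key rather than monkey.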

user5337995 · Answer

Using a one-line sed that deletes every line containing /* or */:

sed '/\/\*/d;/\*\//d' file

proc print data=sashelp.cars;
run;
data abc;
set xyz;
run;

Stéphane Chazelas · Answer

I once came up with this which we can refine to:

perl -0777 -pe '
  BEGIN{
    $bs=qr{(?:\\|\?\?/)};
    $lc=qr{(?:$bs\n|$bs\r\n?)}
  }
  s{
    /$lc*\*.*?\*$lc*/
    | /$lc*/(?:$lc|[^\r\n])*
    | (
         "(?:$bs$lc*.|.)*?"
       | '\''$lc*(?:$bs$lc*(?:\?\?.|.))?(?:\?\?.|.)*?'\''
       | \?\?'\''
       | .[^'\''"/?]*
      )
  }{$1 eq "" ? " " : "$1"}exsg'

to handle a few more corner cases.

Note that if you remove a comment, you could change the meaning of the code (1-/* comment */-1 is parsed like 1 - -1 while 1--1 (which you'd obtain if you removed the comment) would give you an error). It's better to replace the comment with a space character (as we do here) instead of completely removing it.

The above should work properly on this valid ANSI C code for instance that tries to include a few corner cases:

#include <stdio.h>
int main()
{
  printf("%d %s %c%c%c%c%c %s %s %d\n",
  1-/* comment */-1,
  /\
* comment */
  "/* not a comment */",
  /* multiline
  comment */
  '"' /* comment */ , '"',
  '\'','"'/* comment */,
  '\
\
"', /* comment */
  "\\
" /* not a comment */ ",
  "??/" /* not a comment */ ",
  '??''+'"' /* "comment" */);
  return 0;
}

Which gives this output:

#include <stdio.h>
int main()
{
  printf("%d %s %c%c%c%c%c %s %s %d\n",
  1- -1,

  "/* not a comment */",

  '"'   , '"',
  '\'','"' ,
  '\
\
"',  
  "\\
" /* not a comment */ ",
  "??/" /* not a comment */ ",
  '??''+'"'  );
  return 0;
}

Both print the same output when compiled and run.

You can compare with the output of gcc -ansi -E to see what the pre-processor would do on it. That code is also valid C99 or C11 code (however, gcc disables trigraph support by default, so it won't work with gcc unless you specify the standard, like gcc -std=c99 or gcc -std=c11, or add the -trigraphs option).

It also works on this C99/C11 (non-ANSI/C90) code:

// comment
/\
/ comment
// multiline\
comment
"// not a comment"

(compare with gcc -E/gcc -std=c99 -E/gcc -std=c11 -E)

ANSI C didn't support the // form of comment. // is not otherwise valid in ANSI C so wouldn't appear there. One contrived case where // may genuinely appear in ANSI C (as noted there, and you may find the rest of the discussion interesting) is when the stringify operator is in use.

This is valid ANSI C code:

#define s(x) #x
s(//not a comment)

And at the time of the discussion in 2004, gcc -ansi -E did indeed expand it to "//not a comment". However today, gcc-5.4 returns an error on it, so I'd doubt we'll find a lot of C code using this kind of construct.

The GNU sed equivalent could be something like:

lc='([\\%]\n|[\\%]\r\n?)'
sed -zE "
  s/_/_u/g;s/!/_b/g;s/</_l/g;s/>/_r/g;s/:/_c/g;s/;/_s/g;s/@/_a/g;s/%/_p/g;
  s@\?\?/@%@g;s@/$lc*\*@:&@g;s@\*$lc*/@;&@g
  s:/$lc*/:@&:g;s/\?\?'/!/g
  s#:/$lc*\*[^;]*;\*$lc*/|@/$lc*/$lc*|(\"([\\\\%]$lc*.|[^\\\\%\"])*\"|'$lc*([\\\\%]$lc*.)?[^\\\\%']*'|[^'\"@;:]+)#<\5>#g
  s/<>/ /g;s/!/??'/g;s@%@??/@g;s/[<>@:;]//g
  s/_p/%/g;s/_a/@/g;s/_s/;/g;s/_c/:/g;s/_r/>/g;s/_l/</g;s/_b/!/g;s/_u/_/g"

If your GNU sed is too old to support -E or -z, you can replace the first line with:

sed -r ":1;\$!{N;b1}

Luciano Andress Martini · Answer

I think I found an easy solution!

cpp -P yourcommentedfile.txt

SOME UPDATES:

Quote from the user ilkachu (original text from the user comments):

I played a bit with the options for gcc: -fpreprocessed will disable most directives and macro expansions (except #define and #undef apparently). Adding -dD will leave defines in too; and -std=c89 can be used to ignore new style // comments. Even with them, cpp replaces comments with spaces (instead of removing them), and collapses spaces and empty lines.

But I think it is still a reasonable and easy solution for most cases. If you disable macro expansion and the other things, I think you will get good results, and yes, you can combine it with a shell script to get even better ones.
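The options mentioned in that comment can be combined roughly as follows. This is a sketch, not something checked against every gcc version: the function name strip_comments_gcc is made up here, and -x c is added on the assumption that the input should be treated as C whatever its file extension.

```shell
# Hypothetical helper: use the C preprocessor only to strip comments.
#   -fpreprocessed  skip most directive handling and macro expansion
#   -dD             keep #define lines in the output
#   -E              stop after preprocessing
#   -P              do not emit line markers
strip_comments_gcc() {
    gcc -fpreprocessed -dD -E -P -x c "$@"
}

# Usage (yourcommentedfile.txt is the file name used in the answer):
#   strip_comments_gcc yourcommentedfile.txt
#   strip_comments_gcc -std=c89 yourcommentedfile.txt   # leave // alone
```

As the comment notes, expect the whitespace of the result to differ from the input, since comments become spaces and blank lines are collapsed.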

Baba · Answer

with sed:

UPDATE

/\/\*/ {
    /\*\// {
        s/\/\*.*\*\///g;
        b next
    };

:loop;
    /\*\//! {
        N;
        b loop
    };
    /\*\// {
        s/\/\*.*\*\//\n/g
    }
    :next
}

This supports all the possible cases (multi-line comments, with data before and/or after the comment):

e1/*comment*/
-------------------
e1/*comment*/e2
-------------------
/*comment*/e2
-------------------
e1/*com
ment*/
-------------------
e1/*com
ment*/e2
-------------------
/*com
ment*/e2
-------------------
e1/*com
1
2
ment*/
-------------------
e1/*com
1
2
ment*/e2
-------------------
/*com
1
2
ment*/e2
-------------------

run:

$ sed -f command.sed FILENAME

e1
-------------------
e1e2
-------------------
e2
-------------------
e1

-------------------
e1
e2
-------------------

e2
-------------------
e1

-------------------
e1
e2
-------------------

e2
-------------------

JigglyNaga · Answer

sed operates on one line at a time, but some of the comments in the input span multiple lines.  As per https://unix.stackexchange.com/a/152389/90751 , you can first use tr to turn the line-breaks into some other character.  Then sed can process the input as a single line, and you use tr again to restore the line-breaks.

tr '\n' '\0' | sed ... | tr '\0' '\n'

I've used null bytes, but you can pick any character that doesn't appear in your input file.

* has a special meaning in regular expressions, so it will need escaping as \* to match a literal *.

.* is greedy -- it will match the longest possible text, including more */ and /*.  That means the first comment, the last comment, and everything in between.  To restrict this, replace .* with a stricter pattern: comments can contain anything that's not a "*", and also "*" followed by anything that's not a "/".  Runs of multiple *s also have to be accounted for:

tr '\n' '\0' | sed -e 's,/\*\([^*]\|\*\+[^*/]\)*\*\+/,,g' | tr '\0' '\n'

This will remove any linebreaks in the multiline comments, ie.

data1 /* multiline
comment */ data2

will become

data1  data2

If this isn't what was wanted, sed can be told to keep one of the linebreaks.  This means picking a linebreak replacement character that can be matched.

tr '\n' '\f' | sed -e 's,/\*\(\(\f\)\|[^*]\|\*\+[^*/]\)*\*\+/,\2,g' | tr '\f' '\n'

The special character \f, and the use of a back-reference that may not have matched anything, aren't guaranteed to work as intended in all sed implementations.  (I confirmed it works on GNU sed 4.07 and 4.2.2.)
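The difference between the greedy and the stricter pattern can be sketched like this. The function names are made up for the illustration, \001 is used as the stand-in for newlines on the assumption that it never occurs in the input, and the patterns are written with -E (extended regular expressions, accepted by both GNU and BSD sed) so that less escaping is needed than in the commands above.

```shell
# Hypothetical helpers; \001 replaces newlines while sed runs and is
# assumed not to appear in the input.

# Greedy: .* runs from the first /* to the last */.
strip_greedy() {
    tr '\n' '\001' | sed -E 's,/\*.*\*/,,g' | tr '\001' '\n'
}

# Strict: the body is any non-* character, or a run of * followed by a
# character other than * or /; the comment ends with a run of * and a /.
strip_strict() {
    tr '\n' '\001' | sed -E 's,/\*([^*]|\*+[^*/])*\*+/,,g' | tr '\001' '\n'
}

printf '%s\n' 'a /* one */ b /* two */ c' | strip_greedy   # b is lost
printf '%s\n' 'a /* one */ b /* two */ c' | strip_strict   # b is kept
```

The greedy version swallows the code between the two comments, while the strict one removes each comment separately.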

Hans Schou · Answer

$ cat file | perl -pe 'BEGIN{$/=undef}s!/\*.+?\*/!!sg'

proc print data=sashelp.cars;
 run;

data abc;
 set xyz;
 run;

Remove blank lines if any:

$ cat file | perl -pe 'BEGIN{$/=undef}s!/\*.+?\*/\n?!!sg'

Edit - the shorter version by Stephane:

$ cat file | perl -0777 -pe 's!/\*.*?\*/!!sg'

user172564 · Answer

Solution using the sed command and no script

Here you are:

sed 's/\*\//\n&/g' test | sed '/\/\*/,/\*\//d'

N.B. This doesn't work on OS X, unless you install gnu-sed. But it works on Linux Distros.
