Super User Asked by GC_ on December 24, 2020
I was messing around with Notepad++, but can't seem to find an easy way to do this. I think grep might work, but I am not totally sure how.
I have a file that has certain tags, and I want to find all of the tags that have incorrect values. For example:
This is what most of them are.
<tag attr="1">Correct</tag>
However, I want to find all the ones with anything else in them.
<tag attr="1">Wrong</tag>
<tag attr="1">Incorrect</tag>
<tag attr="1">Gibberish</tag>
… etc, etc …
There are thousands of them, but I am just looking for the bad ones. I don't want to look at each manually. Also, more than one tag can be on the same line.
GC
It's better to use an XML parser, but if you want to use Notepad++, this does the job:
<tag[^>]*>(?:(?!Correct|</tag>).)*</tag>
. matches newline
Explanation:
<tag[^>]*> # open tag
# Tempered Greedy Token
(?: # non capture group
(?! # negative lookahead, make sure what follows is not:
Correct # literally Correct
| # OR
</tag> # end tag
) # end lookahead
. # any character
)* # end group, may appear 0 or more times
</tag> # end tag
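For anyone who wants to try the same pattern outside Notepad++, here is a minimal sketch in Python's re module. The sample text and the names BAD_TAG and find_bad_tags are made up for illustration, and re.DOTALL is an assumption so that tag contents may span lines.

```python
import re

# Same pattern as above, written with re.VERBOSE so the comments stay inline.
BAD_TAG = re.compile(
    r"""
    <tag[^>]*>              # open tag
    (?:                     # non-capturing group
        (?!Correct|</tag>)  # stop if "Correct" or the end tag comes next
        .                   # otherwise consume any character
    )*                      # group may appear 0 or more times
    </tag>                  # end tag
    """,
    re.VERBOSE | re.DOTALL,
)

# Hypothetical sample: more than one tag per line, as in the question.
sample = (
    '<tag attr="1">Correct</tag> <tag attr="1">Wrong</tag>\n'
    '<tag attr="1">Gibberish</tag><tag attr="1">Correct</tag>\n'
)


def find_bad_tags(text):
    """Return every <tag>...</tag> whose contents do not contain "Correct"."""
    return BAD_TAG.findall(text)


for match in find_bad_tags(sample):
    print(match)
```

Note that the lookahead rejects a tag as soon as "Correct" appears anywhere in its contents, so something like "Correct but extra" would not be reported by this pattern.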
Screenshot (before):
Screenshot (after):
Correct answer by Toto on December 24, 2020
Find an XML editor that allows XPath searching (I use oXygen), and the query is then //tag[not(.='Correct')].
If you're doing anything with XML, you need to master XPath: working with regular expressions to process XML is inefficient, clumsy, and ultimately it gives the wrong answer - there will always be some way of writing the XML that defeats your regex. For example people doing this with a regex often forget that attributes can be delimited by single quotes rather than double quotes, or that a newline can appear before the ">" in a start tag.
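If no XPath-capable editor is at hand, the same check can be sketched with Python's standard library. This is only an approximation of the query: xml.etree.ElementTree supports a limited subset of XPath, so the not(.='Correct') predicate is applied in Python instead. The sample document, the root element name, and the function name bad_tags are assumptions for illustration; it also assumes the file is well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; a real file would be loaded with ET.parse(path).getroot().
# It deliberately uses single quotes and a newline before ">" in a start tag.
SAMPLE = """<root>
  <tag attr="1">Correct</tag><tag attr='1'>Wrong</tag>
  <tag attr="1"
  >Gibberish</tag>
</root>"""


def bad_tags(xml_text):
    """Return every <tag> element whose string value is not exactly 'Correct'."""
    root = ET.fromstring(xml_text)
    # "".join(el.itertext()) plays the role of "." in the XPath predicate:
    # the concatenated text of the element and its descendants.
    return [el for el in root.iter("tag") if "".join(el.itertext()) != "Correct"]


for el in bad_tags(SAMPLE):
    print(ET.tostring(el, encoding="unicode").strip())
```

Because the parser handles quoting and whitespace inside tags, the single-quoted attribute and the line break before ">" need no special treatment here.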
Answered by Michael Kay on December 24, 2020
Use Ctrl+H (find/replace) with regex turned on. A dot is a single-character wildcard; .* matches everything. If you want to work with line breaks, \r\n or \n will be your friend too. What defines a correct tag's contents? Is it always one word, or a fixed length?
For example, <tag attr="1">....g</tag> is a regex for literally any tag with attr 1 whose contents are 4 characters followed by a g.
Second, <tag attr="1">\r\n....g</tag> is a regex for the same, but with a new line after the opening tag and before the contents of the tag. Some more details would help zero in on the exact regex for Notepad++, so send more details if needed.
You can also do (<tag attr="1">)(....g)(</tag>) to parse out the three sections. $1 etc. is how to address the parsed parts; $1$2$3 pastes the match back literally.
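As a rough illustration of the three-group idea, here is a sketch in Python, where the groups are addressed as \1, \2, \3 in the replacement rather than $1, $2, $3. The literal tag text in the pattern is an assumption about what this answer intended, and the sample line and variable names are made up.

```python
import re

# Three groups: opening tag, contents (four characters followed by "g"), closing tag.
# The tag text inside the pattern is assumed, not taken from a real file.
PATTERN = re.compile(r'(<tag attr="1">)(....g)(</tag>)')

line = '<tag attr="1">Wrong</tag> <tag attr="1">Correct</tag>'

# Pasting all three groups back leaves the text unchanged.
unchanged = PATTERN.sub(r"\1\2\3", line)

# Keeping groups 1 and 3 but swapping the middle rewrites only the contents.
fixed = PATTERN.sub(r"\1Correct\3", line)

print(unchanged)
print(fixed)
```

Only the "Wrong" tag matches here, since its contents are exactly four characters followed by a g; the "Correct" tag is left alone.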
Answered by Alex Roberts on December 24, 2020