@tripleo You’re thinking of #Perl’s “taint mode” (stop your teenage giggling), where outside data is untrusted unless it’s the extracted subpattern match in a #RegularExpression.
If you work with text data in R, the gregexpr() function is essential for pattern matching. It finds all occurrences of a pattern within a string. Key parameters include pattern, text, ignore.case, perl, fixed, and useBytes. You can match characters, ignore case, use advanced regex, and search fixed strings.
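For readers outside R, here is a rough Python analogue of what gregexpr() reports: the start position and length of every match. The helper name `find_all` and its `ignore_case` / `fixed` parameters are made up for this sketch and only loosely mirror R's arguments. Unlike gregexpr(), it returns an empty list rather than -1 when nothing matches.

```python
import re

def find_all(pattern, text, ignore_case=False, fixed=False):
    """Hypothetical helper: list (start, length) for every non-overlapping match.

    Start positions are 1-based to resemble R's convention.
    """
    if fixed:
        # Treat the pattern as a literal string, similar in spirit to fixed = TRUE.
        pattern = re.escape(pattern)
    flags = re.IGNORECASE if ignore_case else 0
    return [
        (m.start() + 1, m.end() - m.start())
        for m in re.finditer(pattern, text, flags)
    ]

print(find_all("an", "banana"))                    # all occurrences of "an"
print(find_all("A", "banana", ignore_case=True))   # case-insensitive search
print(find_all("a.c", "a.c abc", fixed=True))      # "." taken literally
```

With `fixed=True` the dot stops being a wildcard, so only the literal `a.c` is reported.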
In my latest blog post, I cover how to find specific strings in data columns using the str_detect function from the stringr package and base R functions. You'll see practical examples with both grepl for identifying matches and gregexpr for counting occurrences.
I was working on a problem today where I needed to pick out a department and a sub issue that were attached to each other as a single code in a comment.
I first went to use the traditional SUBSTRING(comment_string, 1, 5) IN (my list of codes) but it was slow.
So off to work: I learned something new via #stackoverflow, namely how to write a sargable LIKE that still gives me whatever I want in the narrowed-down results.
I decided to make a blog post out of a problem I worked on a day or two ago and thankfully I was also pointed to another solution from @embiggenData which worked well too.
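A minimal sketch of the rewrite described above, using Python's built-in sqlite3 so it runs anywhere. The table name, index, sample rows, and codes are all invented for illustration, and SQLite's SUBSTR stands in for SUBSTRING. Whether the prefix LIKE actually uses an index depends on the database engine and its collation settings, so this only shows that the two predicates select the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, comment_string TEXT)")
conn.execute("CREATE INDEX idx_comment ON comments (comment_string)")
conn.executemany(
    "INSERT INTO comments (comment_string) VALUES (?)",
    [
        ("HR001 payroll question",),
        ("IT042 laptop will not boot",),
        ("FN007 invoice missing",),
        ("Note about HR001 later in the text",),
    ],
)

codes = ["HR001", "IT042"]  # hypothetical department + sub-issue codes

# Wrapping the column in a function hides it from a plain index on the column.
placeholders = ",".join("?" * len(codes))
substr_rows = conn.execute(
    f"SELECT id FROM comments "
    f"WHERE SUBSTR(comment_string, 1, 5) IN ({placeholders}) ORDER BY id",
    codes,
).fetchall()

# A prefix LIKE leaves the column bare, which is what makes it sargable.
# Codes containing % or _ would need escaping; these sample codes have neither.
like_clause = " OR ".join(["comment_string LIKE ?"] * len(codes))
like_rows = conn.execute(
    f"SELECT id FROM comments WHERE {like_clause} ORDER BY id",
    [code + "%" for code in codes],
).fetchall()

print(substr_rows, like_rows)
```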
I really do enjoy #regex. It always cheers me up. Kinda feels like a cool puzzle.
I learnt regex when I was learning Perl back in the day. But big shout-out to https://www.regular-expressions.info/ and https://regexr.com/ for providing such good resources for me to help my friends and colleagues also learn regex and enjoy writing it.
Any #regex wizards here?
Is there a way to match multiple linebreaks regardless of the content but only if the number of linebreaks exceeds a value like 5?
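One possible answer, sketched in Python and assuming the question means a run of consecutive line breaks (LF or CRLF) with nothing in between: a bounded quantifier such as `{6,}` only matches once the run exceeds 5. The helper names and the default replacement are made up for the example.

```python
import re

# Assumption: "exceeds 5" means six or more consecutive line breaks.
MANY_BREAKS = re.compile(r"(?:\r?\n){6,}")

def has_long_break_run(text):
    """True if the text contains a run of more than 5 line breaks."""
    return MANY_BREAKS.search(text) is not None

def collapse_long_break_runs(text, replacement="\n\n"):
    """Replace each over-long run with a shorter one; shorter runs are untouched."""
    return MANY_BREAKS.sub(replacement, text)

print(has_long_break_run("a" + "\n" * 5 + "b"))
print(has_long_break_run("a" + "\n" * 6 + "b"))
print(repr(collapse_long_break_runs("a" + "\n" * 7 + "b")))
```

If the lines between the breaks may contain whitespace, the group would need to allow for that, for example `(?:[ \t]*\r?\n){6,}`.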
#TIL that adding ? after * turns a #regex quantifier from "greedy" into "lazy" (important for performance, safe validators, and protection against DoS attacks).
I don't know how I missed this bit of knowledge for so long. :blobfoxbox:
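A small sketch of the greedy vs. lazy difference in Python. The sample HTML is invented for illustration. Python's re module does not report step counts the way regex101 does, so this only shows the change in what gets matched.

```python
import re

# Hypothetical input with two separate <script> blocks.
html = (
    "<main>Hello World</main>"
    '<script>console.log("hello!");</script>'
    "<p>More stuff</p>"
    "<script>decoy()</script>"
)

# Greedy: .* grabs as much as possible, then backtracks to the LAST </script>.
greedy_matches = re.findall(r"<script>.*</script>", html, flags=re.DOTALL)

# Lazy: .*? grows one character at a time and stops at the FIRST </script>.
lazy_matches = re.findall(r"<script>.*?</script>", html, flags=re.DOTALL)

print(len(greedy_matches))  # one match spanning both script blocks
print(len(lazy_matches))    # one match per script block
```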
Sure. What follows is a dumb example (executed in https://regex101.com/), but it illustrates my point.
In this particular case you could say that ? is semantically required for <script> because we could have more than one, but many times we don't have this distinction and it still affects how many steps the #regex has to perform.
(Sorry for having the text selected in the 2nd image, I was copying it for the alt of the images 😅 )
[Image: regex101 screenshot of the example. Alt text, partly cut off: <main> Hello World <script>console.log("hello!"); More stuff Just a decoy!] (https://media.hachyderm.io/media_attachments/files/111/914/833/409/432/020/original/3925f50f868f8a82.png)