@hobbypaedagoge let me chime in here 😉
Searching (and replacing) text
Parsing log files
Validating data/input (no, not email addresses!)
Cleaning up data
Extracting information from text files
@maxleibman@aronow this won’t help you procrastinate, but https://regexr.com is one of my favorite tools once I’m close to the expression I want. It lets you test expressions against text you paste in, so you can tweak your expression and see the results change instantly.
Any #regex wizards here?
Is there a way to match multiple linebreaks regardless of the content but only if the number of linebreaks exceeds a value like 5?
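One possible reading of the question, sketched in Python's `re`: "exceeds 5" is taken to mean 6 or more line breaks in a row. "Regardless of the content" is taken to mean that the lines in between may hold only spaces or tabs. Both are assumptions, and the helper name `collapse_breaks` is made up for illustration.

```python
import re

# Assumption: 6+ consecutive line breaks, where the "empty" lines between them
# may still contain spaces or tabs. \r? also accepts Windows line endings.
MANY_BREAKS = re.compile(r"(?:[ \t]*\r?\n){6,}")

def collapse_breaks(text: str, replacement: str = "\n\n") -> str:
    """Replace every run of 6 or more line breaks with `replacement`."""
    return MANY_BREAKS.sub(replacement, text)

sample = "a\n\n\nb" + "\n" * 7 + "c"
print(repr(collapse_breaks(sample)))  # 'a\n\n\nb\n\nc'
```

The run of three breaks is left alone and the run of seven is collapsed. Each repetition must consume a `\n`, so the quantifier cannot backtrack explosively.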
The Python package #regex (not to be confused with the built-in re module) is built on CPython implementation details and does not support #PyPy properly (and the author has announced that he may eventually block building it on PyPy altogether). However, it looks like the package that requires it, #ReAssert, works without any problems with plain re.
Today #Gentoo is switching from imperfectly patching the regex package (and ignoring the edge cases where that doesn't work) to patching re-assert instead. I'd like to send this trivial patch to the author, but, as I've complained before, I was banned at some point. The author can't say why, yet that doesn't stop him from considering the ban justified. Maybe he just proactively bans Linux distribution devs.
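The post doesn't show Gentoo's actual patch. As a hypothetical illustration of "works fine with plain re", here is a common Python idiom: use the third-party `regex` package when it is installed and fall back to the stdlib `re` otherwise. This only works if the code sticks to syntax both engines share. The function `find_numbers` is made up for the example.

```python
# Hypothetical sketch, not the actual Gentoo or re-assert patch:
# prefer `regex` if available (it may not build on PyPy), else use stdlib `re`.
try:
    import regex as re_impl
except ImportError:
    import re as re_impl

def find_numbers(text: str) -> list:
    # \d+ behaves the same in both engines; regex-only features would break the fallback.
    return re_impl.findall(r"\d+", text)

print(find_numbers("a1 b22 c333"))  # ['1', '22', '333']
```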
edit: any #Musicians out there, can you think of any edge-case chords I should test/adjust to catch? This will be part of a Free chord chart organizer, hit me with your worst.
I reached that with some modifications to something I found somewhere, and the impression I've got is that you can use | as a logical OR within a set like that.
I could be wrong there, I'll check up on that. But in the meantime, a real musical chord that fails to properly match against the regex will be more useful to me.
For the moment I'm considering it a solved problem, 'cause now I need to put on my JavaScript wetsuit and implement this in a web UI...
@jpaskaruk | only means OR outside of a set like that. Within a [ ] set, every character is already OR'd (e.g. [abc] matches a or b or c, and [a|b] matches a or | or b).
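A quick check of that difference in Python's `re`. The chord pattern `QUALITY` is a made-up fragment for illustration, not the pattern from the thread.

```python
import re

# Inside [ ], each character is one alternative, so "|" is just a literal.
print(re.findall(r"[a|b]", "a|b|c"))  # ['a', '|', 'b', '|']

# A set matches exactly ONE character, so this cannot match "Cmaj".
print(re.fullmatch(r"[A-G][maj|min]", "Cmaj"))  # None

# Outside a set, "|" alternates between whole sub-patterns.
QUALITY = re.compile(r"[A-G][#b]?(maj|min|m)?")
print(QUALITY.fullmatch("Cmaj").group(1))  # maj
print(QUALITY.fullmatch("F#m").group(1))   # m
```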
@RoundSparrow I understand that, but I don't like memorisation because I'm bad at it. For me it's more that the more I use something, the more I remember it, not because I memorised it.
Another thing (although off-topic): the memory castle technique is usually described as a memorisation technique. But, I don't know, for me it's a storage technique. If I don't pull a memory back out of its storage, I won't even remember it.
Generally I like #RegEx, but there are two huge problems for me with it:
1️⃣ I don't need it often enough, making it hard to remember more complex stuff.
2️⃣ As if 1️⃣ weren't bad enough, every tool and language uses a different dialect of it 😩
@dantleech Are you talking about vimgrep? I'm not even using that; I've set it up to use rg instead. But rg doesn't support lookbehinds, which I had to use today 🙈 At least not unless you set another flag (-P / --pcre2) 😕
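For reference, here is what a lookbehind does, shown in Python's `re`, which supports fixed-width lookbehind without any extra flag. The sample text is made up.

```python
import re

# (?<=\$) asserts that the match is preceded by "$" without including it.
# Python's re requires the lookbehind part to have a fixed width.
prices = re.findall(r"(?<=\$)\d+", "cost $42, tax $7, qty 3")
print(prices)  # ['42', '7']
```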
#IntelliJ always succeeds in surprising me. It has a built-in #RegEx tester that allows testing and changing the expression directly in your code, dealing with all the nasty escaping that is required in Java. It even highlights matching groups. 💚
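To show what that escaping looks like: in a Java string literal every regex backslash must be doubled. Python has the same problem with normal strings but avoids it with raw strings. The two spellings below produce the identical pattern.

```python
import re

# Raw string: backslashes are kept as-is.
RAW = r"\d+\.\d+"
# Normal string: each backslash doubled, as a Java string literal would need.
ESCAPED = "\\d+\\.\\d+"
assert RAW == ESCAPED

print(re.findall(RAW, "v1.2 and 3.14"))  # ['1.2', '3.14']
```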
@hennell@emd It's a similar scenario to the "Qed" BCMath wrapper library I started writing this week, although as a mere chainable interface (and not a fluent DSL interface) mine is a fair bit less complex.
There are times when it might help, and times when it would not provide a benefit.
I really like it when I can reuse the same code for parts 1 and 2, passing different functions as arguments to distinguish the two solutions, so I was satisfied with this one.
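The post doesn't show the puzzle, so the input format ("3x4" pairs) and both scoring rules below are made up. The sketch only illustrates the pattern described: one shared regex-driven solver, with each part supplied as a function argument.

```python
import re
from typing import Callable

# Hypothetical input format: pairs like "3x4" scattered over lines.
PAIR = re.compile(r"(\d+)x(\d+)")

def solve(text: str, score: Callable[[int, int], int]) -> int:
    """Shared driver: find every pair, score it with the given function, sum."""
    return sum(score(int(a), int(b)) for a, b in PAIR.findall(text))

puzzle = "3x4 1x2\n5x5"
print(solve(puzzle, lambda a, b: a + b))  # part 1: 7 + 3 + 10 = 20
print(solve(puzzle, lambda a, b: a * b))  # part 2: 12 + 2 + 25 = 39
```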
If I have a string and want to match all characters between the 10th character and the 48th character, what is the proper #regex for that? [A-Z0-9]{10,48} doesn't work 😭
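`{10,48}` is a quantifier: it means "10 to 48 repetitions of the class" anywhere in the string, not "characters at positions 10 to 48". To select by position, anchor at the start, skip the first nine characters, and capture up to 39 more. This sketch assumes 1-based, inclusive positions. In most languages plain slicing (`s[9:48]` in Python) is the simpler tool.

```python
import re
from typing import Optional

# Assumption: "10th to 48th character" is 1-based and inclusive (39 chars).
# ".{9}" skips the first nine characters; the group then takes up to 39 more.
# re.DOTALL lets "." count newlines as characters too.
WINDOW = re.compile(r".{9}(.{1,39})", re.DOTALL)

def chars_10_to_48(s: str) -> Optional[str]:
    m = WINDOW.match(s)  # match() only tries position 0, so this is anchored
    return m.group(1) if m else None

sample = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnop"
print(chars_10_to_48(sample) == sample[9:48])  # True
```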
@barubary I was renaming some music files. They were named like "0X - Artist Name - Album Name - Title.mp3", and the easiest way to rename them in a batch was the REGEX function in Solid Explorer.
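Solid Explorer's exact replacement syntax may differ. Here is the same idea in Python. The target name format ("Artist Name - Title.mp3") is an assumption, since the post doesn't say what the files were renamed to.

```python
import re

# Captures the four " - "-separated fields of names like
# "03 - Artist Name - Album Name - Title.mp3".
# Lazy groups 2 and 3 stop at the first " - ", so a title containing " - " stays intact.
NAME = re.compile(r"^(\d{2}) - (.+?) - (.+?) - (.+)\.mp3$")

def rename(filename: str) -> str:
    # Illustrative target format; names that don't match are returned unchanged.
    return NAME.sub(r"\2 - \4.mp3", filename)

print(rename("03 - Artist Name - Album Name - Title.mp3"))  # Artist Name - Title.mp3
```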
Solved a *problem with #regex with more regex today.
*Not actually a problem with the regex itself, but with unclear business requirements. Still, to anyone who said I'd regret that DNS regex a month after writing it: I ate that soup today, and honestly it wasn't bad.
My boss: "I need to understand this. I sent you that huge file of messy, regurgitated data, and you handed back a clean CSV, sorted and filtered. Could you send me the script you used to do that?"
Me: "Ah, but I don't have a script."
Him: "But how did you do that?"
Me, very proud: "It's the power of REGEX!"
I love regex. Regex solves everything! Hey, I know: I'll write an HTML parser in regex!
The thing about coding with #regex is that it feels like I'm getting paid to do Sudoku puzzles for a living.
Tip for those who are asked to review code with regex: rather than focusing on the regex itself, ask to see the automated tests it is run against and look for gaps in those tests. Don't get lost in the weeds scrutinizing the pattern unless there's an obvious, significant performance problem.
@vwbusguy My advice is essentially the opposite. Focus on the #regex, at least to get started. Regexes are code. Just like any other programming language, you have to learn the syntax and practice a bit, but the same principles apply as with program code in general.
When reviewing code, start by reading it. If there's something unclear, ask about it. Don't accept a regex consisting of 100 characters in one line without a single space. Compared to most other languages, regex syntax is terse: Few (if any) keywords, lots of symbols. Divide complex regexes into simple parts that are assembled into bigger constructs. You probably wouldn't accept a patch that adds hundreds of lines of unfactored code that has complex logic and nested loops, but no indentation or whitespace and no functions, so why write your regexes this way?
If your language builds regexes from strings, use string concatenation, formatting/indentation, comments, and named variables to make the structure of the pattern clear. If your language has the /x modifier, use it to allow sensible formatting and comments right in the regex (remember to escape any spaces that should match literally with \ or put them in [ ]). If your language supports (?(DEFINE)...) and the (?&foo) syntax for named "regex subroutines", consider using it (but also consider restructuring your code: it might be trying to do too much in a single regex).
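In Python, the /x modifier is `re.VERBOSE`: unescaped whitespace outside character classes is ignored, and `#` starts a comment. The `key = value` pattern below is only an illustrative example.

```python
import re

# Illustrative pattern for "key = value" lines, laid out with re.VERBOSE.
KEY_VALUE = re.compile(
    r"""
    ^ \s*
    (?P<key> [A-Za-z_]\w* )   # identifier-style key
    \s* = \s*                 # '=' with optional surrounding spaces
    (?P<value> .*? )          # value (lazy, so trailing spaces go to the next \s*)
    \s* $
    """,
    re.VERBOSE,
)

m = KEY_VALUE.match("  name =  Alice  ")
print(m.group("key"), m.group("value"))  # name Alice

# In verbose mode a bare space is ignored; "\ " or "[ ]" matches a literal space.
GREETING = re.compile(r"hello\ world[ ]!", re.VERBOSE)
print(bool(GREETING.fullmatch("hello world !")))  # True
```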
Once you understand the structure of the regex and how it is meant to work, it becomes much easier to review the tests: Are there any? Do they cover every input variant, exercising all parts of the regex, both matching and failing? (Failing matches are also relevant for finding performance issues: If a regex finds a match, it usually does so quickly. But a regex with exponential backtracking can take forever to fail because it'll try a huge number of variations before giving up on a string that doesn't match.)
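A minimal sketch of such a test table, using a hypothetical date pattern under review. Every alternative in the pattern gets at least one matching and one failing case.

```python
import re

# Hypothetical pattern under review: an ISO-8601-style date (YYYY-MM-DD).
DATE = re.compile(r"(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")

SHOULD_MATCH = ["2024-01-01", "1999-12-31", "2000-10-20"]
SHOULD_FAIL = ["2024-00-10", "2024-13-01", "2024-01-00",
               "2024-01-32", "24-01-01", "2024-1-01"]

def run_table() -> bool:
    for s in SHOULD_MATCH:
        assert DATE.fullmatch(s), f"expected match: {s}"
    for s in SHOULD_FAIL:
        assert DATE.fullmatch(s) is None, f"expected no match: {s}"
    return True

print(run_table())  # True
```

Writing the table also exposes a gap: `DATE.fullmatch("2024-02-31")` matches, because the pattern doesn't know month lengths. A reviewer should ask whether that is acceptable or needs a non-regex check.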
There is an infamous regex for RFC 822 email addresses out there on the internet[1]. It is thousands of characters long and utterly incomprehensible. However, it was not written manually: It is essentially "object code", assembled by commented code using string concatenation from named variables that follow the structure of the BNF grammar in the RFC. Strive for the latter, not the former.
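The same approach also works on a small scale. Below is a sketch for a simplified, made-up hostname grammar (not the RFC 822 address grammar). Each grammar rule becomes a named string, and the final pattern is concatenated from them. It ignores details such as the overall length limit.

```python
import re

# Each named piece mirrors one rule of a simplified, illustrative grammar:
#   let-dig  = ALPHA / DIGIT
#   ldh      = let-dig / "-"
#   label    = let-dig [ *61ldh let-dig ]   ; 1-63 chars, no edge hyphens
#   hostname = label *( "." label )
LET_DIG = r"[A-Za-z0-9]"
LDH = r"[A-Za-z0-9-]"
LABEL = rf"{LET_DIG}(?:{LDH}{{0,61}}{LET_DIG})?"  # {{0,61}} becomes {0,61}
HOSTNAME = rf"{LABEL}(?:\.{LABEL})*"

HOSTNAME_RE = re.compile(HOSTNAME)

print(bool(HOSTNAME_RE.fullmatch("example.com")))  # True
print(bool(HOSTNAME_RE.fullmatch("-bad.com")))     # False
```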
"I hate #regex, but I think this worked fine. I used #regexxer, a helper to find and replace stuff on multiple files, for those [of us] less well versed with the traditional CLI regex workflow."
Any other tips for user friendly find-and-replace tools?