The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

  • My badges

  • Twitter Updates

  • My Flickr Stream

  • Pages

  • All categories

  • Enter your email address to subscribe to this blog and receive notifications of new posts by email.

    Join 4,263 other subscribers

Archive for December 14th, 2022

Kristian Köhntopp on Twitter: “Basically, show me a Python regex with \d and without ASCII flag, and I can show you a bug, often exploitable.… “

Posted by jpluimers on 2022/12/14

An interesting thought: [Archive] Kristian Köhntopp on Twitter: “Basically, show me a Python regex with \d and without ASCII flag, and I can show you a bug, often exploitable.… “

Basically, input parsing is still very much underrated by most systems and a constant source of peculiarities and therefore bugs, or phrased differently: [Archive] Kristian Köhntopp on Twitter: “In many cases an uncaught exception, and hence a component crash.… “

Kris also states [Archive] Kristian Köhntopp on Twitter: “Again, Python is not alone in this. Perl, when “use utf8;” is active (which it should) also does this, so every single fucking Regex needs a ‘/a‘ at the end. Nobody ever asked \d to match tengwar or klingon numeric symbols.… “.

The point is in the last few words as Arabic numerals are so white spread over the world that the ten digits 0, 1, 2, 3, 4, 5, 6, 7, 8 , 9 they represent should be the de facto \d pattern, but aren’t in Python as per [Wayback/Archive] re — Regular expression operations — Python 3.10.0 documentation: /d (emphasis mine):

Read the rest of this entry »

Posted in Development, Perl, Python, RegEx, Scripting, Software Development | Leave a Comment »