I responded to [Wayback/Archive] jilles.com on Twitter: “@0xD4ni @Twitter What is the regexp for an emoticon ?” with [Wayback/Archive] Jeroen Wiert Pluimers on Twitter: “@jilles_com @0xD4ni @Twitter \p{So}+ See …”.
I got the answer from [Wayback/Archive] java – What is the regex to extract all the emojis from a string? – Stack Overflow (thanks [Wayback/Archive] vishalaksh, and [Wayback/Archive] Desgard_Duan) which refers to the quoted section below.
Note that correctly matching highly depends on the versions of the libraries you use: there have been lots of releases of Unicode versions over the last years (since 2014 roughly every 12 months) each usually adding more Emoji.
In addition, many Emoji are not single Unicode codepoints: often they are code points (with or without any of the variation selectors) stacked on top of each other with zero-width joiners like I described in Kris on Twitter: “Company chat: »Right, we need more languages with Emoji as variable type indicators and pointer symbols.«….
I tried fiddling on [Wayback/Archive] regex101: build, test, and debug regex and could not always getting it to work as I hoped for, but also could not figure out how recent their libraries are.





