Ah, I missed that. The original re2c regex is:
[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*
I’m not sure what the equivalent is in Emacs Lisp, though, and I know PHP does not fully support UTF-8 yet.
Is this the equivalent?
"[a-zA-Z_\u0080-\u00FF][a-zA-Z0-9_\u0080-\u00FF]*"
On 17 July 2019 at 07:43, Christian Johansson <christian@cvj.se> wrote:
Thanks for your review. I think I have fixed all those items now and pushed them to ELPA.
(defvar phps-mode-lexer-LABEL
"[a-zA-Z_\u0080-\u00FF][a-zA-Z0-9_\x80-\xff]*"
Unfinished?

It looks like PHP implicitly accepts any Unicode character at or above U+0080 in labels: the re2c rule includes 80-ff at the byte level, and in practice most PHP code is UTF-8, so every byte of a multibyte character falls in that range. So your regexp would probably be something like

"[A-Za-z_[:nonascii:]][0-9A-Za-z_[:nonascii:]]*"

You could always try it and see whether your code correctly handles, say, $γνῶσις.
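A minimal sketch for trying this in Emacs itself, comparing the Latin-1 range with the `[:nonascii:]` suggestion on the label part of $γνῶσις. The names `my-php-label-latin1`, `my-php-label` and `my-php-whole-label-p` are hypothetical, not part of phps-mode, and it assumes the lexer matches the `$` separately before the label:

```elisp
;; Hypothetical names, for illustration only (not phps-mode's API).
(defconst my-php-label-latin1
  "[a-zA-Z_\u0080-\u00FF][a-zA-Z0-9_\u0080-\u00FF]*"
  "Label regexp restricted to characters U+0080..U+00FF (Latin-1 only).")

(defconst my-php-label
  "[A-Za-z_[:nonascii:]][0-9A-Za-z_[:nonascii:]]*"
  "Label regexp accepting any non-ASCII character.")

(defun my-php-whole-label-p (regexp name)
  "Return non-nil if REGEXP matches the whole of NAME."
  ;; Bind `case-fold-search' so the result does not depend on user settings.
  (let ((case-fold-search nil))
    (string-match-p (concat "\\`\\(?:" regexp "\\)\\'") name)))

(my-php-whole-label-p my-php-label-latin1 "γνῶσις") ; => nil
(my-php-whole-label-p my-php-label "γνῶσις")        ; => 0
(my-php-whole-label-p my-php-label "9abc")          ; => nil
```

The Latin-1 version fails on the very first character, γ (U+03B3), because `\u0080-\u00FF` in an Emacs regexp is a range of characters, not bytes, whereas re2c's `\x80-\xff` accepts every byte of any UTF-8 encoded character.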