From: Christian Johansson
To: Mattias Engdegård
Cc: Emacs developers
Newsgroups: gmane.emacs.devel
Subject: Re: [ELPA] New package: phps-mode
Date: Wed, 17 Jul 2019 11:36:18 +0200
References: <713AB63A-5E2C-4D07-9D96-AFB7DD1ADEDA@acm.org>

Ah, I missed that. The original re2c regex is

[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*

(from https://github.com/php/php-src/blob/master/Zend/zend_language_scanner.l#L1252)

I'm not sure what the equivalent is in Emacs Lisp, but I know PHP does not fully support UTF-8 yet.

Is the equivalent

"[a-zA-Z_\u0080-\u00FF][a-zA-Z0-9_\u0080-\u00FF]*"

?

> On 17 July 2019 at 10:44, Mattias Engdegård <mattiase@acm.org> wrote:
>
>> On 17 July 2019 at 07:43, Christian Johansson <christian@cvj.se> wrote:
>>
>> Thanks for your review, I should have fixed all those items now and pushed them to ELPA
>
>> (defvar phps-mode-lexer-LABEL
>>   "[a-zA-Z_\u0080-\u00FF][a-zA-Z0-9_\x80-\xff]*"
>
> Unfinished?
>
> It looks like PHP accepts any Unicode character above and including U+0080 in labels implicitly, by including 80-ff at the byte level and the implicit fact that most PHP code is in UTF-8. So your regexp would probably be something like
>
> "[A-Za-z_[:nonascii:]][0-9A-Za-z_[:nonascii:]]*"
>
> You could always try and see if your code correctly treats $γνῶσις, say.
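
The difference between the two candidate regexps can be checked directly in Emacs Lisp. The following is only a minimal sketch of that check, not phps-mode code: the names my-php-label-regexp, my-php-label-latin1-regexp and my-php-label-p are hypothetical and were made up for this illustration. It assumes the file is read as UTF-8 and that a label is tested as a whole string.

```elisp
;; Sketch only: all `my-php-' names are hypothetical, not part of phps-mode.

;; The regexp suggested in the thread: any non-ASCII character is allowed.
(defconst my-php-label-regexp
  "[A-Za-z_[:nonascii:]][0-9A-Za-z_[:nonascii:]]*"
  "Regexp for a PHP label using the [:nonascii:] character class.")

;; The literal translation of the byte range 80-ff into code points.
;; It only covers U+0080..U+00FF, not all characters that UTF-8
;; encodes with bytes in the range 80-ff.
(defconst my-php-label-latin1-regexp
  "[a-zA-Z_\u0080-\u00FF][a-zA-Z0-9_\u0080-\u00FF]*"
  "Regexp for a PHP label restricted to code points up to U+00FF.")

(defun my-php-label-p (regexp string)
  "Return non-nil if STRING as a whole matches REGEXP."
  ;; Bind `case-fold-search' to nil so case folding cannot widen the ranges.
  (let ((case-fold-search nil))
    (string-match-p (concat "\\`\\(?:" regexp "\\)\\'") string)))

(my-php-label-p my-php-label-regexp "γνῶσις")        ; => 0 (match)
(my-php-label-p my-php-label-regexp "1foo")          ; => nil
(my-php-label-p my-php-label-latin1-regexp "café")   ; => 0 (match)
(my-php-label-p my-php-label-latin1-regexp "γνῶσις") ; => nil
```

The last form shows why the code-point range is too narrow: γ is U+03B3, which lies outside U+0080..U+00FF even though every byte of its UTF-8 encoding falls in 80-ff, which is all the re2c scanner looks at.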