unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
@ 2017-06-16 10:00 Vincent Belaïche
  2017-06-16 12:59 ` Eli Zaretskii
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Vincent Belaïche @ 2017-06-16 10:00 UTC
  To: 27391; +Cc: Vincent Belaïche





================================================================================

I was editing a file written in Markdown. Here is the file:

https://framagit.org/latex-pourquoi-comment/lpc-articles/blob/795ecb9d4f7b8870486fe6557f01d2fe450c4461/CONTRIBUTING.md

My Emacs configuration defaults to reading files as latin-1, so I added
a `coding: utf-8' cookie to this file. It did not work: the file was
still read as latin-1 instead of UTF-8.

I tested with one more cookie, `eval: (message "Hello")', and that one
worked. So the cookies are being read; the problem lies in how the
coding system is applied.

The only way I have found to get the correct encoding is to put this in
my init file:

(modify-coding-system-alist 'file "\\.m\\(d\\|arkdown\\)\\'"
  'prefer-utf-8)
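One way to confirm that the workaround above is registered, without visiting a file, is to ask `find-operation-coding-system` which coding system `insert-file-contents` would use for a given file name. This is only an illustrative sketch: the helper name `my-markdown-coding-for` is hypothetical, and the `let` binding keeps the change out of the global `file-coding-system-alist`.

```elisp
;; Hypothetical helper: apply the workaround inside a dynamic `let'
;; (file-coding-system-alist is a special variable), then ask which
;; (DECODING . ENCODING) pair insert-file-contents would pick for FILE
;; based on its name alone.
(defun my-markdown-coding-for (file)
  "Return the coding-system pair chosen from FILE's name."
  (let ((file-coding-system-alist file-coding-system-alist))
    (modify-coding-system-alist 'file "\\.m\\(d\\|arkdown\\)\\'"
                                'prefer-utf-8)
    (find-operation-coding-system 'insert-file-contents file)))

(my-markdown-coding-for "CONTRIBUTING.md")
;; => (prefer-utf-8 . prefer-utf-8)
```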

I tried with `emacs -q', and the problem is still there, which shows
that markdown-mode is not to blame. My first thought was that
markdown-mode was the culprit; see the discussion here:
https://github.com/jrblevin/markdown-mode/issues/198

Jason Blevins, the author of markdown-mode, noted that the presence of
the [ character has some impact. See:

https://github.com/jrblevin/markdown-mode/issues/198#issuecomment-308524696

I have not double-checked his analysis. To me this looks like some kind
of race condition in which the automatic encoding detection is applied
after the cookie and undoes it. Maybe a semaphore or a similar guard is
missing.

   Vincent.

================================================================================


In GNU Emacs 25.2.50.1 (i686-pc-mingw32)
 of 2017-06-14 built on AIGLEROYAL
Repository revision: da62c1532e479bbac4ce242ee1d170df9c435591
Windowing system distributor 'Microsoft Corp.', version 10.0.14393
Configured using:
 'configure --prefix=c:/Nos_Programmes/GNU/Emacs --without-jpeg
 --without-tiff --without-gif --without-png 'CFLAGS= -Og -g3 -L
 C:/Programmes/installation/emacs-install/libXpm-3.5.8/src' 'CPPFLAGS=
 -DFOR_MSW=1 -I
 C:/Programmes/installation/emacs-install/libXpm-3.5.8/include -I
 C:/Programmes/installation/emacs-install/libXpm-3.5.8/src -L
 C:/Programmes/installation/emacs-install/libXpm-3.5.8/src''

Configured features:
XPM SOUND NOTIFY ACL TOOLKIT_SCROLL_BARS

Important settings:
  value of $LANG: FRA
  locale-coding-system: cp1252

Major mode: Dired by name

Minor modes in effect:
  diff-auto-refine-mode: t
  TeX-PDF-mode: t
  shell-dirtrack-mode: t
  recentf-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  line-number-mode: t
  transient-mark-mode: t

Recent messages:
Mark set [2 times]
Mark saved where search started
Quit
scroll-up-command: End of buffer
Mark set
find-dired *Find* finished.
dired-get-file-for-visit: No file on this line [2 times]
Mark set
Quit
Making completion list...

Load-path shadows:
c:/Programmes/installation/cedet-install/cedet-git/lisp/speedbar/loaddefs hides c:/Nos_Programmes/GNU/Emacs/share/emacs/25.2.50/lisp/loaddefs
c:/Programmes/installation/cedet-install/cedet-git/lisp/speedbar/loaddefs hides c:/Programmes/installation/cedet-install/cedet-git/lisp/cedet/loaddefs

Features:
(shadow emacsbug find-dired calc-yank calc-mode calccomp calc-alg
calc-vec calc-aent calc-menu cal-move whitespace perl-mode log-edit
pcvs-util eieio-opt speedbar sb-image ezimage dframe vc-bzr vc-src
vc-sccs vc-svn vc-rcs vc-dir ewoc add-log org-element org-rmail org-mhe
org-irc org-info org-gnus org-docview doc-view subr-x jka-compr
image-mode org-bibtex bibtex org-bbdb org-w3m org org-macro org-footnote
org-pcomplete org-list org-faces org-entities org-version ob-emacs-lisp
ob ob-tangle ob-ref ob-lob ob-table ob-exp org-src ob-keys ob-comint
ob-core ob-eval org-compat org-macs org-loaddefs find-func cal-menu
calendar cal-loaddefs tex-info texinfo vc vc-dispatcher ediff-vers
thingatpt rect visual-basic-mode sh-script smie executable make-mode
misearch multi-isearch ediff-merg ediff-wind ediff-diff ediff-mult
ediff-help ediff-init ediff-util ediff vc-git diff-mode reftex-dcr
reftex reftex-vars preview prv-emacs noutline outline pcmpl-unix
latexenc tex-bar latex easy-mmode tex-style toolbar-x font-latex
plain-tex tex-buf tex advice tex-mode compile shell pcomplete comint
ansi-color ring bbdb-print info mailalias smtpmail sort ispell vc-cvs
hl-line balance eieio-compat calc-forms dired-aux mail-extr bbdb-message
sendmail gnus-async qp gnus-ml cursor-sensor nndraft nnmh nnfolder
bbdb-gnus bbdb-mua bbdb-com crm network-stream nsm auth-source eieio
eieio-core starttls gnus-agent gnus-srvr gnus-score score-mode nnvirtual
gnus-msg gnus-art mm-uu mml2015 mm-view mml-smime smime dig mailcap nntp
gnus-cache gnus-sum gnus-group gnus-undo gnus-start gnus-cloud nnimap
nnmail mail-source tls gnutls utf7 netrc nnoo parse-time gnus-spec
gnus-int gnus-range message dired-x dired format-spec rfc822 mml mml-sec
password-cache epg mm-decode mm-bodies mm-encode mail-parse rfc2231
rfc2047 rfc2045 ietf-drums mailabbrev gmm-utils mailheader gnus-win gnus
gnus-ems nnheader gnus-util mail-utils mm-util help-fns mail-prsvr
edmacro kmacro skeleton calc-misc calc-arith calc-ext calc calc-loaddefs
calc-macs tex-mik preview-latex tex-site auto-loads bbdb bbdb-site
timezone bbdb-loaddefs template w32utils cl-seq cl-macs cl recentf
tree-widget wid-edit load-path-to-cedet-svn finder-inf package
epg-config seq byte-opt gv bytecomp byte-compile cl-extra help-mode
easymenu cconv cl-loaddefs pcase cl-lib time-date mule-util tooltip
eldoc electric uniquify ediff-hook vc-hooks lisp-float-type mwheel
dos-w32 ls-lisp disp-table w32-win w32-vars term/common-win tool-bar dnd
fontset image regexp-opt fringe tabulated-list newcomment elisp-mode
lisp-mode prog-mode register page menu-bar rfn-eshadow timer select
scroll-bar mouse jit-lock font-lock syntax facemenu font-core frame
cl-generic cham georgian utf-8-lang misc-lang vietnamese tibetan thai
tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian
slovak czech european ethiopic indian cyrillic chinese charscript
case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote w32notify w32 multi-tty
make-network-process emacs)

Memory information:
((conses 8 899957 158092)
 (symbols 32 53590 0)
 (miscs 32 2257 2796)
 (strings 16 133750 20600)
 (string-bytes 1 5975277)
 (vectors 8 55330)
 (vector-slots 4 1716681 54830)
 (floats 8 651 494)
 (intervals 28 72632 8079)
 (buffers 516 78))

---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus







* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 10:00 bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file Vincent Belaïche
@ 2017-06-16 12:59 ` Eli Zaretskii
  2017-06-16 14:08 ` Vincent Belaïche
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: Eli Zaretskii @ 2017-06-16 12:59 UTC
  To: Vincent Belaïche; +Cc: 27391

> From: vincent.belaiche@gmail.com (Vincent Belaïche)
> Date: Fri, 16 Jun 2017 12:00:06 +0200
> Cc: Vincent Belaïche <vincent.belaiche@gmail.com>
> 
> I was editing some file written in Markdown. Here is the file :
> 
> https://framagit.org/latex-pourquoi-comment/lpc-articles/blob/795ecb9d4f7b8870486fe6557f01d2fe450c4461/CONTRIBUTING.md
> 
> My Emacs default configuration was to get files in latin-1. So I had
> added some `coding: utf-8' cookie in this file. But it did not work, the
> file was still read in latin-1 instead of utf8.

I cannot reproduce this, and I don't see any coding cookies in the
file I downloaded.

Please provide a minimal recipe to reproduce the
problem.  In particular, since you tried in "emacs -q", I don't
understand what it means that your default configuration is
latin-1: in "emacs -q" your default configuration is determined by
your system locale.

Thanks.






* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 10:00 bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file Vincent Belaïche
  2017-06-16 12:59 ` Eli Zaretskii
@ 2017-06-16 14:08 ` Vincent Belaïche
  2017-06-16 14:10   ` Vincent Belaïche
  2017-06-16 18:38   ` Eli Zaretskii
  2017-06-16 21:27 ` Vincent Belaïche
  2017-06-16 22:09 ` Vincent Belaïche
  3 siblings, 2 replies; 21+ messages in thread
From: Vincent Belaïche @ 2017-06-16 14:08 UTC
  To: 27391, Eli Zaretskii; +Cc: Vincent Belaïche

[-- Attachment #1: Type: text/plain, Size: 2227 bytes --]

On 16/06/2017 at 14:59, Eli Zaretskii wrote:
>> From: vincent.belaiche@gmail.com (Vincent Belaïche)
>> Date: Fri, 16 Jun 2017 12:00:06 +0200
>> Cc: Vincent Belaïche <vincent.belaiche@gmail.com>
>>
>> I was editing some file written in Markdown. Here is the file :
>>
>> https://framagit.org/latex-pourquoi-comment/lpc-articles/blob/795ecb9d4f7b8870486fe6557f01d2fe450c4461/CONTRIBUTING.md
>>
>> My Emacs default configuration was to get files in latin-1. So I had
>> added some `coding: utf-8' cookie in this file. But it did not work, the
>> file was still read in latin-1 instead of utf8.
>
> I cannot reproduce this, and I don't see any coding cookies in the
> file I downloaded.
>
> Please provide a minimal recipe that's required to reproduce the
> problem.  In particular, since you tried in "emacs -q", I don't
> understand what does it mean that your default configuration is
> latin-1: in "emacs -q" your default configuration is determined by
> your system locale.
>
> Thanks.

Attached is the file causing the issue. The recipe is simply to visit
the file with emacs -q: you will see that the encoding is not applied.

For instance, I get the following doc section:

--8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----
### doc
Placez dans *doc* et ses sous-rÃ©pertoires toute la documentation affÃ©rente au projet, sans oublier les notes et courriers Ã©lectroniques importants. Vous pouvez avoir des sous-rÃ©pertoires de doc contenant diffÃ©rents types de documents ou pour diffÃ©rentes phases du projet.
--8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----

Instead of:

--8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----
### doc
Placez dans *doc* et ses sous-répertoires toute la documentation afférente au projet, sans oublier les notes et courriers électroniques importants. Vous pouvez avoir des sous-répertoires de doc contenant différents types de documents ou pour différentes phases du projet.
--8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----

  Vincent.




[-- Attachment #2: CONTRIBUTING.md --]
[-- Type: text/plain, Size: 2416 bytes --]

Guide de contribution
=====================

WorkFlow
--------
Ce projet utilise Git-flow au pied de la lettre:
* http://nvie.com/posts/a-successful-git-branching-model/

L'article de base qui donnera naissance au projet


* https://danielkummer.github.io/git-flow-cheatsheet/index.fr_FR.html

Aide mémoire français (et en d'autre traduction).


Contributions
-------------
Libre à vous de cloner le dépôt... Et de proposer des modifications.


Conventions de nomnage
======================

Arborescence de fichier
-----------------------

### doc
Placez dans *doc* et ses sous-répertoires toute la documentation afférente au projet, sans oublier les notes et courriers électroniques importants. Vous pouvez avoir des sous-répertoires de doc contenant différents types de documents ou pour différentes phases du projet.

Si vous avez besoin de documentation externe, envisager de la copier ici. Cela rendra service pour maintenir le projet si l'endroit où les données en questions étaient accessibles disparaît.


### src
Ce répertoire contient le code source du projet. Vous pouvez y faire des sous-répertoires pour différents types de code source, par exemple:

* src/inc
* src/img
* ...


### util
Répertoire contenant les utilitaires, outils et scripts spécifiques au projet.


### vendor
Si le projet utilise des bibliothèques fournies par une partie tierce ou des fichiers d'en-têtes que vous désirez archiver avec votre code, faites-le ici.


Gestionnaire de version
-----------------------
Le workflow git suit scrupuleusement git-flow.


### Branche **master**
Elle représente le dernier état installable en production du projet. Seul les administrateurs du dépôt peuvent travailler dans cette branche.


### Branche **devel**
La branche où est récolté le travail de tout le monde, des branches de développement privées. Seul la "Team" peut travailler dans cette branche.


### les branches **feature**
Chaque branche doit être Nommée de la manière suivante:

* PSEUDO-DESCRIPTION

où:

* **PSEUDO** est le pseudo de l'administrateur (le créateur) de la branche
* **DESCRIPTION** Une description en CamelCase (RaisonCreationBranche) de cette branche



[comment]: # ( Local Variables: )
[comment]: # ( coding: utf-8	)
[comment]: # ( eval: (message "Coucou")	)
[comment]: # ( End:		)

				


* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 14:08 ` Vincent Belaïche
@ 2017-06-16 14:10   ` Vincent Belaïche
  2017-06-16 18:38   ` Eli Zaretskii
  1 sibling, 0 replies; 21+ messages in thread
From: Vincent Belaïche @ 2017-06-16 14:10 UTC
  To: 27391, Eli Zaretskii

On 16/06/2017 at 16:08, Vincent Belaïche wrote:
> [...]
Just to clarify: you needed to click the "open raw" button to see the
cookie. I should have sent you this link:

https://framagit.org/latex-pourquoi-comment/lpc-articles/raw/795ecb9d4f7b8870486fe6557f01d2fe450c4461/CONTRIBUTING.md

instead of the equivalent "viewer" link, where the Markdown markup is
rendered as formatting.

The cookies are not visible through the viewer link because they are
commented out, so the viewer does not display them.

   V.






* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 14:08 ` Vincent Belaïche
  2017-06-16 14:10   ` Vincent Belaïche
@ 2017-06-16 18:38   ` Eli Zaretskii
  2017-06-16 19:08     ` Vincent Belaïche
  2017-06-16 19:15     ` Vincent Belaïche
  1 sibling, 2 replies; 21+ messages in thread
From: Eli Zaretskii @ 2017-06-16 18:38 UTC
  To: Vincent Belaïche; +Cc: 27391

> From: vincent.belaiche@gmail.com (Vincent Belaïche)
> Cc: Vincent Belaïche <vincent.belaiche@gmail.com> 
> Date: Fri, 16 Jun 2017 16:08:09 +0200
> 
> Attached is the file causing the issue. Recipe is just to visit the file
> with emacs -q, and you see that the encoding is not taken.

Your fancy comment causes this: remove the leading '[' and the problem
goes away.  Looks like regex-quoting that somehow misfires.






* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 18:38   ` Eli Zaretskii
@ 2017-06-16 19:08     ` Vincent Belaïche
  2017-06-16 19:15     ` Vincent Belaïche
  1 sibling, 0 replies; 21+ messages in thread
From: Vincent Belaïche @ 2017-06-16 19:08 UTC
  To: Eli Zaretskii; +Cc: 27391



On 16/06/2017 at 20:38, Eli Zaretskii wrote:
>> From: vincent.belaiche@gmail.com (Vincent Belaïche)
>> Cc: Vincent Belaïche <vincent.belaiche@gmail.com>
>> Date: Fri, 16 Jun 2017 16:08:09 +0200
>>
>> Attached is the file causing the issue. Recipe is just to visit the file
>> with emacs -q, and you see that the encoding is not taken.
> Your fancy comment causes this: remove the leading '[' and the problem
> goes away.  Looks like regex-quoting that somehow misfires.


I used this style of comment marker after reading this discussion:

https://stackoverflow.com/questions/4823468/comments-in-markdown

   V.








* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 18:38   ` Eli Zaretskii
  2017-06-16 19:08     ` Vincent Belaïche
@ 2017-06-16 19:15     ` Vincent Belaïche
  2017-06-16 19:31       ` Andreas Schwab
  2017-06-16 19:37       ` Vincent Belaïche
  1 sibling, 2 replies; 21+ messages in thread
From: Vincent Belaïche @ 2017-06-16 19:15 UTC
  To: Eli Zaretskii; +Cc: 27391



On 16/06/2017 at 20:38, Eli Zaretskii wrote:
>> From: vincent.belaiche@gmail.com (Vincent Belaïche)
>> Cc: Vincent Belaïche <vincent.belaiche@gmail.com>
>> Date: Fri, 16 Jun 2017 16:08:09 +0200
>>
>> Attached is the file causing the issue. Recipe is just to visit the file
>> with emacs -q, and you see that the encoding is not taken.
> Your fancy comment causes this: remove the leading '[' and the problem
> goes away.  Looks like regex-quoting that somehow misfires.

After some investigation, it seems that the bug is in regexp-quote:

(regexp-quote "[comment]: # (")

outputs

"^\\[comment]: # ( "

instead of

"^\\[comment\\]: # ( "


   Vincent.










* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 19:15     ` Vincent Belaïche
@ 2017-06-16 19:31       ` Andreas Schwab
  2017-06-16 19:37       ` Vincent Belaïche
  1 sibling, 0 replies; 21+ messages in thread
From: Andreas Schwab @ 2017-06-16 19:31 UTC
  To: Vincent Belaïche; +Cc: 27391

On Jun 16 2017, Vincent Belaïche <vincent.belaiche@gmail.com> wrote:

> After some investigation, it seems that the bug is in regexp-quote:
>
> (regexp-quote "[comment]: # (")
>
> outputs
>
> "^\\[comment]: # ( "
>
> instead of
>
> "^\\[comment\\]: # ( "

But `]' is not special.

(string-match "^\\[comment]: # ( " "[comment]: # ( ") => 0

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."
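Andreas's point can be checked directly: `regexp-quote` escapes `[` but leaves `]` untouched, and that is sufficient, because a `]` that does not close a bracket expression is an ordinary character. A minimal sketch (the helper name `my-check-bracket-quoting` is hypothetical):

```elisp
;; Hypothetical helper: quote TEXT with `regexp-quote' and check that
;; the result, anchored at both ends, matches TEXT itself.
(defun my-check-bracket-quoting (text)
  "Return a list (QUOTED MATCH-POSITION) for TEXT."
  (let ((re (regexp-quote text)))
    ;; Only `[' gets a backslash; a lone `]' matches literally.
    (list re (string-match (concat "\\`" re "\\'") text))))

(my-check-bracket-quoting "[comment]: # (")
;; => ("\\[comment]: # (" 0)
```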






* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 19:15     ` Vincent Belaïche
  2017-06-16 19:31       ` Andreas Schwab
@ 2017-06-16 19:37       ` Vincent Belaïche
  1 sibling, 0 replies; 21+ messages in thread
From: Vincent Belaïche @ 2017-06-16 19:37 UTC
  To: Eli Zaretskii; +Cc: 27391



On 16/06/2017 at 21:15, Vincent Belaïche wrote:
> [...]
After some more investigation, I think the bug is in the function
insert-file-contents in fileio.c, which decides and sets the coding
system well before the other local variables are examined.








* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 10:00 bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file Vincent Belaïche
  2017-06-16 12:59 ` Eli Zaretskii
  2017-06-16 14:08 ` Vincent Belaïche
@ 2017-06-16 21:27 ` Vincent Belaïche
  2017-06-16 21:34   ` Philipp Stephani
  2017-06-16 22:09 ` Vincent Belaïche
  3 siblings, 1 reply; 21+ messages in thread
From: Vincent Belaïche @ 2017-06-16 21:27 UTC
  To: Eli Zaretskii, 27391; +Cc: Vincent Belaïche



On 16/06/2017 at 21:37, Vincent Belaïche wrote:
>
>
> On 16/06/2017 at 21:15, Vincent Belaïche wrote:
>>

[...]

>>
>>
> After some more investigation, I think that the bug is in function
> insert-file-contents of fileio.c which is the one that decide and sets
> the coding system well before the other local variables are looked into. 

After some more investigation: in the end, the function called to detect
the coding is find-auto-coding in mule.el, and it searches with a regexp
held in a local variable named re-coding.

Here is a test function defining the same regexp.


(defun doit ()
  (interactive)
  (let* ((prefix (regexp-quote "[comment]: # ("))
	 (suffix (regexp-quote ")"))
	 (re-coding
	  (concat
	   "[\r\n]" prefix
	   ;; N.B. without the \n below, the regexp can
	   ;; eat newlines.
	   "[ \t]*coding[ \t]*:[ \t]*\\([^ \t\r\n]+\\)[ \t]*"
	   suffix "[\r\n]")))
    (message (if (looking-at re-coding) "ok" "nak"))))

I tried it with point at the end of the line

[comment]: # ( Local Variables: )

and it answered "ok". Then I redefined it with re-search-forward instead
of looking-at:

(defun doit ()
  (interactive)
  (let* ((prefix (regexp-quote "[comment]: # ("))
	 (suffix (regexp-quote ")"))
	 (re-coding
	  (concat
	   "[\r\n]" prefix
	   ;; N.B. without the \n below, the regexp can
	   ;; eat newlines.
	   "[ \t]*coding[ \t]*:[ \t]*\\([^ \t\r\n]+\\)[ \t]*"
	   suffix "[\r\n]")))
    (message (if (re-search-forward re-coding nil t) "ok" "nak"))))

With point placed before the coding: line, I also got the answer "ok".

So I don't think the regexp itself is to blame; something else must be
going on. It is late now, I need to go to bed...

  Vincent.
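The test functions above depend on where point is. The same check can be run on a string instead, which makes it repeatable, for example in `emacs -Q --batch`. This sketch builds the re-coding regexp exactly as the test functions do; the helper `my-cookie-coding` and the constant `my-trailer` are hypothetical, and the trailer is a trimmed copy of the attached file's Local Variables block (the `eval:` line is left out).

```elisp
;; Hypothetical helper mirroring the re-coding construction in the
;; test functions above, but matching against a string so that the
;; result does not depend on the position of point.
(defun my-cookie-coding (trailer prefix suffix)
  "Return the coding name that re-coding finds in TRAILER, or nil."
  (let ((re-coding
         (concat "[\r\n]" (regexp-quote prefix)
                 ;; Same middle part as in the test functions.
                 "[ \t]*coding[ \t]*:[ \t]*\\([^ \t\r\n]+\\)[ \t]*"
                 (regexp-quote suffix) "[\r\n]")))
    (when (string-match re-coding trailer)
      (match-string 1 trailer))))

(defconst my-trailer
  (concat "\n[comment]: # ( Local Variables: )\n"
          "[comment]: # ( coding: utf-8\t)\n"
          "[comment]: # ( End:\t\t)\n")
  "Local Variables block of the attached file, without the eval: line.")

(my-cookie-coding my-trailer "[comment]: # (" ")") ; => "utf-8"
```

This agrees with the conclusion above: given the right prefix and suffix, the re-coding regexp does find the cookie.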









* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 21:27 ` Vincent Belaïche
@ 2017-06-16 21:34   ` Philipp Stephani
  2017-06-16 21:39     ` Philipp Stephani
  0 siblings, 1 reply; 21+ messages in thread
From: Philipp Stephani @ 2017-06-16 21:34 UTC
  To: Vincent Belaïche, Eli Zaretskii, 27391

[-- Attachment #1: Type: text/plain, Size: 2417 bytes --]

Vincent Belaïche <vincent.belaiche@gmail.com> wrote on Fri, 16 Jun 2017 at 23:28:

> [...]
>
> So I don't think that the regexp as such is to blame. Something else
> seems to happen. It is too late now, I need to go to bed...
>
>   Vincent.
>
>
I think it's actually the regexp that searches for "Local Variables". The
following minimal example fails for me:

(with-temp-buffer
  (insert "

[comment]: # ( Local Variables: )
[comment]: # ( coding: utf-8 )
[comment]: # ( End: )

")
  (goto-char (point-min))
  (re-search-forward
   "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]"))

[-- Attachment #2: Type: text/html, Size: 3433 bytes --]


* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 21:34   ` Philipp Stephani
@ 2017-06-16 21:39     ` Philipp Stephani
  2017-06-16 21:52       ` Philipp Stephani
  0 siblings, 1 reply; 21+ messages in thread
From: Philipp Stephani @ 2017-06-16 21:39 UTC
  To: Vincent Belaïche, Eli Zaretskii, 27391

[-- Attachment #1: Type: text/plain, Size: 2738 bytes --]

Philipp Stephani <p.stephani2@gmail.com> wrote on Fri, 16 Jun 2017 at 23:34:

> [...]
>
> I think it's actually the regexp that searches for "Local Variables". The
> following minimal example fails for me:
>
> (with-temp-buffer
>   (insert "
>
> [comment]: # ( Local Variables: )
> [comment]: # ( coding: utf-8 )
> [comment]: # ( End: )
>
> ")
> (goto-char (point-min))
> (re-search-forward
>  "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]"))
>
>
Does anybody know why the second character class is [^[\r\n] instead of
[^\r\n]? This explicitly excludes [ from the prefix, so a comment prefix
that starts with [ can never match.
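The effect of that extra `[` can be shown side by side. The sketch below searches the sample text from the minimal example with the regexp exactly as quoted in this thread, and with a variant whose prefix class is `[^\r\n]`. The variant and all `my-` names are illustrative assumptions, not necessarily the change eventually made to mule.el.

```elisp
;; Hypothetical names.  The first regexp is copied from the minimal
;; example above; the second only drops `[' from the prefix class.
(defconst my-local-vars-re-quoted
  "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]"
  "Local Variables regexp as quoted in the thread.")

(defconst my-local-vars-re-relaxed
  "[\r\n]\\([^\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]"
  "Variant whose prefix may contain an opening bracket.")

(defun my-local-vars-prefix (regexp)
  "Search the sample text with REGEXP; return (PREFIX . SUFFIX) or nil."
  (with-temp-buffer
    (insert "\n\n[comment]: # ( Local Variables: )\n"
            "[comment]: # ( coding: utf-8 )\n"
            "[comment]: # ( End: )\n\n")
    (goto-char (point-min))
    ;; NOERROR = t, so a failed search returns nil instead of
    ;; signaling `search-failed' as the minimal example does.
    (when (re-search-forward regexp nil t)
      (cons (match-string 1) (match-string 2)))))

(my-local-vars-prefix my-local-vars-re-quoted)  ; => nil
(my-local-vars-prefix my-local-vars-re-relaxed) ; => ("[comment]: # ( " . ")")
```

With the relaxed variant, group 1 is `[comment]: # ( ` (trailing space included). That is the kind of prefix which, after `regexp-quote`, feeds the re-coding regexp tested earlier in the thread.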

[-- Attachment #2: Type: text/html, Size: 4161 bytes --]


* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 21:39     ` Philipp Stephani
@ 2017-06-16 21:52       ` Philipp Stephani
  0 siblings, 0 replies; 21+ messages in thread
From: Philipp Stephani @ 2017-06-16 21:52 UTC
  To: Vincent Belaïche, Eli Zaretskii, 27391


Philipp Stephani <p.stephani2@gmail.com> wrote on Fri, 16 June 2017 at
23:39:

> [...]
>
> Does anybody know why the second character range says [^[\r\n] instead of
>  [^\r\n]? This seems to explicitly exclude a leading [.
>

If this is a typo, then here's a patch.

[-- Attachment #2: 0001-Allow-local-variables-section-to-begin-with-a-square-b.txt --]
[-- Type: text/plain, Size: 3003 bytes --]

From 2f8bbf7e729ea09addf0d066861a3b38c312141d Mon Sep 17 00:00:00 2001
From: Philipp Stephani <phst@google.com>
Date: Fri, 16 Jun 2017 23:49:09 +0200
Subject: [PATCH] Allow local variables section to begin with a square bracket

Fixes Bug#27391.

* lisp/international/mule.el (find-auto-coding): Fix regular
expression for "Local Variables" section.

* test/lisp/international/mule-tests.el (find-auto-coding--bug27391):
Add unit test.
---
 lisp/international/mule.el            |  2 +-
 test/lisp/international/mule-tests.el | 39 +++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 test/lisp/international/mule-tests.el

diff --git a/lisp/international/mule.el b/lisp/international/mule.el
index fa3ad80e2f..6cfb7e6d45 100644
--- a/lisp/international/mule.el
+++ b/lisp/international/mule.el
@@ -1970,7 +1970,7 @@ find-auto-coding
 	  (goto-char tail-start)
 	  (re-search-forward "[\r\n]\^L" tail-end t)
 	  (if (re-search-forward
-	       "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]"
+	       "[\r\n]\\([^\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]"
 	       tail-end t)
 	      ;; The prefix is what comes before "local variables:" in its
 	      ;; line.  The suffix is what comes after "local variables:"
diff --git a/test/lisp/international/mule-tests.el b/test/lisp/international/mule-tests.el
new file mode 100644
index 0000000000..084f609c45
--- /dev/null
+++ b/test/lisp/international/mule-tests.el
@@ -0,0 +1,39 @@
+;;; mule-tests.el --- unit tests for mule.el         -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2017 Free Software Foundation, Inc.
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs.  If not, see <http://www.gnu.org/licenses/>.
+
+;;; Commentary:
+
+;; Unit tests for lisp/international/mule.el.
+
+;;; Code:
+
+(ert-deftest find-auto-coding--bug27391 ()
+  "Check that Bug#27391 is fixed."
+  (with-temp-buffer
+    (insert "\n[comment]: # ( Local Variables: )\n"
+            "[comment]: # ( coding: utf-8	)\n"
+            "[comment]: # ( End:		)\n")
+    (goto-char (point-min))
+    (should (equal (let ((auto-coding-alist ())
+                         (auto-coding-regexp-alist ())
+                         (auto-coding-functions ()))
+                     (find-auto-coding "" (buffer-size)))
+                   '(utf-8 . :coding)))))
+
+;;; mule-tests.el ends here
-- 
2.13.1


* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 10:00 bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file Vincent Belaïche
                   ` (2 preceding siblings ...)
  2017-06-16 21:27 ` Vincent Belaïche
@ 2017-06-16 22:09 ` Vincent Belaïche
  2017-06-16 22:23   ` Vincent Belaïche
  3 siblings, 1 reply; 21+ messages in thread
From: Vincent Belaïche @ 2017-06-16 22:09 UTC (permalink / raw)
  To: Eli Zaretskii, 27391; +Cc: Vincent Belaïche



On 16/06/2017 at 21:37, Vincent Belaïche wrote:
>
>
> On 16/06/2017 at 21:15, Vincent Belaïche wrote:
>>

[...]

>>
>>
> After some more investigation, I think that the bug is in the function
> insert-file-contents of fileio.c, which decides and sets the coding
> system well before the other local variables are looked into.

I have located the bug.

It turns out that, in the end, find-auto-coding in mule.el is what gets
called to detect the coding.

This function evaluates the following expression to find the Local
Variables section:

  (re-search-forward
   "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]"
   tail-end t)

This expression evaluates to nil on the file CONTRIBUTING.md.

I can make a simple fix if you tell me on which branch to do it.

However, I think the root of the problem is poor factorization of the
local-variable parsing code in mule.el and files.el. A better, more
future-proof fix would be a single local-variable parser taking an
input constraint that says which settings are sought; its output could
then be used by both files.el and mule.el.

  Vincent.


* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 22:09 ` Vincent Belaïche
@ 2017-06-16 22:23   ` Vincent Belaïche
  2017-06-17  5:45     ` Vincent Belaïche
  2017-06-17 14:15     ` Philipp Stephani
  0 siblings, 2 replies; 21+ messages in thread
From: Vincent Belaïche @ 2017-06-16 22:23 UTC (permalink / raw)
  To: Eli Zaretskii, 27391, p.stephani2



On 17/06/2017 at 00:09, Vincent Belaïche wrote:
>
> On 16/06/2017 at 21:37, Vincent Belaïche wrote:
>>
>> On 16/06/2017 at 21:15, Vincent Belaïche wrote:
> [...]
>
>>>
>> After some more investigation, I think that the bug is in the function
>> insert-file-contents of fileio.c, which decides and sets the coding
>> system well before the other local variables are looked into.
> I have located the bug.
>
> It turns out that, in the end, find-auto-coding in mule.el is what gets
> called to detect the coding.
>
> This function evaluates the following expression to find the Local
> Variables section:
>
>   (re-search-forward
>    "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]"
>    tail-end t)
>
> This expression evaluates to nil on the file CONTRIBUTING.md.
>
> I can make a simple fix if you tell me on which branch to do it.
>
> However, I think the root of the problem is poor factorization of the
> local-variable parsing code in mule.el and files.el. A better, more
> future-proof fix would be a single local-variable parser taking an
> input constraint that says which settings are sought; its output could
> then be used by both files.el and mule.el.
>
>    Vincent.
>
>
Oops... my lengthy email of T23:34 was sent by mistake. A shorter
version, with only the conclusion and without all the details of my
investigation, is above.

Anyway, Philipp's patch is what I had in mind as a quick fix. That said,
I don't think that leaving code unfactorized is a good solution when
factorizing is possible: factorized code is more maintainable.

  V.

* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 22:23   ` Vincent Belaïche
@ 2017-06-17  5:45     ` Vincent Belaïche
  2017-06-17 14:30       ` Philipp Stephani
  2017-06-17 14:15     ` Philipp Stephani
  1 sibling, 1 reply; 21+ messages in thread
From: Vincent Belaïche @ 2017-06-17  5:45 UTC (permalink / raw)
  To: Eli Zaretskii, 27391, p.stephani2; +Cc: Vincent Belaïche



On 17/06/2017 at 00:23, Vincent Belaïche wrote:
> [...]

I would like to mention the following points, which I noticed when
comparing the code in find-auto-coding with that in hack-local-variables:

* In hack-local-variables, the trailing Local Variables section is
  assumed to lie within the last 3000 characters before EOB, whereas in
  find-auto-coding the limit is 3072. The "correct" figure should be
  3072, not 3000, for consistency with the "1024 * 3" in the function
  Finsert_file_contents of fileio.c:

		  if (nread == 1024)
		    {
		      int ntail;
		      if (lseek (fd, - (1024 * 3), SEEK_END) < 0)
			report_file_error ("Setting file position",
					   orig_filename);
		      ntail = emacs_read_quit (fd, read_buf + nread, 1024 * 3);
		      nread = ntail < 0 ? ntail : nread + ntail;
		    }

  Perhaps the exact value should be defined as a named constant.

* find-auto-coding does not use the regexp operators "^" (beginning of
  line) or "$" (end of line); it uses "[\r\n]" instead. I suspect this is
  because at this stage the coding system, and hence the end-of-line
  convention, is not yet known, so beginning and end of line are not yet
  meaningful: the whole buffer may look like a single line. If that is
  the case, I withdraw my previous statement that factorizing the code is
  desirable.


* In both cases, what is sought is the *FIRST* occurrence, searching
  *FORWARD*, of the case-sensitive string "Local Variables:" in the
  trailing 3000--3072 characters of the buffer. I think this is a
  problem: either we should search *BACKWARD*, or, after finding the
  first occurrence, we should look for subsequent ones and use the last
  one instead. I say this because, with the emacs-template package, a
  template file can contain local variables in its template definition
  section that differ from those of the template itself. See
                (info "(template) DefSect")
  For instance, the end of the template file could look as follows:


--8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----

... blah blah blah template content ...

// Local Variables:
// toto: "tata"
// End:

>>>TEMPLATE-DEFINITION-SECTION<<<

... blah blah blah Lisp Template rules ...

;; Local Variables:
;; foo: "bar"
;; End:
--8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----

  Maybe excluding the [ character from the prefix string is not a typo
  but an intentional design choice meant to prevent false detection of
  the Local Variables section. I strongly recommend that, before making
  any fix, somebody dig into the file history to find out when and *WHY*
  this exclusion of [ was introduced. Sorry, but I do not volunteer for
  this tedious, time-consuming kind of work...
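The tail-size and search-direction points above can be illustrated with a short Python sketch. It is not the Emacs code: `local_variables_prefix` and `TAIL_SIZE` are hypothetical names, only LF line ends are handled, and the sample text is a condensed version of the template example.

```python
TAIL_SIZE = 1024 * 3  # 3072: the "1024 * 3" tail read by Finsert_file_contents


def local_variables_prefix(text, search_backward=True, tail_size=TAIL_SIZE):
    """Return what precedes "Local Variables:" on its line, or None.

    Only the last `tail_size` characters of `text` are examined.  With
    search_backward=True the LAST occurrence wins (the proposed behaviour);
    with False the FIRST one does (the current behaviour).
    """
    tail = text[-tail_size:]
    key = "Local Variables:"
    pos = tail.rfind(key) if search_backward else tail.find(key)
    if pos < 0:
        return None
    # Start of the line containing `pos` (LF line ends only).
    line_start = tail.rfind("\n", 0, pos) + 1
    return tail[line_start:pos]


# Condensed version of the template example above.
TEMPLATE = (
    "... blah blah blah template content ...\n\n"
    "// Local Variables:\n// toto: \"tata\"\n// End:\n\n"
    ">>>TEMPLATE-DEFINITION-SECTION<<<\n\n"
    "... blah blah blah Lisp Template rules ...\n\n"
    ";; Local Variables:\n;; foo: \"bar\"\n;; End:\n"
)

print(repr(local_variables_prefix(TEMPLATE, search_backward=False)))  # '// '
print(repr(local_variables_prefix(TEMPLATE)))                         # ';; '
```

A section that starts more than `TAIL_SIZE` characters before the end is not seen at all, which is why the limits in hack-local-variables and find-auto-coding should agree.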

   Vincent.

* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 22:23   ` Vincent Belaïche
  2017-06-17  5:45     ` Vincent Belaïche
@ 2017-06-17 14:15     ` Philipp Stephani
  1 sibling, 0 replies; 21+ messages in thread
From: Philipp Stephani @ 2017-06-17 14:15 UTC (permalink / raw)
  To: Vincent Belaïche, Eli Zaretskii, 27391-done

Vincent Belaïche <vincent.belaiche@gmail.com> wrote on Sat, 17 June 2017
at 00:23:

> [...]
>
> Anyway, Philipp's patch is what I had in mind as a quick fix.


OK, I've pushed this commit as c3813b2aa8d2f5a625195fdbbfe6a01a602d7735.


> That said, I don't think that leaving code unfactorized is a good
> solution when factorizing is possible: factorized code is more
> maintainable.
>

Agreed. Note that there's a third place in Emacs that parses a subset of
file-local variables: lread.c, to detect the lexical-binding variable when
loading ELisp files. Ideally that would be merged as well.

* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-17  5:45     ` Vincent Belaïche
@ 2017-06-17 14:30       ` Philipp Stephani
  2017-06-19 10:51         ` Vincent Belaïche
  0 siblings, 1 reply; 21+ messages in thread
From: Philipp Stephani @ 2017-06-17 14:30 UTC (permalink / raw)
  To: Vincent Belaïche, Eli Zaretskii, 27391

Vincent Belaïche <vincent.belaiche@gmail.com> wrote on Sat, 17 June 2017
at 07:45:

>
>
> On 17/06/2017 at 00:23, Vincent Belaïche wrote:
> >
> >
> * find-auto-coding does not use the regexp operators "^" (beginning of
>   line) or "$" (end of line); it uses "[\r\n]" instead. I suspect this is
>   because at this stage the coding system, and hence the end-of-line
>   convention, is not yet known, so beginning and end of line are not yet
>   meaningful: the whole buffer may look like a single line. If that is
>   the case, I withdraw my previous statement that factorizing the code is
>   desirable.
>

Why? It's a small variant that should be distinguishable using a parameter
to a shared function, such as:

enum file_local_flags {
  file_local_flag_default = 0x0,
  file_local_flag_use_bol_eol = 0x1,
  file_local_flag_search_trailer = 0x2,
};
Lisp_Object get_file_local_variable_value (Lisp_Object name,
                                           enum file_local_flags flags);
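To make the idea concrete, here is a hypothetical sketch in Python rather than C. `FileLocalFlags` and `local_variables_header` merely mirror the proposed enum and are not an existing Emacs API, and only the bol/eol flag is modelled. The function returns the prefix and suffix of the first "Local Variables:" line, searching either with line anchors (the files.el style described above) or with explicit CR/LF characters (the find-auto-coding style).

```python
import re
from enum import IntFlag


class FileLocalFlags(IntFlag):
    """Hypothetical Python mirror of the proposed C enum (first flag only)."""
    DEFAULT = 0x0
    USE_BOL_EOL = 0x1


# Line anchors: only meaningful once end-of-line conversion has happened.
BOL_EOL = re.compile(r"^(.*?)[ \t]*Local Variables:[ \t]*(.*?)[ \t]*$", re.M)
# Explicit CR/LF characters: usable on text whose line ends are unknown.
RAW_EOL = re.compile(
    r"[\r\n]([^\r\n]*?)[ \t]*Local Variables:[ \t]*([^\r\n]*?)[ \t]*[\r\n]")


def local_variables_header(text, flags=FileLocalFlags.DEFAULT):
    """Return (prefix, suffix) of the first "Local Variables:" line, or None."""
    pattern = BOL_EOL if flags & FileLocalFlags.USE_BOL_EOL else RAW_EOL
    m = pattern.search(text)
    return (m.group(1), m.group(2)) if m else None


print(local_variables_header("\n/* Local Variables: */\n"))  # ('/*', '*/')
print(local_variables_header("\r/* Local Variables: */\r",
                             FileLocalFlags.USE_BOL_EOL))     # ('\r/*', '*/\r')
```

With LF line ends both variants agree. With CR-only line ends, the line-anchored variant folds the CRs into the prefix and suffix, which fits Vincent's guess about why find-auto-coding spells out [\r\n]; either way the difference is small enough to be selected by a flag.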


>
>
> * In both cases, what is sought is the *FIRST* occurrence, searching
>   *FORWARD*, of the case-sensitive string "Local Variables:" in the
>   trailing 3000--3072 characters of the buffer. I think this is a
>   problem: either we should search *BACKWARD*, or, after finding the
>   first occurrence, we should look for subsequent ones and use the last
>   one instead.
>

Yes, that would be consistent with normal file-local variables.


>
>   Maybe excluding the [ character from the prefix string is not a typo
>   but an intentional design choice meant to prevent false detection of
>   the Local Variables section. I strongly recommend that, before making
>   any fix, somebody dig into the file history to find out when and *WHY*
>   this exclusion of [ was introduced. Sorry, but I do not volunteer for
>   this tedious, time-consuming kind of work...
>
>
With git-blame it's not really tedious. Commit
6b61353c0a0320ee15bb6488149735381fed62ec replaced ^\\(.*\\)[ \t]* with
[\r\n]\\([^[\r\n]*\\)[ \t]*, so I think it's almost certain this is a typo
(the previous regex didn't exclude the [ either). Anyway, if people want
this to stay, they should have added a comment.

* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-17 14:30       ` Philipp Stephani
@ 2017-06-19 10:51         ` Vincent Belaïche
  2017-06-26 11:39           ` Philipp Stephani
  0 siblings, 1 reply; 21+ messages in thread
From: Vincent Belaïche @ 2017-06-19 10:51 UTC (permalink / raw)
  To: Philipp Stephani, 27391; +Cc: Vincent Belaïche

[...]
>
> With git-blame it's not really tedious. Commit
> 6b61353c0a0320ee15bb6488149735381fed62ec replaced ^\\(.*\\)[ \t]* with
> [\r\n]\\([^[\r\n]*\\)[ \t]*, so I think it's almost certain this is a
> typo (the previous regex didn't exclude the [ either). Anyway, if
> people want this to stay, they should have added a comment.

Thank you. I had a look at Wikipedia for the QWERTY keyboard layout (I
have a French keyboard, on which the layout differs for \ and ]).

Modern QWERTY layout is as follows:

1 2 3 4 5 6 7 8 9 0 - =
Q W E R T Y U I O P [ ] \
A S D F G H J K L ; '
Z X C V B N M , . /

So [ is right next to ], which is itself right next to \.

So, yes, this is definitely a typo: the author's finger slipped when
hitting \.

Concerning factorization, couldn't one use [\n\r] in all cases rather
than a switch based on some input argument?

I was also wondering whether it would be possible to have a single regexp
for the whole Local Variables section. The following `doit' function is
an attempt at this. `M-x doit' searches forward for the whole Local
Variables section and displays "ok" if it is found, "nak" otherwise.

(defun doit ()
  (interactive)
  (let* ((eol "\\(\r\n?\\|\n\\)")
         (eol-again "\\1")
         (space-maybe "[ \t]*")
         ;; suffix may be the empty string
         (suffix  "\\([^ \r\n]+\\(?:[^\r\n]*[^ \r\n]\\)?\\|\\)")
         (prefix "\\([ \t]*[^ \r\n]+\\(?:[^\r\n]*[^ \r\n]\\)?\\)")
         (prefix-again "\\2")
         (suffix-again "\\3")
         (symbol: "\\(?:\\(?:[^][()'\" \t\r\n]\\|\\\\[][()'\" \t]\\)+[ \t]*:\\)")
         (sexp (concat "\\(?:" (substring prefix 2))))

    (message (if (and (re-search-forward
                       (concat eol
                               prefix space-maybe "Local Variables:" space-maybe suffix space-maybe eol-again
                               "\\(?:" prefix space-maybe symbol:  sexp space-maybe suffix-again space-maybe eol-again "\\)*"
                               prefix space-maybe "End:" space-maybe suffix space-maybe "\\(" eol-again "\\)?"
                               )
                       nil t)
                      ;; When the trailing eol is not there we must be at EOB.
                      ;; Each use of `prefix' and `suffix' above opens a new
                      ;; group, so the optional trailing eol is group 7.
                      (or (match-string 7) (eobp)))
                 "ok" "nak"))))

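For comparison, here is a hypothetical Python `re` rendering of the same single-regexp idea (a sketch, not Emacs code). Named groups keep the structure readable, and every line after the header must repeat the header's line terminator and suffix through backreferences; unlike the Elisp version, it also requires the same prefix on every line and the same suffix on the End: line. The variable-name pattern is simplified and, like the Elisp version, only one-line values are handled.

```python
import re

# Hypothetical single-regexp matcher for a whole Local Variables section.
SECTION = re.compile(
    # Header: remember the line terminator, prefix and suffix.
    r"(?P<eol>\r\n?|\n)"
    r"(?P<prefix>[^\r\n]*?)[ \t]*Local Variables:[ \t]*"
    r"(?P<suffix>[^\r\n]*?)[ \t]*(?P=eol)"
    # Zero or more "name: value" lines with the same prefix/suffix/eol.
    r"(?:(?P=prefix)[ \t]*[^\s:]+[ \t]*:[^\r\n]*?(?P=suffix)[ \t]*(?P=eol))*"
    # The End: line, followed by the same terminator or the end of the text.
    r"(?P=prefix)[ \t]*End:[ \t]*(?P=suffix)[ \t]*(?:(?P=eol)|\Z)"
)

md = ("\n[comment]: # ( Local Variables: )\n"
      "[comment]: # ( coding: utf-8 )\n"
      "[comment]: # ( End: )\n")
m = SECTION.search(md)
print(repr(m.group("prefix")), repr(m.group("suffix")))  # '[comment]: # (' ')'
```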


   Vincent.



* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-19 10:51         ` Vincent Belaïche
@ 2017-06-26 11:39           ` Philipp Stephani
  2017-06-27  6:05             ` Vincent Belaïche
  0 siblings, 1 reply; 21+ messages in thread
From: Philipp Stephani @ 2017-06-26 11:39 UTC (permalink / raw)
  To: Vincent Belaïche, 27391

Vincent Belaïche <vincent.belaiche@gmail.com> wrote on Mon, 19 June 2017
at 12:51:

>
> Concerning factorization, couldn't one use [\n\r] in all cases rather
> than a switch based on some input argument?
>

It should be possible, but it slightly changes the behavior of file-local
variables. I wouldn't expect anything to break though.


>
> I was also wondering whether it would be possible to have a single regexp
> for the whole Local Variables section. The following `doit' function is
> an attempt at this. `M-x doit' searches forward for the whole Local
> Variables section and displays "ok" if it is found, "nak" otherwise.
>
> (defun doit ()
>   (interactive)
>   (let* ((eol "\\(\r\n?\\|\n\\)")
>          (eol-again "\\1")
>          (space-maybe "[ \t]*")
>          ;; suffix may be the empty string
>          (suffix  "\\([^ \r\n]+\\(?:[^\r\n]*[^ \r\n]\\)?\\|\\)")
>          (prefix "\\([ \t]*[^ \r\n]+\\(?:[^\r\n]*[^ \r\n]\\)?\\)")
>          (prefix-again "\\2")
>          (suffix-again "\\3")
>          (symbol: "\\(?:\\(?:[^][()'\" \t\r\n]\\|\\\\[][()'\" \t]\\)+[ \t]*:\\)")
>          (sexp (concat "\\(?:" (substring prefix 2))))
>
>     (message (if (and (re-search-forward
>                   (concat eol
>                           prefix space-maybe "Local Variables:" space-maybe suffix space-maybe eol-again
>                           "\\(?:" prefix space-maybe symbol:  sexp space-maybe suffix-again space-maybe eol-again "\\)*"
>                           prefix space-maybe "End:" space-maybe suffix space-maybe "\\(" eol-again "\\)?"
>                           )
>                   nil t)
>                   ;; when the tailing eol is not there we must be at EOB.
>                   (or (match-string 3) (eobp)))
>                                     "ok" "nak"))))
>
>
>
Looks good. Consider using `rx' for complex regexes; in my experience it
increases readability a lot.

* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-26 11:39           ` Philipp Stephani
@ 2017-06-27  6:05             ` Vincent Belaïche
  0 siblings, 0 replies; 21+ messages in thread
From: Vincent Belaïche @ 2017-06-27  6:05 UTC (permalink / raw)
  To: Philipp Stephani, 27391; +Cc: Vincent Belaïche

My answers are inline below.

On 26/06/2017 at 13:39, Philipp Stephani wrote:
>
>
> Vincent Belaïche <vincent.belaiche@gmail.com> wrote on Mon, 19 June 2017 at 12:51:
>
>
>     Concerning factorization, couldn't one use [\n\r] in all cases
>     rather than a switch based on some input argument?
>
>
> It should be possible, but it slightly changes the behavior of
> file-local variables. I wouldn't expect anything to break though.
>
>

Sorry, I don't understand why there should be even a slight change in
the current behaviour. BTW, as in the doit function quoted below, what I
had in mind was a "\\(\r\n?\\|\n\\)" construct rather than a plain
"[\r\n]", so that it consistently matches CR (as on some Apple
computers), CR-LF (as on MS-Windows) and LF.
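The difference between the two line-end patterns can be checked quickly with the following Python sketch (Python `re` standing in for Emacs regexps; the sample string is made up):

```python
import re

# "\\(\r\n?\\|\n\\)" from the doit function, written in Python syntax.
EOL = re.compile(r"\r\n?|\n")
# The plain character class that find-auto-coding uses today.
CHAR_CLASS = re.compile(r"[\r\n]")

sample = "mac\rdos\r\nunix\n"  # made-up text mixing the three conventions

print(EOL.findall(sample))              # ['\r', '\r\n', '\n']
print(EOL.split(sample))                # ['mac', 'dos', 'unix', '']
print(len(CHAR_CLASS.findall(sample)))  # 4: CR-LF counts as two line ends
```

Because the alternation captures a whole terminator, the doit function can then require every later line of the section to use the same one through the \\1 backreference.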

>
>     I was also wondering whether it would be possible to have a single regexp
>     for the whole Local Variables section. The following `doit' function is
>     an attempt at this. `M-x doit' searches forward for the whole Local
>     Variables section and displays "ok" if it is found, "nak" otherwise.
>
>     (defun doit ()
>       (interactive)
>       (let* ((eol "\\(\r\n?\\|\n\\)")
>              (eol-again "\\1")
>              (space-maybe "[ \t]*")
>              ;; suffix may be the empty string
>              (suffix  "\\([^ \r\n]+\\(?:[^\r\n]*[^ \r\n]\\)?\\|\\)")
>              (prefix "\\([ \t]*[^ \r\n]+\\(?:[^\r\n]*[^ \r\n]\\)?\\)")
>              (prefix-again "\\2")
>              (suffix-again "\\3")
>              (symbol: "\\(?:\\(?:[^][()'\" \t\r\n]\\|\\\\[][()'\" \t]\\)+[ \t]*:\\)")
>              (sexp (concat "\\(?:" (substring prefix 2))))
>
>         (message (if (and (re-search-forward
>                       (concat eol
>                               prefix space-maybe "Local Variables:" space-maybe suffix space-maybe eol-again
>                               "\\(?:" prefix space-maybe symbol:  sexp space-maybe suffix-again space-maybe eol-again "\\)*"
>                               prefix space-maybe "End:" space-maybe suffix space-maybe "\\(" eol-again "\\)?"
>                               )
>                       nil t)
>                       ;; when the tailing eol is not there we must be at EOB.
>                       (or (match-string 3) (eobp)))
>                                         "ok" "nak"))))
>
>
>
> Looks good. Consider using `rx' for complex regexes; in my experience it increases readability a lot.

On second thought, the regexp above has a limitation: it fails if the
sexp spans multiple lines. For instance, the following would not be
matched:

--8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----
/* Local Variables: */
/* multiline-sexp: ( "first line"
    "second line" ) */
/* End: */
--8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----

This would be a regression, as I think the current code allows multiline
sexps. I am not 100% sure of that, though; I presume it only from my
reading of the current code.

I don't know whether multiline sexps in file-local variables are a
desirable feature; personally, I have never used them.

Nor am I sure that writing a regexp that matches an Elisp sexp is
feasible, or sensible. In my opinion it is not sensible, because any
change in the Elisp reader (like supporting bignums, as we discussed some
time ago with Jay Belanger, maintainer of Calc) would require a
corresponding change in this regexp.

Also, regexps do not support anything like an [:elisp-sexp:] construct
that could do the job with a `read' call under the hood.
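The core difficulty is nesting: a regular expression cannot track parentheses of arbitrary depth, whereas a reader-like scan can. As a rough, hypothetical illustration (Python, not the Emacs reader; `sexp_end` is an invented helper that only understands parentheses and double-quoted strings with backslash escapes), here is a scanner that finds where a possibly multiline parenthesized value ends:

```python
def sexp_end(text, start):
    """Return the index just past the parenthesized expression at text[start].

    Deliberately simplified: it understands nested parentheses and "..."
    strings with backslash escapes, and ignores the rest of Lisp syntax
    (comments, character literals, vectors, and so on).
    """
    if text[start] != "(":
        raise ValueError("expected '(' at the start position")
    depth = 0
    in_string = False
    i = start
    while i < len(text):
        c = text[i]
        if in_string:
            if c == "\\":
                i += 1  # skip the escaped character
            elif c == '"':
                in_string = False
        elif c == '"':
            in_string = True
        elif c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced expression")


# Value part of the multiline example above, after "multiline-sexp:".
value = '( "first line"\n    "second line" ) */'
end = sexp_end(value, 0)
print(repr(value[end:]))  # ' */'
```

Whatever remains after the value (here the suffix " */") can then be checked against the suffix of the Local Variables line, which is roughly what a `read'-based parser would do.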

  Vincent.


end of thread [~2017-06-27  6:05 UTC]

Thread overview: 21+ messages
2017-06-16 10:00 bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file Vincent Belaïche
2017-06-16 12:59 ` Eli Zaretskii
2017-06-16 14:08 ` Vincent Belaïche
2017-06-16 14:10   ` Vincent Belaïche
2017-06-16 18:38   ` Eli Zaretskii
2017-06-16 19:08     ` Vincent Belaïche
2017-06-16 19:15     ` Vincent Belaïche
2017-06-16 19:31       ` Andreas Schwab
2017-06-16 19:37       ` Vincent Belaïche
2017-06-16 21:27 ` Vincent Belaïche
2017-06-16 21:34   ` Philipp Stephani
2017-06-16 21:39     ` Philipp Stephani
2017-06-16 21:52       ` Philipp Stephani
2017-06-16 22:09 ` Vincent Belaïche
2017-06-16 22:23   ` Vincent Belaïche
2017-06-17  5:45     ` Vincent Belaïche
2017-06-17 14:30       ` Philipp Stephani
2017-06-19 10:51         ` Vincent Belaïche
2017-06-26 11:39           ` Philipp Stephani
2017-06-27  6:05             ` Vincent Belaïche
2017-06-17 14:15     ` Philipp Stephani

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
