* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file @ 2017-06-16 10:00 Vincent Belaïche 2017-06-16 12:59 ` Eli Zaretskii ` (3 more replies) 0 siblings, 4 replies; 21+ messages in thread From: Vincent Belaïche @ 2017-06-16 10:00 UTC (permalink / raw) To: 27391; +Cc: Vincent Belaïche ================================================================================ I was editing some file written in Markdown. Here is the file : https://framagit.org/latex-pourquoi-comment/lpc-articles/blob/795ecb9d4f7b8870486fe6557f01d2fe450c4461/CONTRIBUTING.md My Emacs default configuration was to get files in latin-1. So I had added some `coding: utf-8' cookie in this file. But it did not work, the file was still read in latin-1 instead of utf8. I made a test with one more cookie `eval: (message "Hello")', this one worked, which means that the problem is not that cookies aren't read, the problem is within the application of the coding scheme. The only way for me to get the correct encoding is to place: (modify-coding-system-alist 'file "\\.m\\(d\\|arkdown\\)\\'" 'prefer-utf-8) In my init file. I made the trial with `emacs -q', and the problem is still there, which shows that markdown-mode is not to blame. My first thought was that markdown-mode was the culprit, see discussion here : https://github.com/jrblevin/markdown-mode/issues/198 Jason Blevin is the author of markdown-mode, he noted that the presence of the [ character has some impact. See: https://github.com/jrblevin/markdown-mode/issues/198#issuecomment-308524696 I did not double check his analysis. To me this looks like some race problem where the automatic encoding detection is applied after the cookie and undoes it. Maybe some semaphore is missing, or something like that. Vincent. 
================================================================================ In GNU Emacs 25.2.50.1 (i686-pc-mingw32) of 2017-06-14 built on AIGLEROYAL Repository revision: da62c1532e479bbac4ce242ee1d170df9c435591 Windowing system distributor 'Microsoft Corp.', version 10.0.14393 Configured using: 'configure --prefix=c:/Nos_Programmes/GNU/Emacs --without-jpeg --without-tiff --without-gif --without-png 'CFLAGS= -Og -g3 -L C:/Programmes/installation/emacs-install/libXpm-3.5.8/src' 'CPPFLAGS= -DFOR_MSW=1 -I C:/Programmes/installation/emacs-install/libXpm-3.5.8/include -I C:/Programmes/installation/emacs-install/libXpm-3.5.8/src -L C:/Programmes/installation/emacs-install/libXpm-3.5.8/src'' Configured features: XPM SOUND NOTIFY ACL TOOLKIT_SCROLL_BARS Important settings: value of $LANG: FRA locale-coding-system: cp1252 Major mode: Dired by name Minor modes in effect: diff-auto-refine-mode: t TeX-PDF-mode: t shell-dirtrack-mode: t recentf-mode: t tooltip-mode: t global-eldoc-mode: t electric-indent-mode: t mouse-wheel-mode: t tool-bar-mode: t menu-bar-mode: t file-name-shadow-mode: t global-font-lock-mode: t font-lock-mode: t blink-cursor-mode: t auto-composition-mode: t auto-encryption-mode: t auto-compression-mode: t buffer-read-only: t line-number-mode: t transient-mark-mode: t Recent messages: Mark set [2 times] Mark saved where search started Quit scroll-up-command: End of buffer Mark set find-dired *Find* finished. dired-get-file-for-visit: No file on this line [2 times] Mark set Quit Making completion list... 
Load-path shadows: c:/Programmes/installation/cedet-install/cedet-git/lisp/speedbar/loaddefs hides c:/Nos_Programmes/GNU/Emacs/share/emacs/25.2.50/lisp/loaddefs c:/Programmes/installation/cedet-install/cedet-git/lisp/speedbar/loaddefs hides c:/Programmes/installation/cedet-install/cedet-git/lisp/cedet/loaddefs Features: (shadow emacsbug find-dired calc-yank calc-mode calccomp calc-alg calc-vec calc-aent calc-menu cal-move whitespace perl-mode log-edit pcvs-util eieio-opt speedbar sb-image ezimage dframe vc-bzr vc-src vc-sccs vc-svn vc-rcs vc-dir ewoc add-log org-element org-rmail org-mhe org-irc org-info org-gnus org-docview doc-view subr-x jka-compr image-mode org-bibtex bibtex org-bbdb org-w3m org org-macro org-footnote org-pcomplete org-list org-faces org-entities org-version ob-emacs-lisp ob ob-tangle ob-ref ob-lob ob-table ob-exp org-src ob-keys ob-comint ob-core ob-eval org-compat org-macs org-loaddefs find-func cal-menu calendar cal-loaddefs tex-info texinfo vc vc-dispatcher ediff-vers thingatpt rect visual-basic-mode sh-script smie executable make-mode misearch multi-isearch ediff-merg ediff-wind ediff-diff ediff-mult ediff-help ediff-init ediff-util ediff vc-git diff-mode reftex-dcr reftex reftex-vars preview prv-emacs noutline outline pcmpl-unix latexenc tex-bar latex easy-mmode tex-style toolbar-x font-latex plain-tex tex-buf tex advice tex-mode compile shell pcomplete comint ansi-color ring bbdb-print info mailalias smtpmail sort ispell vc-cvs hl-line balance eieio-compat calc-forms dired-aux mail-extr bbdb-message sendmail gnus-async qp gnus-ml cursor-sensor nndraft nnmh nnfolder bbdb-gnus bbdb-mua bbdb-com crm network-stream nsm auth-source eieio eieio-core starttls gnus-agent gnus-srvr gnus-score score-mode nnvirtual gnus-msg gnus-art mm-uu mml2015 mm-view mml-smime smime dig mailcap nntp gnus-cache gnus-sum gnus-group gnus-undo gnus-start gnus-cloud nnimap nnmail mail-source tls gnutls utf7 netrc nnoo parse-time gnus-spec gnus-int gnus-range message 
dired-x dired format-spec rfc822 mml mml-sec password-cache epg mm-decode mm-bodies mm-encode mail-parse rfc2231 rfc2047 rfc2045 ietf-drums mailabbrev gmm-utils mailheader gnus-win gnus gnus-ems nnheader gnus-util mail-utils mm-util help-fns mail-prsvr edmacro kmacro skeleton calc-misc calc-arith calc-ext calc calc-loaddefs calc-macs tex-mik preview-latex tex-site auto-loads bbdb bbdb-site timezone bbdb-loaddefs template w32utils cl-seq cl-macs cl recentf tree-widget wid-edit load-path-to-cedet-svn finder-inf package epg-config seq byte-opt gv bytecomp byte-compile cl-extra help-mode easymenu cconv cl-loaddefs pcase cl-lib time-date mule-util tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type mwheel dos-w32 ls-lisp disp-table w32-win w32-vars term/common-win tool-bar dnd fontset image regexp-opt fringe tabulated-list newcomment elisp-mode lisp-mode prog-mode register page menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock font-lock syntax facemenu font-core frame cl-generic cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese charscript case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer cl-preloaded nadvice loaddefs button faces cus-face macroexp files text-properties overlay sha1 md5 base64 format env code-pages mule custom widget hashtable-print-readable backquote w32notify w32 multi-tty make-network-process emacs) Memory information: ((conses 8 899957 158092) (symbols 32 53590 0) (miscs 32 2257 2796) (strings 16 133750 20600) (string-bytes 1 5975277) (vectors 8 55330) (vector-slots 4 1716681 54830) (floats 8 651 494) (intervals 28 72632 8079) (buffers 516 78)) --- L'absence de virus dans ce courrier électronique a été vérifiée par le logiciel antivirus Avast. https://www.avast.com/antivirus ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 10:00 bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file Vincent Belaïche
@ 2017-06-16 12:59 ` Eli Zaretskii
  2017-06-16 14:08 ` Vincent Belaïche
  ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: Eli Zaretskii @ 2017-06-16 12:59 UTC
To: Vincent Belaïche; +Cc: 27391

> From: vincent.belaiche@gmail.com (Vincent Belaïche)
> Date: Fri, 16 Jun 2017 12:00:06 +0200
> Cc: Vincent Belaïche <vincent.belaiche@gmail.com>
>
> I was editing some file written in Markdown. Here is the file :
>
> https://framagit.org/latex-pourquoi-comment/lpc-articles/blob/795ecb9d4f7b8870486fe6557f01d2fe450c4461/CONTRIBUTING.md
>
> My Emacs default configuration was to get files in latin-1. So I had
> added some `coding: utf-8' cookie in this file. But it did not work, the
> file was still read in latin-1 instead of utf8.

I cannot reproduce this, and I don't see any coding cookies in the
file I downloaded.

Please provide a minimal recipe that's required to reproduce the
problem. In particular, since you tried in "emacs -q", I don't
understand what it means that your default configuration is
latin-1: in "emacs -q" your default configuration is determined by
your system locale.

Thanks.
* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file 2017-06-16 10:00 bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file Vincent Belaïche 2017-06-16 12:59 ` Eli Zaretskii @ 2017-06-16 14:08 ` Vincent Belaïche 2017-06-16 14:10 ` Vincent Belaïche 2017-06-16 18:38 ` Eli Zaretskii 2017-06-16 21:27 ` Vincent Belaïche 2017-06-16 22:09 ` Vincent Belaïche 3 siblings, 2 replies; 21+ messages in thread From: Vincent Belaïche @ 2017-06-16 14:08 UTC (permalink / raw) To: 27391, Eli Zaretskii; +Cc: Vincent Belaïche [-- Attachment #1: Type: text/plain, Size: 2227 bytes --] Le 16/06/2017 à 14:59, Eli Zaretskii a écrit : >> From: vincent.belaiche@gmail.com (Vincent Belaïche) >> Date: Fri, 16 Jun 2017 12:00:06 +0200 >> Cc: Vincent Belaïche <vincent.belaiche@gmail.com> >> >> I was editing some file written in Markdown. Here is the file : >> >> https://framagit.org/latex-pourquoi-comment/lpc-articles/blob/795ecb9d4f7b8870486fe6557f01d2fe450c4461/CONTRIBUTING.md >> >> My Emacs default configuration was to get files in latin-1. So I had >> added some `coding: utf-8' cookie in this file. But it did not work, the >> file was still read in latin-1 instead of utf8. > > I cannot reproduce this, and I don't see any coding cookies in the > file I downloaded. > > Please provide a minimal recipe that's required to reproduce the > problem. In particular, since you tried in "emacs -q", I don't > understand what does it mean that your default configuration is > latin-1: in "emacs -q" your default configuration is determined by > your system locale. > > Thanks. Attached is the file causing the issue. Recipe is just to visit the file with emacs -q, and you see that the encoding is not taken. 
For instance I get the following doc section : --8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8---- ### doc Placez dans *doc* et ses sous-répertoires toute la documentation afférente au projet, sans oublier les notes et courriers électroniques importants. Vous pouvez avoir des sous-répertoires de doc contenant différents types de documents ou pour différentes phases du projet. --8<----8<----8<----8<----8<-- end -->8---->8---->8---->8---->8---- Instead of: --8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8---- ### doc Placez dans *doc* et ses sous-répertoires toute la documentation afférente au projet, sans oublier les notes et courriers électroniques importants. Vous pouvez avoir des sous-répertoires de doc contenant différents types de documents ou pour différentes phases du projet. --8<----8<----8<----8<----8<-- end -->8---->8---->8---->8---->8---- Vincent. --- L'absence de virus dans ce courrier électronique a été vérifiée par le logiciel antivirus Avast. https://www.avast.com/antivirus [-- Attachment #2: CONTRIBUTING.md --] [-- Type: text/plain, Size: 2416 bytes --] Guide de contribution ===================== WorkFlow -------- Ce projet utilise Git-flow au pied de la lettre: * http://nvie.com/posts/a-successful-git-branching-model/ L'article de base qui donnera naissance au projet * https://danielkummer.github.io/git-flow-cheatsheet/index.fr_FR.html Aide mémoire français (et en d'autre traduction). Contributions ------------- Libre à vous de cloner le dépôt... Et de proposer des modifications. Conventions de nomnage ====================== Arborescence de fichier ----------------------- ### doc Placez dans *doc* et ses sous-répertoires toute la documentation afférente au projet, sans oublier les notes et courriers électroniques importants. Vous pouvez avoir des sous-répertoires de doc contenant différents types de documents ou pour différentes phases du projet. 
Si vous avez besoin de documentation externe, envisager de la copier ici. Cela rendra service pour maintenir le projet si l'endroit où les données en questions étaient accessibles disparaît. ### src Ce répertoire contient le code source du projet. Vous pouvez y faire des sous-répertoires pour différents types de code source, par exemple: * src/inc * src/img * ... ### util Répertoire contenant les utilitaires, outils et scripts spécifiques au projet. ### vendor Si le projet utilise des bibliothèques fournies par une partie tierce ou des fichiers d'en-têtes que vous désirez archiver avec votre code, faites-le ici. Gestionnaire de version ----------------------- Le workflow git suit scrupuleusement git-flow. ### Branche **master** Elle représente le dernier état installable en production du projet. Seul les administrateurs du dépôt peuvent travailler dans cette branche. ### Branche **devel** La branche où est récolté le travail de tout le monde, des branches de développement privées. Seul la "Team" peut travailler dans cette branche. ### les branches **feature** Chaque branche doit être Nommée de la manière suivante: * PSEUDO-DESCRIPTION où: * **PSEUDO** est le pseudo de l'administrateur (le créateur) de la branche * **DESCRIPTION** Une description en CamelCase (RaisonCreationBranche) de cette branche [comment]: # ( Local Variables: ) [comment]: # ( coding: utf-8 ) [comment]: # ( eval: (message "Coucou") ) [comment]: # ( End: ) ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 14:08 ` Vincent Belaïche
@ 2017-06-16 14:10 ` Vincent Belaïche
  2017-06-16 18:38 ` Eli Zaretskii
  1 sibling, 0 replies; 21+ messages in thread
From: Vincent Belaïche @ 2017-06-16 14:10 UTC
To: 27391, Eli Zaretskii

On 16/06/2017 at 16:08, Vincent Belaïche wrote:
> [...]
> Attached is the file causing the issue. Recipe is just to visit the file
> with emacs -q, and you see that the encoding is not taken.
> [...]

Just to clarify: you need to click the "open raw" button to see the
cookie. I should have sent you this link:

https://framagit.org/latex-pourquoi-comment/lpc-articles/raw/795ecb9d4f7b8870486fe6557f01d2fe450c4461/CONTRIBUTING.md

instead of the equivalent "viewer" link, where the Markdown markup is
rendered as formatting. You cannot see the cookies through the viewer
link because they are inside Markdown comments, so the viewer does not
display them.

V.
* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 14:08 ` Vincent Belaïche
  2017-06-16 14:10 ` Vincent Belaïche
@ 2017-06-16 18:38 ` Eli Zaretskii
  2017-06-16 19:08 ` Vincent Belaïche
  2017-06-16 19:15 ` Vincent Belaïche
  1 sibling, 2 replies; 21+ messages in thread
From: Eli Zaretskii @ 2017-06-16 18:38 UTC
To: Vincent Belaïche; +Cc: 27391

> From: vincent.belaiche@gmail.com (Vincent Belaïche)
> Cc: Vincent Belaïche <vincent.belaiche@gmail.com>
> Date: Fri, 16 Jun 2017 16:08:09 +0200
>
> Attached is the file causing the issue. Recipe is just to visit the file
> with emacs -q, and you see that the encoding is not taken.

Your fancy comment causes this: remove the leading '[' and the problem
goes away. Looks like regex-quoting that somehow misfires.
* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 18:38 ` Eli Zaretskii
@ 2017-06-16 19:08 ` Vincent Belaïche
  2017-06-16 19:15 ` Vincent Belaïche
  1 sibling, 0 replies; 21+ messages in thread
From: Vincent Belaïche @ 2017-06-16 19:08 UTC
To: Eli Zaretskii; +Cc: 27391

On 16/06/2017 at 20:38, Eli Zaretskii wrote:
>> From: vincent.belaiche@gmail.com (Vincent Belaïche)
>> Cc: Vincent Belaïche <vincent.belaiche@gmail.com>
>> Date: Fri, 16 Jun 2017 16:08:09 +0200
>>
>> Attached is the file causing the issue. Recipe is just to visit the file
>> with emacs -q, and you see that the encoding is not taken.
> Your fancy comment causes this: remove the leading '[' and the problem
> goes away. Looks like regex-quoting that somehow misfires.

I used this style of comment markers after reading this discussion:

https://stackoverflow.com/questions/4823468/comments-in-markdown

V.
* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 18:38 ` Eli Zaretskii
  2017-06-16 19:08 ` Vincent Belaïche
@ 2017-06-16 19:15 ` Vincent Belaïche
  2017-06-16 19:31 ` Andreas Schwab
  2017-06-16 19:37 ` Vincent Belaïche
  1 sibling, 2 replies; 21+ messages in thread
From: Vincent Belaïche @ 2017-06-16 19:15 UTC
To: Eli Zaretskii; +Cc: 27391

On 16/06/2017 at 20:38, Eli Zaretskii wrote:
>> From: vincent.belaiche@gmail.com (Vincent Belaïche)
>> Cc: Vincent Belaïche <vincent.belaiche@gmail.com>
>> Date: Fri, 16 Jun 2017 16:08:09 +0200
>>
>> Attached is the file causing the issue. Recipe is just to visit the file
>> with emacs -q, and you see that the encoding is not taken.
> Your fancy comment causes this: remove the leading '[' and the problem
> goes away. Looks like regex-quoting that somehow misfires.

After some investigation, it seems that the bug is in regexp-quote:

(regexp-quote "[comment]: # (")

outputs

"^\\[comment]: # ( "

instead of

"^\\[comment\\]: # ( "

Vincent.
* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 19:15 ` Vincent Belaïche
@ 2017-06-16 19:31 ` Andreas Schwab
  2017-06-16 19:37 ` Vincent Belaïche
  1 sibling, 0 replies; 21+ messages in thread
From: Andreas Schwab @ 2017-06-16 19:31 UTC
To: Vincent Belaïche; +Cc: 27391

On Jun 16 2017, Vincent Belaïche <vincent.belaiche@gmail.com> wrote:

> After some investigation, it seems that the bug is in regexp-quote:
>
> (regexp-quote "[comment]: # (")
>
> outputs
>
> "^\\[comment]: # ( "
>
> instead of
>
> "^\\[comment\\]: # ( "

But `]' is not special.

(string-match "^\\[comment]: # ( " "[comment]: # ( ")
=> 0

Andreas.

--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
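Andreas's point can be checked directly. The following is only a small sketch for readers following along; the helper name `bug27391-quote-demo' is made up for this illustration and is not part of Emacs. It shows that regexp-quote escapes `[' (which would otherwise open a bracket expression) but leaves `]' alone, and that the quoted string still matches the literal comment prefix.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun bug27391-quote-demo ()
  "Return the quoted comment prefix and where it matches a sample line."
  (let* ((prefix "[comment]: # (")
         (quoted (regexp-quote prefix)))
    ;; Only the opening bracket is escaped.  A `]' outside a bracket
    ;; expression is an ordinary character, and a bare `(' is not
    ;; special in Emacs regexps.
    (list quoted
          (string-match (concat "^" quoted)
                        "[comment]: # ( coding: utf-8 )"))))

(bug27391-quote-demo)
;; => ("\\[comment]: # (" 0)
```

So the unescaped `]' in the output of regexp-quote is expected and does not by itself explain the failure.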
* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 19:15 ` Vincent Belaïche
  2017-06-16 19:31 ` Andreas Schwab
@ 2017-06-16 19:37 ` Vincent Belaïche
  1 sibling, 0 replies; 21+ messages in thread
From: Vincent Belaïche @ 2017-06-16 19:37 UTC
To: Eli Zaretskii; +Cc: 27391

On 16/06/2017 at 21:15, Vincent Belaïche wrote:
>
>
> On 16/06/2017 at 20:38, Eli Zaretskii wrote:
>>> From: vincent.belaiche@gmail.com (Vincent Belaïche)
>>> Cc: Vincent Belaïche <vincent.belaiche@gmail.com>
>>> Date: Fri, 16 Jun 2017 16:08:09 +0200
>>>
>>> Attached is the file causing the issue. Recipe is just to visit the
>>> file
>>> with emacs -q, and you see that the encoding is not taken.
>> Your fancy comment causes this: remove the leading '[' and the problem
>> goes away. Looks like regex-quoting that somehow misfires.
>
> After some investigation, it seems that the bug is in regexp-quote:
>
> (regexp-quote "[comment]: # (")
>
> outputs
>
> "^\\[comment]: # ( "
>
> instead of
>
> "^\\[comment\\]: # ( "
>
>
> Vincent.
>
>

After some more investigation, I think that the bug is in the function
insert-file-contents in fileio.c, which is the one that decides and
sets the coding system, well before the other local variables are
looked at.
* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 10:00 bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file Vincent Belaïche
  2017-06-16 12:59 ` Eli Zaretskii
  2017-06-16 14:08 ` Vincent Belaïche
@ 2017-06-16 21:27 ` Vincent Belaïche
  2017-06-16 21:34 ` Philipp Stephani
  2017-06-16 22:09 ` Vincent Belaïche
  3 siblings, 1 reply; 21+ messages in thread
From: Vincent Belaïche @ 2017-06-16 21:27 UTC
To: Eli Zaretskii, 27391; +Cc: Vincent Belaïche

On 16/06/2017 at 21:37, Vincent Belaïche wrote:
>
>
> On 16/06/2017 at 21:15, Vincent Belaïche wrote:
>> [...]
>>
>>
> After some more investigation, I think that the bug is in function
> insert-file-contents of fileio.c which is the one that decide and sets
> the coding system well before the other local variables are looked into.

After some more investigation, it turns out that find-auto-coding in
mule.el is what is called to detect the coding. This function uses a
regexp called re-coding.

Here is a test function defining the same regexp.

(defun doit ()
  (interactive)
  (let* ((prefix (regexp-quote "[comment]: # ("))
         (suffix (regexp-quote ")"))
         (re-coding
          (concat
           "[\r\n]" prefix
           ;; N.B. without the \n below, the regexp can
           ;; eat newlines.
           "[ \t]*coding[ \t]*:[ \t]*\\([^ \t\r\n]+\\)[ \t]*"
           suffix "[\r\n]")))
    (message (if (looking-at re-coding) "ok" "nak"))))

I tried it with point at the end of the line

[comment]: # ( Local Variables: )

and it answered "ok". Then I redefined it with re-search-forward
instead of looking-at:

(defun doit ()
  (interactive)
  (let* ((prefix (regexp-quote "[comment]: # ("))
         (suffix (regexp-quote ")"))
         (re-coding
          (concat
           "[\r\n]" prefix
           ;; N.B. without the \n below, the regexp can
           ;; eat newlines.
           "[ \t]*coding[ \t]*:[ \t]*\\([^ \t\r\n]+\\)[ \t]*"
           suffix "[\r\n]")))
    (message (if (re-search-forward re-coding nil t) "ok" "nak"))))

I placed point before the coding: line, and I also got "ok".

So I don't think that the regexp as such is to blame. Something else
seems to be happening. It is too late now, I need to go to bed...

Vincent.
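Rather than rebuilding the regexp by hand, one can also ask find-auto-coding itself what it detects. The following is a minimal sketch, assuming the usual (find-auto-coding FILENAME SIZE) calling convention, where the buffer text after point is examined; the helper name `bug27391-probe' is invented for this illustration. What it returns depends on the Emacs build: a build affected by this bug would presumably give nil, while a build that honours the cookie should give a cons whose car is the coding system.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun bug27391-probe ()
  "Return what `find-auto-coding' detects for a bracket-prefixed cookie."
  (with-temp-buffer
    (insert "\n"
            "[comment]: # ( Local Variables: )\n"
            "[comment]: # ( coding: utf-8 )\n"
            "[comment]: # ( End: )\n")
    (goto-char (point-min))
    ;; The file name is only used for file-name based lookups; the
    ;; second argument is how much text after point to examine.
    (find-auto-coding "CONTRIBUTING.md" (buffer-size))))

(bug27391-probe)
```

This exercises the whole detection path (first-line cookie, then the "Local Variables" tail), so it can tell a failure in the tail search apart from a failure in the re-coding regexp tested above.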
* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file 2017-06-16 21:27 ` Vincent Belaïche @ 2017-06-16 21:34 ` Philipp Stephani 2017-06-16 21:39 ` Philipp Stephani 0 siblings, 1 reply; 21+ messages in thread From: Philipp Stephani @ 2017-06-16 21:34 UTC (permalink / raw) To: Vincent Belaïche, Eli Zaretskii, 27391 [-- Attachment #1: Type: text/plain, Size: 2417 bytes --] Vincent Belaïche <vincent.belaiche@gmail.com> schrieb am Fr., 16. Juni 2017 um 23:28 Uhr: > > > Le 16/06/2017 à 21:37, Vincent Belaïche a écrit : > > > > > > Le 16/06/2017 à 21:15, Vincent Belaïche a écrit : > >> > > [...] > > >> > >> > > After some more investigation, I think that the bug is in function > > insert-file-contents of fileio.c which is the one that decide and sets > > the coding system well before the other local variables are looked into. > > After some more investigation, in the end the find-auto-coding of > mule.el is what is called to detect the coding. This function calls some > re-coding regexp. > > Here is a test function defining the same regexp. > > > (defun doit () > (interactive) > (let* ((prefix (regexp-quote "[comment]: # (")) > (suffix (regexp-quote ")")) > (re-coding > (concat > "[\r\n]" prefix > ;; N.B. without the \n below, the regexp can > ;; eat newlines. > "[ \t]*coding[ \t]*:[ \t]*\\([^ \t\r\n]+\\)[ \t]*" > suffix "[\r\n]"))) > (message (if (looking-at re-coding) "ok" "nak")))) > > I tried it with point at end of line > > [comment]: # ( Local Variables: ) > > and it answered "ok". Now I defined this with re-search-forward instead > of looking-at: > > (defun doit () > (interactive) > (let* ((prefix (regexp-quote "[comment]: # (")) > (suffix (regexp-quote ")")) > (re-coding > (concat > "[\r\n]" prefix > ;; N.B. without the \n below, the regexp can > ;; eat newlines. 
> "[ \t]*coding[ \t]*:[ \t]*\\([^ \t\r\n]+\\)[ \t]*" > suffix "[\r\n]"))) > (message (if (re-search-forward re-coding nil t) "ok" "nak")))) > > I placed the point before the coding: line, and I also got answer "ok" > > So I don't think that the regexp as such is to blame. Something else > seems to happen. It is too late now, I need to go to bed... > > Vincent. > > I think it's actually the regexp that searches for "Local Variables". The following minimal example fails for me: (with-temp-buffer (insert " [comment]: # ( Local Variables: ) [comment]: # ( coding: utf-8 ) [comment]: # ( End: ) ") (goto-char (point-min)) (re-search-forward "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]")) [-- Attachment #2: Type: text/html, Size: 3433 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file 2017-06-16 21:34 ` Philipp Stephani @ 2017-06-16 21:39 ` Philipp Stephani 2017-06-16 21:52 ` Philipp Stephani 0 siblings, 1 reply; 21+ messages in thread From: Philipp Stephani @ 2017-06-16 21:39 UTC (permalink / raw) To: Vincent Belaïche, Eli Zaretskii, 27391 [-- Attachment #1: Type: text/plain, Size: 2738 bytes --] Philipp Stephani <p.stephani2@gmail.com> schrieb am Fr., 16. Juni 2017 um 23:34 Uhr: > Vincent Belaïche <vincent.belaiche@gmail.com> schrieb am Fr., 16. Juni > 2017 um 23:28 Uhr: > >> >> >> Le 16/06/2017 à 21:37, Vincent Belaïche a écrit : >> > >> > >> > Le 16/06/2017 à 21:15, Vincent Belaïche a écrit : >> >> >> >> [...] >> >> >> >> >> >> > After some more investigation, I think that the bug is in function >> > insert-file-contents of fileio.c which is the one that decide and sets >> > the coding system well before the other local variables are looked into. >> >> After some more investigation, in the end the find-auto-coding of >> mule.el is what is called to detect the coding. This function calls some >> re-coding regexp. >> >> Here is a test function defining the same regexp. >> >> >> (defun doit () >> (interactive) >> (let* ((prefix (regexp-quote "[comment]: # (")) >> (suffix (regexp-quote ")")) >> (re-coding >> (concat >> "[\r\n]" prefix >> ;; N.B. without the \n below, the regexp can >> ;; eat newlines. >> "[ \t]*coding[ \t]*:[ \t]*\\([^ \t\r\n]+\\)[ \t]*" >> suffix "[\r\n]"))) >> (message (if (looking-at re-coding) "ok" "nak")))) >> >> I tried it with point at end of line >> >> [comment]: # ( Local Variables: ) >> >> and it answered "ok". Now I defined this with re-search-forward instead >> of looking-at: >> >> (defun doit () >> (interactive) >> (let* ((prefix (regexp-quote "[comment]: # (")) >> (suffix (regexp-quote ")")) >> (re-coding >> (concat >> "[\r\n]" prefix >> ;; N.B. without the \n below, the regexp can >> ;; eat newlines. 
>> "[ \t]*coding[ \t]*:[ \t]*\\([^ \t\r\n]+\\)[ \t]*" >> suffix "[\r\n]"))) >> (message (if (re-search-forward re-coding nil t) "ok" "nak")))) >> >> I placed the point before the coding: line, and I also got answer "ok" >> >> So I don't think that the regexp as such is to blame. Something else >> seems to happen. It is too late now, I need to go to bed... >> >> Vincent. >> >> > I think it's actually the regexp that searches for "Local Variables". The > following minimal example fails for me: > > (with-temp-buffer > (insert " > > [comment]: # ( Local Variables: ) > [comment]: # ( coding: utf-8 ) > [comment]: # ( End: ) > > ") > (goto-char (point-min)) > (re-search-forward > "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]")) > > Does anybody know why the second character range says [^[\r\n] instead of [^\r\n]? This seems to explicitly exclude a leading [. [-- Attachment #2: Type: text/html, Size: 4161 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
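The effect of the extra `[' in the bracket expression can be seen by running both variants of the regexp over the same text. This is only a sketch for comparison; the helper name `bug27391-match-p' is made up here. The first regexp is the one quoted above, the second is the same regexp with `[' no longer excluded from the prefix group.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun bug27391-match-p (regexp)
  "Return t if REGEXP finds the bracket-prefixed \"Local Variables:\" line."
  (with-temp-buffer
    (insert "\n[comment]: # ( Local Variables: )\n"
            "[comment]: # ( coding: utf-8 )\n"
            "[comment]: # ( End: )\n")
    (goto-char (point-min))
    ;; NOERROR is t, so a failed search returns nil instead of signaling.
    (and (re-search-forward regexp nil t) t)))

(list
 ;; Prefix group is [^[\r\n]*, which cannot contain "[".
 (bug27391-match-p
  "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]")
 ;; Prefix group is [^\r\n]*, which accepts any character but a newline.
 (bug27391-match-p
  "[\r\n]\\([^\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]"))
;; => (nil t)
```

With the first form, the prefix before "Local Variables:" has to consist of characters other than `[', so a line that starts with "[comment]" can never match.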
* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file 2017-06-16 21:39 ` Philipp Stephani @ 2017-06-16 21:52 ` Philipp Stephani 0 siblings, 0 replies; 21+ messages in thread From: Philipp Stephani @ 2017-06-16 21:52 UTC (permalink / raw) To: Vincent Belaïche, Eli Zaretskii, 27391 [-- Attachment #1.1: Type: text/plain, Size: 2970 bytes --] Philipp Stephani <p.stephani2@gmail.com> schrieb am Fr., 16. Juni 2017 um 23:39 Uhr: > Philipp Stephani <p.stephani2@gmail.com> schrieb am Fr., 16. Juni 2017 um > 23:34 Uhr: > >> Vincent Belaïche <vincent.belaiche@gmail.com> schrieb am Fr., 16. Juni >> 2017 um 23:28 Uhr: >> >>> >>> >>> Le 16/06/2017 à 21:37, Vincent Belaïche a écrit : >>> > >>> > >>> > Le 16/06/2017 à 21:15, Vincent Belaïche a écrit : >>> >> >>> >>> [...] >>> >>> >> >>> >> >>> > After some more investigation, I think that the bug is in function >>> > insert-file-contents of fileio.c which is the one that decide and sets >>> > the coding system well before the other local variables are looked >>> into. >>> >>> After some more investigation, in the end the find-auto-coding of >>> mule.el is what is called to detect the coding. This function calls some >>> re-coding regexp. >>> >>> Here is a test function defining the same regexp. >>> >>> >>> (defun doit () >>> (interactive) >>> (let* ((prefix (regexp-quote "[comment]: # (")) >>> (suffix (regexp-quote ")")) >>> (re-coding >>> (concat >>> "[\r\n]" prefix >>> ;; N.B. without the \n below, the regexp can >>> ;; eat newlines. >>> "[ \t]*coding[ \t]*:[ \t]*\\([^ \t\r\n]+\\)[ \t]*" >>> suffix "[\r\n]"))) >>> (message (if (looking-at re-coding) "ok" "nak")))) >>> >>> I tried it with point at end of line >>> >>> [comment]: # ( Local Variables: ) >>> >>> and it answered "ok". 
Now I defined this with re-search-forward instead >>> of looking-at: >>> >>> (defun doit () >>> (interactive) >>> (let* ((prefix (regexp-quote "[comment]: # (")) >>> (suffix (regexp-quote ")")) >>> (re-coding >>> (concat >>> "[\r\n]" prefix >>> ;; N.B. without the \n below, the regexp can >>> ;; eat newlines. >>> "[ \t]*coding[ \t]*:[ \t]*\\([^ \t\r\n]+\\)[ \t]*" >>> suffix "[\r\n]"))) >>> (message (if (re-search-forward re-coding nil t) "ok" "nak")))) >>> >>> I placed the point before the coding: line, and I also got answer "ok" >>> >>> So I don't think that the regexp as such is to blame. Something else >>> seems to happen. It is too late now, I need to go to bed... >>> >>> Vincent. >>> >>> >> I think it's actually the regexp that searches for "Local Variables". The >> following minimal example fails for me: >> >> (with-temp-buffer >> (insert " >> >> [comment]: # ( Local Variables: ) >> [comment]: # ( coding: utf-8 ) >> [comment]: # ( End: ) >> >> ") >> (goto-char (point-min)) >> (re-search-forward >> "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]")) >> >> > Does anybody know why the second character range says [^[\r\n] instead of > [^\r\n]? This seems to explicitly exclude a leading [. > If this is a typo, then here's a patch. [-- Attachment #1.2: Type: text/html, Size: 4668 bytes --] [-- Attachment #2: 0001-Allow-local-variables-section-to-begin-with-a-square-b.txt --] [-- Type: text/plain, Size: 3003 bytes --] From 2f8bbf7e729ea09addf0d066861a3b38c312141d Mon Sep 17 00:00:00 2001 From: Philipp Stephani <phst@google.com> Date: Fri, 16 Jun 2017 23:49:09 +0200 Subject: [PATCH] Allow local variables section to begin with a square bracket Fixes Bug#27391. * lisp/international/mule.el (find-auto-coding): Fix regular expression for "Local Variables" section. * test/lisp/international/mule-tests.el (find-auto-coding--bug27391): Add unit test. 
--- lisp/international/mule.el | 2 +- test/lisp/international/mule-tests.el | 39 +++++++++++++++++++++++++++++++++++ 2 files changed, 40 insertions(+), 1 deletion(-) create mode 100644 test/lisp/international/mule-tests.el diff --git a/lisp/international/mule.el b/lisp/international/mule.el index fa3ad80e2f..6cfb7e6d45 100644 --- a/lisp/international/mule.el +++ b/lisp/international/mule.el @@ -1970,7 +1970,7 @@ find-auto-coding (goto-char tail-start) (re-search-forward "[\r\n]\^L" tail-end t) (if (re-search-forward - "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]" + "[\r\n]\\([^\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]" tail-end t) ;; The prefix is what comes before "local variables:" in its ;; line. The suffix is what comes after "local variables:" diff --git a/test/lisp/international/mule-tests.el b/test/lisp/international/mule-tests.el new file mode 100644 index 0000000000..084f609c45 --- /dev/null +++ b/test/lisp/international/mule-tests.el @@ -0,0 +1,39 @@ +;;; mule-tests.el --- unit tests for mule.el -*- lexical-binding: t; -*- + +;; Copyright (C) 2017 Free Software Foundation, Inc. + +;; This file is part of GNU Emacs. + +;; GNU Emacs is free software: you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation, either version 3 of the License, or +;; (at your option) any later version. + +;; GNU Emacs is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with GNU Emacs. If not, see <http://www.gnu.org/licenses/>. + +;;; Commentary: + +;; Unit tests for lisp/international/mule.el. + +;;; Code: + +(ert-deftest find-auto-coding--bug27391 () + "Check that Bug#27391 is fixed." 
+  (with-temp-buffer
+    (insert "\n[comment]: # ( Local Variables: )\n"
+            "[comment]: # ( coding: utf-8 )\n"
+            "[comment]: # ( End: )\n")
+    (goto-char (point-min))
+    (should (equal (let ((auto-coding-alist ())
+                         (auto-coding-regexp-alist ())
+                         (auto-coding-functions ()))
+                     (find-auto-coding "" (buffer-size)))
+                   '(utf-8 . :coding)))))
+
+;;; mule-tests.el ends here
-- 
2.13.1
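For readers who want to check the one-character change by hand, here is a minimal sketch that runs both versions of the regexp against the Markdown-style trailer from the report. The names `my-old-local-variables-re`, `my-new-local-variables-re` and `my-sample-trailer` are made up for this illustration; only the two regexp strings are taken from the patch.

```elisp
;; Hypothetical names; only the two regexp strings come from the patch.
(defconst my-old-local-variables-re
  "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]"
  "Regexp before the patch: the prefix group rejects an opening square bracket.")

(defconst my-new-local-variables-re
  "[\r\n]\\([^\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]"
  "Regexp after the patch: the prefix group accepts any non-newline text.")

(defconst my-sample-trailer
  (concat "\n[comment]: # ( Local Variables: )\n"
          "[comment]: # ( coding: utf-8 )\n"
          "[comment]: # ( End: )\n")
  "Markdown comment trailer modelled on the one in the bug report.")

;; The old regexp finds no match because the line starts with a bracket.
(string-match my-old-local-variables-re my-sample-trailer)

;; The new regexp matches, starting at the leading newline.
(string-match my-new-local-variables-re my-sample-trailer)
```

The first call returns nil and the second returns 0, which is consistent with the behaviour described in the report: the cookie is ignored only when the comment prefix contains a `[`.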
* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-16 10:00 bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file Vincent Belaïche
                   ` (2 preceding siblings ...)
  2017-06-16 21:27 ` Vincent Belaïche
@ 2017-06-16 22:09 ` Vincent Belaïche
  2017-06-16 22:23   ` Vincent Belaïche
  3 siblings, 1 reply; 21+ messages in thread
From: Vincent Belaïche @ 2017-06-16 22:09 UTC (permalink / raw)
  To: Eli Zaretskii, 27391; +Cc: Vincent Belaïche

On 16/06/2017 at 21:37, Vincent Belaïche wrote:
>
> On 16/06/2017 at 21:15, Vincent Belaïche wrote:
>> [...]
>
> After some more investigation, I think that the bug is in function
> insert-file-contents of fileio.c which is the one that decide and sets
> the coding system well before the other local variables are looked into.

I have located the bug.

After some more investigation, it turns out that find-auto-coding in mule.el is what gets called to detect the coding system.

This function evaluates the following expression to find the local variables:

    (re-search-forward
     "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]"
     tail-end t)

This expression evaluates to nil on the file CONTRIBUTING.md.

I can make a simple fix if you tell me on which branch to do it.

However, I think that the root of the problem is poor code factorization of local variable parsing between mule.el and files.el. A better, more future-proof fix would be a single local variable parser with an input constraint telling which settings are sought. The output of the parser could be used in both files.el and mule.el.

Vincent.
* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file 2017-06-16 22:09 ` Vincent Belaïche @ 2017-06-16 22:23 ` Vincent Belaïche 2017-06-17 5:45 ` Vincent Belaïche 2017-06-17 14:15 ` Philipp Stephani 0 siblings, 2 replies; 21+ messages in thread From: Vincent Belaïche @ 2017-06-16 22:23 UTC (permalink / raw) To: Eli Zaretskii, 27391, p.stephani2 Le 17/06/2017 à 00:09, Vincent Belaïche a écrit : > > Le 16/06/2017 à 21:37, Vincent Belaïche a écrit : >> >> Le 16/06/2017 à 21:15, Vincent Belaïche a écrit : > [...] > >>> >> After some more investigation, I think that the bug is in function >> insert-file-contents of fileio.c which is the one that decide and sets >> the coding system well before the other local variables are looked into. > I have located the bug. > > After some more investigation, in the end the find-auto-coding of > mule.el is what is called to detect the coding. > > This function evaluates this expression to find the local variables: > > (re-search-forward > "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]" > tail-end t) > > This expression evaluates to nil over file CONTRIBUTING.md > > I can make a simple fix if you tell me on which branch to do it. > > However I think that the root of the problem is poor code factorization > of local variable parsing between mule.el and file.el. A better, more > futureproof fix would be some unique local variable parser with some > input constrain telling what sort of setting are sought. The output of > the parse could be used in file.el and mule.el. > > Vincent. > > Ooops... my lengthy email of T23:34 was unwantedly sent. A shorter version with only the conclusion and w/o all the details of my investigation is above. Anyway, Philipp's patch is what I had in mind as a quick fix. Although I don't think that this is a good solution not to factorize code when possible. Factorizing makes it more maintainable. V. 
* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file 2017-06-16 22:23 ` Vincent Belaïche @ 2017-06-17 5:45 ` Vincent Belaïche 2017-06-17 14:30 ` Philipp Stephani 2017-06-17 14:15 ` Philipp Stephani 1 sibling, 1 reply; 21+ messages in thread From: Vincent Belaïche @ 2017-06-17 5:45 UTC (permalink / raw) To: Eli Zaretskii, 27391, p.stephani2; +Cc: Vincent Belaïche Le 17/06/2017 à 00:23, Vincent Belaïche a écrit : > > > Le 17/06/2017 à 00:09, Vincent Belaïche a écrit : >> >> Le 16/06/2017 à 21:37, Vincent Belaïche a écrit : >>> >>> Le 16/06/2017 à 21:15, Vincent Belaïche a écrit : >> [...] >> >>>> >>> After some more investigation, I think that the bug is in function >>> insert-file-contents of fileio.c which is the one that decide and sets >>> the coding system well before the other local variables are looked into. >> I have located the bug. >> >> After some more investigation, in the end the find-auto-coding of >> mule.el is what is called to detect the coding. >> >> This function evaluates this expression to find the local variables: >> >> (re-search-forward >> "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]" >> tail-end t) >> >> This expression evaluates to nil over file CONTRIBUTING.md >> >> I can make a simple fix if you tell me on which branch to do it. >> >> However I think that the root of the problem is poor code factorization >> of local variable parsing between mule.el and file.el. A better, more >> futureproof fix would be some unique local variable parser with some >> input constrain telling what sort of setting are sought. The output of >> the parse could be used in file.el and mule.el. >> >> Vincent. >> >> > Ooops... my lengthy email of T23:34 was unwantedly sent. A shorter > version with only the conclusion and w/o all the details of my > investigation is above. > > Anyway, Philipp's patch is what I had in mind as a quick fix. 
Although I > don't think that this is a good solution not to factorize code when > possible. Factorizing makes it more maintainable. > > V. Just to mention the following points noted by me when comparing the code in find-auto-coding and in hack-local-variables: * In hack-local-variables the tailing local variables section is considered to be at max 3000 characters from eob, while in find-auto-coding it is considered to be 3072. The « correct » figure should be 3072, not 3000, for consistency with « 1024 * 3 » code in function Finsert_file_contents of fileio.c : if (nread == 1024) { int ntail; if (lseek (fd, - (1024 * 3), SEEK_END) < 0) report_file_error ("Setting file position", orig_filename); ntail = emacs_read_quit (fd, read_buf + nread, 1024 * 3); nread = ntail < 0 ? ntail : nread + ntail; } Maybe the exact value should be in some constant. * In find-auto-coding there is no such thing as regexp operator "^" (for bol) or "$" (for eol) used, instead there is "[\r\n]". I suspect that this is because at this stage the coding system is not yet set, and therefore there is no such thing as bol or eol, the whole buffer is a single line. If as such, I withdraw my previous statement that code factorization is desirable. * In both cases what is sought for is the *FIRST* occurrence searched *FORWARD* of case sensitive string "Local Variables:" in the buffer tailing 3000--3072 characters. I think that this is a problem and that either we should search it *BACKWARD* or after finding the 1st occurrence, possible subsequent occurrences should be searched for, and the last occurrence should be considered instead. I say this because with emacs-template package it is possible that the template file has some local variables in the template definition section that differ from that of template itself. See (info "(template) DefSect") For instance the end of the template file would be as follow: --8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8---- ... 
blah blah blah template content
...
// Local Variables:
// toto: "tata"
// End:
>>>TEMPLATE-DEFINITION-SECTION<<<
...
blah blah blah Lisp Template rules
...
;; Local Variables:
;; foo: "bar"
;; End:
--8<----8<----8<----8<----8<-- end -->8---->8---->8---->8---->8----

Maybe excluding the [ character from the prefix string is not a typo but an intentional design choice to avoid false detection of the Local Variables section. I strongly recommend that, before making any fix, somebody dig into the file history to find when and *WHY* this exclusion of [ was introduced. Sorry, but I do not volunteer for this tedious, time-consuming kind of work...

Vincent.
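A minimal sketch of the "use the last occurrence" idea from the message above, assuming a backward search over the buffer tail. The names `my-local-variables-tail-size` and `my-last-local-variables-position` are hypothetical and not part of Emacs. The sketch counts characters, whereas the C code in fileio.c counts bytes, and it searches case-sensitively only because the message describes the search that way.

```elisp
;; Hypothetical helper, not part of Emacs.
(defconst my-local-variables-tail-size (* 1024 3)
  "Number of trailing characters to scan, mirroring the 1024 * 3 in fileio.c.")

(defun my-last-local-variables-position ()
  "Return the start of the last \"Local Variables:\" in the buffer tail, or nil."
  (save-excursion
    (let ((case-fold-search nil)        ; assumption: case-sensitive search
          (limit (max (point-min)
                      (- (point-max) my-local-variables-tail-size))))
      ;; Searching backward from the end finds the last occurrence first.
      (goto-char (point-max))
      (search-backward "Local Variables:" limit t))))
```

On a template file like the one shown above, this would return a position inside the `;; Local Variables:` line of the definition section rather than the earlier `// Local Variables:` line.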
* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file 2017-06-17 5:45 ` Vincent Belaïche @ 2017-06-17 14:30 ` Philipp Stephani 2017-06-19 10:51 ` Vincent Belaïche 0 siblings, 1 reply; 21+ messages in thread From: Philipp Stephani @ 2017-06-17 14:30 UTC (permalink / raw) To: Vincent Belaïche, Eli Zaretskii, 27391 [-- Attachment #1: Type: text/plain, Size: 2173 bytes --] Vincent Belaïche <vincent.belaiche@gmail.com> schrieb am Sa., 17. Juni 2017 um 07:45 Uhr: > > > Le 17/06/2017 à 00:23, Vincent Belaïche a écrit : > > > > > * In find-auto-coding there is no such thing as regexp operator "^" (for > bol) or "$" (for eol) used, instead there is "[\r\n]". I suspect that > this is because at this stage the coding system is not yet set, and > therefore there is no such thing as bol or eol, the whole buffer is a > single line. If as such, I withdraw my previous statement that code > factorization is desirable. > Why? It's a small variant that should be distinguishable using a parameter to a shared function, such as: enum file_local_flags { file_local_flag_default = 0x0, file_local_flag_use_bol_eol = 0x1, file_local_flag_search_trailer = 0x2, }; Lisp_Object get_file_local_variable_value (Lisp_Object name, enum file_local_flags flags); > > > * In both cases what is sought for is the *FIRST* occurrence searched > *FORWARD* of case sensitive string "Local Variables:" in the buffer > tailing 3000--3072 characters. I think that this is a problem and that > either we should search it *BACKWARD* or after finding the 1st > occurrence, possible subsequent occurrences should be searched for, > and the last occurrence should be considered instead. > Yes, that would be consistent with normal file-local variables. > > Maybe preventing the [ character in the prefix string is not a typo > but was some intentional design to allow preventing false detection of > the local variable section. 
> I strongly recommend that before doing any
> fix, somebody dig in file history to find when and *WHY* this [
> preventing has been introduced --- sorry, but I do not volunteer for
> this tedious/time consuming kind of work...

With git-blame it's not really tedious. Commit 6b61353c0a0320ee15bb6488149735381fed62ec replaced ^\\(.*\\)[ \t]* with [\r\n]\\([^[\r\n]*\\)[ \t]*, so I think it's almost certain that this is a typo (the previous regex didn't exclude the [ either). Anyway, if people want this to stay, they should have added a comment.
* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-17 14:30 ` Philipp Stephani
@ 2017-06-19 10:51 ` Vincent Belaïche
  2017-06-26 11:39   ` Philipp Stephani
  0 siblings, 1 reply; 21+ messages in thread
From: Vincent Belaïche @ 2017-06-19 10:51 UTC (permalink / raw)
  To: Philipp Stephani, 27391; +Cc: Vincent Belaïche

[...]

> With git-blame it's not really tedious. Commit
> 6b61353c0a0320ee15bb6488149735381fed62ec replaced ^\\(.*\\)[ \t]* with
> [\r\n]\\([^[\r\n]*\\)[ \t]*, so I think it's almost certain this is a
> typo (the previous regex didn't exclude the [ either). Anyway, if
> people want this to stay, they should have added a comment.

Thank you. I had a look at Wikipedia for the QWERTY keyboard layout (I have a French keyboard, and its layout is somewhat different for \ and ]). The modern QWERTY layout is as follows:

    1 2 3 4 5 6 7 8 9 0 - =
    Q W E R T Y U I O P [ ] \
    A S D F G H J K L ; '
    Z X C V B N M , . /

So ] is just next to \. So yes, this is definitely a typo: the author's finger was too big when hitting \.

Concerning factorization, couldn't one use [\n\r] in all cases rather than a switch based on some input argument?

I was also wondering whether it would be possible to have a single regexp for the whole Local Variables section. The following `doit' function is an attempt to do so. `M-x doit' searches forward for the whole Local Variables section and displays "ok" if it is found, "nak" otherwise.
(defun doit ()
  (interactive)
  (let* ((eol "\\(\r\n?\\|\n\\)")
         (eol-again "\\1")
         (space-maybe "[ \t]*")
         ;; suffix may be the empty string
         (suffix "\\([^ \r\n]+\\(?:[^\r\n]*[^ \r\n]\\)?\\|\\)")
         (prefix "\\([ \t]*[^ \r\n]+\\(?:[^\r\n]*[^ \r\n]\\)?\\)")
         (prefix-again "\\2")
         (suffix-again "\\3")
         (symbol: "\\(?:\\(?:[^][()'\" \t\r\n]\\|\\\\[][()'\" \t]\\)+[ \t]*:\\)")
         (sexp (concat "\\(?:" (substring prefix 2))))

    (message (if (and (re-search-forward
                       (concat eol
                               prefix space-maybe "Local Variables:" space-maybe suffix space-maybe eol-again
                               "\\(?:" prefix space-maybe symbol: sexp space-maybe suffix-again space-maybe eol-again "\\)*"
                               prefix space-maybe "End:" space-maybe suffix space-maybe "\\(" eol-again "\\)?"
                               )
                       nil t)
                      ;; when the tailing eol is not there we must be at EOB.
                      (or (match-string 3) (eobp)))
                 "ok" "nak"))))

Vincent.
* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file 2017-06-19 10:51 ` Vincent Belaïche @ 2017-06-26 11:39 ` Philipp Stephani 2017-06-27 6:05 ` Vincent Belaïche 0 siblings, 1 reply; 21+ messages in thread From: Philipp Stephani @ 2017-06-26 11:39 UTC (permalink / raw) To: Vincent Belaïche, 27391 [-- Attachment #1: Type: text/plain, Size: 1966 bytes --] Vincent Belaïche <vincent.belaiche@gmail.com> schrieb am Mo., 19. Juni 2017 um 12:51 Uhr: > > Concerning factorization, couldn't one use [\n\r] in all cases rather > than a switch based on some input argument ? > It should be possible, but it slightly changes the behavior of file-local variables. I wouldn't expect anything to break though. > > I was also wondering whether it is not possible to have a single regexp > for the whole Local Variable section. The following `doit' function is a > trial to do so. `M-x doit' will seach forward the whole Local Variables > section and display "ok" if found, "nak" otherwise. > > (defun doit () > (interactive) > (let* ((eol "\\(\r\n?\\|\n\\)") > (eol-again "\\1") > (space-maybe "[ \t]*") > ;; suffix may be the empty string > (suffix "\\([^ \r\n]+\\(?:[^\r\n]*[^ \r\n]\\)?\\|\\)") > (prefix "\\([ \t]*[^ \r\n]+\\(?:[^\r\n]*[^ \r\n]\\)?\\)") > (prefix-again "\\2") > (suffix-again "\\3") > (symbol: "\\(?:\\(?:[^][()'\" \t\r\n]\\|\\\\[][()'\" \t]\\)+[ > \t]*:\\)") > (sexp (concat "\\(?:" (substring prefix 2)))) > > (message (if (and (re-search-forward > (concat eol > prefix space-maybe "Local Variables:" > space-maybe suffix space-maybe eol-again > "\\(?:" prefix space-maybe symbol: sexp > space-maybe suffix-again space-maybe eol-again "\\)*" > prefix space-maybe "End:" space-maybe suffix > space-maybe "\\(" eol-again "\\)?" > ) > nil t) > ;; when the tailing eol is not there we must be at EOB. > (or (match-string 3) (eobp))) > "ok" "nak")))) > > > Looks good. Consider using `rx' for complex regexes, in my experiences it increases readability a lot. 
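To illustrate the `rx' suggestion, here is a small sketch that expresses the simpler, patched "Local Variables:" header regexp from mule.el in `rx' notation. The constant name `my-local-variables-header-rx` is invented for this example, and the sketch covers only the header line, not the whole-section regexp attempted in `doit'.

```elisp
(require 'rx)

;; Hypothetical name; the pattern mirrors the patched regexp from mule.el.
(defconst my-local-variables-header-rx
  (rx (any "\r\n")
      (group (* (not (any "\r\n"))))    ; prefix before "Local Variables:"
      (* (any " \t"))
      "Local Variables:"
      (* (any " \t"))
      (group (* (not (any "\r\n"))))    ; suffix after "Local Variables:"
      (any "\r\n"))
  "Regexp for the header line of a Local Variables section, built with `rx'.")

;; Match a header that uses a C++-style comment prefix and no suffix.
(string-match my-local-variables-header-rx "\n// Local Variables:\n")
```

Each sub-form names what it matches, which makes an accidental extra character inside a character class much easier to spot than in the string form.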
* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
  2017-06-26 11:39 ` Philipp Stephani
@ 2017-06-27  6:05 ` Vincent Belaïche
  0 siblings, 0 replies; 21+ messages in thread
From: Vincent Belaïche @ 2017-06-27 6:05 UTC (permalink / raw)
  To: Philipp Stephani, 27391; +Cc: Vincent Belaïche

My answers are inserted below.

On 26/06/2017 at 13:39, Philipp Stephani wrote:
>
> Vincent Belaïche <vincent.belaiche@gmail.com> wrote on Mon, 19 June 2017 at 12:51:
>
>     Concerning factorization, couldn't one use [\n\r] in all cases
>     rather than a switch based on some input argument ?
>
> It should be possible, but it slightly changes the behavior of
> file-local variables. I wouldn't expect anything to break though.

Sorry, I can't understand why there should be any slight change in the current behaviour. BTW, as in the doit function given below, what I had in mind was some "\\(\r\n?\\|\n\\)" construct rather than a plain "[\r\n]", so that it consistently matches CR (as on some Apple computers), CR-LF (as on MSW) and LF.

>     I was also wondering whether it is not possible to have a single regexp
>     for the whole Local Variable section. The following `doit' function is a
>     trial to do so. `M-x doit' will seach forward the whole Local Variables
>     section and display "ok" if found, "nak" otherwise.
> > (defun doit () > (interactive) > (let* ((eol "\\(\r\n?\\|\n\\)") > (eol-again "\\1") > (space-maybe "[ \t]*") > ;; suffix may be the empty string > (suffix "\\([^ \r\n]+\\(?:[^\r\n]*[^ \r\n]\\)?\\|\\)") > (prefix "\\([ \t]*[^ \r\n]+\\(?:[^\r\n]*[^ \r\n]\\)?\\)") > (prefix-again "\\2") > (suffix-again "\\3") > (symbol: "\\(?:\\(?:[^][()'\" \t\r\n]\\|\\\\[][()'\" \t]\\)+[ \t]*:\\)") > (sexp (concat "\\(?:" (substring prefix 2)))) > > (message (if (and (re-search-forward > (concat eol > prefix space-maybe "Local Variables:" space-maybe suffix space-maybe eol-again > "\\(?:" prefix space-maybe symbol: sexp space-maybe suffix-again space-maybe eol-again "\\)*" > prefix space-maybe "End:" space-maybe suffix space-maybe "\\(" eol-again "\\)?" > ) > nil t) > ;; when the tailing eol is not there we must be at EOB. > (or (match-string 3) (eobp))) > "ok" "nak")))) > > > > Looks good. Consider using `rx' for complex regexes, in my experiences it increases readability a lot. On second thought the regexp considered above has some limitation : it would fail if the sexp is multiline. For instance the following would fail. --8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8---- /* Local Variables: */ /* multiline-sexp: ( "first line" "second line" ) */ /* End: */ --8<----8<----8<----8<----8<-- end -->8---->8---->8---->8---->8---- This is a regression as I think that the current code allows multiline --- well I am not 100% sure of that, I presume this just from my reading the current code. I don't know if multiline sexps in file local variables is a desirable feature, personally I have never used them. And I am not even sure either that making a regexp that matches an Elisp sexp is feasible, or sensible. It is not sensible in my opinion because any change in the Elisp reader --- like supporting bignums as we had discussed quite some day ago with Jay Belanger, maintainer of Calc --- would imply some change in this regexp. 
And regexps do not support any [:elisp-sexp:] construct either, which would do the job with some `read' call under the hood.

Vincent.
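As a rough illustration of leaving the value to the Lisp reader instead of a regexp, the sketch below strips a comment prefix and suffix from each line and hands the joined text to `read-from-string'. The function `my-read-commented-sexp` is hypothetical; it is not how `hack-local-variables' or `find-auto-coding' work, and it assumes the variable name has already been removed from the first line.

```elisp
(require 'subr-x)                       ; for `string-trim'

;; Hypothetical helper, not part of Emacs.
(defun my-read-commented-sexp (lines prefix suffix)
  "Strip PREFIX and SUFFIX from each string in LINES and read one sexp.
The stripped lines are joined with newlines before being read."
  (car (read-from-string
        (mapconcat
         (lambda (line)
           (let ((s (string-trim line)))
             ;; Drop the comment leader, if present.
             (when (string-prefix-p prefix s)
               (setq s (substring s (length prefix))))
             ;; Drop the comment terminator, if present.
             (when (string-suffix-p suffix s)
               (setq s (substring s 0 (- (length s) (length suffix)))))
             s))
         lines "\n"))))

;; A value spread over two C-style comment lines.
(my-read-commented-sexp '("/* (\"first line\" */"
                          "/* \"second line\") */")
                        "/*" "*/")
```

With this approach, any change to the Elisp reader syntax is picked up automatically, since no regexp has to describe what an sexp looks like.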
* bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file 2017-06-16 22:23 ` Vincent Belaïche 2017-06-17 5:45 ` Vincent Belaïche @ 2017-06-17 14:15 ` Philipp Stephani 1 sibling, 0 replies; 21+ messages in thread From: Philipp Stephani @ 2017-06-17 14:15 UTC (permalink / raw) To: Vincent Belaïche, Eli Zaretskii, 27391-done [-- Attachment #1: Type: text/plain, Size: 2130 bytes --] Vincent Belaïche <vincent.belaiche@gmail.com> schrieb am Sa., 17. Juni 2017 um 00:23 Uhr: > > > Le 17/06/2017 à 00:09, Vincent Belaïche a écrit : > > > > Le 16/06/2017 à 21:37, Vincent Belaïche a écrit : > >> > >> Le 16/06/2017 à 21:15, Vincent Belaïche a écrit : > > [...] > > > >>> > >> After some more investigation, I think that the bug is in function > >> insert-file-contents of fileio.c which is the one that decide and sets > >> the coding system well before the other local variables are looked into. > > I have located the bug. > > > > After some more investigation, in the end the find-auto-coding of > > mule.el is what is called to detect the coding. > > > > This function evaluates this expression to find the local variables: > > > > (re-search-forward > > "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ > \t]*\\([^\r\n]*\\)[\r\n]" > > tail-end t) > > > > This expression evaluates to nil over file CONTRIBUTING.md > > > > I can make a simple fix if you tell me on which branch to do it. > > > > However I think that the root of the problem is poor code factorization > > of local variable parsing between mule.el and file.el. A better, more > > futureproof fix would be some unique local variable parser with some > > input constrain telling what sort of setting are sought. The output of > > the parse could be used in file.el and mule.el. > > > > Vincent. > > > > > Ooops... my lengthy email of T23:34 was unwantedly sent. A shorter > version with only the conclusion and w/o all the details of my > investigation is above. 
>
> Anyway, Philipp's patch is what I had in mind as a quick fix.

OK, I've pushed this commit as c3813b2aa8d2f5a625195fdbbfe6a01a602d7735.

> Although I
> don't think that this is a good solution not to factorize code when
> possible. Factorizing makes it more maintainable.

Agreed. Note that there's a third place in Emacs that parses a subset of file-local variables: lread.c, to detect the lexical-binding variable when loading ELisp files. Ideally that would be merged as well.