unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#38287: 26.3.50; filenotify.el: the Chinese file name in the event is messy code
       [not found] <aeef7181-2c14-4b93-90fd-dd71993ed3c8@Spark>
@ 2019-11-20  3:50 ` HaiJun Zhang
  2019-11-20 16:32   ` Michael Albinus
  0 siblings, 1 reply; 9+ messages in thread
From: HaiJun Zhang @ 2019-11-20  3:50 UTC (permalink / raw)
  To: 38287



The Chinese file name carried in the file notification event is garbled ("messy code"), so the file name comparison in the event callback of filenotify.el always fails, and this file is never auto-reverted.


In GNU Emacs 26.3.50 (build 1, x86_64-apple-darwin17.7.0, NS appkit-1561.61 Version 10.13.6 (Build 17G8037))
 of 2019-10-30 built on jundeMac
Repository revision: 3ee8ee8476fef2a5e8159f7597e36e0953295ce2
Windowing system distributor ‘Apple', version 10.3.1561
Recent messages:
+++ new: 31, ( *Echo Area 1*)
t
next-line: End of buffer [2 times]
previous-line: Beginning of buffer [13 times]
next-line: End of buffer [14 times]

Configured using:
 ‘configure --with-ns '--enable-locallisppath=/Library/Application
 Support/Emacs/${version}/site-lisp:/Library/Application
 Support/Emacs/site-lisp’ --with-modules --disable-acl
 --without-makeinfo CFLAGS=-O2’

Configured features:
JPEG RSVG GLIB NOTIFY GNUTLS LIBXML2 ZLIB TOOLKIT_SCROLL_BARS NS MODULES
THREADS LCMS2

Important settings:
  value of $LANG: zh_CN.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Messages

Minor modes in effect:
  global-auto-revert-mode: t
  shell-dirtrack-mode: t
  ido-everywhere: t
  global-hl-line-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny dired dired-loaddefs
format-spec rfc822 mml mml-sec epa derived epg gnus-util rmail
rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse rfc2231
mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums
mm-util mail-prsvr mail-utils thingatpt help-fns radix-tree help-mode
autorevert easy-mmode filenotify subr-x map edmacro kmacro tex-mode
compile shell pcomplete comint ansi-color ring latexenc package easymenu
epg-config url-handlers url-parse auth-source cl-seq eieio eieio-core
cl-macs eieio-loaddefs password-cache url-vars windmove ido seq byte-opt
gv bytecomp byte-compile cconv cl-loaddefs cl-lib display-line-numbers
hl-line elec-pair time-date china-util tooltip eldoc electric uniquify
ediff-hook vc-hooks lisp-float-type mwheel term/ns-win ns-win
ucs-normalize mule-util term/common-win tool-bar dnd fontset image
regexp-opt fringe tabulated-list replace newcomment text-mode elisp-mode
lisp-mode prog-mode register page menu-bar rfn-eshadow isearch timer
select scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame cl-generic cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932
hebrew greek romanian slovak czech european ethiopic indian cyrillic
chinese composite charscript charprop case-table epa-hook jka-cmpr-hook
help simple abbrev obarray minibuffer cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote threads kqueue cocoa ns lcms2 multi-tty make-network-process
emacs)

Memory information:
((conses 16 263112 15865)
 (symbols 48 22940 1)
 (miscs 40 405 241)
 (strings 32 42961 1400)
 (string-bytes 1 1130894)
 (vectors 16 40024)
 (vector-slots 8 826529 14554)
 (floats 8 63 332)
 (intervals 56 503 0)
 (buffers 992 14))


[-- Attachment #2: Attachment.png --]
[-- Type: image/png, Size: 105906 bytes --]


* bug#38287: 26.3.50; filenotify.el: the Chinese file name in the event is messy code
  2019-11-20  3:50 ` bug#38287: 26.3.50; filenotify.el: the Chinese file name in the event is messy code HaiJun Zhang
@ 2019-11-20 16:32   ` Michael Albinus
  2019-11-20 17:35     ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Albinus @ 2019-11-20 16:32 UTC (permalink / raw)
  To: HaiJun Zhang; +Cc: 38287

HaiJun Zhang <netjune@outlook.com> writes:

Hi,

> So file name comparing in the event callback of filenotify.el always
> fails. And there is no autorevert for this file.

Well, it is hard to analyse this based on a .png file. Could you please
uncomment line 93 in filenotify.el (it is a message call) and rerun
the test? There should then be debug output in the *Messages* buffer.

> In GNU Emacs 26.3.50 (build 1, x86_64-apple-darwin17.7.0, NS
> appkit-1561.61 Version 10.13.6 (Build 17G8037))
>  of 2019-10-30 built on jundeMac
> Repository revision: 3ee8ee8476fef2a5e8159f7597e36e0953295ce2

It's a Mac. That means, kqueue is the file-notify backend.

Does the underlying file system support UTF-8? Is it enabled? Maybe
there's something to convert when getting a kevent from the system?

> Important settings:
>   value of $LANG: zh_CN.UTF-8
>   locale-coding-system: utf-8-unix

That looks OK, although I'm not sure whether the coding system shall be
utf-8-hfs or something like this.

Unfortunately, I'm not able to debug on Mac :-(

Best regards, Michael.






* bug#38287: 26.3.50; filenotify.el: the Chinese file name in the event is messy code
  2019-11-20 16:32   ` Michael Albinus
@ 2019-11-20 17:35     ` Eli Zaretskii
  2019-11-20 17:49       ` Michael Albinus
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2019-11-20 17:35 UTC (permalink / raw)
  To: Michael Albinus; +Cc: 38287, netjune

> From: Michael Albinus <michael.albinus@gmx.de>
> Date: Wed, 20 Nov 2019 17:32:24 +0100
> Cc: 38287@debbugs.gnu.org
> 
> Does the underlying file system support UTF-8? Is it enabled? Maybe
> there's something to convert when getting a kevent from the system?
> 
> > Important settings:
> >   value of $LANG: zh_CN.UTF-8
> >   locale-coding-system: utf-8-unix
> 
> That looks OK, although I'm not sure whether the coding system shall be
> utf-8-hfs or something like this.

The strings shown in the image are UTF-8 encoded.






* bug#38287: 26.3.50; filenotify.el: the Chinese file name in the event is messy code
  2019-11-20 17:35     ` Eli Zaretskii
@ 2019-11-20 17:49       ` Michael Albinus
  2019-11-20 18:25         ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Albinus @ 2019-11-20 17:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 38287, netjune

Eli Zaretskii <eliz@gnu.org> writes:

>> That looks OK, although I'm not sure whether the coding system shall be
>> utf-8-hfs or something like this.
>
> The strings shown in the image are UTF-8 encoded.

Hmm. kqueue.c is very lazy in using ENCODE_FILE, it uses it only in
kqueue-add-watch. Maybe it is missing somewhere else?

(I always fail to handle utf-8 properly, especially in C code :-( )

Best regards, Michael.






* bug#38287: 26.3.50; filenotify.el: the Chinese file name in the event is messy code
  2019-11-20 17:49       ` Michael Albinus
@ 2019-11-20 18:25         ` Eli Zaretskii
  2019-11-20 18:45           ` Michael Albinus
                             ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Eli Zaretskii @ 2019-11-20 18:25 UTC (permalink / raw)
  To: Michael Albinus; +Cc: 38287, netjune

> From: Michael Albinus <michael.albinus@gmx.de>
> Cc: netjune@outlook.com,  38287@debbugs.gnu.org
> Date: Wed, 20 Nov 2019 18:49:34 +0100
> 
> > The strings shown in the image are UTF-8 encoded.
> 
> Hmm. kqueue.c is very lazy in using ENCODE_FILE, it uses it only in
> kqueue-add-watch. Maybe it is missing somewhere else?

I see one potential problem: in kqueue-add-watch, you encode the file
name, but then pass it to APIs that generally expect multibyte
(i.e. un-encoded) strings, although they will also work with encoded
unibyte strings.  Moreover, you put the unibyte encoded file name into
the watch object.  Not sure if this is related to the issue at hand,
but it would be cleaner to make this change:

diff --git a/src/kqueue.c b/src/kqueue.c
index 76d7fc1..1383d7d 100644
--- a/src/kqueue.c
+++ b/src/kqueue.c
@@ -414,7 +414,7 @@ DEFUN ("kqueue-add-watch", Fkqueue_add_watch, Skqueue_add_watch, 3, 3, 0,
     }
 
   /* Open file.  */
-  file = ENCODE_FILE (file);
+  Lisp_Object encoded_file = ENCODE_FILE (file);
   oflags = O_NONBLOCK;
 #if O_EVTONLY
   oflags |= O_EVTONLY;
@@ -426,7 +426,7 @@ DEFUN ("kqueue-add-watch", Fkqueue_add_watch, Skqueue_add_watch, 3, 3, 0,
 #else
     oflags |= O_NOFOLLOW;
 #endif
-  fd = emacs_open (SSDATA (file), oflags, 0);
+  fd = emacs_open (SSDATA (encoded_file), oflags, 0);
   if (fd == -1)
     report_file_error ("File cannot be opened", file);
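
As a rough illustration of the change above (outside Emacs, in Python rather than C): the sketch models a watch registry in which add_watch_before overwrites the decoded name with its encoded bytes, while add_watch_after keeps the decoded name and uses the encoded bytes only for the open call. encode_file, fake_open, and both add_watch_* functions are hypothetical stand-ins, not Emacs APIs, and UTF-8 is assumed as the file-name encoding.

```python
def encode_file(name: str) -> bytes:
    """Stand-in for ENCODE_FILE; assumes UTF-8 as the file-name encoding."""
    return name.encode("utf-8")


def fake_open(encoded_name: bytes) -> int:
    """Hypothetical stand-in for emacs_open; only checks it receives bytes."""
    if not isinstance(encoded_name, bytes):
        raise TypeError("system calls want encoded bytes")
    return 3  # pretend file descriptor


def add_watch_before(file: str, watches: list) -> None:
    # Old behaviour: the decoded name is overwritten by its encoded form,
    # so the encoded bytes end up in the watch object.
    file = encode_file(file)
    fd = fake_open(file)
    watches.append((fd, file))


def add_watch_after(file: str, watches: list) -> None:
    # Patched behaviour: encode into a separate variable used only for the
    # open call; the watch object keeps the decoded name.
    encoded_file = encode_file(file)
    fd = fake_open(encoded_file)
    watches.append((fd, file))


name = "/tmp/\u4e2d\u6587.txt"  # "/tmp/中文.txt"
before, after = [], []
add_watch_before(name, before)
add_watch_after(name, after)

print(before[0][1] == name)  # False: bytes never equal the decoded string
print(after[0][1] == name)   # True
```

Whether this mismatch is the whole story for the reported failure is an assumption of the sketch; it only shows why storing the encoded form is fragile.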
 
Btw, I don't think I understand the nature of the problem yet: where
were the unibyte strings shown in the report printed?  Did some Emacs
code print them, and if so, where is that code?






* bug#38287: 26.3.50; filenotify.el: the Chinese file name in the event is messy code
  2019-11-20 18:25         ` Eli Zaretskii
@ 2019-11-20 18:45           ` Michael Albinus
  2019-11-21  0:35           ` HaiJun Zhang
  2019-11-21  2:33           ` HaiJun Zhang
  2 siblings, 0 replies; 9+ messages in thread
From: Michael Albinus @ 2019-11-20 18:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 38287, netjune

Eli Zaretskii <eliz@gnu.org> writes:

>> Hmm. kqueue.c is very lazy in using ENCODE_FILE, it uses it only in
>> kqueue-add-watch. Maybe it is missing somewhere else?
>
> I see one potential problem: in kqueue-add-watch, you encode the file
> name, but then pass it to APIs that generally expect multibyte
> (i.e. un-encoded) strings, although they will also work with encoded
> unibyte strings.  Moreover, you put the unibyte encoded file name into
> the watch object.  Not sure if this is related to the issue at hand,
> but it would be cleaner to make this change:
>
> diff --git a/src/kqueue.c b/src/kqueue.c
> index 76d7fc1..1383d7d 100644
> --- a/src/kqueue.c
> +++ b/src/kqueue.c
> @@ -414,7 +414,7 @@ DEFUN ("kqueue-add-watch", Fkqueue_add_watch, Skqueue_add_watch, 3, 3, 0,
>      }
>
>    /* Open file.  */
> -  file = ENCODE_FILE (file);
> +  Lisp_Object encoded_file = ENCODE_FILE (file);
>    oflags = O_NONBLOCK;
>  #if O_EVTONLY
>    oflags |= O_EVTONLY;
> @@ -426,7 +426,7 @@ DEFUN ("kqueue-add-watch", Fkqueue_add_watch, Skqueue_add_watch, 3, 3, 0,
>  #else
>      oflags |= O_NOFOLLOW;
>  #endif
> -  fd = emacs_open (SSDATA (file), oflags, 0);
> +  fd = emacs_open (SSDATA (encoded_file), oflags, 0);
>    if (fd == -1)
>      report_file_error ("File cannot be opened", file);

Thanks, let's see how far we go with this.

> Btw, I don't think I understand the nature of the problem yet: where
> were the unibyte strings shown in the report printed?  Did some Emacs
> code print them, and if so, where is that code?

Same question here. Looks like the OP has added some prints to the code.

In Emacs 27.0.50, we have file-notify-debug, which does it for us when
set to t. But this is Emacs 26.3.50, that's why I have asked to activate
the relevant debug message manually.

Best regards, Michael.






* bug#38287: 26.3.50; filenotify.el: the Chinese file name in the event is messy code
  2019-11-20 18:25         ` Eli Zaretskii
  2019-11-20 18:45           ` Michael Albinus
@ 2019-11-21  0:35           ` HaiJun Zhang
  2019-11-21  2:33           ` HaiJun Zhang
  2 siblings, 0 replies; 9+ messages in thread
From: HaiJun Zhang @ 2019-11-21  0:35 UTC (permalink / raw)
  To: Michael Albinus, Eli Zaretskii; +Cc: 38287@debbugs.gnu.org


On 2019-11-21 at 2:24 AM +0800, Eli Zaretskii <eliz@gnu.org> wrote:
> Btw, I don't think I understand the nature of the problem yet: where
> were the unibyte strings shown in the report printed? Did some Emacs
> code print them, and if so, where is that code?

It’s my fault; I didn’t describe the problem clearly. Auto-revert doesn’t work for many files on my machine, so I added some debug messages to filenotify.el to find the cause. I found that the cause is the garbled ("messy code") file name.

The scenario:

  1.  A low-level file event arrives; the file name in the event is garbled.
  2.  filenotify.el receives the event, extracts the file name from it, and compares it with the name it stored when the watch was added. The extracted name is garbled and the stored one is a proper string, so they are not equal and the event is discarded.
  3.  As a result, the file is never auto-reverted.
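
The scenario above can be sketched outside Emacs. In this minimal Python model, handle_event and watched are hypothetical names, and Latin-1 decoding is used only as a stand-in for "bytes that were never decoded as UTF-8"; the real callback lives in filenotify.el and is not reproduced here.

```python
# Names stored when the watch was added (properly decoded strings).
watched = {"/tmp/\u4e2d\u6587.txt"}  # "/tmp/中文.txt"


def handle_event(event_name: str) -> bool:
    """Return True if the event is accepted, False if it is discarded."""
    return event_name in watched


raw = "/tmp/\u4e2d\u6587.txt".encode("utf-8")

# Undecoded bytes viewed one byte per character (Latin-1 is only a stand-in).
garbled = raw.decode("latin-1")
# The same bytes decoded properly.
decoded = raw.decode("utf-8")

print(handle_event(garbled))  # False: the event is discarded
print(handle_event(decoded))  # True: the event would reach auto-revert
```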



* bug#38287: 26.3.50; filenotify.el: the Chinese file name in the event is messy code
  2019-11-20 18:25         ` Eli Zaretskii
  2019-11-20 18:45           ` Michael Albinus
  2019-11-21  0:35           ` HaiJun Zhang
@ 2019-11-21  2:33           ` HaiJun Zhang
  2019-11-21 14:42             ` Eli Zaretskii
  2 siblings, 1 reply; 9+ messages in thread
From: HaiJun Zhang @ 2019-11-21  2:33 UTC (permalink / raw)
  To: Michael Albinus, Eli Zaretskii; +Cc: 38287@debbugs.gnu.org


On 2019-11-21 at 2:24 AM +0800, Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Michael Albinus <michael.albinus@gmx.de>
> > Cc: netjune@outlook.com, 38287@debbugs.gnu.org
> > Date: Wed, 20 Nov 2019 18:49:34 +0100
> >
> > > The strings shown in the image are UTF-8 encoded.
> >
> > Hmm. kqueue.c is very lazy in using ENCODE_FILE, it uses it only in
> > kqueue-add-watch. Maybe it is missing somewhere else?
>
> I see one potential problem: in kqueue-add-watch, you encode the file
> name, but then pass it to APIs that generally expect multibyte
> (i.e. un-encoded) strings, although they will also work with encoded
> unibyte strings. Moreover, you put the unibyte encoded file name into
> the watch object. Not sure if this is related to the issue at hand,
> but it would be cleaner to make this change:
>
> diff --git a/src/kqueue.c b/src/kqueue.c
> index 76d7fc1..1383d7d 100644
> --- a/src/kqueue.c
> +++ b/src/kqueue.c
> @@ -414,7 +414,7 @@ DEFUN ("kqueue-add-watch", Fkqueue_add_watch, Skqueue_add_watch, 3, 3, 0,
>      }
>
>    /* Open file.  */
> -  file = ENCODE_FILE (file);
> +  Lisp_Object encoded_file = ENCODE_FILE (file);
>    oflags = O_NONBLOCK;
>  #if O_EVTONLY
>    oflags |= O_EVTONLY;
> @@ -426,7 +426,7 @@ DEFUN ("kqueue-add-watch", Fkqueue_add_watch, Skqueue_add_watch, 3, 3, 0,
>  #else
>      oflags |= O_NOFOLLOW;
>  #endif
> -  fd = emacs_open (SSDATA (file), oflags, 0);
> +  fd = emacs_open (SSDATA (encoded_file), oflags, 0);
>    if (fd == -1)
>      report_file_error ("File cannot be opened", file);

It is fixed by your patch. Thanks.

A question:
I printed the values of file and encoded_file with safe_debug_print in kqueue.c. The former is a normal string; the latter is garbled ("messy code"). What is the encoding of encoded_file? The value of file-name-coding-system is utf-8-hfs. How much does utf-8-hfs differ from utf-8-unix? Is utf-8-hfs not really UTF-8?









* bug#38287: 26.3.50; filenotify.el: the Chinese file name in the event is messy code
  2019-11-21  2:33           ` HaiJun Zhang
@ 2019-11-21 14:42             ` Eli Zaretskii
  0 siblings, 0 replies; 9+ messages in thread
From: Eli Zaretskii @ 2019-11-21 14:42 UTC (permalink / raw)
  To: HaiJun Zhang; +Cc: 38287-done, michael.albinus

> From: HaiJun Zhang <netjune@outlook.com>
> CC: "38287@debbugs.gnu.org" <38287@debbugs.gnu.org>
> Date: Thu, 21 Nov 2019 02:33:42 +0000
> 
> It is fixed by your patch. Thanks. 

Thanks, I installed it.

> I printed the values of file and encoded_file with safe_debug_print in kqueue.c. The former is a normal string;
> the latter is garbled ("messy code"). What is the encoding of encoded_file? The value of file-name-coding-system is
> utf-8-hfs. How much does utf-8-hfs differ from utf-8-unix? Is utf-8-hfs not really UTF-8?

encoded_file is in UTF-8 on your system.  What you perceive as "messy
code" is how Emacs displays unibyte strings, which are actually
sequences of raw bytes, not of characters.
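
To make the "raw bytes" point concrete, here is a small Python sketch (not Emacs code). It assumes UTF-8 and renders each non-ASCII byte as a backslash-octal escape, which approximates how Emacs shows raw bytes in a unibyte string; show_unibyte is a hypothetical helper.

```python
def show_unibyte(data: bytes) -> str:
    """Render ASCII bytes as-is and every other byte as a backslash-octal escape."""
    return "".join(chr(b) if b < 0x80 else "\\%03o" % b for b in data)


name = "\u4e2d\u6587.txt"  # "中文.txt"
encoded = name.encode("utf-8")

print(len(name), len(encoded))  # 6 10 (6 characters, 10 bytes)
print(show_unibyte(encoded))    # \344\270\255\346\226\207.txt
```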

utf-8 and utf-8-hfs are not exactly the same, but for Chinese
characters they produce the same results, because those characters
don't have decompositions.
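
The decomposition point can be checked with a short Python sketch. It assumes that standard NFD normalization is a close enough approximation of what utf-8-hfs does when encoding (the HFS variant differs in some details); encode_plain and encode_decomposed are hypothetical helpers.

```python
import unicodedata


def encode_plain(name: str) -> bytes:
    # Plain UTF-8, no normalization.
    return name.encode("utf-8")


def encode_decomposed(name: str) -> bytes:
    # Approximation of the HFS behaviour: decompose to NFD, then encode.
    return unicodedata.normalize("NFD", name).encode("utf-8")


chinese = "\u4e2d\u6587.txt"  # "中文.txt": no canonical decomposition
accented = "caf\u00e9.txt"    # "café.txt": U+00E9 decomposes to e + U+0301

print(encode_plain(chinese) == encode_decomposed(chinese))    # True
print(encode_plain(accented) == encode_decomposed(accented))  # False
```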




