Intermittent problem with unencodable-char-position

unofficial mirror of emacs-devel@gnu.org 

* Intermittent problem with unencodable-char-position
@ 2010-04-14  4:19 Harald Hanche-Olsen
  2010-04-14  4:38 ` Harald Hanche-Olsen
  0 siblings, 1 reply; 4+ messages in thread
From: Harald Hanche-Olsen @ 2010-04-14  4:19 UTC
  To: emacs-devel

Evaluating the form

(unencodable-char-position 0 5 'iso-latin-1-unix 1 "100 Ω")

normally returns the list (4), since capital Omega is not encodable in
latin-1. However, after I have run emacs for a while, it happens that
this form begins to return nil [*]. I have no idea what triggers this
behaviour, and the only cure seems to be to quit and restart emacs.

I suspect some internal memory corruption, but if anyone here can
suggest another possible reason, I'd like to hear about it. Or if you
can think of a debugging technique that might shed some light on this,
I'll be happy to try it when it happens again. (I warn you that I run
on OS X, though, so debugging is, um, different.)

[*] I notice because attempts to save a buffer containing non-latin-1
characters with the latin-1 charset fail without the usual offer to
select a different character set. I have narrowed the problem down to
the above behaviour inside select-safe-coding-system-interactively.
(That code doesn't use the string argument, but it happens whether you
look at a string or the current buffer.)
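
For comparison with the string form above, here is a minimal sketch of
the same check run against buffer contents instead of a string. The
helper name my-unencodable-in-buffer is hypothetical (it is not part of
Emacs or mew); it only wraps the documented call without the STRING
argument, so the positions it returns are buffer positions rather than
string indexes.

```elisp
;; Hypothetical helper, for illustration only.
;; Inserts TEXT into a temporary buffer and asks Emacs which character,
;; if any, cannot be encoded in CODING-SYSTEM.  With COUNT = 1 the
;; result is a list holding at most one buffer position, or nil when
;; everything is encodable.
(defun my-unencodable-in-buffer (text coding-system)
  (with-temp-buffer
    (insert text)
    (unencodable-char-position (point-min) (point-max)
                               coding-system 1)))

;; Usage: in a healthy session this should be non-nil for "100 Ω"
;; and nil for plain "100".
(my-unencodable-in-buffer "100 Ω" 'iso-latin-1-unix)
```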

- Harald


* Re: Intermittent problem with unencodable-char-position
  2010-04-14  4:19 Intermittent problem with unencodable-char-position Harald Hanche-Olsen
@ 2010-04-14  4:38 ` Harald Hanche-Olsen
  2010-04-14 15:42   ` Harald Hanche-Olsen
  0 siblings, 1 reply; 4+ messages in thread
From: Harald Hanche-Olsen @ 2010-04-14  4:38 UTC
  To: emacs-devel

+ Harald Hanche-Olsen <hanche@math.ntnu.no>:

> Evaluating the form
> 
> (unencodable-char-position 0 5 'iso-latin-1-unix 1 "100 Ω")
> 
> normally returns the list (4), since capital Omega is not encodable in
> latin-1. However, after I have run emacs for a while, it happens that
> this form begins to return nil [*]. I have no idea what triggers this
> behaviour, [...]

Well, lo and behold, after sending the above mail I immediately
discovered how to trigger the problem: Sending mail does it.

I use mew and send through a TLS-encrypted server; mew uses stunnel to
handle the encryption. I suppose it tweaks some global setting in the
process of doing the communication, but surely, that should not affect
the behaviour of unencodable-char-position? I have asked about this on
the mew mailing list too, but maybe this narrows it down enough to
give someone here an idea.

- Harald


* Re: Intermittent problem with unencodable-char-position
  2010-04-14  4:38 ` Harald Hanche-Olsen
@ 2010-04-14 15:42   ` Harald Hanche-Olsen
  2010-04-14 16:11     ` Harald Hanche-Olsen
  0 siblings, 1 reply; 4+ messages in thread
From: Harald Hanche-Olsen @ 2010-04-14 15:42 UTC
  To: emacs-devel

+ Harald Hanche-Olsen <hanche@math.ntnu.no>:

> + Harald Hanche-Olsen <hanche@math.ntnu.no>:
> 
> > Evaluating the form
> > 
> > (unencodable-char-position 0 5 'iso-latin-1-unix 1 "100 Ω")
> > 
> > normally returns the list (4), since capital Omega is not encodable in
> > latin-1. However, after I have run emacs for a while, it happens that
> > this form begins to return nil [*]. I have no idea what triggers this
> > behaviour, [...]
> 
> Well, lo and behold, after sending the above mail I immediately
> discovered how to trigger the problem: Sending mail does it.

After a couple hours of debugging effort I managed to drill down to
the code in mew that triggers the problem: It is this little snippet

(apply 'set-charset-priority charset-list)

in which charset-list is a humongous list of charset names. (Included
below my signature in order to not interrupt your train of thought.)
I can undo the damage by running set-charset-priority on a much
shorter list, snipped from the head of the big one.

I have no idea why the author of mew thinks he needs to do this, but
in any case, having it influence the behaviour of
unencodable-char-position must surely be a bug? I'll submit a bug
report to that effect unless someone here jumps up and explains why it
is not a bug.

- Harald

PS. Damaging value of charset-list:

(unicode-bmp unicode iso-8859-1 ascii latin-iso8859-1 control-1
 iso-8859-2 latin-iso8859-2 iso-8859-3 latin-iso8859-3 iso-8859-4
 latin-iso8859-4 iso-8859-5 cyrillic-iso8859-5 iso-8859-6
 arabic-iso8859-6 iso-8859-7 greek-iso8859-7 iso-8859-8
 hebrew-iso8859-8 iso-8859-9 latin-iso8859-9 iso-8859-10
 latin-iso8859-10 iso-8859-11 thai-iso8859-11 iso-8859-13
 latin-iso8859-13 iso-8859-14 latin-iso8859-14 iso-8859-15
 latin-iso8859-15 iso-8859-16 latin-iso8859-16 thai-tis620 tis620-2533
 jisx0201 chinese-gb2312 chinese-gbk chinese-cns11643-1
 chinese-cns11643-2 chinese-cns11643-3 chinese-cns11643-4
 chinese-cns11643-5 chinese-cns11643-6 chinese-cns11643-7 big5
 japanese-jisx0208 japanese-jisx0208-1978 japanese-jisx0212
 japanese-jisx0213-1 japanese-jisx0213-2 japanese-jisx0213.2004-1
 cp932 korean-ksc5601 big5-hkscs cp949 viscii vscii vscii-2 koi8-r
 alternativnyj cp866 koi8-u koi8-t georgian-ps georgian-academy
 windows-1250 windows-1251 windows-1252 windows-1253 windows-1254
 windows-1255 windows-1256 windows-1257 windows-1258 next cp1125 cp437
 cp720 cp737 cp775 cp851 cp852 cp855 cp857 cp858 cp860 cp861 cp862
 cp863 cp864 cp865 cp869 cp874 unicode-smp unicode-sip unicode-ssp
 mac-roman ebcdic-us ebcdic-uk ibm1047 hp-roman8
 adobe-standard-encoding symbol ibm850 mik ptcp154 gb18030
 chinese-cns11643-15 emacs eight-bit eight-bit-control
 eight-bit-graphic latin-jisx0201 katakana-jisx0201 chinese-big5-1
 chinese-big5-2 japanese-jisx0213-a katakana-sjis cp932-2-byte
 cp949-2-byte chinese-sisheng ipa vietnamese-viscii-lower
 vietnamese-viscii-upper arabic-digit arabic-1-column arabic-2-column
 lao mule-lao indian-is13194 devanagari-cdac sanskrit-cdac
 bengali-cdac tamil-cdac telugu-cdac assamese-cdac oriya-cdac
 kannada-cdac malayalam-cdac gujarati-cdac punjabi-cdac
 devanagari-akruti bengali-akruti punjabi-akruti gujarati-akruti
 oriya-akruti tamil-akruti telugu-akruti kannada-akruti
 malayalam-akruti indian-glyph indian-1-column indian-2-column tibetan
 tibetan-1-column mule-unicode-2500-33ff mule-unicode-e000-ffff
 mule-unicode-0100-24ff ethiopic gb18030-2-byte gb18030-4-byte-bmp
 gb18030-4-byte-smp gb18030-4-byte-ext-1 gb18030-4-byte-ext-2)





* Re: Intermittent problem with unencodable-char-position
  2010-04-14 15:42   ` Harald Hanche-Olsen
@ 2010-04-14 16:11     ` Harald Hanche-Olsen
  0 siblings, 0 replies; 4+ messages in thread
From: Harald Hanche-Olsen @ 2010-04-14 16:11 UTC
  To: emacs-devel

My simplest way to show the bug yet:

(list
 (unencodable-char-position 0 5 'iso-latin-1-unix 1 "100 Ω")
 (progn (apply 'set-charset-priority (charset-priority-list))
        (unencodable-char-position 0 5 'iso-latin-1-unix 1 "100 Ω"))
 (progn (apply 'set-charset-priority (list (charset-priority-list t)))
        (unencodable-char-position 0 5 'iso-latin-1-unix 1 "100 Ω")))
=> ((4) nil (4)) ; the middle nil is wrong
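
The check and the reset used in the form above can be wrapped as two
small helpers, which may be handy for testing whether a running session
is affected. This is only a sketch: the names my-omega-flagged-p and
my-reset-charset-priority are hypothetical, and the reset simply repeats
the third step above (re-asserting only the highest-priority charset).
It is an observed way to undo the damage, not a fix for the underlying
bug.

```elisp
;; Hypothetical helpers, for illustration only.

(defun my-omega-flagged-p ()
  "Return non-nil if capital Omega is reported as unencodable in latin-1.
A nil result indicates the misbehaviour described in this thread."
  (equal (unencodable-char-position 0 5 'iso-latin-1-unix 1 "100 Ω")
         '(4)))

(defun my-reset-charset-priority ()
  "Re-assert only the current highest-priority charset.
Equivalent to the third form in the test case above."
  (set-charset-priority (charset-priority-list t)))

;; Usage: if the check fails, apply the reset and check again.
(unless (my-omega-flagged-p)
  (my-reset-charset-priority))
(my-omega-flagged-p)
```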

I am submitting a bug report.

- Harald



