unibyte buffers won't display latin-1 characters

unofficial mirror of bug-gnu-emacs@gnu.org 

* unibyte buffers won't display latin-1 characters
From: David Kuehling @ 2002-08-24 20:52 UTC


Hi,

I'm trying to edit compressed files using auto-compression-mode, which
always switches the buffer to unibyte.  Unfortunately I can't get Emacs
to display latin-1 characters in unibyte buffers, although the
documentation states that this is possible.

Here's what I did:

M-x set-variable <Ret> 
unibyte-display-via-language-environment <Ret> 
t <Ret>
M-x set-language-environment <Ret>
Latin-1 <Ret>
M-x auto-compression-mode <Ret>
C-x C-f test.txt.gz <Ret>

I then tried to enter äöü, and Emacs displayed \344\366\374.

Even if this is a small problem, the question remains why gzipped files
are opened as unibyte in the first place.  This is extremely
inconvenient.  I think it also keeps me from reading Japanese info
files, which are gzipped by default on my Debian Woody system.  Emacs
only displays them properly when I gunzip them.

The whole problem also applies to crypt++.

Could anybody please shed some light on this topic?

David Kühling

PS: Please CC me, I'm not subscribed.
-- 
GnuPG public key: http://user.cs.tu-berlin.de/~dvdkhlng/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205  D016 7DEF 5323 C174 7D40

* Re: unibyte buffers won't display latin-1 characters
From: Kenichi Handa @ 2002-08-27  1:20 UTC
  Cc: bug-gnu-emacs, karl

I have added Karl <karl@gnu.org> to the CC: because the
current problem seems to be related to crypt++.

In article <873ct4ughi.fsf@snail.pool>, David Kuehling <dvdkhlng@gmx.de> writes:
> Even if this is a small problem, the question remains, why gzipped files
> are opened as unibyte.  This is extremely inconvenient.  I think that
> also keeps me from reading Japanese info files which are (in my Debian
> Woody system) by default gzipped.  Emacs only displays them properly
> when I gunzip them.

> The whole problem also applies to crypt++.

I've just tried crypt++ Ver.2.90.  Once it is loaded, a
compressed file is decompressed automatically on reading,
even if auto-compression-mode is turned off.  In that case,
I confirmed that the file is read into a unibyte buffer
without any code conversion.

But when I turned on auto-compression-mode, the file was
read into a multibyte buffer with normal code conversion.

Karl, the comment in crypt++.el says that you are the
maintainer.  Do you know how to fix this problem?

Here's a reply to the other bug.

> Unfortunately I can't get Emacs
> to display latin-1 characters in unibyte buffers, although the
> documentation states that this is possible.

For that, I've just installed the attached fix in HEAD and
RC.  Could you please try it?

---
Ken'ichi HANDA
handa@etl.go.jp


2002-08-27  Kenichi Handa  <handa@etl.go.jp>

	* xdisp.c (get_next_display_element): In unibyte case, don't use
	octal form for such eight-bit characters that can be converted to
	multibyte char.

Index: xdisp.c
===================================================================
RCS file: /cvs/emacs/src/xdisp.c,v
retrieving revision 1.777
retrieving revision 1.778
diff -u -c -r1.777 -r1.778
cvs server: conflicting specifications of output style
*** xdisp.c	22 Aug 2002 16:52:56 -0000	1.777
--- xdisp.c	27 Aug 2002 00:59:55 -0000	1.778
***************
*** 4258,4271 ****
  	     the translation.  This could easily be changed but I
  	     don't believe that it is worth doing.
  
! 	     Non-printable multibyte characters are also translated
! 	     octal form.  */
! 	  else if ((it->c < ' '
  		    && (it->area != TEXT_AREA
  			|| (it->c != '\n' && it->c != '\t')))
! 		   || (it->c >= 127
! 		       && it->len == 1)
! 		   || !CHAR_PRINTABLE_P (it->c))
  	    {
  	      /* IT->c is a control character which must be displayed
  		 either as '\003' or as `^C' where the '\\' and '^'
--- 4258,4279 ----
  	     the translation.  This could easily be changed but I
  	     don't believe that it is worth doing.
  
! 	     If it->multibyte_p is nonzero, eight-bit characters and
! 	     non-printable multibyte characters are also translated to
! 	     octal form.
! 
! 	     If it->multibyte_p is zero, eight-bit characters that
! 	     don't have corresponding multibyte char code are also
! 	     translated to octal form.  */
! 	  else if (((it->c < ' ' || it->c == 127)
  		    && (it->area != TEXT_AREA
  			|| (it->c != '\n' && it->c != '\t')))
! 		   || (it->multibyte_p
! 		       ? ((it->c >= 127
! 			   && it->len == 1)
! 			  || !CHAR_PRINTABLE_P (it->c))
! 		       : (it->c >= 128
! 			  && it->c == unibyte_char_to_multibyte (it->c))))
  	    {
  	      /* IT->c is a control character which must be displayed
  		 either as '\003' or as `^C' where the '\\' and '^'
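
To see the effect of the new unibyte branch in isolation, here is a
minimal standalone sketch.  It is not the Emacs source:
`toy_unibyte_char_to_multibyte' is a hypothetical stand-in for
`unibyte_char_to_multibyte', and it assumes a Latin-1 environment in
which only bytes 0xA0 through 0xFF have a multibyte equivalent.  The
sketch also assumes the character is in the text area, so newline and
tab are never escaped.

```c
#include <stdbool.h>

/* Hypothetical stand-in for unibyte_char_to_multibyte ().
   Assumption: bytes 0xA0..0xFF convert to some multibyte code
   (modelled as c + 0x800, an arbitrary offset chosen only so the
   result differs from the input).  Any other value is returned
   unchanged, which the patch treats as "no conversion available".  */
int
toy_unibyte_char_to_multibyte (int c)
{
  return (c >= 0xA0 && c <= 0xFF) ? c + 0x800 : c;
}

/* Mirror of the patched condition for the unibyte case
   (it->multibyte_p == 0, it->area == TEXT_AREA).  Returns true if
   C should be displayed in escaped form (octal or ^X).  */
bool
unibyte_needs_escape (int c)
{
  /* Control characters and DEL, except newline and tab.  */
  bool control = (c < ' ' || c == 127) && c != '\n' && c != '\t';

  /* Eight-bit bytes with no corresponding multibyte character.  */
  bool unconvertible = c >= 128 && c == toy_unibyte_char_to_multibyte (c);

  return control || unconvertible;
}
```

Under these assumptions, byte 0344 (ä in Latin-1) is no longer escaped,
while a byte without a multibyte equivalent, such as 0220, still is.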

* Re: unibyte buffers won't display latin-1 characters
From: David Kuehling @ 2002-08-27 19:44 UTC
  Cc: bug-gnu-emacs, karl

>>>>> "Kenichi" == Kenichi Handa <handa@etl.go.jp> writes:

> I've just tried crypt++ Ver.2.90.  When I load it, on reading a
> compressed file, it is automatically decompressed even if
> auto-compression-mode is turned off.  And, in that case, I confirmed
> that the file is read into a unibyte buffer without any code
> conversion.

> But, when I turned on auto-compression-mode, the file is read into a
> multibyte buffer with normal code conversion.

I just disabled the loading of crypt++ and started emacs -q, and now
gzipped files load correctly.  `auto-compression-mode' is much smarter
than I thought.  Sorry for the false accusation ;-)

Since in Debian Woody crypt++ is always loaded when installed,
(/etc/emacs/site-start.d/50crypt++.el), I might also file a Debian bug
report, right?

> Here's a reply to the other bug.

> For that, I've just installed the attached fix in HEAD and RC.  Could
> you please try it?

That's difficult, since I'm currently using Debian's binary package of
Emacs (GNU Emacs 21.2.1 (i386-debian-linux-gnu, X toolkit, Xaw3d scroll
bars) of 2002-03-22 on raven, modified by Debian).  Well, I'll try to
install, patch and compile the source package tomorrow...

Thanks for your help,

David Kühling.
-- 
GnuPG public key: http://user.cs.tu-berlin.de/~dvdkhlng/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205  D016 7DEF 5323 C174 7D40

* Re: unibyte buffers won't display latin-1 characters
From: Karl Berry @ 2002-08-29 16:28 UTC
  Cc: dvdkhlng, bug-gnu-emacs

    Karl, the comment in crypt++.el says that you are the maintainer.

Although I've been looking for a replacement for years :).

    Do you know how to fix this problem?

Unfortunately I do not.  Since compressed (or encrypted) files are
arbitrary binary data, I don't know any way to read them except with
'no-conversion, which I assume is what stops the translation into a
multibyte buffer.  

I got bug reports about random failures for quite a while before I
realized emacs had not maintained backward compatibility on
reading/writing files and additional settings had to be made now.  So
the 'no-conversion stuff is necessary in that respect.  When I looked at
the emacs code a while ago, it had a bunch of random heuristics about
determining whether a file was binary or some multibyte coding system,
and those heuristics were failing.  (I reported a bug about this at that
time.)

Anyway, at this point, I suggest that crypt++ not be loaded by default
and probably not be used at all.  The builtin (un)compression and
line-ending support in emacs should suffice.

Thanks,
k

* Re: unibyte buffers won't display latin-1 characters
From: David Kuehling @ 2002-08-29 17:47 UTC
  Cc: handa, bug-gnu-emacs

>>>>> "Karl" == Karl Berry <karl@freefriends.org> writes:

> Unfortunately I do not.  Since compressed (or encrypted) files are
> arbitrary binary data, I don't know any way to read them except with
> 'no-conversion, which I assume is what stops the translation into a
> multibyte buffer.

`auto-compression-mode' seems to know a way around that.  It reads the
compressed file with coding system `no-conversion' (the default value
of `auto-coding-alist' makes it do that), then decodes the uncompressed
data into multibyte representation.  It determines the coding system by
calling `auto-coding-alist-lookup' on the filename with the compression
extension (.gz etc.) stripped.

Well, that's what I think after a short look at `jka-compr.el'; I'm far
from understanding what actually goes on there...
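
The filename step described above can be sketched in C.  This is only
an illustration of the idea, not what `jka-compr.el' does internally:
`strip_compression_suffix' is a hypothetical helper, and the suffix
list is an assumption made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy NAME into OUT (capacity OUT_SIZE, including
   the terminating NUL) with at most one trailing compression suffix
   removed.  The suffix list is assumed for illustration only.
   Returns 1 if a suffix was stripped, 0 if none matched, and -1 if
   OUT is too small.  */
int
strip_compression_suffix (const char *name, char *out, size_t out_size)
{
  static const char *const suffixes[] = { ".gz", ".Z", ".bz2" };
  size_t len = strlen (name);
  int stripped = 0;

  for (size_t i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++)
    {
      size_t slen = strlen (suffixes[i]);
      /* Require a non-empty base name in front of the suffix.  */
      if (len > slen && strcmp (name + len - slen, suffixes[i]) == 0)
        {
          len -= slen;
          stripped = 1;
          break;
        }
    }

  if (len + 1 > out_size)
    return -1;
  memcpy (out, name, len);
  out[len] = '\0';
  return stripped;
}
```

With this helper, "test.txt.gz" becomes "test.txt", and that name is
what would then be used to look up the coding system for the
uncompressed contents.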

> Anyway, at this point, I suggest that crypt++ not be loaded by default
> and probably not be used at all.  The builtin (un)compression and
> line-ending support in emacs should suffice.

Someone would have to implement a way to let `auto-compression-mode'
query for passwords when required.

David
-- 
GnuPG public key: http://user.cs.tu-berlin.de/~dvdkhlng/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205  D016 7DEF 5323 C174 7D40

* Re: unibyte buffers won't display latin-1 characters
From: Richard Stallman @ 2002-08-30 19:18 UTC
  Cc: handa, dvdkhlng, bug-gnu-emacs

    Unfortunately I do not.  Since compressed (or encrypted) files are
    arbitrary binary data, I don't know any way to read them except with
    'no-conversion, which I assume is what stops the translation into a
    multibyte buffer.  

You don't want conversion while reading the compressed or encrypted
file, but after uncompressing or unencrypting it, you probably at that
point want to perform conversion if, and as, Emacs would have done so
reading the same data out of a file with the corresponding name.
Is there any way to do that?

* Re: unibyte buffers won't display latin-1 characters
From: Eli Zaretskii @ 2002-08-31  6:18 UTC
  Cc: karl, handa, dvdkhlng, bug-gnu-emacs

> From: Richard Stallman <rms@gnu.org>
> Date: Fri, 30 Aug 2002 15:18:46 -0400
> 
> You don't want conversion while reading the compressed or encrypted
> file, but after uncompressing or unencrypting it, you probably at that
> point want to perform conversion if, and as, Emacs would have done so
> reading the same data out of a file with the corresponding name.
> Is there any way to do that?

Yes.  A Lisp program which needs to do that should invoke
detect-coding-region and then decode the text with
decode-coding-region.

(That's an oversimplification: the code should pay attention to
coding-system-for-read if non-nil, support the coding: tags, invoke
find-operation-coding-system, etc.)
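
The two-stage approach (read raw bytes without conversion, decode
afterwards) is not specific to Emacs.  As a rough illustration outside
of Lisp, the following C sketch decodes a buffer of raw bytes as
Latin-1 into UTF-8.  It assumes the coding system is already known to
be Latin-1, so it skips the detection step entirely; `latin1_to_utf8'
is a hypothetical name and nothing here is Emacs API.

```c
#include <stddef.h>

/* Decode LEN Latin-1 bytes from IN into UTF-8 in OUT (capacity
   OUT_SIZE, including the terminating NUL).  Returns the number of
   bytes written, excluding the NUL, or (size_t) -1 if OUT is too
   small.  Assumption: the input really is Latin-1.  */
size_t
latin1_to_utf8 (const unsigned char *in, size_t len,
                char *out, size_t out_size)
{
  size_t o = 0;

  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = in[i];
      size_t need = (c < 0x80) ? 1 : 2;

      /* Keep one byte in reserve for the terminating NUL.  */
      if (o + need + 1 > out_size)
        return (size_t) -1;

      if (c < 0x80)
        out[o++] = (char) c;                   /* ASCII: copied as is */
      else
        {
          out[o++] = (char) (0xC0 | (c >> 6));   /* lead byte */
          out[o++] = (char) (0x80 | (c & 0x3F)); /* continuation byte */
        }
    }

  if (o + 1 > out_size)
    return (size_t) -1;
  out[o] = '\0';
  return o;
}
```

Fed the three bytes \344\366\374 from the original report, it produces
the six-byte UTF-8 encoding of äöü.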

* Re: unibyte buffers won't display latin-1 characters
From: Kenichi Handa @ 2002-09-03  6:23 UTC
  Cc: rms, karl, dvdkhlng, bug-gnu-emacs

In article <2427-Sat31Aug2002091855+0300-eliz@is.elta.co.il>, "Eli Zaretskii" <eliz@is.elta.co.il> writes:
>>  You don't want conversion while reading the compressed or encrypted
>>  file, but after uncompressing or unencrypting it, you probably at that
>>  point want to perform conversion if, and as, Emacs would have done so
>>  reading the same data out of a file with the corresponding name.
>>  Is there any way to do that?

> Yes.  A Lisp program which needs to do that should invoke
> detect-coding-region and then decode the text with
> decode-coding-region.

> (That's an oversimplification: the code should pay attention to
> coding-system-for-read if non-nil, support the coding: tags, invoke
> find-operation-coding-system, etc.)

Long ago I wrote the function
archive-set-buffer-as-visiting-file in arc-mode.el.  It does
almost the same thing.  It may be worth generalizing this
function so that it can be used in similar situations.

---
Ken'ichi HANDA
handa@etl.go.jp

Thread overview: 8+ messages
2002-08-24 20:52 unibyte buffers won't display latin-1 characters David Kuehling
2002-08-27  1:20 ` Kenichi Handa
2002-08-27 19:44   ` David Kuehling
  -- strict thread matches above, loose matches on Subject: below --
2002-08-29 16:28 Karl Berry
2002-08-29 17:47 ` David Kuehling
2002-08-30 19:18 ` Richard Stallman
2002-08-31  6:18   ` Eli Zaretskii
2002-09-03  6:23     ` Kenichi Handa