Using utf-8-auto as a process coding system

unofficial mirror of emacs-devel@gnu.org 

* Using utf-8-auto as a process coding system
@ 2014-07-21 13:15 Bozhidar Batsov
  2014-07-21 13:25 ` Andreas Schwab
  0 siblings, 1 reply; 7+ messages in thread
From: Bozhidar Batsov @ 2014-07-21 13:15 UTC (permalink / raw)
  To: emacs-devel

Hi guys,

I was wondering if someone could explain the rationale behind the utf-8-auto coding system that’s available in Emacs.
Does it spare you from explicitly checking the underlying OS to determine whether you need to use utf-8-unix or some other utf-8 variant (for Windows, for instance)?

I’m asking because its use was suggested on cider’s issue tracker (cider is a Clojure programming environment for Emacs) (link https://github.com/bbatsov/ruby-style-guide/pull/338), but I couldn’t find any documentation about utf-8-auto, and it seems that most Emacs extensions use utf-8-unix as their process coding system. As I use only GNU/Linux and OS X, I don’t have a problem with utf-8-unix, but Windows users obviously do. :-)

-- 
Cheers,
Bozhidar


* Re: Using utf-8-auto as a process coding system
  2014-07-21 13:15 Using utf-8-auto as a process coding system Bozhidar Batsov
@ 2014-07-21 13:25 ` Andreas Schwab
  2014-07-21 14:54   ` Bozhidar Batsov
  0 siblings, 1 reply; 7+ messages in thread
From: Andreas Schwab @ 2014-07-21 13:25 UTC (permalink / raw)
  To: Bozhidar Batsov; +Cc: emacs-devel

Bozhidar Batsov <bozhidar@batsov.com> writes:

> I was wondering if someone can explain the rationale behind the utf-8-auto
> coding system that’s available in Emacs.

UTF-8 (auto-detect signature (BOM))

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."




* Re: Using utf-8-auto as a process coding system
  2014-07-21 13:25 ` Andreas Schwab
@ 2014-07-21 14:54   ` Bozhidar Batsov
  2014-07-21 15:36     ` Eli Zaretskii
  0 siblings, 1 reply; 7+ messages in thread
From: Bozhidar Batsov @ 2014-07-21 14:54 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: emacs-devel

On July 21, 2014 at 4:25:04 PM, Andreas Schwab (schwab@suse.de) wrote:
> Bozhidar Batsov <bozhidar@batsov.com> writes:
>
> > I was wondering if someone can explain the rationale behind the utf-8-auto
> > coding system that’s available in Emacs.
>
> UTF-8 (auto-detect signature (BOM))

So, it’s a good idea to use it for files, but it’s probably not a good idea to use it as process encoding, because there likely won’t be a BOM to check?




* Re: Using utf-8-auto as a process coding system
  2014-07-21 14:54   ` Bozhidar Batsov
@ 2014-07-21 15:36     ` Eli Zaretskii
  2014-07-22  5:55       ` Bozhidar Batsov
  0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2014-07-21 15:36 UTC (permalink / raw)
  To: Bozhidar Batsov; +Cc: schwab, emacs-devel

> Date: Mon, 21 Jul 2014 17:54:42 +0300
> From: Bozhidar Batsov <bozhidar@batsov.com>
> Cc: emacs-devel <emacs-devel@gnu.org>
> 
> On July 21, 2014 at 4:25:04 PM, Andreas Schwab (schwab@suse.de) wrote:
> > Bozhidar Batsov <bozhidar@batsov.com> writes:
> >
> > > I was wondering if someone can explain the rationale behind the utf-8-auto
> > > coding system that’s available in Emacs.
> >
> > UTF-8 (auto-detect signature (BOM))
>
> So, it’s a good idea to use it for files, but it’s probably not a good idea to use it as process encoding, because there likely won’t be a BOM to check?

Actually, I don't see why it would be a good idea to use it anywhere.

Why did you get a recommendation for this encoding?  (Sorry, I didn't
find it in the URL you mentioned earlier.)  It sounds like a strange
recommendation to me.





* Re: Using utf-8-auto as a process coding system
  2014-07-21 15:36     ` Eli Zaretskii
@ 2014-07-22  5:55       ` Bozhidar Batsov
  2014-07-25 13:03         ` Eli Zaretskii
  0 siblings, 1 reply; 7+ messages in thread
From: Bozhidar Batsov @ 2014-07-22  5:55 UTC (permalink / raw)
  To: Bozhidar Batsov, Eli Zaretskii; +Cc: schwab, emacs-devel

Oops, I actually used an incorrect link. This is the proper one: https://github.com/clojure-emacs/cider/issues/532

The gist of my problem is that Windows users have encoding-related problems running cider (discussion here: https://github.com/clojure-emacs/cider/issues/474). I guess the problem stems from this bit of code:

(set-process-coding-system process 'utf-8-unix 'utf-8-unix)

As my Windows knowledge is quite limited, I’m not sure how to handle the encoding for Windows users. Perhaps I should check the operating system and set a different process encoding there? Maybe someone can point me to examples in the Emacs source code that handle this problem.

-- 
Cheers,
Bozhidar

* Re: Using utf-8-auto as a process coding system
  2014-07-22  5:55       ` Bozhidar Batsov
@ 2014-07-25 13:03         ` Eli Zaretskii
  2014-07-28 15:05           ` Bozhidar Batsov
  0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2014-07-25 13:03 UTC (permalink / raw)
  To: Bozhidar Batsov; +Cc: schwab, emacs-devel

> Date: Tue, 22 Jul 2014 08:55:53 +0300
> From: Bozhidar Batsov <bozhidar.batsov@gmail.com>
> Cc: schwab@suse.de, emacs-devel@gnu.org
> 
> Ops, I actually used an incorrect link. That’s the proper one - https://github.com/clojure-emacs/cider/issues/532

OK, but that one is very short ;-)

> The gist of my problem is that Windows users have encoding related problems running cider (discussion here https://github.com/clojure-emacs/cider/issues/474).

There's a lot of confusion in that discussion.

> I guess the problem stems from this bit of code: 
> 
> (set-process-coding-system process 'utf-8-unix 'utf-8-unix)

'-auto' is not about end-of-line (EOL) format; it is about the Byte
Order Mark (BOM, http://en.wikipedia.org/wiki/Byte_order_mark).

Does Cider on Windows indeed output UTF-8 encoded text preceded by a
BOM?  I'd be surprised, as the BOM is normally not needed with UTF-8.
If it doesn't, then this whole -auto thing just comes from another
confused user.
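
To illustrate the BOM point, here is a minimal sketch (not taken from
cider) that decodes the same bytes with and without BOM detection.
The variable name `my-bom-bytes' is made up for this example;
`unibyte-string' and `decode-coding-string' are standard Emacs
functions.

```elisp
;; Hypothetical sample data: the UTF-8 signature (EF BB BF)
;; followed by the three ASCII bytes "abc".
(defvar my-bom-bytes (unibyte-string #xEF #xBB #xBF ?a ?b ?c))

;; utf-8-auto looks for a leading BOM and drops it when present.
(decode-coding-string my-bom-bytes 'utf-8-auto)

;; Plain utf-8 does no BOM handling, so the signature is decoded
;; as the character U+FEFF at the start of the string.
(decode-coding-string my-bom-bytes 'utf-8)
```

Neither call says anything about line endings, which is why switching
to -auto would not be expected to help with an EOL problem.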

If there's no BOM, the first thing I'd try on Windows is this:

 (set-process-coding-system process 'utf-8-dos 'utf-8-unix)

This is because Windows programs will normally accept Unix EOL format
on input, but will usually output Windows CR-LF EOL, which Emacs needs
to decode into a single newline character.
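
The effect of the two EOL variants can be seen without a subprocess.
This is only a sketch: `my-crlf-output' is a made-up stand-in for
what a Windows program might write, and the strings are decoded and
encoded directly instead of going through a real process.

```elisp
;; Hypothetical process output with Windows CR-LF line endings.
(defvar my-crlf-output "first\r\nsecond\r\n")

;; Decoding with utf-8-dos turns each CR-LF into a single newline.
(decode-coding-string my-crlf-output 'utf-8-dos)

;; Decoding with utf-8-unix leaves the carriage returns in the
;; text, which is where stray ^M characters come from.
(decode-coding-string my-crlf-output 'utf-8-unix)

;; Encoding with utf-8-unix sends plain LF line endings.
(encode-coding-string "(+ 1 2)\n" 'utf-8-unix)
```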

A more elegant solution, which should be platform-independent, is to
use something like the code below (untested):

  (set-process-coding-system process
                             ;; Decoding (output from the process)
                             (coding-system-change-text-conversion
                              (car default-process-coding-system)
                              'utf-8)
                             ;; Encoding (input to the process)
                             (coding-system-change-text-conversion
                              (cdr default-process-coding-system)
                              'utf-8))

This has the advantage that it uses the wisdom already invested in
setting the defaults for each platform.
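
To see what `coding-system-change-text-conversion' does with the
defaults, here is a small sketch.  The helper name
`my-utf-8-process-coding-pair' and the sample pairs are assumptions
made for illustration; the actual value of
`default-process-coding-system' depends on the platform and build.

```elisp
(defun my-utf-8-process-coding-pair (defaults)
  "Return a (DECODING . ENCODING) pair that uses UTF-8 text conversion.
DEFAULTS is a pair shaped like `default-process-coding-system'; the
EOL conversion of each half is kept as it is."
  (cons (coding-system-change-text-conversion (car defaults) 'utf-8)
        (coding-system-change-text-conversion (cdr defaults) 'utf-8)))

;; An assumed pair with DOS EOLs on output and Unix EOLs on input.
(my-utf-8-process-coding-pair '(undecided-dos . undecided-unix))

;; An assumed pair with Unix EOLs on both sides.
(my-utf-8-process-coding-pair '(iso-latin-1-unix . iso-latin-1-unix))
```

Only the text conversion is replaced; the EOL half of each default
carries over, so the caller does not need to test `system-type'.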


* Re: Using utf-8-auto as a process coding system
  2014-07-25 13:03         ` Eli Zaretskii
@ 2014-07-28 15:05           ` Bozhidar Batsov
  0 siblings, 0 replies; 7+ messages in thread
From: Bozhidar Batsov @ 2014-07-28 15:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, emacs-devel

On July 25, 2014 at 4:03:58 PM, Eli Zaretskii (eliz@gnu.org) wrote:
> > Date: Tue, 22 Jul 2014 08:55:53 +0300
> > From: Bozhidar Batsov <bozhidar.batsov@gmail.com>
> > Cc: schwab@suse.de, emacs-devel@gnu.org
> >
> > Ops, I actually used an incorrect link. That’s the proper one - https://github.com/clojure-emacs/cider/issues/532
>
> OK, but that one is very short ;-)
>
> > The gist of my problem is that Windows users have encoding related problems running cider (discussion here https://github.com/clojure-emacs/cider/issues/474).
>
> There's a lot of confusion in that discussion.
>
> > I guess the problem stems from this bit of code:
> >
> > (set-process-coding-system process 'utf-8-unix 'utf-8-unix)
>
> '-auto' is not about end-of-line (EOL) format, it is about the Byte
> Order Mark BOM (http://en.wikipedia.org/wiki/Byte_order_mark).
>
> Does Cider on Windows indeed output UTF-8 encoded text preceded by a
> BOM? I'd be surprised, as the BOM is normally not needed with UTF-8.
> In which case all this -auto thing just comes from another confused
> user.

The problem is Java, not cider. cider is an Emacs Lisp client for a Clojure REPL server (nREPL), which runs on top of the JVM. I seem to recall that Java doesn’t use UTF-8 at all; I think it uses the older UTF-16.

I guess it’s compatible with UTF-8 to some extent.

> If there's no BOM, the first thing I'd try on Windows is this:
>
>  (set-process-coding-system process 'utf-8-dos 'utf-8-unix)
>
> This is because Windows programs will normally accept Unix EOL format
> on input, but will usually output Windows CR-LF EOL, which Emacs needs
> to decode into a single newline character.
>
> A more elegant solution, which should be platform-independent, is to
> use something like below (untested)
>
>   (set-process-coding-system process
>                              (coding-system-change-text-conversion
>                               (car default-process-coding-system)
>                               'utf-8)
>                              (coding-system-change-text-conversion
>                               (cdr default-process-coding-system)
>                               'utf-8))
>
> This has the advantage that it uses the wisdom already invested in
> setting the defaults for each platform.

Thanks for the suggestions. I’ll ask some of the Windows users to try them both and see what works and what doesn’t.


