unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#24953: 25.1; Possible inefficiency in UTF-8
@ 2016-11-16  4:25 Eli Barzilay
  2016-11-16 16:17 ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Barzilay @ 2016-11-16  4:25 UTC
  To: 24953

In an empty buffer, insert a pile of "foo" lines (a few screens high),
in the middle, add the result of (insert #x23ce) (to get the unicode
character).  Now `C-x RET l' and "UTF-8".  Now, just moving the cursor
around in the buffer shows very noticeable delays, up to more than a
second.  In the default language of "English" I don't see that
happening.  This is on a Windows 7 machine.
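
A minimal sketch of these steps as Emacs Lisp, in case it helps to
reproduce.  The buffer name and the count of 100 lines are arbitrary
assumptions, not part of the recipe above:

```elisp
;; Hypothetical reproduction helper; evaluate with M-: or C-x C-e.
(with-current-buffer (get-buffer-create "*bug24953*")
  (erase-buffer)
  (dotimes (_ 100) (insert "foo\n"))   ; a few screens of "foo" lines
  (insert #x23ce "\n")                 ; U+23CE RETURN SYMBOL in the middle
  (dotimes (_ 100) (insert "foo\n"))
  (goto-char (point-min))
  (set-language-environment "UTF-8")   ; what C-x RET l UTF-8 RET does
  (pop-to-buffer (current-buffer)))
```

After evaluating it, move point up and down through the buffer to
check whether the delays appear.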




In GNU Emacs 25.1.1 (x86_64-w64-mingw32)
 of 2016-09-17 built on LAPHROAIG
Windowing system distributor 'Microsoft Corp.', version 6.1.7601
Configured using:
 'configure --without-dbus --without-compress-install CFLAGS=-static'

Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND NOTIFY ACL GNUTLS LIBXML2 ZLIB
TOOLKIT_SCROLL_BARS

Important settings:
  value of $LC_COLLATE: POSIX
  value of $LANG: en_US.UTF-8
  locale-coding-system: cp1252

Major mode: Text

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent messages:
next-line: End of buffer [9 times]
previous-line: Beginning of buffer [6 times]
Mark set [90 times]
previous-line: Beginning of buffer [10 times]
You can run the command ‘set-language-environment’ with C-x RET l
Auto-saving...done
next-line: End of buffer [11 times]
previous-line: Beginning of buffer [10 times]
next-line: End of buffer [9 times]
Making completion list...

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message idna dired format-spec rfc822
mml mml-sec password-cache epg epg-config gnus-util mm-decode mm-bodies
mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail
rfc2047 rfc2045 ietf-drums mm-util help-fns help-mode easymenu
cl-loaddefs pcase cl-lib mail-prsvr mail-utils time-date mule-util
tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type
mwheel dos-w32 ls-lisp disp-table w32-win w32-vars term/common-win
tool-bar dnd fontset image regexp-opt fringe tabulated-list newcomment
elisp-mode lisp-mode prog-mode register page menu-bar rfn-eshadow timer
select scroll-bar mouse jit-lock font-lock syntax facemenu font-core
frame cl-generic cham georgian utf-8-lang misc-lang vietnamese tibetan
thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian
slovak czech european ethiopic indian cyrillic chinese charscript
case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote w32notify w32 multi-tty
make-network-process emacs)

Memory information:
((conses 16 91694 5152)
 (symbols 56 19958 0)
 (miscs 48 116 129)
 (strings 32 16382 3241)
 (string-bytes 1 443818)
 (vectors 16 11828)
 (vector-slots 8 426659 5914)
 (floats 8 164 74)
 (intervals 56 442 840)
 (buffers 976 25))

-- 
                   ((x=>x(x))(x=>x(x)))                  Eli Barzilay:
                   http://barzilay.org/                  Maze is Life!






* bug#24953: 25.1; Possible inefficiency in UTF-8
  2016-11-16  4:25 bug#24953: 25.1; Possible inefficiency in UTF-8 Eli Barzilay
@ 2016-11-16 16:17 ` Eli Zaretskii
  2016-11-16 19:11   ` Eli Barzilay
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2016-11-16 16:17 UTC
  To: Eli Barzilay; +Cc: 24953-done

> From: Eli Barzilay <eli@barzilay.org>
> Date: Tue, 15 Nov 2016 23:25:35 -0500
> 
> In an empty buffer, insert a pile of "foo" lines (a few screens high),
> in the middle, add the result of (insert #x23ce) (to get the unicode
> character).  Now `C-x RET l' and "UTF-8".

Regardless of the issue reported here, I would advise against setting
the UTF-8 language environment on MS-Windows.  Doing so is likely to
screw you in some situations.  For example, that sets up the encoding
of communications with subprocesses to use UTF-8, something that only
works with pure ASCII text and command-line arguments, and breaks
otherwise on Windows.

You shouldn't need to do this, not in Emacs 25 anyway.

> Now, just moving the cursor around in the buffer shows very
> noticeable delays, up to more than a second.  In the default
> language of "English" I don't see that happening.  This is on a
> Windows 7 machine.

Fixed on the emacs-25 branch.  The cause was a sub-optimal definition
of the default fontset with respect to fonts that cover the symbol and
punctuation Unicode blocks.

Thanks.






* bug#24953: 25.1; Possible inefficiency in UTF-8
  2016-11-16 16:17 ` Eli Zaretskii
@ 2016-11-16 19:11   ` Eli Barzilay
  2016-11-16 19:38     ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Barzilay @ 2016-11-16 19:11 UTC
  To: Eli Zaretskii; +Cc: 24953-done

On Wed, Nov 16, 2016 at 11:17 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Eli Barzilay <eli@barzilay.org>
>> Date: Tue, 15 Nov 2016 23:25:35 -0500
>>
>> In an empty buffer, insert a pile of "foo" lines (a few screens high),
>> in the middle, add the result of (insert #x23ce) (to get the unicode
>> character).  Now `C-x RET l' and "UTF-8".
>
> Regardless of the issue reported here, I would advise against setting
> the UTF-8 language environment on MS-Windows.  [...]
> You shouldn't need to do this, not in Emacs 25 anyway.

Is it a good idea to use it on linux?  (I originally thought that all it
gets me is a default encoding for non-ascii files...)


>> Now, just moving the cursor around in the buffer shows very
>> noticeable delays, up to more than a second.  In the default language
>> of "English" I don't see that happening.  This is on a Windows 7
>> machine.
>
> Fixed on the emacs-25 branch.  The cause was a sub-optimal definition
> of the default fontset with respect to fonts that cover the symbol and
> punctuation Unicode blocks.

Thanks!

-- 
                   ((x=>x(x))(x=>x(x)))                  Eli Barzilay:
                   http://barzilay.org/                  Maze is Life!






* bug#24953: 25.1; Possible inefficiency in UTF-8
  2016-11-16 19:11   ` Eli Barzilay
@ 2016-11-16 19:38     ` Eli Zaretskii
  2016-11-16 19:59       ` Eli Barzilay
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2016-11-16 19:38 UTC
  To: Eli Barzilay; +Cc: 24953

> From: Eli Barzilay <eli@barzilay.org>
> Date: Wed, 16 Nov 2016 14:11:57 -0500
> Cc: 24953-done@debbugs.gnu.org
> 
> > Regardless of the issue reported here, I would advise against setting
> > the UTF-8 language environment on MS-Windows.  [...]
> > You shouldn't need to do this, not in Emacs 25 anyway.
> 
> Is it a good idea to use it on linux?

Yes.  (And it should be the default on GNU/Linux, at least on most of
its flavors.)

> (I originally thought that all it gets me is a default encoding for
> non-ascii files...)

You could customize just that, then.  And I don't really understand
why that is needed, either, unless you want the files you create to be
automatically in UTF-8.






* bug#24953: 25.1; Possible inefficiency in UTF-8
  2016-11-16 19:38     ` Eli Zaretskii
@ 2016-11-16 19:59       ` Eli Barzilay
  2016-11-16 20:16         ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Barzilay @ 2016-11-16 19:59 UTC
  To: Eli Zaretskii; +Cc: 24953

On Wed, Nov 16, 2016 at 2:38 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Eli Barzilay <eli@barzilay.org>
>> Date: Wed, 16 Nov 2016 14:11:57 -0500
>> Cc: 24953-done@debbugs.gnu.org
>>
>> > Regardless of the issue reported here, I would advise against setting
>> > the UTF-8 language environment on MS-Windows.  [...]
>> > You shouldn't need to do this, not in Emacs 25 anyway.
>>
>> Is it a good idea to use it on linux?
>
> Yes.  (And it should be the default on GNU/Linux, at least on most of
> its flavors.)

I built and installed 25.1, and it looks like the default is also
English.  (At least according to `current-language-environment'.)  Is
that a bug then?


>> (I originally thought that all it gets me is a default encoding for
>> non-ascii files...)
>
> You could customize just that, then.

Using `prefer-coding-system'?  Or something else?


> And I don't really understand why that is needed, either, unless you
> want the files you create to be automatically in UTF-8.

Doesn't that make sense if I almost always edit source code for
languages that use utf-8 to read sources?

-- 
                   ((x=>x(x))(x=>x(x)))                  Eli Barzilay:
                   http://barzilay.org/                  Maze is Life!






* bug#24953: 25.1; Possible inefficiency in UTF-8
  2016-11-16 19:59       ` Eli Barzilay
@ 2016-11-16 20:16         ` Eli Zaretskii
  2016-11-20 22:32           ` Eli Barzilay
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2016-11-16 20:16 UTC
  To: Eli Barzilay; +Cc: 24953

> From: Eli Barzilay <eli@barzilay.org>
> Date: Wed, 16 Nov 2016 14:59:58 -0500
> Cc: 24953@debbugs.gnu.org
> 
> > Yes.  (And it should be the default on GNU/Linux, at least on most of
> > its flavors.)
> 
> I built and installed 25.1, and it looks like the default is also
> English.  (At least according to `current-language-environment'.)  Is
> that a bug then?

It's not a bug.  The important thing is the various coding-systems: if
your locale is something like en_US.UTF-8, then the default encodings
should all be utf-8, which is what you want.
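
One way to check which defaults are actually in effect is to evaluate
something like the following with M-:.  This is only a sketch; the
choice of which values to list is an assumption about what is relevant
here:

```elisp
;; Returns a list of the current language environment and the main
;; default coding systems, for inspection in the echo area.
(list current-language-environment
      locale-coding-system
      (default-value 'buffer-file-coding-system)
      default-process-coding-system
      (coding-system-priority-list t)) ; highest-priority coding system only
```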

> >> (I originally thought that all it gets me is a default encoding for
> >> non-ascii files...)
> >
> > You could customize just that, then.
> 
> Using `prefer-coding-system'?  Or something else?

Using setq-default.

> > And I don't really understand why that is needed, either, unless you
> > want the files you create to be automatically in UTF-8.
> 
> Doesn't that make sense if I almost always edit source code for
> languages that use utf-8 to read sources?

Languages that use utf-8 should already encode in utf-8 by default.
Which ones don't?






* bug#24953: 25.1; Possible inefficiency in UTF-8
  2016-11-16 20:16         ` Eli Zaretskii
@ 2016-11-20 22:32           ` Eli Barzilay
  2016-11-21  3:32             ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Barzilay @ 2016-11-20 22:32 UTC
  To: Eli Zaretskii; +Cc: 24953

On Wed, Nov 16, 2016 at 3:16 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>>
>> >> (I originally thought that all it gets me is a default encoding for
>> >> non-ascii files...)
>> >
>> > You could customize just that, then.
>>
>> Using `prefer-coding-system'?  Or something else?
>
> Using setq-default.

Sorry for being thick, but which variable?


>> > And I don't really understand why that is needed, either, unless
>> > you want the files you create to be automatically in UTF-8.
>>
>> Doesn't that make sense if I almost always edit source code for
>> languages that use utf-8 to read sources?
>
> Languages that use utf-8 should already encode in utf-8 by default.
> Which ones don't?

I can't reproduce any such problems so perhaps whatever made me add it
is no longer a problem...  However, since I removed the UTF-8 language
setting on Windows I had one case which wasn't working right: I ran some
`git show ...` to look at a past change in a (utf-8 encoded) source
file; I ran it with `shell-command` and the resulting text that popped
up had some junk next to some non-ascii characters.  I'm pretty sure
that I didn't have such an issue with the setting on.

If this is not expected, I can try to get reproducible instructions.

-- 
                   ((x=>x(x))(x=>x(x)))                  Eli Barzilay:
                   http://barzilay.org/                  Maze is Life!






* bug#24953: 25.1; Possible inefficiency in UTF-8
  2016-11-20 22:32           ` Eli Barzilay
@ 2016-11-21  3:32             ` Eli Zaretskii
  2016-11-21 21:38               ` Eli Barzilay
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2016-11-21  3:32 UTC
  To: Eli Barzilay; +Cc: 24953

> From: Eli Barzilay <eli@barzilay.org>
> Date: Sun, 20 Nov 2016 17:32:07 -0500
> Cc: 24953@debbugs.gnu.org
> 
> On Wed, Nov 16, 2016 at 3:16 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> >>
> >> >> (I originally thought that all it gets me is a default encoding for
> >> >> non-ascii files...)
> >> >
> >> > You could customize just that, then.
> >>
> >> Using `prefer-coding-system'?  Or something else?
> >
> > Using setq-default.
> 
> Sorry for being thick, but which variable?

buffer-file-coding-system

> >> Doesn't that make sense if I almost always edit source code for
> >> languages that use utf-8 to read sources?
> >
> > Languages that use utf-8 should already encode in utf-8 by default.
> > Which ones don't?
> 
> I can't reproduce any such problems so perhaps whatever made me add it
> is no longer a problem...  However, since I removed the UTF-8 language
> setting on Windows I had one case which wasn't working right: I ran some
> `git show ...` to look at a past change in a (utf-8 encoded) source
> file; I ran it with `shell-command` and the resulting text that popped
> up had some junk next to some non-ascii characters.  I'm pretty sure
> that I didn't have such an issue with the setting on.
> 
> If this is not expected, I can try to get reproducible instructions.

It isn't expected with Emacs 25.1, as it specifically sets up things
assuming Git reports its data in UTF-8.  Are you sure you didn't set
i18n.commitEncoding in your Git configuration, or have some non-ASCII
text encoded in something other than UTF-8 in that repository?






* bug#24953: 25.1; Possible inefficiency in UTF-8
  2016-11-21  3:32             ` Eli Zaretskii
@ 2016-11-21 21:38               ` Eli Barzilay
  2016-11-22  3:33                 ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Barzilay @ 2016-11-21 21:38 UTC
  To: Eli Zaretskii; +Cc: 24953

On Sun, Nov 20, 2016 at 10:32 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>>
>> If this is not expected, I can try to get reproducible instructions.
>
> It isn't expected with Emacs 25.1, as it specifically sets up things
> assuming Git reports its data in UTF-8.  Are you sure you didn't set
> i18n.commitEncoding in your Git configuration, or have some non-ASCII
> text encoded in something other than UTF-8 in that repository?

I just ran into an encoding problem, with files now: opening a UTF-8
file with a single lambda (\316\273) sometimes fails, and I traced this
to whether I start Emacs from Windows or from a Cygwin shell.  In the
latter case, my shell has an explicit LANG setting and the result is
that Emacs opens that file fine, but in the first case, when Emacs is
started directly by Windows, there is no LANG, and the UTF-8 file is not
treated as such (looks like it opens it in latin-1).  I verified this
with "emacs -Q" too.
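
The difference between the two readings of those bytes can be sketched
directly in Emacs Lisp.  This only illustrates the two decodings; it
does not show which coding system Emacs picks for a given file:

```elisp
;; "\316\273" is a unibyte string holding the bytes #xCE #xBB.
(decode-coding-string "\316\273" 'utf-8)    ; ⇒ "λ"
(decode-coding-string "\316\273" 'latin-1)  ; ⇒ "Î»"
```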

-- 
                   ((x=>x(x))(x=>x(x)))                  Eli Barzilay:
                   http://barzilay.org/                  Maze is Life!






* bug#24953: 25.1; Possible inefficiency in UTF-8
  2016-11-21 21:38               ` Eli Barzilay
@ 2016-11-22  3:33                 ` Eli Zaretskii
  2016-12-11 10:53                   ` Eli Barzilay
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2016-11-22  3:33 UTC
  To: Eli Barzilay; +Cc: 24953

> From: Eli Barzilay <eli@barzilay.org>
> Date: Mon, 21 Nov 2016 16:38:07 -0500
> Cc: 24953@debbugs.gnu.org
> 
> > It isn't expected with Emacs 25.1, as it specifically sets up things
> > assuming Git reports its data in UTF-8.  Are you sure you didn't set
> > i18n.commitEncoding in your Git configuration, or have some non-ASCII
> > text encoded in something other than UTF-8 in that repository?
> 
> I just ran into an encoding problem, with files now: opening a UTF-8
> file with a single lambda (\316\273) sometimes fails, and I traced this
> to whether I start Emacs from Windows or from a Cygwin shell.

That's unrelated to Git, though.

> In the latter case, my shell has an explicit LANG setting and the
> result is that Emacs opens that file fine, but in the first case,
> when Emacs is started directly by windows, there is no LANG, and the
> utf-8 file is not treated as such (looks like it opens it in
> latin-1).  I verified this with "emacs -Q" too.

Emacs sets LANG internally, by using a suitable Windows API, when it
runs on Windows (unless LANG is set in the shell), so the fact that
LANG is not set is not a problem in itself.

What you describe is expected with the default Windows settings: a
file that is in no particular mode which requires UTF-8 will not be
automatically decoded as UTF-8, without some customizations.  You
could also put file-local variables into the file.
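
For instance, a coding cookie on the first line of the file is one
such file-local variable.  A sketch for a Lisp source file (the `;;`
comment prefix is an assumption and depends on the file type):

```elisp
;; -*- coding: utf-8 -*-
```

With that line present, Emacs should decode the file as UTF-8 when
visiting it, regardless of the locale defaults.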






* bug#24953: 25.1; Possible inefficiency in UTF-8
  2016-11-22  3:33                 ` Eli Zaretskii
@ 2016-12-11 10:53                   ` Eli Barzilay
  2016-12-11 15:37                     ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Barzilay @ 2016-12-11 10:53 UTC
  To: Eli Zaretskii; +Cc: 24953

On Mon, Nov 21, 2016 at 10:33 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Eli Barzilay <eli@barzilay.org>
>> Date: Mon, 21 Nov 2016 16:38:07 -0500
>> Cc: 24953@debbugs.gnu.org
>>
>> In the latter case, my shell has an explicit LANG setting and the
>> result is that Emacs opens that file fine, but in the first case,
>> when Emacs is started directly by windows, there is no LANG, and the
>> utf-8 file is not treated as such (looks like it opens it in
>> latin-1).  I verified this with "emacs -Q" too.
>
> Emacs sets LANG internally, by using a suitable Windows API, when it
> runs on Windows (unless LANG is set in the shell), so the fact that
> LANG is not set is not a problem in itself.
>
> What you describe is expected with the default Windows settings: a
> file that is in no particular mode which requires UTF-8 will not be
> automatically decoded as UTF-8, without some customizations.  You
> could also put file-local variables into the file.

I still didn't get to try and see if I can get the git problem, but the
problem seems to be moot for me: I use *both* Linux and Windows, a lot.
I synchronize files between the two, and I work with Linux mounts on
Windows.  In short, I have a ton of files that are UTF-8, so it makes
sense to have it be my default on Windows too.  In the month or so
that I used Emacs without it, I ran into several cases where the default
character encoding was broken, and OTOH, in years of using an explicit
UTF-8 I haven't had any problems...

-- 
                   ((x=>x(x))(x=>x(x)))                  Eli Barzilay:
                   http://barzilay.org/                  Maze is Life!






* bug#24953: 25.1; Possible inefficiency in UTF-8
  2016-12-11 10:53                   ` Eli Barzilay
@ 2016-12-11 15:37                     ` Eli Zaretskii
  0 siblings, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2016-12-11 15:37 UTC
  To: Eli Barzilay; +Cc: 24953

> From: Eli Barzilay <eli@barzilay.org>
> Date: Sun, 11 Dec 2016 05:53:33 -0500
> Cc: 24953@debbugs.gnu.org
> 
> I still didn't get to try and see if I can get the git problem, but the
> problem seems to be moot for me: I use *both* Linux and Windows, a lot.
> I synchronize files between the two, and I work with Linux mounts on
> Windows.  In short, I have a ton of files that are UTF-8, so it makes
> sense to have it be my default on Windows too.  In the month or so
> that I used Emacs without it, I ran into several cases where the default
> character encoding was broken, and OTOH, in years of using an explicit
> UTF-8 I haven't had any problems...

Whatever you do, don't change the locale, because it will affect how
command-line arguments to subprocesses are encoded, which will bite
you some day.  If you need the text decoding/encoding to use UTF-8 by
default, use setq-default to change buffer-file-coding-system.
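
A minimal sketch of that setting, assuming it goes in the init file
(the file location is an assumption):

```elisp
;; In the init file (e.g. ~/.emacs).  This changes only the default
;; buffer-file-coding-system and leaves the language environment and
;; locale alone.
(setq-default buffer-file-coding-system 'utf-8)
```

Whether plain `utf-8` or an EOL-specific variant such as `utf-8-unix`
fits better is a separate choice that this thread does not discuss.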





