unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
@ 2014-12-16 15:21 Tassilo Horn
  2014-12-16 16:05 ` Eli Zaretskii
                   ` (3 more replies)
  0 siblings, 4 replies; 33+ messages in thread
From: Tassilo Horn @ 2014-12-16 15:21 UTC
  To: 19393


I've downloaded the following file

  ftp://ftp.fu-berlin.de/pub/misc/movies/database/movies.list.gz

which contains all movies known to the Internet Movie Database
(IMDb.com).  When I open that file using "emacs -Q movies.list.gz" (or
unzip it first) and then do M-x describe-coding-system I can see that it
is "t -- raw-text-unix".  As a result of this, the last movie in that
file is displayed as "\374\347 (2012) 2012".

However, according to the `file' command, the file is plain ISO-8859.
And I can easily convert it to UTF-8 using

  % iconv -f ISO-8859-15 -t UTF-8 < movies.list > movies.list.utf8

without any encoding errors being reported.

Emacs can guess the encoding of the resulting UTF-8 encoded file
movies.list.utf8, i.e., the coding system when opening the file is "U --
utf-8-unix".  Emacs shows the last movie as "üç (2012) 2012" which is
correct.
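
For illustration, here is a minimal Python sketch of what those two bytes
are.  It only assumes that Python's "iso8859_15" codec matches the
ISO-8859-15 conversion iconv performs; the variable names are made up for
this example.

```python
# The two octal escapes Emacs shows for the last movie title.
raw = bytes([0o374, 0o347])  # same bytes as b"\xfc\xe7"

# Decode them as ISO-8859-15 (latin-9), as iconv is told to do above.
title = raw.decode("iso8859_15")

# Re-encode as UTF-8, which is what movies.list.utf8 would then contain
# for these two characters.
utf8 = title.encode("utf-8")

print(title)
print(utf8)
```

So the raw bytes are a plausible latin-9 spelling of the title that the
UTF-8 version of the file shows.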

I also tried

  % iconv -f ISO-8859-15 -t ISO-8859-15 < movies.list > movies.list.iso-8859

but for the resulting file movies.list.iso-8859 the same issue applies as
for the original file, i.e., Emacs uses the encoding "t --
raw-text-unix" and displays octal escapes instead of the non-ASCII
characters.

I also can't force Emacs to use ISO-8859 for that or the original file.
`C-x RET f iso-8859-15 RET' results in a query that certain characters
cannot be encoded using latin-9, e.g., \374 and \347, and I'm expected
to choose another encoding.

So `file' and `iconv' say the file is valid latin-9 but Emacs seems to
disagree.  Who is correct?  I tend towards file/iconv but I might be
wrong.
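
One caveat on the iconv evidence, sketched in Python under the assumption
that Python's "iso8859_15" codec behaves like iconv's ISO-8859-15
converter: in that codec every one of the 256 byte values decodes to some
character, so a conversion without errors says little about whether the
file really is latin-9.  The helper name looks_like_utf8 is hypothetical.

```python
# Every possible byte value, 0x00 through 0xFF.
every_byte = bytes(range(256))

# Python's ISO-8859-15 codec maps each of them to a character, so this
# decode cannot fail, whatever the input bytes are.
decoded = every_byte.decode("iso8859_15")
round_trip = decoded.encode("iso8859_15")


def looks_like_utf8(data: bytes) -> bool:
    """Return True if DATA is a valid UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# UTF-8, in contrast, rejects the bytes from the last movie title,
# which is why a successful UTF-8 decode is much stronger evidence.
title_is_utf8 = looks_like_utf8(b"\xfc\xe7")
```

This does not settle who is correct; it only shows that "iconv reported
no errors" is weak evidence for an 8-bit single-byte encoding.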

And shouldn't it be possible to force Emacs to a certain coding system?
I mean, even if a file's content has a broken encoding, e.g., coding X
in part A, coding Y in part B, I might want to switch to X in order to
be able to read part A at all.  (Ok, in that case I should get a big fat
warning that saving the buffer will corrupt the file even more.  Or
maybe the buffer should become read-only...)

The issue can be reproduced also with the other IMDb files containing
non-ASCII chars, e.g., actors.list.gz, actresses.list.gz, etc.  They are
all available in the FTP directory above.



In GNU Emacs 25.0.50.10 (x86_64-unknown-linux-gnu, GTK+ Version 3.14.5)
 of 2014-12-16 on thinkpad-t440p
Repository revision: 15426191a1353ac208d8ebe4a5920228e0df41a4
Windowing system distributor `The X.Org Foundation', version 11.0.11602901
System Description:	Arch Linux

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GCONF GSETTINGS
NOTIFY ACL GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB

Important settings:
  value of $LC_MONETARY: de_DE.utf8
  value of $LC_NUMERIC: de_DE.utf8
  value of $LC_TIME: de_DE.utf8
  value of $LANG: en_US.utf8
  locale-coding-system: utf-8-unix

Major mode: Group

Minor modes in effect:
  TeX-PDF-mode: t
  TeX-source-correlate-mode: t
  diff-auto-refine-mode: t
  gnus-topic-mode: t
  hl-line-mode: t
  global-company-mode: t
  global-aggressive-indent-mode: t
  gnus-undo-mode: t
  global-edit-server-edit-mode: t
  recentf-mode: t
  shell-dirtrack-mode: t
  helm-match-plugin-mode: t
  helm-occur-match-plugin-mode: t
  global-subword-mode: t
  subword-mode: t
  savehist-mode: t
  show-paren-mode: t
  icomplete-mode: t
  minibuffer-depth-indicate-mode: t
  electric-pair-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  column-number-mode: t
  line-number-mode: t

Recent messages:
Buffer dictionary was nil
Ispell process killed
Local Ispell dictionary set to en
Buffer dictionary is now en
Starting new Ispell process /usr/bin/aspell with en dictionary...
Checking region...
Spell Checking...100% [diss]
Spell Checking completed.
Quit
Auto-saving...

Load-path shadows:
~/Repos/el/auctex/lpath hides ~/Repos/el/gnus/lisp/lpath
~/Repos/el/gnus/lisp/md4 hides /home/horn/Repos/el/emacs/lisp/md4
~/Repos/el/gnus/lisp/color hides /home/horn/Repos/el/emacs/lisp/color
~/Repos/el/gnus/lisp/format-spec hides /home/horn/Repos/el/emacs/lisp/format-spec
~/Repos/el/gnus/lisp/password-cache hides /home/horn/Repos/el/emacs/lisp/password-cache
~/Repos/el/gnus/lisp/hex-util hides /home/horn/Repos/el/emacs/lisp/hex-util
~/Repos/el/gnus/lisp/dns-mode hides /home/horn/Repos/el/emacs/lisp/textmodes/dns-mode
/home/horn/.emacs.d/elpa/org-20141215/ob-plantuml hides /home/horn/Repos/el/emacs/lisp/org/ob-plantuml
/home/horn/.emacs.d/elpa/org-20141215/org-archive hides /home/horn/Repos/el/emacs/lisp/org/org-archive
/home/horn/.emacs.d/elpa/org-20141215/org-w3m hides /home/horn/Repos/el/emacs/lisp/org/org-w3m
/home/horn/.emacs.d/elpa/org-20141215/ox-org hides /home/horn/Repos/el/emacs/lisp/org/ox-org
/home/horn/.emacs.d/elpa/org-20141215/ob hides /home/horn/Repos/el/emacs/lisp/org/ob
/home/horn/.emacs.d/elpa/org-20141215/org-faces hides /home/horn/Repos/el/emacs/lisp/org/org-faces
/home/horn/.emacs.d/elpa/org-20141215/ob-awk hides /home/horn/Repos/el/emacs/lisp/org/ob-awk
/home/horn/.emacs.d/elpa/org-20141215/org-habit hides /home/horn/Repos/el/emacs/lisp/org/org-habit
/home/horn/.emacs.d/elpa/org-20141215/ob-sass hides /home/horn/Repos/el/emacs/lisp/org/ob-sass
/home/horn/.emacs.d/elpa/org-20141215/org-ctags hides /home/horn/Repos/el/emacs/lisp/org/org-ctags
/home/horn/.emacs.d/elpa/org-20141215/ob-screen hides /home/horn/Repos/el/emacs/lisp/org/ob-screen
/home/horn/.emacs.d/elpa/org-20141215/ox-md hides /home/horn/Repos/el/emacs/lisp/org/ox-md
/home/horn/.emacs.d/elpa/org-20141215/ox-beamer hides /home/horn/Repos/el/emacs/lisp/org/ox-beamer
/home/horn/.emacs.d/elpa/org-20141215/org-loaddefs hides /home/horn/Repos/el/emacs/lisp/org/org-loaddefs
/home/horn/.emacs.d/elpa/org-20141215/ob-perl hides /home/horn/Repos/el/emacs/lisp/org/ob-perl
/home/horn/.emacs.d/elpa/org-20141215/org-rmail hides /home/horn/Repos/el/emacs/lisp/org/org-rmail
/home/horn/.emacs.d/elpa/org-20141215/org-id hides /home/horn/Repos/el/emacs/lisp/org/org-id
/home/horn/.emacs.d/elpa/org-20141215/ox-publish hides /home/horn/Repos/el/emacs/lisp/org/ox-publish
/home/horn/.emacs.d/elpa/org-20141215/ob-maxima hides /home/horn/Repos/el/emacs/lisp/org/ob-maxima
/home/horn/.emacs.d/elpa/org-20141215/org-install hides /home/horn/Repos/el/emacs/lisp/org/org-install
/home/horn/.emacs.d/elpa/org-20141215/org-feed hides /home/horn/Repos/el/emacs/lisp/org/org-feed
/home/horn/.emacs.d/elpa/org-20141215/ob-R hides /home/horn/Repos/el/emacs/lisp/org/ob-R
/home/horn/.emacs.d/elpa/org-20141215/ox-latex hides /home/horn/Repos/el/emacs/lisp/org/ox-latex
/home/horn/.emacs.d/elpa/org-20141215/org-timer hides /home/horn/Repos/el/emacs/lisp/org/org-timer
/home/horn/.emacs.d/elpa/org-20141215/ob-core hides /home/horn/Repos/el/emacs/lisp/org/ob-core
/home/horn/.emacs.d/elpa/org-20141215/org-datetree hides /home/horn/Repos/el/emacs/lisp/org/org-datetree
/home/horn/.emacs.d/elpa/org-20141215/ob-sql hides /home/horn/Repos/el/emacs/lisp/org/ob-sql
/home/horn/.emacs.d/elpa/org-20141215/ob-js hides /home/horn/Repos/el/emacs/lisp/org/ob-js
/home/horn/.emacs.d/elpa/org-20141215/ob-tangle hides /home/horn/Repos/el/emacs/lisp/org/ob-tangle
/home/horn/.emacs.d/elpa/org-20141215/org-capture hides /home/horn/Repos/el/emacs/lisp/org/org-capture
/home/horn/.emacs.d/elpa/org-20141215/ob-haskell hides /home/horn/Repos/el/emacs/lisp/org/ob-haskell
/home/horn/.emacs.d/elpa/org-20141215/ob-dot hides /home/horn/Repos/el/emacs/lisp/org/ob-dot
/home/horn/.emacs.d/elpa/org-20141215/ob-exp hides /home/horn/Repos/el/emacs/lisp/org/ob-exp
/home/horn/.emacs.d/elpa/org-20141215/org-info hides /home/horn/Repos/el/emacs/lisp/org/org-info
/home/horn/.emacs.d/elpa/org-20141215/ob-octave hides /home/horn/Repos/el/emacs/lisp/org/ob-octave
/home/horn/.emacs.d/elpa/org-20141215/org-mobile hides /home/horn/Repos/el/emacs/lisp/org/org-mobile
/home/horn/.emacs.d/elpa/org-20141215/org-indent hides /home/horn/Repos/el/emacs/lisp/org/org-indent
/home/horn/.emacs.d/elpa/org-20141215/org-attach hides /home/horn/Repos/el/emacs/lisp/org/org-attach
/home/horn/.emacs.d/elpa/org-20141215/ob-java hides /home/horn/Repos/el/emacs/lisp/org/ob-java
/home/horn/.emacs.d/elpa/org-20141215/org-mhe hides /home/horn/Repos/el/emacs/lisp/org/org-mhe
/home/horn/.emacs.d/elpa/org-20141215/ob-scheme hides /home/horn/Repos/el/emacs/lisp/org/ob-scheme
/home/horn/.emacs.d/elpa/org-20141215/ob-lob hides /home/horn/Repos/el/emacs/lisp/org/ob-lob
/home/horn/.emacs.d/elpa/org-20141215/ob-calc hides /home/horn/Repos/el/emacs/lisp/org/ob-calc
/home/horn/.emacs.d/elpa/org-20141215/org-agenda hides /home/horn/Repos/el/emacs/lisp/org/org-agenda
/home/horn/.emacs.d/elpa/org-20141215/org-version hides /home/horn/Repos/el/emacs/lisp/org/org-version
/home/horn/.emacs.d/elpa/org-20141215/org-clock hides /home/horn/Repos/el/emacs/lisp/org/org-clock
/home/horn/.emacs.d/elpa/org-20141215/org-macro hides /home/horn/Repos/el/emacs/lisp/org/org-macro
/home/horn/.emacs.d/elpa/org-20141215/ob-fortran hides /home/horn/Repos/el/emacs/lisp/org/ob-fortran
/home/horn/.emacs.d/elpa/org-20141215/ob-picolisp hides /home/horn/Repos/el/emacs/lisp/org/ob-picolisp
/home/horn/.emacs.d/elpa/org-20141215/ob-mscgen hides /home/horn/Repos/el/emacs/lisp/org/ob-mscgen
/home/horn/.emacs.d/elpa/org-20141215/ox-texinfo hides /home/horn/Repos/el/emacs/lisp/org/ox-texinfo
/home/horn/.emacs.d/elpa/org-20141215/org-table hides /home/horn/Repos/el/emacs/lisp/org/org-table
/home/horn/.emacs.d/elpa/org-20141215/ob-matlab hides /home/horn/Repos/el/emacs/lisp/org/ob-matlab
/home/horn/.emacs.d/elpa/org-20141215/ox-html hides /home/horn/Repos/el/emacs/lisp/org/ox-html
/home/horn/.emacs.d/elpa/org-20141215/ox-icalendar hides /home/horn/Repos/el/emacs/lisp/org/ox-icalendar
/home/horn/.emacs.d/elpa/org-20141215/org-bbdb hides /home/horn/Repos/el/emacs/lisp/org/org-bbdb
/home/horn/.emacs.d/elpa/org-20141215/ob-asymptote hides /home/horn/Repos/el/emacs/lisp/org/ob-asymptote
/home/horn/.emacs.d/elpa/org-20141215/org-eshell hides /home/horn/Repos/el/emacs/lisp/org/org-eshell
/home/horn/.emacs.d/elpa/org-20141215/ob-comint hides /home/horn/Repos/el/emacs/lisp/org/ob-comint
/home/horn/.emacs.d/elpa/org-20141215/org hides /home/horn/Repos/el/emacs/lisp/org/org
/home/horn/.emacs.d/elpa/org-20141215/org-irc hides /home/horn/Repos/el/emacs/lisp/org/org-irc
/home/horn/.emacs.d/elpa/org-20141215/ob-table hides /home/horn/Repos/el/emacs/lisp/org/ob-table
/home/horn/.emacs.d/elpa/org-20141215/ob-scala hides /home/horn/Repos/el/emacs/lisp/org/ob-scala
/home/horn/.emacs.d/elpa/org-20141215/ob-io hides /home/horn/Repos/el/emacs/lisp/org/ob-io
/home/horn/.emacs.d/elpa/org-20141215/ox-ascii hides /home/horn/Repos/el/emacs/lisp/org/ox-ascii
/home/horn/.emacs.d/elpa/org-20141215/ob-lisp hides /home/horn/Repos/el/emacs/lisp/org/ob-lisp
/home/horn/.emacs.d/elpa/org-20141215/org-macs hides /home/horn/Repos/el/emacs/lisp/org/org-macs
/home/horn/.emacs.d/elpa/org-20141215/ob-sqlite hides /home/horn/Repos/el/emacs/lisp/org/ob-sqlite
/home/horn/.emacs.d/elpa/org-20141215/ob-latex hides /home/horn/Repos/el/emacs/lisp/org/ob-latex
/home/horn/.emacs.d/elpa/org-20141215/ob-css hides /home/horn/Repos/el/emacs/lisp/org/ob-css
/home/horn/.emacs.d/elpa/org-20141215/org-protocol hides /home/horn/Repos/el/emacs/lisp/org/org-protocol
/home/horn/.emacs.d/elpa/org-20141215/ob-keys hides /home/horn/Repos/el/emacs/lisp/org/ob-keys
/home/horn/.emacs.d/elpa/org-20141215/org-mouse hides /home/horn/Repos/el/emacs/lisp/org/org-mouse
/home/horn/.emacs.d/elpa/org-20141215/ob-ruby hides /home/horn/Repos/el/emacs/lisp/org/ob-ruby
/home/horn/.emacs.d/elpa/org-20141215/org-element hides /home/horn/Repos/el/emacs/lisp/org/org-element
/home/horn/.emacs.d/elpa/org-20141215/org-bibtex hides /home/horn/Repos/el/emacs/lisp/org/org-bibtex
/home/horn/.emacs.d/elpa/org-20141215/ob-C hides /home/horn/Repos/el/emacs/lisp/org/ob-C
/home/horn/.emacs.d/elpa/org-20141215/org-src hides /home/horn/Repos/el/emacs/lisp/org/org-src
/home/horn/.emacs.d/elpa/org-20141215/ob-makefile hides /home/horn/Repos/el/emacs/lisp/org/ob-makefile
/home/horn/.emacs.d/elpa/org-20141215/org-colview hides /home/horn/Repos/el/emacs/lisp/org/org-colview
/home/horn/.emacs.d/elpa/org-20141215/ob-ledger hides /home/horn/Repos/el/emacs/lisp/org/ob-ledger
/home/horn/.emacs.d/elpa/org-20141215/org-crypt hides /home/horn/Repos/el/emacs/lisp/org/org-crypt
/home/horn/.emacs.d/elpa/org-20141215/ob-shen hides /home/horn/Repos/el/emacs/lisp/org/ob-shen
/home/horn/.emacs.d/elpa/org-20141215/ob-gnuplot hides /home/horn/Repos/el/emacs/lisp/org/ob-gnuplot
/home/horn/.emacs.d/elpa/org-20141215/org-inlinetask hides /home/horn/Repos/el/emacs/lisp/org/org-inlinetask
/home/horn/.emacs.d/elpa/org-20141215/org-gnus hides /home/horn/Repos/el/emacs/lisp/org/org-gnus
/home/horn/.emacs.d/elpa/org-20141215/ob-sh hides /home/horn/Repos/el/emacs/lisp/org/ob-sh
/home/horn/.emacs.d/elpa/org-20141215/org-pcomplete hides /home/horn/Repos/el/emacs/lisp/org/org-pcomplete
/home/horn/.emacs.d/elpa/org-20141215/org-docview hides /home/horn/Repos/el/emacs/lisp/org/org-docview
/home/horn/.emacs.d/elpa/org-20141215/ox-man hides /home/horn/Repos/el/emacs/lisp/org/ox-man
/home/horn/.emacs.d/elpa/org-20141215/org-plot hides /home/horn/Repos/el/emacs/lisp/org/org-plot
/home/horn/.emacs.d/elpa/org-20141215/ox hides /home/horn/Repos/el/emacs/lisp/org/ox
/home/horn/.emacs.d/elpa/org-20141215/ob-python hides /home/horn/Repos/el/emacs/lisp/org/ob-python
/home/horn/.emacs.d/elpa/org-20141215/ob-eval hides /home/horn/Repos/el/emacs/lisp/org/ob-eval
/home/horn/.emacs.d/elpa/org-20141215/ob-clojure hides /home/horn/Repos/el/emacs/lisp/org/ob-clojure
/home/horn/.emacs.d/elpa/org-20141215/ob-ocaml hides /home/horn/Repos/el/emacs/lisp/org/ob-ocaml
/home/horn/.emacs.d/elpa/org-20141215/ox-odt hides /home/horn/Repos/el/emacs/lisp/org/ox-odt
/home/horn/.emacs.d/elpa/org-20141215/org-compat hides /home/horn/Repos/el/emacs/lisp/org/org-compat
/home/horn/.emacs.d/elpa/org-20141215/org-list hides /home/horn/Repos/el/emacs/lisp/org/org-list
/home/horn/.emacs.d/elpa/org-20141215/ob-emacs-lisp hides /home/horn/Repos/el/emacs/lisp/org/ob-emacs-lisp
/home/horn/.emacs.d/elpa/org-20141215/org-entities hides /home/horn/Repos/el/emacs/lisp/org/org-entities
/home/horn/.emacs.d/elpa/org-20141215/ob-ref hides /home/horn/Repos/el/emacs/lisp/org/ob-ref
/home/horn/.emacs.d/elpa/org-20141215/ob-ditaa hides /home/horn/Repos/el/emacs/lisp/org/ob-ditaa
/home/horn/.emacs.d/elpa/org-20141215/ob-lilypond hides /home/horn/Repos/el/emacs/lisp/org/ob-lilypond
/home/horn/.emacs.d/elpa/org-20141215/ob-org hides /home/horn/Repos/el/emacs/lisp/org/ob-org
/home/horn/.emacs.d/elpa/org-20141215/org-footnote hides /home/horn/Repos/el/emacs/lisp/org/org-footnote
~/Repos/el/gnus/lisp/dig hides /home/horn/Repos/el/emacs/lisp/net/dig
~/Repos/el/gnus/lisp/hmac-md5 hides /home/horn/Repos/el/emacs/lisp/net/hmac-md5
~/Repos/el/gnus/lisp/ntlm hides /home/horn/Repos/el/emacs/lisp/net/ntlm
~/Repos/el/gnus/lisp/hmac-def hides /home/horn/Repos/el/emacs/lisp/net/hmac-def
~/Repos/el/gnus/lisp/sasl-ntlm hides /home/horn/Repos/el/emacs/lisp/net/sasl-ntlm
~/Repos/el/gnus/lisp/sasl-cram hides /home/horn/Repos/el/emacs/lisp/net/sasl-cram
~/Repos/el/gnus/lisp/dns hides /home/horn/Repos/el/emacs/lisp/net/dns
~/Repos/el/gnus/lisp/sasl hides /home/horn/Repos/el/emacs/lisp/net/sasl
~/Repos/el/gnus/lisp/tls hides /home/horn/Repos/el/emacs/lisp/net/tls
~/Repos/el/gnus/lisp/netrc hides /home/horn/Repos/el/emacs/lisp/net/netrc
~/Repos/el/gnus/lisp/sasl-digest hides /home/horn/Repos/el/emacs/lisp/net/sasl-digest
~/Repos/el/gnus/lisp/uudecode hides /home/horn/Repos/el/emacs/lisp/mail/uudecode
~/Repos/el/gnus/lisp/binhex hides /home/horn/Repos/el/emacs/lisp/mail/binhex
~/Repos/el/gnus/lisp/hashcash hides /home/horn/Repos/el/emacs/lisp/mail/hashcash
~/Repos/el/gnus/lisp/canlock hides /home/horn/Repos/el/emacs/lisp/gnus/canlock
~/Repos/el/gnus/lisp/nneething hides /home/horn/Repos/el/emacs/lisp/gnus/nneething
~/Repos/el/gnus/lisp/mm-encode hides /home/horn/Repos/el/emacs/lisp/gnus/mm-encode
~/Repos/el/gnus/lisp/mm-util hides /home/horn/Repos/el/emacs/lisp/gnus/mm-util
~/Repos/el/gnus/lisp/rfc2047 hides /home/horn/Repos/el/emacs/lisp/gnus/rfc2047
~/Repos/el/gnus/lisp/nnml hides /home/horn/Repos/el/emacs/lisp/gnus/nnml
~/Repos/el/gnus/lisp/gnus-cus hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-cus
~/Repos/el/gnus/lisp/gnus-range hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-range
~/Repos/el/gnus/lisp/gnus-int hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-int
~/Repos/el/gnus/lisp/gnus-cloud hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-cloud
~/Repos/el/gnus/lisp/spam-stat hides /home/horn/Repos/el/emacs/lisp/gnus/spam-stat
~/Repos/el/gnus/lisp/nnmh hides /home/horn/Repos/el/emacs/lisp/gnus/nnmh
~/Repos/el/gnus/lisp/gnus-mlspl hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-mlspl
~/Repos/el/gnus/lisp/deuglify hides /home/horn/Repos/el/emacs/lisp/gnus/deuglify
~/Repos/el/gnus/lisp/gnus-gravatar hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-gravatar
~/Repos/el/gnus/lisp/nngateway hides /home/horn/Repos/el/emacs/lisp/gnus/nngateway
~/Repos/el/gnus/lisp/ietf-drums hides /home/horn/Repos/el/emacs/lisp/gnus/ietf-drums
~/Repos/el/gnus/lisp/mail-parse hides /home/horn/Repos/el/emacs/lisp/gnus/mail-parse
~/Repos/el/gnus/lisp/gnus-salt hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-salt
~/Repos/el/gnus/lisp/nnimap hides /home/horn/Repos/el/emacs/lisp/gnus/nnimap
~/Repos/el/gnus/lisp/gnus-draft hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-draft
~/Repos/el/gnus/lisp/mail-source hides /home/horn/Repos/el/emacs/lisp/gnus/mail-source
~/Repos/el/gnus/lisp/messcompat hides /home/horn/Repos/el/emacs/lisp/gnus/messcompat
~/Repos/el/gnus/lisp/pop3 hides /home/horn/Repos/el/emacs/lisp/gnus/pop3
~/Repos/el/gnus/lisp/nnmaildir hides /home/horn/Repos/el/emacs/lisp/gnus/nnmaildir
~/Repos/el/gnus/lisp/nnheader hides /home/horn/Repos/el/emacs/lisp/gnus/nnheader
~/Repos/el/gnus/lisp/gnus-cite hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-cite
~/Repos/el/gnus/lisp/rfc2104 hides /home/horn/Repos/el/emacs/lisp/gnus/rfc2104
~/Repos/el/gnus/lisp/nndiary hides /home/horn/Repos/el/emacs/lisp/gnus/nndiary
~/Repos/el/gnus/lisp/gnus-diary hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-diary
~/Repos/el/gnus/lisp/nnfolder hides /home/horn/Repos/el/emacs/lisp/gnus/nnfolder
~/Repos/el/gnus/lisp/gnus-art hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-art
~/Repos/el/gnus/lisp/gnus-demon hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-demon
~/Repos/el/gnus/lisp/mml-sec hides /home/horn/Repos/el/emacs/lisp/gnus/mml-sec
~/Repos/el/gnus/lisp/nnir hides /home/horn/Repos/el/emacs/lisp/gnus/nnir
~/Repos/el/gnus/lisp/mm-partial hides /home/horn/Repos/el/emacs/lisp/gnus/mm-partial
~/Repos/el/gnus/lisp/gnus-registry hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-registry
~/Repos/el/gnus/lisp/gnus-icalendar hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-icalendar
~/Repos/el/gnus/lisp/compface hides /home/horn/Repos/el/emacs/lisp/gnus/compface
~/Repos/el/gnus/lisp/gnus-fun hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-fun
~/Repos/el/gnus/lisp/gnus-start hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-start
~/Repos/el/gnus/lisp/smiley hides /home/horn/Repos/el/emacs/lisp/gnus/smiley
~/Repos/el/gnus/lisp/gnus-picon hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-picon
~/Repos/el/gnus/lisp/spam-report hides /home/horn/Repos/el/emacs/lisp/gnus/spam-report
~/Repos/el/gnus/lisp/nntp hides /home/horn/Repos/el/emacs/lisp/gnus/nntp
~/Repos/el/gnus/lisp/nnnil hides /home/horn/Repos/el/emacs/lisp/gnus/nnnil
~/Repos/el/gnus/lisp/nndir hides /home/horn/Repos/el/emacs/lisp/gnus/nndir
~/Repos/el/gnus/lisp/gnus-srvr hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-srvr
~/Repos/el/gnus/lisp/smime hides /home/horn/Repos/el/emacs/lisp/gnus/smime
~/Repos/el/gnus/lisp/nnvirtual hides /home/horn/Repos/el/emacs/lisp/gnus/nnvirtual
~/Repos/el/gnus/lisp/gnus-notifications hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-notifications
~/Repos/el/gnus/lisp/nnspool hides /home/horn/Repos/el/emacs/lisp/gnus/nnspool
~/Repos/el/gnus/lisp/gnus-group hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-group
~/Repos/el/gnus/lisp/gnus-bcklg hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-bcklg
~/Repos/el/gnus/lisp/gnus-util hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-util
~/Repos/el/gnus/lisp/gnus-sieve hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-sieve
~/Repos/el/gnus/lisp/nndraft hides /home/horn/Repos/el/emacs/lisp/gnus/nndraft
~/Repos/el/gnus/lisp/nnagent hides /home/horn/Repos/el/emacs/lisp/gnus/nnagent
~/Repos/el/gnus/lisp/gnus-spec hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-spec
~/Repos/el/gnus/lisp/gnus-bookmark hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-bookmark
~/Repos/el/gnus/lisp/mml1991 hides /home/horn/Repos/el/emacs/lisp/gnus/mml1991
~/Repos/el/gnus/lisp/rfc2231 hides /home/horn/Repos/el/emacs/lisp/gnus/rfc2231
~/Repos/el/gnus/lisp/yenc hides /home/horn/Repos/el/emacs/lisp/gnus/yenc
~/Repos/el/gnus/lisp/gnus-undo hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-undo
~/Repos/el/gnus/lisp/ecomplete hides /home/horn/Repos/el/emacs/lisp/gnus/ecomplete
~/Repos/el/gnus/lisp/legacy-gnus-agent hides /home/horn/Repos/el/emacs/lisp/gnus/legacy-gnus-agent
~/Repos/el/gnus/lisp/utf7 hides /home/horn/Repos/el/emacs/lisp/gnus/utf7
~/Repos/el/gnus/lisp/rtree hides /home/horn/Repos/el/emacs/lisp/gnus/rtree
~/Repos/el/gnus/lisp/gnus-uu hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-uu
~/Repos/el/gnus/lisp/gnus-ml hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-ml
~/Repos/el/gnus/lisp/sieve hides /home/horn/Repos/el/emacs/lisp/gnus/sieve
~/Repos/el/gnus/lisp/gnus hides /home/horn/Repos/el/emacs/lisp/gnus/gnus
~/Repos/el/gnus/lisp/mml hides /home/horn/Repos/el/emacs/lisp/gnus/mml
~/Repos/el/gnus/lisp/message hides /home/horn/Repos/el/emacs/lisp/gnus/message
~/Repos/el/gnus/lisp/mml-smime hides /home/horn/Repos/el/emacs/lisp/gnus/mml-smime
~/Repos/el/gnus/lisp/gnus-eform hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-eform
~/Repos/el/gnus/lisp/gnus-agent hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-agent
~/Repos/el/gnus/lisp/gnus-logic hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-logic
~/Repos/el/gnus/lisp/mm-extern hides /home/horn/Repos/el/emacs/lisp/gnus/mm-extern
~/Repos/el/gnus/lisp/nndoc hides /home/horn/Repos/el/emacs/lisp/gnus/nndoc
~/Repos/el/gnus/lisp/sieve-manage hides /home/horn/Repos/el/emacs/lisp/gnus/sieve-manage
~/Repos/el/gnus/lisp/mm-decode hides /home/horn/Repos/el/emacs/lisp/gnus/mm-decode
~/Repos/el/gnus/lisp/starttls hides /home/horn/Repos/el/emacs/lisp/gnus/starttls
~/Repos/el/gnus/lisp/gnus-dired hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-dired
~/Repos/el/gnus/lisp/nnbabyl hides /home/horn/Repos/el/emacs/lisp/gnus/nnbabyl
~/Repos/el/gnus/lisp/nnmbox hides /home/horn/Repos/el/emacs/lisp/gnus/nnmbox
~/Repos/el/gnus/lisp/gnus-win hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-win
~/Repos/el/gnus/lisp/gnus-async hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-async
~/Repos/el/gnus/lisp/mm-url hides /home/horn/Repos/el/emacs/lisp/gnus/mm-url
~/Repos/el/gnus/lisp/gnus-html hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-html
~/Repos/el/gnus/lisp/gssapi hides /home/horn/Repos/el/emacs/lisp/gnus/gssapi
~/Repos/el/gnus/lisp/mml2015 hides /home/horn/Repos/el/emacs/lisp/gnus/mml2015
~/Repos/el/gnus/lisp/nnrss hides /home/horn/Repos/el/emacs/lisp/gnus/nnrss
~/Repos/el/gnus/lisp/gnus-mh hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-mh
~/Repos/el/gnus/lisp/gnus-sum hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-sum
~/Repos/el/gnus/lisp/nnweb hides /home/horn/Repos/el/emacs/lisp/gnus/nnweb
~/Repos/el/gnus/lisp/mail-prsvr hides /home/horn/Repos/el/emacs/lisp/gnus/mail-prsvr
~/Repos/el/gnus/lisp/nnmairix hides /home/horn/Repos/el/emacs/lisp/gnus/nnmairix
~/Repos/el/gnus/lisp/plstore hides /home/horn/Repos/el/emacs/lisp/gnus/plstore
~/Repos/el/gnus/lisp/rfc2045 hides /home/horn/Repos/el/emacs/lisp/gnus/rfc2045
~/Repos/el/gnus/lisp/gnus-msg hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-msg
~/Repos/el/gnus/lisp/spam-wash hides /home/horn/Repos/el/emacs/lisp/gnus/spam-wash
~/Repos/el/gnus/lisp/gnus-score hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-score
~/Repos/el/gnus/lisp/mm-uu hides /home/horn/Repos/el/emacs/lisp/gnus/mm-uu
~/Repos/el/gnus/lisp/spam hides /home/horn/Repos/el/emacs/lisp/gnus/spam
~/Repos/el/gnus/lisp/mm-view hides /home/horn/Repos/el/emacs/lisp/gnus/mm-view
~/Repos/el/gnus/lisp/sieve-mode hides /home/horn/Repos/el/emacs/lisp/gnus/sieve-mode
~/Repos/el/gnus/lisp/html2text hides /home/horn/Repos/el/emacs/lisp/gnus/html2text
~/Repos/el/gnus/lisp/gnus-ems hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-ems
~/Repos/el/gnus/lisp/registry hides /home/horn/Repos/el/emacs/lisp/gnus/registry
~/Repos/el/gnus/lisp/auth-source hides /home/horn/Repos/el/emacs/lisp/gnus/auth-source
~/Repos/el/gnus/lisp/gravatar hides /home/horn/Repos/el/emacs/lisp/gnus/gravatar
~/Repos/el/gnus/lisp/flow-fill hides /home/horn/Repos/el/emacs/lisp/gnus/flow-fill
~/Repos/el/gnus/lisp/gmm-utils hides /home/horn/Repos/el/emacs/lisp/gnus/gmm-utils
~/Repos/el/gnus/lisp/mailcap hides /home/horn/Repos/el/emacs/lisp/gnus/mailcap
~/Repos/el/gnus/lisp/gnus-delay hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-delay
~/Repos/el/gnus/lisp/mm-bodies hides /home/horn/Repos/el/emacs/lisp/gnus/mm-bodies
~/Repos/el/gnus/lisp/mm-archive hides /home/horn/Repos/el/emacs/lisp/gnus/mm-archive
~/Repos/el/gnus/lisp/rfc1843 hides /home/horn/Repos/el/emacs/lisp/gnus/rfc1843
~/Repos/el/gnus/lisp/gnus-kill hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-kill
~/Repos/el/gnus/lisp/qp hides /home/horn/Repos/el/emacs/lisp/gnus/qp
~/Repos/el/gnus/lisp/score-mode hides /home/horn/Repos/el/emacs/lisp/gnus/score-mode
~/Repos/el/gnus/lisp/gnus-topic hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-topic
~/Repos/el/gnus/lisp/gnus-cache hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-cache
~/Repos/el/gnus/lisp/nnmail hides /home/horn/Repos/el/emacs/lisp/gnus/nnmail
~/Repos/el/gnus/lisp/gnus-vm hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-vm
~/Repos/el/gnus/lisp/gnus-sync hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-sync
~/Repos/el/gnus/lisp/nnoo hides /home/horn/Repos/el/emacs/lisp/gnus/nnoo
~/Repos/el/gnus/lisp/nnregistry hides /home/horn/Repos/el/emacs/lisp/gnus/nnregistry
~/Repos/el/gnus/lisp/gnus-dup hides /home/horn/Repos/el/emacs/lisp/gnus/gnus-dup
~/Repos/el/gnus/lisp/parse-time hides /home/horn/Repos/el/emacs/lisp/calendar/parse-time
~/Repos/el/gnus/lisp/time-date hides /home/horn/Repos/el/emacs/lisp/calendar/time-date

Features:
(shadow emacsbug tramp-cache gnus-dired autorevert filenotify
cider-macroexpansion reftex-sel reftex-ref reftex-parse reftex-toc
texmathp preview prv-emacs auto-dictionary flyspell ispell tex-buf
reftex-dcr reftex-auc reftex reftex-vars font-latex latex tex-style tex
dbus crm tex-mode latexenc filecache shr-color color shr dom subr-x
pcase hippie-exp bs mailalias smtpmail sendmail nxml-uchnm rng-xsd
xsd-regexp rng-cmpct rng-nxml rng-valid rng-loc rng-uri rng-parse
nxml-parse rng-match rng-dt rng-util rng-pttrn nxml-ns nxml-mode
nxml-outln nxml-rap nxml-util nxml-glyph nxml-enc xmltok misearch
multi-isearch xterm url-http url-gw url-auth sort smiley gnus-cite qp
mm-archive gnus-async gnus-bcklg gnus-ml mule-diag vc-git diff-mode
jka-compr hl-line nndraft nnmh rot13 utf-7 gnutls network-stream nsm
starttls nnml nnnil gnus-agent gnus-srvr gnus-score score-mode nnvirtual
gnus-cache gnus-demon nntp spam spam-stat gnus-uu yenc gnus-msg
gnus-gravatar mail-extr gravatar gnus-topic nnir gnus-registry registry
eieio-base th-private company-files company-oddmuse company-keywords
company-etags company-gtags company-dabbrev-code company-dabbrev
company-capf company-cmake company-ropemacs company-xcode company-clang
company-semantic company-eclim company-template company-css company-nxml
company-bbdb highlight-parentheses company stratego-mode greql-mode
tg-mode generic preview-latex tex-site auto-loads cider tramp-sh
cider-mode cider-repl cider-eldoc cider-interaction apropos arc-mode
archive-mode cider-doc org-table cider-test cider-stacktrace
cider-client nrepl-client queue cider-util ewoc etags clojure-mode imenu
paredit aggressive-indent names edebug epa-file epa epg rdictcc
ox-reveal ox-latex ox-icalendar ox-html ox-ascii ox-publish ox
org-element google-contacts-message google-contacts derived url-cache
google-oauth google-contacts-gnus gnus-art mm-uu mml2015 mm-view
mml-smime smime dig gnus-sum gnus-group gnus-undo gnus-start gnus-cloud
nnimap nnmail mail-source tls utf7 netrc nnoo parse-time gnus-spec
gnus-int gnus-range gnus-win gnus gnus-ems gnus-compat nnheader em-term
term ehelp esh-opt esh-ext esh-util highlight-symbol boxquote rect
ecomplete message rfc822 mml mml-sec mm-decode mm-bodies mm-encode
mail-parse rfc2231 rfc2047 rfc2045 ietf-drums mailabbrev mail-utils
gmm-utils mailheader edit-server server yasnippet help-mode disp-table
browse-kill-ring recentf tree-widget wid-edit helm-projectile helm-files
image-dired tramp tramp-compat tramp-loaddefs trampver shell dired-x
dired-aux ffap helm-tags helm-bookmark helm-adaptive helm-info helm-net
browse-url xml url url-proxy url-privacy url-expand url-methods
url-history url-cookie url-domsuf url-util url-parse auth-source
gnus-util mm-util mail-prsvr password-cache url-vars mailcap bookmark pp
helm-help helm-org org org-macro org-footnote org-pcomplete pcomplete
org-list org-faces org-entities noutline outline org-version
ob-emacs-lisp ob ob-tangle ob-ref ob-lob ob-table ob-exp org-src ob-keys
ob-comint ob-core ob-eval org-compat org-macs org-loaddefs format-spec
cal-menu calendar cal-loaddefs helm-external helm-buffers
helm-match-plugin helm-grep helm-regexp helm-plugin helm-elscreen
helm-utils dired helm-locate helm helm-source eieio byte-opt bytecomp
byte-compile cl-extra cconv eieio-core helm-config async-bytecomp async
helm-aliases projectile ibuf-ext ibuffer pkg-info find-func lisp-mnt epl
grep compile comint ansi-color ring f s ucs-normalize thingatpt
easy-mmode cl-macs iedit help-macro iedit-lib cl gv cap-words superword
subword saveplace savehist paren icomplete mb-depth
smart-mode-line-respectful-theme smart-mode-line-light-theme
rich-minority smart-mode-line mule-util dash rx edmacro kmacro
cl-loaddefs cl-lib elec-pair gnus-load tsdh-light-theme
memory-usage-autoloads advice help-fns info easymenu package epg-config
time-date tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel x-win x-dnd tool-bar dnd fontset image regexp-opt
fringe tabulated-list newcomment elisp-mode lisp-mode prog-mode register
page menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock
font-lock syntax facemenu font-core frame cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese hebrew
greek romanian slovak czech european ethiopic indian cyrillic chinese
case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer nadvice
loaddefs button faces cus-face macroexp files text-properties overlay
sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote make-network-process dbusbind
gfilenotify dynamic-setting system-font-setting font-render-setting
move-toolbar gtk x-toolkit x multi-tty emacs)

Memory information:
((conses 16 905837 165784)
 (symbols 48 63620 24)
 (miscs 40 1776 13622)
 (strings 32 211846 31876)
 (string-bytes 1 6973133)
 (vectors 16 88496)
 (vector-slots 8 2137319 192540)
 (floats 8 791 758)
 (intervals 56 7040 9061)
 (buffers 976 59)
 (heap 1024 133392 9009))






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-16 15:21 bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files Tassilo Horn
@ 2014-12-16 16:05 ` Eli Zaretskii
  2014-12-16 16:20   ` Eli Zaretskii
  2014-12-16 19:10   ` Tassilo Horn
  2014-12-16 16:39 ` martin rudalics
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 33+ messages in thread
From: Eli Zaretskii @ 2014-12-16 16:05 UTC (permalink / raw)
  To: Tassilo Horn; +Cc: 19393

> From: Tassilo Horn <tsdh@gnu.org>
> Date: Tue, 16 Dec 2014 16:21:10 +0100
> 
>   ftp://ftp.fu-berlin.de/pub/misc/movies/database/movies.list.gz
> 
> which contains all movies known to the international movie database
> (IMDb.com).  When I open that file using "emacs -Q movies.list.gz" (or
> unzip it first) and then do M-x describe-coding-system I can see that it
> is "t -- raw-text-unix".  As a result of this, the last movie in that
> file is displayed as "\374\347 (2012) 2012".
> 
> However, according to the `file' command, the file is plain ISO-8859.

Looks like some kind of bug, although with such a large file, it's not
easy to be sure.

> I also can't force Emacs to use ISO-8859 for that or the original file.
> `C-x RET f iso-8859-15 RET' results in a query that certain characters
> cannot be encoded using latin-9, e.g., \374 and \347, and I'm expected
> to choose another encoding.

That's not how you force Emacs to use a specific encoding when
visiting a file.  You should do this instead:

  C-x RET c iso-8859-15 RET C-x C-f movies.list RET

IOW, revisit the file, forcing Emacs to decode it as ISO-8859-15.
(The same works with the original compressed file.)






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-16 16:05 ` Eli Zaretskii
@ 2014-12-16 16:20   ` Eli Zaretskii
  2014-12-16 19:22     ` Tassilo Horn
  2014-12-16 19:10   ` Tassilo Horn
  1 sibling, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2014-12-16 16:20 UTC (permalink / raw)
  To: tsdh; +Cc: 19393

> Date: Tue, 16 Dec 2014 18:05:38 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 19393@debbugs.gnu.org
> 
> > From: Tassilo Horn <tsdh@gnu.org>
> > Date: Tue, 16 Dec 2014 16:21:10 +0100
> > 
> >   ftp://ftp.fu-berlin.de/pub/misc/movies/database/movies.list.gz
> > 
> > which contains all movies known to the international movie database
> > (IMDb.com).  When I open that file using "emacs -Q movies.list.gz" (or
> > unzip it first) and then do M-x describe-coding-system I can see that it
> > is "t -- raw-text-unix".  As a result of this, the last movie in that
> > file is displayed as "\374\347 (2012) 2012".
> > 
> > However, according to the `file' command, the file is plain ISO-8859.
> 
> Looks like some kind of bug, although with such a large file, it's not
> easy to be sure.

Actually, I don't think this is a bug.  There are ISO-8859-15
characters in that file that are not part of ISO-8859-1, so Emacs will
not detect that encoding unless either (a) your locale dictates that
encoding, or (b) you change the preferences to prefer ISO-8859-15.

This is so with any 8-bit encoding -- Emacs cannot easily distinguish
between them, and needs some guidance.
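
To illustrate why 8-bit encodings are hard to tell apart, here is a
small Python sketch.  Python is used only for illustration; this is not
how Emacs detects coding systems.  It assumes Python's "iso8859_1" and
"iso8859_15" codecs as stand-ins for the two encodings.  Every byte
value decodes successfully in both, so validity alone cannot decide
between them; only the resulting characters differ, and only at a few
positions.

```python
# Illustration only: Python codecs stand in for the two ISO-8859 variants.
data = b"\xfc\xe7 (2012)"            # the bytes shown as \374\347 in the report

as_latin1 = data.decode("iso8859_1")
as_latin9 = data.decode("iso8859_15")
# Both decodings succeed and agree: 0xFC and 0xE7 mean the same in both.

euro = b"\xa4"                       # one of the positions where they differ
euro_latin1 = euro.decode("iso8859_1")    # currency sign
euro_latin9 = euro.decode("iso8859_15")   # euro sign

# Collect every byte value whose meaning differs between the two encodings.
differing = [
    b for b in range(256)
    if bytes([b]).decode("iso8859_1") != bytes([b]).decode("iso8859_15")
]
```

Only a handful of byte values end up in "differing", and none of them
is invalid in either encoding.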






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-16 15:21 bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files Tassilo Horn
  2014-12-16 16:05 ` Eli Zaretskii
@ 2014-12-16 16:39 ` martin rudalics
  2014-12-16 19:26   ` Tassilo Horn
  2014-12-16 16:56 ` Andreas Schwab
  2014-12-16 18:49 ` Wolfgang Jenkner
  3 siblings, 1 reply; 33+ messages in thread
From: martin rudalics @ 2014-12-16 16:39 UTC (permalink / raw)
  To: Tassilo Horn, 19393

 > I've downloaded the following file
 >
 >    ftp://ftp.fu-berlin.de/pub/misc/movies/database/movies.list.gz
 >
 > which contains all movies known to the international movie database
 > (IMDb.com).  When I open that file using "emacs -Q movies.list.gz" (or
 > unzip it first) and then do M-x describe-coding-system I can see that it
 > is "t -- raw-text-unix".  As a result of this, the last movie in that
 > file is displayed as "\374\347 (2012) 2012".

I usually delegate such problems to unicad.el.

martin






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-16 15:21 bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files Tassilo Horn
  2014-12-16 16:05 ` Eli Zaretskii
  2014-12-16 16:39 ` martin rudalics
@ 2014-12-16 16:56 ` Andreas Schwab
  2014-12-16 18:49 ` Wolfgang Jenkner
  3 siblings, 0 replies; 33+ messages in thread
From: Andreas Schwab @ 2014-12-16 16:56 UTC (permalink / raw)
  To: Tassilo Horn; +Cc: 19393

Tassilo Horn <tsdh@gnu.org> writes:

> However, according to the `file' command, the file is plain ISO-8859.

You can't take that seriously, since file doesn't check every character
in the file.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-16 15:21 bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files Tassilo Horn
                   ` (2 preceding siblings ...)
  2014-12-16 16:56 ` Andreas Schwab
@ 2014-12-16 18:49 ` Wolfgang Jenkner
  2014-12-16 19:36   ` Tassilo Horn
  3 siblings, 1 reply; 33+ messages in thread
From: Wolfgang Jenkner @ 2014-12-16 18:49 UTC (permalink / raw)
  To: Tassilo Horn; +Cc: 19393

On Tue, Dec 16 2014, Tassilo Horn wrote:

> I've downloaded the following file
>
>   ftp://ftp.fu-berlin.de/pub/misc/movies/database/movies.list.gz
>
[...]
> I also can't force Emacs to use ISO-8859 for that or the original file.
> `C-x RET f iso-8859-15 RET' results in a query that certain characters
> cannot be encoded using latin-9, e.g., \374 and \347, and I'm expected
> to choose another encoding.
>
> So `file' and `iconv' say the file is valid latin-9 but Emacs seems to
> disagree.  Who is correct?  I tend towards file/iconv but I might be
> wrong.
>
> And shouldn't it be possible to force Emacs to a certain coding system?

Perhaps revert-buffer-with-coding-system will do what you want (i.e.,

C-x <return> r l a t i n - 1  <return> y e s <return>

should show letters with diacritical marks properly, but it took about
20 minutes on my old dual-core k8 system).

In any case, some bisecting shows that the first problem is the line

Jedna žena – jedan vek (2011)				2011

It seems to be encoded in Windows-1250 [1] instead.  The IMDb website
[2] also has problems with this title (at least in Firefox, the
problematic letters seem to be missing).

[1] https://en.wikipedia.org/wiki/Windows-1250
[2] http://www.imdb.com/title/tt2087826/keywords
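
A small Python sketch of what such a line looks like at the byte level.
The exact bytes in movies.list are an assumption here: the sketch simply
encodes the title in Windows-1250 (Python codec "cp1250") and looks at
the result.  Two of the resulting bytes fall into the range 0x80..0x9F,
which ISO-8859-15 assigns to C1 control codes rather than to printable
characters.  Decoding as ISO-8859-15 still does not raise an error,
which may be why a converter can run over the file without complaint.

```python
# Assumed bytes: the title as it would look if written in Windows-1250.
title = "Jedna \u017eena \u2013 jedan vek (2011)"
line = title.encode("cp1250")

# Bytes in 0x80..0x9F are C1 control codes in the ISO-8859 family.
c1_bytes = [hex(b) for b in line if 0x80 <= b <= 0x9F]

# Decoding as ISO-8859-15 succeeds but yields control characters, not letters.
as_latin9 = line.decode("iso8859_15")
has_controls = any(0x80 <= ord(ch) <= 0x9F for ch in as_latin9)

# Decoding as Windows-1250 recovers the intended title.
as_cp1250 = line.decode("cp1250")
```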

Wolfgang






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-16 16:05 ` Eli Zaretskii
  2014-12-16 16:20   ` Eli Zaretskii
@ 2014-12-16 19:10   ` Tassilo Horn
  1 sibling, 0 replies; 33+ messages in thread
From: Tassilo Horn @ 2014-12-16 19:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 19393

Eli Zaretskii <eliz@gnu.org> writes:

>> I also can't force Emacs to use ISO-8859 for that or the original file.
>> `C-x RET f iso-8859-15 RET' results in a query that certain characters
>> cannot be encoded using latin-9, e.g., \374 and \347, and I'm expected
>> to choose another encoding.
>
> That's not how you force Emacs to use a specific encoding when
> visiting a file.  You should do this instead:
>
>   C-x RET c iso-8859-15 RET C-x C-f movies.list RET
>
> IOW, revisit the file, forcing Emacs to decode it as ISO-8859-15.
> (The same works with the original compressed file.)

Ah, indeed, that works.

Bye,
Tassilo






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-16 16:20   ` Eli Zaretskii
@ 2014-12-16 19:22     ` Tassilo Horn
  0 siblings, 0 replies; 33+ messages in thread
From: Tassilo Horn @ 2014-12-16 19:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 19393

Eli Zaretskii <eliz@gnu.org> writes:

>> > However, according to the `file' command, the file is plain ISO-8859.
>> 
>> Looks like some kind of bug, although with such a large file, it's not
>> easy to be sure.
>
> Actually, I don't think this is a bug.  There are ISO-8859-15
> characters in that file that are not part of ISO-8859-1, so Emacs will
> not detect that encoding unless either (a) your locale dictates that
> encoding,

It doesn't.

> or (b) you change the preferences to prefer ISO-8859-15.

Is there a way to prefer ISO-8859-15 over ISO-8859-1?  In the manual I
can only find the command `prefer-coding-system', which doesn't seem to
do what I want.  I want to reorder the "priority list for automatic
detection" so that ISO-8859-15 is before ISO-8859-1 but still UTF-8 is
the very first entry (as it's dictated by my locale).

> This is so with any 8-bit encoding -- Emacs cannot easily distinguish
> between them, and needs some guidance.

Ok, I see.  And as Wolfgang said, some chars in the file are encoded
wrongly using Windows-1250.  That probably adds to the problem.

Thanks for the explanation!

Bye,
Tassilo






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-16 16:39 ` martin rudalics
@ 2014-12-16 19:26   ` Tassilo Horn
  0 siblings, 0 replies; 33+ messages in thread
From: Tassilo Horn @ 2014-12-16 19:26 UTC (permalink / raw)
  To: martin rudalics; +Cc: 19393

martin rudalics <rudalics@gmx.at> writes:

>> I've downloaded the following file
>>
>>    ftp://ftp.fu-berlin.de/pub/misc/movies/database/movies.list.gz
>>
>> which contains all movies known to the international movie database
>> (IMDb.com).  When I open that file using "emacs -Q movies.list.gz" (or
>> unzip it first) and then do M-x describe-coding-system I can see that it
>> is "t -- raw-text-unix".  As a result of this, the last movie in that
>> file is displayed as "\374\347 (2012) 2012".
>
> I usually delegate such problems to unicad.el.

Indeed, when using and enabling that, the file is read as latin-9.

Thanks,
Tassilo






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-16 18:49 ` Wolfgang Jenkner
@ 2014-12-16 19:36   ` Tassilo Horn
  2014-12-17 14:22     ` Wolfgang Jenkner
  2014-12-17 15:12     ` Wolfgang Jenkner
  0 siblings, 2 replies; 33+ messages in thread
From: Tassilo Horn @ 2014-12-16 19:36 UTC (permalink / raw)
  To: Wolfgang Jenkner; +Cc: 19393

Wolfgang Jenkner <wjenkner@inode.at> writes:

>> And shouldn't it be possible to force Emacs to a certain coding system?
>
> Perhaps revert-buffer-with-coding-system will do what you want (i.e.,
>
> C-x <return> r l a t i n - 1  <return> y e s <return>

Yes, that's the right command and not `C-x RET f' as I've thought.

> should show letters with diacritical marks properly,

It does.

> but it took about 20 minutes on my old dual-core k8 system).

Here it took about 2 seconds and it's not that I own the first practical
quantum computer.

> In any case, some bisecting shows that the first problem is the line
>
> Jedna žena – jedan vek (2011)				2011
>
> It seems to be encoded in Windows-1250 [1] instead.

Indeed.  How did you search for it?  I guess you didn't just scroll the
file with open eyes.

Bye,
Tassilo






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-16 19:36   ` Tassilo Horn
@ 2014-12-17 14:22     ` Wolfgang Jenkner
  2014-12-17 15:50       ` Eli Zaretskii
  2014-12-17 15:12     ` Wolfgang Jenkner
  1 sibling, 1 reply; 33+ messages in thread
From: Wolfgang Jenkner @ 2014-12-17 14:22 UTC (permalink / raw)
  To: Tassilo Horn; +Cc: 19393

On Tue, Dec 16 2014, Tassilo Horn wrote:

>> but it took about 20 minutes on my old dual-core k8 system).
>
> Here it took about 2 seconds and it's not that I own the first practical
> quantum computer.

Thanks, that's strange...

Wolfgang






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-16 19:36   ` Tassilo Horn
  2014-12-17 14:22     ` Wolfgang Jenkner
@ 2014-12-17 15:12     ` Wolfgang Jenkner
  2014-12-17 15:46       ` Tassilo Horn
  1 sibling, 1 reply; 33+ messages in thread
From: Wolfgang Jenkner @ 2014-12-17 15:12 UTC (permalink / raw)
  To: Tassilo Horn; +Cc: 19393

On Tue, Dec 16 2014, Tassilo Horn wrote:

>> In any case, some bisecting shows that the first problem is the line
>>
>> Jedna žena – jedan vek (2011)				2011
>>
>> It seems to be encoded in Windows-1250 [1] instead.
>
> Indeed.  How did you search for it?  I guess you didn't just scroll the
> file with open eyes.

Bisecting (to base 10 ;-)

$ cp movies.list /tmp/bad && cd /tmp

Then repeat the following 5 or 6 times.

$ split -n10 bad
$ emacs -Q x*
$ cp x... bad
$ rm x*

Just look for the indication of the buffer coding system in the mode
line to find the first bad file at each step.
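
As a rough alternative to splitting by hand, a short Python sketch can
scan for the first suspicious line directly.  This is a heuristic, not
what Emacs does internally: it assumes the offending bytes lie in the
range 0x80..0x9F (C1 control codes in the ISO-8859 family), and the
names first_suspicious_line and scan_file are made up for this example.

```python
import re

# Bytes 0x80..0x9F are C1 controls in ISO-8859-x, so ordinary text in
# those encodings normally does not contain them.
C1 = re.compile(rb"[\x80-\x9f]")

def first_suspicious_line(lines):
    """Return (line_number, raw_line) for the first line with a C1 byte, or None."""
    for number, raw in enumerate(lines, start=1):
        if C1.search(raw):
            return number, raw
    return None

def scan_file(path):
    # Hypothetical helper: read a file such as movies.list in binary mode,
    # so that no decoding happens before the check.
    with open(path, "rb") as handle:
        return first_suspicious_line(handle)

# Small in-memory sample instead of the real file.
sample = [b"Esperan\xe7a (2002)\n", b"Jedna \x9eena \x96 jedan vek (2011)\n"]
result = first_suspicious_line(sample)
```

Calling scan_file("movies.list") would apply the same check to the
real file; the sample above only shows the idea.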

Wolfgang






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-17 15:12     ` Wolfgang Jenkner
@ 2014-12-17 15:46       ` Tassilo Horn
  0 siblings, 0 replies; 33+ messages in thread
From: Tassilo Horn @ 2014-12-17 15:46 UTC (permalink / raw)
  To: Wolfgang Jenkner; +Cc: 19393

Wolfgang Jenkner <wjenkner@inode.at> writes:

>> Indeed.  How did you search for it?  I guess you didn't just scroll the
>> file with open eyes.
>
> Bisecting (to base 10 ;-)
>
> $ cp movies.list /tmp/bad && cd /tmp
>
> Then repeat the following 5 or 6 times.
>
> $ split -n10 bad
> $ emacs -Q x*
> $ cp x... bad
> $ rm x*
>
> Just look for the indication of the buffer coding system in the mode
> line to find the first bad file at each step.

Ah, I see.  I hoped for some emacs command that lets me search for
characters displayed "in red", e.g., characters displayed as ^J or \374.

Bye,
Tassilo






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-17 14:22     ` Wolfgang Jenkner
@ 2014-12-17 15:50       ` Eli Zaretskii
  2014-12-17 16:02         ` Wolfgang Jenkner
  0 siblings, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2014-12-17 15:50 UTC (permalink / raw)
  To: Wolfgang Jenkner; +Cc: 19393, tsdh

> From: Wolfgang Jenkner <wjenkner@inode.at>
> Date: Wed, 17 Dec 2014 15:22:19 +0100
> Cc: 19393@debbugs.gnu.org
> 
> On Tue, Dec 16 2014, Tassilo Horn wrote:
> 
> >> but it took about 20 minutes on my old dual-core k8 system).
> >
> > Here it took about 2 seconds and it's not that I own the first practical
> > quantum computer.
> 
> Thanks, that's strange...

What is the system where you observed the 20-minute delay?  And what
version of Emacs was that?






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-17 15:50       ` Eli Zaretskii
@ 2014-12-17 16:02         ` Wolfgang Jenkner
  2014-12-17 17:03           ` Eli Zaretskii
  0 siblings, 1 reply; 33+ messages in thread
From: Wolfgang Jenkner @ 2014-12-17 16:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 19393, tsdh

On Wed, Dec 17 2014, Eli Zaretskii wrote:

> What is the system where you observed the 20-minute delay?  And what
> version of Emacs was that?

FreeBSD 10 on amd64, but the emacs versions I have are more than a month
old, so I'll bootstrap from a current git checkout and try again.








* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-17 16:02         ` Wolfgang Jenkner
@ 2014-12-17 17:03           ` Eli Zaretskii
  2014-12-18  1:47             ` Wolfgang Jenkner
  0 siblings, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2014-12-17 17:03 UTC (permalink / raw)
  To: Wolfgang Jenkner; +Cc: 19393, tsdh

> From: Wolfgang Jenkner <wjenkner@inode.at>
> Cc: 19393@debbugs.gnu.org,  tsdh@gnu.org
> Date: Wed, 17 Dec 2014 17:02:07 +0100
> 
> On Wed, Dec 17 2014, Eli Zaretskii wrote:
> 
> > What is the system where you observed the 20-minute delay?  And what
> > version of Emacs was that?
> 
> FreeBSD 10 on amd64

That's what I thought.  AFAIK, FreeBSD systems use mmap(2) explicitly
for buffer memory allocation, and that could be slow when we need to
repeatedly reallocate buffer text and memmove the text between old and
new.

> but the emacs versions I have are more than a month old, so I'll
> bootstrap from a current git checkout and try again.

If I'm right, this won't change the result.






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-17 17:03           ` Eli Zaretskii
@ 2014-12-18  1:47             ` Wolfgang Jenkner
  2014-12-18 16:22               ` Eli Zaretskii
  0 siblings, 1 reply; 33+ messages in thread
From: Wolfgang Jenkner @ 2014-12-18  1:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 19393

On Wed, Dec 17 2014, Eli Zaretskii wrote:

>> On Wed, Dec 17 2014, Eli Zaretskii wrote:
>> 
>> > What is the system where you observed the 20-minute delay?  And what
>> > version of Emacs was that?
>> 
>> FreeBSD 10 on amd64
>
> That's what I thought.  AFAIK, FreeBSD systems use mmap(2) explicitly
> for buffer memory allocation, and that could be slow when we need to
> repeatedly reallocate buffer text and memmove the text between old and
> new.
>
>> but the emacs versions I have are more than a month old, so I'll
>> bootstrap from a current git checkout and try again.
>
> If I'm right, this won't change the result.

You are right, of course (it took around 15 minutes system+user time).

So, I tried

--8<---------------cut here---------------start------------->8---
diff --git a/configure.ac b/configure.ac
index 010abc8..de1c5e8 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2127,7 +2127,7 @@ fi
 
 use_mmap_for_buffers=no
 case "$opsys" in
-  cygwin|mingw32|freebsd|irix6-5) use_mmap_for_buffers=yes ;;
+  cygwin|mingw32|irix6-5) use_mmap_for_buffers=yes ;;
 esac
 
 AC_FUNC_MMAP
--8<---------------cut here---------------end--------------->8---

However, this still took around 10 minutes (I tested with emacs -Q in
both cases, of course).

I give samples of the recurring sequence of syscalls (as reported by
truss) in both cases below.

Here's the current default for FreeBSD.

  Should Emacs use the GNU version of malloc?             yes
  Should Emacs use a relocating allocator for buffers?    no
  Should Emacs use mmap(2) for buffer allocation?         yes

--8<---------------cut here---------------start------------->8---
sigprocmask(SIG_BLOCK,SIGINT|SIGALRM,0x0)	 = 0 (0x0)
clock_gettime(0,{1418846146.702726599 })	 = 0 (0x0)
ktimer_settime(0x3,0x1,0x7ffffffece50,0x0,0x0,0x0) = 0 (0x0)
sigprocmask(SIG_SETMASK,0x0,SIGINT|SIGALRM)	 = 0 (0x0)
nanosleep({0.000001000 })			 = 0 (0x0)
mmap(0x0,28815360,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34579980288 (0x80d20a000)
munmap(0x8108ec000,28798976)			 = 0 (0x0)
read(9,"\t????\n"Esperan\M-ga" (2002) {("...,65536) = 65536 (0x10000)
mmap(0x0,28831744,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34608795648 (0x80ed85000)
munmap(0x80d20a000,28815360)			 = 0 (0x0)
mmap(0x0,28848128,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34637627392 (0x810904000)
munmap(0x80ed85000,28831744)			 = 0 (0x0)
mmap(0x0,28864512,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34579980288 (0x80d20a000)
munmap(0x810904000,28848128)			 = 0 (0x0)
mmap(0x0,28880896,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34608844800 (0x80ed91000)
munmap(0x80d20a000,28864512)			 = 0 (0x0)
read(9," SportsCentury" (1999) {Seabiscu"...,65536) = 65536 (0x10000)
mmap(0x0,28897280,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34637725696 (0x81091c000)
munmap(0x80ed91000,28880896)			 = 0 (0x0)
mmap(0x0,28913664,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34579980288 (0x80d20a000)
munmap(0x81091c000,28897280)			 = 0 (0x0)
mmap(0x0,28930048,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34608893952 (0x80ed9d000)
munmap(0x80d20a000,28913664)			 = 0 (0x0)
mmap(0x0,28946432,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34637824000 (0x810934000)
munmap(0x80ed9d000,28930048)			 = 0 (0x0)
read(9,"a es mi historia" (2001) {La vid"...,65536) = 65536 (0x10000)
mmap(0x0,28962816,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34579980288 (0x80d20a000)
munmap(0x810934000,28946432)			 = 0 (0x0)
mmap(0x0,28979200,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34608943104 (0x80eda9000)
munmap(0x80d20a000,28962816)			 = 0 (0x0)
mmap(0x0,28999680,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34637922304 (0x81094c000)
munmap(0x80eda9000,28979200)			 = 0 (0x0)
mmap(0x0,29016064,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34579980288 (0x80d20a000)
munmap(0x81094c000,28999680)			 = 0 (0x0)
read(9,"\t1999\n"Esti showder" (1999) {("...,65536) = 65536 (0x10000)
mmap(0x0,29032448,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34608996352 (0x80edb6000)
munmap(0x80d20a000,29016064)			 = 0 (0x0)
mmap(0x0,29048832,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34638028800 (0x810966000)
munmap(0x80edb6000,29032448)			 = 0 (0x0)
mmap(0x0,29065216,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34579980288 (0x80d20a000)
munmap(0x810966000,29048832)			 = 0 (0x0)
mmap(0x0,29081600,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34609045504 (0x80edc2000)
munmap(0x80d20a000,29065216)			 = 0 (0x0)
read(9,"en Cuba}\t\t1978\n"Estudio 1" (1"...,65536) = 65536 (0x10000)
mmap(0x0,29097984,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34638127104 (0x81097e000)
munmap(0x80edc2000,29081600)			 = 0 (0x0)
mmap(0x0,29114368,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34579980288 (0x80d20a000)
munmap(0x81097e000,29097984)			 = 0 (0x0)
mmap(0x0,29130752,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34609094656 (0x80edce000)
munmap(0x80d20a000,29114368)			 = 0 (0x0)
mmap(0x0,29147136,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34638225408 (0x810996000)
munmap(0x80edce000,29130752)			 = 0 (0x0)
read(9,"07\n"Eterna Magia" (2007) {(2007"...,65536) = 65536 (0x10000)
mmap(0x0,29163520,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34579980288 (0x80d20a000)
munmap(0x810996000,29147136)			 = 0 (0x0)
mmap(0x0,29179904,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34609143808 (0x80edda000)
munmap(0x80d20a000,29163520)			 = 0 (0x0)
mmap(0x0,29196288,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34638323712 (0x8109ae000)
munmap(0x80edda000,29179904)			 = 0 (0x0)
mmap(0x0,29212672,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34579980288 (0x80d20a000)
munmap(0x8109ae000,29196288)			 = 0 (0x0)
read(9,")}\t1991\n"Eva y Ad\M-an, agenci"...,65536) = 65536 (0x10000)
mmap(0x0,29229056,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34609192960 (0x80ede6000)
munmap(0x80d20a000,29212672)			 = 0 (0x0)
mmap(0x0,29245440,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34638422016 (0x8109c6000)
munmap(0x80ede6000,29229056)			 = 0 (0x0)
mmap(0x0,29261824,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34579980288 (0x80d20a000)
munmap(0x8109c6000,29245440)			 = 0 (0x0)
mmap(0x0,29278208,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34609242112 (0x80edf2000)
munmap(0x80d20a000,29261824)			 = 0 (0x0)
read(9,"A. (#9.4)}\t2004\n"Everybody Lov"...,65536) = 65536 (0x10000)
mmap(0x0,29294592,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34638520320 (0x8109de000)
munmap(0x80edf2000,29278208)			 = 0 (0x0)
mmap(0x0,29310976,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34579980288 (0x80d20a000)
munmap(0x8109de000,29294592)			 = 0 (0x0)
mmap(0x0,29327360,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34609291264 (0x80edfe000)
munmap(0x80d20a000,29310976)			 = 0 (0x0)
mmap(0x0,29343744,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34638618624 (0x8109f6000)
munmap(0x80edfe000,29327360)			 = 0 (0x0)
read(9,"\t\t1988\n"Everyman" (1977) {Who"...,65536) = 65536 (0x10000)
mmap(0x0,29360128,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34579980288 (0x80d20a000)
munmap(0x8109f6000,29343744)			 = 0 (0x0)
mmap(0x0,29376512,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34609340416 (0x80ee0a000)
munmap(0x80d20a000,29360128)			 = 0 (0x0)
mmap(0x0,29392896,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34638716928 (0x810a0e000)
munmap(0x80ee0a000,29376512)			 = 0 (0x0)
mmap(0x0,29409280,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34579980288 (0x80d20a000)
munmap(0x810a0e000,29392896)			 = 0 (0x0)
read(9,"xclusive" (1997) {(#1.1)}\t\t\t"...,65536) = 65536 (0x10000)
mmap(0x0,29425664,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34609389568 (0x80ee16000)
munmap(0x80d20a000,29409280)			 = 0 (0x0)
mmap(0x0,29442048,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34638815232 (0x810a26000)
munmap(0x80ee16000,29425664)			 = 0 (0x0)
mmap(0x0,29458432,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34579980288 (0x80d20a000)
munmap(0x810a26000,29442048)			 = 0 (0x0)
mmap(0x0,29474816,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34609438720 (0x80ee22000)
munmap(0x80d20a000,29458432)			 = 0 (0x0)
read(9,"\n"Explorers: Adventures of the "...,65536) = 65536 (0x10000)
mmap(0x0,29491200,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34638913536 (0x810a3e000)
munmap(0x80ee22000,29474816)			 = 0 (0x0)
mmap(0x0,29507584,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34579980288 (0x80d20a000)
munmap(0x810a3e000,29491200)			 = 0 (0x0)
mmap(0x0,29523968,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34609487872 (0x80ee2e000)
munmap(0x80d20a000,29507584)			 = 0 (0x0)
mmap(0x0,29540352,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34639011840 (0x810a56000)
munmap(0x80ee2e000,29523968)			 = 0 (0x0)
read(9,"xtra" (1994) {(2011-05-03)}\t\t"...,65536) = 65536 (0x10000)
mmap(0x0,29556736,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34579980288 (0x80d20a000)
munmap(0x810a56000,29540352)			 = 0 (0x0)
mmap(0x0,29573120,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34609537024 (0x80ee3a000)
SIGNAL 14 (SIGALRM)
sigprocmask(SIG_SETMASK,SIGINT|SIGQUIT|SIGALRM|SIGCHLD|SIGIO|SIGPROF|SIGWINCH,0x0) = 0 (0x0)
sigreturn(0x7ffffffec630,0x7ffffffec630,0x301,0x0,0xfffffffffffffbc0,0x0) = 34609537064 (0x80ee3a028)
munmap(0x80d20a000,29556736)			 = 0 (0x0)
recvmsg(0x6,0x7ffffffecb80,0x0,0x1000,0x1c30000,0x0) ERR#35 'Resource temporarily unavailable'
--8<---------------cut here---------------end--------------->8---

And here's the version with the patch above applied.

  Should Emacs use the GNU version of malloc?             yes
  Should Emacs use a relocating allocator for buffers?    yes
  Should Emacs use mmap(2) for buffer allocation?         no

--8<---------------cut here---------------start------------->8---
sigprocmask(SIG_BLOCK,SIGINT|SIGALRM,0x0)	 = 0 (0x0)
clock_gettime(0,{1418846834.087766996 })	 = 0 (0x0)
ktimer_settime(0x3,0x1,0x7ffffffece90,0x0,0x0,0xd10fe8) = 0 (0x0)
sigprocmask(SIG_SETMASK,0x0,SIGINT|SIGALRM)	 = 0 (0x0)
nanosleep({0.000001000 })			 = 0 (0x0)
read(5,"n the Family" (1971) {Archie See"...,65536) = 65536 (0x10000)
break(0xeb9a000)				 = 0 (0x0)
read(5,"en" (1970) {(#1.5899)}\t\t\t1992"...,65536) = 65536 (0x10000)
break(0xebaa000)				 = 0 (0x0)
read(5,"\t\t2008\n"All My Children" (197"...,65536) = 65536 (0x10000)
break(0xebba000)				 = 0 (0x0)
read(5,"(1998) {False Convictions (#8.15"...,65536) = 65536 (0x10000)
break(0xebca000)				 = 0 (0x0)
read(5,"la lei\M-p" (2008) {Fimmti \M-~"...,65536) = 65536 (0x10000)
break(0xebda000)				 = 0 (0x0)
read(5,"014\n"Allt f\M-vr Sverige" (2011"...,65536) = 65536 (0x10000)
SIGNAL 14 (SIGALRM)
sigreturn(0x7ffffffeca70,0x10003,0x7ffffffeca70,0x7ffffffed478,0x41e8,0xd10fe8) = 25710504 (0x1884fa8)
recvmsg(0x4,0x7ffffffecbc0,0x0,0x1000,0x41e8,0xd10fe8) ERR#35 'Resource temporarily unavailable'
--8<---------------cut here---------------end--------------->8---






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-18  1:47             ` Wolfgang Jenkner
@ 2014-12-18 16:22               ` Eli Zaretskii
  2014-12-18 16:36                 ` Wolfgang Jenkner
  0 siblings, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2014-12-18 16:22 UTC (permalink / raw)
  To: Wolfgang Jenkner; +Cc: 19393

> From: Wolfgang Jenkner <wjenkner@inode.at>
> Cc: 19393@debbugs.gnu.org
> Date: Thu, 18 Dec 2014 02:47:41 +0100
> 
> > That's what I thought.  AFAIK, FreeBSD systems use mmap(2) explicitly
> > for buffer memory allocation, and that could be slow when we need to
> > repeatedly reallocate buffer text and memmove the text between old and
> > new.
> >
> >> but the emacs versions I have are more than a month old, so I'll
> >> bootstrap from a current git checkout and try again.
> >
> > If I'm right, this won't change the result.
> 
> You are right, of course (it took around 15 minutes system+user time).
> 
> So, I tried
> 
> --8<---------------cut here---------------start------------->8---
> diff --git a/configure.ac b/configure.ac
> index 010abc8..de1c5e8 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -2127,7 +2127,7 @@ fi
>  
>  use_mmap_for_buffers=no
>  case "$opsys" in
> -  cygwin|mingw32|freebsd|irix6-5) use_mmap_for_buffers=yes ;;
> +  cygwin|mingw32|irix6-5) use_mmap_for_buffers=yes ;;
>  esac
>  
>  AC_FUNC_MMAP
> --8<---------------cut here---------------end--------------->8---
> 
> However, this still took around 10 minutes (I tested with emacs -Q in
> both cases, of course).

That's expected: when you disable mmap, Emacs uses ralloc.c, which
still has this problem.

Btw, is this with the compressed file or after decompressing it?  My
guess is the former.






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-18 16:22               ` Eli Zaretskii
@ 2014-12-18 16:36                 ` Wolfgang Jenkner
  2014-12-18 17:34                   ` Eli Zaretskii
  0 siblings, 1 reply; 33+ messages in thread
From: Wolfgang Jenkner @ 2014-12-18 16:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 19393

On Thu, Dec 18 2014, Eli Zaretskii wrote:

>> -  cygwin|mingw32|freebsd|irix6-5) use_mmap_for_buffers=yes ;;
>> +  cygwin|mingw32|irix6-5) use_mmap_for_buffers=yes ;;
[...]
>> However, this still took around 10 minutes (I tested with emacs -Q in
>> both cases, of course).
>
> That's expected: when you disable mmap, Emacs uses ralloc.c, which
> still has this problem.

Shouldn't other systems for which the native malloc is not used have
a similar problem then?

> Btw, is this with the compressed file or after decompressing it?  My
> guess is the former.

No, with the uncompressed file.






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-18 16:36                 ` Wolfgang Jenkner
@ 2014-12-18 17:34                   ` Eli Zaretskii
  2014-12-20  3:21                     ` Wolfgang Jenkner
  0 siblings, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2014-12-18 17:34 UTC (permalink / raw)
  To: Wolfgang Jenkner; +Cc: 19393

> From: Wolfgang Jenkner <wjenkner@inode.at>
> Cc: 19393@debbugs.gnu.org
> Date: Thu, 18 Dec 2014 17:36:19 +0100
> 
> On Thu, Dec 18 2014, Eli Zaretskii wrote:
> 
> >> -  cygwin|mingw32|freebsd|irix6-5) use_mmap_for_buffers=yes ;;
> >> +  cygwin|mingw32|irix6-5) use_mmap_for_buffers=yes ;;
> [...]
> >> However, this still took around 10 minutes (I tested with emacs -Q in
> >> both cases, of course).
> >
> > That's expected: when you disable mmap, Emacs uses ralloc.c, which
> > still has this problem.
> 
> Shouldn't other systems for which the native malloc is not used have
> a similar problem then?

There are almost none of them.  But yes, those which do should have a
similar problem.

> > Btw, is this with the compressed file or after decompressing it?  My
> > guess is the former.
> 
> No, with the uncompressed file.

Then it's probably some inefficiency in insert-file-contents when it
is called to revert a buffer.  If you have time, please take a look at
what happens there; I suspect we reallocate the buffer in very small
chunks instead of in larger increments.  (With compressed files,
that's hard to do, because the size of the uncompressed file is not
known in advance.)

Thanks.






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-18 17:34                   ` Eli Zaretskii
@ 2014-12-20  3:21                     ` Wolfgang Jenkner
  2014-12-20  7:27                       ` Eli Zaretskii
  0 siblings, 1 reply; 33+ messages in thread
From: Wolfgang Jenkner @ 2014-12-20  3:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 19393

On Thu, Dec 18 2014, Eli Zaretskii wrote:

> Then it's probably some inefficiency in insert-file-contents, when it
> is called to revert a buffer.  If you have time, please take a look
> what happens there, I suspect we reallocate the buffer in very small
> chunks, instead of doing it with larger increments.  (With compressed
> files, it's hard to do, because the size of the uncompressed file is
> not known in advance.)

I have been looking into this with dtrace, and what is certain is that
a large amount of data (growing up to the order of magnitude of the
buffer size) is memcpy'd again and again as a result of mmap_realloc
being called by enlarge_buffer_text.  Apparently, the latter is called
for buffer gap handling, which is triggered by decode_coding_c_string
(or rather decode_coding_object) in insert-file-contents.  So it seems
that the effect on memory of this innocent-looking loop there is
enormously magnified.

But I have to look at the source more closely (not that I expect to get
any idea how to fix this, though).






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-20  3:21                     ` Wolfgang Jenkner
@ 2014-12-20  7:27                       ` Eli Zaretskii
  2015-01-13 14:06                         ` Wolfgang Jenkner
  0 siblings, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2014-12-20  7:27 UTC (permalink / raw)
  To: Wolfgang Jenkner; +Cc: 19393

> From: Wolfgang Jenkner <wjenkner@inode.at>
> Cc: 19393@debbugs.gnu.org
> Date: Sat, 20 Dec 2014 04:21:54 +0100
> 
> I have been looking into this with dtrace and what is sure is that
> a large amount of data (increasing up to the order of magnitude of the
> buffer size) is memcpy'd again and again as a result of mmap_realloc
> being called by enlarge_buffer_text.  Apparently, the latter is called
> for buffer gap handling which is triggered by decode_coding_c_string (or
> rather decode_coding_object) in insert-file-contents.  So it seems that
> the effect on memory of this innocent-looking loop there is enormously
> magnified.

Yes, that'd be my guess for the reason.

> But I have to look at the source more closely (not that I expect to get
> any idea how to fix this, though).

Since we know the size of the file, we could perhaps compute the new
buffer size up front (taking some conservative approximations, if
needed), and mmap_realloc it only once.






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2014-12-20  7:27                       ` Eli Zaretskii
@ 2015-01-13 14:06                         ` Wolfgang Jenkner
  2015-01-13 16:25                           ` Eli Zaretskii
                                             ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Wolfgang Jenkner @ 2015-01-13 14:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 19393

[-- Attachment #1: Type: text/plain, Size: 1485 bytes --]

Here's a simple change in src/buffer.c that reduces the time to six
seconds or so, but only for newer versions of FreeBSD.

It takes advantage of the MAP_EXCL flag for mmap(2), which has been
recently added[1] and is also available in 10-STABLE and 10.1-RELEASE.

In percentage of user CPU time, the hotuser script[2] from the dtrace
toolkit shows a change from

[...]
emacs-25.0.50.1`decode_coding                             537   0.1%
emacs-25.0.50.1`produce_chars                            2109   0.4%
emacs-25.0.50.1`decode_coding_charset                    2544   0.5%
libc.so.7`memcpy                                       516884  98.9%

to

[...]
libc.so.7`memcpy                                          220   4.1%
bootstrap-emacs`decode_coding                             488   9.0%
bootstrap-emacs`produce_chars                            2100  38.8%
bootstrap-emacs`decode_coding_charset                    2501  46.2%

(the second column counts sample points, of which there are 1001 per
second for each CPU core)

The numbers are for the system compiler (clang 3.4.1) with default
optimizations, though they are even a bit better for gcc 4.9.

However, if the file in question is compressed,
revert-buffer-with-coding-system still takes 4 minutes (with 98% of
the user time spent in memmove).

[1] https://svnweb.freebsd.org/base?view=revision&revision=267630
[2] https://svnweb.freebsd.org/base/stable/10/cddl/contrib/dtracetoolkit/hotuser?revision=256281&view=co


[-- Attachment #2: Use MAP_EXCL mmap flag. --]
[-- Type: text/x-diff, Size: 2394 bytes --]

From b0233ff2274e554339da3c4606ff7fb5fc961e82 Mon Sep 17 00:00:00 2001
From: Wolfgang Jenkner <wjenkner@inode.at>
Date: Tue, 23 Dec 2014 01:50:10 +0100
Subject: [PATCH] Actually use mmap_enlarge for FreeBSD 10.1 or newer.

* src/buffer.c (MAP_EXCL): Make sure it is always defined.
(MMAP_ALLOCATED_P, mmap_enlarge): Use it.
This alleviates a performance problem due to excessive use of
memcpy(3). (Bug#19393)
---
 src/ChangeLog |  8 ++++++++
 src/buffer.c  | 15 ++++++++++++---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/src/ChangeLog b/src/ChangeLog
index 252dfd3..b526e28 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,11 @@
+2014-12-24  Wolfgang Jenkner  <wjenkner@inode.at>
+
+	Actually use mmap_enlarge for FreeBSD 10.1 or newer.
+	* buffer.c (MAP_EXCL): Make sure it is always defined.
+	(MMAP_ALLOCATED_P, mmap_enlarge): Use it.
+	This alleviates a performance problem due to excessive use of
+	memcpy(3). (Bug#19393)
+
 2015-01-12  Paul Eggert  <eggert@cs.ucla.edu>
 
 	Port to 32-bit MingGW --with-wide-int
diff --git a/src/buffer.c b/src/buffer.c
index d0ffe67d9..8a97f3d 100644
--- a/src/buffer.c
+++ b/src/buffer.c
@@ -4683,10 +4683,19 @@ static bool mmap_initialized_p;
 
    Default is to conservatively assume the address range is occupied by
    something else.  This can be overridden by system configuration
-   files if system-specific means to determine this exists.  */
+   files if system-specific means to determine this exists.
+
+   However, if MAP_EXCL is defined assume that it is an mmap flag
+   which, combined with MAP_FIXED, has FreeBSD semantics, viz., the
+   mapping request will fail if a mapping already exists within the
+   range (the flag was first present in release 10.1).  */
+
+#ifndef MAP_EXCL
+#define MAP_EXCL 0
+#endif
 
 #ifndef MMAP_ALLOCATED_P
-#define MMAP_ALLOCATED_P(start, end) 1
+#define MMAP_ALLOCATED_P(start, end) (!MAP_EXCL)
 #endif
 
 /* Perform necessary initializations for the use of mmap.  */
@@ -4770,7 +4779,7 @@ mmap_enlarge (struct mmap_region *r, int npages)
 	  void *p;
 
 	  p = mmap (region_end, nbytes, PROT_READ | PROT_WRITE,
-		    MAP_ANON | MAP_PRIVATE | MAP_FIXED, mmap_fd, 0);
+		    MAP_ANON | MAP_EXCL | MAP_PRIVATE | MAP_FIXED, mmap_fd, 0);
 	  if (p == MAP_FAILED)
 	    ; /* fprintf (stderr, "mmap: %s\n", emacs_strerror (errno)); */
 	  else if (p != region_end)
-- 
2.2.1



* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2015-01-13 14:06                         ` Wolfgang Jenkner
@ 2015-01-13 16:25                           ` Eli Zaretskii
  2015-01-13 17:12                             ` Wolfgang Jenkner
  2015-01-14 19:41                           ` Wolfgang Jenkner
  2020-09-07 21:30                           ` Lars Ingebrigtsen
  2 siblings, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2015-01-13 16:25 UTC (permalink / raw)
  To: Wolfgang Jenkner; +Cc: 19393

> From: Wolfgang Jenkner <wjenkner@inode.at>
> Cc: 19393@debbugs.gnu.org
> Date: Tue, 13 Jan 2015 15:06:01 +0100
> 
> However, if the file in question is compressed
> revert-buffer-with-coding-system still takes 4 minutes (the user time
> being dominated to 98% by memmove).

Is the problem with compressed files due to the fact that the size is
unknown in advance?  If so, perhaps enlarging by more than was
requested (e.g., twice as large) will alleviate the problem?

Thanks.






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2015-01-13 16:25                           ` Eli Zaretskii
@ 2015-01-13 17:12                             ` Wolfgang Jenkner
  2015-01-13 17:31                               ` Eli Zaretskii
  0 siblings, 1 reply; 33+ messages in thread
From: Wolfgang Jenkner @ 2015-01-13 17:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 19393

On Tue, Jan 13 2015, Eli Zaretskii wrote:

>> From: Wolfgang Jenkner <wjenkner@inode.at>
>> Cc: 19393@debbugs.gnu.org
>> Date: Tue, 13 Jan 2015 15:06:01 +0100
>> 
>> However, if the file in question is compressed
>> revert-buffer-with-coding-system still takes 4 minutes (the user time
>> being dominated to 98% by memmove).
>
> Is the problem with compressed files due to the fact that the size is
> unknown in advance?

I only know that loading the compressed file from disk with the same
coding system conversion as above takes just a few seconds, i.e., doing
something like

C-x RET c latin-1 <return> C-x C-f movies.list.gz

is fast (enough).

> If so, perhaps enlarging by more than was
> requested (e.g., twice as large) will alleviate the problem?

IIUC, this is your previous suggestion about improving
insert-file-contents itself?









* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2015-01-13 17:12                             ` Wolfgang Jenkner
@ 2015-01-13 17:31                               ` Eli Zaretskii
  0 siblings, 0 replies; 33+ messages in thread
From: Eli Zaretskii @ 2015-01-13 17:31 UTC (permalink / raw)
  To: Wolfgang Jenkner; +Cc: 19393

> From: Wolfgang Jenkner <wjenkner@inode.at>
> Cc: 19393@debbugs.gnu.org
> Date: Tue, 13 Jan 2015 18:12:54 +0100
> 
> > Is the problem with compressed files due to the fact that the size is
> > unknown in advance?
> 
> I only know that loading the compressed file from disk with the same
> coding system conversion as above takes just a few seconds, i.e., doing
> something like
> 
> C-x RET c latin-1 <return> C-x C-f movies.list.gz
> 
> is fast (enough).

Then it's probably not what I had in mind.

> > If so, perhaps enlarging by more than was
> > requested (e.g., twice as large) will alleviate the problem?
> 
> IIUC, this is your previous suggestion about improving
> insert-file-contents itself?

According to what you see, it sounds like determining the encoding is
what takes the time here, for some reason triggering massive
memmove's.






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2015-01-13 14:06                         ` Wolfgang Jenkner
  2015-01-13 16:25                           ` Eli Zaretskii
@ 2015-01-14 19:41                           ` Wolfgang Jenkner
  2015-01-15 13:38                             ` Wolfgang Jenkner
  2020-09-07 21:30                           ` Lars Ingebrigtsen
  2 siblings, 1 reply; 33+ messages in thread
From: Wolfgang Jenkner @ 2015-01-14 19:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 19393

On Tue, Jan 13 2015, Wolfgang Jenkner wrote:

> Here's a simple change in src/buffer.c that reduces the time to six
> seconds or so, but only for newer versions of FreeBSD.
>
> It takes advantage of the MAP_EXCL flag for mmap(2), which has been
> recently added[1] and is also available in 10-STABLE and 10.1-RELEASE.

There remains the problem, though, that emacs on FreeBSD also uses
gmalloc and hence, IIUC, sbrk() for memory allocation, and at this point
I'm too ignorant about almost everything involved here to be confident
that mmap()ed pages can't overlap with the process (BSS) data segment
when MAP_EXCL | MAP_FIXED is among the flags.

Without the MAP_EXCL mmap flag they definitely can overlap, as the
following test program shows when it is _statically_ linked.

Here's the output when I run it:

r0 = 0x800663000
Cannot allocate memory
r2 = 0x800662000

-- >8 --
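#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/mman.h>
#include <errno.h>

int
main (void)
{
	int n;
	char *r0;	/* char *, so that r0 - n is valid ISO C */
	void *r1, *r2;

	n = getpagesize();

	/* Map one anonymous page and move the program break up to it,
	   so that the data segment now ends at r0.  */
	r0 = mmap(NULL, n, PROT_READ | PROT_WRITE, MAP_ANON, -1, 0);

	if (r0 == MAP_FAILED || brk(r0) != 0 || sbrk(0) != r0)
		return (1);
	
	fprintf(stderr, "r0 = %p\n", (void *)r0);

	/* Try to map the page just below r0, i.e. inside the data
	   segment, with MAP_EXCL: this is expected to fail.  */
	errno = 0;
	r1 = mmap(r0 - n, n, PROT_READ | PROT_WRITE,
		  MAP_ANON | MAP_EXCL | MAP_FIXED, -1, 0);
	if (r1 == MAP_FAILED)
		perror(NULL);
	else
		fprintf(stderr, "r1 = %p\n", r1);

	/* The same request without MAP_EXCL: MAP_FIXED alone silently
	   replaces whatever is mapped there.  */
	errno = 0;
	r2 = mmap(r0 - n, n, PROT_READ | PROT_WRITE,
		  MAP_ANON | MAP_FIXED, -1, 0);
	if (r2 == MAP_FAILED)
		perror(NULL);
	else
		fprintf(stderr, "r2 = %p\n", r2);


	return (0);
}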






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2015-01-14 19:41                           ` Wolfgang Jenkner
@ 2015-01-15 13:38                             ` Wolfgang Jenkner
  2015-01-15 16:08                               ` Stefan Monnier
  0 siblings, 1 reply; 33+ messages in thread
From: Wolfgang Jenkner @ 2015-01-15 13:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 19393

On Wed, Jan 14 2015, Wolfgang Jenkner wrote:

> There remains the problem, though, that emacs on FreeBSD also uses
> gmalloc and hence, IIUC, sbrk() for memory allocation, and at this point
> I'm too ignorant about almost everything involved here to be confident
> that mmap()ed pages can't overlap with the process (BSS) data segment
> when MAP_EXCL | MAP_FIXED is among the flags.
>
> Without the MAP_EXCL mmap flag they definitely can overlap, as the
> following test program shows when it is _statically_ linked.
>
> Here's the output when I run it:
>
> r0 = 0x800663000
> Cannot allocate memory
> r2 = 0x800662000

However, I somehow forgot that, quite contrary to my test program,
src/buffer.c would use MAP_FIXED only when trying to add some other
pages on top of an existing region, the beginning of which was mmap'd
without MAP_FIXED.  Hence the new region could only reach into the data
segment if the old one was already there.  That is, the patch doesn't
change the current situation in this regard.

So I think that the patch would be OK, after all.






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2015-01-15 13:38                             ` Wolfgang Jenkner
@ 2015-01-15 16:08                               ` Stefan Monnier
  2015-01-15 17:00                                 ` Wolfgang Jenkner
  0 siblings, 1 reply; 33+ messages in thread
From: Stefan Monnier @ 2015-01-15 16:08 UTC (permalink / raw)
  To: Wolfgang Jenkner; +Cc: 19393

> However, I somehow forgot that, quite contrary to my test program,
> src/buffer.c would use MAP_FIXED only when trying to add some other
> pages on top of an existing region, the beginning of which was mmap'd
> without MAP_FIXED.  Hence the new region could only reach into the data
> segment if the old one was already there.  That is, the patch doesn't
> change the current situation in this regard.

> So I think that the patch would be OK, after all.

Thanks Wolfgang for looking into this.  I'm really unfamiliar with that
code, so I can't help much, but hopefully someone else will be able to
take care of your patch,


        Stefan






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2015-01-15 16:08                               ` Stefan Monnier
@ 2015-01-15 17:00                                 ` Wolfgang Jenkner
  0 siblings, 0 replies; 33+ messages in thread
From: Wolfgang Jenkner @ 2015-01-15 17:00 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 19393

On Thu, Jan 15 2015, Stefan Monnier wrote:

>  but hopefully someone else will be able to
> take care of your patch,

You gave me a commit bit (but I haven't been very active since then)...






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2015-01-13 14:06                         ` Wolfgang Jenkner
  2015-01-13 16:25                           ` Eli Zaretskii
  2015-01-14 19:41                           ` Wolfgang Jenkner
@ 2020-09-07 21:30                           ` Lars Ingebrigtsen
  2020-09-10  0:43                             ` Wolfgang Jenkner
  2 siblings, 1 reply; 33+ messages in thread
From: Lars Ingebrigtsen @ 2020-09-07 21:30 UTC (permalink / raw)
  To: Wolfgang Jenkner; +Cc: 19393

Wolfgang Jenkner <wjenkner@inode.at> writes:

> * src/buffer.c (MAP_EXCL): Make sure it is always defined.
> (MMAP_ALLOCATED_P, mmap_enlarge): Use it.
> This alleviates a performance problem due to excessive use of
> memcpy(3). (Bug#19393)

[...]

> -		    MAP_ANON | MAP_PRIVATE | MAP_FIXED, mmap_fd, 0);
> +		    MAP_ANON | MAP_EXCL | MAP_PRIVATE | MAP_FIXED, mmap_fd, 0);

This patch apparently made loading huge files on FreeBSD a lot faster,
but as far as I can tell, it was never applied.

This was five years ago, though -- Wolfgang, is this still a problem on
FreeBSD?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2020-09-07 21:30                           ` Lars Ingebrigtsen
@ 2020-09-10  0:43                             ` Wolfgang Jenkner
  2020-09-10 13:17                               ` Lars Ingebrigtsen
  0 siblings, 1 reply; 33+ messages in thread
From: Wolfgang Jenkner @ 2020-09-10  0:43 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 19393

Lars Ingebrigtsen <larsi@gnus.org> wrote:

> This was five years ago, though -- Wolfgang, is this still a problem on
> FreeBSD?

No, AFAICT.

For the last four years or so, FreeBSD (like other non-glibc based
systems) has been able to use its native libc malloc instead of the
bundled gmalloc (first via HYBRID_MALLOC and now thanks to pdumper).

The test case described above in this bug report now takes only a few
seconds (both with and without compression).

My patch above should be consigned to oblivion.






* bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files
  2020-09-10  0:43                             ` Wolfgang Jenkner
@ 2020-09-10 13:17                               ` Lars Ingebrigtsen
  0 siblings, 0 replies; 33+ messages in thread
From: Lars Ingebrigtsen @ 2020-09-10 13:17 UTC (permalink / raw)
  To: Wolfgang Jenkner; +Cc: 19393

Wolfgang Jenkner <wjenkner@inode.at> writes:

> Lars Ingebrigtsen <larsi@gnus.org> wrote:
>
>> This was five years ago, though -- Wolfgang, is this still a problem on
>> FreeBSD?
>
> No, AFAICT.
>
> For the last four years or so, FreeBSD (like other non-glibc based
> systems) has been able to use its native libc malloc instead of the
> bundled gmalloc (first via HYBRID_MALLOC and now thanks to pdumper).
>
> The test case described above in this bug report now takes only a few
> seconds (both with or without compression).
>
> My patch above should be consigned to oblivion.

OK.  :-)  Closing this bug report.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






Thread overview: 33+ messages
2014-12-16 15:21 bug#19393: 25.0.50; Emacs cannot determine coding system of ISO-8859 encoded files Tassilo Horn
2014-12-16 16:05 ` Eli Zaretskii
2014-12-16 16:20   ` Eli Zaretskii
2014-12-16 19:22     ` Tassilo Horn
2014-12-16 19:10   ` Tassilo Horn
2014-12-16 16:39 ` martin rudalics
2014-12-16 19:26   ` Tassilo Horn
2014-12-16 16:56 ` Andreas Schwab
2014-12-16 18:49 ` Wolfgang Jenkner
2014-12-16 19:36   ` Tassilo Horn
2014-12-17 14:22     ` Wolfgang Jenkner
2014-12-17 15:50       ` Eli Zaretskii
2014-12-17 16:02         ` Wolfgang Jenkner
2014-12-17 17:03           ` Eli Zaretskii
2014-12-18  1:47             ` Wolfgang Jenkner
2014-12-18 16:22               ` Eli Zaretskii
2014-12-18 16:36                 ` Wolfgang Jenkner
2014-12-18 17:34                   ` Eli Zaretskii
2014-12-20  3:21                     ` Wolfgang Jenkner
2014-12-20  7:27                       ` Eli Zaretskii
2015-01-13 14:06                         ` Wolfgang Jenkner
2015-01-13 16:25                           ` Eli Zaretskii
2015-01-13 17:12                             ` Wolfgang Jenkner
2015-01-13 17:31                               ` Eli Zaretskii
2015-01-14 19:41                           ` Wolfgang Jenkner
2015-01-15 13:38                             ` Wolfgang Jenkner
2015-01-15 16:08                               ` Stefan Monnier
2015-01-15 17:00                                 ` Wolfgang Jenkner
2020-09-07 21:30                           ` Lars Ingebrigtsen
2020-09-10  0:43                             ` Wolfgang Jenkner
2020-09-10 13:17                               ` Lars Ingebrigtsen
2014-12-17 15:12     ` Wolfgang Jenkner
2014-12-17 15:46       ` Tassilo Horn
