unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#12925: 24.1; string-make-unibyte instead of string-as-unibyte
From: Ethan Glasser-Camp @ 2012-11-18 17:45 UTC
  To: 12925

This is more of a request for information than a bug report.

Consider this code:

(let ((s (string ?\u2019)))     ; U+2019 RIGHT SINGLE QUOTATION MARK
  (with-temp-buffer
    (set-buffer-multibyte nil)  ; make the empty buffer unibyte first
    (insert s)                  ; then insert the multibyte string
    (buffer-string)))

This returns a string containing the single character ^Y (byte 0x19, the
low 8 bits of U+2019). If you instead swap the insert and
set-buffer-multibyte calls:

(let ((s (string ?\u2019)))     ; U+2019 RIGHT SINGLE QUOTATION MARK
  (with-temp-buffer
    (insert s)                  ; insert while the buffer is still multibyte
    (set-buffer-multibyte nil)  ; then reinterpret the contents as bytes
    (buffer-string)))

This returns "\342\200\231" (the bytes that make up this character in
utf-8).

The first behavior is documented at the info node "(elisp)Converting
Representations" -- every character is truncated to its low 8 bits. The
second behavior is documented in the following node, "(elisp)Selecting a
Representation" -- the same bytes are left in the buffer but they are
interpreted differently.

I believe that the second behavior is easier to explain and sometimes
useful and that the first one is not. So why does it exist? Why does
inserting multibyte text into a unibyte buffer corrupt it like this?
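
The same two behaviors can be seen on strings directly, which may make
them easier to compare. This is only a minimal sketch, assuming the
Emacs 24-era functions `string-make-unibyte' and `string-as-unibyte'
named in the subject line (later Emacs releases mark both as obsolete).
The helper name `bug12925-both-conversions' is made up for illustration.

```elisp
;; Hypothetical helper, not part of Emacs: return both conversions of S.
(defun bug12925-both-conversions (s)
  "Return (TRUNCATED . REINTERPRETED) for the multibyte string S."
  (cons (string-make-unibyte s)   ; keeps only the low 8 bits of each char
        (string-as-unibyte s)))   ; keeps the internal bytes unchanged

(bug12925-both-conversions (string ?\u2019))
;; matching the two results reported above: ("\031" . "\342\200\231")
```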


In GNU Emacs 24.1.1 (x86_64-pc-linux-gnu, GTK+ Version 2.24.12)
 of 2012-09-22 on batsu, modified by Debian
Windowing system distributor `The X.Org Foundation', version 11.0.11300000
Configured using:
 `configure '--build' 'x86_64-linux-gnu' '--build' 'x86_64-linux-gnu'
 '--prefix=/usr' '--sharedstatedir=/var/lib' '--libexecdir=/usr/lib'
 '--localstatedir=/var/lib' '--infodir=/usr/share/info'
 '--mandir=/usr/share/man' '--with-pop=yes'
 '--enable-locallisppath=/etc/emacs24:/etc/emacs:/usr/local/share/emacs/24.1/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/24.1/site-lisp:/usr/share/emacs/site-lisp'
 '--with-crt-dir=/usr/lib/x86_64-linux-gnu' '--with-x=yes'
 '--with-x-toolkit=gtk' '--with-toolkit-scroll-bars'
 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -fstack-protector
 --param=ssp-buffer-size=4 -Wformat -Werror=format-security -Wall -O2'
 'CPPFLAGS=-D_FORTIFY_SOURCE=2''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t







* bug#12925: 24.1; string-make-unibyte instead of string-as-unibyte
From: Stefan Monnier @ 2012-11-19  2:27 UTC
  To: Ethan Glasser-Camp; +Cc: 12925

> Why does inserting multibyte text into a unibyte buffer corrupt it
> like this?

Because the right thing (i.e. signaling an error) was not backward
compatible with broken code that assumed that chars can be represented
in 8 bits (i.e. code written in the glory days of latin-N, koi-8, ...).

We could/should probably try to do the right thing now, since such
broken code is probably much less common.


        Stefan






* bug#12925: 24.1; string-make-unibyte instead of string-as-unibyte
From: Lars Ingebrigtsen @ 2021-06-01  7:02 UTC
  To: Stefan Monnier; +Cc: 12925, Ethan Glasser-Camp

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> Why does inserting multibyte text into a unibyte buffer corrupt it
>> like this?
>
> Because the right thing (i.e. signaling an error) was not backward
> compatible with broken code that assumed that chars can be represented
> in 8 bits (i.e. code written in the glory days of latin-N, koi-8, ...).
>
> We could/should probably try to do the right thing now, since such
> broken code is probably much less common.

(Now eight years later.)

So the suggestion is to make inserting multibyte strings into a unibyte
buffer signal an error (instead of inserting the lower byte of
characters).

Has anybody experimented with doing this and seeing whether this signals
a lot of errors in daily usage?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#12925: 24.1; string-make-unibyte instead of string-as-unibyte
From: Eli Zaretskii @ 2021-06-01 11:56 UTC
  To: Lars Ingebrigtsen; +Cc: monnier, 12925, ethan.glasser.camp

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Tue, 01 Jun 2021 09:02:13 +0200
> Cc: 12925@debbugs.gnu.org, Ethan Glasser-Camp <ethan.glasser.camp@gmail.com>
> 
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
> 
> >> Why does inserting multibyte text into a unibyte buffer corrupt it
> >> like this?
> >
> > Because the right thing (i.e. signaling an error) was not backward
> > compatible with broken code that assumed that chars can be represented
> > in 8 bits (i.e. code written in the glory days of latin-N, koi-8, ...).
> >
> > We could/should probably try to do the right thing now, since such
> > broken code is probably much less common.
> 
> (Now eight years later.)
> 
> So the suggestion is to make inserting multibyte strings into a unibyte
> buffer signal an error (instead of inserting the lower byte of
> characters).
> 
> Has anybody experimented with doing this and seeing whether this signals
> a lot of errors in daily usage?

Why not make both methods do the same: insert the bytes of the
multibyte text into the unibyte buffer?

Making the buffer unibyte after insertion is a PITA, because it could
be very slow if the text in the buffer is long.  That's why people may
wish to do it the other way around: making an empty buffer unibyte is
a snap.






* bug#12925: 24.1; string-make-unibyte instead of string-as-unibyte
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-06-01 13:45 UTC
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, 12925, ethan.glasser.camp

> Why not make both methods do the same: insert the bytes of the
> multibyte text into the unibyte buffer?

AFAIK it's rather unusual to need to insert a text that's multibyte into
a buffer that's unibyte.  And in those cases, the right behavior is not
always the same (sometimes it should convert using something like
locale-coding-system, sometimes it should preserve the actual
byte-sequence used internally, sometimes it should signal an error, ...).

So I think, as much as possible, we should refrain from guessing and
rather request that the coder call `encode-coding-string` or something
like that explicitly to say what they want.
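
A minimal sketch of that explicit approach, assuming UTF-8 is the
encoding the caller wants (which coding system is right is exactly the
thing that has to be decided case by case). The function name
`bug12925-insert-encoded' is hypothetical.

```elisp
;; Hypothetical helper: encode STRING explicitly before inserting it
;; into a unibyte buffer, so that nothing is silently truncated.
(defun bug12925-insert-encoded (string coding-system)
  "Return the contents of a unibyte temp buffer after inserting STRING.
STRING is first encoded with CODING-SYSTEM."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert (encode-coding-string string coding-system))
    (buffer-string)))

(bug12925-insert-encoded (string ?\u2019) 'utf-8)
;; → "\342\200\231"
```

Because the caller names the coding system, the bytes that end up in the
buffer no longer depend on whether the buffer was made unibyte before or
after the insertion.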

> Making the buffer unibyte after insertion is a PITA, because it could
> be very slow if the text in the buffer is long.

Agreed.  In my book `set-buffer-multibyte` should signal an error if the
buffer is not empty (yes, I know it's not going to happen, but I think
it's the direction we should be headed).


        Stefan







* bug#12925: 24.1; string-make-unibyte instead of string-as-unibyte
From: Eli Zaretskii @ 2021-06-01 14:03 UTC
  To: Stefan Monnier; +Cc: larsi, 12925, ethan.glasser.camp

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,  12925@debbugs.gnu.org,
>   ethan.glasser.camp@gmail.com
> Date: Tue, 01 Jun 2021 09:45:07 -0400
> 
> > Why not make both methods do the same: insert the bytes of the
> > multibyte text into the unibyte buffer?
> 
> AFAIK it's rather unusual to need to insert a text that's multibyte into
> a buffer that's unibyte.

Most probably, people don't know the text is multibyte.  Or don't
care.

> And in those cases, the right behavior is not always the same
> (sometimes it should convert using something like
> locale-coding-system, sometimes it should preserve the actual
> byte-sequence used internally, sometimes it should signal an error,
> ...).

What I mean is: if we think the current behavior is broken, then what
I suggest is at least less broken (and sometimes might just be TRT).
At the very least what I suggest is reversible, whereas neither the
current behavior nor what you suggest is.
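
The reversibility point can be sketched as follows. The sketch uses an
explicit UTF-8 round trip as a stand-in for byte-preserving insertion
(an assumption; the proposal itself concerns the byte sequence Emacs
uses internally), and both helper names are hypothetical.

```elisp
;; Hypothetical helpers, for illustration only.
(defun bug12925-round-trip (s)
  "Encode S to UTF-8 bytes and decode those bytes back into text."
  (decode-coding-string (encode-coding-string s 'utf-8) 'utf-8))

(defun bug12925-low-byte (char)
  "Return the low 8 bits of CHAR, which is what truncation keeps."
  (logand char #xFF))

;; Preserving the bytes loses nothing: the original text comes back.
(equal (bug12925-round-trip (string ?\u2019)) (string ?\u2019)) ; → t

;; Truncation is lossy: U+2019 and U+2119 both collapse to byte #x19.
(= (bug12925-low-byte ?\u2019) (bug12925-low-byte ?\u2119))     ; → t
```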






* bug#12925: 24.1; string-make-unibyte instead of string-as-unibyte
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-06-01 14:25 UTC
  To: Eli Zaretskii; +Cc: larsi, 12925, ethan.glasser.camp

>> > Why not make both methods do the same: insert the bytes of the
>> > multibyte text into the unibyte buffer?
>> AFAIK it's rather unusual to need to insert a text that's multibyte into
>> a buffer that's unibyte.
> Most probably, people don't know the text is multibyte.
> Or don't care.

If they don't know or don't care, then the best we can do is signal an
error to try and wake them up: they *should* know and they *should*
care, otherwise it's a bit like inserting in "any buffer you like,
I don't care".

>> And in those cases, the right behavior is not always the same
>> (sometimes it should convert using something like
>> locale-coding-system, sometimes it should preserve the actual
>> byte-sequence used internally, sometimes it should signal an error,
>> ...).
> What I mean is: if we think the current behavior is broken, then what
> I suggest is at least less broken (and sometimes might just be TRT).

I doubt it's less broken: sometimes it will be TRT, other times it will
be worse than what we have.

> At the very least what I suggest is reversible, whereas neither the
> current behavior nor what you suggest is.

My point is that we shouldn't even get into the position of having to
make such arbitrary choices: we should signal an error before we
get there.


        Stefan







* bug#12925: 24.1; string-make-unibyte instead of string-as-unibyte
  2021-06-01 14:25           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-06-01 15:26             ` Eli Zaretskii
  0 siblings, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2021-06-01 15:26 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: larsi, 12925, ethan.glasser.camp

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: larsi@gnus.org,  12925@debbugs.gnu.org,  ethan.glasser.camp@gmail.com
> Date: Tue, 01 Jun 2021 10:25:17 -0400
> 
> > What I mean is: if we think the current behavior is broken, then what
> > I suggest is at least less broken (and sometimes might just be TRT).
> 
> I doubt it's less broken: sometimes it will be TRT, other times it will
> be worse than what we have.
> 
> > At the very least what I suggest is reversible, whereas neither the
> > current behavior nor what you suggest is.
> 
> My point is that we shouldn't even get into the position of having to
> make such arbitrary choices: we should signal an error before we
> get there.

Well, then we still disagree.






* bug#12925: 24.1; string-make-unibyte instead of string-as-unibyte
  2021-06-01 11:56     ` Eli Zaretskii
  2021-06-01 13:45       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-06-02  5:07       ` Lars Ingebrigtsen
  2021-06-02 12:07         ` Eli Zaretskii
  1 sibling, 1 reply; 12+ messages in thread
From: Lars Ingebrigtsen @ 2021-06-02  5:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: monnier, 12925, ethan.glasser.camp

Eli Zaretskii <eliz@gnu.org> writes:

> Why not make both methods do the same: insert the bytes of the
> multibyte text into the unibyte buffer?

I think it's still common to have raw bytes in multibyte buffers.
Inserting data from these buffers into unibyte buffers works fine.
(That's the rationale for inserting the "lower byte" in these
situations.)

So I don't think we should change this to insert the multibyte text,
because that'd break stuff.

The question is what to do when inserting multibyte characters in
unibyte buffers, and I think that's always an error (i.e., it's never
what the person who wrote the code wanted to happen).  I think we should
start off by doing a demoted warning thing, and then segue into
signalling an error at a later date.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#12925: 24.1; string-make-unibyte instead of string-as-unibyte
  2021-06-02  5:07       ` Lars Ingebrigtsen
@ 2021-06-02 12:07         ` Eli Zaretskii
  2021-06-02 13:09           ` Lars Ingebrigtsen
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2021-06-02 12:07 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: monnier, 12925, ethan.glasser.camp

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: monnier@iro.umontreal.ca,  12925@debbugs.gnu.org,
>   ethan.glasser.camp@gmail.com
> Date: Wed, 02 Jun 2021 07:07:25 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Why not make both methods do the same: insert the bytes of the
> > multibyte text into the unibyte buffer?
> 
> I think it's still common to have raw bytes in multibyte buffers.
> Inserting data from these buffers into unibyte buffers works fine.
> (That's the rationale for inserting the "lower byte" in these
> situations.)
> 
> So I don't think we should change this to insert the multibyte text,
> because that'd break stuff.

And signaling an error won't break stuff?

> The question is what to do when inserting multibyte characters in
> unibyte buffers, and I think that's always an error (i.e., it's never
> what the person who wrote the code wanted to happen).

Now I'm confused: you have just explained above that it should
continue working.  What am I missing?

Please note that I wasn't talking about inserting raw bytes, whether
they come from unibyte or multibyte buffers, I was talking about
inserting multibyte text that represents human-readable characters.






* bug#12925: 24.1; string-make-unibyte instead of string-as-unibyte
  2021-06-02 12:07         ` Eli Zaretskii
@ 2021-06-02 13:09           ` Lars Ingebrigtsen
  2021-06-02 13:36             ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: Lars Ingebrigtsen @ 2021-06-02 13:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: monnier, 12925, ethan.glasser.camp

Eli Zaretskii <eliz@gnu.org> writes:

> Please note that I wasn't talking about inserting raw bytes, whether
> they come from unibyte or multibyte buffers, I was talking about
> inserting multibyte text that represents human-readable characters.

OK, then we're in violent agreement there.  I was simply pointing out
that we can't change insertion of multibyte text in the simple way you
seemed to be suggesting (i.e., just insert the bytes in the multibyte
string), because a multibyte raw character is represented by several
bytes (is it two or three? I forget).

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#12925: 24.1; string-make-unibyte instead of string-as-unibyte
  2021-06-02 13:09           ` Lars Ingebrigtsen
@ 2021-06-02 13:36             ` Eli Zaretskii
  0 siblings, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2021-06-02 13:36 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: monnier, 12925, ethan.glasser.camp

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: monnier@iro.umontreal.ca,  12925@debbugs.gnu.org,
>   ethan.glasser.camp@gmail.com
> Date: Wed, 02 Jun 2021 15:09:35 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Please note that I wasn't talking about inserting raw bytes, whether
> > they come from unibyte or multibyte buffers, I was talking about
> > inserting multibyte text that represents human-readable characters.
> 
> OK, then we're in violent agreement there.  I was simply pointing out
> that we can't change insertion of multibyte text in the simple way you
> seemed to be suggesting (i.e., just insert the bytes in the multibyte
> string

Yes, we need special handling of raw bytes, as usual.

> because a multibyte raw character is represented by several bytes
> (is it two or three? I forget)).

2 or 5.
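
A minimal sketch of why raw bytes need the special handling mentioned
above.  `demo-raw-byte-sizes' is a hypothetical name used only for
this illustration, and only the two-byte case is shown.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun demo-raw-byte-sizes (byte)
  "Return (CHAR-CODE MULTIBYTE-SIZE UNIBYTE-SIZE) for raw BYTE.
BYTE is assumed to be in the range #x80..#xFF."
  (let* ((uni (unibyte-string byte))
         ;; In a multibyte string the byte becomes a raw-byte character.
         (multi (string-to-multibyte uni)))
    (list (aref multi 0)
          ;; Size of that character's internal representation.
          (string-bytes multi)
          ;; Converting back yields the single original byte.
          (string-bytes (string-to-unibyte multi)))))

(demo-raw-byte-sizes #x80)  ; => (4194176 2 1)
```

The raw byte #x80 becomes character #x3FFF80 and occupies two bytes in
the multibyte string, so copying the internal bytes verbatim into a
unibyte buffer would turn one byte into two.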







end of thread (last message 2021-06-02 13:36 UTC)

Thread overview: 12+ messages
2012-11-18 17:45 bug#12925: 24.1; string-make-unibyte instead of string-as-unibyte Ethan Glasser-Camp
2012-11-19  2:27 ` Stefan Monnier
2021-06-01  7:02   ` Lars Ingebrigtsen
2021-06-01 11:56     ` Eli Zaretskii
2021-06-01 13:45       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-06-01 14:03         ` Eli Zaretskii
2021-06-01 14:25           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-06-01 15:26             ` Eli Zaretskii
2021-06-02  5:07       ` Lars Ingebrigtsen
2021-06-02 12:07         ` Eli Zaretskii
2021-06-02 13:09           ` Lars Ingebrigtsen
2021-06-02 13:36             ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
