From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Unibyte strings in Lisp data structures
Date: Tue, 13 Jul 2010 17:28:50 +0300
Message-ID: <83aapvsbfh.fsf@gnu.org>
To: Kenichi Handa
Cc: emacs-devel@gnu.org

Take a look at jka-compr-compression-info-list: each compression
method has a magic signature there, which is element 9 (counting from
zero) of the vector describing that compression method.  Now evaluate
this:

  (multibyte-string-p (aref (car jka-compr-compression-info-list) 9))
    => nil

These magic signatures are unibyte strings.  But why are they unibyte?
What code decides that they should be unibyte, when Emacs reads
jka-cmpr-hook.el?  Can we rely on the fact that these strings will
always be unibyte?

I bumped into this while debugging a problem in rmailmm.el: saving
attachments whose file names end in .gz produces a file that is
gzip-compressed twice.  I finally traced this to this fragment in
jka-compr.el:

  ;; If the contents to be written out
  ;; are properly compressed already,
  ;; don't try to compress them over again.
  (not (and magic
            (equal (if (stringp start)
                       (substring start 0 (min (length start)
                                               (length magic)))
                     (let* ((from (or start (point-min)))
                            (to (min (or end (point-max))
                                     (+ from (length magic)))))
                       (buffer-substring from to)))
                   magic))))

This test failed, because `magic' is a unibyte string, while
buffer-substring was returning a multibyte string.

The fix seems to be easy: modify rmail-mime-save to make the temporary
buffer it uses be a unibyte buffer.  But then I started to wonder how
come `magic' is a unibyte string, and can I rely on that?
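
The mismatch described above can be reproduced in isolation.  What
follows is a minimal sketch, not the jka-compr code:
`demo-magic-matches-p' is a hypothetical helper, and the string
"\037\213" (the two leading bytes of a gzip stream) is assumed here as
a stand-in for a magic signature.  The helper inserts the signature
into a temporary buffer that is either multibyte or unibyte, then
compares what buffer-substring returns against the original string.

```elisp
(defun demo-magic-matches-p (magic multibyte)
  "Return non-nil if MAGIC compares `equal' after a trip through a buffer.
MAGIC is a unibyte string.  A non-nil MULTIBYTE makes the temporary
buffer multibyte, otherwise it is unibyte.
Hypothetical helper for illustration only; not part of jka-compr."
  (with-temp-buffer
    (set-buffer-multibyte multibyte)
    ;; In a multibyte buffer, bytes above 127 are stored as raw-byte
    ;; (eight-bit) characters; in a unibyte buffer they stay plain bytes.
    (insert magic)
    ;; `buffer-substring' returns a string in the buffer's own
    ;; representation, and `equal' does not treat a unibyte string and
    ;; a multibyte string holding a raw byte as the same.
    (equal (buffer-substring (point-min) (point-max)) magic)))

(demo-magic-matches-p "\037\213" t)    ; multibyte buffer
(demo-magic-matches-p "\037\213" nil)  ; unibyte buffer
```

With a multibyte buffer the comparison yields nil, with a unibyte
buffer it yields t, which is why making the temporary buffer unibyte
lets the "already compressed" test succeed.
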
There is, of course, the alternative of converting both strings to unibyte and comparing the results. Still, I think it would be good to know how come these strings are unibyte to begin with.
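
The conversion-based alternative could look roughly like the sketch
below.  `demo-bytes-equal-p' is a hypothetical name, and using
string-to-unibyte for the conversion is an assumption of this sketch,
not something jka-compr is known to do.  Since string-to-unibyte
signals an error for a string containing a non-ASCII character that is
not a raw byte, the sketch treats that case as "no match", on the
grounds that such text cannot equal a byte signature.

```elisp
(defun demo-bytes-equal-p (a b)
  "Return non-nil if strings A and B hold the same sequence of bytes.
The comparison ignores whether each string is unibyte or multibyte.
Return nil if either string contains a character that is neither
ASCII nor a raw byte.
Hypothetical helper for illustration only; not part of jka-compr."
  (condition-case nil
      ;; `string-to-unibyte' returns a unibyte string as is and turns
      ;; raw-byte characters of a multibyte string back into bytes.
      (equal (string-to-unibyte a) (string-to-unibyte b))
    ;; A genuine multibyte character makes the conversion fail.
    (error nil)))

;; A unibyte signature now matches its multibyte counterpart.
(demo-bytes-equal-p "\037\213" (string-to-multibyte "\037\213"))
```

The trade-off is an extra string copy per comparison, in exchange for
not depending on the representation of either `magic' or the buffer
text.
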