From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jason Rumney
Newsgroups: gmane.emacs.devel
Subject: Crash in titdic-convert with DOS line ends
Date: Tue, 05 Feb 2008 01:31:34 +0000
Message-ID: <47A7BC76.4010703@gnu.org>
References: <47A3D99B.5020700@gnu.org> <47A59FD7.5010004@gmail.com> <47A64A34.5020308@gnu.org> <47A75237.8020801@gmail.com> <47A7A032.4050608@gnu.org>
In-Reply-To: <47A7A032.4050608@gnu.org>
To: Andreas Schwab
Cc: emacs-devel@gnu.org, Kenichi Handa
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
User-Agent: Thunderbird 2.0.0.9 (Windows/20071031)
Xref: news.gmane.org gmane.emacs.devel:88208

Jason Rumney wrote:
> Some of the Big5 encoded files cannot be processed if they have DOS
> line ends. I haven't yet figured out why.
> ETZY.tit, PY-b5.tit, TONEPY.tit and ZOZY.tit have this problem, others
> do not.

Now that I am debugging this, ETZY.tit does not crash Emacs, while
4Corner.tit does. It appears to be a problem with any DOS line ends in a
Big5 file that is inserted into a unibyte buffer, but some other
condition needs to be present to trigger the crash.

But the following shows that there is definitely a problem with DOS line
ends in unibyte buffers:

;; Evaluate the following 2 forms in *scratch*. The first converts a .tit
;; file to DOS line ends, the second reads it into a unibyte buffer as
;; raw-text in the same way that titdic-convert does.
(with-temp-buffer
  (let ((coding-system-for-read 'cn-big5)
        (coding-system-for-write 'cn-big5-dos))
    (insert-file-contents
     (expand-file-name "CXTERM-DIC/4Corner.tit"
                       (file-name-directory (locate-library "leim-list"))))
    (write-file "/tmp/test.tit")))

(set-buffer-multibyte nil)
(let ((coding-system-for-read 'raw-text))
  (insert-file-contents "/tmp/test.tit"))

;; If Emacs does not crash, note the ^M on the ends of some lines.

When Emacs crashes, it always happens in decode_eol (several levels deep
from insert-file-contents), on this line:

>   if (*p == '\r' && p[1] == '\n')

p appears to have overrun the buffer.

(gdb) print p
$35 = (unsigned char *) 0x2707000
(gdb) print pbeg
$39 = (unsigned char *) 0x26f9f30 "# HANZI input table for cxterm\n# Generated from ETZY.cit by cit2tit\n# To be used by cxterm, convert me to .cit format first\n# .cit version 1\nENCODE:\tBIG5\nMULTICHOICE:\tYES\nPROMPT:\t\244\244\244\345\277\351\244J\241i\255\312\244\321\252`\255\265\241j\n"...
(gdb) print pend
$40 = (unsigned char *) 0x27043bb "a\264\303\254\341\305`\272\372\255\276\262\360\346\262\311`\370\332\r\nvx83\t\272\336\300]\262\360\265_\337F\327E\336\307\353\335\r\nvx84\t\272D\263e\304\351\305\370\341\350\277d\306|\253a\306[\311c\366\355\366\360\336\363\367\353\371u\325\341\325V\330\371\361q\371\312\r\nvx93\t\272u\263O\303X\343l\276@\325\245\347R\r\nvx94\t\264\322\355k\r\nvxa\\040\t\245\312\254A\250\355\275\275\276h\251K\265\301\357~\321\353\323\354\363\274\320g\337\242\332\341\337\262\341A\342\336\346\352\357\317\340a\355\356\r\nvxa3\t\271\350\324l\r\nvxa4\t\261\276\250\366\273o\337h\326"...

Some of this looks suspicious, but I don't know enough to say for sure
if it is corrupt...
(gdb) print *coding
$41 = {
  id = 10,
  common_flags = 5376,
  mode = 2,
  spec = {
    iso_2022 = {
      flags = 106,
      current_invocation = {112, 51},
      current_designation = {34, 32, 34, 31248},
      single_shifting = 34,
      bol = 41
    },
    ccl = 0x6a,
    utf_16 = {
      bom = 106,
      endian = 112,
      surrogate = 51
    },
    emacs_mule_full_support = 106
  },
  max_charset_id = 0,
  safe_charsets = 0x170f4e4 "\303\277",
  src_multibyte = 0,
  dst_multibyte = 0,
  head_ascii = -1,
  produced = 42123,
  produced_char = 42123,
  consumed = 42123,
  consumed_char = 42123,
  errors = 0,
  error_positions = 0x22,
  result = CODING_RESULT_SUCCESS,
  src_pos = -42123,
  src_pos_byte = -42123,
  src_chars = 42123,
  src_bytes = 42123,
  src_object = 26925060,
  source = 0x26fa700 "---+----+----+----+----+----+----+----+\nCOMMENT | (SPACE BAR)", ' ' , "|\nCOMMENT |", ' ' , "\263\261\245\255", ' ' , "|\nCOMMENT + ", '-' ...,
  dst_pos = 1,
  dst_pos_byte = 1,
  dst_bytes = 2000,
  dst_object = 26925060,
  destination = 0x26f9f30 "# HANZI input table for cxterm\n# Generated from ETZY.cit by cit2tit\n# To be used by cxterm, convert me to .cit format first\n# .cit version 1\nENCODE:\tBIG5\nMULTICHOICE:\tYES\nPROMPT:\t\244\244\244\345\277\351\244J\241i\255\312\244\321\252`\255\265\241j\n"...,
  chars_at_source = 1,
  charbuf = 0x80ab40,
  charbuf_size = 16384,
  charbuf_used = 0,
  annotated = 0,
  carryover = "\352m\000\000\031]\000\000\226O\000\000\270}\000\000\204c\000\000\aW\000\000\226x\000\000\000\223\000\000\300`\000\000o\226\000\000\325\203\000\000\032\216\000\000\306h\000\000&\207\000\000\"\000\000\000)\000\000",
  carryover_bytes = 0,
  default_char = 32,
  detector = 0,
  decoder = 0x116d3ba ,
  encoder = 0x116d3f6
}