From: Dmitry Antipov
Newsgroups: gmane.emacs.devel
Subject: Huge file adventure (+patch)
Date: Mon, 07 Oct 2013 13:47:26 +0400
Message-ID: <5252832E.5060804@yandex.ru>
To: Emacs development discussions

I recently got 16GB of RAM installed in my laptop and, of course, tried to open an 8GB ASCII text file to see what happens (it's now just half of RAM, so why not?). After opening it, I did some editing, then waited for auto-save and... got a "Memory exhausted" message. A short investigation quickly showed that I need... 24GB of RAM to handle such a file :-(. Here is why:

1) Fdo_auto_save calls Fwrite_region for the whole file, which issues e_write (.., start=1, end=8G, ...).
2) CODING_REQUIRE_ENCODING decides that it's time to do some encoding, and encode_coding_object allocates an 8G destination buffer (from coding.c, starting at line 8335):

    else if (EQ (dst_object, Qt))
      {
        ptrdiff_t dst_bytes = max (1, coding->src_chars);
        coding->dst_object = Qnil;
        coding->destination = xmalloc (dst_bytes); /* HERE */
        coding->dst_bytes = dst_bytes;
        coding->dst_multibyte = 0;
      }

3) Finally, encode_coding_object tries to create an 8G Lisp string to hold the result (from coding.c, starting at line 8351):

    if (EQ (dst_object, Qt))
      {
        if (BUFFERP (coding->dst_object))
          coding->dst_object = Fbuffer_string ();
        else
          {
            coding->dst_object
              = make_unibyte_string ((char *) coding->destination,
                                     coding->produced); /* HERE */
            xfree (coding->destination);
          }
      }

So that's 8G for the buffer text + 8G for the encoding buffer + 8G for the result. Since `coding->destination' is freed immediately after the Lisp string is created, I need 16G for some period of time, with a short 24G peak.

And, of course, there is a patch to address the issues described above. Comments are very welcome, because I'm not too deep into the coding machinery (yet).
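The memory accounting above can be checked with a rough model. This is a sketch under stated assumptions, not Emacs code: the helpers peak_unchunked and peak_chunked are hypothetical, output is assumed to be one byte per input character (true for ASCII), and the buffer gap and allocator overhead are ignored. The chunked variant models the attached patch, where full chunks skip the Lisp string and only a final partial chunk still pays for both a destination buffer and a string.

```c
#include <assert.h>
#include <stdint.h>

/* Rough peak heap use, in bytes, while auto-saving a buffer holding
   TEXT bytes of ASCII.  Assumes one output byte per input character and
   ignores the buffer gap and allocator overhead.  Hypothetical helpers
   for illustration only; not part of Emacs.  */

/* Unpatched e_write: buffer text, plus the xmalloc'd destination, plus
   the unibyte Lisp string copied from it.  The destination is freed
   only after the string exists, so all three briefly coexist.  */
static uint64_t
peak_unchunked (uint64_t text)
{
  return 3 * text;
}

/* Chunked e_write: each full chunk of CHUNK characters is encoded into
   a raw destination buffer only (no Lisp string).  A final partial
   chunk of REM characters still takes the old path, costing a
   destination buffer plus a Lisp string, i.e. 2 * REM at peak.  */
static uint64_t
peak_chunked (uint64_t text, uint64_t chunk)
{
  assert (chunk > 0);
  uint64_t rem = text % chunk;
  uint64_t extra = text >= chunk ? chunk : 0;
  if (2 * rem > extra)
    extra = 2 * rem;
  return text + extra;
}
```

For an 8 GiB buffer, peak_unchunked (UINT64_C (8) << 30) yields 24 GiB, while peak_chunked with an 8 MiB chunk (E_WRITE_MAX = 8 * 1024 * 1024 in the patch) yields 8 GiB + 8 MiB, because 8 GiB is a whole multiple of 8 MiB. A buffer smaller than one chunk still costs three times its size, as before.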
Dmitry

Attachment: e_write_and_encode.patch

=== modified file 'src/coding.c'
--- src/coding.c	2013-08-26 05:20:59 +0000
+++ src/coding.c	2013-10-07 09:16:14 +0000
@@ -5761,6 +5761,7 @@
   coding->safe_charsets = SDATA (val);
   coding->default_char = XINT (CODING_ATTR_DEFAULT_CHAR (attrs));
   coding->carryover_bytes = 0;
+  coding->raw_destination = 0;
 
   coding_type = CODING_ATTR_TYPE (attrs);
   if (EQ (coding_type, Qundecided))
@@ -8354,10 +8355,18 @@
         coding->dst_object = Fbuffer_string ();
       else
         {
-          coding->dst_object
-            = make_unibyte_string ((char *) coding->destination,
-                                   coding->produced);
-          xfree (coding->destination);
+          /* This is used to avoid creating huge Lisp string.
+             NOTE: caller who set `raw_destination' is also
+             responsible to free `destination' buffer.  */
+          if (coding->raw_destination)
+            coding->dst_object = Qnil;
+          else
+            {
+              coding->dst_object
+                = make_unibyte_string ((char *) coding->destination,
+                                       coding->produced);
+              xfree (coding->destination);
+            }
         }
     }
 

=== modified file 'src/coding.h'
--- src/coding.h	2013-08-30 12:17:44 +0000
+++ src/coding.h	2013-10-07 08:21:30 +0000
@@ -512,6 +512,10 @@
      `charbuf', but at `src_object'.  */
   unsigned chars_at_source : 1;
 
+  /* Nonzero if the result of conversion is in `destination'
+     buffer rather than in `dst_object'.  */
+  unsigned raw_destination : 1;
+
   /* Set to 1 if charbuf contains an annotation.  */
   unsigned annotated : 1;
 

=== modified file 'src/fileio.c'
--- src/fileio.c	2013-09-11 05:03:23 +0000
+++ src/fileio.c	2013-10-07 09:17:51 +0000
@@ -5263,6 +5263,10 @@
   return 1;
 }
 
+/* Maximum number of characters that the next
+   function encodes per one loop iteration.  */
+
+enum { E_WRITE_MAX = 8 * 1024 * 1024 };
 
 /* Write text in the range START and END into descriptor DESC,
    encoding them with coding system CODING.  If STRING is nil, START
@@ -5289,9 +5293,16 @@
           coding->src_multibyte = SCHARS (string) < SBYTES (string);
           if (CODING_REQUIRE_ENCODING (coding))
             {
-              encode_coding_object (coding, string,
-                                    start, string_char_to_byte (string, start),
-                                    end, string_char_to_byte (string, end), Qt);
+              ptrdiff_t nchars = min (end - start, E_WRITE_MAX);
+
+              /* Avoid creating huge Lisp string in encode_coding_object.  */
+              if (nchars == E_WRITE_MAX)
+                coding->raw_destination = 1;
+
+              encode_coding_object
+                (coding, string, start, string_char_to_byte (string, start),
+                 start + nchars, string_char_to_byte (string, start + nchars),
+                 Qt);
             }
           else
             {
@@ -5308,8 +5319,15 @@
           coding->src_multibyte = (end - start) < (end_byte - start_byte);
           if (CODING_REQUIRE_ENCODING (coding))
             {
-              encode_coding_object (coding, Fcurrent_buffer (),
-                                    start, start_byte, end, end_byte, Qt);
+              ptrdiff_t nchars = min (end - start, E_WRITE_MAX);
+
+              /* Likewise.  */
+              if (nchars == E_WRITE_MAX)
+                coding->raw_destination = 1;
+
+              encode_coding_object
+                (coding, Fcurrent_buffer (), start, start_byte,
+                 start + nchars, CHAR_TO_BYTE (start + nchars), Qt);
             }
           else
             {
@@ -5330,11 +5348,19 @@
 
       if (coding->produced > 0)
         {
-          char *buf = (STRINGP (coding->dst_object)
-                       ? SSDATA (coding->dst_object)
-                       : (char *) BYTE_POS_ADDR (coding->dst_pos_byte));
+          char *buf = (coding->raw_destination ? (char *) coding->destination
+                       : (STRINGP (coding->dst_object)
+                          ? SSDATA (coding->dst_object)
+                          : (char *) BYTE_POS_ADDR (coding->dst_pos_byte)));
           coding->produced -= emacs_write_sig (desc, buf, coding->produced);
 
+          if (coding->raw_destination)
+            {
+              /* We're responsible to free this, see
+                 encode_coding_object to check why.  */
+              xfree (coding->destination);
+              coding->raw_destination = 0;
+            }
           if (coding->produced)
             return 0;
         }
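The idea behind the patch, encoding a bounded number of characters per loop iteration and letting the caller take ownership of the raw output buffer instead of wrapping it in a Lisp string, can be sketched as standalone C. This is a minimal sketch assuming a toy identity "encoding"; struct encoder, encode_chunk and write_chunked are hypothetical names, and fwrite on a FILE * stands in for emacs_write_sig on a descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the few struct coding_system fields the
   patch touches.  Not Emacs code.  */
struct encoder
{
  bool raw_destination;  /* If set, the caller owns and frees DESTINATION.  */
  char *destination;     /* Freshly allocated output buffer.  */
  char *result;          /* Separate copy, like the unibyte Lisp string.  */
  size_t produced;       /* Bytes of output.  */
  size_t consumed;       /* Input characters consumed.  */
};

/* Toy encoder: an identity copy of N bytes of SRC.  The output goes into
   a new buffer; unless raw_destination is set, it is then copied into
   RESULT and the buffer is freed, mimicking encode_coding_object with
   dst_object == Qt.  */
static void
encode_chunk (struct encoder *e, const char *src, size_t n)
{
  e->destination = malloc (n ? n : 1);
  if (!e->destination)
    abort ();
  memcpy (e->destination, src, n);
  e->produced = e->consumed = n;
  if (e->raw_destination)
    e->result = NULL;
  else
    {
      e->result = malloc (n ? n : 1);
      if (!e->result)
        abort ();
      memcpy (e->result, e->destination, n);
      free (e->destination);
      e->destination = NULL;
    }
}

/* Write LEN bytes of SRC to OUT, encoding at most MAX_CHUNK characters
   per iteration.  Full chunks use the raw buffer directly; only a final
   partial chunk goes through the extra copy.  Returns false if a write
   comes up short.  */
static bool
write_chunked (FILE *out, const char *src, size_t len, size_t max_chunk)
{
  struct encoder e = { 0 };
  size_t start = 0;

  assert (max_chunk > 0);  /* Otherwise the loop would never advance.  */
  while (start < len)
    {
      size_t nchars = len - start < max_chunk ? len - start : max_chunk;
      if (nchars == max_chunk)
        e.raw_destination = true;
      encode_chunk (&e, src + start, nchars);

      const char *buf = e.raw_destination ? e.destination : e.result;
      size_t written = fwrite (buf, 1, e.produced, out);

      /* Whoever set raw_destination frees destination and resets the
         flag; otherwise free the copy.  */
      if (e.raw_destination)
        {
          free (e.destination);
          e.destination = NULL;
          e.raw_destination = false;
        }
      else
        free (e.result);

      if (written != e.produced)
        return false;
      start += e.consumed;
    }
  return true;
}
```

For example, write_chunked (f, text, len, 8 * 1024 * 1024) on a stdio stream f mirrors E_WRITE_MAX. As in the patch, the code that sets raw_destination is also the code that frees destination and resets the flag, so the next chunk (or a final partial one, which takes the copy path) starts clean.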