From: Eric Abrahamsen <eric@ericabrahamsen.net>
Newsgroups: gmane.emacs.help
Subject: Re: search across linebreaks
Date: Mon, 18 Feb 2013 11:52:58 +0800
Message-ID: <87vc9q9t2d.fsf@ericabrahamsen.net>
References: <878v6nbd1i.fsf@ericabrahamsen.net>
	<D2FA74E3555F429E9E79990ED7891A16@us.oracle.com>
To: help-gnu-emacs@gnu.org

"Drew Adams" <drew.adams@oracle.com> writes:

>> I'm going to need to do a large scale search-and-replace on a 
>> series of text files, using a sort of dictionary or hash-table of 
>> search terms and their replacement. The text files are filled
>> to the usual fill column.  The search terms may be broken across
>> linebreaks, and I'm not sure of the best way to handle this.
>> If it was regular English words I could probably manage a
>> programmatic version of `isearch-toggle-word', but in
>> this case these are solid strings, and might be broken anywhere.
>> 
>> The two solutions I can think of are: 1) break up the characters
>> in the search string and insert "\n?" between each one to create
>> regexps to search on, and 2) unfill the whole file at the start
>> of the procedure and then refill it afterwards. Neither of these
>> seems like a great idea -- does anyone have any brighter ideas?
>
> What's not clear is whether any of the newline chars are significant.  From what
> you wrote I'm guessing no: they can all be ignored or just removed.  But in that
> case, filling would mean filling one big paragraph.
>
> Or perhaps consecutive newlines (\n\n) are significant, separating paragraphs?
> In that case, you could remove all newlines except one for each consecutive
> group (i.e., paragraph separation).
>
> Assuming no newlines are significant (or only one of consecutive ones is), the
> two solutions you propose sound reasonable to me.  Which of them to use might
> depend on size etc. - relative time to remove newlines and later refill vs the
> \n? regexp match time.
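Here is a minimal sketch of option 1, which puts an optional newline between every pair of characters. It is written in Python only so that it runs standalone. In Emacs, the same pattern could be built with `regexp-quote' and `mapconcat' and applied with `re-search-forward' and `replace-match'. The function names and the shape of the term table are hypothetical, and the terms are assumed to contain no newlines. All terms go into one alternation, longest first, so each stretch of text is replaced at most once:

```python
import re


def linebreak_tolerant(term):
    """Regexp for TERM that also matches it with a newline between any two chars."""
    return r"\n?".join(re.escape(ch) for ch in term)


def replace_terms(text, table):
    """Replace every key of TABLE (a hypothetical term -> replacement dict) in TEXT."""
    if not table:
        return text
    # Try longer terms first so "abc" wins over its prefix "ab".
    terms = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(linebreak_tolerant(t) for t in terms))

    def substitute(match):
        # Drop the newline(s) the match spanned to recover the dictionary key.
        return table[match.group(0).replace("\n", "")]

    return pattern.sub(substitute, text)
```

Where a match spanned a line break, the replacement joins the two lines, so the paragraph would still need refilling afterwards.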

Thanks to all! Sed is something I've considered learning, but given its
learning curve, and the time I've already put into elisp, (and the fact
that I'm not even "supposed" to be a programmer in the first place!)
I'll probably go with an in-emacs solution.

For the unfill solution, I was thinking of actually running through the
file with `fill-paragraph' and a giant `fill-column' value, rather than just
deleting newlines, but I'm hesitating. These are org-mode files, and
`fill-paragraph' ought not to wreck them, but still...
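For comparison, here is a rough sketch of the "just delete newlines" variant. It tries not to wreck Org files by keeping blank lines and by never gluing headings, list items, keyword/comment lines, table rows, or drawer lines onto their neighbours. The two patterns are hypothetical heuristics, not Org's actual grammar. `joiner' is an assumption about how the text was wrapped: use a space for space-separated words, or an empty string if lines were broken inside solid strings.

```python
import re

# Hypothetical Org heuristics, not an exhaustive parser.
# A line matching STARTS_NEW is never glued onto the previous line.
STARTS_NEW = re.compile(r"^\s*(?:\*+\s|[-+]\s|\d+[.)]\s|[#|:])")
# A line matching NO_CONTINUATION (heading, keyword, table, drawer) never
# absorbs the line after it.
NO_CONTINUATION = re.compile(r"^(?:\*+\s|\s*[#|:])")


def unfill(text, joiner=" "):
    """Join wrapped lines back together, keeping blank lines and Org structure."""
    out = []
    for line in text.split("\n"):
        stripped = line.strip()
        if not stripped:
            out.append("")  # blank line: paragraph separator, kept
        elif (not out or not out[-1]
              or NO_CONTINUATION.match(out[-1]) or STARTS_NEW.match(line)):
            out.append(line.rstrip())  # begins a new logical line
        else:
            out[-1] += joiner + stripped  # continuation: drop newline + indentation
    return "\n".join(out)
```

Refilling afterwards, e.g. with `fill-paragraph', would then restore the usual line lengths.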

This is a one-time bulk operation -- I'm translating a bunch of key
terms -- so the expense of the operation isn't that big a deal. The
consecutive newline question is a good one: definitely only one in a
row, but then there's potential indentation whitespace on the left...
I think I'll give this one a shot for now.
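The indentation issue can be folded into the `\n?' idea by letting each optional break also swallow the spaces and tabs that start the next line. This is a minimal sketch, again in Python for a standalone example. `BREAK', `indent_tolerant_regex' and `replace_term' are hypothetical names, and the terms are assumed to be solid strings with no whitespace of their own:

```python
import re

# Between any two characters, tolerate one line break plus the
# indentation (spaces/tabs) that may start the next line.
BREAK = r"(?:\n[ \t]*)?"


def indent_tolerant_regex(term):
    """Compile a regexp matching TERM even if wrapped onto an indented line."""
    return re.compile(BREAK.join(re.escape(ch) for ch in term))


def replace_term(text, term, replacement):
    """Replace TERM (assumed whitespace-free) with REPLACEMENT everywhere in TEXT."""
    # A function replacement stops re.sub from interpreting backslashes in it.
    return indent_tolerant_regex(term).sub(lambda _m: replacement, text)
```

Because the break and its indentation are consumed by the match, the replacement ends up on the first line, and the following text continues on the same line.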

Thanks for the food for thought,

E