unofficial mirror of help-gnu-emacs@gnu.org
* bulk replacement on region, buffer, file?
From: Tom Roche @ 2015-12-10  2:21 UTC (permalink / raw)
  To: help-gnu-emacs


I would appreciate pointers to code that enables "bulk replacement" of numerous string tuples ({to-replace, replace-with}) in a single call. What I mean, why I ask:

I frequently scrape blocks of text from PDFs into Emacs text buffers. After I do so, I usually want to replace lots of strings in the buffer. E.g. (using '|' to delimit the strings),

|CO 2| -> |CO2|
|- | -> ||
|“| -> |"|
|”| -> |"|
|[weird unicodes used for bulleting]| -> |*|

which I do manually by calling `M-x replace-string` or similar interactive or regexp function. I'd prefer instead to call something that

1. could be called on a region (if selected) or buffer (if not)

2. could read from a user-editable property file of replacement tuples (like those above), similar to `abbrev_defs` but without some constraints of the latter that annoy in this use case. E.g. (unless I'm missing something), I cannot use `abbrev` to replace the space-delimited 'CO 2' with 'CO2'.

3. would, for every {to-replace, replace-with} tuple in the file,

* if `to-replace` found, replace every instance with `replace-with`
* if `to-replace` not found, goto next tuple

Is there elisp to do this? Alternatively, pointers to non-elisp (that I could invoke on a buffer's file and then `revert-buffer`) would also be appreciated. (And, yes, I know this sounds easy to write, but I have other priorities at present and no wish to reinvent any well-working wheels.)

Apologies if this is a FAQ, but a brief websearch found nothing that looked useful.

TIA, Tom Roche <Tom_Roche@pobox.com>




* Re: bulk replacement on region, buffer, file?
From: Pascal J. Bourguignon @ 2015-12-10  3:13 UTC (permalink / raw)
  To: help-gnu-emacs

Tom Roche <Tom_Roche@pobox.com> writes:

> I would appreciate pointers to code that enables "bulk replacement" of
> numerous string tuples ({to-replace, replace-with}) in a single
> call. What I mean, why I ask:
>
> I frequently scrape blocks of text from PDFs into Emacs text
> buffers. After I do so, I usually want to replace lots of strings in
> the buffer. E.g. (using '|' to delimit the strings),
>
> |CO 2| -> |CO2|
> |- | -> ||
> |“| -> |"|
> |”| -> |"|
> |[weird unicodes used for bulleting]| -> |*|
>
> which I do manually by calling `M-x replace-string` or similar
> interactive or regexp function. I'd prefer instead to call something
> that
>
> 1. could be called on a region (if selected) or buffer (if not)

You can use functions that are not designed to work on a region by
restricting them to a narrowed region with narrow-to-region. (This is
why it is important to always use point-min and point-max, and not e.g. 1
and (1+ (buffer-size)), because point-min and point-max take the
narrowing into account.) Wrap the call in save-restriction so that the
previous narrowing is restored afterwards:

   (save-excursion
     (save-restriction
       (narrow-to-region start end)
       ...))
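
A minimal sketch of the narrowing idiom described above, using only
built-in functions. The helper name my-count-chars-in-range is
hypothetical and exists only to show that point-min and point-max
follow the narrowing, and that save-restriction undoes it on exit.

```elisp
(defun my-count-chars-in-range (start end)
  "Return the number of characters between START and END.
Hypothetical helper, for illustration only."
  (save-restriction
    (narrow-to-region start end)
    ;; Inside the narrowing, point-min and point-max are the range bounds.
    (- (point-max) (point-min))))
```

Once the function returns, the buffer is as wide (or as narrow) as it
was before the call.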



> 2. could read from a user-editable property file of replacement tuples
> (like those above), similar to `abbrev_defs` but without some
> constraints of the latter that annoy in this usecase. E.g. (unless I'm
> missing something), I cannot use `abbrev` to replace the
> space-delimited 'CO 2' with 'CO2'.

You can read lisp sexps from files with:

  (with-file "~/.your-replacements.sexp"
     (goto-char (point-min)) ; in case the file is already open.
     (read (current-buffer)))
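
If you would rather not depend on with-file, roughly the same thing can
be sketched with built-ins only. This assumes the file holds a single
list of (FROM . TO) pairs as its first sexp; the name
my-read-replacements is hypothetical.

```elisp
(defun my-read-replacements (file)
  "Return the first Lisp object stored in FILE.
Hypothetical helper; assumes FILE contains one readable sexp."
  (with-temp-buffer
    ;; Read into a scratch buffer so an already-open buffer is untouched.
    (insert-file-contents file)
    (goto-char (point-min))
    (read (current-buffer))))
```

A file containing (("CO 2" . "CO2") ("- " . "")) would come back as
that same alist.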

> 3. would, for every {to-replace, replace-with} tuple in the file,
>
> * if `to-replace` found, replace every instance with `replace-with`
> * if `to-replace` not found, goto next tuple
>
> Is there elisp to do this? 

Yes.

I use:

    (progn (goto-char (point-min))
           (replace-multiple-strings
            '(("CO 2" . "CO2")
              ("- " . "")
              ("“" . "\"")
              ("”" . "\"")
              ("[weird unicodes used for bulleting]" . "*"))))
  

So wrapping it all together (start and end must be bound by the caller,
e.g. as function parameters):

    (save-excursion
     (save-restriction
      (narrow-to-region start end)
      (goto-char (point-min))
      (replace-multiple-strings
       (with-file "~/.your-replacements.sexp"
         (goto-char (point-min)) ; in case the file is already open.
         (read (current-buffer))))))


with-file and replace-multiple-strings are found in pjb-emacs.el
https://github.com/informatimago/emacs/blob/master/pjb-emacs.el
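
For readers who do not want to load pjb-emacs.el, here is a rough
stand-in written with built-ins only. It is not the pjb-emacs.el
implementation and may behave differently; the name my-replace-pairs
is hypothetical, and it assumes an alist of literal (FROM . TO) strings.

```elisp
(defun my-replace-pairs (pairs)
  "Replace each (FROM . TO) in PAIRS literally in the accessible buffer text.
Hypothetical stand-in, not the pjb-emacs.el function."
  (save-excursion
    (let ((case-fold-search nil))       ; match FROM case-sensitively
      (dolist (pair pairs)
        (let ((from (car pair))
              (to (cdr pair)))
          ;; An empty FROM would match forever without advancing; skip it.
          (unless (string= from "")
            (goto-char (point-min))
            ;; search-forward treats FROM as plain text, not a regexp.
            (while (search-forward from nil t)
              ;; FIXEDCASE and LITERAL are t: insert TO exactly as given.
              (replace-match to t t))))))))
```

Pairs whose FROM string is not found are simply skipped, and the next
pair is tried.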


-- 
__Pascal Bourguignon__                 http://www.informatimago.com/
“The factory of the future will have only two employees, a man and a
dog. The man will be there to feed the dog. The dog will be there to
keep the man from touching the equipment.” -- Carl Bass CEO Autodesk



* Re: bulk replacement on region, buffer, file?
From: Emanuel Berg @ 2015-12-10  3:21 UTC (permalink / raw)
  To: help-gnu-emacs

Tom Roche <Tom_Roche@pobox.com> writes:

> I would appreciate pointers to code that enables
> "bulk replacement" of numerous string tuples
> ({to-replace, replace-with}) in a single call.
> What I mean, why I ask:
>
> I frequently scrape blocks of text from PDFs into
> Emacs text buffers. After I do so, I usually want to
> replace lots of strings in the buffer. E.g. (using
> '|' to delimit the strings),
>
> |CO 2| -> |CO2|
> |- | -> ||
> |“| -> |"|
> |”| -> |"|
> |[weird unicodes used for bulleting]| -> |*|

I hear you - everything is fair in the struggle against
those goofy chars! Down with unicode!

(Except: putting them as a quote when they aren't!)

Aaanyway...

Probably best way is to use set functions - another
good way tho is recursion. And I'm not just saying
that...

(defun replace-strings (tuple-list)
  "Replace each (FROM TO) pair in TUPLE-LIST throughout the buffer."
  (when tuple-list
    (let* ((tuple (car  tuple-list))
           (rest  (cdr  tuple-list))
           (from  (car  tuple))
           (to    (cadr tuple)))
      (goto-char (point-min))
      ;; search-forward matches FROM literally; the final t is NOERROR
      (while (search-forward from nil t)
        ;; FIXEDCASE and LITERAL are t: insert TO verbatim
        (replace-match to t t))
      (replace-strings rest))))

;; Eval this to fix the below typos:

(replace-strings '(("Robb Hall"     "Rob Hall")
                   ("Scott Ficsher" "Scott Fischer") ))

;; Robb Hall
;;
;; Scott Ficsher
;;
;; Robb Hall
;;
;; Scott Ficsher

-- 
underground experts united
http://user.it.uu.se/~embe8573





* Re: bulk replacement on region, buffer, file?
From: Bob Proulx @ 2015-12-10 16:40 UTC (permalink / raw)
  To: help-gnu-emacs

Tom Roche wrote:
> I would appreciate pointers to code that enables "bulk replacement"
> of numerous string tuples ({to-replace, replace-with}) in a single
> call. What I mean, why I ask:

To handle the UTF-8 translations I like 'iconv'.  It handles many
different types of transliterations.

  $ echo '“foo”' | iconv -f UTF-8 -t ASCII//TRANSLIT
  "foo"

If it were me, I would use iconv to transliterate characters in a
first pass and then perform the other replacements you want in a
second pass.

Bob




* Re: bulk replacement on region, buffer, file?
From: Tom Roche @ 2015-12-15  4:16 UTC (permalink / raw)
  To: help-gnu-emacs


summary: elisp newbie needs help fixing code @ https://bitbucket.org/tlroche/elisp_bulk_replacement

details:

Apologies for the delay in replying:

Tom Roche[1]
>>> I would appreciate pointers to code that enables "bulk replacement" of numerous string tuples ({to-replace, replace-with}) in a single call[, such that it]

>>> 1. could be called on a region (if selected) or buffer (if not)

>>> 2. could read from a user-editable property file of replacement tuples [...]

>>> 3. would, for every {to-replace, replace-with} tuple in the file,
>>> * if `to-replace` found, replace every instance with `replace-with`
>>> * if `to-replace` not found, goto next tuple

Pascal J. Bourguignon[2]
>> wrapping all together:

>>   (save-excursion
>>    (narrow-to-region start end)
>>    (goto-char (point-min))
>>    (replace-multiple-strings
>>     (with-file "~/.your-replacements.sexp"
>>       (goto-char (point-min)) ; in case the file is already open.
>>       (read (current-buffer)))))

>> with-file and replace-multiple-strings are found in pjb-emacs.el[6]

I've got 3 buffers open (among many others :-), with

1. one buffer on file[3] containing some {to-replace, replace-with} tuples as sexp's, open locally @ filepath=`$HOME/.emacs.d/tlr_bulk_replacements.sexp`

2. another buffer containing text to be bulk-replaced (interspersed with other text). A sample from that buffer ("suitable for testing") is @ [4]

3. yet another buffer[5] containing

* the relevant bits of pjb-emacs.el[6]
* the path to the sexp's file as `BULK-REPLACE-TUPLES-FILEPATH`
* my attempt to transcribe the desired 'wrapping all together' function

However, when I run `M-x bulk-replace-current-buffer-with-tuples-from-file` (defined in the code) in the sample-text buffer[4], I get the error

*Messages*
> save-excursion: Symbol's value as variable is void: start

So how to make the code[5] set `start` and `end` appropriately, such that

* if function is called with a region set, `start`==region start && `end`==region end
* if function is called without a region set, `start`==buffer start && `end`==buffer end

? Your assistance is appreciated, Tom Roche <Tom_Roche@pobox.com>
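
One possible way to get `start` and `end` bound, sketched without access to the code at [5]: make them parameters of the command and let the `interactive` form supply them. This assumes the error arises because `start` and `end` are never bound as variables. The names `my-bulk-replace` and `my-bulk-replace-pairs` are hypothetical, and the inline pair list stands in for whatever is read from the sexp file.

```elisp
(defvar my-bulk-replace-pairs '(("CO 2" . "CO2") ("- " . ""))
  "Hypothetical list of literal (FROM . TO) replacements.")

(defun my-bulk-replace (start end)
  "Apply `my-bulk-replace-pairs' between START and END.
Interactively, use the active region if any, else the whole buffer."
  (interactive
   (if (use-region-p)
       (list (region-beginning) (region-end))
     (list (point-min) (point-max))))
  (save-excursion
    (save-restriction
      (narrow-to-region start end)
      (let ((case-fold-search nil))     ; match FROM case-sensitively
        (dolist (pair my-bulk-replace-pairs)
          (goto-char (point-min))
          ;; Literal search and literal, case-preserving replacement.
          (while (search-forward (car pair) nil t)
            (replace-match (cdr pair) t t)))))))
```

Called via `M-x my-bulk-replace`, this touches only the region when one is active; from Lisp, pass the bounds explicitly, e.g. `(my-bulk-replace (point-min) (point-max))`.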

[1]: http://lists.gnu.org/archive/html/help-gnu-emacs/2015-12/msg00077.html
[2]: http://lists.gnu.org/archive/html/help-gnu-emacs/2015-12/msg00079.html
[3]: https://bitbucket.org/tlroche/elisp_bulk_replacement/src/HEAD/sample_replacements.sexp
[4]: https://bitbucket.org/tlroche/elisp_bulk_replacement/src/HEAD/sample_input.txt
[5]: https://bitbucket.org/tlroche/elisp_bulk_replacement/src/HEAD/test_code.el
[6]: https://github.com/informatimago/emacs/blob/master/pjb-emacs.el



