search and replace codepoints

unofficial mirror of help-gnu-emacs@gnu.org

* search and replace codepoints
From: Haines Brown @ 2014-10-24 18:50 UTC (permalink / raw)
  To: help-gnu-emacs

I have frequently pasted hyphenated material into a large .bib file,
and the hyphen sometimes turns out to be a codepoint that LaTeX can't
compile.

The typed or pasted hyphen that does not cause a problem looks like
this:

  character: - (displayed as -) (codepoint 45, #o55, #x2d)
  preferred charset: ascii (ASCII (ISO646 IRV))
  code point in charset: 0x2D
  category: .:Base, a:ASCII, l:Latin, r:Roman
  buffer code: #x2D
  file code: #x2D (encoded by coding system utf-8-unix)

The pasted hyphen that LaTeX can't compile looks like this:

  character:  (displayed as ) (codepoint 173, #o255, #xad)
  preferred charset: unicode (Unicode (ISO10646))
  code point in charset: 0xAD
  category: b:Arabic, h:Korean, j:Japanese, l:Latin
  buffer code: #xC2 #xAD
  file code: #xC2 #xAD (encoded by coding system utf-8-unix)

How do I do a search/replace to replace instances of the latter with the
former? What values should I use? Why is the unicode character not
identified with the usual U+...?

Haines Brown

* Re: search and replace codepoints
From: Eli Zaretskii @ 2014-10-24 19:33 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Haines Brown <haines@histomat.net>
> Date: Fri, 24 Oct 2014 14:50:23 -0400
> 
> The pasted hyphen that LaTeX can't compile looks like this:
> 
>   character:  (displayed as ) (codepoint 173, #o255, #xad)
>   preferred charset: unicode (Unicode (ISO10646))
>   code point in charset: 0xAD
>   category: b:Arabic, h:Korean, j:Japanese, l:Latin
>   buffer code: #xC2 #xAD
>   file code: #xC2 #xAD (encoded by coding system utf-8-unix)
> 
> How do I do a search/replace to replace instances of the latter with the
> former?

Just replace it with M-%, as you would any other character.

> What values should I use?

The one shown above, of course.

> Why is the unicode character not identified with the usual U+...?

The codepoint #xad _is_ the Unicode codepoint, U+00AD.
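A minimal sketch for checking this from inside Emacs (evaluate it with M-: or in *scratch*). It confirms that #xad and the escape ?\u00ad denote the same character, prints that character in U+ notation, and shows that the two-byte "buffer code" #xC2 #xAD is simply the UTF-8 encoding of that one codepoint. The variable name `shy` is only illustrative.

```elisp
;; #xad, ?\u00ad and U+00AD all denote SOFT HYPHEN; the "buffer code"
;; #xC2 #xAD is its UTF-8 byte sequence.
(let ((shy #xad))                        ; shy: illustrative name only
  (list (eq shy ?\u00ad)                 ; same character => t
        (format "U+%04X" shy)            ; => "U+00AD"
        ;; Encode to UTF-8, then turn the unibyte string into a byte list.
        (append (encode-coding-string (string shy) 'utf-8) nil)))
;; => (t "U+00AD" (194 173)), where 194 = #xC2 and 173 = #xAD
```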

* Re: search and replace codepoints
From: Haines Brown @ 2014-10-24 20:15 UTC (permalink / raw)
  To: help-gnu-emacs

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Haines Brown <haines@histomat.net>
>> Date: Fri, 24 Oct 2014 14:50:23 -0400
>> How do I do a search/replace to replace instances of the latter with the
>> former?
>
> Just replace it with M-%, as you would any other character.
>
>> What values should I use?
>
> The one shown above, of course.

Many representations were shown above (45, #o55, #x2d, 0x2d for just one
character). Because none of them worked, I asked the question.

* Re: search and replace codepoints
From: Álvar Ibeas @ 2014-10-24 21:11 UTC (permalink / raw)
  To: help-gnu-emacs

Hello,

>>> How do I do a search/replace to replace instances of the latter with the
>>> former?

You may copy the  character into the kill ring to yank it when prompted
for the first argument of replace-string. You can also evaluate the
following:

(replace-string "\u00ad" "-")
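A minimal sketch, not from the thread, of doing the same replacement from Lisp over the whole buffer. Used interactively, replace-string acts only from point to the end of the buffer (or on the active region), and its docstring recommends a search-forward/replace-match loop for Lisp code instead; the command below wraps such a loop. The name `my-replace-soft-hyphens` is hypothetical.

```elisp
;; Hypothetical command: replace every U+00AD SOFT HYPHEN in the current
;; buffer with an ASCII hyphen-minus, regardless of where point is.
(defun my-replace-soft-hyphens ()
  "Replace every U+00AD SOFT HYPHEN in the buffer with \"-\"."
  (interactive)
  (save-excursion                        ; restore point afterwards
    (goto-char (point-min))              ; start at the top, not at point
    (let ((count 0))
      ;; Literal (non-regexp) search for the single character #xad.
      (while (search-forward (string #xad) nil t)
        (replace-match "-" t t)          ; fixed case, literal replacement
        (setq count (1+ count)))
      (message "Replaced %d soft hyphen(s)" count))))
```

Usage: M-x my-replace-soft-hyphens in the .bib buffer. Replacing with "-" follows the request in this thread; since a soft hyphen only marks a possible break, replacing it with "" may be the better fix where no visible hyphen was intended.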

The purpose of the soft hyphen is to mark a possible hyphenation break.

* Re: search and replace codepoints
From: Eli Zaretskii @ 2014-10-25  6:27 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Haines Brown <haines@histomat.net>
> Date: Fri, 24 Oct 2014 16:15:15 -0400
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> From: Haines Brown <haines@histomat.net>
> >> Date: Fri, 24 Oct 2014 14:50:23 -0400
> >> How do I do a search/replace to replace instances of the latter with the
> >> former?
> >
> > Just replace it with M-%, as you would any other character.
> >
> >> What values should I use?
> >
> > The one shown above, of course.
> 
> Many representations were shown above (45, #o55, #x2d, 0x2d for just one
> character).

No, I meant only the values I quoted in my message:

>   character:  (displayed as ) (codepoint 173, #o255, #xad)

They are all the same value, decimal 173, shown in decimal, in octal,
and in hex.
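A one-form sketch of that equivalence in Emacs Lisp read syntax. Note that at an M-% prompt a string such as "#xad" or "173" is searched for literally, as those characters; the prompt needs the soft hyphen character itself.

```elisp
;; "codepoint 173, #o255, #xad" names one integer in three bases:
;; decimal, octal (#o prefix) and hexadecimal (#x prefix).
(list (= 173 #o255 #xad)                 ; => t
      (format "%d %o %x" 173 173 173))   ; => "173 255 ad"
```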

> Because none of them worked, I asked the question. 

How did you try using them in a replace command?  What I had in mind
was to use "C-x 8 RET", which allows you to type the codepoint in hex.