* decode-coding-string question
From: Ted Zlatanov @ 2008-08-14 21:02 UTC
To: help-gnu-emacs

This should decode to нуль but doesn't (I get the same string instead):

(decode-coding-string "íîëü" 'cp1251)

Am I missing something obvious?  Do I need to encode the string to
something else?

Ted

* Re: decode-coding-string question
From: Dmitry Dzhus @ 2008-08-14 22:19 UTC
To: help-gnu-emacs

Ted Zlatanov wrote:

> This should decode to нуль but doesn't (I get the same string instead):
>
> (decode-coding-string "íîëü" 'cp1251)
>
> Am I missing something obvious?  Do I need to encode the string to
> something else?

0. «íóëü», not «íîëü»

1. (decode-coding-string (string-make-unibyte "íóëü") 'cp1251)

--
Happy Hacking.

http://sphinx.net.ru
む

* Re: decode-coding-string question
From: Eli Zaretskii @ 2008-08-15 7:37 UTC
To: help-gnu-emacs

> From: Dmitry Dzhus <dima@sphinx.net.ru>
> Date: Fri, 15 Aug 2008 02:19:11 +0400
>
> Ted Zlatanov wrote:
>
> > This should decode to нуль but doesn't (I get the same string instead):
> >
> > (decode-coding-string "íîëü" 'cp1251)
> >
> > Am I missing something obvious?  Do I need to encode the string to
> > something else?
>
> 0. «íóëü», not «íîëü»

That's not important, the original problem remains, even with a
different spelling of the word.

> 1. (decode-coding-string (string-make-unibyte "íóëü") 'cp1251)

Yes, that's it: decode-coding-string works reliably only on unibyte
strings.  In multibyte strings, random byte values will be interpreted
according to rules that are appropriate for Emacs internal
representation of text in buffers and strings, not according to what
we humans expect.

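Editor's sketch (not part of the original thread): the unibyte requirement described above can be seen by spelling out the CP1251 bytes directly. Writing the bytes as octal escapes is an assumption about how to get a unibyte literal that does not depend on the encoding of the source file, and the function name `my-cp1251-demo` is hypothetical.

```elisp
;; The four CP1251 bytes for "нуль" are 0xED 0xF3 0xEB 0xFC.  A string
;; literal made only of ASCII and octal escapes below 256 is read as a
;; unibyte string, which is the form decode-coding-string expects.
(defun my-cp1251-demo ()
  "Return (UNIBYTE-P DECODED) for the raw CP1251 bytes of a test word."
  (let ((raw "\355\363\353\374"))
    (list (not (multibyte-string-p raw))        ; t when RAW is unibyte
          (decode-coding-string raw 'cp1251)))) ; the decoded Cyrillic text
```

Evaluating `(my-cp1251-demo)` should show that the input is unibyte and that the second element is Cyrillic text rather than accented Latin letters.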
* Re: decode-coding-string question
From: Ted Zlatanov @ 2008-08-15 15:54 UTC
To: help-gnu-emacs

On Fri, 15 Aug 2008 02:19:11 +0400 Dmitry Dzhus <dima@sphinx.net.ru> wrote:

DD> (decode-coding-string (string-make-unibyte "íóëü") 'cp1251)

Thanks.  There should probably be a specific function for this:

(decode-coding-string-as-unibyte "íóëü" 'cp1251)

ditto for decode-coding-region.  Should I add it or is that not
generally useful?  A flag is not as good because both functions have
several flags already.

Ted

* Re: decode-coding-string question
From: Eli Zaretskii @ 2008-08-15 17:04 UTC
To: help-gnu-emacs

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Fri, 15 Aug 2008 10:54:20 -0500
>
> There should probably be a specific function for this:
>
> (decode-coding-string-as-unibyte "íóëü" 'cp1251)
>
> ditto for decode-coding-region.  Should I add it or is that not
> generally useful?

Personally, I think it's not useful, since decode-coding-region and
decode-coding-string are used only on unibyte text.  But feel free to
raise this on emacs-devel.

* Re: decode-coding-string question
From: Ted Zlatanov @ 2008-08-18 13:58 UTC
To: help-gnu-emacs

On Fri, 15 Aug 2008 20:04:42 +0300 Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Ted Zlatanov <tzz@lifelogs.com>
>> Date: Fri, 15 Aug 2008 10:54:20 -0500
>>
>> There should probably be a specific function for this:
>>
>> (decode-coding-string-as-unibyte "íóëü" 'cp1251)
>>
>> ditto for decode-coding-region.  Should I add it or is that not
>> generally useful?

EZ> Personally, I think it's not useful, since decode-coding-region and
EZ> decode-coding-string are used only on unibyte text.  But feel free to
EZ> raise this on emacs-devel.

How would you recommend decoding text from particular encodings?  Given
text like the one shown above in a buffer, only decode-coding-region
seems to DTRT, and it's not interactive.

Context: I have a file full of CP1251 data and don't want to use Perl's
Encode module because I'm stubborn and think Emacs should handle it :)

On Fri, 15 Aug 2008 20:06:59 +0400 Dmitry Dzhus <dima@sphinx.net.ru> wrote:

DD> That was nitpicking somewhat irrelevant to unibyte-multibyte problem:
DD> Ted expected to get «нуль», and it's «íóëü», though the string he
DD> originally provided, «íîëü», decodes to «ноль»; however, both «нуль»
DD> and «ноль» mean «zero».

Yes, I was translating from Russian and knew the text said "zero" but
didn't remember the correct spelling.  Thanks for checking.

Ted

* Re: decode-coding-string question
From: Nikolaj Schumacher @ 2008-08-18 16:09 UTC
To: Ted Zlatanov; +Cc: help-gnu-emacs

Ted Zlatanov <tzz@lifelogs.com> wrote:

> Context: I have a file full of CP1251 data and don't want to use Perl's
> Encode module because I'm stubborn and think Emacs should handle it :)

Maybe `file-coding-system-alist' or `coding-system-for-read' are of help?

regards,
Nikolaj Schumacher

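Editor's sketch (not part of the original thread): one way `coding-system-for-read' might be used from Lisp is to bind it around `insert-file-contents', so that a single file is decoded as CP1251 regardless of what Emacs would detect. The helper name `my-read-file-as' and its arguments are hypothetical.

```elisp
(defun my-read-file-as (file coding)
  "Return the contents of FILE as a string, decoded with CODING.
Binding `coding-system-for-read' overrides automatic detection for
this one read only."
  (with-temp-buffer
    (let ((coding-system-for-read coding))
      (insert-file-contents file))
    (buffer-string)))
```

A call such as `(my-read-file-as "myfile.txt" 'cp1251)` would then return the decoded text; the file name here is only a placeholder.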
* Re: decode-coding-string question
From: David Golden @ 2008-08-18 17:45 UTC
To: help-gnu-emacs

Ted Zlatanov wrote:

> Context: I have a file full of CP1251 data and don't want to use
> Perl's Encode module because I'm stubborn and think Emacs should
> handle it :)

Just in case: If you have a file full of cp1251, and you know it's
cp1251, it's usually best to just open it as cp1251 in the first
place!

C-x RET c cp1251 RET C-x C-f myfile.txt RET

It's typically only if you've got a file full of fragments in
different encodings (horrible mail spool formats and the like) that
you want to decode and reencode particular subregions of whole files.

If you've already opened a file and its encoding is misdetected, you
can also hit

C-x RET r cp1251 RET

to "revert" the buffer to the file reopened in the specified encoding.

* Re: decode-coding-string question
From: Eli Zaretskii @ 2008-08-18 19:11 UTC
To: help-gnu-emacs

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Mon, 18 Aug 2008 08:58:55 -0500
>
> How would you recommend decoding text from particular encodings?  Given
> text like the one shown above in a buffer, only decode-coding-region
> seems to DTRT, and it's not interactive.

If you mean interactively, i.e. you visited a buffer and then
discovered that it was decoded incorrectly, and the actual encoding is
different, then "C-x RET c cp1251 RET M-x revert-buffer RET" should do
what you want, I think.

> Context: I have a file full of CP1251 data and don't want to use Perl's
> Encode module because I'm stubborn and think Emacs should handle it :)

What about the rest of the file?  Is it encoded in some other encoding?
If not, then the above recipe should do.  If it doesn't, please tell
more details.

> On Fri, 15 Aug 2008 20:06:59 +0400 Dmitry Dzhus <dima@sphinx.net.ru> wrote:
>
> DD> That was nitpicking somewhat irrelevant to unibyte-multibyte problem:
> DD> Ted expected to get «нуль», and it's «íóëü», though the string he
> DD> originally provided, «íîëü», decodes to «ноль»; however, both «нуль»
> DD> and «ноль» mean «zero».
>
> Yes, I was translating from Russian and knew the text said "zero" but
> didn't remember the correct spelling.

AFAIK, both spellings are right.

* Re: decode-coding-string question
From: Kevin Rodgers @ 2008-08-19 8:34 UTC
To: help-gnu-emacs

Eli Zaretskii wrote:

>> From: Ted Zlatanov <tzz@lifelogs.com>
>> Date: Mon, 18 Aug 2008 08:58:55 -0500
>>
>> How would you recommend decoding text from particular encodings?  Given
>> text like the one shown above in a buffer, only decode-coding-region
>> seems to DTRT, and it's not interactive.
>
> If you mean interactively, i.e. you visited a buffer and then
> discovered that it was decoded incorrectly, and the actual encoding is
> different, then "C-x RET c cp1251 RET M-x revert-buffer RET" should do
> what you want, I think.

aka `C-x RET r cp1251 RET', right?

--
Kevin Rodgers
Denver, Colorado, USA

* Re: decode-coding-string question
From: Ted Zlatanov @ 2008-08-18 20:07 UTC
To: help-gnu-emacs

On Mon, 18 Aug 2008 22:11:37 +0300 Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Ted Zlatanov <tzz@lifelogs.com>
>> Date: Mon, 18 Aug 2008 08:58:55 -0500
>>
>> How would you recommend decoding text from particular encodings?  Given
>> text like the one shown above in a buffer, only decode-coding-region
>> seems to DTRT, and it's not interactive.

EZ> If you mean interactively, i.e. you visited a buffer and then
EZ> discovered that it was decoded incorrectly, and the actual encoding is
EZ> different, then "C-x RET c cp1251 RET M-x revert-buffer RET" should do
EZ> what you want, I think.

>> Context: I have a file full of CP1251 data and don't want to use Perl's
>> Encode module because I'm stubborn and think Emacs should handle it :)

EZ> What about the rest of the file? is it encoded in some other encoding?
EZ> If not, then the above recipe should do.  If it doesn't, please tell
EZ> more details.

I often have to open mangled files with mixed-up encodings; it's
convenient to set the coding-system after I look at the text, and only
apply it to a region.

On Mon, 18 Aug 2008 18:45:29 +0100 David Golden <david.golden@oceanfree.net> wrote:

[similar advice]

I see it's an uncommon operation, so I'll use the string-make-unibyte
recipe Eli and others recommended.

Thanks
Ted

* Re: decode-coding-string question
From: David Golden @ 2008-08-18 23:01 UTC
To: help-gnu-emacs

Ted Zlatanov wrote:

> I often have to open mangled files with mixed-up encodings; it's
> convenient to set the coding-system after I look at the text, and only
> apply it to a region.
>

Uhm.

M-x recode-region

also exists (at least in CVS emacs, no idea when it was introduced).

* Re: decode-coding-string question
From: Ted Zlatanov @ 2008-08-19 13:48 UTC
To: help-gnu-emacs

On Tue, 19 Aug 2008 00:01:07 +0100 David Golden <david.golden@oceanfree.net> wrote:

DG> Ted Zlatanov wrote:
>> I often have to open mangled files with mixed-up encodings; it's
>> convenient to set the coding-system after I look at the text, and only
>> apply it to a region.
>>

DG> M-x recode-region

DG> also exists (at least in CVS emacs, no idea when it was introduced).

That's exactly what I needed (and it's nicely interactive too).  I did
apropos for 'decod' and didn't see 'recode-*'.

Thanks
Ted

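Editor's sketch (not part of the original thread): `recode-region' can also be called from Lisp. The argument order used below (START END NEW-CODING CODING, i.e. the encoding the text was really in first, then the one it was wrongly decoded as) is my understanding of the function and should be checked with `C-h f recode-region' in your Emacs. The wrapper name `my-fix-latin1-as-cp1251' is hypothetical.

```elisp
(defun my-fix-latin1-as-cp1251 (start end)
  "Re-decode the text between START and END as CP1251.
Assumes the text was wrongly decoded as Latin-1 (iso-8859-1)."
  (recode-region start end 'cp1251 'iso-8859-1))

(defun my-fix-latin1-as-cp1251-demo ()
  "Apply the fix to a buffer holding the mojibake string and return it."
  (with-temp-buffer
    ;; U+00ED U+00F3 U+00EB U+00FC is the mojibake «íóëü», written with
    ;; Unicode escapes so the example does not depend on file encoding.
    (insert "\u00ed\u00f3\u00eb\u00fc")
    (my-fix-latin1-as-cp1251 (point-min) (point-max))
    (buffer-string)))
```

Working on a region rather than the whole buffer matches the mixed-encoding use case described above, where only part of a file needs to be reinterpreted.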
* Re: decode-coding-string question
From: Dmitry Dzhus @ 2008-08-15 16:06 UTC
To: help-gnu-emacs

Eli Zaretskii wrote:

>> From: Dmitry Dzhus <dima@sphinx.net.ru>
>> Date: Fri, 15 Aug 2008 02:19:11 +0400
>>
>> Ted Zlatanov wrote:
>>
>> > This should decode to нуль but doesn't (I get the same string instead):
>> >
>> > (decode-coding-string "íîëü" 'cp1251)
>> >
>> > Am I missing something obvious?  Do I need to encode the string to
>> > something else?
>>
>> 0. «íóëü», not «íîëü»
>
> That's not important, the original problem remains, even with a
> different spelling of the word.

That was nitpicking somewhat irrelevant to unibyte-multibyte problem:
Ted expected to get «нуль», and it's «íóëü», though the string he
originally provided, «íîëü», decodes to «ноль»; however, both «нуль»
and «ноль» mean «zero».

--
Happy Hacking.

http://sphinx.net.ru
む

* Re: decode-coding-string question
From: David Golden @ 2008-08-14 22:20 UTC
To: help-gnu-emacs

Ted Zlatanov wrote:

> This should decode to нуль but doesn't (I get the same string
> instead):
>
> (decode-coding-string "íîëü" 'cp1251)
>
> Am I missing something obvious?  Do I need to encode the string to
> something else?
>

Guessing you're using a new multibyte/unicode emacs, and noting that I
do not currently fully understand emacs encoding handling, but...
probably - try:

(decode-coding-string (encode-coding-string "íîëü" 'iso-8859-1) 'cp1251)

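Editor's sketch (not part of the original thread): the encode-then-decode suggestion above can be wrapped in a small helper. It first turns the mis-decoded characters back into the bytes they came from, which yields a unibyte string, and then decodes those bytes with the intended coding system. The name `my-redecode-string' and its parameters are hypothetical. This route also avoids `string-make-unibyte', which recent Emacs releases mark as obsolete.

```elisp
(defun my-redecode-string (string wrong-coding right-coding)
  "Return STRING re-decoded with RIGHT-CODING.
STRING is assumed to hold text that was mistakenly decoded with
WRONG-CODING.  Encoding with WRONG-CODING recovers the original
bytes as a unibyte string; decoding with RIGHT-CODING interprets
those bytes as intended."
  (decode-coding-string (encode-coding-string string wrong-coding)
                        right-coding))
```

For the string from the original question, a call would look like `(my-redecode-string "íîëü" 'iso-8859-1 'cp1251)`.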