unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#1502: CR/LF Unicode Problem
@ 2008-12-05 22:33 Lafleur, Henry
  2008-12-06  8:02 ` Eli Zaretskii
  2008-12-07 15:58 ` Jason Rumney
  0 siblings, 2 replies; 6+ messages in thread
From: Lafleur, Henry @ 2008-12-05 22:33 UTC (permalink / raw)
  To: bug-gnu-emacs

[-- Attachment #1: Type: text/plain, Size: 5867 bytes --]

Hi,

You guys do a great job. Thanks for all the hard work.

I was blaming the .NET framework for this, but it appears to be an issue
with Emacs.

When I load a Unicode file (UTF-8) where some lines end in CR/LF and some
lines end in LF, in hexl-mode the CR/LF EOL's appear as CR/CR/LF and the
LF EOL's appear as CR/LF. See this thread for more information: 

http://social.msdn.microsoft.com/Forums/en-US/netfxbcl/thread/8ef5b69d-135d-4584-ae1a-1caeb4afc846/#page:1

If I save the file in hexl mode, it will save it with the extra CRs,
thus modifying the file more than anticipated.
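
One way to check what is actually on disk, independently of any editor's
decoding, is to count the line terminators in the raw bytes. The following
is only a minimal sketch in Python; the function name `count_eols` is made
up for illustration and is not part of Emacs or of the original report. A
line that really ends in CR/CR/LF shows up as one CR/LF pair plus one stray
CR.

```python
def count_eols(data):
    """Count CR LF pairs, bare LFs, and bare CRs in a bytes object."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf": data.count(b"\n") - crlf,  # LFs that are not part of a CR LF pair
        "cr": data.count(b"\r") - crlf,  # CRs that are not part of a CR LF pair
    }


# Hypothetical sample: one CR LF line, one LF line, one CR CR LF line.
sample = b"first\r\nsecond\nthird\r\r\n"
print(count_eols(sample))  # {'crlf': 2, 'lf': 1, 'cr': 1}
```

To run it on a real file, read the file in binary mode, for example
`count_eols(open(path, "rb").read())`, so that Python does no newline
translation of its own.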

Thanks,

Henry Lafleur               |      ,__o 
Project Lead                |    _-\_<, 
Canrig Enterprise Solutions |   (*)/'(*) 
a division of Canrig Drilling Technology Ltd.
Fax:       281-774-5640
Support: 1-866-433-4345
mailto:Henry.Lafleur@canrig.com 
http://www.mywells.com/ 

In GNU Emacs 22.1.1 (i386-mingw-nt5.1.2600)
 of 2007-06-02 on RELEASE
Windowing system distributor `Microsoft Corp.', version 5.1.2600
configured using `configure --with-gcc (3.4) --cflags
-Ic:/gnuwin32/include'

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: ENU
  locale-coding-system: cp1252
  default-enable-multibyte-characters: t

Major mode: Hexl

Minor modes in effect:
  ruler-mode: t
  hl-line-mode: t
  encoded-kbd-mode: t
  tooltip-mode: t
  tool-bar-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  unify-8859-on-encoding-mode: t
  utf-translate-cjk-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: identity

Recent input:
S e e SPC t h i s SPC t h r e a d SPC f o r SPC m o 
r e SPC i n f o r m a t i o n : SPC <return> C-v M-v 
<up> <up> <up> <down> <return> C-y <help-echo> <down-mouse-1> 
<mouse-movement> <mouse-movement> <drag-mouse-1> <down-mouse-1> 
<mouse-movement> <mouse-movement> <drag-mouse-1> <down-mouse-1> 
<mouse-1> <down-mouse-1> <mouse-1> <wheel-down> <return> 
<return> <up> <up> <up> <up> <down> <down> Y o u SPC 
g u y s SPC d o SPC a SPC g r e a t SPC j o b SPC w 
<backspace> <backspace> <backspace> <backspace> <backspace> 
<backspace> <backspace> <backspace> <backspace> <backspace> 
<backspace> <backspace> <backspace> <backspace> <backspace> 
<backspace> <backspace> <backspace> <backspace> <backspace> 
<backspace> <backspace> <backspace> <backspace> <backspace> 
<delete> <delete> <down-mouse-3> <mouse-3> <wheel-down> 
<double-wheel-down> <down> <down> <down> <up> <up> 
<up> <down> <down-mouse-1> <mouse-movement> <mouse-movement> 
<help-echo> <mouse-movement> <mouse-movement> <drag-mouse-1> 
<mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> 
<mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> 
<mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> 
<mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> 
<mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> 
<mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> 
<mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> 
<mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> 
<mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> 
<mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> 
<mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> 
<mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> 
<mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> 
<mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> 
<mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> 
<mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> 
<mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> 
<mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> 
<mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> 
<mouse-1> <mouse-1> <mouse-1> <mouse-1> <mouse-1> <down-mouse-1> 
<mouse-1> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <menu-bar> <file> <open-file> 
<help-echo> <help-echo> M-x h e x l - m o d <tab> <return> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<menu-bar> <help-menu> <report-emacs-bug>

Recent messages:
Loading emacsbug...done
Loading help-mode...done
Auto-saving...done
Mark set
Auto-saving...done
Loading sql...
Loading easymenu...done
Loading sql...done
Loading hexl...done
Loading mule-util...done




-----------------------------------------
CANRIG EMAIL NOTICE - This transmission may be strictly
confidential. If you are not the intended recipient of this
message, you may not disclose, print, copy, or disseminate this
information. If you have received this in error, please reply and
notify the sender (only) and delete the message. Unauthorized
interception of this e-mail is a violation of federal criminal law.
This communication does not reflect an intention by the sender or
the sender's principal to conduct a transaction or make any
agreement by electronic means. Nothing contained in this message or
in any attachment shall satisfy the requirements for a writing, and
nothing contained herein shall constitute a contract or electronic
signature under the Electronic Signatures in Global and National
Commerce Act, any version of the Uniform Electronic Transactions
Act, or any other statute governing electronic transactions.


* bug#1502: CR/LF Unicode Problem
  2008-12-05 22:33 bug#1502: CR/LF Unicode Problem Lafleur, Henry
@ 2008-12-06  8:02 ` Eli Zaretskii
       [not found]   ` <2905B00E9FC955468D756D43D7BF175475BBA4@USHOUXMB02.nabors.com>
  2008-12-07 15:58 ` Jason Rumney
  1 sibling, 1 reply; 6+ messages in thread
From: Eli Zaretskii @ 2008-12-06  8:02 UTC (permalink / raw)
  To: Lafleur, Henry, 1502; +Cc: bug-gnu-emacs

> Date: Fri, 5 Dec 2008 16:33:53 -0600
> From: "Lafleur, Henry" <Henry.LaFleur@canrig.com>
> Cc: 
> 
> When I load a Unicode file (UTF-8) where some lines end in CR/LF and some
> lines end in LF, in hexl-mode the CR/LF EOL's appear as CR/CR/LF and the
> LF EOL's appear as CR/LF. See this thread for more information: 
> 
> http://social.msdn.microsoft.com/Forums/en-US/netfxbcl/thread/8ef5b69d-135d-4584-ae1a-1caeb4afc846/#page:1
> 
> If I save the file in hexl mode, it will save it with the extra CRs,
> thus modifying the file more than anticipated.

Can you post the shortest possible example of a file that exhibits this
behavior? I didn't see one in the thread you cited; sorry if I missed
something.
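
A candidate test file can also be built byte by byte, so that its line
endings are known exactly. The sketch below, in Python, writes a UTF-8 file
that starts with a byte order mark and mixes CR/LF and LF endings. The
function name, file name and line contents are all hypothetical, and
whether such a file reproduces the problem is untested.

```python
import tempfile
from pathlib import Path

BOM = b"\xef\xbb\xbf"  # UTF-8 byte order mark


def write_mixed_eol_sample(path):
    """Write a small UTF-8 file with mixed line endings; return its bytes."""
    lines = [
        ("this line ends in CR LF", b"\r\n"),
        ("AM\u00c9RICA DEL NORTE, this line ends in LF", b"\n"),
        ("this line ends in CR LF again", b"\r\n"),
    ]
    data = BOM + b"".join(text.encode("utf-8") + eol for text, eol in lines)
    Path(path).write_bytes(data)  # binary write: no newline translation
    return data


# Write into a temporary directory and confirm the bytes round-trip.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "mixed-eol.sql"
    written = write_mixed_eol_sample(target)
    assert target.read_bytes() == written
```

Writing through `write_bytes` matters here: a text-mode write on Windows
would translate every LF to CR/LF and defeat the purpose of the file.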








* bug#1502: FW: bug#1502: CR/LF Unicode Problem
       [not found]   ` <2905B00E9FC955468D756D43D7BF175475BBA4@USHOUXMB02.nabors.com>
@ 2008-12-06 17:27     ` Henry Lafleur
  0 siblings, 0 replies; 6+ messages in thread
From: Henry Lafleur @ 2008-12-06 17:27 UTC (permalink / raw)
  To: eliz, 1502


[-- Attachment #1.1: Type: text/plain, Size: 2583 bytes --]

Eli,

Attached is a file similar to what I was describing; I'll get you the file
that actually triggers the error on Monday. I created this one with Emacs
on Linux, so I'm not sure it is representative. On Linux it does not seem
to exhibit the problem (i.e. a CR being added before every LF in Unicode
files).

Thanks,

Henry.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: sample.sql --]
[-- Type: text/x-sql; name=sample.sql, Size: 209 bytes --]

This is a test
his is a tes
INSERT INTO continent (ENGLISH, RU) VALUES ('NORTH AMERICA','AMÉRICA DEL NORTE');
INSERT INTO continent (ENGLISH, RU) VALUES ('SOUTH AMERICA','AMÉRICA DEL SUR');

is is a tes


* bug#1502: CR/LF Unicode Problem
  2008-12-05 22:33 bug#1502: CR/LF Unicode Problem Lafleur, Henry
  2008-12-06  8:02 ` Eli Zaretskii
@ 2008-12-07 15:58 ` Jason Rumney
  2008-12-08 21:26   ` Lafleur, Henry
  1 sibling, 1 reply; 6+ messages in thread
From: Jason Rumney @ 2008-12-07 15:58 UTC (permalink / raw)
  To: Lafleur, Henry, 1502

Lafleur, Henry wrote:
>
> When I load a Unicode file (UTF-8) where some lines end in CR/LF and some 
> lines end in LF, in hexl-mode the CR/LF EOL's appear as CR/CR/LF and 
> the LF EOL's appear as CR/LF. See this thread for more information:
>
> http://social.msdn.microsoft.com/Forums/en-US/netfxbcl/thread/8ef5b69d-135d-4584-ae1a-1caeb4afc846/#page:1
>
>
> If I save the file in hexl mode, it will save it with the extra CRs, 
> thus modifying the file more than anticipated.
>

I don't see this with your sample.sql file using Emacs 22.3 here.
Do you still see the bug if you start Emacs from the command line as: 
emacs -Q








* bug#1502: CR/LF Unicode Problem
  2008-12-07 15:58 ` Jason Rumney
@ 2008-12-08 21:26   ` Lafleur, Henry
       [not found]     ` <handler.1502.B1502.122877160131909.ackinfo@emacsbugs.donarmstrong.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Lafleur, Henry @ 2008-12-08 21:26 UTC (permalink / raw)
  To: Jason Rumney, 1502

Jason,

I don't see the problem in the sample.sql on Windows either. I was just
trying to get something to you guys over the weekend. The other file
uploaded, though, exhibits the problem (RRTestAppend1.sql).

Thanks,

Henry. 



* bug#1502: Info received (bug#1502: CR/LF Unicode Problem)
       [not found]     ` <handler.1502.B1502.122877160131909.ackinfo@emacsbugs.donarmstrong.com>
@ 2008-12-08 22:11       ` Lafleur, Henry
  0 siblings, 0 replies; 6+ messages in thread
From: Lafleur, Henry @ 2008-12-08 22:11 UTC (permalink / raw)
  To: 1502

One more thing to throw into the mix. I have this in my .emacs file for
UTF-16:

;;
;; Auto-detect UTF-16 files
;;

    ;; Add missing support functions
    (defun utf-16-le-pre-write-conversion (start end) nil)
    (defun utf-16-be-pre-write-conversion (start end) nil)

    ;; Set up auto-load of UTF-16 files using the appropriate coding
    ;; system.
    (setq coding-category-utf-16-le 'utf-16-le)
    (push 'coding-category-utf-16-le coding-category-list)

    ;; Detect endianness of UTF-16 containing a Byte Order Mark U+FEFF
    ;; Detect EOL mode by looking for CR/LF on the first line
    (add-to-list 'auto-coding-regexp-alist
                 '("^\xFF\xFE.*\x0D\x00$" . utf-16-le-dos) t)
    (add-to-list 'auto-coding-regexp-alist
                 '("^\xFE\xFF.*\x0D\x00$" . utf-16-be-dos) t)
    (add-to-list 'auto-coding-regexp-alist '("^\xFF\xFE" . utf-16-le) t)
    (add-to-list 'auto-coding-regexp-alist '("^\xFE\xFF" . utf-16-be) t)

I have no idea if this would cause the problem. These files start with
xEF xBB xBF (the UTF-8 byte order mark), which shouldn't match the UTF-16
byte order marks above.
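
That expectation can be checked outside Emacs. The sketch below, in Python,
uses rough equivalents of the four patterns from the snippet above; it is
an approximation only, since Emacs and Python regexp syntax are not
identical, and the names `PATTERNS` and `matching_patterns` are
hypothetical. It relies on one firm fact: the bytes 0xFE and 0xFF never
occur in well-formed UTF-8, so a pattern that requires them cannot match
anywhere in such a file.

```python
import re

# Rough Python equivalents of the four .emacs patterns (an approximation).
# MULTILINE lets ^ and $ match at every line boundary, not just the ends.
PATTERNS = {
    "utf-16-le-dos": re.compile(rb"^\xFF\xFE.*\x0D\x00$", re.MULTILINE),
    "utf-16-be-dos": re.compile(rb"^\xFE\xFF.*\x0D\x00$", re.MULTILINE),
    "utf-16-le": re.compile(rb"^\xFF\xFE", re.MULTILINE),
    "utf-16-be": re.compile(rb"^\xFE\xFF", re.MULTILINE),
}


def matching_patterns(data):
    """Return the names of the patterns that match anywhere in data."""
    return [name for name, rx in PATTERNS.items() if rx.search(data)]


# Hypothetical samples: a UTF-8 file with BOM and mixed EOLs, and a
# little-endian UTF-16 file with BOM and CR LF line endings.
utf8_sample = "\ufeffAM\u00c9RICA DEL NORTE\r\nsecond line\n".encode("utf-8")
utf16_sample = "\ufeffline one\r\nline two\r\n".encode("utf-16-le")

print(matching_patterns(utf8_sample))   # []
print(matching_patterns(utf16_sample))  # ['utf-16-le-dos', 'utf-16-le']
```

This only shows that the patterns themselves do not match UTF-8 input; it
says nothing about whether the other settings in the snippet affect how
Emacs decodes or saves the file.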

Thanks,

Henry. 

