unofficial mirror of emacs-devel@gnu.org 
* delete-trailing-whitespace and binary files
@ 2006-05-21 18:46 Kim F. Storm
  2006-05-21 18:50 ` Eli Zaretskii
  2006-05-22 15:11 ` Richard Stallman
From: Kim F. Storm @ 2006-05-21 18:46 UTC



I have delete-trailing-whitespace in before-save-hook, but that
was a real disaster editing etc/spook.lines ... as soon as
I saved the file, all the trailing NUL-characters were deleted.

What can be done about this?

1) If file is binary (how to check that?), don't delete whitespace
   when run from a hook (how to check that?).

Is there a "file-binary-p" function which tests if
a file contains any non-printable characters?

2) Don't consider NUL to be whitespace.

3) ?

-- 
Kim F. Storm <storm@cua.dk> http://www.cua.dk

* Re: delete-trailing-whitespace and binary files
  2006-05-21 18:46 delete-trailing-whitespace and binary files Kim F. Storm
@ 2006-05-21 18:50 ` Eli Zaretskii
  2006-05-22 15:11 ` Richard Stallman
From: Eli Zaretskii @ 2006-05-21 18:50 UTC
  Cc: emacs-devel

> From: storm@cua.dk (Kim F. Storm)
> Date: Sun, 21 May 2006 20:46:11 +0200
> 
> 1) If file is binary (how to check that?)

Look for null characters.  (That's what Grep and a few other programs
do.)
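A minimal sketch of the NUL-byte heuristic, in C (the language of the patch later in this thread). `looks_binary` and the 8 KiB probe size are hypothetical choices for illustration, not an Emacs or Grep API; real tools may use more elaborate checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical probe size: only the start of the buffer is examined,
   so that huge files are not scanned in full.  */
#define BINARY_PROBE_SIZE 8192

/* Return true if the first BINARY_PROBE_SIZE bytes of BUF (LEN bytes
   long) contain a NUL byte.  */
static bool
looks_binary (const unsigned char *buf, size_t len)
{
  assert (buf != NULL || len == 0);
  size_t n = len < BINARY_PROBE_SIZE ? len : BINARY_PROBE_SIZE;
  /* Guard n > 0 so memchr is never called on a NULL pointer.  */
  return n > 0 && memchr (buf, '\0', n) != NULL;
}
```

Scanning only a prefix keeps the check cheap, at the cost of misclassifying a file whose first NUL appears after the probe window.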

* Re: delete-trailing-whitespace and binary files
  2006-05-21 18:46 delete-trailing-whitespace and binary files Kim F. Storm
  2006-05-21 18:50 ` Eli Zaretskii
@ 2006-05-22 15:11 ` Richard Stallman
  2006-05-26 22:31   ` Kim F. Storm
From: Richard Stallman @ 2006-05-22 15:11 UTC
  Cc: emacs-devel

    1) If file is binary (how to check that?), don't delete whitespace
       when run from a hook (how to check that?).

That is so complex that I think it would be undesirable.

    2) Don't consider NUL to be whitespace.

That seems like a good idea.  Is there EVER a reason for
delete-trailing-whitespace to delete NUL?

* Re: delete-trailing-whitespace and binary files
  2006-05-22 15:11 ` Richard Stallman
@ 2006-05-26 22:31   ` Kim F. Storm
  2006-05-27  3:20     ` Stefan Monnier
From: Kim F. Storm @ 2006-05-26 22:31 UTC
  Cc: emacs-devel

Richard Stallman <rms@gnu.org> writes:

>     1) If file is binary (how to check that?), don't delete whitespace
>        when run from a hook (how to check that?).
>
> That is so complex that I think it would be undesirable.
>
>     2) Don't consider NUL to be whitespace.
>
> That seems like a good idea.  Is there EVER a reason for
> delete-trailing-whitespace to delete NUL?

If the syntax table says that NUL is whitespace, I guess it should do it...


I looked a little further at this.

Opening file spook.lines selects text-mode.

text-mode is derived from Fundamental-mode, which has
a somewhat peculiar (IMO) interpretation of "whitespace":

C-@ .. SPC	  	which means: whitespace
DEL .. ÿ	  	which means: whitespace
   		  	which means: whitespace
      .. ​	  	which means: whitespace
<<default>>	  	which means: whitespace


So delete-trailing-whitespace just does what it's supposed to do.
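To see why the NULs vanished, here is a toy C model, not Emacs code: a 256-entry class table in which, as in the table above, C-@ .. SPC and DEL .. 0xFF are whitespace, plus a trimmer that backs up over whatever the table calls whitespace. `syntax_table`, `init_fundamental_like` and `trailing_ws_start` are hypothetical names, and the model assumes unibyte text.

```c
#include <assert.h>
#include <stddef.h>

/* Toy syntax classes: only the distinction that matters here.  */
enum syn_class { SYN_WHITESPACE, SYN_WORD };

static enum syn_class syntax_table[256];

/* Mimic the quoted table: bytes 0..32 (C-@ .. SPC) and 127..255
   (DEL .. 0xFF) are whitespace; every other byte is treated as a
   word constituent for simplicity.  */
static void
init_fundamental_like (void)
{
  for (int c = 0; c < 256; c++)
    syntax_table[c] = (c <= ' ' || c >= 127) ? SYN_WHITESPACE : SYN_WORD;
}

/* Return the length LINE would have after trailing "whitespace" is
   removed, judging whitespace only by the syntax table.  */
static size_t
trailing_ws_start (const unsigned char *line, size_t len)
{
  assert (line != NULL || len == 0);
  while (len > 0 && syntax_table[line[len - 1]] == SYN_WHITESPACE)
    len--;
  return len;
}
```

With this table a line ending in "foo" followed by a space and two NULs is cut back to "foo", which is exactly what happened to spook.lines.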

Is there a better major-mode to choose for spook.lines (e.g. via a
file local variable)?

-- 
Kim F. Storm <storm@cua.dk> http://www.cua.dk

* Re: delete-trailing-whitespace and binary files
  2006-05-26 22:31   ` Kim F. Storm
@ 2006-05-27  3:20     ` Stefan Monnier
  2006-05-30 20:10       ` Kevin Rodgers
From: Stefan Monnier @ 2006-05-27  3:20 UTC
  Cc: rms, emacs-devel

> text-mode is derived from Fundamental-mode, which has
> a somewhat peculiar (IMO) interpretation of "whitespace":

> C-@ .. SPC	  	which means: whitespace
> DEL .. ÿ	  	which means: whitespace
>    		  	which means: whitespace
>       .. ​	  	which means: whitespace
> <<default>>	  	which means: whitespace

This looks wrong.  Why should all the control chars be considered whitespace
in text-mode?  Rather than try to use another major mode I think we should
fix the above.


        Stefan

* Re: delete-trailing-whitespace and binary files
  2006-05-27  3:20     ` Stefan Monnier
@ 2006-05-30 20:10       ` Kevin Rodgers
  2006-06-01 14:23         ` Kim F. Storm
From: Kevin Rodgers @ 2006-05-30 20:10 UTC


Stefan Monnier wrote:
>> text-mode is derived from Fundamental-mode, which has
>> a somewhat peculiar (IMO) interpretation of "whitespace":
> 
>> C-@ .. SPC	  	which means: whitespace
>> DEL .. ÿ	  	which means: whitespace
>>    		  	which means: whitespace
>>       .. ​	  	which means: whitespace
>> <<default>>	  	which means: whitespace
> 
> This looks wrong.  Why should all the control chars be considered whitespace
> in text-mode?  Rather than try to use another major mode I think we should
> fix the above.

That's what I thought, but I don't see a more appropriate choice in the
Syntax Class Table node of the Emacs Lisp manual, which says this:

| "Whitespace characters" (designated by ` ' or `-') separate symbols
| and words from each other.

Thanks,
-- 
Kevin

* Re: delete-trailing-whitespace and binary files
  2006-05-30 20:10       ` Kevin Rodgers
@ 2006-06-01 14:23         ` Kim F. Storm
  2006-06-02  3:13           ` Richard Stallman
From: Kim F. Storm @ 2006-06-01 14:23 UTC
  Cc: emacs-devel

Kevin Rodgers <ihs_4664@yahoo.com> writes:

> Stefan Monnier wrote:
>>> text-mode is derived from Fundamental-mode, which has
>>> a somewhat peculiar (IMO) interpretation of "whitespace":
>>
>>> C-@ .. SPC	  	which means: whitespace
>>> DEL .. ÿ	  	which means: whitespace
>>>    		  	which means: whitespace
>>>       .. ​	  	which means: whitespace
>>> <<default>>	  	which means: whitespace
>>
>> This looks wrong.  Why should all the control chars be considered whitespace
>> in text-mode?  Rather than try to use another major mode I think we should
>> fix the above.
>
> That's what I thought, but I don't see a more appropriate choice in the
> Syntax Class Table node of the Emacs Lisp manual, which says this:
>
> | "Whitespace characters" (designated by ` ' or `-') separate symbols
> | and words from each other.

Maybe the solution for delete-trailing-whitespace is to explicitly
consider only "normal" whitespace characters, i.e. not to use the
syntax table?
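A minimal C sketch of that idea, assuming "normal" means just space and tab (a hypothetical choice; one could also add CR or form feed). `is_normal_trailing_ws` and `delete_trailing_ws` are illustrative names, not Emacs functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed set of "normal" trailing whitespace.  NUL, DEL
   and other control characters are left alone regardless of any
   syntax table.  */
static int
is_normal_trailing_ws (unsigned char c)
{
  return c == ' ' || c == '\t';
}

/* Delete trailing spaces and tabs from every line of BUF in place and
   return the new length.  Newlines are kept and NUL bytes are ordinary
   data.  The write index never passes the read index, so compacting
   in place is safe.  */
static size_t
delete_trailing_ws (unsigned char *buf, size_t len)
{
  assert (buf != NULL || len == 0);
  size_t out = 0;        /* next write position */
  size_t line_start = 0; /* start of the current line in the output */
  for (size_t in = 0; in < len; in++)
    {
      unsigned char c = buf[in];
      if (c == '\n')
        {
          /* Back up over trailing whitespace already copied.  */
          while (out > line_start && is_normal_trailing_ws (buf[out - 1]))
            out--;
          buf[out++] = c;
          line_start = out;
        }
      else
        buf[out++] = c;
    }
  /* The last line may lack a final newline.  */
  while (out > line_start && is_normal_trailing_ws (buf[out - 1]))
    out--;
  return out;
}
```

Because the set is fixed, the result no longer depends on the major mode, which is the trade-off this approach makes.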

-- 
Kim F. Storm <storm@cua.dk> http://www.cua.dk

* Re: delete-trailing-whitespace and binary files
  2006-06-01 14:23         ` Kim F. Storm
@ 2006-06-02  3:13           ` Richard Stallman
  2006-06-02  8:28             ` David Kastrup
From: Richard Stallman @ 2006-06-02  3:13 UTC
  Cc: ihs_4664, emacs-devel

    >> This looks wrong.  Why should all the control chars be considered whitespace
    >> in text-mode?  Rather than try to use another major mode I think we should
    >> fix the above.
    >
    > That's what I thought, but I don't see a more appropriate choice in the
    > Syntax Class Table node of the Emacs Lisp manual, which says this:

We could give them Word syntax.

Since they don't normally appear in proper English text, it is hard to
argue that any choice of syntax is fundamentally wrong.  It is just a
matter of what seems marginally better.  This problem is an argument
that Word is marginally better than Whitespace.

* Re: delete-trailing-whitespace and binary files
  2006-06-02  3:13           ` Richard Stallman
@ 2006-06-02  8:28             ` David Kastrup
  2006-06-02 22:39               ` Richard Stallman
From: David Kastrup @ 2006-06-02  8:28 UTC
  Cc: ihs_4664, emacs-devel, Kim F. Storm

Richard Stallman <rms@gnu.org> writes:

>     >> This looks wrong.  Why should all the control chars be
>     >> considered whitespace in text-mode?  Rather than try to use
>     >> another major mode I think we should fix the above.
>     >
>     > That's what I thought, but I don't see a more appropriate
>     > choice in the Syntax Class Table node of the Emacs Lisp
>     > manual, which says this:
>
> We could give them Word syntax.
>
> Since they don't normally appear in proper English text, it is hard to
> argue that any choice of syntax is fundamentally wrong.  It is just a
> matter of what seems marginally better.  This problem is an argument
> that Word is marginally better than Whitespace.

I think "punctuation" would make more sense.  This designates stuff
that is neither a component of words nor of symbols, and not
whitespace.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* Re: delete-trailing-whitespace and binary files
  2006-06-02  8:28             ` David Kastrup
@ 2006-06-02 22:39               ` Richard Stallman
  2006-06-02 22:45                 ` David Kastrup
From: Richard Stallman @ 2006-06-02 22:39 UTC
  Cc: ihs_4664, emacs-devel, storm

    I think "punctuation" would make more sense.  This designates stuff
    that is neither a component of words nor of symbols, and not
    whitespace.

Ok with me.  Would you like to do it?

* Re: delete-trailing-whitespace and binary files
  2006-06-02 22:39               ` Richard Stallman
@ 2006-06-02 22:45                 ` David Kastrup
  2006-06-04  2:24                   ` Richard Stallman
From: David Kastrup @ 2006-06-02 22:45 UTC
  Cc: ihs_4664, emacs-devel, storm

Richard Stallman <rms@gnu.org> writes:

>     I think "punctuation" would make more sense.  This designates stuff
>     that is neither a component of words nor of symbols, and not
>     whitespace.
>
> Ok with me.  Would you like to do it?

I am not at all familiar with the code in question and would not be
comfortable trying to find and change it.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* Re: delete-trailing-whitespace and binary files
  2006-06-02 22:45                 ` David Kastrup
@ 2006-06-04  2:24                   ` Richard Stallman
  2006-06-06 20:25                     ` Stuart D. Herring
From: Richard Stallman @ 2006-06-04  2:24 UTC
  Cc: ihs_4664, storm, emacs-devel

Does this patch do the job?

*** syntax.c	01 May 2006 16:16:58 -0400	1.189
--- syntax.c	03 Jun 2006 18:43:27 -0400	
***************
*** 3122,3127 ****
--- 3122,3133 ----
  
    Vstandard_syntax_table = Fmake_char_table (Qsyntax_table, temp);
  
+   /* Control characters should not be whitespace.  */
+   temp = XVECTOR (Vsyntax_code_object)->contents[(int) Spunct];
+   for (i = 1; i <= ' ' - 1; i++)
+     SET_RAW_SYNTAX_ENTRY (Vstandard_syntax_table, i, temp);
+   SET_RAW_SYNTAX_ENTRY (Vstandard_syntax_table, 0177, temp);
+ 
    temp = XVECTOR (Vsyntax_code_object)->contents[(int) Sword];
    for (i = 'a'; i <= 'z'; i++)
      SET_RAW_SYNTAX_ENTRY (Vstandard_syntax_table, i, temp);

* Re: delete-trailing-whitespace and binary files
  2006-06-04  2:24                   ` Richard Stallman
@ 2006-06-06 20:25                     ` Stuart D. Herring
  2006-06-06 21:16                       ` Stuart D. Herring
From: Stuart D. Herring @ 2006-06-06 20:25 UTC
  Cc: emacs-devel

> +   /* Control characters should not be whitespace.  */
> +   temp = XVECTOR (Vsyntax_code_object)->contents[(int) Spunct];
> +   for (i = 1; i <= ' ' - 1; i++)
> +     SET_RAW_SYNTAX_ENTRY (Vstandard_syntax_table, i, temp);
> +   SET_RAW_SYNTAX_ENTRY (Vstandard_syntax_table, 0177, temp);

This seems wrong to me in two respects.  It doesn't do anything to NUL,
which was the original problem character; instead, it marks as punctuation
^I (TAB), ^J (LFD), ^L (FFD), and ^M (CR), which should probably be left
as whitespace.  One could also make a case that such characters as ^?
(VTB) should remain whitespace, although I've never seen anyone use that
one.
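The first point can be checked mechanically. This sketch replays the bounds of the loop in the proposed patch (`i = 1` through `' ' - 1`, plus `0177`) over a plain array rather than Emacs's syntax table; `patched_to_punct` is a hypothetical helper, not part of syntax.c.

```c
#include <stdbool.h>

/* Mark which ASCII characters the proposed patch would reassign to
   punctuation, using the same bounds as its loop.  */
static void
patched_to_punct (bool touched[128])
{
  for (int c = 0; c < 128; c++)
    touched[c] = false;
  for (int i = 1; i <= ' ' - 1; i++)  /* 1 .. 31: NUL (0) is skipped */
    touched[i] = true;
  touched[0177] = true;               /* DEL */
}
```

Running it confirms that NUL is left untouched while TAB, LF, VT, FF and CR are all caught by the loop.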

Davis

-- 
This product is sold by volume, not by mass.  If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.

* Re: delete-trailing-whitespace and binary files
  2006-06-06 20:25                     ` Stuart D. Herring
@ 2006-06-06 21:16                       ` Stuart D. Herring
  2006-06-07  2:20                         ` Richard Stallman
From: Stuart D. Herring @ 2006-06-06 21:16 UTC
  Cc: emacs-devel

I wrote:

> This seems wrong to me in two respects.  It doesn't do anything to NUL,
> which was the original problem character; instead, it marks as punctuation
> ^I (TAB), ^J (LFD), ^L (FFD), and ^M (CR), which should probably be left
> as whitespace.  One could also make a case that such characters as ^?
> (VTB) should remain whitespace, although I've never seen anyone use that
> one.

I didn't mean ^? (DEL); I just hadn't looked up which character was
vertical tab and forgot to fill it in before sending.  I should have
written ^I (HT: horizontal tab, or TAB), ^J (LF: line feed, or newline),
^L (FF: form feed), ^M (CR: carriage return), and ^K (VT: vertical tab);
one out of five ASCII abbreviations isn't bad, is it?

Davis

-- 
This product is sold by volume, not by mass.  If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.

* Re: delete-trailing-whitespace and binary files
  2006-06-06 21:16                       ` Stuart D. Herring
@ 2006-06-07  2:20                         ` Richard Stallman
From: Richard Stallman @ 2006-06-07  2:20 UTC
  Cc: emacs-devel

    I didn't mean ^? (DEL); I just hadn't looked up which character was
    vertical tab and forgot to fill it in before sending.  I should have
    written ^I (HT: horizontal tab, or TAB), ^J (LF: line feed, or newline),
    ^L (FF: form feed), ^M (CR: carriage return), and ^K (VT: vertical tab);
    one out of five ASCII abbreviations isn't bad, is it?

Yes, you're right (though I don't see a need to include vertical tab;
nobody really uses that).
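One reading of where the thread ends up, sketched over a toy 128-entry array rather than Emacs's real syntax table: start at NUL instead of 1, make control characters and DEL punctuation, but keep TAB, LF, FF and CR as whitespace. Vertical tab is left as punctuation per the last message. `syn_class`, `stays_whitespace` and `fix_control_syntax` are hypothetical names, not the actual syntax.c change.

```c
#include <stdbool.h>

enum syn_class { SYN_WHITESPACE, SYN_PUNCT, SYN_WORD };

/* Layout characters that keep whitespace syntax.  Add '\v' here to
   keep vertical tab as whitespace too.  */
static bool
stays_whitespace (int c)
{
  return c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

/* Reassign ASCII control characters (including NUL) and DEL.  Space
   and printable characters are not touched.  */
static void
fix_control_syntax (enum syn_class table[128])
{
  for (int c = 0; c < ' '; c++)
    table[c] = stays_whitespace (c) ? SYN_WHITESPACE : SYN_PUNCT;
  table[0177] = SYN_PUNCT;
}
```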

end of thread

Thread overview: 15+ messages
2006-05-21 18:46 delete-trailing-whitespace and binary files Kim F. Storm
2006-05-21 18:50 ` Eli Zaretskii
2006-05-22 15:11 ` Richard Stallman
2006-05-26 22:31   ` Kim F. Storm
2006-05-27  3:20     ` Stefan Monnier
2006-05-30 20:10       ` Kevin Rodgers
2006-06-01 14:23         ` Kim F. Storm
2006-06-02  3:13           ` Richard Stallman
2006-06-02  8:28             ` David Kastrup
2006-06-02 22:39               ` Richard Stallman
2006-06-02 22:45                 ` David Kastrup
2006-06-04  2:24                   ` Richard Stallman
2006-06-06 20:25                     ` Stuart D. Herring
2006-06-06 21:16                       ` Stuart D. Herring
2006-06-07  2:20                         ` Richard Stallman

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
