xml-parse-file and text properties

unofficial mirror of emacs-devel@gnu.org 

* xml-parse-file and text properties
@ 2006-07-18 21:35 JD Smith
  2006-07-20 21:46 ` Richard Stallman
                   ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: JD Smith @ 2006-07-18 21:35 UTC (permalink / raw)




xml-parse-file now includes text properties in its returned list, ala:

((name . #("WV_APPLET" 0 9 (fontified nil))) (link . #("WV_APPLET.html" 0
14 (fontified nil))))

when global-font-lock-mode is on, whereas before it did not.  Was this
intended?  Any way to temporarily avoid fontification on loaded
buffers (aside from turning global-font-lock-mode off prior to
xml-parse-file)?
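
The #("..." 0 9 (...)) form shown above is the printed representation of a
string that carries text properties.  A minimal sketch of how such a string
prints and how its properties can be dropped follows; the demo-* variable
names are illustrative only and are not part of xml.el:

```elisp
;; Build a string carrying the same property seen in the report.
(defvar demo-propertized (propertize "WV_APPLET" 'fontified nil))

;; prin1 output includes the property ranges, which is what makes
;; printed data larger than the bare text.
(defvar demo-printed (prin1-to-string demo-propertized))

;; substring-no-properties returns a copy with no text properties.
(defvar demo-plain (substring-no-properties demo-propertized))
```

Note that `equal' ignores text properties when comparing strings, so
demo-propertized and demo-plain are `equal' even though they print
differently.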


* Re: xml-parse-file and text properties
  2006-07-18 21:35 xml-parse-file and text properties JD Smith
@ 2006-07-20 21:46 ` Richard Stallman
  2006-07-20 22:11   ` JD Smith
  2006-07-20 21:46 ` Richard Stallman
  2006-07-21 12:55 ` Stefan Monnier
  2 siblings, 1 reply; 34+ messages in thread
From: Richard Stallman @ 2006-07-20 21:46 UTC (permalink / raw)
  Cc: emacs-devel

Does this fix it?

*** xml.el	07 Feb 2006 18:16:17 -0500	1.53
--- xml.el	20 Jul 2006 16:24:41 -0400	
***************
*** 409,415 ****
  	(unless (search-forward "]]>" nil t)
  	  (error "XML: (Not Well Formed) CDATA section does not end anywhere in the document"))
  	(concat
! 	 (buffer-substring pos (match-beginning 0))
  	 (xml-parse-string))))
       ;;  DTD for the document
       ((looking-at "<!DOCTYPE")
--- 409,415 ----
  	(unless (search-forward "]]>" nil t)
  	  (error "XML: (Not Well Formed) CDATA section does not end anywhere in the document"))
  	(concat
! 	 (buffer-substring-no-properties pos (match-beginning 0))
  	 (xml-parse-string))))
       ;;  DTD for the document
       ((looking-at "<!DOCTYPE")
***************
*** 483,489 ****
  		  (nreverse children)))
  	    ;;  This was an invalid start tag (Expected ">", but didn't see it.)
  	    (error "XML: (Well-Formed) Couldn't parse tag: %s"
! 		   (buffer-substring (- (point) 10) (+ (point) 1)))))))
       (t	;; (Not one of PI, CDATA, Comment, End tag, or Start tag)
        (unless xml-sub-parser		; Usually, we error out.
  	(error "XML: (Well-Formed) Invalid character"))
--- 483,490 ----
  		  (nreverse children)))
  	    ;;  This was an invalid start tag (Expected ">", but didn't see it.)
  	    (error "XML: (Well-Formed) Couldn't parse tag: %s"
! 		   (buffer-substring-no-properties
! 		    (- (point) 10) (+ (point) 1)))))))
       (t	;; (Not one of PI, CDATA, Comment, End tag, or Start tag)
        (unless xml-sub-parser		; Usually, we error out.
  	(error "XML: (Well-Formed) Invalid character"))
***************
*** 498,504 ****
  	 (string (progn (if (search-forward "<" nil t)
  			    (forward-char -1)
  			  (goto-char (point-max)))
! 			(buffer-substring pos (point)))))
      ;; Clean up the string.  As per XML specifications, the XML
      ;; processor should always pass the whole string to the
      ;; application.  But \r's should be replaced:
--- 499,505 ----
  	 (string (progn (if (search-forward "<" nil t)
  			    (forward-char -1)
  			  (goto-char (point-max)))
! 			(buffer-substring-no-properties pos (point)))))
      ;; Clean up the string.  As per XML specifications, the XML
      ;; processor should always pass the whole string to the
      ;; application.  But \r's should be replaced:


* Re: xml-parse-file and text properties
  2006-07-18 21:35 xml-parse-file and text properties JD Smith
  2006-07-20 21:46 ` Richard Stallman
@ 2006-07-20 21:46 ` Richard Stallman
  2006-07-20 22:40   ` JD Smith
  2006-07-21 12:55 ` Stefan Monnier
  2 siblings, 1 reply; 34+ messages in thread
From: Richard Stallman @ 2006-07-20 21:46 UTC (permalink / raw)
  Cc: emacs-devel

By the way, it looks like the function xml-parse-file
would be much cleaner if written like this.  Does this work?

(defun xml-parse-file (file &optional parse-dtd parse-ns)
  "Parse the well-formed XML file FILE.
If FILE is already visited, use its buffer and don't kill it.
Returns the top node with all its children.
If PARSE-DTD is non-nil, the DTD is parsed rather than skipped.
If PARSE-NS is non-nil, then QNAMES are expanded."
  (if (get-file-buffer file)
      (with-current-buffer (get-file-buffer file)
	(save-excursion
	  (xml-parse-region (point-min)
			    (point-max)
			    (current-buffer)
			    parse-dtd parse-ns)))
    (with-temp-buffer
      (insert-file-contents file)
      (xml-parse-region (point-min)
			(point-max)
			(current-buffer)
			parse-dtd parse-ns))))


* Re: xml-parse-file and text properties
  2006-07-20 21:46 ` Richard Stallman
@ 2006-07-20 22:11   ` JD Smith
  2006-07-21  4:46     ` Richard Stallman
  0 siblings, 1 reply; 34+ messages in thread
From: JD Smith @ 2006-07-20 22:11 UTC (permalink / raw)
  Cc: Mark A. Hershberger

On Thu, 2006-07-20 at 17:46 -0400, Richard Stallman wrote:
> Does this fix it?
> 
> *** xml.el	07 Feb 2006 18:16:17 -0500	1.53
> --- xml.el	20 Jul 2006 16:24:41 -0400	
> ***************
> *** 409,415 ****
>   	(unless (search-forward "]]>" nil t)
>   	  (error "XML: (Not Well Formed) CDATA section does not end anywhere in the document"))
>   	(concat
> ! 	 (buffer-substring pos (match-beginning 0))
>   	 (xml-parse-string))))
>        ;;  DTD for the document
>        ((looking-at "<!DOCTYPE")

This doesn't fix the problem, which is likely due to all the
match-string's still scattered throughout.  If I heavy-handedly replace
all of the match-string's with match-string-no-properties, it does fix
it.
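
The difference between the two accessors can be sketched as below.  This is
only an illustration under the assumption that the buffer text already
carries a `fontified' property; demo-match-strings is a hypothetical helper,
not code from xml.el:

```elisp
(defun demo-match-strings ()
  "Return (WITH-PROPS . WITHOUT-PROPS) for a match in a propertized buffer."
  (with-temp-buffer
    ;; Simulate the property that font-lock bookkeeping leaves on text.
    (insert (propertize "<name>WV_APPLET</name>" 'fontified nil))
    (goto-char (point-min))
    (re-search-forward "<name>\\([^<]*\\)</name>")
    ;; match-string copies the properties; the -no-properties
    ;; variant returns the bare characters.
    (cons (match-string 1)
          (match-string-no-properties 1))))
```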

However, Mark did identify this comment someone left in the header
material:

;; Note that {buffer-substring,match-string}-no-properties were
;; formerly used in several places, but that removes composition info.

but neither of us were clear on the meaning of the statement, or why
retaining text properties in any XML parsed data would be desirable.  


* Re: xml-parse-file and text properties
  2006-07-20 21:46 ` Richard Stallman
@ 2006-07-20 22:40   ` JD Smith
  0 siblings, 0 replies; 34+ messages in thread
From: JD Smith @ 2006-07-20 22:40 UTC (permalink / raw)


On Thu, 20 Jul 2006 17:46:31 -0400, Richard Stallman wrote:

> By the way, it looks like the function xml-parse-file would be much
> cleaner if written like this.  Does this work?

This works for me (still returning the text properties with the list
data).


* Re: xml-parse-file and text properties
  2006-07-20 22:11   ` JD Smith
@ 2006-07-21  4:46     ` Richard Stallman
  2006-07-21  6:35       ` Kenichi Handa
  2006-07-21 16:13       ` Kevin Rodgers
  0 siblings, 2 replies; 34+ messages in thread
From: Richard Stallman @ 2006-07-21  4:46 UTC (permalink / raw)
  Cc: mah, emacs-devel

    ;; Note that {buffer-substring,match-string}-no-properties were
    ;; formerly used in several places, but that removes composition info.

    but neither of us were clear on the meaning of the statement, or why
    retaining text properties in any XML parsed data would be desirable.  

I think I see why.  Losing the composition info could mean that the
composed characters turn into other sequences of characters.  It
literally would change the text!

This is an ugly problem.  Many things want to get rid of most text
properties, but they don't want to forget about composition.
Logically speaking, composition is really part of the characters in
the text.  Using text properties to encode it is fundamentally
inconsistent.

We have been lucky so far, in that this inconsistency has not caused a
lot of problems -- but now our luck is running out.

I can see only two kinds of approaches:

1. Distinguish composition properties from others, and make functions
like buffer-substring-no-properties preserve composition properties,
even as they discard all other properties.

2. Change the representation of composition so it uses something other
than text properties.

#2 would be a big maintenance trouble.  It would take us a long time
to get everything working again after such a change.  We certainly
should not install such a change now, and I hope we won't need to do
it ever.

Can #1 work?

Handa, please respond.


* Re: xml-parse-file and text properties
  2006-07-21  4:46     ` Richard Stallman
@ 2006-07-21  6:35       ` Kenichi Handa
  2006-07-21  7:24         ` Eli Zaretskii
  2006-07-22  4:39         ` Richard Stallman
  2006-07-21 16:13       ` Kevin Rodgers
  1 sibling, 2 replies; 34+ messages in thread
From: Kenichi Handa @ 2006-07-21  6:35 UTC (permalink / raw)
  Cc: mah, emacs-devel, jdsmith

In article <E1G3muP-0000l5-26@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

>     ;; Note that {buffer-substring,match-string}-no-properties were
>     ;; formerly used in several places, but that removes composition info.

>     but neither of us were clear on the meaning of the statement, or why
>     retaining text properties in any XML parsed data would be desirable.  

> I think I see why.  Losing the composition info could mean that the
> composed characters turn into other sequences of characters.  It
> literally would change the text!

??? Composition is just a text property.  It doesn't change
the character sequence.  It just changes how characters are
displayed.  In that sense, it's the same as the `display'
property.

> This is an ugly problem.  Many things want to get rid of most text
> properties, but they don't want to forget about composition.
> Logically speaking, composition is really part of the characters in
> the text.  Using text properties to encode it is fundamentally
> inconsistent.

I don't understand why `composition' property is that
special compared with other properties such as `display',
`fill-space'.

In addition, I don't understand why XML parsed data wants to
get rid of text properties.  What's the problem with strings
in the returned list containing text properties?

Anyway, logically speaking, composition should be internal
to the display engine and should be hidden from any other place.
In emacs-unicode-2, losing the composition property causes no
problem because I've implemented code to construct
compositions automatically in the display engine.

> We have been lucky so far, in that this inconsistency has not caused a
> lot of problems -- but now our luck is running out.

> I can see only two kinds of approaches:

> 1. Distinguish composition properties from others, and make functions
> like buffer-substring-no-properties preserve composition properties,
> even as they discard all other properties.

> 2. Change the representation of composition so it uses something other
> than text properties.

> #2 would be a big maintenance trouble.  It would take us a long time
> to get everything working again after such a change.  We certainly
> should not install such a change now, and I hope we won't need to do
> it ever.

> Can #1 work?

I don't know.  If text properties cause a problem in XML,
#1 doesn't solve the current problem.  If text properties are
not a problem, we don't need #1.

---
Kenichi Handa
handa@m17n.org


* Re: xml-parse-file and text properties
  2006-07-21  6:35       ` Kenichi Handa
@ 2006-07-21  7:24         ` Eli Zaretskii
  2006-07-21  8:14           ` Kenichi Handa
  2006-07-22  4:39         ` Richard Stallman
  1 sibling, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2006-07-21  7:24 UTC (permalink / raw)
  Cc: mah, jdsmith, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Date: Fri, 21 Jul 2006 15:35:03 +0900
> Cc: mah@everybody.org, emacs-devel@gnu.org, jdsmith@as.arizona.edu
> 
> > I think I see why.  Losing the composition info could mean that the
> > composed characters turn into other sequences of characters.  It
> > literally would change the text!
> 
> ??? Composition is just a text property.  It doesn't change
> the character sequence.  It just changes how characters are
> displayed.

If the text is displayed differently due to loss of the composition
property, would it still be readable by those who know the language?

If not, then removing the composition property _does_ have the effect
of changing the text _as_it_is_displayed_to_the_user_.

> I don't understand why `composition' property is that
> special compared with other properties such as `display',
> `fill-space'.

I think those are special as well, but I may be wrong.


* Re: xml-parse-file and text properties
  2006-07-21  7:24         ` Eli Zaretskii
@ 2006-07-21  8:14           ` Kenichi Handa
  0 siblings, 0 replies; 34+ messages in thread
From: Kenichi Handa @ 2006-07-21  8:14 UTC (permalink / raw)
  Cc: mah, jdsmith, emacs-devel

In article <uu05bh130.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

>> > I think I see why.  Losing the composition info could mean that the
>> > composed characters turn into other sequences of characters.  It
>> > literally would change the text!
>> 
>> ??? Composition is just a text property.  It doesn't change
>> the character sequence.  It just changes how characters are
>> displayed.

> If the text is displayed differently due to loss of the composition
> property, would it still be readable by those who know the language?

It depends on language/script and the depth of knowledge.

> If not, then removing the composition property _does_ have the effect
> of changing the text _as_it_is_displayed_to_the_user_.

Of course I understand that.  But, in the context of XML
handling, I don't know what the problem is, other than the
logical sequence of characters.

---
Kenichi Handa
handa@m17n.org


* Re: xml-parse-file and text properties
  2006-07-18 21:35 xml-parse-file and text properties JD Smith
  2006-07-20 21:46 ` Richard Stallman
  2006-07-20 21:46 ` Richard Stallman
@ 2006-07-21 12:55 ` Stefan Monnier
  2006-07-21 17:34   ` JD Smith
  2 siblings, 1 reply; 34+ messages in thread
From: Stefan Monnier @ 2006-07-21 12:55 UTC (permalink / raw)
  Cc: emacs-devel

> xml-parse-file now includes text properties in its returned list, ala:

> ((name . #("WV_APPLET" 0 9 (fontified nil))) (link . #("WV_APPLET.html" 0
> 14 (fontified nil))))

> when global-font-lock-mode is on, whereas before it did not.  Was this
> intended?  Any way to temporarily avoid fontification on loaded
> buffers (aside from turning global-font-lock-mode off prior to
> xml-parse-file)?

Could you explain why the text properties cause problems?


        Stefan


* Re: xml-parse-file and text properties
  2006-07-21  4:46     ` Richard Stallman
  2006-07-21  6:35       ` Kenichi Handa
@ 2006-07-21 16:13       ` Kevin Rodgers
  2006-07-21 23:33         ` Kevin Rodgers
  1 sibling, 1 reply; 34+ messages in thread
From: Kevin Rodgers @ 2006-07-21 16:13 UTC (permalink / raw)


Richard Stallman wrote:
>     ;; Note that {buffer-substring,match-string}-no-properties were
>     ;; formerly used in several places, but that removes composition info.
> 
>     but neither of us were clear on the meaning of the statement, or why
>     retaining text properties in any XML parsed data would be desirable.  
> 
> I think I see why.  Losing the composition info could mean that the
> composed characters turn into other sequences of characters.  It
> literally would change the text!
> 
> This is an ugly problem.  Many things want to get rid of most text
> properties, but they don't want to forget about composition.
> Logically speaking, composition is really part of the characters in
> the text.  Using text properties to encode it is fundamentally
> inconsistent.
> 
> We have been lucky so far, in that this inconsistency has not caused a
> lot of problems -- but now our luck is running out.
> 
> I can see only two kinds of approaches:
> 
> 1. Distinguish composition properties from others, and make functions
> like buffer-substring-no-properties preserve composition properties,
> even as they discard all other properties.
> 
> 2. Change the representation of composition so it uses something other
> than text properties.
> 
> #2 would be a big maintenance trouble.  It would take us a long time
> to get everything working again after such a change.  We certainly
> should not install such a change now, and I hope we won't need to do
> it ever.
> 
> Can #1 work?

How about extending buffer-substring-no-properties with an optional
KEEP-PROPERTIES argument, a list of text properties to preserve in
the returned string?  Then xml-parse-file could call it with the
list of composition properties (whatever they are).
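
One way such a KEEP-PROPERTIES behavior might look is sketched below.  This
is not the implementation mentioned in this thread and not an existing Emacs
function; demo-buffer-substring-keep and its argument names are hypothetical:

```elisp
(defun demo-buffer-substring-keep (start end keep-properties)
  "Return buffer text from START to END, keeping only KEEP-PROPERTIES.
KEEP-PROPERTIES is a list of text property names.  Hypothetical helper,
not part of Emacs."
  (let ((str (buffer-substring start end))
        (pos 0))
    (while (< pos (length str))
      (let ((next (or (next-property-change pos str) (length str)))
            (plist (text-properties-at pos str))
            kept)
        ;; Collect only the wanted properties for this run of text.
        (while plist
          (when (memq (car plist) keep-properties)
            (setq kept (cons (car plist) (cons (cadr plist) kept))))
          (setq plist (cddr plist)))
        ;; Replace the run's whole property list with the kept subset.
        (set-text-properties pos next kept str)
        (setq pos next)))
    str))
```

Under this sketch, a caller that wanted to keep only composition information
would pass '(composition) as the last argument.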

-- 
Kevin


* Re: xml-parse-file and text properties
  2006-07-21 12:55 ` Stefan Monnier
@ 2006-07-21 17:34   ` JD Smith
  2006-07-21 20:22     ` Stefan Monnier
  2006-07-21 20:52     ` Thien-Thi Nguyen
  0 siblings, 2 replies; 34+ messages in thread
From: JD Smith @ 2006-07-21 17:34 UTC (permalink / raw)

On Fri, 21 Jul 2006 08:55:57 -0400, Stefan Monnier wrote:

>> xml-parse-file now includes text properties in its returned list, ala:
> 
>> ((name . #("WV_APPLET" 0 9 (fontified nil))) (link . #("WV_APPLET.html"
>> 0 14 (fontified nil))))
> 
>> when global-font-lock-mode is on, whereas before it did not.  Was this
>> intended?  Any way to temporarily avoid fontification on loaded buffers
>> (aside from turning global-font-lock-mode off prior to xml-parse-file)?
> 
> Could you explain why the text properties cause problems?

I'm parsing a very large XML file (a document link and calling syntax
catalog for IDLWAVE), trimming it and making slight modifications, and
then writing it out to file as a big set of sexp's for later recovery,
primarily for reasons of speed.  This file is read whenever IDLWAVE
mode is first entered.

With text properties (amounting simply to `#("     " 0 5 (fontified nil))'
constructs), the file is almost three times as large, erasing much of
the speed advantage of translating to a LISP form in the first place.
I use `prin1' to write the lists.  I suppose I could spin through the
list first and remove any text properties on strings, but it seems
silly that parsing an XML file never loaded into an active buffer
should be laden with inert properties like '(fontified nil).  If there
were a simple way to prevent that (other than turning global-font-lock
off), that would suffice for my purposes, though not of course address the
larger issue of text properties in XML parsed lists in general.
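
The "spin through the list" workaround could be sketched roughly as follows.
It assumes the parsed data consists only of conses, strings, and other
atoms; demo-strip-properties is a hypothetical name, not part of xml.el or
IDLWAVE:

```elisp
(defun demo-strip-properties (tree)
  "Return a copy of TREE with text properties removed from every string.
TREE is assumed to be built from conses, strings, and other atoms."
  (cond ((stringp tree) (substring-no-properties tree))
        ((consp tree)
         (let (result)
           ;; Walk the spine iteratively so long child lists do not
           ;; exhaust the recursion limit; recurse only into elements.
           (while (consp tree)
             (push (demo-strip-properties (car tree)) result)
             (setq tree (cdr tree)))
           ;; TREE is now the final cdr: nil for a proper list, or the
           ;; value half of a dotted pair such as (name . "x").
           (nconc (nreverse result) (demo-strip-properties tree))))
        (t tree)))
```

Passing the result to `prin1' instead of the original list would then write
plain strings only.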

JD


* Re: xml-parse-file and text properties
  2006-07-21 17:34   ` JD Smith
@ 2006-07-21 20:22     ` Stefan Monnier
  2006-07-21 21:50       ` JD Smith
  2006-07-22 15:49       ` Richard Stallman
  2006-07-21 20:52     ` Thien-Thi Nguyen
  1 sibling, 2 replies; 34+ messages in thread
From: Stefan Monnier @ 2006-07-21 20:22 UTC (permalink / raw)
  Cc: emacs-devel

>>> xml-parse-file now includes text properties in its returned list, ala:
>> 
>>> ((name . #("WV_APPLET" 0 9 (fontified nil))) (link . #("WV_APPLET.html"
>>> 0 14 (fontified nil))))
>> 
>>> when global-font-lock-mode is on, whereas before it did not.  Was this
>>> intended?  Any way to temporarily avoid fontification on loaded buffers
>>> (aside from turning global-font-lock-mode off prior to xml-parse-file)?
>> 
>> Could you explain why the text properties cause problems?

> I'm parsing a very large XML file (a document link and calling syntax
> catalog for IDLWAVE), trimming it and making slight modifications, and
> then writing it out to file as a big set of sexp's for later recovery,
> primarily for reasons of speed.  This file is read whenever IDLWAVE
> mode is first entered.

> With text properties (amounting simply to `#("     " 0 5 (fontified nil))'
> constructs), the file is almost three times as large, erasing much of
> the speed advantage of translating to a LISP form in the first place.
> I use `prin1' to write the lists.  I suppose I could spin through the
> list first and remove any text properties on strings, but it seems
> silly that parsing an XML file never loaded into an active buffer
> should be laden with inert properties like '(fontified nil).  If there
> were a simple way to prevent that (other than turning global-font-lock
> off), that would suffice for my purposes, though not of course address the
> larger issue of text properties in XML parsed lists in general.

Clearly, in the case of xml-parse-file, I see no reason why we shouldn't
strip all properties.  After all, it's supposed to parse the *file*, not the
buffer, and files don't have those text properties.

The argument that we need to preserve the `composition' property doesn't
seem valid: this property can be computed from the sequence of chars.


        Stefan


* Re: xml-parse-file and text properties
  2006-07-21 17:34   ` JD Smith
  2006-07-21 20:22     ` Stefan Monnier
@ 2006-07-21 20:52     ` Thien-Thi Nguyen
  2006-07-21 21:45       ` JD Smith
  1 sibling, 1 reply; 34+ messages in thread
From: Thien-Thi Nguyen @ 2006-07-21 20:52 UTC (permalink / raw)
  Cc: emacs-devel

JD Smith <jdsmith@as.arizona.edu> writes:

> I'm parsing a very large XML file
> [...]
> never loaded into an active buffer

do you use `xml-parse-file' w/ a FILE that refers to a file not
currently visited (in a buffer)?  looks like `xml-parse-file' uses
`find-file' to handle that case.  maybe something like the following
would give better results?

thi

________________________________________________________
*** xml.el	6 Feb 2006 14:33:36 -0000	1.53
--- xml.el	21 Jul 2006 20:46:44 -0000
***************
*** 170,177 ****
  	(progn
  	  (set-buffer (get-file-buffer file))
  	  (setq keep (point)))
!       (let (auto-mode-alist)		; no need for xml-mode
! 	(find-file file)))
  
      (let ((xml (xml-parse-region (point-min)
  				 (point-max)
--- 170,177 ----
  	(progn
  	  (set-buffer (get-file-buffer file))
  	  (setq keep (point)))
!       (set-buffer (generate-new-buffer " *xml work*"))
!       (insert-file-contents file))
  
      (let ((xml (xml-parse-region (point-min)
  				 (point-max)


* Re: xml-parse-file and text properties
  2006-07-21 20:52     ` Thien-Thi Nguyen
@ 2006-07-21 21:45       ` JD Smith
  2006-07-22  9:15         ` Eli Zaretskii
  0 siblings, 1 reply; 34+ messages in thread
From: JD Smith @ 2006-07-21 21:45 UTC (permalink / raw)

On Fri, 21 Jul 2006 16:52:37 -0400, Thien-Thi Nguyen wrote:

> JD Smith <jdsmith@as.arizona.edu> writes:
> 
>> I'm parsing a very large XML file
>> [...]
>> never loaded into an active buffer
> 
> do you use `xml-parse-file' w/ a FILE that refers to a file not currently
> visited (in a buffer)?  looks like `xml-parse-file' uses `find-file' to
> handle that case.  maybe something like the following would give better
> results?

This is similar to the improved method Richard proposed yesterday
(insert-file-contents using a temporary buffer).  Remarkably enough
(and highly surprising to me), even inserting the XML file contents in
a temporary buffer is enough to get '(fontified nil) text properties
added all over, which xml-parse-file dutifully returns (unless
modified with *-no-properties).

JD


* Re: xml-parse-file and text properties
  2006-07-21 20:22     ` Stefan Monnier
@ 2006-07-21 21:50       ` JD Smith
  2006-07-22 15:49       ` Richard Stallman
  1 sibling, 0 replies; 34+ messages in thread
From: JD Smith @ 2006-07-21 21:50 UTC (permalink / raw)


On Fri, 21 Jul 2006 16:22:22 -0400, Stefan Monnier wrote:

> Clearly, in the case of xml-parse-file, I see no reason why we shouldn't
> strip all properties.  After all, it's supposed to parse the *file*, not the
> buffer, and files don't have those text properties.

It parses the buffer directly if the file is already loaded, but I agree
with your point.
 
> The argument that we need to preserve the `composition' property doesn't
> seem valid: this property can be computed from the sequence of chars.

The remaining question is whether there are *any* xml- routines for
which returning properties *is* desirable.  If not, replacing
{buffer-substring,match-string} with the -no-properties versions
throughout xml.el works.

JD


* Re: xml-parse-file and text properties
  2006-07-21 16:13       ` Kevin Rodgers
@ 2006-07-21 23:33         ` Kevin Rodgers
  0 siblings, 0 replies; 34+ messages in thread
From: Kevin Rodgers @ 2006-07-21 23:33 UTC (permalink / raw)


Kevin Rodgers wrote:
> How about extending buffer-substring-no-properties with an optional
> KEEP-PROPERTIES argument, a list of text properties to preserve in
> the returned string?  Then xml-parse-file could call it with the
> list of composition properties (whatever they are).

I have a working Lisp implementation, if anyone wants to try it.

-- 
Kevin


* Re: xml-parse-file and text properties
  2006-07-21  6:35       ` Kenichi Handa
  2006-07-21  7:24         ` Eli Zaretskii
@ 2006-07-22  4:39         ` Richard Stallman
  1 sibling, 0 replies; 34+ messages in thread
From: Richard Stallman @ 2006-07-22  4:39 UTC (permalink / raw)
  Cc: mah, emacs-devel, jdsmith

    ??? Composition is just a text property.  It doesn't change
    the character sequence.  It just changes how characters are
    displayed.

If it replaces two characters with one character, that makes
a difference to what the text is.  You've said that users would
perceive them as different characters.

Does the composition property affect the sequence of characters that
would be written to a file?  I have a vague memory that it does.

    In emacs-unicode-2, losing the composition property causes no
    problem because I've implemented code to construct
    compositions automatically in the display engine.

That definitely seems like an improvement; could you tell me more?
However, at present, we need to make the right things happen in Emacs
22.

    > 1. Distinguish composition properties from others, and make functions
    > like buffer-substring-no-properties preserve composition properties,
    > even as they discard all other properties.

    > Can #1 work?

    I don't know.  If text properties cause a problem in XML,
    #1 doesn't solve the current problem.  If text properties are
    not a problem, we don't need #1.

He said that the text properties in XML cause a serious problem when
they are numerous.  Keeping all is no good; discarding all text
properties would be wrong when there is a composition property.
However, keeping just the composition property would be fine, since
most of the time there won't be any of them.

However, the problem of losing composition properties is not limited
to XML.


* Re: xml-parse-file and text properties
  2006-07-21 21:45       ` JD Smith
@ 2006-07-22  9:15         ` Eli Zaretskii
  2006-07-24 16:44           ` JD Smith
  0 siblings, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2006-07-22  9:15 UTC (permalink / raw)
  Cc: emacs-devel

> From: JD Smith <jdsmith@as.arizona.edu>
> Date: Fri, 21 Jul 2006 14:45:27 -0700
> 
> This is similar to the improved method Richard proposed yesterday
> (insert-file-contents using a temporary buffer).  Remarkably enough
> (and highly surprising to me), even inserting the XML file contents in
> a temporary buffer is enough to get '(fontified nil) text properties
> added all over

I don't see anything surprising here, since font-lock is now ON by
default.

You should be able to overcome this if you turn off font-lock-mode in
the temporary buffer, before inserting the file's contents.


* Re: xml-parse-file and text properties
  2006-07-21 20:22     ` Stefan Monnier
  2006-07-21 21:50       ` JD Smith
@ 2006-07-22 15:49       ` Richard Stallman
  2006-07-24  1:51         ` Kenichi Handa
  1 sibling, 1 reply; 34+ messages in thread
From: Richard Stallman @ 2006-07-22 15:49 UTC (permalink / raw)
  Cc: emacs-devel, jdsmith

    The argument that we need to preserve the `composition' property doesn't
    seem valid: this property can be computed from the sequence of chars.

Handa, is it true that the composition property can always be recomputed
from the sequence of characters?


* Re: xml-parse-file and text properties
  2006-07-22 15:49       ` Richard Stallman
@ 2006-07-24  1:51         ` Kenichi Handa
  2006-07-24  3:17           ` Stefan Monnier
  2006-07-24 18:22           ` Richard Stallman
  0 siblings, 2 replies; 34+ messages in thread
From: Kenichi Handa @ 2006-07-24  1:51 UTC (permalink / raw)
  Cc: jdsmith, monnier, emacs-devel

In article <E1G4Jjd-0004fU-D5@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

>     The argument that we need to preserve the `composition' property doesn't
>     seem valid: this property can be computed from the sequence of chars.

> Handa, is it true that the composition property can always be recomputed
> from the sequence of characters?

Usually yes, but not always.  All normal compositions
(i.e. compositions for displaying a script in the correct
way) are registered in composition-function-table (a
char-table).  So, by looking up the table for each
character, we can recover compositions (though very slowly).
But, it's possible to manually compose some text.  For
instance please try this:

(insert (compose-string "+o" 0 2 '(?+ (tc . bc) ?o)))

It inserts "+o" with composition property that can't be
recovered automatically once lost.
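
The effect can be sketched without inserting anything into a buffer.  The
snippet below reuses the compose-string call above on a fresh string and
then strips the properties; the demo-* names are illustrative only, and
what the composed text looks like on screen is not shown here:

```elisp
;; Compose a fresh copy so no string constant is modified.
(defvar demo-composed
  (compose-string (copy-sequence "+o") 0 2 '(?+ (tc . bc) ?o))
  "The manually composed string from the example above.")

;; Stripping properties keeps the characters "+o" but discards the
;; `composition' property that recorded how to combine them.
(defvar demo-decomposed (substring-no-properties demo-composed)
  "The same characters with every text property removed.")
```

Evaluating (get-text-property 0 'composition demo-composed) shows the
recorded composition, while the same call on demo-decomposed returns nil,
even though both strings contain the same two characters.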

---
Kenichi Handa
handa@m17n.org


* Re: xml-parse-file and text properties
  2006-07-24  1:51         ` Kenichi Handa
@ 2006-07-24  3:17           ` Stefan Monnier
  2006-07-24  4:36             ` Kenichi Handa
  2006-07-24 18:22           ` Richard Stallman
  1 sibling, 1 reply; 34+ messages in thread
From: Stefan Monnier @ 2006-07-24  3:17 UTC (permalink / raw)
  Cc: jdsmith, rms, emacs-devel

> (insert (compose-string "+o" 0 2 '(?+ (tc . bc) ?o)))

> It inserts "+o" with composition property that can't be
> recovered automatically once lost.

But that's a hypothetical example.


        Stefan


* Re: xml-parse-file and text properties
  2006-07-24  3:17           ` Stefan Monnier
@ 2006-07-24  4:36             ` Kenichi Handa
  0 siblings, 0 replies; 34+ messages in thread
From: Kenichi Handa @ 2006-07-24  4:36 UTC (permalink / raw)
  Cc: emacs-devel, rms, jdsmith

In article <jwv64hnwv4k.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> (insert (compose-string "+o" 0 2 '(?+ (tc . bc) ?o)))
>> It inserts "+o" with composition property that can't be
>> recovered automatically once lost.

> But that's a hypothetical example.

Yes.

---
Kenichi Handa
handa@m17n.org


* Re: xml-parse-file and text properties
  2006-07-22  9:15         ` Eli Zaretskii
@ 2006-07-24 16:44           ` JD Smith
  2006-07-25 16:05             ` JD Smith
  0 siblings, 1 reply; 34+ messages in thread
From: JD Smith @ 2006-07-24 16:44 UTC (permalink / raw)


On Sat, 22 Jul 2006 12:15:34 +0300, Eli Zaretskii wrote:

>> From: JD Smith <jdsmith@as.arizona.edu>
>> Date: Fri, 21 Jul 2006 14:45:27 -0700
>> 
>> This is similar to the improved method Richard proposed yesterday
>> (insert-file-contents using a temporary buffer).  Remarkably enough
>> (and highly surprising to me), even inserting the XML file contents in
>> a temporary buffer is enough to get '(fontified nil) text properties
>> added all over
> 
> I don't see anything surprising here, since font-lock is now ON by
> default.
> 
> You should be able to overcome this if you turn off font-lock-mode in
> the temporary buffer, before inserting the file's contents.

This was my mistake.  In fact, with Richard's formulation:

    (with-temp-buffer
      (insert-file-contents file)
      (xml-parse-region (point-min)
			(point-max)
			(current-buffer)
			parse-dtd parse-ns))


no font-lock text properties ever get added to the temporary buffer,
despite global-font-lock being on.  It turns out I was pre-loading the
file into a buffer to prevent warnings about its read-only status, so
this code path was not being taken.
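
As a rough check of the claim above, the following sketch parses XML text
in a temporary buffer and inspects a returned string for text properties.
The helpers `my-xml-parse-string' and `my-string-has-properties-p' are
hypothetical names introduced only for this illustration; they are not
part of xml.el.

```elisp
;; Sketch only: both function names are assumptions, not xml.el API.
(require 'xml)

(defun my-xml-parse-string (text)
  "Parse TEXT as XML in a temporary buffer and return the parse tree."
  (with-temp-buffer
    (insert text)
    (xml-parse-region (point-min) (point-max))))

(defun my-string-has-properties-p (string)
  "Return non-nil if STRING carries any text property anywhere."
  (or (text-properties-at 0 string)
      (next-property-change 0 string)))

;; Example: check the text child of the top node.
;; (my-string-has-properties-p
;;  (nth 2 (car (my-xml-parse-string "<a name=\"x\">hello</a>"))))
```

Font-lock is normally not activated in temporary buffers, so the strings
returned along this path should come back without `fontified' properties.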

With the patch below, xml-parse-file only returns unwanted text
properties when a file is already loaded into a buffer.  Should we
install it?  This doesn't address the larger issue of whether
xml-parse-file should ever return text-properties, but it is simple and
sensible.

JD



*** xml.el	06 Feb 2006 07:33:36 -0700	1.53
--- xml.el	24 Jul 2006 09:40:07 -0700	
***************
*** 161,187 ****
  ;;;###autoload
  (defun xml-parse-file (file &optional parse-dtd parse-ns)
    "Parse the well-formed XML file FILE.
! If FILE is already visited, use its buffer and don't kill it.
! Returns the top node with all its children.
! If PARSE-DTD is non-nil, the DTD is parsed rather than skipped.
! If PARSE-NS is non-nil, then QNAMES are expanded."
!   (let ((keep))
!     (if (get-file-buffer file)
! 	(progn
! 	  (set-buffer (get-file-buffer file))
! 	  (setq keep (point)))
!       (let (auto-mode-alist)		; no need for xml-mode
! 	(find-file file)))
! 
!     (let ((xml (xml-parse-region (point-min)
! 				 (point-max)
! 				 (current-buffer)
! 				 parse-dtd parse-ns)))
!       (if keep
! 	  (goto-char keep)
! 	(kill-buffer (current-buffer)))
!       xml)))
! 
  
  (defvar xml-name-re)
  (defvar xml-entity-value-re)
--- 161,182 ----
  ;;;###autoload
  (defun xml-parse-file (file &optional parse-dtd parse-ns)
    "Parse the well-formed XML file FILE.
! If FILE is already visited, use its buffer and don't kill it. Returns the
! top node with all its children. If PARSE-DTD is non-nil, the DTD is parsed
! rather than skipped. If PARSE-NS is non-nil, then QNAMES are expanded."
!   (if (get-file-buffer file)
!       (with-current-buffer (get-file-buffer file)
! 	(save-excursion
! 	  (xml-parse-region (point-min)
! 			    (point-max)
! 			    (current-buffer)
! 			    parse-dtd parse-ns)))
!     (with-temp-buffer
!       (insert-file-contents file)
!       (xml-parse-region (point-min)
! 			(point-max)
! 			(current-buffer)
! 			parse-dtd parse-ns))))
  
  (defvar xml-name-re)
  (defvar xml-entity-value-re)


* Re: xml-parse-file and text properties
  2006-07-24  1:51         ` Kenichi Handa
  2006-07-24  3:17           ` Stefan Monnier
@ 2006-07-24 18:22           ` Richard Stallman
  2006-07-24 20:38             ` Stuart D. Herring
  2006-07-24 20:51             ` Stefan Monnier
  1 sibling, 2 replies; 34+ messages in thread
From: Richard Stallman @ 2006-07-24 18:22 UTC (permalink / raw)
  Cc: jdsmith, monnier, emacs-devel

    Usually yes, but not always.  All normal compositions
    (i.e. compositions for displaying a script in the correct
    way) are registered in composition-function-table (a
    char-table).  So, by looking up the table for each
    character, we can recover compositions (though very slowly).

Interesting.

    But, it's possible to manually compose some text.

We can consider that improper use of the composition property;
we need not cater to it.

That leaves us in a peculiar in-between situation.  If discarding
these properties altered what text would be written in a file, it
would be very bad, and it would be clear we need to preserve these
properties.  But that is not the case.  Nonetheless, the text will
display wrong if subsequently reinserted in a buffer.  All in all, I
think that is enough reason for xml.el to preserve the composition
property.

So the question is how.

One first step would be to write a function that operates on a string
and discards all text properties except composition.  More generally,
all except a certain specified list of property names.

I think that should be done at the C level for speed.
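
For illustration only, here is what such a function might look like in
Lisp rather than C.  The name `my-string-keep-properties' and its
argument order are assumptions made for this sketch; nothing like it
exists in Emacs under that name.

```elisp
;; Sketch only: a Lisp-level version of the proposed function.
;; The function name and signature are hypothetical.
(defun my-string-keep-properties (string keep)
  "Return a copy of STRING with only the text properties listed in KEEP.
KEEP is a list of property names.  STRING itself is not modified."
  (let ((result (substring-no-properties string))
        (pos 0)
        (len (length string)))
    ;; Walk STRING one run of constant properties at a time.
    (while (< pos len)
      (let ((next (or (next-property-change pos string) len))
            (plist (text-properties-at pos string)))
        ;; Copy over only the properties named in KEEP.
        (while plist
          (when (memq (car plist) keep)
            (put-text-property pos next (car plist) (cadr plist) result))
          (setq plist (cddr plist)))
        (setq pos next)))
    result))
```

A caller such as xml.el could then use
(my-string-keep-properties str '(composition)) where it would otherwise
discard every property.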


* Re: xml-parse-file and text properties
  2006-07-24 18:22           ` Richard Stallman
@ 2006-07-24 20:38             ` Stuart D. Herring
  2006-07-25  3:09               ` Richard Stallman
  2006-07-24 20:51             ` Stefan Monnier
  1 sibling, 1 reply; 34+ messages in thread
From: Stuart D. Herring @ 2006-07-24 20:38 UTC (permalink / raw)
  Cc: emacs-devel

> One first step would be to write a function that operates on a string
> and discards all text properties except composition.  More generally,
> all except a certain specified list of property names.

It seems cleaner to me to define a property symbol (like 'composition or
'fontified) as "necessary" to the text if it (the symbol) has a non-nil
value for one of its properties -- say, 'necessary, or 'textual.  Then
there could be `strip-superficial-properties' that removed text properties
whose names lacked the property.  I think this would be nearly as fast as
having a list to retain, but it would break for text properties whose names
aren't symbols.  It seems to me, though, that anything so important as to
need retention would be important enough to warrant a symbol for its name.
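
A minimal sketch of this idea, assuming a symbol property named
`my-necessary' as the marker (the marker name and the `my-' prefix on
the function are placeholders for this example, not settled names):

```elisp
;; Sketch only: the marker property `my-necessary' and the function
;; name are assumptions made for illustration.
(defun my-strip-superficial-properties (string)
  "Return a copy of STRING without its superficial text properties.
A text property is kept only if its name is a symbol whose
`my-necessary' symbol property is non-nil."
  (let ((result (substring-no-properties string))
        (pos 0)
        (len (length string)))
    (while (< pos len)
      (let ((next (or (next-property-change pos string) len))
            (plist (text-properties-at pos string)))
        (while plist
          (let ((name (car plist)))
            ;; Property names that are not symbols are always dropped.
            (when (and (symbolp name) (get name 'my-necessary))
              (put-text-property pos next name (cadr plist) result)))
          (setq plist (cddr plist)))
        (setq pos next)))
    result))

;; Marking a property name as necessary is then a one-time declaration:
(put 'composition 'my-necessary t)
```

Compared with passing a list of names at each call, the declaration lives
with the property itself, so callers need not know which properties matter.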

Davis

-- 
This product is sold by volume, not by mass.  If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.


* Re: xml-parse-file and text properties
  2006-07-24 18:22           ` Richard Stallman
  2006-07-24 20:38             ` Stuart D. Herring
@ 2006-07-24 20:51             ` Stefan Monnier
  2006-07-25  3:09               ` Richard Stallman
  1 sibling, 1 reply; 34+ messages in thread
From: Stefan Monnier @ 2006-07-24 20:51 UTC (permalink / raw)
  Cc: emacs-devel, jdsmith, Kenichi Handa

> One first step would be to write a function that operates on a string
> and discards all text properties except composition.  More generally,
> all except a certain specified list of property names.

> I think that should be done at the C level for speed.

Note that this won't be useful for Emacs-23 any more.  I'm not sure it's
worth the trouble.  After all, it's not specific to xml.el, far from that.
Are we going to go through every single use of buffer-substring and
buffer-substring-no-properties and see what should be done for each of them?

Stripping the `display' property may also result in text that's saved
correctly but displayed incorrectly.  Even `face' can do that (e.g.
with X-Symbol).  I don't think we want to go down that road right now.

        Stefan


* Re: xml-parse-file and text properties
  2006-07-24 20:51             ` Stefan Monnier
@ 2006-07-25  3:09               ` Richard Stallman
  0 siblings, 0 replies; 34+ messages in thread
From: Richard Stallman @ 2006-07-25  3:09 UTC (permalink / raw)
  Cc: emacs-devel, jdsmith, handa

    Note that this won't be useful for Emacs-23 any more.  I'm not sure it's
    worth the trouble.  After all, it's not specific to xml.el, far from that.
    Are we going to go through every single use of buffer-substring and
    buffer-substring-no-properties and see what should be done for each of them?

Ok, I agree.


* Re: xml-parse-file and text properties
  2006-07-24 20:38             ` Stuart D. Herring
@ 2006-07-25  3:09               ` Richard Stallman
  2006-07-25 14:00                 ` Stefan Monnier
  0 siblings, 1 reply; 34+ messages in thread
From: Richard Stallman @ 2006-07-25  3:09 UTC (permalink / raw)
  Cc: emacs-devel

    It seems cleaner to me to define a property symbol (like `composition' or
    `fontified') as "necessary" to the text if it (the symbol) has a non-nil
    value for one of its properties -- say, `necessary', or `textual'.  Then
    there could be `strip-superficial-properties' that removed text properties
    whose names lacked the property.

I think that is a good idea.

Stefan convinced me that we do not need this now, but I think it would
be useful to start writing it, for installation after the release.


* Re: xml-parse-file and text properties
  2006-07-25  3:09               ` Richard Stallman
@ 2006-07-25 14:00                 ` Stefan Monnier
  2006-07-25 22:15                   ` Richard Stallman
  0 siblings, 1 reply; 34+ messages in thread
From: Stefan Monnier @ 2006-07-25 14:00 UTC (permalink / raw)
  Cc: emacs-devel

>     It seems cleaner to me to define a property symbol (like `composition' or
>     `fontified') as "necessary" to the text if it (the symbol) has a non-nil
>     value for one of its properties -- say, `necessary', or `textual'.  Then
>     there could be `strip-superficial-properties' that removed text properties
>     whose names lacked the property.

> I think that is a good idea.

Maybe an approach along the lines of what we do with front-sticky would make
sense, this way every particular use of a property can be separately marked
as needing to be kept or not.
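
One possible reading of this, sketched under the assumption of a text
property named `my-keep' whose value is either t or a list of property
names, in the same way a `front-sticky' value is.  Both the property
name and the function are hypothetical.

```elisp
;; Sketch only: `my-keep' is an assumed marker text property, modeled
;; on the value format of `front-sticky' (t, or a list of names).
(defun my-strip-unmarked-properties (string)
  "Return a copy of STRING keeping only properties marked by `my-keep'.
On each stretch of text, a `my-keep' value of t keeps every property
there; a list keeps just the properties it names."
  (let ((result (substring-no-properties string))
        (pos 0)
        (len (length string)))
    (while (< pos len)
      (let* ((next (or (next-property-change pos string) len))
             (plist (text-properties-at pos string))
             (keep (plist-get plist 'my-keep)))
        (while plist
          (when (or (eq keep t) (memq (car plist) keep))
            (put-text-property pos next (car plist) (cadr plist) result))
          (setq plist (cddr plist)))
        (setq pos next)))
    result))
```

Here the decision is made per stretch of text rather than per property
name, so the same property can be kept in one place and dropped in another.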


        Stefan


* Re: xml-parse-file and text properties
  2006-07-24 16:44           ` JD Smith
@ 2006-07-25 16:05             ` JD Smith
  2006-07-25 16:27               ` Stefan Monnier
  0 siblings, 1 reply; 34+ messages in thread
From: JD Smith @ 2006-07-25 16:05 UTC (permalink / raw)




Just for completeness: Richard checked in the changes below, and they
have the effect of avoiding the unwanted text properties for parsed
files not already loaded in a buffer.

> *** xml.el	06 Feb 2006 07:33:36 -0700	1.53
> --- xml.el	24 Jul 2006 09:40:07 -0700	
> ***************
> *** 161,187 ****
>   ;;;###autoload
>   (defun xml-parse-file (file &optional parse-dtd parse-ns)
>     "Parse the well-formed XML file FILE.
> ! If FILE is already visited, use its buffer and don't kill it.
> ! Returns the top node with all its children.
> ! If PARSE-DTD is non-nil, the DTD is parsed rather than skipped.
> ! If PARSE-NS is non-nil, then QNAMES are expanded."
> !   (let ((keep))
> !     (if (get-file-buffer file)
> ! 	(progn
> ! 	  (set-buffer (get-file-buffer file))
> ! 	  (setq keep (point)))
> !       (let (auto-mode-alist)		; no need for xml-mode
> ! 	(find-file file)))
> ! 
> !     (let ((xml (xml-parse-region (point-min)
> ! 				 (point-max)
> ! 				 (current-buffer)
> ! 				 parse-dtd parse-ns)))
> !       (if keep
> ! 	  (goto-char keep)
> ! 	(kill-buffer (current-buffer)))
> !       xml)))
> ! 
>   
>   (defvar xml-name-re)
>   (defvar xml-entity-value-re)
> --- 161,182 ----
>   ;;;###autoload
>   (defun xml-parse-file (file &optional parse-dtd parse-ns)
>     "Parse the well-formed XML file FILE.
> ! If FILE is already visited, use its buffer and don't kill it. Returns the
> ! top node with all its children. If PARSE-DTD is non-nil, the DTD is parsed
> ! rather than skipped. If PARSE-NS is non-nil, then QNAMES are expanded."
> !   (if (get-file-buffer file)
> !       (with-current-buffer (get-file-buffer file)
> ! 	(save-excursion
> ! 	  (xml-parse-region (point-min)
> ! 			    (point-max)
> ! 			    (current-buffer)
> ! 			    parse-dtd parse-ns)))
> !     (with-temp-buffer
> !       (insert-file-contents file)
> !       (xml-parse-region (point-min)
> ! 			(point-max)
> ! 			(current-buffer)
> ! 			parse-dtd parse-ns))))
>   
>   (defvar xml-name-re)
>   (defvar xml-entity-value-re)


* Re: xml-parse-file and text properties
  2006-07-25 16:05             ` JD Smith
@ 2006-07-25 16:27               ` Stefan Monnier
  2006-07-25 19:16                 ` JD Smith
  0 siblings, 1 reply; 34+ messages in thread
From: Stefan Monnier @ 2006-07-25 16:27 UTC (permalink / raw)
  Cc: emacs-devel

> Just for completeness: Richard checked in the changes below, and they
> have the effect of avoiding the unwanted text properties for parsed
> files not already loaded in a buffer.

Getting different results depending on whether or not the user happens to
be visiting the buffer is hardly a feature.
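
One way to get the same result in both cases, shown purely as a sketch
and not as what was installed, is to always parse in a temporary buffer
and to copy the text of an already-visited buffer without its
properties.  The name `my-xml-parse-file' is hypothetical.

```elisp
;; Sketch only: a hypothetical variant, not the installed xml-parse-file.
(require 'xml)

(defun my-xml-parse-file (file &optional parse-dtd parse-ns)
  "Parse the XML file FILE and return the top node with its children.
If FILE is already visited, parse a property-free copy of that
buffer's text; otherwise read FILE from disk.  Either way the work
is done in a temporary buffer."
  (let ((visiting (get-file-buffer file)))
    (with-temp-buffer
      (if visiting
          ;; Use the buffer's current text, minus text properties.
          (insert-buffer-substring-no-properties visiting)
        (insert-file-contents file))
      (xml-parse-region (point-min) (point-max)
                        (current-buffer)
                        parse-dtd parse-ns))))
```

Like the versions in the patch, this parses unsaved edits when the file
is visited; it differs only in never handing back the visiting buffer's
text properties.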


        Stefan


* Re: xml-parse-file and text properties
  2006-07-25 16:27               ` Stefan Monnier
@ 2006-07-25 19:16                 ` JD Smith
  0 siblings, 0 replies; 34+ messages in thread
From: JD Smith @ 2006-07-25 19:16 UTC (permalink / raw)
  Cc: emacs-devel

On Tue, 2006-07-25 at 12:27 -0400, Stefan Monnier wrote:
> > Just for completeness: Richard checked in the changes below, and they
> > have the effect of avoiding the unwanted text properties for parsed
> > files not already loaded in a buffer.
> 
> Getting different results depending on whether or not the user happens to
> be visiting the buffer is hardly a feature.

Agreed, but of the misfeatures, returning text properties when parsing a
file which has never even been visited is the more odious, in my
opinion.


* Re: xml-parse-file and text properties
  2006-07-25 14:00                 ` Stefan Monnier
@ 2006-07-25 22:15                   ` Richard Stallman
  0 siblings, 0 replies; 34+ messages in thread
From: Richard Stallman @ 2006-07-25 22:15 UTC (permalink / raw)
  Cc: emacs-devel

    Maybe an approach along the lines of what we do with front-sticky would make
    sense, this way every particular use of a property can be separately marked
    as needing to be kept or not.

I would not object to it, but I don't think this complexity is needed.
I think that any given property will always want the same treatment
here.

If we find a case where that is not so, we could still add this extra
complexity then.


end of thread, other threads:[~2006-07-25 22:15 UTC | newest]

Thread overview: 34+ messages
2006-07-18 21:35 xml-parse-file and text properties JD Smith
2006-07-20 21:46 ` Richard Stallman
2006-07-20 22:11   ` JD Smith
2006-07-21  4:46     ` Richard Stallman
2006-07-21  6:35       ` Kenichi Handa
2006-07-21  7:24         ` Eli Zaretskii
2006-07-21  8:14           ` Kenichi Handa
2006-07-22  4:39         ` Richard Stallman
2006-07-21 16:13       ` Kevin Rodgers
2006-07-21 23:33         ` Kevin Rodgers
2006-07-20 21:46 ` Richard Stallman
2006-07-20 22:40   ` JD Smith
2006-07-21 12:55 ` Stefan Monnier
2006-07-21 17:34   ` JD Smith
2006-07-21 20:22     ` Stefan Monnier
2006-07-21 21:50       ` JD Smith
2006-07-22 15:49       ` Richard Stallman
2006-07-24  1:51         ` Kenichi Handa
2006-07-24  3:17           ` Stefan Monnier
2006-07-24  4:36             ` Kenichi Handa
2006-07-24 18:22           ` Richard Stallman
2006-07-24 20:38             ` Stuart D. Herring
2006-07-25  3:09               ` Richard Stallman
2006-07-25 14:00                 ` Stefan Monnier
2006-07-25 22:15                   ` Richard Stallman
2006-07-24 20:51             ` Stefan Monnier
2006-07-25  3:09               ` Richard Stallman
2006-07-21 20:52     ` Thien-Thi Nguyen
2006-07-21 21:45       ` JD Smith
2006-07-22  9:15         ` Eli Zaretskii
2006-07-24 16:44           ` JD Smith
2006-07-25 16:05             ` JD Smith
2006-07-25 16:27               ` Stefan Monnier
2006-07-25 19:16                 ` JD Smith
