* xml-parse-file and text properties
@ 2006-07-18 21:35 JD Smith
  2006-07-20 21:46 ` Richard Stallman
                   ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: JD Smith @ 2006-07-18 21:35 UTC (permalink / raw)

xml-parse-file now includes text properties in its returned list, a la:

((name . #("WV_APPLET" 0 9 (fontified nil)))
 (link . #("WV_APPLET.html" 0 14 (fontified nil))))

when global-font-lock-mode is on, whereas before it did not. Was this
intended? Any way to temporarily avoid fontification on loaded
buffers (aside from turning global-font-lock-mode off prior to
xml-parse-file)?

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties
  2006-07-18 21:35 xml-parse-file and text properties JD Smith
@ 2006-07-20 21:46 ` Richard Stallman
  2006-07-20 22:11   ` JD Smith
  2006-07-20 21:46 ` Richard Stallman
  2006-07-21 12:55 ` Stefan Monnier
  2 siblings, 1 reply; 34+ messages in thread
From: Richard Stallman @ 2006-07-20 21:46 UTC (permalink / raw)
  Cc: emacs-devel

Does this fix it?

*** xml.el 07 Feb 2006 18:16:17 -0500 1.53
--- xml.el 20 Jul 2006 16:24:41 -0400
***************
*** 409,415 ****
           (unless (search-forward "]]>" nil t)
             (error "XML: (Not Well Formed) CDATA section does not end anywhere in the document"))
           (concat
!           (buffer-substring pos (match-beginning 0))
            (xml-parse-string))))
        ;; DTD for the document
        ((looking-at "<!DOCTYPE")
--- 409,415 ----
           (unless (search-forward "]]>" nil t)
             (error "XML: (Not Well Formed) CDATA section does not end anywhere in the document"))
           (concat
!           (buffer-substring-no-properties pos (match-beginning 0))
            (xml-parse-string))))
        ;; DTD for the document
        ((looking-at "<!DOCTYPE")
***************
*** 483,489 ****
               (nreverse children)))
          ;; This was an invalid start tag (Expected ">", but didn't see it.)
          (error "XML: (Well-Formed) Couldn't parse tag: %s"
!                (buffer-substring (- (point) 10) (+ (point) 1)))))))
       (t ;; (Not one of PI, CDATA, Comment, End tag, or Start tag)
        (unless xml-sub-parser ; Usually, we error out.
          (error "XML: (Well-Formed) Invalid character"))
--- 483,490 ----
               (nreverse children)))
          ;; This was an invalid start tag (Expected ">", but didn't see it.)
          (error "XML: (Well-Formed) Couldn't parse tag: %s"
!                (buffer-substring-no-properties
!                 (- (point) 10) (+ (point) 1)))))))
       (t ;; (Not one of PI, CDATA, Comment, End tag, or Start tag)
        (unless xml-sub-parser ; Usually, we error out.
          (error "XML: (Well-Formed) Invalid character"))
***************
*** 498,504 ****
          (string (progn (if (search-forward "<" nil t)
                             (forward-char -1)
                           (goto-char (point-max)))
!                        (buffer-substring pos (point)))))
      ;; Clean up the string.  As per XML specifications, the XML
      ;; processor should always pass the whole string to the
      ;; application.  But \r's should be replaced:
--- 499,505 ----
          (string (progn (if (search-forward "<" nil t)
                             (forward-char -1)
                           (goto-char (point-max)))
!                        (buffer-substring-no-properties pos (point)))))
      ;; Clean up the string.  As per XML specifications, the XML
      ;; processor should always pass the whole string to the
      ;; application.  But \r's should be replaced:
* Re: xml-parse-file and text properties 2006-07-20 21:46 ` Richard Stallman @ 2006-07-20 22:11 ` JD Smith 2006-07-21 4:46 ` Richard Stallman 0 siblings, 1 reply; 34+ messages in thread From: JD Smith @ 2006-07-20 22:11 UTC (permalink / raw) Cc: Mark A. Hershberger On Thu, 2006-07-20 at 17:46 -0400, Richard Stallman wrote: > Does this fix it? > > *** xml.el 07 Feb 2006 18:16:17 -0500 1.53 > --- xml.el 20 Jul 2006 16:24:41 -0400 > *************** > *** 409,415 **** > (unless (search-forward "]]>" nil t) > (error "XML: (Not Well Formed) CDATA section does not end anywhere in the document")) > (concat > ! (buffer-substring pos (match-beginning 0)) > (xml-parse-string)))) > ;; DTD for the document > ((looking-at "<!DOCTYPE") This doesn't fix the problem, which is likely due to all the match-string's still scattered throughout. If I heavy-handedly replace all of the match-string's with match-string-no-properties, it does fix it. However, Mark did identify this comment someone left in the header material: ;; Note that {buffer-substring,match-string}-no-properties were ;; formerly used in several places, but that removes composition info. but neither of us were clear on the meaning of the statement, or why retaining text properties in any XML parsed data would be desirable. ^ permalink raw reply [flat|nested] 34+ messages in thread
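The difference JD reports between match-string and
match-string-no-properties can be reproduced in isolation. The following
is a minimal sketch, not code from xml.el: the function name is
hypothetical, and the `fontified' property is set by hand to stand in
for what font-lock leaves on buffer text.

```elisp
(defun sketch-match-string-props ()
  "Return the text properties seen by both `match-string' variants.
Hypothetical demo function; it is not part of xml.el."
  (with-temp-buffer
    (insert "<name>WV_APPLET</name>")
    ;; Assumption: imitate font-lock by tagging the text by hand.
    (put-text-property (point-min) (point-max) 'fontified nil)
    (goto-char (point-min))
    (re-search-forward "<name>\\([^<]*\\)</name>")
    ;; First element: properties carried along by `match-string'.
    ;; Second element: properties left by the -no-properties variant.
    (list (text-properties-at 0 (match-string 1))
          (text-properties-at 0 (match-string-no-properties 1)))))
```

Evaluating (sketch-match-string-props) shows the property list surviving
in the first element only, which is consistent with the observation that
patching buffer-substring alone is not enough.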
* Re: xml-parse-file and text properties 2006-07-20 22:11 ` JD Smith @ 2006-07-21 4:46 ` Richard Stallman 2006-07-21 6:35 ` Kenichi Handa 2006-07-21 16:13 ` Kevin Rodgers 0 siblings, 2 replies; 34+ messages in thread From: Richard Stallman @ 2006-07-21 4:46 UTC (permalink / raw) Cc: mah, emacs-devel ;; Note that {buffer-substring,match-string}-no-properties were ;; formerly used in several places, but that removes composition info. but neither of us were clear on the meaning of the statement, or why retaining text properties in any XML parsed data would be desirable. I think I see why. Losing the composition info could mean that the composed characters turn into other sequences of characters. It literally would change the text! This is an ugly problem. Many things want to get rid of most text properties, but they don't want to forget about composition. Logically speaking, composition is really part of the characters in the text. Using text properties to encode it is fundamentally inconsistent. We have been lucky so far, in that this inconsistency has not caused a lot of problems -- but now our luck is running out. I can see only two kinds of approaches: 1. Distinguish composition properties from others, and make functions like buffer-substring-no-properties preserve composition properties, even as they discard all other properties. 2. Change the representation of composition so it uses something other than text properties. #2 would be a big maintenance trouble. It would take us a long time to get everything working again after such a change. We certainly should not install such a change now, and I hope we won't need to do it ever. Can #1 work? Handa, please respond. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties
  2006-07-21 4:46 ` Richard Stallman
@ 2006-07-21 6:35 ` Kenichi Handa
  2006-07-21 7:24   ` Eli Zaretskii
  2006-07-22 4:39   ` Richard Stallman
  2006-07-21 16:13 ` Kevin Rodgers
  1 sibling, 2 replies; 34+ messages in thread
From: Kenichi Handa @ 2006-07-21 6:35 UTC (permalink / raw)
  Cc: mah, emacs-devel, jdsmith

In article <E1G3muP-0000l5-26@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

>     ;; Note that {buffer-substring,match-string}-no-properties were
>     ;; formerly used in several places, but that removes composition info.

>     but neither of us were clear on the meaning of the statement, or why
>     retaining text properties in any XML parsed data would be desirable.

> I think I see why. Losing the composition info could mean that the
> composed characters turn into other sequences of characters. It
> literally would change the text!

??? Composition is just a text property. It doesn't change
the character sequence. It just changes how characters are
displayed. In that sense, it's the same as the `display'
property.

> This is an ugly problem. Many things want to get rid of most text
> properties, but they don't want to forget about composition.
> Logically speaking, composition is really part of the characters in
> the text. Using text properties to encode it is fundamentally
> inconsistent.

I don't understand why the `composition' property is so
special compared with other properties such as `display' and
`fill-space'.

In addition, I don't understand why XML parsed data wants to
get rid of text properties. What's the problem with strings
in the returned list containing text properties?

Anyway, logically speaking, composition should be internal
to the display engine and should be hidden from any other
place. In emacs-unicode-2, losing the composition property
is not a problem because I've implemented code to construct
compositions automatically in the display engine.

> We have been lucky so far, in that this inconsistency has not caused a
> lot of problems -- but now our luck is running out.

> I can see only two kinds of approaches:

> 1. Distinguish composition properties from others, and make functions
> like buffer-substring-no-properties preserve composition properties,
> even as they discard all other properties.

> 2. Change the representation of composition so it uses something other
> than text properties.

> #2 would be a big maintenance trouble. It would take us a long time
> to get everything working again after such a change. We certainly
> should not install such a change now, and I hope we won't need to do
> it ever.

> Can #1 work?

I don't know. If text properties cause a problem in XML, #1
doesn't solve the current problem. If text properties are
not a problem, we don't need #1.

---
Kenichi Handa
handa@m17n.org
* Re: xml-parse-file and text properties
  2006-07-21 6:35 ` Kenichi Handa
@ 2006-07-21 7:24 ` Eli Zaretskii
  2006-07-21 8:14   ` Kenichi Handa
  2006-07-22 4:39 ` Richard Stallman
  1 sibling, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2006-07-21 7:24 UTC (permalink / raw)
  Cc: mah, jdsmith, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Date: Fri, 21 Jul 2006 15:35:03 +0900
> Cc: mah@everybody.org, emacs-devel@gnu.org, jdsmith@as.arizona.edu
>
> > I think I see why. Losing the composition info could mean that the
> > composed characters turn into other sequences of characters. It
> > literally would change the text!
>
> ??? Composition is just a text property. It doesn't change
> the character sequence. It just changes how characters are
> displayed.

If the text is displayed differently due to loss of the composition
property, would it still be readable by those who know the language?
If not, then removing the composition property _does_ have the effect
of changing the text _as_it_is_displayed_to_the_user_.

> I don't understand why `composition' property is that
> special compared with other properties such as `display',
> `fill-space'.

I think those are special as well, but I may be wrong.
* Re: xml-parse-file and text properties 2006-07-21 7:24 ` Eli Zaretskii @ 2006-07-21 8:14 ` Kenichi Handa 0 siblings, 0 replies; 34+ messages in thread From: Kenichi Handa @ 2006-07-21 8:14 UTC (permalink / raw) Cc: mah, jdsmith, emacs-devel In article <uu05bh130.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: >> > I think I see why. Losing the composition info could mean that the >> > composed characters turn into other sequences of characters. It >> > literally would change the text! >> >> ??? Composition is just a text property. It doesn't change >> the character sequence. It just changes how characters are >> displayed. > If the text is displayed differently due to loss of the composition > property, would it still be readable by those who know the language? It depends on language/script and the depth of knowledge. > If not, then removing the composition property _does_ have the effect > of changing the text _as_i_is_displayed_to_the_user_. Of course I understand that. But, in the context of XML handling, I don't know what is the problem other than the logical sequence of characters. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties
  2006-07-21 6:35 ` Kenichi Handa
  2006-07-21 7:24   ` Eli Zaretskii
@ 2006-07-22 4:39 ` Richard Stallman
  1 sibling, 0 replies; 34+ messages in thread
From: Richard Stallman @ 2006-07-22 4:39 UTC (permalink / raw)
  Cc: mah, emacs-devel, jdsmith

    ??? Composition is just a text property. It doesn't change
    the character sequence. It just changes how characters are
    displayed.

If it replaces two characters with one character, that makes a
difference to what the text is. You've said that users would perceive
them as different characters.

Does the composition property affect the sequence of characters that
would be written to a file? I have a vague memory that it does.

    In emacs-unicode-2, losing of composition property has no
    problem because I've implemented a code to construct
    composition automatically in display engine.

That definitely seems like an improvement; could you tell me more?
However, at present, we need to make the right things happen in Emacs
22.

    > 1. Distinguish composition properties from others, and make functions
    > like buffer-substring-no-properties preserve composition properties,
    > even as they discard all other properties.

    > Can #1 work?

    I don't know. If text properties causes a problem in XML, #1
    doesn't solve the current problem. If text properties is
    not a problem, we don't need #1.

He said that the text properties in XML cause a serious problem when
they are numerous. Keeping all is no good; discarding all text
properties would be wrong when there is a composition property.
However, keeping just the composition property would be fine, since
most of the time there won't be any of them.

However, the problem of losing composition properties is not limited
to XML.
* Re: xml-parse-file and text properties 2006-07-21 4:46 ` Richard Stallman 2006-07-21 6:35 ` Kenichi Handa @ 2006-07-21 16:13 ` Kevin Rodgers 2006-07-21 23:33 ` Kevin Rodgers 1 sibling, 1 reply; 34+ messages in thread From: Kevin Rodgers @ 2006-07-21 16:13 UTC (permalink / raw) Richard Stallman wrote: > ;; Note that {buffer-substring,match-string}-no-properties were > ;; formerly used in several places, but that removes composition info. > > but neither of us were clear on the meaning of the statement, or why > retaining text properties in any XML parsed data would be desirable. > > I think I see why. Losing the composition info could mean that the > composed characters turn into other sequences of characters. It > literally would change the text! > > This is an ugly problem. Many things want to get rid of most text > properties, but they don't want to forget about composition. > Logically speaking, composition is really part of the characters in > the text. Using text properties to encode it is fundamentally > inconsistent. > > We have been lucky so far, in that this inconsistency has not caused a > lot of problems -- but now our luck is running out. > > I can see only two kinds of approaches: > > 1. Distinguish composition properties from others, and make functions > like buffer-substring-no-properties preserve composition properties, > even as they discard all other properties. > > 2. Change the representation of composition so it uses something other > than text properties. > > #2 would be a big maintenance trouble. It would take us a long time > to get everything working again after such a change. We certainly > should not install such a change now, and I hope we won't need to do > it ever. > > Can #1 work? How about extending buffer-substring-no-properties with an optional KEEP-PROPERTIES argument, a list of text properties to preserve in the returned string? Then xml-parse-file could call it with the list of composition properties (whatever they are). 
-- Kevin ^ permalink raw reply [flat|nested] 34+ messages in thread
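Kevin's implementation is not shown in the thread. As a rough
illustration of the KEEP-PROPERTIES idea only, here is a minimal sketch
written as a separate helper rather than as a change to
buffer-substring-no-properties itself. The function name and argument
order are assumptions made for this example.

```elisp
(defun sketch-buffer-substring-keep (start end keep-properties)
  "Return the text between START and END, keeping only KEEP-PROPERTIES.
KEEP-PROPERTIES is a list of text property names (symbols).
START must not be greater than END.
Hypothetical helper for illustration; it is not part of Emacs."
  (let ((string (buffer-substring-no-properties start end))
        (pos start))
    ;; Walk the region one run of identical properties at a time.
    (while (< pos end)
      (let ((next (or (next-property-change pos nil end) end))
            (plist (text-properties-at pos)))
        ;; Copy over only the properties the caller asked to keep,
        ;; translating buffer positions to string indices.
        (while plist
          (when (memq (car plist) keep-properties)
            (put-text-property (- pos start) (- next start)
                               (car plist) (cadr plist) string))
          (setq plist (cddr plist)))
        (setq pos next)))
    string))
```

Under this sketch a caller in xml.el would write something like
(sketch-buffer-substring-keep pos (point) '(composition)), so that
`fontified' and similar properties are dropped while composition
information survives.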
* Re: xml-parse-file and text properties 2006-07-21 16:13 ` Kevin Rodgers @ 2006-07-21 23:33 ` Kevin Rodgers 0 siblings, 0 replies; 34+ messages in thread From: Kevin Rodgers @ 2006-07-21 23:33 UTC (permalink / raw) Kevin Rodgers wrote: > How about extending buffer-substring-no-properties with an optional > KEEP-PROPERTIES argument, a list of text properties to preserve in > the returned string? Then xml-parse-file could call it with the > list of composition properties (whatever they are). I have a working Lisp implementation, if anyone wants to try it. -- Kevin ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties
  2006-07-18 21:35 xml-parse-file and text properties JD Smith
  2006-07-20 21:46 ` Richard Stallman
@ 2006-07-20 21:46 ` Richard Stallman
  2006-07-20 22:40   ` JD Smith
  2006-07-21 12:55 ` Stefan Monnier
  2 siblings, 1 reply; 34+ messages in thread
From: Richard Stallman @ 2006-07-20 21:46 UTC (permalink / raw)
  Cc: emacs-devel

By the way, it looks like the function xml-parse-file would be much
cleaner if written like this. Does this work?

(defun xml-parse-file (file &optional parse-dtd parse-ns)
  "Parse the well-formed XML file FILE.
If FILE is already visited, use its buffer and don't kill it.
Returns the top node with all its children.
If PARSE-DTD is non-nil, the DTD is parsed rather than skipped.
If PARSE-NS is non-nil, then QNAMES are expanded."
  (if (get-file-buffer file)
      (with-current-buffer (get-file-buffer file)
        (save-excursion
          (xml-parse-region (point-min)
                            (point-max)
                            (current-buffer)
                            parse-dtd parse-ns)))
    (with-temp-buffer
      (insert-file-contents file)
      (xml-parse-region (point-min)
                        (point-max)
                        (current-buffer)
                        parse-dtd parse-ns))))
* Re: xml-parse-file and text properties 2006-07-20 21:46 ` Richard Stallman @ 2006-07-20 22:40 ` JD Smith 0 siblings, 0 replies; 34+ messages in thread From: JD Smith @ 2006-07-20 22:40 UTC (permalink / raw) On Thu, 20 Jul 2006 17:46:31 -0400, Richard Stallman wrote: > By the way, it looks like the function xml-parse-file would be much > cleaner if written like this. Does this work? This works for me (still returning the text properties with the list data). ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties 2006-07-18 21:35 xml-parse-file and text properties JD Smith 2006-07-20 21:46 ` Richard Stallman 2006-07-20 21:46 ` Richard Stallman @ 2006-07-21 12:55 ` Stefan Monnier 2006-07-21 17:34 ` JD Smith 2 siblings, 1 reply; 34+ messages in thread From: Stefan Monnier @ 2006-07-21 12:55 UTC (permalink / raw) Cc: emacs-devel > xml-parse-file now includes text properties in its returned list, ala: > ((name . #("WV_APPLET" 0 9 (fontified nil))) (link . #("WV_APPLET.html" 0 > 14 (fontified nil)))) > when global-font-lock-mode is on, whereas before it did not. Was this > intended? Any way to temporarily avoid fontification on loaded > buffers (aside from turning global-font-lock-mode off prior to > xml-parse-file)? Could you explain why the text properties cause problems? Stefan ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties 2006-07-21 12:55 ` Stefan Monnier @ 2006-07-21 17:34 ` JD Smith 2006-07-21 20:22 ` Stefan Monnier 2006-07-21 20:52 ` Thien-Thi Nguyen 0 siblings, 2 replies; 34+ messages in thread From: JD Smith @ 2006-07-21 17:34 UTC (permalink / raw) On Fri, 21 Jul 2006 08:55:57 -0400, Stefan Monnier wrote: >> xml-parse-file now includes text properties in its returned list, ala: > >> ((name . #("WV_APPLET" 0 9 (fontified nil))) (link . #("WV_APPLET.html" >> 0 14 (fontified nil)))) > >> when global-font-lock-mode is on, whereas before it did not. Was this >> intended? Any way to temporarily avoid fontification on loaded buffers >> (aside from turning global-font-lock-mode off prior to xml-parse-file)? > > Could you explain why the text properties cause problems? I'm parsing a very large XML file (a document link and calling syntax catalog for IDLWAVE), trimming it and making slight modifications, and then writing it out to file as a big set of sexp's for later recovery, primarily for reasons of speed. This file is read whenever IDLWAVE mode is first entered. With text properties (amounting simply to #(" " 0 5 (fontified nil))' constructs), the file is almost three times as large, erasing much of the speed advantage of translating to a LISP form in the first place. I use `prin1' to write the lists. I suppose I could spin through the list first and remove any text properties on strings, but it seems silly that parsing an XML file never loaded into an active buffer should be laden with inert properties like '(fontified nil). If there were a simple way to prevent that (other than turning global-font-lock off), that would suffice for my purposes, though not of course address the larger issue of text properties in XML parsed lists in general. JD ^ permalink raw reply [flat|nested] 34+ messages in thread
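The workaround JD mentions, spinning through the parsed list and
removing text properties before calling `prin1', could look roughly like
the sketch below. The function name is hypothetical; the only assumption
about the data is that it is a nested structure of conses, symbols, and
strings like the one shown at the start of the thread.

```elisp
(defun sketch-strip-properties (tree)
  "Return a copy of TREE with text properties removed from every string.
TREE is a nested structure of conses such as a parsed XML list.
Hypothetical helper for illustration; it is not part of xml.el."
  (cond
   ((stringp tree) (substring-no-properties tree))
   ((consp tree)
    (let (result)
      ;; Iterate along the spine so that long lists do not deepen
      ;; the recursion; only nested elements recurse.
      (while (consp tree)
        (push (sketch-strip-properties (car tree)) result)
        (setq tree (cdr tree)))
      ;; TREE is now the final cdr: nil for a proper list, or the
      ;; tail of a dotted pair such as (name . "WV_APPLET").
      (nconc (nreverse result) (sketch-strip-properties tree))))
   (t tree)))
```

Printing (sketch-strip-properties data) with `prin1' then writes plain
"WV_APPLET" strings instead of #("WV_APPLET" 0 9 (fontified nil))
constructs, at the cost of one extra pass over the list.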
* Re: xml-parse-file and text properties 2006-07-21 17:34 ` JD Smith @ 2006-07-21 20:22 ` Stefan Monnier 2006-07-21 21:50 ` JD Smith 2006-07-22 15:49 ` Richard Stallman 2006-07-21 20:52 ` Thien-Thi Nguyen 1 sibling, 2 replies; 34+ messages in thread From: Stefan Monnier @ 2006-07-21 20:22 UTC (permalink / raw) Cc: emacs-devel >>> xml-parse-file now includes text properties in its returned list, ala: >> >>> ((name . #("WV_APPLET" 0 9 (fontified nil))) (link . #("WV_APPLET.html" >>> 0 14 (fontified nil)))) >> >>> when global-font-lock-mode is on, whereas before it did not. Was this >>> intended? Any way to temporarily avoid fontification on loaded buffers >>> (aside from turning global-font-lock-mode off prior to xml-parse-file)? >> >> Could you explain why the text properties cause problems? > I'm parsing a very large XML file (a document link and calling syntax > catalog for IDLWAVE), trimming it and making slight modifications, and > then writing it out to file as a big set of sexp's for later recovery, > primarily for reasons of speed. This file is read whenever IDLWAVE > mode is first entered. > With text properties (amounting simply to #(" " 0 5 (fontified nil))' > constructs), the file is almost three times as large, erasing much of > the speed advantage of translating to a LISP form in the first place. > I use `prin1' to write the lists. I suppose I could spin through the > list first and remove any text properties on strings, but it seems > silly that parsing an XML file never loaded into an active buffer > should be laden with inert properties like '(fontified nil). If there > were a simple way to prevent that (other than turning global-font-lock > off), that would suffice for my purposes, though not of course address the > larger issue of text properties in XML parsed lists in general. Clearly, in the case of xml-parse-file, I see no reason why we shouldn't strip all properties. 
After all, it's supposed to parse the *file*, not the buffer, and files don't have those text properties. The argument that we need to preserve the `composition' property doesn't seem valid: this property can be computed from the sequence of chars. Stefan ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties 2006-07-21 20:22 ` Stefan Monnier @ 2006-07-21 21:50 ` JD Smith 2006-07-22 15:49 ` Richard Stallman 1 sibling, 0 replies; 34+ messages in thread From: JD Smith @ 2006-07-21 21:50 UTC (permalink / raw) On Fri, 21 Jul 2006 16:22:22 -0400, Stefan Monnier wrote: > Clearly, in the case of xml-parse-file, I see no reason why we shouldn't > strip all properties. After all, it's supposed to parse the *file*, not the > buffer, and files don't have those text properties. It parses the buffer directly if the file is already loaded, but I agree with your point. > The argument that we need to preserve the `composition' property doesn't > seem valid: this property can be computed from the sequence of chars. The remaining question is are there *any* xml- routines for which returning properties *is* desirable. Otherwise, replacing {buffer,match}-string with the -no-properties version throughout xml.el works. JD ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties 2006-07-21 20:22 ` Stefan Monnier 2006-07-21 21:50 ` JD Smith @ 2006-07-22 15:49 ` Richard Stallman 2006-07-24 1:51 ` Kenichi Handa 1 sibling, 1 reply; 34+ messages in thread From: Richard Stallman @ 2006-07-22 15:49 UTC (permalink / raw) Cc: emacs-devel, jdsmith The argument that we need to preserve the `composition' property doesn't seem valid: this property can be computed from the sequence of chars. Handa, is it true that the composition property can always be recomputed from the sequence of characters? ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties
  2006-07-22 15:49 ` Richard Stallman
@ 2006-07-24 1:51 ` Kenichi Handa
  2006-07-24 3:17   ` Stefan Monnier
  2006-07-24 18:22   ` Richard Stallman
  0 siblings, 2 replies; 34+ messages in thread
From: Kenichi Handa @ 2006-07-24 1:51 UTC (permalink / raw)
  Cc: jdsmith, monnier, emacs-devel

In article <E1G4Jjd-0004fU-D5@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

>     The argument that we need to preserve the `composition' property doesn't
>     seem valid: this property can be computed from the sequence of chars.

> Handa, is it true that the composition property can always be recomputed
> from the sequence of characters?

Usually yes, but not always.

All normal compositions (i.e. compositions for displaying a
script in the correct way) are registered in
composition-function-table (a char-table). So, by looking
up the table for each character, we can recover compositions
(though very slowly).

But, it's possible to manually compose some text. For
instance, please try this:

(insert (compose-string "+o" 0 2 '(?+ (tc . bc) ?o)))

It inserts "+o" with a composition property that can't be
recovered automatically once lost.

---
Kenichi Handa
handa@m17n.org
* Re: xml-parse-file and text properties 2006-07-24 1:51 ` Kenichi Handa @ 2006-07-24 3:17 ` Stefan Monnier 2006-07-24 4:36 ` Kenichi Handa 2006-07-24 18:22 ` Richard Stallman 1 sibling, 1 reply; 34+ messages in thread From: Stefan Monnier @ 2006-07-24 3:17 UTC (permalink / raw) Cc: jdsmith, rms, emacs-devel > (insert (compose-string "+o" 0 2 '(?+ (tc . bc) ?o))) > It inserts "+o" with composition property that can't be > recovered automatically once lost. But that's a hypothetical example. Stefan ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties 2006-07-24 3:17 ` Stefan Monnier @ 2006-07-24 4:36 ` Kenichi Handa 0 siblings, 0 replies; 34+ messages in thread From: Kenichi Handa @ 2006-07-24 4:36 UTC (permalink / raw) Cc: emacs-devel, rms, jdsmith In article <jwv64hnwv4k.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes: >> (insert (compose-string "+o" 0 2 '(?+ (tc . bc) ?o))) >> It inserts "+o" with composition property that can't be >> recovered automatically once lost. > But that's a hypothetical example. Yes. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties 2006-07-24 1:51 ` Kenichi Handa 2006-07-24 3:17 ` Stefan Monnier @ 2006-07-24 18:22 ` Richard Stallman 2006-07-24 20:38 ` Stuart D. Herring 2006-07-24 20:51 ` Stefan Monnier 1 sibling, 2 replies; 34+ messages in thread From: Richard Stallman @ 2006-07-24 18:22 UTC (permalink / raw) Cc: jdsmith, monnier, emacs-devel Usually yes, but not always. All normal compositions (i.e. compositions for displaying a script in the correct way) are registered in composition-function-table (a char-table). So, by looking up the table for each character, we can recover compositions (though very slow). Interesting. But, it's possible to manually compose some text. We can consider that improper use of the composition property; we need not cater to it. That leaves us in a peculiar in-between situation. If discarding these properties altered what text would be written in a file, it would be very bad, and it would be clear we need to preserve these properties. But that is not the case. Nonetheless, the text will display wrong if subsequently reinserted in a buffer. All in all, I think that is enough reason for xml.el to preserve the composition property. So the question is how. One first step would be to write a function that operates on a string and discards all text properties except composition. More generally, all except a certain specified list of property names. I think that should be done at the C level for speed. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties
  2006-07-24 18:22 ` Richard Stallman
@ 2006-07-24 20:38 ` Stuart D. Herring
  2006-07-25 3:09   ` Richard Stallman
  2006-07-24 20:51 ` Stefan Monnier
  1 sibling, 1 reply; 34+ messages in thread
From: Stuart D. Herring @ 2006-07-24 20:38 UTC (permalink / raw)
  Cc: emacs-devel

> One first step would be to write a function that operates on a string
> and discards all text properties except composition. More generally,
> all except a certain specified list of property names.

It seems cleaner to me to define a property symbol (like 'composition or
'fontified) as "necessary" to the text if it (the symbol) has a non-nil
value for one of its properties -- say, 'necessary, or 'textual. Then
there could be `strip-superficial-properties' that removed text properties
whose names lacked the property.

I think this would be nearly as fast as having a list to retain, but it
would break for text properties whose names aren't symbols. It seems to me,
though, that anything so important as to need retention would be important
enough to warrant a symbol for its name.

Davis

--
This product is sold by volume, not by mass. If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.
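No implementation of `strip-superficial-properties' appears in the
thread. The following is a minimal sketch of the idea under stated
assumptions: the marker on the property symbol is called `textual' (the
message leaves the choice between `necessary' and `textual' open), the
function operates on a string, and the `sketch-' prefix marks it as
hypothetical.

```elisp
;; Assumption: mark `composition' as essential to the text by giving
;; the symbol a non-nil `textual' property.
(put 'composition 'textual t)

(defun sketch-strip-superficial-properties (string)
  "Return a copy of STRING keeping only text properties marked `textual'.
A text property is kept when its name is a symbol whose `textual'
symbol property is non-nil.  Hypothetical helper for illustration."
  (let ((copy (copy-sequence string))
        (pos 0)
        (len (length string)))
    ;; Visit each run of identical properties in the copy.
    (while (< pos len)
      (let ((next (or (next-property-change pos copy) len))
            (plist (text-properties-at pos copy))
            drop)
        ;; Collect the names that are not marked as textual.
        (while plist
          (let ((prop (car plist)))
            (unless (and (symbolp prop) (get prop 'textual))
              (push prop drop)))
          (setq plist (cddr plist)))
        (when drop
          (remove-list-of-text-properties pos next drop copy))
        (setq pos next)))
    copy))
```

Compared with passing an explicit list of names to keep, this puts the
decision on the property symbol itself, so every caller treats a given
property the same way. Property names that are not symbols are simply
dropped here, which matches the limitation noted in the message.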
* Re: xml-parse-file and text properties 2006-07-24 20:38 ` Stuart D. Herring @ 2006-07-25 3:09 ` Richard Stallman 2006-07-25 14:00 ` Stefan Monnier 0 siblings, 1 reply; 34+ messages in thread From: Richard Stallman @ 2006-07-25 3:09 UTC (permalink / raw) Cc: emacs-devel It seems cleaner to me to define a property symbol (like `composition' or `fontified') as "necessary" to the text if it (the symbol) has a non-nil value for one of its properties -- say, `necessary', or `textual'. Then there could be `strip-superficial-properties' that removed text properties whose names lacked the property. I think that is a good idea. Stefan convinced me that we do not need this now, but I think it would be useful to start writing it, for installation after the release. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties 2006-07-25 3:09 ` Richard Stallman @ 2006-07-25 14:00 ` Stefan Monnier 2006-07-25 22:15 ` Richard Stallman 0 siblings, 1 reply; 34+ messages in thread From: Stefan Monnier @ 2006-07-25 14:00 UTC (permalink / raw) Cc: emacs-devel > It seems cleaner to me to define a property symbol (like `composition' or > `fontified') as "necessary" to the text if it (the symbol) has a non-nil > value for one of its properties -- say, `necessary', or `textual'. Then > there could be `strip-superficial-properties' that removed text properties > whose names lacked the property. > I think that is a good idea. Maybe an approach along the lines of what we do with front-sticky would make sense, this way every particular use of a property can be separately marked as needing to be kept or not. Stefan ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties 2006-07-25 14:00 ` Stefan Monnier @ 2006-07-25 22:15 ` Richard Stallman 0 siblings, 0 replies; 34+ messages in thread From: Richard Stallman @ 2006-07-25 22:15 UTC (permalink / raw) Cc: emacs-devel Maybe an approach along the lines of what we do with front-sticky would make sense, this way every particular use of a property can be separately marked as needing to be kept or not. I would not object to it, but I don't think this complexity is needed. I think that any given property will always want the same treatment here. If we find a case where that is not so, we could still add this extra complexity then. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties 2006-07-24 18:22 ` Richard Stallman 2006-07-24 20:38 ` Stuart D. Herring @ 2006-07-24 20:51 ` Stefan Monnier 2006-07-25 3:09 ` Richard Stallman 1 sibling, 1 reply; 34+ messages in thread From: Stefan Monnier @ 2006-07-24 20:51 UTC (permalink / raw) Cc: emacs-devel, jdsmith, Kenichi Handa > One first step would be to write a function that operates on a string > and discards all text properties except composition. More generally, > all except a certain specified list of property names. > I think that should be done at the C level for speed. Note that this won't be useful for Emacs-23 any more. I'm not sure it's worth the trouble. After all, it's not specific to xml.el, far from that. Are we going to go through every single use of buffer-substring and buffer-substring-no-properties and see what should be done for each of them? Stripping the `display' property may also result in text that's saved correctly but displayed incorrectly. Even `face' can do that (e.g. with X-Symbol). I don't think we want to go down that road right now. Stefan ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties 2006-07-24 20:51 ` Stefan Monnier @ 2006-07-25 3:09 ` Richard Stallman 0 siblings, 0 replies; 34+ messages in thread From: Richard Stallman @ 2006-07-25 3:09 UTC (permalink / raw) Cc: emacs-devel, jdsmith, handa Note that this won't be useful for Emacs-23 any more. I'm not sure it's worth the trouble. After all, it's not specific to xml.el, far from that. Are we going to go through every single use of buffer-substring and buffer-substring-no-properties and see what should be done for each of them? Ok, I agree. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties 2006-07-21 17:34 ` JD Smith 2006-07-21 20:22 ` Stefan Monnier @ 2006-07-21 20:52 ` Thien-Thi Nguyen 2006-07-21 21:45 ` JD Smith 1 sibling, 1 reply; 34+ messages in thread From: Thien-Thi Nguyen @ 2006-07-21 20:52 UTC (permalink / raw) Cc: emacs-devel JD Smith <jdsmith@as.arizona.edu> writes: > I'm parsing a very large XML file > [...] > never loaded into an active buffer do you use `xml-parse-file' w/ a FILE that refers to a file not currently visited (in a buffer)? looks like `xml-parse-file' uses `find-file' to handle that case. maybe something like the following would give better results? thi ________________________________________________________ *** xml.el 6 Feb 2006 14:33:36 -0000 1.53 --- xml.el 21 Jul 2006 20:46:44 -0000 *************** *** 170,177 **** (progn (set-buffer (get-file-buffer file)) (setq keep (point))) ! (let (auto-mode-alist) ; no need for xml-mode ! (find-file file))) (let ((xml (xml-parse-region (point-min) (point-max) --- 170,177 ---- (progn (set-buffer (get-file-buffer file)) (setq keep (point))) ! (set-buffer (generate-new-buffer " *xml work*")) ! (insert-file-contents file)) (let ((xml (xml-parse-region (point-min) (point-max) ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties 2006-07-21 20:52 ` Thien-Thi Nguyen @ 2006-07-21 21:45 ` JD Smith 2006-07-22 9:15 ` Eli Zaretskii 0 siblings, 1 reply; 34+ messages in thread From: JD Smith @ 2006-07-21 21:45 UTC (permalink / raw) On Fri, 21 Jul 2006 16:52:37 -0400, Thien-Thi Nguyen wrote: > JD Smith <jdsmith@as.arizona.edu> writes: > >> I'm parsing a very large XML file >> [...] >> never loaded into an active buffer > > do you use `xml-parse-file' w/ a FILE that refers to a file not currently > visited (in a buffer)? looks like `xml-parse-file' uses `find-file' to > handle that case. maybe something like the following would give better > results? This is similar to the improved method Richard proposed yesterday (insert-file-contents using a temporary buffer). Remarkably enough (and highly surprising to me), even inserting the XML file contents in a temporary buffer is enough to get '(fontified nil) text properties added all over, which xml-parse-file dutifully returns (unless modified with *-no-properties). JD ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties 2006-07-21 21:45 ` JD Smith @ 2006-07-22 9:15 ` Eli Zaretskii 2006-07-24 16:44 ` JD Smith 0 siblings, 1 reply; 34+ messages in thread From: Eli Zaretskii @ 2006-07-22 9:15 UTC (permalink / raw) Cc: emacs-devel > From: JD Smith <jdsmith@as.arizona.edu> > Date: Fri, 21 Jul 2006 14:45:27 -0700 > > This is similar to the improved method Richard proposed yesterday > (insert-file-contents using a temporary buffer). Remarkably enough > (and highly surprising to me), even inserting the XML file contents in > a temporary buffer is enough to get '(fontified nil) text properties > added all over I don't see anything surprising here, since font-lock is now ON by default. You should be able to overcome this if you turn off font-lock-mode in the temporary buffer, before inserting the file's contents. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties 2006-07-22 9:15 ` Eli Zaretskii @ 2006-07-24 16:44 ` JD Smith 2006-07-25 16:05 ` JD Smith 0 siblings, 1 reply; 34+ messages in thread From: JD Smith @ 2006-07-24 16:44 UTC (permalink / raw) On Sat, 22 Jul 2006 12:15:34 +0300, Eli Zaretskii wrote: >> From: JD Smith <jdsmith@as.arizona.edu> >> Date: Fri, 21 Jul 2006 14:45:27 -0700 >> >> This is similar to the improved method Richard proposed yesterday >> (insert-file-contents using a temporary buffer). Remarkably enough >> (and highly surprising to me), even inserting the XML file contents in >> a temporary buffer is enough to get '(fontified nil) text properties >> added all over > > I don't see anything surprising here, since font-lock is now ON by > default. > > You should be able to overcome this if you turn off font-lock-mode in > the temporary buffer, before inserting the file's contents. This was my mistake. In fact, with Richard's formulation: (with-temp-buffer (insert-file-contents file) (xml-parse-region (point-min) (point-max) (current-buffer) parse-dtd parse-ns)))) no font-lock text properties ever get added to the temporary buffer, despite global-font-lock being on. It turns out I was pre-loading the file into a buffer to prevent warnings about its read-only status, so this code path was not being taken. With the patch below, xml-parse-file only returns unwanted text properties when a file is already loaded into a buffer. Should we install it? This doesn't address the larger issue of whether xml-parse-file should ever return text-properties, but it is simple and sensible. JD *** xml.el 06 Feb 2006 07:33:36 -0700 1.53 --- xml.el 24 Jul 2006 09:40:07 -0700 *************** *** 161,187 **** ;;;###autoload (defun xml-parse-file (file &optional parse-dtd parse-ns) "Parse the well-formed XML file FILE. ! If FILE is already visited, use its buffer and don't kill it. ! Returns the top node with all its children. ! 
If PARSE-DTD is non-nil, the DTD is parsed rather than skipped. ! If PARSE-NS is non-nil, then QNAMES are expanded." ! (let ((keep)) ! (if (get-file-buffer file) ! (progn ! (set-buffer (get-file-buffer file)) ! (setq keep (point))) ! (let (auto-mode-alist) ; no need for xml-mode ! (find-file file))) ! ! (let ((xml (xml-parse-region (point-min) ! (point-max) ! (current-buffer) ! parse-dtd parse-ns))) ! (if keep ! (goto-char keep) ! (kill-buffer (current-buffer))) ! xml))) ! (defvar xml-name-re) (defvar xml-entity-value-re) --- 161,182 ---- ;;;###autoload (defun xml-parse-file (file &optional parse-dtd parse-ns) "Parse the well-formed XML file FILE. ! If FILE is already visited, use its buffer and don't kill it. Returns the ! top node with all its children. If PARSE-DTD is non-nil, the DTD is parsed ! rather than skipped. If PARSE-NS is non-nil, then QNAMES are expanded." ! (if (get-file-buffer file) ! (with-current-buffer (get-file-buffer file) ! (save-excursion ! (xml-parse-region (point-min) ! (point-max) ! (current-buffer) ! parse-dtd parse-ns))) ! (with-temp-buffer ! (insert-file-contents file) ! (xml-parse-region (point-min) ! (point-max) ! (current-buffer) ! parse-dtd parse-ns)))) (defvar xml-name-re) (defvar xml-entity-value-re) ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties 2006-07-24 16:44 ` JD Smith @ 2006-07-25 16:05 ` JD Smith 2006-07-25 16:27 ` Stefan Monnier 0 siblings, 1 reply; 34+ messages in thread From: JD Smith @ 2006-07-25 16:05 UTC (permalink / raw) Just for completeness: Richard checked in the changes below, and they have the effect of avoiding the unwanted text properties for parsed files not already loaded in a buffer. > *** xml.el 06 Feb 2006 07:33:36 -0700 1.53 > --- xml.el 24 Jul 2006 09:40:07 -0700 > *************** > *** 161,187 **** > ;;;###autoload > (defun xml-parse-file (file &optional parse-dtd parse-ns) > "Parse the well-formed XML file FILE. > ! If FILE is already visited, use its buffer and don't kill it. > ! Returns the top node with all its children. > ! If PARSE-DTD is non-nil, the DTD is parsed rather than skipped. > ! If PARSE-NS is non-nil, then QNAMES are expanded." > ! (let ((keep)) > ! (if (get-file-buffer file) > ! (progn > ! (set-buffer (get-file-buffer file)) > ! (setq keep (point))) > ! (let (auto-mode-alist) ; no need for xml-mode > ! (find-file file))) > ! > ! (let ((xml (xml-parse-region (point-min) > ! (point-max) > ! (current-buffer) > ! parse-dtd parse-ns))) > ! (if keep > ! (goto-char keep) > ! (kill-buffer (current-buffer))) > ! xml))) > ! > > (defvar xml-name-re) > (defvar xml-entity-value-re) > --- 161,182 ---- > ;;;###autoload > (defun xml-parse-file (file &optional parse-dtd parse-ns) > "Parse the well-formed XML file FILE. > ! If FILE is already visited, use its buffer and don't kill it. Returns the > ! top node with all its children. If PARSE-DTD is non-nil, the DTD is parsed > ! rather than skipped. If PARSE-NS is non-nil, then QNAMES are expanded." > ! (if (get-file-buffer file) > ! (with-current-buffer (get-file-buffer file) > ! (save-excursion > ! (xml-parse-region (point-min) > ! (point-max) > ! (current-buffer) > ! parse-dtd parse-ns))) > ! (with-temp-buffer > ! (insert-file-contents file) > ! (xml-parse-region (point-min) > ! 
(point-max) > ! (current-buffer) > ! parse-dtd parse-ns)))) > > (defvar xml-name-re) > (defvar xml-entity-value-re) ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties 2006-07-25 16:05 ` JD Smith @ 2006-07-25 16:27 ` Stefan Monnier 2006-07-25 19:16 ` JD Smith 0 siblings, 1 reply; 34+ messages in thread From: Stefan Monnier @ 2006-07-25 16:27 UTC (permalink / raw) Cc: emacs-devel > Just for completeness: Richard checked in the changes below, and they > have the effect of avoiding the unwanted text properties for parsed > files not already loaded in a buffer. Getting different results depending on whether or not the user happens to be visiting the buffer is hardly a feature. Stefan ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xml-parse-file and text properties 2006-07-25 16:27 ` Stefan Monnier @ 2006-07-25 19:16 ` JD Smith 0 siblings, 0 replies; 34+ messages in thread From: JD Smith @ 2006-07-25 19:16 UTC (permalink / raw) Cc: emacs-devel On Tue, 2006-07-25 at 12:27 -0400, Stefan Monnier wrote: > > Just for completeness: Richard checked in the changes below, and they > > have the effect of avoiding the unwanted text properties for parsed > > files not already loaded in a buffer. > > Getting different results depending on whether or not the user happens to > be visiting the buffer is hardly a feature. Agreed, but of the misfeatures, returning text properties when parsing a file which has never even been visited is the more odious, in my opinion. ^ permalink raw reply [flat|nested] 34+ messages in thread
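For the remaining case, where the file is already visited and the returned strings may still carry properties such as (fontified nil), a caller can strip them from the parsed tree after the fact. A minimal sketch, assuming the usual xml.el node shape of (NAME ATTRIBUTE-ALIST CHILDREN...); `my-xml-strip-properties` is a hypothetical helper and not part of xml.el:

```elisp
;; Hypothetical helper, not part of xml.el.
(defun my-xml-strip-properties (tree)
  "Return a copy of TREE with text properties removed from every string.
TREE may be any nesting of proper lists and dotted pairs, such as the
value returned by `xml-parse-file'.  TREE itself is not modified."
  (cond
   ((stringp tree) (substring-no-properties tree))
   ((consp tree)
    (let ((acc nil))
      ;; Walk along the list iteratively so that a long list of
      ;; children does not deepen the recursion.
      (while (consp tree)
        (push (my-xml-strip-properties (car tree)) acc)
        (setq tree (cdr tree)))
      ;; TREE is now the final cdr: nil for a proper list, or the
      ;; value half of a dotted pair such as (name . "value").
      (nconc (nreverse acc) (my-xml-strip-properties tree))))
   (t tree)))
```

A call such as (my-xml-strip-properties (xml-parse-file file)) would then give property-free strings whether or not the file happened to be visited, at the cost of copying the whole tree.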