* nXML gags on XML with one long line
@ 2014-12-24 9:02 Boylan, Ross
2014-12-24 11:41 ` Yuri Khan
2014-12-24 16:28 ` Eli Zaretskii
0 siblings, 2 replies; 4+ messages in thread
From: Boylan, Ross @ 2014-12-24 9:02 UTC (permalink / raw)
To: help-gnu-emacs@gnu.org
I opened a .xml file that was relatively large in GNU Emacs 23.4.1 (x86_64-pc-linux-gnu, GTK+ Version 2.24.10)
of 2012-09-08 on trouble, modified by Debian. This opened in nXML mode, but started using up all the CPU (I think after I asked it to use outline mode) and became unresponsive. Before that it showed a message saying the file was 87% validated (very roughly--from memory) for quite a while (a minute?), with low CPU use. It did eventually show as completely validated.
The xml file is one long block of text with no whitespace between entries; in fact, there isn't even a line break at the end of the file. Some of the nXML documentation, specifically on paragraphs, refers to identifying paragraphs by line breaks. Perhaps nXML can't cope with files without newlines?
At any rate, any suggestions for how to deal with the file, or at least debug the problem?
$ wc KHC-Endnote2.xml # the file referred to above
0 344068 4883375 KHC-Endnote2.xml
The file was exported from Endnote in its XML format.
Thanks.
Ross Boylan
P.S. I was able to change to text mode, but that isn't too helpful since the thing is just one long string.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: nXML gags on XML with one long line
2014-12-24 9:02 nXML gags on XML with one long line Boylan, Ross
@ 2014-12-24 11:41 ` Yuri Khan
2014-12-24 16:28 ` Eli Zaretskii
1 sibling, 0 replies; 4+ messages in thread
From: Yuri Khan @ 2014-12-24 11:41 UTC (permalink / raw)
To: Boylan, Ross; +Cc: help-gnu-emacs@gnu.org
On Wed, Dec 24, 2014 at 3:02 PM, Boylan, Ross <Ross.Boylan@ucsf.edu> wrote:
> any suggestions for how to deal with the file, or at least debug the problem?
>
> $ wc KHC-Endnote2.xml # the file referred to above
> 0 344068 4883375 KHC-Endnote2.xml
If that particular xml schema is not whitespace-sensitive, you might
work around the problem by pretty-printing the file first.
$ xmlstarlet format KHC-Endnote2.xml > KHC-Endnote2.formatted.xml
This will add plenty of newlines for nXML to rest upon.
Turning off global-font-lock-mode (before opening the file) may also help.
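
If xmlstarlet is not installed, roughly the same reformatting can be sketched with Python's standard library. This is only a minimal sketch, under the same assumption that the schema is not whitespace-sensitive; the function names `pretty_print_xml` and `reformat_file` and the two-space indent are illustrative choices, not anything provided by EndNote or nXML.

```python
import xml.dom.minidom


def pretty_print_xml(data, indent="  "):
    """Re-serialize XML (str or bytes) with line breaks and indentation.

    Assumes whitespace between elements is not significant in the schema.
    The whole document is parsed into memory, which should be acceptable
    for a file of a few megabytes.
    """
    dom = xml.dom.minidom.parseString(data)
    return dom.toprettyxml(indent=indent)


def reformat_file(src, dst):
    """Read the XML file `src` and write a pretty-printed copy to `dst`."""
    # Read bytes so the parser can honor the file's own encoding declaration.
    with open(src, "rb") as f:
        pretty = pretty_print_xml(f.read())
    with open(dst, "w", encoding="utf-8") as f:
        f.write(pretty)
```

For example, `reformat_file("KHC-Endnote2.xml", "KHC-Endnote2.formatted.xml")` would write a copy with many short lines that can then be opened in nXML.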
* Re: nXML gags on XML with one long line
2014-12-24 9:02 nXML gags on XML with one long line Boylan, Ross
2014-12-24 11:41 ` Yuri Khan
@ 2014-12-24 16:28 ` Eli Zaretskii
2014-12-24 20:08 ` Boylan, Ross
1 sibling, 1 reply; 4+ messages in thread
From: Eli Zaretskii @ 2014-12-24 16:28 UTC (permalink / raw)
To: help-gnu-emacs
> From: "Boylan, Ross" <Ross.Boylan@ucsf.edu>
> Date: Wed, 24 Dec 2014 09:02:43 +0000
>
> I opened a .xml file that was relatively large in GNU Emacs 23.4.1 (x86_64-pc-linux-gnu, GTK+ Version 2.24.10)
> of 2012-09-08 on trouble, modified by Debian. This opened in nXML mode, but started using up all the CPU (I think after I asked it to use outline mode) and became unresponsive. Before that it showed a message saying the file was 87% validated (very roughly--from memory) for quite awhile (a minute?), with low CPU use. It did eventually show as completely validated.
>
> The xml file is one long block of text with no whitespace between entries; in fact, there isn't even a line break at the end of the file. Some of the nXML documentation, specifically on paragraphs, refers to identifying paragraphs by line breaks. Perhaps nXML can't cope with files without newlines?
How long is that single long line? Emacs has known problems with
displaying very long lines (like tens of thousands of characters).
If you visit the file in Fundamental mode, does the problem go away?
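
One way to answer the line-length question without opening the file in Emacs is a small script like the one below. It is a hedged sketch; the function name `longest_line_length` and the 1 MiB chunk size are arbitrary choices made for illustration. It reads the file in chunks and reports the length in bytes of the longest line, so it does not depend on the file containing any newline at all.

```python
def longest_line_length(path, chunk_size=1 << 20):
    """Return the length in bytes of the longest line in the file at `path`.

    Newline bytes are not counted. A file with no newline at all counts
    as a single line as long as the whole file.
    """
    longest = 0
    current = 0  # length of the line currently being scanned
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            parts = chunk.split(b"\n")
            # The first piece continues the line carried over from the
            # previous chunk; every later piece starts a new line.
            current += len(parts[0])
            for part in parts[1:]:
                longest = max(longest, current)
                current = len(part)
    return max(longest, current)
```

For the file in the original report, where wc shows 0 newlines and 4883375 bytes, the longest line is the entire file.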
* RE: nXML gags on XML with one long line
2014-12-24 16:28 ` Eli Zaretskii
@ 2014-12-24 20:08 ` Boylan, Ross
0 siblings, 0 replies; 4+ messages in thread
From: Boylan, Ross @ 2014-12-24 20:08 UTC (permalink / raw)
To: Eli Zaretskii, help-gnu-emacs@gnu.org
The file is almost 5 million characters; the output of wc at the bottom of my original message has the exact figures, as well as the word count.
In text mode Emacs didn't chew up the CPU and was responsive; as I said, the result wasn't very useful to me.
Ross