unofficial mirror of emacs-devel@gnu.org 
* Suggestion / feature request
From: Carlos Aguilar @ 2012-04-19 13:28 UTC
  To: emacs-devel

Hi everybody,
I am a heavy user of doc-view mode, and of Emacs more generally, so first
of all, many thanks to all the developers for their contributions to this
unique piece of software.

I am writing to the list to ask whether an idea I have had could be
implemented and whether it would be useful.

I often use doc-view mode with medium to large pdf/ps/dvi files when
writing or modifying LaTeX documents. Whether these are slides or a
document with or without chapters, my modifications usually change only
a small number of pages. This is almost always the case for slides, as
they are independent. In documents, the changes are typically on the
last page for short documents, or between the point where I am editing
and the end of the current chapter. As the conversion process creates an
image for every page on each iteration, most of the processing is useless
(it regenerates images of pages that already exist).

I wondered whether it would be possible to keep a set of signatures of
the processed pdf/ps/dvi pages, so that unchanged pages are not
reconverted into (already existing) bitmap images.
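
A minimal Python sketch of the bookkeeping such signatures would need. It assumes each page's raw content can already be obtained as bytes (that extraction step is the hard part and is not shown), and the function names are hypothetical, not part of doc-view:

```python
import hashlib
from typing import Dict, List, Sequence, Tuple


def page_signature(page_bytes: bytes) -> str:
    """Signature of one page's raw content (SHA-256 hex digest)."""
    return hashlib.sha256(page_bytes).hexdigest()


def plan_reconversion(
    old_signatures: Sequence[str], new_pages: Sequence[bytes]
) -> Tuple[List[str], Dict[int, int], List[int]]:
    """Hypothetical helper: decide which pages of a new revision need converting.

    Returns (new_signatures, reuse, to_convert): `reuse` maps a 1-based page
    number of the new document to the old page whose bitmap can be reused,
    and `to_convert` lists the pages that need a fresh image.
    """
    # First old page carrying each signature.
    old_index: Dict[str, int] = {}
    for number, sig in enumerate(old_signatures, start=1):
        old_index.setdefault(sig, number)

    new_signatures = [page_signature(p) for p in new_pages]
    reuse: Dict[int, int] = {}
    to_convert: List[int] = []
    for number, sig in enumerate(new_signatures, start=1):
        if sig in old_index:
            reuse[number] = old_index[sig]   # unchanged page, possibly moved
        else:
            to_convert.append(number)        # new or modified page
    return new_signatures, reuse, to_convert
```

Keying the lookup on signatures rather than page numbers means a page that only moved (say, after a slide is inserted earlier) would still reuse its old image.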

Best,

Carlos Aguilar

* Re: Suggestion / feature request
From: Tassilo Horn @ 2012-04-19 14:17 UTC
  To: Carlos Aguilar; +Cc: emacs-devel

Carlos Aguilar <carlos.aguilar@unilim.fr> writes:

Hi Carlos,

> I often use doc-view mode with medium to large pdf/ps/dvi files, when
> writing/modifying latex documents.

Yes, frequently changing documents are clearly not the prime use-case
for doc-view, exactly because of the reasons you mention.  Do you use
AUCTeX for writing your documents?  If yes, then preview-latex might be
exactly what you need.

,----[ (info "(preview-latex)Top") ]
|    preview-latex is a package embedding preview fragments into Emacs
| source buffers under the AUCTeX editing environment for LaTeX.  It uses
| `preview.sty' for the extraction of certain environments (most notably
| displayed formulas).  Other applications of this style file are
| possible and exist.
`----

> I wondered if it would be possible to keep a set of signatures of the
> pdf/ps/dvi pages processed so that those that are unchanged are not
> reconverted to (already existing) bitmap images.

Um, I have no idea how to do that.  Doc-view only knows the PNG images
generated from the original document, and you can't compare those with
pages in the document.

Well, it also has the old document's contents in the current buffer and
the updated document is on the file system, so in theory it could also
compare the documents.  But I have no clue how to do that.  Googling
around, I've found http://www.qtrac.eu/comparepdf.html, but I'm not sure
if it does the trick.  (Oh, and of course if the comparison of the docs
is not significantly cheaper than a reconversion, there's no sense in
doing so. ;-))

Bye,
Tassilo



* Re: Suggestion / feature request
From: Carlos Aguilar @ 2012-04-19 16:36 UTC
  To: Tassilo Horn; +Cc: emacs-devel

On 19/04/2012 16:17, Tassilo Horn wrote:
> Carlos Aguilar <carlos.aguilar@unilim.fr> writes:
>
> Hi Carlos,
>
>> I often use doc-view mode with medium to large pdf/ps/dvi files, when
>> writing/modifying latex documents.
> Yes, frequently changing documents are clearly not the prime use-case
> for doc-view, exactly because of the reasons you mention.  Do you use
> AUCTeX for writing your documents?  If yes, then preview-latex might be
> exactly what you need.
>
> ,----[ (info "(preview-latex)Top") ]
> |    preview-latex is a package embedding preview fragments into Emacs
> | source buffers under the AUCTeX editing environment for LaTeX.  It uses
> | `preview.sty' for the extraction of certain environments (most notably
> | displayed formulas).  Other applications of this style file are
> | possible and exist.
> `----
>
Hmm, I suppose I am a bit of a maniac: I feel really uncomfortable if I
don't have the real PDF in front of me and can't go back and forth to
check the overall look of the document...
>> I wondered if it would be possible to keep a set of signatures of the
>> pdf/ps/dvi pages processed so that those that are unchanged are not
>> reconverted to (already existing) bitmap images.
> Um, I have no idea how to do that.  Doc-view only knows the PNG images
> generated from the original document, and you can't compare those with
> pages in the document.
>
> Well, it also has the old document's contents in the current buffer and
> the updated document is on the file system, so in theory it could also
> compare the documents.  But I have no clue how to do that.  Googling
> around, I've found http://www.qtrac.eu/comparepdf.html, but I'm not sure
> if it does the trick.  (Oh, and of course if the comparison of the docs
> is not significantly cheaper than a reconversion, there's no sense in
> doing so. ;-))
Of course it must be MUCH cheaper.

I have tried using pdftk burst (which almost instantly splits a PDF file
into a set of files with one page each) on two slightly different
versions of a large PDF. The idea was: if the single-page PDF files are
exactly the same whenever a page is unchanged, a simple checksum will do
the test (and be very significantly cheaper than a reconversion).
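
The checksum test described here could look like the following Python sketch. It assumes pdftk's default burst naming (pg_0001.pdf, pg_0002.pdf, ...); the helper names are illustrative only:

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def burst_checksums(directory: str) -> Dict[str, str]:
    """Hypothetical helper: map each burst page file to its SHA-256 digest."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(directory).glob("pg_*.pdf"))
    }


def changed_burst_pages(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Page files whose checksum differs, including pages added or removed."""
    return sorted(
        name for name in old.keys() | new.keys() if old.get(name) != new.get(name)
    )
```

For example, `changed_burst_pages(burst_checksums("file1_burst"), burst_checksums("file2_burst"))`. As the next paragraph reports, unchanged pages do not actually come out byte-identical, so this strict form of the test is not enough on its own.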

The resulting single-page files are similar, but not identical, for
each page. Maybe there is a workaround with pdftk or directly with gs...
but before going further, the question is:
Would you change doc-view if it were possible to find a (simple enough)
way to obtain the functionality I am talking about? Or do you think
this is uninteresting/dangerous/out of scope?

cheers,

Carlos


> Bye,
> Tassilo

* Re: Suggestion / feature request
From: Carlos Aguilar @ 2012-04-19 17:15 UTC
  To: Tassilo Horn; +Cc: emacs-devel

I have found a simple test "by hand" that works. Initial conditions:
two files, file1.pdf and file2.pdf, with 101 pages and only a small
difference on page 85.

time (mkdir file1_burst; cd file1_burst; pdftk ../file1.pdf burst; cd ..
mkdir file2_burst; cd file2_burst; pdftk ../file2.pdf burst
# compare each page with its counterpart, hiding lines that differ only in
# font names/metadata and diff's own "85c85"/"---" markers
for I in pg*; do echo "$I"; diff -a "$I" ../file1_burst/"$I" | grep -v 'BaseFont\|FontName\|Creator\|Info\|...c...\|\-\-\-\|..c..'; done
cd .. ; rm -Rf file1_burst file2_burst)

pg_0001.pdf
pg_0002.pdf
pg_0003.pdf
pg_0004.pdf
pg_0005.pdf
.
.
.
pg_0084.pdf
pg_0085.pdf
< (binary content of the compressed page stream)
< 0000059304 00000 n
> 0000059305 00000 n
< 0000059402 00000 n
< 0000059466 00000 n
< 0000059513 00000 n
> 0000059403 00000 n
> 0000059467 00000 n
> 0000059514 00000 n
< 59672
> 59673
pg_0086.pdf
pg_0087.pdf
pg_0088.pdf
pg_0089.pdf
pg_0090.pdf
pg_0091.pdf
pg_0092.pdf
pg_0093.pdf
pg_0094.pdf
pg_0095.pdf
pg_0096.pdf
pg_0097.pdf
pg_0098.pdf
pg_0099.pdf
pg_0100.pdf
pg_0101.pdf


real    0m7.009s
user    0m6.160s
sys    0m0.210s


This can probably be optimized a lot. The only page that shows a
difference is the one that was changed. A full doc-view reconversion
takes 20 seconds on the same computer.
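
The `diff | grep` filter above could also be done in-process, without temporary diff output. This Python sketch assumes, like the grep command, that the only spurious differences between runs sit on lines mentioning BaseFont, FontName, Creator or Info, which is what this one test showed; it is a heuristic, not a PDF parser:

```python
import re

# Same keywords the `grep -v` filter discards: font subset names and
# document metadata that may change between runs even when the page does not.
VOLATILE = re.compile(rb"BaseFont|FontName|Creator|Info")


def normalized(page_pdf: bytes) -> bytes:
    """Drop volatile lines from a single-page PDF before comparing."""
    lines = re.split(rb"\r\n|\r|\n", page_pdf)
    return b"\n".join(line for line in lines if not VOLATILE.search(line))


def page_changed(old_page: bytes, new_page: bytes) -> bool:
    """True when the pages still differ after volatile lines are removed."""
    return normalized(old_page) != normalized(new_page)
```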


Cheers,

Carlos

* Re: Suggestion / feature request
From: Tassilo Horn @ 2012-04-19 18:59 UTC
  To: Carlos Aguilar; +Cc: emacs-devel

Carlos Aguilar <carlos.aguilar@unilim.fr> writes:

Hi Carlos,

> time (mkdir file1_burst; cd file1_burst; pdftk ../file1.pdf burst; cd ..
> mkdir file2_burst; cd file2_burst; pdftk ../file2.pdf burst
> for I in pg*; do echo $I; diff -a $I ../file1_burst/$I|grep -v 'BaseFont\|FontName\|Creator\|Info\|...c...\|\-\-\-\|..c..'; done
> cd .. ; rm -Rf file1_burst; rm -Rf file2_burst)
>
> pg_0085.pdf
> < (binary content of the compressed page stream)
> < 0000059304 00000 n
>> 0000059305 00000 n
> < 0000059402 00000 n
> < 0000059466 00000 n
> < 0000059513 00000 n
>> 0000059403 00000 n
>> 0000059467 00000 n
>> 0000059514 00000 n
> < 59672
>> 59673
> pg_0086.pdf
>
> real    0m7.009s
> user    0m6.160s
> sys    0m0.210s
>
> This can probably be optimized much. The only page for which there is
> a difference is the one with a change. Doc-view reconversion needs in
> the same computer 20 seconds.

Hm, seven seconds is not that fast compared to a complete reconversion
taking 20 seconds.  And if you edit one of the first pages of the TeX
document, it takes an additional 7 seconds just to find out that you
have to reconvert almost everything anyway.
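
This estimate can be written as a simple cost model. A Python sketch, assuming conversion time is spread evenly over pages and that the 7 s comparison is paid on every reload:

```python
def incremental_seconds(diff_s: float, full_s: float, pages: int, changed: int) -> float:
    """Time to detect the changed pages, then reconvert only those.

    Assumes conversion cost is uniform: full_s / pages per page.
    """
    return diff_s + changed * full_s / pages


def break_even_pages(diff_s: float, full_s: float, pages: int) -> float:
    """Number of changed pages at which both strategies take equally long."""
    return (full_s - diff_s) * pages / full_s
```

With the numbers from this thread, `incremental_seconds(7, 20, 101, 1)` is about 7.2 s against 20 s, and `break_even_pages(7, 20, 101)` is 65.65: once an edit near the start affects roughly 66 or more of the 101 pages, the comparison pass costs more than it saves.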

Also consider that you are most probably viewing page 85 when you
change it in the TeX document and recompile.  If doc-view used such a
diff command, it would take 7 seconds for it to know that it needs to
reconvert only page 85.  The current doc-view behavior is to start the
reconversion immediately, *with the current page first*.  So right now,
you see your changes on the current page almost instantly.
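
One way to express "current page first" as an ordering, sketched in Python. Only the first element reflects the behavior described here; the order of the remaining pages is an assumption:

```python
from typing import List


def conversion_order(current: int, total: int) -> List[int]:
    """Visible page first, then the pages after it, then those before it.

    Only the first position mirrors the behavior described above; the rest
    of the order is an assumed policy for illustration.
    """
    if not 1 <= current <= total:
        raise ValueError(f"page {current} is outside 1..{total}")
    return [current, *range(current + 1, total + 1), *range(1, current)]
```

Whatever the document length, the visible page then waits for a single page conversion (about 20 s / 101, roughly 0.2 s, if cost per page is uniform), instead of at least 7 s when a comparison pass must finish first.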

The many temporary files are a bit problematic, too...

Another problem is that while we can expect most (at least GNU/Linux)
machines to have Ghostscript installed, that's probably not the case for
pdftk.

But in general, if there were some command that, given two PDF files,
quickly tells me which pages differ, I'd of course try to make use of it
in doc-view as an optional feature.  But "quickly" means something
around one second for a 100-page document.
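
A sketch of how such an optional feature could be gated, in Python. The comparison tool is hypothetical (no such command is named in this thread), and the one-second budget comes from the paragraph above:

```python
import shutil
from typing import Optional

# Budget taken from the thread: about one second for a 100-page document.
DIFF_BUDGET_SECONDS = 1.0


def tool_available(name: str) -> bool:
    """True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


def choose_strategy(have_tool: bool, diff_seconds: Optional[float],
                    budget: float = DIFF_BUDGET_SECONDS) -> str:
    """Hypothetical policy: use the page diff only when it exists and is fast."""
    if not have_tool or diff_seconds is None or diff_seconds > budget:
        return "full"         # reconvert everything, current page first
    return "incremental"      # reconvert only the pages reported as changed
```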

Bye,
Tassilo



* Re: Suggestion / feature request
From: Stefan Monnier @ 2012-04-19 22:21 UTC
  To: Carlos Aguilar; +Cc: emacs-devel

> I often use doc-view mode with medium to large pdf/ps/dvi files, when
> writing/modifying latex documents. Whether these are slides or a document

Not sure what we can do for dvi/ps files, but I use doc-view for my
LaTeXing with pdflatex and have bumped into the same problem.

My solution was to change doc-view-mode to compute the PNG pages lazily,
so after the PDF file changes, auto-revert-mode causes the currently
displayed page to be recomputed while the rest are flushed (and
recomputed on the fly when you move forward/backward in the document).
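
The lazy scheme can be modeled as a small page cache. This Python sketch illustrates the behavior described, and is not Stefan's Elisp code; `render` stands for whatever converts one page to PNG:

```python
from typing import Callable, Dict


class LazyPageCache:
    """Hypothetical model: page images rendered on demand and flushed on revert."""

    def __init__(self, render: Callable[[int], bytes]) -> None:
        self._render = render              # converts one page to an image
        self._images: Dict[int, bytes] = {}

    def get(self, page: int) -> bytes:
        """Return the image for `page`, converting it only on first use."""
        if page not in self._images:
            self._images[page] = self._render(page)
        return self._images[page]

    def revert(self, current_page: int) -> bytes:
        """File changed on disk: flush every image, re-render only the visible one."""
        self._images.clear()
        return self.get(current_page)

    def render_all(self, total_pages: int) -> None:
        """Eager fallback: convert the whole document in one go."""
        for page in range(1, total_pages + 1):
            self.get(page)
```

Here `revert()` plays the role of what auto-revert-mode triggers, and `render_all()` that of the command forcing a full recomputation.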

It slows down movement in the document (though I also added a command
to force recomputing the whole document), but it is very worthwhile in
terms of battery power and machine load.

Tassilo (doc-view maintainer) wasn't too excited by my change because it
makes doc-view more complex and it only works for PDF.  It also has a few
other downsides: e.g. lazily converting pages means that we usually
don't know the total number of pages.  I partly addressed that by adding
a bit of Elisp code that parses the PDF to extract the page-count info,
but that only works for not-too-recent PDF versions, since more recent
ones compress that data and I haven't bothered to figure out how/if that
can be handled from Elisp.
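
A rough Python sketch of the page-count idea. In an uncompressed PDF, the root of the page tree is a /Type /Pages dictionary whose /Count is the total number of pages. The sketch has the same limitation described here: dictionaries stored inside compressed object streams (PDF 1.5 and later) are invisible to it. It can also be fooled by stale objects left behind by incremental updates:

```python
import re
from typing import List, Optional

PAGES_NODE = re.compile(rb"/Type\s*/Pages\b")   # page-tree node, not /Page
COUNT = re.compile(rb"/Count\s+(\d+)")


def pdf_page_count(data: bytes) -> Optional[int]:
    """Largest /Count among /Type /Pages objects, i.e. the root's page count.

    Returns None when no page-tree node is visible (e.g. compressed streams).
    """
    counts: List[int] = []
    for obj in data.split(b"endobj"):
        if PAGES_NODE.search(obj):
            counts += [int(n) for n in COUNT.findall(obj)]
    return max(counts) if counts else None
```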


        Stefan



* Re: Suggestion / feature request
From: Tassilo Horn @ 2012-04-20 18:15 UTC
  To: Stefan Monnier; +Cc: Carlos Aguilar, emacs-devel

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

>> I often use doc-view mode with medium to large pdf/ps/dvi files, when
>> writing/modifying latex documents. Whether these are slides or a
>> document
>
> Not sure what we can do for dvi/ps files, but I use doc-view for my
> LaTeXing with pdflatex and have bumped into the same problem.

My main reason for not using doc-view when writing documents is that it
doesn't (and cannot) support forward/backward search using SyncTeX.

> Tassilo (doc-view maintainer) wasn't too excited by my change because
> it makes doc-view more complex and it only works for pdf (and it has a
> few other downsides

Hey, that shouldn't stop you from anything. ;-)

Over time, I've received more mails from people using doc-view while
writing documents.  I always thought those were just a few weird souls,
but now we've reached critical mass, and I've come to agree that such a
lazy conversion fits the bill very well.  And of course, I wasn't really
up to implementing a feature I don't have a need for myself.  But since
you already have it and people request it, just go ahead for Emacs 24.2.

Bye,
Tassilo



* Re: Suggestion / feature request
From: Stefan Monnier @ 2012-04-20 20:16 UTC
  To: Tassilo Horn; +Cc: Carlos Aguilar, emacs-devel

> My main reason for not using doc-view when writing documents is that it
> doesn't (and cannot) support forward/backward search using SyncTeX.

Yes, it's far from perfect, but it has its advantages.

> up to implementing a feature I don't have a need for myself.  But since
> you already have it and people request it, just go ahead for Emacs 24.2.

Damn!  So now I have to go and clean up that hack!


        Stefan



* Re: Suggestion / feature request
From: Carlos Aguilar @ 2012-04-23 07:06 UTC
  To: Stefan Monnier; +Cc: Tassilo Horn, emacs-devel

Well,
I see you two are on it and will probably do a better job than I would.
I tried using gs the same way as pdftk. It works, but it is pretty slow:
since I launch gs separately for each page, it reads the whole file for
every page extraction, which makes the time to split the document...
quadratic in the number of pages!
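
Under the simplifying assumption that every gs launch parses the whole n-page file at a cost of roughly c per page (c is an illustrative constant, not a measurement), extracting the n pages with one launch per page compares with a single pass as follows:

```latex
T_{\text{per-page}}(n) = \sum_{k=1}^{n} c\,n = c\,n^{2},
\qquad
T_{\text{single}}(n) = c\,n,
\qquad
\frac{T_{\text{per-page}}(101)}{T_{\text{single}}(101)} = 101 .
```

For the 101-page test file, that is 10201 page parses instead of 101.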

I suppose you can launch gs just once and feed it a series of commands
to avoid this issue, but I haven't found out how yet. I also suppose
Stefan will use the function doc-view-pdf->png-1 for each page when doing
the lazy reconversion, which will also call gs once per page... If I
find a workaround, I will share it!
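
One possible workaround, sketched in Python: instead of one gs run per page, group the pages to be converted into contiguous runs and start one gs process per run, using Ghostscript's -dFirstPage/-dLastPage options. The helper names and output file names are made up, and the code only builds argument lists (each could be passed to subprocess.run):

```python
from typing import List, Sequence, Tuple


def page_runs(pages: Sequence[int]) -> List[Tuple[int, int]]:
    """Group page numbers into contiguous (first, last) runs."""
    runs: List[List[int]] = []
    for page in sorted(set(pages)):
        if runs and page == runs[-1][1] + 1:
            runs[-1][1] = page          # extends the current run
        else:
            runs.append([page, page])   # starts a new run
    return [(first, last) for first, last in runs]


def gs_png_commands(pdf: str, pages: Sequence[int], resolution: int = 100) -> List[List[str]]:
    """Hypothetical helper: one Ghostscript invocation per contiguous run."""
    commands = []
    for first, last in page_runs(pages):
        commands.append([
            "gs", "-dSAFER", "-dNOPAUSE", "-dBATCH",
            "-sDEVICE=png16m", f"-r{resolution}",
            f"-dFirstPage={first}", f"-dLastPage={last}",
            # Assumption: %d numbers the images of this run from 1, so image k
            # corresponds to PDF page first + k - 1.
            f"-sOutputFile=run{first}-page%d.png",
            pdf,
        ])
    return commands
```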

Anyway, lazy reconversion is fine for me, and I am happy to have been
useful, at least by counting as one among others :)

Thanks to both of you !

Carlos

