all messages for Emacs-related lists mirrored at yhetil.org
* md5 checksum of a img file, or get the value of 100th byte
@ 2008-06-05 11:29 Xah
  2008-06-05 19:17 ` Eli Zaretskii
       [not found] ` <mailman.12717.1212693595.18990.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 12+ messages in thread
From: Xah @ 2008-06-05 11:29 UTC (permalink / raw)
  To: help-gnu-emacs

I have a few thousand image files, and I need to compute a unique id for
each... something like an md5 checksum.

I noticed that elisp has an md5 function that does that, but it takes a
buffer as argument. Does this mean that I have to create a buffer
first by opening the image file? Would this be very inefficient, since
all I need is to build a hash table of the image files, with checksum
as id and file path(s) as value?
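
The hash table described here (checksum as key, file paths as value)
can be sketched outside Emacs as follows. This is a minimal sketch in
Python rather than elisp; the names `md5_of_file` and `index_by_md5`
and the 64 KiB chunk size are assumptions made for illustration.
Reading in chunks keeps memory use flat regardless of file size.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=64 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_md5(root):
    """Map each MD5 digest to the list of file paths under `root` that have it."""
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            index[md5_of_file(path)].append(path)
    return dict(index)
```

Any digest whose list holds more than one path marks files with
identical contents.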

Or is there some other way? Perhaps I can use the image file size
together with a list of byte values at several file stream locations
as the unique id... this should be sufficient for my purposes, but how
do I get the value of, say, the 1000th byte?
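
The cheaper id described here (file size plus a few sampled bytes) can
be sketched as follows. This is a minimal sketch in Python rather than
elisp; the names `byte_at` and `cheap_fingerprint` and the sample
offsets are assumptions made for illustration. Offsets are zero-based,
so the 1000th byte sits at offset 999. Unlike a checksum, two different
files can share this fingerprint, so it is only a heuristic.

```python
import os


def byte_at(path, offset):
    """Return the byte value (0-255) at zero-based `offset`, or None if past EOF."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(1)
    return data[0] if data else None


def cheap_fingerprint(path, offsets=(0, 999, 9999)):
    """Return (file size, sampled byte values) as a hashable, non-cryptographic id."""
    size = os.path.getsize(path)
    samples = []
    with open(path, "rb") as f:
        for offset in offsets:
            f.seek(offset)
            data = f.read(1)
            # Offsets beyond the end of the file are recorded as None.
            samples.append(data[0] if data else None)
    return (size, tuple(samples))
```

Only the sampled bytes are read, so the cost per file does not grow
with file size.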

Thanks in advance.

  Xah
  xah@xahlee.org
∑ http://xahlee.org/

* Re: md5 checksum of a img file, or get the value of 100th byte
  2008-06-05 11:29 md5 checksum of a img file, or get the value of 100th byte Xah
@ 2008-06-05 19:17 ` Eli Zaretskii
       [not found] ` <mailman.12717.1212693595.18990.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2008-06-05 19:17 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Xah <xahlee@gmail.com>
> Date: Thu, 5 Jun 2008 04:29:59 -0700 (PDT)
> 
> I noticed that elisp has an md5 function that does that, but it takes a
> buffer as argument. Does this mean that I have to create a buffer
> first by opening the image file? Would this be very inefficient, since
> all I need is to build a hash table of the image files, with checksum
> as id and file path(s) as value?

Reading the file into a buffer and then working on that buffer is one
of the most efficient operations in Emacs.  After all, this is the
most basic operation of any text editor, so it must be heavily
optimized.





* Re: md5 checksum of a img file, or get the value of 100th byte
       [not found] ` <mailman.12717.1212693595.18990.help-gnu-emacs@gnu.org>
@ 2008-06-06 14:44   ` Ted Zlatanov
  2008-06-06 16:33     ` Eli Zaretskii
       [not found]     ` <mailman.12786.1212770026.18990.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 12+ messages in thread
From: Ted Zlatanov @ 2008-06-06 14:44 UTC (permalink / raw)
  To: help-gnu-emacs

On Thu, 05 Jun 2008 22:17:02 +0300 Eli Zaretskii <eliz@gnu.org> wrote: 

EZ> Reading the file into a buffer and then working on that buffer is one
EZ> of the most efficient operations in Emacs.  After all, this is the
EZ> most basic operation of any text editor, so it must be heavily
EZ> optimized.

Not if the file is large :)  I'd like to see a 'sliding editing window'
for large files but I think it would require changes at the C level.

Ted



* Re: md5 checksum of a img file, or get the value of 100th byte
  2008-06-06 14:44   ` Ted Zlatanov
@ 2008-06-06 16:33     ` Eli Zaretskii
       [not found]     ` <mailman.12786.1212770026.18990.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2008-06-06 16:33 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Fri, 06 Jun 2008 09:44:57 -0500
> 
> On Thu, 05 Jun 2008 22:17:02 +0300 Eli Zaretskii <eliz@gnu.org> wrote: 
> 
> EZ> Reading the file into a buffer and then working on that buffer is one
> EZ> of the most efficient operations in Emacs.  After all, this is the
> EZ> most basic operation of any text editor, so it must be heavily
> EZ> optimized.
> 
> Not if the file is large :)

Yes, even if the file is large.

Note, I'm talking about Emacs, not about any computer program in
general.

> I'd like to see a 'sliding editing window' for large files but I
> think it would require changes at the C level.

??? insert-file-contents already can read only a portion of a file.
See its doc string.





* Re: md5 checksum of a img file, or get the value of 100th byte
       [not found]     ` <mailman.12786.1212770026.18990.help-gnu-emacs@gnu.org>
@ 2008-06-06 18:55       ` Ted Zlatanov
  2008-06-06 20:07         ` Eli Zaretskii
       [not found]         ` <mailman.12802.1212782926.18990.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 12+ messages in thread
From: Ted Zlatanov @ 2008-06-06 18:55 UTC (permalink / raw)
  To: help-gnu-emacs

On Fri, 06 Jun 2008 19:33:40 +0300 Eli Zaretskii <eliz@gnu.org> wrote: 

>> From: Ted Zlatanov <tzz@lifelogs.com>
>> Date: Fri, 06 Jun 2008 09:44:57 -0500
>> 
>> On Thu, 05 Jun 2008 22:17:02 +0300 Eli Zaretskii <eliz@gnu.org> wrote: 
>> 
EZ> Reading the file into a buffer and then working on that buffer is one
EZ> of the most efficient operations in Emacs.  After all, this is the
EZ> most basic operation of any text editor, so it must be heavily
EZ> optimized.
>> 
>> Not if the file is large :)

EZ> Yes, even if the file is large.

EZ> Note, I'm talking about Emacs, not about any computer program in
EZ> general.

I thought you meant the OP should just read the whole file into the
buffer, which is a bad idea for large files.  Sorry.

>> I'd like to see a 'sliding editing window' for large files but I
>> think it would require changes at the C level.

EZ> ??? insert-file-contents already can read only a portion of a file.
EZ> See its doc string.

I know about that.  I'd like to be able to open a 1 GB file in Emacs and
view just a piece at a time, in a sliding window.  I am not aware of
functionality to do this, and it's hard to do it efficiently because
every save is so painful, not to mention selecting the whole buffer and
other normally simple operations.  Read-only may be somewhat OK but lots
of basic commands will need to be advised or overridden.

Ted



* Re: md5 checksum of a img file, or get the value of 100th byte
  2008-06-06 18:55       ` Ted Zlatanov
@ 2008-06-06 20:07         ` Eli Zaretskii
       [not found]         ` <mailman.12802.1212782926.18990.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2008-06-06 20:07 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Fri, 06 Jun 2008 13:55:18 -0500
> 
> EZ> ??? insert-file-contents already can read only a portion of a file.
> EZ> See its doc string.
> 
> I know about that.  I'd like to be able to open a 1 GB file in Emacs and
> view just a piece at a time, in a sliding window.  I am not aware of
> functionality to do this

For viewing such a file, it should be easy to write some Lisp using
insert-file-contents to read a file one chunk at a time.  For editing,
I submit that there should be no reason to edit such large files.
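
The chunk-at-a-time idea can be sketched as follows. This is a minimal
sketch in Python rather than the Lisp suggested here, and it only
illustrates reading a byte range; it is not a viewer. The names
`read_chunk` and `iter_chunks` and the 1 MiB default chunk size are
assumptions made for illustration. In Emacs the corresponding byte
range is selected with the optional BEG and END arguments of
insert-file-contents.

```python
def read_chunk(path, start, length):
    """Return up to `length` bytes of `path`, beginning at zero-based offset `start`."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(length)


def iter_chunks(path, chunk_size=1 << 20):
    """Yield (offset, data) pairs covering the whole file, one chunk at a time."""
    start = 0
    while True:
        chunk = read_chunk(path, start, chunk_size)
        if not chunk:
            break  # reached end of file
        yield start, chunk
        start += len(chunk)
```

Only one chunk is held in memory at a time, which is what makes a
sliding view over a very large file practical.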





* Re: md5 checksum of a img file, or get the value of 100th byte
       [not found]         ` <mailman.12802.1212782926.18990.help-gnu-emacs@gnu.org>
@ 2008-06-06 22:45           ` Ted Zlatanov
  2008-06-07  6:34             ` Eli Zaretskii
       [not found]             ` <mailman.12833.1212820449.18990.help-gnu-emacs@gnu.org>
  2008-06-07  0:03           ` Tim X
  1 sibling, 2 replies; 12+ messages in thread
From: Ted Zlatanov @ 2008-06-06 22:45 UTC (permalink / raw)
  To: help-gnu-emacs

On Fri, 06 Jun 2008 23:07:11 +0300 Eli Zaretskii <eliz@gnu.org> wrote: 

>> From: Ted Zlatanov <tzz@lifelogs.com>
>> Date: Fri, 06 Jun 2008 13:55:18 -0500
>> 
EZ> ??? insert-file-contents already can read only a portion of a file.
EZ> See its doc string.
>> 
>> I know about that.  I'd like to be able to open a 1 GB file in Emacs and
>> view just a piece at a time, in a sliding window.  I am not aware of
>> functionality to do this

EZ> For viewing such a file, it should be easy to write some Lisp using
EZ> insert-file-contents to read a file one chunk at a time.

That's what I meant, but it's definitely not easy.  Emacs assumes it can
just copy all data into the kill ring, for instance.  I couldn't write
such a mode myself, though I'd appreciate one very much.

EZ> For editing, I submit that there should be no reason to edit such
EZ> large files.

Maybe 10 years ago that was true.  Today I see huge (over 300 MB) files
very often (as a programmer and as a sysadmin).  Logs of all sorts, for
instance, and database dumps.

It gets annoying to use less, split, grep, and Perl to get things done
when in Emacs it would have been a few quick commands.  But I cope.

Ted



* Re: md5 checksum of a img file, or get the value of 100th byte
       [not found]         ` <mailman.12802.1212782926.18990.help-gnu-emacs@gnu.org>
  2008-06-06 22:45           ` Ted Zlatanov
@ 2008-06-07  0:03           ` Tim X
  2008-06-09 15:15             ` Ted Zlatanov
  1 sibling, 1 reply; 12+ messages in thread
From: Tim X @ 2008-06-07  0:03 UTC (permalink / raw)
  To: help-gnu-emacs

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Ted Zlatanov <tzz@lifelogs.com>
>> Date: Fri, 06 Jun 2008 13:55:18 -0500
>> 
>> EZ> ??? insert-file-contents already can read only a portion of a file.
>> EZ> See its doc string.
>> 
>> I know about that.  I'd like to be able to open a 1 GB file in Emacs and
>> view just a piece at a time, in a sliding window.  I am not aware of
>> functionality to do this
>
> For viewing such a file, it should be easy to write some Lisp using
> insert-file-contents to read a file one chunk at a time.  For editing,
> I submit that there should be no reason to edit such large files.
>
A few years ago, I would have totally agreed. However, I'm frequently
coming across files larger than 1 GB that need editing. I will admit that
when I've needed to do this, the editing has been pretty
straightforward and I've used sed to do it instead of trying to use an
editor. For example, I regularly need to edit large (multiple gigabyte)
Oracle database files in order to change some control information so
that the files can be loaded into a different database instance. While
Oracle now does have things like cloning and hot backups that can
handle this sort of requirement more efficiently, sometimes it is still
required to do it manually.

My point is that I'm encountering situations in which files are 1 GB and
larger and they need text to be changed, and that these days, editing of
1 GB files isn't as odd a requirement as it used to be. However, the
extent to which you need an editor like Emacs to do this is probably
still questionable. My experience is that utilities like sed, awk, Perl
and other scripting languages can probably handle most
cases. Unfortunately, it is becoming rare that people even seem to know
about things like sed/awk and therefore turn to something like Emacs or
vi to solve their problem.

To the OP, I don't understand why you would be using Emacs for the job
you describe. Sometimes I think people are trying to use Emacs for the
wrong tasks. It's a great editor with wonderful support for configuration
and customization, but it is primarily an editor, not a full-blown
programming environment. It sounds like using existing utilities to
create unique signatures would be far more efficient and less time
consuming. For example, you could solve your problem in just a few lines
of Perl code. My recommendation would be to use Emacs to write the Perl
code, but let Perl do the task of generating unique sigs from either the
image file data or the path/name information for the file.

Tim


-- 
tcross (at) rapttech dot com dot au



* Re: md5 checksum of a img file, or get the value of 100th byte
  2008-06-06 22:45           ` Ted Zlatanov
@ 2008-06-07  6:34             ` Eli Zaretskii
       [not found]             ` <mailman.12833.1212820449.18990.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2008-06-07  6:34 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Fri, 06 Jun 2008 17:45:43 -0500
> 
> EZ> For editing, I submit that there should be no reason to edit such
> EZ> large files.
> 
> Maybe 10 years ago that was true.  Today I see huge (over 300 MB) files
> very often (as a programmer and as a sysadmin).  Logs of all sorts, for
> instance, and database dumps.

Why would you want to _edit_ (as opposed to _view_) these files?

Please note that I agreed that viewing such large files chunk-wise
would be useful, indeed.





* Re: md5 checksum of a img file, or get the value of 100th byte
       [not found]             ` <mailman.12833.1212820449.18990.help-gnu-emacs@gnu.org>
@ 2008-06-09 15:04               ` Ted Zlatanov
  2008-06-09 15:58                 ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: Ted Zlatanov @ 2008-06-09 15:04 UTC (permalink / raw)
  To: help-gnu-emacs

On Sat, 07 Jun 2008 09:34:01 +0300 Eli Zaretskii <eliz@gnu.org> wrote: 

>> From: Ted Zlatanov <tzz@lifelogs.com>
>> Date: Fri, 06 Jun 2008 17:45:43 -0500
>> 
EZ> For editing, I submit that there should be no reason to edit such
EZ> large files.
>> 
>> Maybe 10 years ago that was true.  Today I see huge (over 300 MB) files
>> very often (as a programmer and as a sysadmin).  Logs of all sorts, for
>> instance, and database dumps.

EZ> Why would you want to _edit_ (as opposed to _view_) these files?

As an example, at a previous job all server log time stamps were in Unix
epoch offsets.  That was hard to interpret, so I often had to filter the
whole file to rewrite those time stamps to UTC ISO format.  I then
needed to do further modifications, e.g. look at only fields 3-9 and
12.  I wrote many custom tools to make this easier, but in Emacs it
would have been trivial.  Sure, I can write a custom mode or pipe the
file through a filter, but I *knew* I'd never need the original time
stamps, so saving the ISO dates the first time was productive, and
editing on the fly in general would have made my life easier.
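
The kind of filter described here can be sketched as follows. This is a
minimal sketch in Python; the log layout is an assumption (fields
separated by whitespace, with the epoch offset in seconds as the first
field), since the actual format is not given, and the names
`epoch_to_iso` and `rewrite_line` are hypothetical.

```python
from datetime import datetime, timezone


def epoch_to_iso(epoch_seconds):
    """Format Unix epoch seconds as a UTC ISO 8601 timestamp."""
    moment = datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def rewrite_line(line, keep_fields=None):
    """Replace a leading epoch field with an ISO timestamp.

    `keep_fields` is an optional list of 1-based field numbers to keep,
    e.g. [3, 4, 5, 6, 7, 8, 9, 12]; by default all fields are kept.
    """
    fields = line.split()
    if not fields or not fields[0].isdigit():
        # Assumed layout does not apply: leave the line unchanged.
        return line.rstrip("\n")
    fields[0] = epoch_to_iso(fields[0])
    if keep_fields is not None:
        fields = [fields[i - 1] for i in keep_fields if i <= len(fields)]
    return " ".join(fields)
```

Applying `rewrite_line` to each line of a log and writing the results
to a new file gives the one-time conversion; the field selection is a
separate, optional step.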

As another example, I often have to peek inside a large database dump or
a CSV/TSV/etc. import file and understand what went wrong.  Once
understood, it's more efficient to fix and save inside Emacs than to
split, modify, and recombine outside Emacs.

Expected memory sizes today are 2-8 GB.  Disk speed is really the major
issue with editing a 300MB file, and that's improving swiftly lately.  I
think it's feasible and not too painful to allow this kind of editing.
How hard would it be to customize the Emacs defaults for that to happen?

EZ> Please note that I agreed that viewing such large files chunk-wise
EZ> would be useful, indeed.

Do you or anyone know of an existing less-like mode for Emacs?  If not,
I'll put it on my TODO list.

Thanks
Ted



* Re: md5 checksum of a img file, or get the value of 100th byte
  2008-06-07  0:03           ` Tim X
@ 2008-06-09 15:15             ` Ted Zlatanov
  0 siblings, 0 replies; 12+ messages in thread
From: Ted Zlatanov @ 2008-06-09 15:15 UTC (permalink / raw)
  To: help-gnu-emacs

On Sat, 07 Jun 2008 10:03:26 +1000 Tim X <timx@nospam.dev.null> wrote: 

TX> editing of 1 GB files isn't as odd a requirement as it used to
TX> be. However, the extent to which you need an editor like Emacs to do
TX> this is probably still questionable. My experience is that utilities
TX> like sed, awk, Perl and other scripting languages can probably
TX> handle most cases. Unfortunately, it is becoming rare that people
TX> even seem to know about things like sed/awk and therefore turn to
TX> something like Emacs or vi to solve their problem.

I'm comfortable with Perl and I've used it for many such tasks.  It's
sort of a superset of sed and awk, so I won't comment on those or other
capable scripting languages (Python, Ruby, etc.).

The Emacs features I've missed the most when writing Perl filters:

- instant feedback
- incremental search and replace
- run any function on a region or the whole buffer interactively
- automatic backups (perl sort of has that with -i.bak)
- automatic undo
- open remote files (over Tramp) and run VCS commands on them
- toggle-debug-on-*

For fairness, here are the Perl features I miss the most in Emacs when I
edit interactively:

- obfuscated code ;)
- the -n -p -i -a -e switches for filtering and in-place editing
- CPAN modules of all kinds
- database queries
- Perl's regular expressions
- speed, benchmarks, Test::More
- chained pipelines
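
For readers who have not used the -p, -i and -e switches: a command
such as `perl -p -i.bak -e 's/foo/bar/' FILE` rewrites FILE line by
line and keeps the original as FILE.bak. The same in-place pattern can
be sketched in Python with the standard `fileinput` module; the name
`edit_in_place` is hypothetical, and this is an illustration of the
idea rather than a replacement for the Perl one-liner.

```python
import fileinput
import re


def edit_in_place(path, pattern, replacement, backup=".bak"):
    """Apply a regex substitution to every line of `path`, keeping a backup copy."""
    with fileinput.input(files=[path], inplace=True, backup=backup) as lines:
        for line in lines:
            # With inplace=True, standard output is redirected into the file.
            print(re.sub(pattern, replacement, line), end="")
```

The file is streamed one line at a time, so memory use stays small
even for very large inputs.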

When you consider all this, it's clear that they both have valuable
features for editing files, so editing a large file in Emacs is
worthwhile.

Ted



* Re: md5 checksum of a img file, or get the value of 100th byte
  2008-06-09 15:04               ` Ted Zlatanov
@ 2008-06-09 15:58                 ` Eli Zaretskii
  0 siblings, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2008-06-09 15:58 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Mon, 09 Jun 2008 10:04:45 -0500
> 
> Do you or anyone know of an existing less-like mode for Emacs?

view-mode is supposed to be it, but it doesn't include the
functionality you are looking for, AFAIK.





