unofficial mirror of emacs-devel@gnu.org 
* suggestion: function: buffer-bytes
@ 2007-07-01  0:18 T. V. Raman
  2007-07-01  1:39 ` Stefan Monnier
  0 siblings, 1 reply; 14+ messages in thread
From: T. V. Raman @ 2007-07-01  0:18 UTC (permalink / raw)
  To: emacs-devel

Emacs built-in buffer-size returns the number of characters ---
in some situations one needs the count of bytes. 
Here is a small function that does this --- perhaps it could be
included in subr.el?

(defsubst buffer-bytes (&optional buffer)
  "Return number of bytes in a buffer."
  (save-excursion
    (and buffer (set-buffer buffer))
    (1- (position-bytes (point-max)))))

-- 
Best Regards,
--raman

      
Email:  raman@users.sf.net
WWW:    http://emacspeak.sf.net/raman/
AIM:    emacspeak       GTalk: tv.raman.tv@gmail.com
PGP:    http://emacspeak.sf.net/raman/raman-almaden.asc
Google: tv+raman 
IRC:    irc://irc.freenode.net/#emacs


* Re: suggestion: function: buffer-bytes
  2007-07-01  0:18 suggestion: function: buffer-bytes T. V. Raman
@ 2007-07-01  1:39 ` Stefan Monnier
  2007-07-01  2:40   ` T. V. Raman
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Monnier @ 2007-07-01  1:39 UTC (permalink / raw)
  To: raman; +Cc: emacs-devel

> Emacs built-in buffer-size returns the number of characters ---
> in some situations one needs the count of bytes. 
> Here is a small function that does this --- perhaps it could be
> included in subr.el?

> (defsubst buffer-bytes (&optional buffer)
>   "Return number of bytes in a buffer."
>   (save-excursion
>     (and buffer (set-buffer buffer))
>     (1- (position-bytes (point-max)))))

This function is very unlikely to ever be useful: the number of bytes to
represent a particular sequence of characters depends on the encoding used.
So for example the result will be different for the exact same text when run
in Emacs-22 or in Emacs-unicode.  And it will most likely differ from the
number of bytes of the file associated with the buffer.
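
As a rough illustration of the point above, the sketch below encodes one
string with two coding systems and compares the resulting byte counts.
The helper name `my-encoded-sizes` is hypothetical, and the `\u` string
escapes assume a reasonably recent Emacs.

```elisp
;; Illustration only: `my-encoded-sizes' is a made-up helper name.
(defun my-encoded-sizes (text)
  "Return (CHARS UTF-8-BYTES LATIN-1-BYTES) for TEXT."
  (list (length text)
        ;; `encode-coding-string' returns a unibyte string, so its
        ;; `length' is the number of bytes after encoding.
        (length (encode-coding-string text 'utf-8))
        (length (encode-coding-string text 'iso-latin-1))))

;; Ten characters, two of them non-ASCII.
(my-encoded-sizes "na\u00efve caf\u00e9")  ; => (10 12 10)
```

The character count stays fixed while the byte count follows the coding
system, which is why a single "bytes in the buffer" number has no meaning
without naming the encoding.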


        Stefan


* Re: suggestion: function: buffer-bytes
  2007-07-01  1:39 ` Stefan Monnier
@ 2007-07-01  2:40   ` T. V. Raman
  2007-07-01  6:31     ` David Kastrup
  2007-07-01  8:22     ` Kenichi Handa
  0 siblings, 2 replies; 14+ messages in thread
From: T. V. Raman @ 2007-07-01  2:40 UTC (permalink / raw)
  To: monnier; +Cc: raman, emacs-devel


Stefan,

Where I used this:

Package g-client 
http://emacspeak.googlecode.com/svn/trunk/lisp/g-client 

I use curl to talk HTTP in that package -- uses Atom Publishing
Protocol to talk to servers --
and I needed the byte count  for computing HTTP headers
correctly.
It does appear to work, but also because I do set buffer-encoding
appropriately in those buffers where I am building up the HTTP
message being posted.
buffer-size definitely bombs in that use case -- do you have a
better suggestion for how one might count bytes?


* Re: suggestion: function: buffer-bytes
  2007-07-01  2:40   ` T. V. Raman
@ 2007-07-01  6:31     ` David Kastrup
  2007-07-01  8:22     ` Kenichi Handa
  1 sibling, 0 replies; 14+ messages in thread
From: David Kastrup @ 2007-07-01  6:31 UTC (permalink / raw)
  To: raman; +Cc: monnier, emacs-devel

"T. V. Raman" <raman@users.sf.net> writes:

> Stefan,
>
> Where I used this:
>
> Package g-client 
> http://emacspeak.googlecode.com/svn/trunk/lisp/g-client 
>
> I use curl to talk HTTP in that package -- uses Atom Publishing
> Protocol to talk to servers --
> and I needed the byte count  for computing HTTP headers
> correctly.
> It does appear to work, but also because I do set buffer-encoding
> appropriately in those buffers where I am building up the HTTP
> message being posted.

You can't: buffers are always encoded in Emacs-mule (or its own
version of utf-8 in Emacs 23), or in unibyte (in which case the
position-bytes function becomes rather pointless).  I don't know what
you mean by "set buffer-encoding appropriately".

You can, presumably, talk in unibyte with your server and do the
encoding and decoding on the way to the buffer manually.  In that
case you can simply treat buffer positions in characters as synonymous
with those in bytes.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


* Re: suggestion: function: buffer-bytes
  2007-07-01  2:40   ` T. V. Raman
  2007-07-01  6:31     ` David Kastrup
@ 2007-07-01  8:22     ` Kenichi Handa
  2007-07-01 12:39       ` Stefan Monnier
                         ` (2 more replies)
  1 sibling, 3 replies; 14+ messages in thread
From: Kenichi Handa @ 2007-07-01  8:22 UTC (permalink / raw)
  To: raman; +Cc: emacs-devel, monnier, raman

In article <18055.5127.154758.705881@gargle.gargle.HOWL>, "T. V. Raman" <raman@users.sourceforge.net> writes:

> Package g-client 
> http://emacspeak.googlecode.com/svn/trunk/lisp/g-client 

> I use curl to talk HTTP in that package -- uses Atom Publishing
> Protocol to talk to servers --
> and I needed the byte count  for computing HTTP headers
> correctly.
> It does appear to work, but also because I do set buffer-encoding
> appropriately in those buffers where I am building up the HTTP
> message being posted.
> buffer-size definitely bombs in that use case -- do you have a
> better suggestion for how one might count bytes?

Then perhaps what you need is this.

(defun buffer-encoded-size (&optional buffer coding)
  "Return the encoded size of the current buffer in bytes.
..."
  (save-excursion
    (and buffer (set-buffer buffer))
    (or coding
	(setq coding buffer-file-coding-system))
    (length (encode-coding-string (buffer-string) coding))))

In emacs-unicode-2, you can use a little bit faster version.

(defun buffer-encoded-size (&optional buffer coding)
  "Return the encoded size of the current buffer in bytes.
..."
  (save-excursion
    (and buffer (set-buffer buffer))
    (or coding
	(setq coding buffer-file-coding-system))
    (length (encode-coding-region (point-min) (point-max) coding t))))
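
A possible usage sketch for the HTTP case follows, assuming the request
body is sent as UTF-8.  Both `my-buffer-encoded-size` and
`my-content-length-header` are hypothetical names, and the first is only
a restatement of the encode-coding-string variant using
with-current-buffer.

```elisp
;; Sketch only: both function names below are made up for illustration.
(defun my-buffer-encoded-size (&optional buffer coding)
  "Return the size in bytes of BUFFER's text when encoded with CODING.
BUFFER defaults to the current buffer, CODING to that buffer's
`buffer-file-coding-system'."
  (with-current-buffer (or buffer (current-buffer))
    (length (encode-coding-string
             (buffer-substring-no-properties (point-min) (point-max))
             (or coding buffer-file-coding-system)))))

(defun my-content-length-header (&optional buffer coding)
  "Return a Content-Length header line for BUFFER encoded with CODING."
  (format "Content-Length: %d" (my-buffer-encoded-size buffer coding)))

(with-temp-buffer
  (insert "caf\u00e9")                      ; 4 characters
  (my-content-length-header nil 'utf-8))    ; => "Content-Length: 5"
```

Passing the coding system explicitly keeps the header consistent with
whatever encoding is actually used when the body is handed to the
process.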

---
Kenichi Handa
handa@m17n.org


* Re: suggestion: function: buffer-bytes
  2007-07-01  8:22     ` Kenichi Handa
@ 2007-07-01 12:39       ` Stefan Monnier
  2007-07-02  0:48         ` Kenichi Handa
  2007-07-01 17:47       ` T. V. Raman
  2007-07-01 19:34       ` Eli Zaretskii
  2 siblings, 1 reply; 14+ messages in thread
From: Stefan Monnier @ 2007-07-01 12:39 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel, raman

> Then perhaps what you need is this.
> (defun buffer-encoded-size (&optional buffer coding)

But in his case, he's probably better off just encoding the buffer manually
before passing the data to the process/network, so that he can get his hands
on the number of bytes with just buffer-size.
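
A minimal sketch of that approach, assuming the payload is available as
a string and is meant to go out as UTF-8: encode once into a unibyte
buffer, after which buffer-size is the byte count.  The function name
`my-encoded-body-buffer` and the buffer name are hypothetical.

```elisp
;; Sketch only: `my-encoded-body-buffer' is a made-up helper name.
(defun my-encoded-body-buffer (text coding)
  "Return a new unibyte buffer holding TEXT encoded with CODING."
  (let ((buf (generate-new-buffer " *encoded-body*")))
    (with-current-buffer buf
      ;; In a unibyte buffer every character is exactly one byte.
      (set-buffer-multibyte nil)
      (insert (encode-coding-string text coding)))
    buf))

(let ((buf (my-encoded-body-buffer "caf\u00e9" 'utf-8)))
  (prog1 (buffer-size buf)   ; => 5, the number of bytes to be sent
    (kill-buffer buf)))
```

The same buffer can then be passed to the process without further
conversion, so the count and the transmitted data cannot disagree.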


        Stefan


* Re: suggestion: function: buffer-bytes
  2007-07-01  8:22     ` Kenichi Handa
  2007-07-01 12:39       ` Stefan Monnier
@ 2007-07-01 17:47       ` T. V. Raman
  2007-07-02  1:11         ` Kenichi Handa
  2007-07-01 19:34       ` Eli Zaretskii
  2 siblings, 1 reply; 14+ messages in thread
From: T. V. Raman @ 2007-07-01 17:47 UTC (permalink / raw)
  To: handa; +Cc: emacs-devel, monnier, raman


So someone on the g-client group originally proposed computing
the length of buffer-string using string-encoding; I proposed
using position-bytes as a simpler alternative.




* Re: suggestion: function: buffer-bytes
  2007-07-01  8:22     ` Kenichi Handa
  2007-07-01 12:39       ` Stefan Monnier
  2007-07-01 17:47       ` T. V. Raman
@ 2007-07-01 19:34       ` Eli Zaretskii
  2007-07-02  0:55         ` Kenichi Handa
  2 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2007-07-01 19:34 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel, raman

> From: Kenichi Handa <handa@m17n.org>
> Date: Sun, 01 Jul 2007 17:22:29 +0900
> Cc: emacs-devel@gnu.org, monnier@iro.umontreal.ca, raman@users.sourceforge.net
> 
> In emacs-unicode-2, you can use a little bit faster version.
> 
> (defun buffer-encoded-size (&optional buffer coding)
>   "Return the encoded size of the current buffer in bytes.
> ..."
>   (save-excursion
>     (and buffer (set-buffer buffer))
>     (or coding
> 	(setq coding buffer-file-coding-system))
>     (length (encode-coding-region (point-min) (point-max) coding t))))

Why can't he use this version in Emacs 22?


* Re: suggestion: function: buffer-bytes
  2007-07-01 12:39       ` Stefan Monnier
@ 2007-07-02  0:48         ` Kenichi Handa
  0 siblings, 0 replies; 14+ messages in thread
From: Kenichi Handa @ 2007-07-02  0:48 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: raman, emacs-devel

In article <jwvabug711b.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > Then perhaps what you need is this.
> > (defun buffer-encoded-size (&optional buffer coding)

> But in his case, he's probably better off just encoding the buffer manually
> before passing the data to the process/network, so that he can get his hands
> on the number of bytes with just buffer-size.

Perhaps.  But, I don't know how and when the data is passed
to the process and when he needs the encoded byte size.

---
Kenichi Handa
handa@m17n.org


* Re: suggestion: function: buffer-bytes
  2007-07-01 19:34       ` Eli Zaretskii
@ 2007-07-02  0:55         ` Kenichi Handa
  0 siblings, 0 replies; 14+ messages in thread
From: Kenichi Handa @ 2007-07-02  0:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, raman

In article <uved3lxy8.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > From: Kenichi Handa <handa@m17n.org>
> > Date: Sun, 01 Jul 2007 17:22:29 +0900
> > Cc: emacs-devel@gnu.org, monnier@iro.umontreal.ca, raman@users.sourceforge.net
> > 
> > In emacs-unicode-2, you can use a little bit faster version.
> > 
> > (defun buffer-encoded-size (&optional buffer coding)
> >   "Return the encoded size of the current buffer in bytes.
> > ..."
> >   (save-excursion
> >     (and buffer (set-buffer buffer))
> >     (or coding
> > 	(setq coding buffer-file-coding-system))
> >     (length (encode-coding-region (point-min) (point-max) coding t))))

> Why can't he use this version in Emacs 22?

Because the 4th argument DESTINATION of encode-coding-region
was introduced in emacs-unicode-2.

Optional 4th argument DESTINATION specifies where the encoded text goes.
If nil, the region between start and end is replaced by the encoded text.
If a buffer, the encoded text is inserted in the buffer.
If t, the encoded text is returned.

---
Kenichi Handa
handa@m17n.org


* Re: suggestion: function: buffer-bytes
  2007-07-01 17:47       ` T. V. Raman
@ 2007-07-02  1:11         ` Kenichi Handa
  2007-07-04 16:31           ` T. V. Raman
  0 siblings, 1 reply; 14+ messages in thread
From: Kenichi Handa @ 2007-07-02  1:11 UTC (permalink / raw)
  To: raman; +Cc: raman, monnier, emacs-devel

In article <18055.59548.547994.901837@gargle.gargle.HOWL>, "T. V. Raman" <raman@users.sourceforge.net> writes:

> So someone on the g-client group originally proposed computing
> the length of buffer-string using string-encoding; I proposed
> using position-bytes as a simpler alternative.

I see.  And, do you see why position-bytes can't be used in
the current case now?

---
Kenichi Handa
handa@m17n.org


* Re: suggestion: function: buffer-bytes
  2007-07-02  1:11         ` Kenichi Handa
@ 2007-07-04 16:31           ` T. V. Raman
  2007-07-05 14:42             ` Stefan Monnier
  0 siblings, 1 reply; 14+ messages in thread
From: T. V. Raman @ 2007-07-04 16:31 UTC (permalink / raw)
  To: handa; +Cc: emacs-devel, monnier, raman


I can see how position-bytes is not a generic solution for writing
a buffer-bytes function.

But for the use case I needed, namely someone composing blog
entries using nxml-mode,
and then building up an http post message from what they created,
the position-bytes solution does work.


* Re: suggestion: function: buffer-bytes
  2007-07-04 16:31           ` T. V. Raman
@ 2007-07-05 14:42             ` Stefan Monnier
  2007-07-09  2:44               ` Kenichi Handa
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Monnier @ 2007-07-05 14:42 UTC (permalink / raw)
  To: raman; +Cc: raman, emacs-devel, handa

> But for the use case I needed, namely someone composing blog
> entries using nxml-mode,
> and then building up an http post message from what they created,
> the position-bytes solution does work.

If it worked in some case, it was only by accident, unless they specifically
decided to use emacs-mule as the coding system for their file (and to switch
wholesale to utf-8 at the exact time they upgrade their Emacs), which would
be rather unusual.


        Stefan


* Re: suggestion: function: buffer-bytes
  2007-07-05 14:42             ` Stefan Monnier
@ 2007-07-09  2:44               ` Kenichi Handa
  0 siblings, 0 replies; 14+ messages in thread
From: Kenichi Handa @ 2007-07-09  2:44 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel, raman

In article <jwvk5teoqvj.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > But for the use case I needed, namely someone composing blog
> > entries using nxml-mode,
> > and then building up an http post message from what they created,
> > the position-bytes solution does work.

> If it worked in some case, it was only by accident, unless they specifically
> decided to use emacs-mule as the coding system for their file (and to switch
> wholesale to utf-8 at the exact time they upgrade their Emacs), which would
> be rather unusual.

Right.  By the way, currently emacs-mule uses 2 bytes, for
instance, for Latin-1 characters, and UTF-8 also uses
2 bytes for them.  So, in the case of sending a Latin-1 text
to a process with UTF-8, the position-bytes solution works just
by chance.
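
The coincidence can be observed with a sketch like the one below.  It
assumes Emacs 23 or later, where the internal multibyte representation
is UTF-8 based, and the helper name `my-buffer-byte-counts` is
hypothetical.

```elisp
;; Sketch only: `my-buffer-byte-counts' is a made-up helper name.
(defun my-buffer-byte-counts (text)
  "Return (INTERNAL UTF-8 LATIN-1) byte counts for TEXT.
INTERNAL is what the position-bytes approach reports."
  (with-temp-buffer
    (insert text)
    (list (1- (position-bytes (point-max)))
          (length (encode-coding-string (buffer-string) 'utf-8))
          (length (encode-coding-string (buffer-string) 'iso-latin-1)))))

(my-buffer-byte-counts "caf\u00e9")  ; => (5 5 4) on Emacs 23 or later
```

The internal count agrees with the UTF-8 size here, but it would be
wrong for a body sent as Latin-1, so the agreement depends on the
coding system chosen for the wire.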

---
Kenichi Handa
handa@m17n.org

