From: Austin Clements <amdragon@MIT.EDU>
To: Tomi Ollila <tomi.ollila@iki.fi>
Cc: notmuch@notmuchmail.org
Subject: Re: [Patch v5 3/6] util: add gz_readline
Date: Wed, 2 Apr 2014 16:43:38 -0400
Message-ID: <20140402204337.GA4678@mit.edu>
In-Reply-To: <m2wqf73co4.fsf@guru.guru-group.fi>

Quoth Tomi Ollila on Apr 02 at  7:43 pm:
> On Wed, Apr 02 2014, Austin Clements <amdragon@MIT.EDU> wrote:
> 
> > Quoth David Bremner on Apr 01 at 10:16 pm:
> >> The idea is to provide a more or less drop in replacement for readline
> >> to read from zlib/gzip streams.  Take the opportunity to replace
> >> malloc with talloc.
> >> ---
> >>  util/Makefile.local |  2 +-
> >>  util/util.h         | 12 +++++++++
> >>  util/zlib-extra.c   | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  util/zlib-extra.h   | 11 ++++++++
> >>  4 files changed, 100 insertions(+), 1 deletion(-)
> >>  create mode 100644 util/util.h
> >>  create mode 100644 util/zlib-extra.c
> >>  create mode 100644 util/zlib-extra.h
> >> 
> >> diff --git a/util/Makefile.local b/util/Makefile.local
> >> index 29c0ce6..e2a5b65 100644
> >> --- a/util/Makefile.local
> >> +++ b/util/Makefile.local
> >> @@ -4,7 +4,7 @@ dir := util
> >>  extra_cflags += -I$(srcdir)/$(dir)
> >>  
> >>  libutil_c_srcs := $(dir)/xutil.c $(dir)/error_util.c $(dir)/hex-escape.c \
> >> -		  $(dir)/string-util.c $(dir)/talloc-extra.c
> >> +		  $(dir)/string-util.c $(dir)/talloc-extra.c $(dir)/zlib-extra.c
> >>  
> >>  libutil_modules := $(libutil_c_srcs:.c=.o)
> >>  
> >> diff --git a/util/util.h b/util/util.h
> >> new file mode 100644
> >> index 0000000..8663cfc
> >> --- /dev/null
> >> +++ b/util/util.h
> >> @@ -0,0 +1,12 @@
> >> +#ifndef _UTIL_H
> >> +#define _UTIL_H
> >> +
> >> +typedef enum util_status {
> >> +    UTIL_SUCCESS = 0,
> >> +    UTIL_ERROR = 1,
> >> +    UTIL_OUT_OF_MEMORY,
> >> +    UTIL_EOF,
> >> +    UTIL_FILE,
> >> +} util_status_t;
> >> +
> >> +#endif
> >> diff --git a/util/zlib-extra.c b/util/zlib-extra.c
> >> new file mode 100644
> >> index 0000000..cb1eba0
> >> --- /dev/null
> >> +++ b/util/zlib-extra.c
> >> @@ -0,0 +1,76 @@
> >> +/* zlib-extra.c -  Extra or enhanced routines for compressed I/O.
> >> + *
> >> + * Copyright (c) 2014 David Bremner
> >> + *
> >> + * This program is free software: you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License as published by
> >> + * the Free Software Foundation, either version 3 of the License, or
> >> + * (at your option) any later version.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program.  If not, see http://www.gnu.org/licenses/ .
> >> + *
> >> + * Author: David Bremner <david@tethera.net>
> >> + */
> >> +
> >> +#include "zlib-extra.h"
> >> +#include <talloc.h>
> >> +#include <stdio.h>
> >> +#include <string.h>
> >> +
> >> +/* mimic POSIX/glibc getline, but on a zlib gzFile stream, and using talloc */
> >> +util_status_t
> >> +gz_getline (void *talloc_ctx, char **bufptr, size_t *bufsiz, ssize_t *bytes_read,
> >
> > Talloc chunks know their size, so rather than taking bufsize, use
> > talloc_get_size (or talloc_array_length if you switch to talloc array
> > functions below).
> 
> Now you, David, have a chance to drop the bufsiz argument altogether, as the
> info is available in *bufptr's talloc context...
> 
> >
> >> +	    gzFile stream)
> >> +{
> >> +    size_t len = *bufsiz;
> >> +    char *buf = *bufptr;
> >> +    size_t offset = 0;
> >> +
> >> +    if (len == 0 || buf == NULL) {
> >> +	/* same as getdelim from gnulib */
> >> +	len = 120;
> >
> > This is presumably because glibc's malloc has an 8 byte header.  Fun
> > fact: talloc has a 104 byte header (on 64-bit and including the malloc
> > header).
> 
> Hmm, what should we choose here? 152? Some bikeshedding on IRC?

How about we bikeshed about not bikeshedding about this?

> >> +	buf = talloc_size (talloc_ctx, len);
> >> +	if (buf == NULL)
> >> +	    return UTIL_OUT_OF_MEMORY;
> >> +    }
> >> +
> >> +    while (1) {
> >> +	if (! gzgets (stream, buf + offset, len - offset)) {
> >> +	    int zlib_status = 0;
> >> +	    (void) gzerror (stream, &zlib_status);
> >> +	    switch (zlib_status) {
> >> +	    case Z_OK:
> >> +		/* follow getline behaviour */
> >> +		*bytes_read = -1;
> >
> > Is this really what getline does when the last line of a file isn't
> > \n-terminated?
> 
> Maybe the previous call returned a non-\n-terminated string and
> for this call there were 0 bytes left to return?

But my point is that the previous call *won't* return a
non-\n-terminated string.  If my file looks like "a\nb\nc", this will
return "a\n", then "b\n", and then fail (unless I'm following the code
wrong).  This is *not* what getline does (the manpage is confusing,
but I just tested it).
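
For reference, here is a minimal sketch of that kind of check, assuming a
POSIX system that provides getline and tmpfile. The helper name
getline_lengths is made up for illustration and is not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: write `contents` to a temporary file, read it
 * back with POSIX getline, and store each successful return value in
 * `lens`.  Returns the number of lines read, or -1 if the temporary
 * file could not be created. */
static int
getline_lengths (const char *contents, ssize_t *lens, int max)
{
    FILE *fp;
    char *line = NULL;
    size_t cap = 0;
    ssize_t n;
    int count = 0;

    assert (contents != NULL && lens != NULL);

    fp = tmpfile ();
    if (fp == NULL)
        return -1;
    fputs (contents, fp);
    rewind (fp);

    /* getline returns -1 only when it could not read any bytes at all. */
    while (count < max && (n = getline (&line, &cap, fp)) != -1)
        lens[count++] = n;

    free (line);
    fclose (fp);
    return count;
}
```

Called with "a\nb\nc" and room for at least three entries, this reports
three lines of 2, 2 and 1 bytes: getline hands back the final "c" even
though it has no trailing newline, and returns -1 only on the call after
that.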

> Tomi
> 
> >> +		return UTIL_EOF;
> >> +		break;

Unnecessary break.

> >> +	    case Z_ERRNO:
> >> +		return UTIL_FILE;
> >> +		break;

And here.

> >> +	    default:
> >> +		return UTIL_ERROR;
> >> +	    }
> >> +	}
> >> +
> >> +	offset += strlen (buf + offset);
> >> +
> >> +	if ( buf[offset - 1] == '\n' )
> >
> > Too many spaces!
> >
> >> +	    break;
> >> +
> >> +	len *= 2;
> >> +	buf = talloc_realloc (talloc_ctx, buf, char, len);
> >
> > Or talloc_realloc_size, to match the initial talloc_size.
> > Alternatively, the initial talloc_size could be a talloc_array.
> >
> >> +	if (buf == NULL)
> >> +	    return UTIL_OUT_OF_MEMORY;
> >> +    }
> >> +
> >> +    *bufptr = buf;
> >> +    *bufsiz = len;
> >> +    *bytes_read = offset;
> >> +    return UTIL_SUCCESS;
> >> +}
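
To make the end-of-file point concrete, here is a rough sketch of the
same loop with the partial last line handled. It is not the patch: so
that it builds with only the C library, fgets stands in for gzgets and
realloc stands in for talloc_realloc, which means it still needs the
bufsiz argument. The name file_getline and the plain -1 error return are
assumptions for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for gz_getline, using fgets and realloc.
 * Returns the length of the line read, or -1 at end of input, on a
 * read error, or when out of memory. */
static long
file_getline (char **bufptr, size_t *bufsiz, FILE *stream)
{
    char *buf = *bufptr;
    size_t len = *bufsiz;
    size_t offset = 0;

    assert (stream != NULL);

    if (buf == NULL || len < 2) {
        len = 120;
        buf = realloc (buf, len);
        if (buf == NULL)
            return -1;
        *bufptr = buf;
        *bufsiz = len;
    }

    while (1) {
        if (fgets (buf + offset, (int) (len - offset), stream) == NULL) {
            /* Nothing buffered, or a real read error: report it. */
            if (offset == 0 || ferror (stream))
                return -1;
            /* EOF after a partial line: return what was read so far. */
            break;
        }

        offset += strlen (buf + offset);
        if (offset > 0 && buf[offset - 1] == '\n')
            break;

        /* No newline yet: grow the buffer and keep reading. */
        char *grown = realloc (buf, len * 2);
        if (grown == NULL)
            return -1;
        buf = grown;
        len *= 2;
        /* Publish at once so the caller never holds a stale pointer. */
        *bufptr = buf;
        *bufsiz = len;
    }

    return (long) offset;
}
```

On a stream containing "a\nb\nc" this returns 2, 2 and 1, and only then
-1, which matches what getline does. Updating *bufptr right after each
reallocation also means an early error return cannot leave the caller
with a pointer to freed memory.
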
> >> diff --git a/util/zlib-extra.h b/util/zlib-extra.h
> >> new file mode 100644
> >> index 0000000..ed46ac1
> >> --- /dev/null
> >> +++ b/util/zlib-extra.h
> >> @@ -0,0 +1,11 @@
> >> +#ifndef _ZLIB_EXTRA_H
> >> +#define _ZLIB_EXTRA_H
> >> +
> >> +#include <zlib.h>
> >> +#include "util.h"
> >
> > I'd put "util.h" first so we're more likely to catch missing header
> > dependencies (obviously util.h doesn't have any right now, but in the
> > future).
> >
> > Also, I'd put a blank line after the #includes.
> >
> >> +/* Like getline, but read from a gzFile. Allocation is with talloc */
> >> +util_status_t
> >> +gz_getline (void *ctx, char **lineptr, size_t *line_size, ssize_t *bytes_read,
> >> +	    gzFile stream);
> >> +
> >> +#endif
