From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: libnettle/libhogweed WIP (was: request to reconsider
	libnettle/libhogweed)
Date: Thu, 16 Mar 2017 17:28:20 +0200
Message-ID: <83y3w5z1ez.fsf@gnu.org>
References: <CADtN0WLU+RyPjwur8gaCK8sXSnDCmnQnY_+e0-pjBajR-1n7MA@mail.gmail.com>
	<83a89gq3us.fsf@gnu.org> <87bmtjiv0w.fsf_-_@lifelogs.com>
	<83o9xjn06c.fsf@gnu.org> <87shmeb5ln.fsf_-_@lifelogs.com>
Reply-To: Eli Zaretskii <eliz@gnu.org>
NNTP-Posting-Host: blaine.gmane.org
X-Trace: blaine.gmane.org 1489678150 6627 195.159.176.226 (16 Mar 2017 15:29:10 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Thu, 16 Mar 2017 15:29:10 +0000 (UTC)
To: emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Mar 16 16:29:03 2017
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by blaine.gmane.org with esmtp (Exim 4.84_2)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1coXKk-0000fT-UD
	for ged-emacs-devel@m.gmane.org; Thu, 16 Mar 2017 16:28:59 +0100
Original-Received: from localhost ([::1]:44310 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1coXKq-0003vS-Ty
	for ged-emacs-devel@m.gmane.org; Thu, 16 Mar 2017 11:29:04 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:50068)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1coXKe-0003rn-Cr
	for emacs-devel@gnu.org; Thu, 16 Mar 2017 11:28:53 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1coXKZ-0003Ej-NJ
	for emacs-devel@gnu.org; Thu, 16 Mar 2017 11:28:52 -0400
Original-Received: from fencepost.gnu.org ([2001:4830:134:3::e]:45004)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@gnu.org>)
	id 1coXKZ-0003EN-JQ
	for emacs-devel@gnu.org; Thu, 16 Mar 2017 11:28:47 -0400
Original-Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:2583
	helo=home-c4e4a596f7)
	by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
	(Exim 4.82) (envelope-from <eliz@gnu.org>) id 1coXKY-0005qp-KL
	for emacs-devel@gnu.org; Thu, 16 Mar 2017 11:28:47 -0400
In-reply-to: <87shmeb5ln.fsf_-_@lifelogs.com> (message from Ted Zlatanov on
	Wed, 15 Mar 2017 17:19:32 -0400)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 2001:4830:134:3::e
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel/>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: "Emacs-devel" <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Xref: news.gmane.org gmane.emacs.devel:213074
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/213074>

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Wed, 15 Mar 2017 17:19:32 -0400
> 
> I did some work, at least to shut up the warnings, but it's far from
> ready. I pushed it into the branch scratch/tzz/nettle for now. If you
> could take a look, that would be very helpful.

I took a look, comments below.

> * maybe there's a better way than using the NETTLE_STRING_EQUAL_UNIBYTE
>   macro to compare Emacs and C strings?

I think this entire issue should be rethought, see the comments below.

> * several times I've forced casts to shut up warnings, and generally
>   could use some help converting between C bytes and the ELisp strings

If you agree with my comments below, most, if not all, of this issue
should go away.

> * data is not wiped after use

I'm not an expert, but AFAIU this is a serious flaw in
security-related software.  Should this be optional (as it probably
has non-negligible run-time costs)?

> * it compiles but doesn't link: "error adding symbols: DSO missing from
>   command line" which I hope is something trivial

Did the -lhogweed -lnettle switches appear on the link command line?

Here are some comments to the patch:

First, some general observations:

 . the patch is missing additions to the manual and NEWS
 . support for dynamic loading on MS-Windows should be added
 . did you consider exposing this functionality through corresponding
   GnuTLS functions?

Specific comments follow:

> +  # Windows loads libnettle dynamically
> +  if test "${opsys}" = "mingw32"; then
> +    LIBNETTLE_LIBS=
> +  else
> +    CFLAGS="$CFLAGS $LIBNETTLE_CFLAGS"
> +    LIBS="$LIBNETTLE_LIBS $LIBS"
> +  fi

CFLAGS should be set for the Windows build as well.  Only LIBS should
not be added.

> diff --git a/lisp/nettle.el b/lisp/nettle.el
> new file mode 100644
> index 0000000000..6f49c3f031
> --- /dev/null
> +++ b/lisp/nettle.el
> @@ -0,0 +1,335 @@
> +;;; nettle.el --- Interface to libnettle/libhogweed  -*- lexical-binding: t; -*-

Actually, this does not seem to be an interface at all.  What you have
here are defcustom that seems not to be used anywhere, a bunch of
diagnostic functions useful for debugging, and tests that should be
moved elsewhere anyway.  Do we really need this file?

> +;; Copyright (C) 2013  Teodor Zlatanov

This should cite the FSF as the copyright holder, I think.

> @@ -1369,6 +1373,10 @@ Using an Emacs configured with --with-x-toolkit=lucid does not have this problem
>    globals_of_kqueue ();
>  #endif
 
> +#ifdef HAVE_NETTLE
> +      syms_of_nettle ();
> +#endif

syms_of_nettle should be called only if !initialized.

> +#ifdef HAVE_NETTLE
> +
> +#include "nettle.h"
> +
> +DEFUN ("nettle-available-p", Fnettle_available_p, Snettle_available_p, 0, 0, 0,
> +       doc: /* Return t if Nettle is available in this instance of Emacs.
> +
> +The Nettle integration binds Emacs with libnettle and libhogweed.
> +See http://www.lysator.liu.se/~nisse/nettle for more on Nettle.  */)
> +     (void)
> +{
> +  return Qt;
> +}

This function should be outside of the "#ifdef HAVE_NETTLE"
conditional, otherwise it will not be available in an Emacs built
without the library, and Lisp programs will need to use fboundp
instead, which all but defeats the purpose of the function.

> +DEFUN ("nettle-hash", Fnettle_hash, Snettle_hash, 2, 2, 0,
> +       doc: /* Hash INPUT string with HASH-METHOD into a unibyte string.

Here and elsewhere, the doc string should explicitly tell that INPUT
must be a unibyte string.

A design question: would it make sense to support vectors as INPUT,
here and in the rest of the functions?

Another design question: should be support buffer regions, instead of
strings, as input to these functions?  The way your code is written,
Lisp programs must always cons a string from buffer text, before they
invoke these functions.  This could be gratuitous cost in some use
cases.

> +  (Lisp_Object input, Lisp_Object hash_method)
> +{
> +  Lisp_Object ret = Qnil;
> +
> +  CHECK_STRING (input);
> +  CHECK_STRING (hash_method);
> +
> +  for (int i = 0; NULL != nettle_hashes[i]; i++)
> +    {
> +      if (NETTLE_STRING_EQUAL_UNIBYTE (hash_method, nettle_hashes[i]->name))

I don't think it's a good idea to use strings for methods and other
such fixed parameters in your patch.  We usually use symbols for that.
The advantage of symbols is that you can test equality with EQ,
instead of the costly NETTLE_STRING_EQUAL_UNIBYTE, or any equivalent
code which compares strings.

> +          const struct nettle_hash *alg = nettle_hashes[i];
> +          unsigned int length = alg->digest_size;
> +          void *ctx = xzalloc (alg->context_size);
> +          uint8_t *digest;
> +          ctx = xzalloc (alg->context_size);
> +          alg->init (ctx);
> +          alg->update (ctx, SCHARS (input), SDATA (input));
> +
> +          digest = xzalloc (length);
> +          alg->digest (ctx, length, digest);
> +
> +          ret = make_unibyte_string ((const char*) digest, length);
> +
> +          xfree (digest);
> +          xfree (ctx);

Here and elsewhere, the size of the result is known in advance, so I
would avoid allocating a scratch buffer and then copying its data
(inside make_unibyte_string) into a newly-allocated string.  Instead,
use make_uninit_string, and then pass its string data pointer to the
algorithm that produces the digest.

> +          uint8_t *digest;
             ^^^^^^^
Why not 'unsigned char'?

> +DEFUN ("nettle-pbkdf2", Fnettle_pbkdf2, Snettle_pbkdf2, 5, 5, 0,
> +       doc: /* Make PBKDF2 of HASH-LENGTH from KEY with HASH-METHOD using ITERATIONS and SALT.
> +
> +The PBKDF2 data is stored in a unibyte string according to RFC 2898.
> +The list of hash-methods can be obtained with `nettle-hashes`.
> +The `sha1' and `sha256' hashing methods are most common and only supported now.  */)

This doc string doesn't document the ITERATIONS and SALT arguments.

> +DEFUN ("nettle-rsa-verify", Fnettle_rsa_verify, Snettle_rsa_verify, 4, 4, 0,
> +       doc: /* Verify the RSA SIGNATURE of DATA with PUBLIC-KEY and HASH-METHOD.
> +
> +The list of hash-methods can be obtained with `nettle-hashes`.
> +Only the `md5', `sha1', `sha256', and `sha512' hashing methods are supported.  */)

This doc string doesn't describe some of its argument, and doesn't
tell their data types.

> +DEFUN ("nettle-crypt", Fnettle_crypt, Snettle_crypt, 6, 6, 0,
> +       doc: /* Encrypt or decrypt INPUT in CRYPT-MODE with KEY, CIPHER, CIPHER-MODE, and IV.
> +
> +The INPUT will be zero-padded to be a multiple of the cipher's block size.
> +
> +The KEY will be zero-padded to the cipher's key size and will be
> +trimmed if it exceeds that key size.
> +
> +The list of ciphers can be obtained with `nettle-ciphers`.
> +The list of cipher modes can be obtained with `nettle-cipher-modes`.
> +The `aes256' cipher method is probably best for general use.
> +The `twofish256' cipher method may be better if you want to avoid NIST ciphers.  */)

This doesn't document the argument IV, and doesn't tell the data types
of the arguments.

> +  mode = CAR_SAFE (Fmember (cipher_mode, Fnettle_cipher_modes ()));

Wouldn't it be less expensive to access the data on the C level,
without consing a list?

> +          /* Increment input_length to the next multiple of block_size.  */
> +          if (0 != alg->block_size)
> +            {
> +              while (0 != (input_length % alg->block_size))
> +                {
> +                  input_length++;
> +                }
> +            }

I think there are more efficient ways of doing this, which don't need
an explicit loop.

> +          dest = xzalloc (input_length);
> +          memcpy (dest, SDATA (input), SCHARS (input));
> +          // message("Dest buffer: '%s' and size is %d", dest, input_length);

I don't understand why you copy the data here, instead of passing the
algorithm a pointer to the original string's data.

> +          key_hold = xzalloc (alg->key_size);
> +          memcpy (key_hold, SDATA (key), min (alg->key_size, SCHARS (key)));

Likewise here.  In addition, since you are using 'min', shouldn't we
signal an error if KEY is too long?

> +          if (0 == NETTLE_STRING_EQUAL_UNIBYTE (mode, "CBC"))
> +            {
> +              if (decrypt)
> +                {
> +                  cbc_decrypt (ctx, alg->decrypt,
> +                               alg->block_size, iv_hold,
> +                               input_length, dest, dest);
> +                }
> +              else
> +                {
> +                  cbc_encrypt (ctx, alg->encrypt,
> +                               alg->block_size, iv_hold,
> +                               input_length, dest, dest);
> +                }
> +            }
> +          else if (ctr_mode)
> +            {
> +              ctr_crypt (ctx, alg->encrypt,
> +                         alg->block_size, iv_hold,
> +                         input_length, dest, dest);
> +            }
> +          else
> +            {
> +              error ("Unexpected error: Nettle cipher mode %s was not found", SDATA (mode));
> +            }
> +
> +          // message("-> produced (%d) '%s'", input_length, dest);
> +          ret = make_unibyte_string ((const char*) dest, input_length);

Once again, since the length is known in advance, producing output
into an allocated buffer and then creating a Lisp string by copying
that buffer is wasteful.  It is better to produce output directly into
a string's data.

Thanks again for working on this.