unofficial mirror of notmuch@notmuchmail.org
From: David Bremner <david@tethera.net>
To: David Bremner <david@tethera.net>, notmuch@notmuchmail.org
Subject: [PATCH 1/9] util: add unicode_word_utf8
Date: Sun, 28 Apr 2019 20:10:41 -0300
Message-ID: <20190428231049.15737-2-david@tethera.net>
In-Reply-To: <20190428231049.15737-1-david@tethera.net>

This originally used Xapian::Unicode::is_wordchar, but that forces
clients to link directly to libxapian, which seems like busywork if
nothing else.
---
 util/Makefile.local |  3 ++-
 util/unicode-util.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 util/unicode-util.h | 12 ++++++++++++
 3 files changed, 57 insertions(+), 1 deletion(-)
 create mode 100644 util/unicode-util.c
 create mode 100644 util/unicode-util.h

diff --git a/util/Makefile.local b/util/Makefile.local
index ba03230e..46f8af3a 100644
--- a/util/Makefile.local
+++ b/util/Makefile.local
@@ -5,7 +5,8 @@ extra_cflags += -I$(srcdir)/$(dir)
 
 libnotmuch_util_c_srcs := $(dir)/xutil.c $(dir)/error_util.c $(dir)/hex-escape.c \
 		  $(dir)/string-util.c $(dir)/talloc-extra.c $(dir)/zlib-extra.c \
-		$(dir)/util.c $(dir)/gmime-extra.c $(dir)/crypto.c
+		$(dir)/util.c $(dir)/gmime-extra.c $(dir)/crypto.c \
+		$(dir)/unicode-util.c
 
 libnotmuch_util_modules := $(libnotmuch_util_c_srcs:.c=.o)
 
diff --git a/util/unicode-util.c b/util/unicode-util.c
new file mode 100644
index 00000000..28ce6001
--- /dev/null
+++ b/util/unicode-util.c
@@ -0,0 +1,43 @@
+#include "unicode-util.h"
+
+/* Based on Xapian::Unicode::is_wordchar, to avoid forcing clients to
+   link directly to libxapian.
+*/
+
+static bool
+unicode_is_wordchar (notmuch_unichar ch)
+{
+    switch (g_unichar_type (ch)) {
+    case G_UNICODE_UPPERCASE_LETTER:
+    case G_UNICODE_LOWERCASE_LETTER:
+    case G_UNICODE_TITLECASE_LETTER:
+    case G_UNICODE_MODIFIER_LETTER:
+    case G_UNICODE_OTHER_LETTER:
+    case G_UNICODE_NON_SPACING_MARK:
+    case G_UNICODE_ENCLOSING_MARK:
+    case G_UNICODE_SPACING_MARK:
+    case G_UNICODE_DECIMAL_NUMBER:
+    case G_UNICODE_LETTER_NUMBER:
+    case G_UNICODE_OTHER_NUMBER:
+    case G_UNICODE_CONNECT_PUNCTUATION:
+	return true;
+    default:
+	return false;
+    }
+}
+
+bool
+unicode_word_utf8 (const char *utf8_str)
+{
+    gunichar *decoded = g_utf8_to_ucs4_fast (utf8_str, -1, NULL);
+    const gunichar *p = decoded;
+    bool ret;
+
+    while (*p && unicode_is_wordchar (*p))
+	p++;
+
+    ret = (*p == '\0');
+
+    g_free (decoded);
+    return ret;
+}
diff --git a/util/unicode-util.h b/util/unicode-util.h
new file mode 100644
index 00000000..32d1e6ef
--- /dev/null
+++ b/util/unicode-util.h
@@ -0,0 +1,12 @@
+#ifndef UNICODE_UTIL_H
+#define UNICODE_UTIL_H
+
+#include <stdbool.h>
+#include <gmodule.h>
+
+/* Return true if the UTF-8 encoded string would tokenize as a single
+ * word, according to Xapian. */
+bool unicode_word_utf8 (const char *str);
+typedef gunichar notmuch_unichar;
+
+#endif
-- 
2.20.1

Thread overview:
2019-03-27 11:16 Index user defined headers David Bremner
2019-03-27 11:16 ` [PATCH 1/9] util: add unicode_word_utf8 David Bremner
2019-03-27 11:16 ` [PATCH 2/9] cli/config: refactor _stored_in_db David Bremner
2019-03-27 11:16 ` [PATCH 3/9] cli/config: support user header index config David Bremner
2019-03-27 11:16 ` [PATCH 4/9] cli/config: check syntax of user configured field names David Bremner
2019-03-27 11:16 ` [PATCH 5/9] lib: setup user headers in query parser David Bremner
2019-03-27 11:16 ` [PATCH 6/9] lib: cache user prefixes in database object David Bremner
2019-03-27 11:16 ` [PATCH 7/9] lib: support user prefix names in term generation David Bremner
2019-03-27 11:16 ` [PATCH 8/9] lib/database: index user headers David Bremner
2019-03-27 11:16 ` [PATCH 9/9] doc: document user header indexing David Bremner
2019-04-26 11:15 ` Index user defined headers David Bremner
2019-05-25 10:38   ` David Bremner
2019-04-28 23:10 ` Index user defined headers v2 David Bremner
2019-04-28 23:10   ` David Bremner [this message]
2019-04-28 23:10   ` [PATCH 2/9] cli/config: refactor _stored_in_db David Bremner
2019-04-28 23:10   ` [PATCH 3/9] cli/config: support user header index config David Bremner
2019-04-28 23:10   ` [PATCH 4/9] cli/config: check syntax of user configured field names David Bremner
2019-04-28 23:10   ` [PATCH 5/9] lib: setup user headers in query parser David Bremner
2019-04-28 23:10   ` [PATCH 6/9] lib: cache user prefixes in database object David Bremner
2019-04-28 23:10   ` [PATCH 7/9] lib: support user prefix names in term generation David Bremner
2019-04-28 23:10   ` [PATCH 8/9] lib/database: index user headers David Bremner
2019-04-28 23:10   ` [PATCH 9/9] doc: document user header indexing David Bremner
