unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#45246: 28.0.50; etags assertion error
@ 2020-12-14 23:38 Gregor Zattler
  2022-06-07 11:35 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 17+ messages in thread
From: Gregor Zattler @ 2020-12-14 23:38 UTC
  To: 45246

[-- Attachment #1: Type: text/plain, Size: 1473 bytes --]

Dear emacs developers,

I use Emacs configured using:
 'configure -C --with-file-notification=inotify --with-cairo
 --without-toolkit-scroll-bars --with-x-toolkit=lucid
 --with-sound=yes --without-gconf --with-mailutils
 --with-x=yes --enable-checking=yes
 --enable-check-lisp-object-type=yes --with-nativecomp
 'CFLAGS=-g -O2
 -fdebug-prefix-map=/home/grfz/src/emacs-feature_native-comp=. -fstack-protector-strong
 -Wformat -Werror=format-security -Wall -fno-pie'
 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2 '
 'LDFLAGS=-Wl,-z,relro -no-pie''



and I get an assertion error when executing the following line:

~/src$ find . -type f -print0 | egrep -zZ -- '(\.el|\.c|\.h)(\.gz)?$' | xargs -0IXXXXX sh -c "/home/grfz/src/emacs-master/lib-src/etags XXXXX || echo XXXXX"
etags: etags.c:4153: C_entries: Assertion `bracelev == typdefbracelev' failed.
Aborted
./xapian-core-1.4.17/include/xapian/unicode.h
etags: etags.c:4153: C_entries: Assertion `bracelev == typdefbracelev' failed.
Aborted
./xapian-core-1.4.17/debian/tmp/usr/include/xapian/unicode.h
etags: etags.c:4153: C_entries: Assertion `bracelev == typdefbracelev' failed.
Aborted
./xapian-core-1.4.17/debian/libxapian-dev/usr/include/xapian/unicode.h

The file in question is attached to this email.
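
The pipeline above runs etags on one file at a time and echoes the name of each file on which it fails.  A minimal sketch of the same check using only find and sh follows.  The function name report_etags_failures and the ETAGS variable are made up for this illustration; `-o -` asks etags to write the tags to standard output, so no TAGS file is left behind.

```shell
#!/bin/sh
# Hypothetical helper, not part of Emacs: run an etags binary on each
# *.el, *.c and *.h file (optionally gzipped) below a directory and print
# the names of the files on which it exits with a non-zero status.
# ETAGS names the binary to test and defaults to "etags" from PATH.
ETAGS=${ETAGS:-etags}

report_etags_failures() {
    dir=$1
    find "$dir" -type f \( \
            -name '*.el' -o -name '*.c' -o -name '*.h' -o \
            -name '*.el.gz' -o -name '*.c.gz' -o -name '*.h.gz' \) \
        -exec sh -c '
            etags_bin=$1; shift
            for f in "$@"; do
                # One etags run per file, so one failure cannot hide others.
                # A failed assertion aborts etags, which also counts as a
                # non-zero status here.
                "$etags_bin" -o - "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
            done' sh "$ETAGS" {} +
}
```

For example, after setting ETAGS=/home/grfz/src/emacs-master/lib-src/etags, calling `report_etags_failures .` from ~/src should list the same files as the pipeline above, without the assertion messages in between.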


I do not get an assertion error if I use
/usr/bin/etags.emacs --version ./xapian-core-1.4.17/include/xapian/unicode.h

This etags binary is from the debian buster distribution.


Thanks for your attention, Gregor



[-- Attachment #2: unicode.h --]
[-- Type: text/plain, Size: 15207 bytes --]

/** @file unicode.h
 * @brief Unicode and UTF-8 related classes and functions.
 */
/* Copyright (C) 2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2019 Olly Betts
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301 USA
 */

#ifndef XAPIAN_INCLUDED_UNICODE_H
#define XAPIAN_INCLUDED_UNICODE_H

#if !defined XAPIAN_IN_XAPIAN_H && !defined XAPIAN_LIB_BUILD
# error Never use <xapian/unicode.h> directly; include <xapian.h> instead.
#endif

#include <xapian/attributes.h>
#include <xapian/visibility.h>

#include <string>

namespace Xapian {

/** An iterator which returns Unicode character values from a UTF-8 encoded
 *  string.
 */
class XAPIAN_VISIBILITY_DEFAULT Utf8Iterator {
    const unsigned char* p;
    const unsigned char* end;
    mutable unsigned seqlen;

    bool XAPIAN_NOTHROW(calculate_sequence_length() const);

    unsigned get_char() const;

    Utf8Iterator(const unsigned char* p_,
		 const unsigned char* end_,
		 unsigned seqlen_)
	: p(p_), end(end_), seqlen(seqlen_) { }

  public:
    /** Return the raw const char* pointer for the current position. */
    const char* raw() const {
	return reinterpret_cast<const char*>(p ? p : end);
    }

    /** Return the number of bytes left in the iterator's buffer. */
    size_t left() const { return p ? end - p : 0; }

    /** Assign a new string to the iterator.
     *
     *  The iterator will forget the string it was iterating through, and
     *  return characters from the start of the new string when next called.
     *  The string is not copied into the iterator, so it must remain valid
     *  while the iteration is in progress.
     *
     *  @param p_ A pointer to the start of the string to read.
     *
     *  @param len The length of the string to read.
     */
    void assign(const char* p_, size_t len) {
	if (len) {
	    p = reinterpret_cast<const unsigned char*>(p_);
	    end = p + len;
	    seqlen = 0;
	} else {
	    p = NULL;
	}
    }

    /** Assign a new string to the iterator.
     *
     *  The iterator will forget the string it was iterating through, and
     *  return characters from the start of the new string when next called.
     *  The string is not copied into the iterator, so it must remain valid
     *  while the iteration is in progress.
     *
     *  @param s The string to read.  Must not be modified while the iteration
     *		 is in progress.
     */
    void assign(const std::string& s) { assign(s.data(), s.size()); }

    /** Create an iterator given a pointer to a null terminated string.
     *
     *  The iterator will return characters from the start of the string when
     *  next called.  The string is not copied into the iterator, so it must
     *  remain valid while the iteration is in progress.
     *
     *  @param p_ A pointer to the start of the null terminated string to read.
     */
    explicit Utf8Iterator(const char* p_);

    /** Create an iterator given a pointer and a length.
     *
     *  The iterator will return characters from the start of the string when
     *  next called.  The string is not copied into the iterator, so it must
     *  remain valid while the iteration is in progress.
     *
     *  @param p_ A pointer to the start of the string to read.
     *
     *  @param len The length of the string to read.
     */
    Utf8Iterator(const char* p_, size_t len) { assign(p_, len); }

    /** Create an iterator given a string.
     *
     *  The iterator will return characters from the start of the string when
     *  next called.  The string is not copied into the iterator, so it must
     *  remain valid while the iteration is in progress.
     *
     *  @param s The string to read.  Must not be modified while the iteration
     *		 is in progress.
     */
    Utf8Iterator(const std::string& s) { assign(s.data(), s.size()); }

    /** Create an iterator which is at the end of its iteration.
     *
     *  This can be compared to another iterator to check if the other iterator
     *  has reached its end.
     */
    XAPIAN_NOTHROW(Utf8Iterator())
	: p(NULL), end(0), seqlen(0) { }

    /** Get the current Unicode character value pointed to by the iterator.
     *
     *  If an invalid UTF-8 sequence is encountered, then the byte values
     *  comprising it are returned until valid UTF-8 or the end of the input is
     *  reached.
     *
     *  Returns unsigned(-1) if the iterator has reached the end of its buffer.
     */
    unsigned XAPIAN_NOTHROW(operator*() const) XAPIAN_PURE_FUNCTION;

    /** @private @internal Get the current Unicode character
     *  value pointed to by the iterator.
     *
     *  If an invalid UTF-8 sequence is encountered, then the byte values
     *  comprising it are returned with the top bit set (so the caller can
     *  differentiate these from the same values arising from valid UTF-8)
     *  until valid UTF-8 or the end of the input is reached.
     *
     *  Returns unsigned(-1) if the iterator has reached the end of its buffer.
     */
    unsigned XAPIAN_NOTHROW(strict_deref() const) XAPIAN_PURE_FUNCTION;

    /** Move forward to the next Unicode character.
     *
     *  @return An iterator pointing to the position before the move.
     */
    Utf8Iterator operator++(int) {
	// If we've not calculated seqlen yet, do so.
	if (seqlen == 0) calculate_sequence_length();
	const unsigned char* old_p = p;
	unsigned old_seqlen = seqlen;
	p += seqlen;
	if (p == end) p = NULL;
	seqlen = 0;
	return Utf8Iterator(old_p, end, old_seqlen);
    }

    /** Move forward to the next Unicode character.
     *
     *  @return A reference to this object.
     */
    Utf8Iterator& operator++() {
	if (seqlen == 0) calculate_sequence_length();
	p += seqlen;
	if (p == end) p = NULL;
	seqlen = 0;
	return *this;
    }

    /** Test two Utf8Iterators for equality.
     *
     *  @param other	The Utf8Iterator to compare this one with.
     *  @return true iff the iterators point to the same position.
     */
    bool XAPIAN_NOTHROW(operator==(const Utf8Iterator& other) const) {
	return p == other.p;
    }

    /** Test two Utf8Iterators for inequality.
     *
     *  @param other	The Utf8Iterator to compare this one with.
     *  @return true iff the iterators do not point to the same position.
     */
    bool XAPIAN_NOTHROW(operator!=(const Utf8Iterator& other) const) {
	return p != other.p;
    }

    /// We implement the semantics of an STL input_iterator.
    //@{
    typedef std::input_iterator_tag iterator_category;
    typedef unsigned value_type;
    typedef size_t difference_type;
    typedef const unsigned* pointer;
    typedef const unsigned& reference;
    //@}
};

/// Functions associated with handling Unicode characters.
namespace Unicode {

/** Each Unicode character is in exactly one of these categories.
 *
 * The Unicode standard calls this the "General Category", and uses a
 * "Major, minor" convention to derive a two letter code.
 */
typedef enum {
    UNASSIGNED,                         /**< Other, not assigned (Cn) */
    UPPERCASE_LETTER,                   /**< Letter, uppercase (Lu) */
    LOWERCASE_LETTER,                   /**< Letter, lowercase (Ll) */
    TITLECASE_LETTER,                   /**< Letter, titlecase (Lt) */
    MODIFIER_LETTER,                    /**< Letter, modifier (Lm) */
    OTHER_LETTER,                       /**< Letter, other (Lo) */
    NON_SPACING_MARK,                   /**< Mark, nonspacing (Mn) */
    ENCLOSING_MARK,                     /**< Mark, enclosing (Me) */
    COMBINING_SPACING_MARK,             /**< Mark, spacing combining (Mc) */
    DECIMAL_DIGIT_NUMBER,               /**< Number, decimal digit (Nd) */
    LETTER_NUMBER,                      /**< Number, letter (Nl) */
    OTHER_NUMBER,                       /**< Number, other (No) */
    SPACE_SEPARATOR,                    /**< Separator, space (Zs) */
    LINE_SEPARATOR,                     /**< Separator, line (Zl) */
    PARAGRAPH_SEPARATOR,                /**< Separator, paragraph (Zp) */
    CONTROL,                            /**< Other, control (Cc) */
    FORMAT,                             /**< Other, format (Cf) */
    PRIVATE_USE,                        /**< Other, private use (Co) */
    SURROGATE,                          /**< Other, surrogate (Cs) */
    CONNECTOR_PUNCTUATION,              /**< Punctuation, connector (Pc) */
    DASH_PUNCTUATION,                   /**< Punctuation, dash (Pd) */
    OPEN_PUNCTUATION,                   /**< Punctuation, open (Ps) */
    CLOSE_PUNCTUATION,                  /**< Punctuation, close (Pe) */
    INITIAL_QUOTE_PUNCTUATION,          /**< Punctuation, initial quote (Pi) */
    FINAL_QUOTE_PUNCTUATION,            /**< Punctuation, final quote (Pf) */
    OTHER_PUNCTUATION,                  /**< Punctuation, other (Po) */
    MATH_SYMBOL,                        /**< Symbol, math (Sm) */
    CURRENCY_SYMBOL,                    /**< Symbol, currency (Sc) */
    MODIFIER_SYMBOL,                    /**< Symbol, modified (Sk) */
    OTHER_SYMBOL                        /**< Symbol, other (So) */
} category;

namespace Internal {
    /** @private @internal Extract the information about a character from the
     *  Unicode character tables.
     *
     *  Characters outside of the Unicode range (i.e. ch >= 0x110000) are
     *  treated as UNASSIGNED with no case variants.
     */
    XAPIAN_VISIBILITY_DEFAULT
    int XAPIAN_NOTHROW(get_character_info(unsigned ch)) XAPIAN_CONST_FUNCTION;

    /** @private @internal Extract how to convert the case of a Unicode
     *  character from its info.
     */
    inline int get_case_type(int info) { return ((info & 0xe0) >> 5); }

    /** @private @internal Extract the category of a Unicode character from its
     *  info.
     */
    inline category get_category(int info) {
	return static_cast<category>(info & 0x1f);
    }

    /** @private @internal Extract the delta to use for case conversion of a
     *  character from its info.
     */
    inline int get_delta(int info) {
	/* It's implementation defined if sign extension happens when right
	 * shifting a signed int, although in practice sign extension is what
	 * most compilers implement.
	 *
	 * Some compilers are smart enough to spot common idioms for sign
	 * extension, but not all (e.g. GCC < 7 doesn't spot the one used
	 * below), so check what the implementation-defined behaviour is with
	 * a constant conditional which should get optimised away.
	 *
	 * We use the ternary operator here to avoid various compiler
	 * warnings which writing this as an `if` results in.
	 */
	return ((-1 >> 1) == -1 ?
		// Right shift sign-extends.
		info >> 8 :
		// Right shift shifts in zeros so bitwise-not before and after
		// the shift for negative values.
		(info >= 0) ? (info >> 8) : (~(~info >> 8)));
    }
}

/** Convert a single non-ASCII Unicode character to UTF-8.
 *
 *  This is intended mainly as a helper method for to_utf8().
 *
 *  @param ch	The character (which must be > 128) to write to @a buf.
 *  @param buf	The buffer to write the character to - it must have
 *		space for (at least) 4 bytes.
 *
 *  @return	The length of the resultant UTF-8 character in bytes.
 */
XAPIAN_VISIBILITY_DEFAULT
unsigned nonascii_to_utf8(unsigned ch, char* buf);

/** Convert a single Unicode character to UTF-8.
 *
 *  @param ch	The character to write to @a buf.
 *  @param buf	The buffer to write the character to - it must have
 *		space for (at least) 4 bytes.
 *
 *  @return	The length of the resultant UTF-8 character in bytes.
 */
inline unsigned to_utf8(unsigned ch, char* buf) {
    if (ch < 128) {
	*buf = static_cast<unsigned char>(ch);
	return 1;
    }
    return Xapian::Unicode::nonascii_to_utf8(ch, buf);
}

/** Append the UTF-8 representation of a single Unicode character to a
 *  std::string.
 */
inline void append_utf8(std::string& s, unsigned ch) {
    char buf[4];
    s.append(buf, to_utf8(ch, buf));
}

/// Return the category which a given Unicode character falls into.
inline category get_category(unsigned ch) {
    return Internal::get_category(Internal::get_character_info(ch));
}

/// Test if a given Unicode character is "word character".
inline bool is_wordchar(unsigned ch) {
    const unsigned int WORDCHAR_MASK =
	    (1 << Xapian::Unicode::UPPERCASE_LETTER) |
	    (1 << Xapian::Unicode::LOWERCASE_LETTER) |
	    (1 << Xapian::Unicode::TITLECASE_LETTER) |
	    (1 << Xapian::Unicode::MODIFIER_LETTER) |
	    (1 << Xapian::Unicode::OTHER_LETTER) |
	    (1 << Xapian::Unicode::NON_SPACING_MARK) |
	    (1 << Xapian::Unicode::ENCLOSING_MARK) |
	    (1 << Xapian::Unicode::COMBINING_SPACING_MARK) |
	    (1 << Xapian::Unicode::DECIMAL_DIGIT_NUMBER) |
	    (1 << Xapian::Unicode::LETTER_NUMBER) |
	    (1 << Xapian::Unicode::OTHER_NUMBER) |
	    (1 << Xapian::Unicode::CONNECTOR_PUNCTUATION);
    return ((WORDCHAR_MASK >> get_category(ch)) & 1);
}

/// Test if a given Unicode character is a whitespace character.
inline bool is_whitespace(unsigned ch) {
    const unsigned int WHITESPACE_MASK =
	    (1 << Xapian::Unicode::CONTROL) | // For TAB, CR, LF, FF.
	    (1 << Xapian::Unicode::SPACE_SEPARATOR) |
	    (1 << Xapian::Unicode::LINE_SEPARATOR) |
	    (1 << Xapian::Unicode::PARAGRAPH_SEPARATOR);
    return ((WHITESPACE_MASK >> get_category(ch)) & 1);
}

/// Test if a given Unicode character is a currency symbol.
inline bool is_currency(unsigned ch) {
    return (get_category(ch) == Xapian::Unicode::CURRENCY_SYMBOL);
}

/// Convert a Unicode character to lowercase.
inline unsigned tolower(unsigned ch) {
    int info = Xapian::Unicode::Internal::get_character_info(ch);
    if (!(Internal::get_case_type(info) & 2))
	return ch;
    return ch + Internal::get_delta(info);
}

/// Convert a Unicode character to uppercase.
inline unsigned toupper(unsigned ch) {
    int info = Xapian::Unicode::Internal::get_character_info(ch);
    if (!(Internal::get_case_type(info) & 4))
	return ch;
    return ch - Internal::get_delta(info);
}

/// Convert a UTF-8 std::string to lowercase.
inline std::string
tolower(const std::string& term)
{
    std::string result;
    result.reserve(term.size());
    for (Utf8Iterator i(term); i != Utf8Iterator(); ++i) {
	append_utf8(result, tolower(*i));
    }
    return result;
}

/// Convert a UTF-8 std::string to uppercase.
inline std::string
toupper(const std::string& term)
{
    std::string result;
    result.reserve(term.size());
    for (Utf8Iterator i(term); i != Utf8Iterator(); ++i) {
	append_utf8(result, toupper(*i));
    }
    return result;
}

}

}

#endif // XAPIAN_INCLUDED_UNICODE_H


* bug#45246: 28.0.50; etags assertion error
  2020-12-14 23:38 bug#45246: 28.0.50; etags assertion error Gregor Zattler
@ 2022-06-07 11:35 ` Lars Ingebrigtsen
  2022-06-07 14:26   ` Gregor Zattler
  0 siblings, 1 reply; 17+ messages in thread
From: Lars Ingebrigtsen @ 2022-06-07 11:35 UTC
  To: Gregor Zattler; +Cc: 45246

Gregor Zattler <grfz@gmx.de> writes:

> and I get an assertion error when executing the following line:
>
> ~/src$ find . -type f -print0 | egrep -zZ -- '(\.el|\.c|\.h)(\.gz)?$'
> | xargs -0IXXXXX sh -c "/home/grfz/src/emacs-master/lib-src/etags
> XXXXX || echo XXXXX"
> etags: etags.c:4153: C_entries: Assertion `bracelev == typdefbracelev' failed.
> Aborted

(I'm going through old bug reports that unfortunately weren't resolved
at the time.)

I tried saying

etags unicode.h

on the supplied file, but I didn't see any assertion errors, either with
the etags from Emacs 28 or 29.

Do you still see this problem in recent Emacs versions?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#45246: 28.0.50; etags assertion error
  2022-06-07 11:35 ` Lars Ingebrigtsen
@ 2022-06-07 14:26   ` Gregor Zattler
  2022-06-07 15:58     ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: Gregor Zattler @ 2022-06-07 14:26 UTC
  To: Lars Ingebrigtsen; +Cc: 45246

[-- Attachment #1: Type: text/plain, Size: 2594 bytes --]

Hi Lars,
* Lars Ingebrigtsen <larsi@gnus.org> [2022-06-07; 13:35]:
> Gregor Zattler <grfz@gmx.de> writes:
>
>> and I get an assertion error when executing the following line:
>>
>> ~/src$ find . -type f -print0 | egrep -zZ -- '(\.el|\.c|\.h)(\.gz)?$'
>> | xargs -0IXXXXX sh -c "/home/grfz/src/emacs-master/lib-src/etags
>> XXXXX || echo XXXXX"
>> etags: etags.c:4153: C_entries: Assertion `bracelev == typdefbracelev' failed.
>> Aborted
>
> (I'm going through old bug reports that unfortunately weren't resolved
> at the time.)
>
> I tried saying
>
> etags unicode.h
>
> on the supplied file, but I didn't see any assertion errors, either with
> the etags from Emacs 28 or 29.
>
> Do you still see this problem in recent Emacs versions?


Yes:

$ /home/grfz/src/emacs/lib-src/etags /usr/include/xapian/unicode.h
etags: etags.c:4188: C_entries: Assertion `bracelev == typdefbracelev' failed.
Aborted


This is on debian/bullseye.  etags was built in the same
process as this Emacs:



In GNU Emacs 29.0.50 (build 3, x86_64-pc-linux-gnu, X toolkit, cairo version 1.16.0)
 of 2022-05-15 built on no
Repository revision: b26574d7d7c458fec7494484ea5bceeed45f2f02
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Debian GNU/Linux 11 (bullseye)

Configured using:
 'configure -C --prefix=/usr/local/stow/emacs-snapshot
 --enable-locallisppath=/etc/emacs:/usr/local/share/emacs/29.0/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/29.0/site-lisp:/usr/share/emacs/site-lisp
 --with-sound=yes --without-gconf --with-mailutils --build
 x86_64-linux-gnu
 --infodir=/usr/local/share/info:/usr/share/info --with-json
 --with-file-notification=yes --with-cairo --with-x=yes
 --with-x-toolkit=lucid --without-toolkit-scroll-bars
 --enable-checking=yes,glyphs
 --enable-check-lisp-object-type --with-native-compilation
 'CFLAGS=-g3 -O3
 -ffile-prefix-map=/home/grfz/src/emacs=. -fstack-protector-strong
 -Wformat -Werror=format-security ''

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS
HARFBUZZ JPEG JSON LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD
LIBXML2 M17N_FLT MODULES NATIVE_COMP NOTIFY INOTIFY PDUMPER
PNG RSVG SECCOMP SOUND THREADS TIFF X11 XAW3D XDBE XIM
XINPUT2 XPM LUCID ZLIB





Since there is no unicode.h under ~/src/ ATM, I used a
different unicode.h file this time.  It's attached.

For me this is not an important bug.  If you want to
investigate: Is there anything I can do to help you?

Ciao,
--
Gregor

[-- Attachment #2: unicode.h --]
[-- Type: text/plain, Size: 15197 bytes --]

/** @file
 * @brief Unicode and UTF-8 related classes and functions.
 */
/* Copyright (C) 2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2019 Olly Betts
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301 USA
 */

#ifndef XAPIAN_INCLUDED_UNICODE_H
#define XAPIAN_INCLUDED_UNICODE_H

#if !defined XAPIAN_IN_XAPIAN_H && !defined XAPIAN_LIB_BUILD
# error Never use <xapian/unicode.h> directly; include <xapian.h> instead.
#endif

#include <xapian/attributes.h>
#include <xapian/visibility.h>

#include <string>

namespace Xapian {

/** An iterator which returns Unicode character values from a UTF-8 encoded
 *  string.
 */
class XAPIAN_VISIBILITY_DEFAULT Utf8Iterator {
    const unsigned char* p;
    const unsigned char* end;
    mutable unsigned seqlen;

    bool XAPIAN_NOTHROW(calculate_sequence_length() const);

    unsigned get_char() const;

    Utf8Iterator(const unsigned char* p_,
		 const unsigned char* end_,
		 unsigned seqlen_)
	: p(p_), end(end_), seqlen(seqlen_) { }

  public:
    /** Return the raw const char* pointer for the current position. */
    const char* raw() const {
	return reinterpret_cast<const char*>(p ? p : end);
    }

    /** Return the number of bytes left in the iterator's buffer. */
    size_t left() const { return p ? end - p : 0; }

    /** Assign a new string to the iterator.
     *
     *  The iterator will forget the string it was iterating through, and
     *  return characters from the start of the new string when next called.
     *  The string is not copied into the iterator, so it must remain valid
     *  while the iteration is in progress.
     *
     *  @param p_ A pointer to the start of the string to read.
     *
     *  @param len The length of the string to read.
     */
    void assign(const char* p_, size_t len) {
	if (len) {
	    p = reinterpret_cast<const unsigned char*>(p_);
	    end = p + len;
	    seqlen = 0;
	} else {
	    p = NULL;
	}
    }

    /** Assign a new string to the iterator.
     *
     *  The iterator will forget the string it was iterating through, and
     *  return characters from the start of the new string when next called.
     *  The string is not copied into the iterator, so it must remain valid
     *  while the iteration is in progress.
     *
     *  @param s The string to read.  Must not be modified while the iteration
     *		 is in progress.
     */
    void assign(const std::string& s) { assign(s.data(), s.size()); }

    /** Create an iterator given a pointer to a null terminated string.
     *
     *  The iterator will return characters from the start of the string when
     *  next called.  The string is not copied into the iterator, so it must
     *  remain valid while the iteration is in progress.
     *
     *  @param p_ A pointer to the start of the null terminated string to read.
     */
    explicit Utf8Iterator(const char* p_);

    /** Create an iterator given a pointer and a length.
     *
     *  The iterator will return characters from the start of the string when
     *  next called.  The string is not copied into the iterator, so it must
     *  remain valid while the iteration is in progress.
     *
     *  @param p_ A pointer to the start of the string to read.
     *
     *  @param len The length of the string to read.
     */
    Utf8Iterator(const char* p_, size_t len) { assign(p_, len); }

    /** Create an iterator given a string.
     *
     *  The iterator will return characters from the start of the string when
     *  next called.  The string is not copied into the iterator, so it must
     *  remain valid while the iteration is in progress.
     *
     *  @param s The string to read.  Must not be modified while the iteration
     *		 is in progress.
     */
    Utf8Iterator(const std::string& s) { assign(s.data(), s.size()); }

    /** Create an iterator which is at the end of its iteration.
     *
     *  This can be compared to another iterator to check if the other iterator
     *  has reached its end.
     */
    XAPIAN_NOTHROW(Utf8Iterator())
	: p(NULL), end(0), seqlen(0) { }

    /** Get the current Unicode character value pointed to by the iterator.
     *
     *  If an invalid UTF-8 sequence is encountered, then the byte values
     *  comprising it are returned until valid UTF-8 or the end of the input is
     *  reached.
     *
     *  Returns unsigned(-1) if the iterator has reached the end of its buffer.
     */
    unsigned XAPIAN_NOTHROW(operator*() const) XAPIAN_PURE_FUNCTION;

    /** @private @internal Get the current Unicode character
     *  value pointed to by the iterator.
     *
     *  If an invalid UTF-8 sequence is encountered, then the byte values
     *  comprising it are returned with the top bit set (so the caller can
     *  differentiate these from the same values arising from valid UTF-8)
     *  until valid UTF-8 or the end of the input is reached.
     *
     *  Returns unsigned(-1) if the iterator has reached the end of its buffer.
     */
    unsigned XAPIAN_NOTHROW(strict_deref() const) XAPIAN_PURE_FUNCTION;

    /** Move forward to the next Unicode character.
     *
     *  @return An iterator pointing to the position before the move.
     */
    Utf8Iterator operator++(int) {
	// If we've not calculated seqlen yet, do so.
	if (seqlen == 0) calculate_sequence_length();
	const unsigned char* old_p = p;
	unsigned old_seqlen = seqlen;
	p += seqlen;
	if (p == end) p = NULL;
	seqlen = 0;
	return Utf8Iterator(old_p, end, old_seqlen);
    }

    /** Move forward to the next Unicode character.
     *
     *  @return A reference to this object.
     */
    Utf8Iterator& operator++() {
	if (seqlen == 0) calculate_sequence_length();
	p += seqlen;
	if (p == end) p = NULL;
	seqlen = 0;
	return *this;
    }

    /** Test two Utf8Iterators for equality.
     *
     *  @param other	The Utf8Iterator to compare this one with.
     *  @return true iff the iterators point to the same position.
     */
    bool XAPIAN_NOTHROW(operator==(const Utf8Iterator& other) const) {
	return p == other.p;
    }

    /** Test two Utf8Iterators for inequality.
     *
     *  @param other	The Utf8Iterator to compare this one with.
     *  @return true iff the iterators do not point to the same position.
     */
    bool XAPIAN_NOTHROW(operator!=(const Utf8Iterator& other) const) {
	return p != other.p;
    }

    /// We implement the semantics of an STL input_iterator.
    //@{
    typedef std::input_iterator_tag iterator_category;
    typedef unsigned value_type;
    typedef size_t difference_type;
    typedef const unsigned* pointer;
    typedef const unsigned& reference;
    //@}
};

/// Functions associated with handling Unicode characters.
namespace Unicode {

/** Each Unicode character is in exactly one of these categories.
 *
 * The Unicode standard calls this the "General Category", and uses a
 * "Major, minor" convention to derive a two letter code.
 */
typedef enum {
    UNASSIGNED,                         /**< Other, not assigned (Cn) */
    UPPERCASE_LETTER,                   /**< Letter, uppercase (Lu) */
    LOWERCASE_LETTER,                   /**< Letter, lowercase (Ll) */
    TITLECASE_LETTER,                   /**< Letter, titlecase (Lt) */
    MODIFIER_LETTER,                    /**< Letter, modifier (Lm) */
    OTHER_LETTER,                       /**< Letter, other (Lo) */
    NON_SPACING_MARK,                   /**< Mark, nonspacing (Mn) */
    ENCLOSING_MARK,                     /**< Mark, enclosing (Me) */
    COMBINING_SPACING_MARK,             /**< Mark, spacing combining (Mc) */
    DECIMAL_DIGIT_NUMBER,               /**< Number, decimal digit (Nd) */
    LETTER_NUMBER,                      /**< Number, letter (Nl) */
    OTHER_NUMBER,                       /**< Number, other (No) */
    SPACE_SEPARATOR,                    /**< Separator, space (Zs) */
    LINE_SEPARATOR,                     /**< Separator, line (Zl) */
    PARAGRAPH_SEPARATOR,                /**< Separator, paragraph (Zp) */
    CONTROL,                            /**< Other, control (Cc) */
    FORMAT,                             /**< Other, format (Cf) */
    PRIVATE_USE,                        /**< Other, private use (Co) */
    SURROGATE,                          /**< Other, surrogate (Cs) */
    CONNECTOR_PUNCTUATION,              /**< Punctuation, connector (Pc) */
    DASH_PUNCTUATION,                   /**< Punctuation, dash (Pd) */
    OPEN_PUNCTUATION,                   /**< Punctuation, open (Ps) */
    CLOSE_PUNCTUATION,                  /**< Punctuation, close (Pe) */
    INITIAL_QUOTE_PUNCTUATION,          /**< Punctuation, initial quote (Pi) */
    FINAL_QUOTE_PUNCTUATION,            /**< Punctuation, final quote (Pf) */
    OTHER_PUNCTUATION,                  /**< Punctuation, other (Po) */
    MATH_SYMBOL,                        /**< Symbol, math (Sm) */
    CURRENCY_SYMBOL,                    /**< Symbol, currency (Sc) */
    MODIFIER_SYMBOL,                    /**< Symbol, modified (Sk) */
    OTHER_SYMBOL                        /**< Symbol, other (So) */
} category;

namespace Internal {
    /** @private @internal Extract the information about a character from the
     *  Unicode character tables.
     *
     *  Characters outside of the Unicode range (i.e. ch >= 0x110000) are
     *  treated as UNASSIGNED with no case variants.
     */
    XAPIAN_VISIBILITY_DEFAULT
    int XAPIAN_NOTHROW(get_character_info(unsigned ch)) XAPIAN_CONST_FUNCTION;

    /** @private @internal Extract how to convert the case of a Unicode
     *  character from its info.
     */
    inline int get_case_type(int info) { return ((info & 0xe0) >> 5); }

    /** @private @internal Extract the category of a Unicode character from its
     *  info.
     */
    inline category get_category(int info) {
	return static_cast<category>(info & 0x1f);
    }

    /** @private @internal Extract the delta to use for case conversion of a
     *  character from its info.
     */
    inline int get_delta(int info) {
	/* It's implementation defined if sign extension happens when right
	 * shifting a signed int, although in practice sign extension is what
	 * most compilers implement.
	 *
	 * Some compilers are smart enough to spot common idioms for sign
	 * extension, but not all (e.g. GCC < 7 doesn't spot the one used
	 * below), so check what the implementation-defined behaviour is with
	 * a constant conditional which should get optimised away.
	 *
	 * We use the ternary operator here to avoid various compiler
	 * warnings which writing this as an `if` results in.
	 */
	return ((-1 >> 1) == -1 ?
		// Right shift sign-extends.
		info >> 8 :
		// Right shift shifts in zeros so bitwise-not before and after
		// the shift for negative values.
		(info >= 0) ? (info >> 8) : (~(~info >> 8)));
    }
}

/** Convert a single non-ASCII Unicode character to UTF-8.
 *
 *  This is intended mainly as a helper method for to_utf8().
 *
 *  @param ch	The character (which must be >= 128) to write to @a buf.
 *  @param buf	The buffer to write the character to - it must have
 *		space for (at least) 4 bytes.
 *
 *  @return	The length of the resultant UTF-8 character in bytes.
 */
XAPIAN_VISIBILITY_DEFAULT
unsigned nonascii_to_utf8(unsigned ch, char* buf);

/** Convert a single Unicode character to UTF-8.
 *
 *  @param ch	The character to write to @a buf.
 *  @param buf	The buffer to write the character to - it must have
 *		space for (at least) 4 bytes.
 *
 *  @return	The length of the resultant UTF-8 character in bytes.
 */
inline unsigned to_utf8(unsigned ch, char* buf) {
    if (ch < 128) {
	*buf = static_cast<unsigned char>(ch);
	return 1;
    }
    return Xapian::Unicode::nonascii_to_utf8(ch, buf);
}

/** Append the UTF-8 representation of a single Unicode character to a
 *  std::string.
 */
inline void append_utf8(std::string& s, unsigned ch) {
    char buf[4];
    s.append(buf, to_utf8(ch, buf));
}

/// Return the category which a given Unicode character falls into.
inline category get_category(unsigned ch) {
    return Internal::get_category(Internal::get_character_info(ch));
}

/// Test if a given Unicode character is "word character".
inline bool is_wordchar(unsigned ch) {
    const unsigned int WORDCHAR_MASK =
	    (1 << Xapian::Unicode::UPPERCASE_LETTER) |
	    (1 << Xapian::Unicode::LOWERCASE_LETTER) |
	    (1 << Xapian::Unicode::TITLECASE_LETTER) |
	    (1 << Xapian::Unicode::MODIFIER_LETTER) |
	    (1 << Xapian::Unicode::OTHER_LETTER) |
	    (1 << Xapian::Unicode::NON_SPACING_MARK) |
	    (1 << Xapian::Unicode::ENCLOSING_MARK) |
	    (1 << Xapian::Unicode::COMBINING_SPACING_MARK) |
	    (1 << Xapian::Unicode::DECIMAL_DIGIT_NUMBER) |
	    (1 << Xapian::Unicode::LETTER_NUMBER) |
	    (1 << Xapian::Unicode::OTHER_NUMBER) |
	    (1 << Xapian::Unicode::CONNECTOR_PUNCTUATION);
    return ((WORDCHAR_MASK >> get_category(ch)) & 1);
}

/// Test if a given Unicode character is a whitespace character.
inline bool is_whitespace(unsigned ch) {
    const unsigned int WHITESPACE_MASK =
	    (1 << Xapian::Unicode::CONTROL) | // For TAB, CR, LF, FF.
	    (1 << Xapian::Unicode::SPACE_SEPARATOR) |
	    (1 << Xapian::Unicode::LINE_SEPARATOR) |
	    (1 << Xapian::Unicode::PARAGRAPH_SEPARATOR);
    return ((WHITESPACE_MASK >> get_category(ch)) & 1);
}

/// Test if a given Unicode character is a currency symbol.
inline bool is_currency(unsigned ch) {
    return (get_category(ch) == Xapian::Unicode::CURRENCY_SYMBOL);
}

/// Convert a Unicode character to lowercase.
inline unsigned tolower(unsigned ch) {
    int info = Xapian::Unicode::Internal::get_character_info(ch);
    if (!(Internal::get_case_type(info) & 2))
	return ch;
    return ch + Internal::get_delta(info);
}

/// Convert a Unicode character to uppercase.
inline unsigned toupper(unsigned ch) {
    int info = Xapian::Unicode::Internal::get_character_info(ch);
    if (!(Internal::get_case_type(info) & 4))
	return ch;
    return ch - Internal::get_delta(info);
}

/// Convert a UTF-8 std::string to lowercase.
inline std::string
tolower(const std::string& term)
{
    std::string result;
    result.reserve(term.size());
    for (Utf8Iterator i(term); i != Utf8Iterator(); ++i) {
	append_utf8(result, tolower(*i));
    }
    return result;
}

/// Convert a UTF-8 std::string to uppercase.
inline std::string
toupper(const std::string& term)
{
    std::string result;
    result.reserve(term.size());
    for (Utf8Iterator i(term); i != Utf8Iterator(); ++i) {
	append_utf8(result, toupper(*i));
    }
    return result;
}

}

}

#endif // XAPIAN_INCLUDED_UNICODE_H

^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#45246: 28.0.50; etags assertion error
  2022-06-07 14:26   ` Gregor Zattler
@ 2022-06-07 15:58     ` Eli Zaretskii
  2022-06-07 16:38       ` Andreas Schwab
                         ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Eli Zaretskii @ 2022-06-07 15:58 UTC (permalink / raw)
  To: Gregor Zattler; +Cc: larsi, 45246

> Cc: 45246@debbugs.gnu.org
> From: Gregor Zattler <grfz@gmx.de>
> Date: Tue, 07 Jun 2022 16:26:42 +0200
> 
> > on the supplied file, but I didn't see any assertion errors, either with
> > the etags from Emacs 28 or 29.
> >
> > Do you still see this problem in recent Emacs versions?
> 
> 
> Yes:
> 
> $ /home/grfz/src/emacs/lib-src/etags /usr/include/xapian/unicode.h
> etags: etags.c:4188: C_entries: Assertion `bracelev == typdefbracelev' failed.
> Aborted

Lars, I guess you were trying this in an optimized build, where all
the assertions compile to nothing.

I see this here and will try to take a look when I have time.






* bug#45246: 28.0.50; etags assertion error
  2022-06-07 15:58     ` Eli Zaretskii
@ 2022-06-07 16:38       ` Andreas Schwab
  2022-06-07 17:15         ` Eli Zaretskii
  2022-06-07 17:08       ` Eli Zaretskii
  2022-06-07 17:13       ` Lars Ingebrigtsen
  2 siblings, 1 reply; 17+ messages in thread
From: Andreas Schwab @ 2022-06-07 16:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, 45246, Gregor Zattler

On Jun 07 2022, Eli Zaretskii wrote:

> Lars, I guess you were trying this in an optimized build, where all
> the assertions compile to nothing.

It's not about optimized or not, it's controlled by --enable-checking.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."






* bug#45246: 28.0.50; etags assertion error
  2022-06-07 15:58     ` Eli Zaretskii
  2022-06-07 16:38       ` Andreas Schwab
@ 2022-06-07 17:08       ` Eli Zaretskii
  2022-06-09 17:42         ` Eli Zaretskii
  2022-06-07 17:13       ` Lars Ingebrigtsen
  2 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2022-06-07 17:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, 45246, grfz

> Cc: larsi@gnus.org, 45246@debbugs.gnu.org
> Date: Tue, 07 Jun 2022 18:58:13 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> 
> > $ /home/grfz/src/emacs/lib-src/etags /usr/include/xapian/unicode.h
> > etags: etags.c:4188: C_entries: Assertion `bracelev == typdefbracelev' failed.
> > Aborted
> 
> Lars, I guess you were trying this in an optimized build, where all
> the assertions compile to nothing.
> 
> I see this here and will try to take a look when I have time.

A much smaller test case:

namespace Unicode {

typedef enum {
    UNASSIGNED,
    OTHER_SYMBOL
} category;

}

Hmm...






* bug#45246: 28.0.50; etags assertion error
  2022-06-07 15:58     ` Eli Zaretskii
  2022-06-07 16:38       ` Andreas Schwab
  2022-06-07 17:08       ` Eli Zaretskii
@ 2022-06-07 17:13       ` Lars Ingebrigtsen
  2 siblings, 0 replies; 17+ messages in thread
From: Lars Ingebrigtsen @ 2022-06-07 17:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 45246, Gregor Zattler

Eli Zaretskii <eliz@gnu.org> writes:

> Lars, I guess you were trying this in an optimized build, where all
> the assertions compile to nothing.

Yup.

> I see this here and will try to take a look when I have time.

Thanks.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#45246: 28.0.50; etags assertion error
  2022-06-07 16:38       ` Andreas Schwab
@ 2022-06-07 17:15         ` Eli Zaretskii
  2022-06-07 17:34           ` Andreas Schwab
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2022-06-07 17:15 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: larsi, 45246, grfz

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: Gregor Zattler <grfz@gmx.de>,  larsi@gnus.org,  45246@debbugs.gnu.org
> Date: Tue, 07 Jun 2022 18:38:46 +0200
> 
> On Jun 07 2022, Eli Zaretskii wrote:
> 
> > Lars, I guess you were trying this in an optimized build, where all
> > the assertions compile to nothing.
> 
> It's not about optimized or not, it's controlled by --enable-checking.

In etags.c?  I see no ENABLE_CHECKING there.  I do see

 #include <assert.h>







* bug#45246: 28.0.50; etags assertion error
  2022-06-07 17:15         ` Eli Zaretskii
@ 2022-06-07 17:34           ` Andreas Schwab
  2022-06-07 18:25             ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: Andreas Schwab @ 2022-06-07 17:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, 45246, grfz

On Jun 07 2022, Eli Zaretskii wrote:

> In etags.c?  I see no ENABLE_CHECKING there.

See src/conf_post.h.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."






* bug#45246: 28.0.50; etags assertion error
  2022-06-07 17:34           ` Andreas Schwab
@ 2022-06-07 18:25             ` Eli Zaretskii
  0 siblings, 0 replies; 17+ messages in thread
From: Eli Zaretskii @ 2022-06-07 18:25 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: larsi, 45246, grfz

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: grfz@gmx.de,  larsi@gnus.org,  45246@debbugs.gnu.org
> Date: Tue, 07 Jun 2022 19:34:03 +0200
> 
> On Jun 07 2022, Eli Zaretskii wrote:
> 
> > In etags.c?  I see no ENABLE_CHECKING there.
> 
> See src/conf_post.h.

Right, thanks.
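
For readers following this exchange, here is a minimal sketch of why the
same etags source can abort in one build and stay silent in another.  The
standard assert macro expands to nothing when NDEBUG is defined.  The
sketch assumes, without quoting the Emacs sources, that src/conf_post.h
arranges for NDEBUG to be defined unless ENABLE_CHECKING is set.  The
function names assertions_enabled and count_assert_evaluations are
hypothetical and exist only for this illustration.

```cpp
#include <cassert>

// Reports whether assert() is active in this translation unit.
// Assumption: a build without --enable-checking ends up defining NDEBUG.
constexpr bool assertions_enabled() {
#ifdef NDEBUG
    return false;
#else
    return true;
#endif
}

// Counts how often the expression inside assert() is evaluated.
// With NDEBUG defined the whole assert() disappears, so the
// increment never runs and the function returns 0.
int count_assert_evaluations() {
    int n = 0;
    assert((++n, true));
    return n;
}
```

If that assumption holds, a build without checking never evaluates the
failing condition at all, which would match Lars not seeing the abort.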






* bug#45246: 28.0.50; etags assertion error
  2022-06-07 17:08       ` Eli Zaretskii
@ 2022-06-09 17:42         ` Eli Zaretskii
  2022-06-09 18:43           ` Lars Ingebrigtsen
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2022-06-09 17:42 UTC (permalink / raw)
  To: grfz, larsi; +Cc: 45246

> Date: Tue, 07 Jun 2022 20:08:55 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: grfz@gmx.de, larsi@gnus.org, 45246@debbugs.gnu.org
> 
> > Cc: larsi@gnus.org, 45246@debbugs.gnu.org
> > Date: Tue, 07 Jun 2022 18:58:13 +0300
> > From: Eli Zaretskii <eliz@gnu.org>
> > 
> > > $ /home/grfz/src/emacs/lib-src/etags /usr/include/xapian/unicode.h
> > > etags: etags.c:4188: C_entries: Assertion `bracelev == typdefbracelev' failed.
> > > Aborted
> > 
> > Lars, I guess you were trying this in an optimized build, where all
> > the assertions compile to nothing.
> > 
> > I see this here and will try to take a look when I have time.
> 
> A much smaller test case:
> 
> namespace Unicode {
> 
> typedef enum {
>     UNASSIGNED,
>     OTHER_SYMBOL
> } category;
> 
> }
> 
> Hmm...

Heh, turns out it's a "feature": when etags sees a closing brace in
column zero, it by default assumes that's the final brace of a
function or a struct definition, so it resets the brace level.  As you
can see, the above test case (and the original unicode.h) has the
closing brace of the "typedef enum" in column zero.  If you mark the
entire typedef and type "M-C-\", Emacs will indent it, and the problem
will go away.
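
The interaction described above can be sketched with a toy scanner.  This
is not the etags parser; it is a simplified model that tracks only a brace
level and the level at which a typedef body was opened, and the names scan,
ScanResult and reduced_case are hypothetical.  It shows how resetting the
level at a column-zero closing brace can disagree with the remembered
typedef level when the typedef sits inside a namespace.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: NOT the real etags parser, just enough state to show
// how a column-zero reset can disagree with a remembered brace level.
struct ScanResult {
    int final_level;      // brace level after the last line
    bool invariant_held;  // false if a typedef body closed at the wrong level
};

ScanResult scan(const std::vector<std::string>& lines, bool ignore_indentation) {
    int level = 0;             // current brace nesting level
    int typedef_level = -1;    // level at which the open typedef body started
    bool saw_typedef = false;  // "typedef" seen, waiting for its '{'
    bool ok = true;

    for (const std::string& line : lines) {
        if (line.find("typedef") != std::string::npos)
            saw_typedef = true;
        for (std::size_t col = 0; col < line.size(); ++col) {
            if (line[col] == '{') {
                if (saw_typedef) {
                    typedef_level = level;
                    saw_typedef = false;
                }
                ++level;
            } else if (line[col] == '}') {
                if (!ignore_indentation && col == 0)
                    level = 0;  // heuristic: column-zero brace ends everything
                else if (level > 0)
                    --level;
                if (typedef_level >= 0 && level <= typedef_level) {
                    // Toy counterpart of the reported assertion.
                    if (level != typedef_level)
                        ok = false;
                    typedef_level = -1;
                }
            }
        }
    }
    return {level, ok};
}

// The reduced test case from this thread, one string per line.
std::vector<std::string> reduced_case() {
    return {
        "namespace Unicode {",
        "",
        "typedef enum {",
        "    UNASSIGNED,",
        "    OTHER_SYMBOL",
        "} category;",
        "",
        "}",
    };
}
```

In this model the invariant fails for the reduced case with the heuristic
on, and holds either when the heuristic is disabled (the analogue of -I) or
when the closing brace of the typedef is indented.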

"etags --help" says:

  -I, --ignore-indentation
	  In C and C++ do not assume that a closing brace in the first
	  column is the final brace of a function or structure definition.

And indeed, invoking "etags -I" compiled with --enable-checking with
the original file avoids the assertion violation.  And in a production
build, etags produces a valid TAGS file even if -I is omitted.

So I think there's nothing to do here, and we should close this bug as
notabug.  Does anyone disagree?






* bug#45246: 28.0.50; etags assertion error
  2022-06-09 17:42         ` Eli Zaretskii
@ 2022-06-09 18:43           ` Lars Ingebrigtsen
  2022-06-09 18:59             ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: Lars Ingebrigtsen @ 2022-06-09 18:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 45246, grfz

Eli Zaretskii <eliz@gnu.org> writes:

> "etags --help" says:
>
>   -I, --ignore-indentation
> 	  In C and C++ do not assume that a closing brace in the first
> 	  column is the final brace of a function or structure definition.
>
> And indeed, invoking "etags -I" compiled with --enable-checking with
> the original file avoids the assertion violation.  And in a production
> build, etags produces a valid TAGS file even if -I is omitted.
>
> So I think there's nothing to do here, and we should close this bug as
> notabug.  Does anyone disagree?

I think that sounds correct.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#45246: 28.0.50; etags assertion error
  2022-06-09 18:43           ` Lars Ingebrigtsen
@ 2022-06-09 18:59             ` Eli Zaretskii
  2022-06-09 22:33               ` Gregor Zattler
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2022-06-09 18:59 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 45246, grfz

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: grfz@gmx.de,  45246@debbugs.gnu.org
> Date: Thu, 09 Jun 2022 20:43:04 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > "etags --help" says:
> >
> >   -I, --ignore-indentation
> > 	  In C and C++ do not assume that a closing brace in the first
> > 	  column is the final brace of a function or structure definition.
> >
> > And indeed, invoking "etags -I" compiled with --enable-checking with
> > the original file avoids the assertion violation.  And in a production
> > build, etags produces a valid TAGS file even if -I is omitted.
> >
> > So I think there's nothing to do here, and we should close this bug as
> > notabug.  Does anyone disagree?
> 
> I think that sounds correct.

Gregor, any objections to closing this bug?






* bug#45246: 28.0.50; etags assertion error
  2022-06-09 18:59             ` Eli Zaretskii
@ 2022-06-09 22:33               ` Gregor Zattler
  2022-06-10  7:25                 ` Eli Zaretskii
  2022-06-10  7:26                 ` Eli Zaretskii
  0 siblings, 2 replies; 17+ messages in thread
From: Gregor Zattler @ 2022-06-09 22:33 UTC (permalink / raw)
  To: Eli Zaretskii, Lars Ingebrigtsen; +Cc: 45246

Hi Eli, Lars,
* Eli Zaretskii <eliz@gnu.org> [2022-06-09; 21:59]:
>> From: Lars Ingebrigtsen <larsi@gnus.org>
>> Cc: grfz@gmx.de,  45246@debbugs.gnu.org
>> Date: Thu, 09 Jun 2022 20:43:04 +0200
>>
>> Eli Zaretskii <eliz@gnu.org> writes:
>>
>> > "etags --help" says:
>> >
>> >   -I, --ignore-indentation
>> > 	  In C and C++ do not assume that a closing brace in the first
>> > 	  column is the final brace of a function or structure definition.
>> >
>> > And indeed, invoking "etags -I" compiled with --enable-checking with
>> > the original file avoids the assertion violation.  And in a production
>> > build, etags produces a valid TAGS file even if -I is omitted.
>> >
>> > So I think there's nothing to do here, and we should close this bug as
>> > notabug.  Does anyone disagree?
>>
>> I think that sounds correct.

I confirm -I avoids the assertion.

> Gregor, any objections to closing this bug?

no.

I must admit, I did not read the man page closely but
anyway I wouldn't have understood the consequences of
setting vs not setting -I.  Perhaps the documentation could
be amended somehow?  I assume this is a trade-off between
speed and robustness?


... Some highly unscientific tests:
I do not see a difference in speed for etags from emacs26 (as
it comes with debian bullseye), whether with or without -I.
The optimized build of etags from emacs29 is a bit faster
with -I than etags from emacs26 with or without -I.


I do not have objections to closing this bug
report, but I wonder why etags treats closing braces in
the first column special if it does not speed up things?





Ciao,
--
Gregor






* bug#45246: 28.0.50; etags assertion error
  2022-06-09 22:33               ` Gregor Zattler
@ 2022-06-10  7:25                 ` Eli Zaretskii
  2022-06-10  7:26                 ` Eli Zaretskii
  1 sibling, 0 replies; 17+ messages in thread
From: Eli Zaretskii @ 2022-06-10  7:25 UTC (permalink / raw)
  To: Gregor Zattler; +Cc: larsi, 45246-done

> From: Gregor Zattler <grfz@gmx.de>
> Cc: 45246@debbugs.gnu.org
> Date: Fri, 10 Jun 2022 00:33:07 +0200
> 
> >> > And indeed, invoking "etags -I" compiled with --enable-checking with
> >> > the original file avoids the assertion violation.  And in a production
> >> > build, etags produces a valid TAGS file even if -I is omitted.
> >> >
> >> > So I think there's nothing to do here, and we should close this bug as
> >> > notabug.  Does anyone disagree?
> >>
> >> I think that sounds correct.
> 
> I confirm -I avoids the assertion.
> 
> > Gregor, any objections to closing this bug?
> 
> no.

OK, done.

> I must admit, I did not read the man page closely but
> anyway I wouldn't have understood the consequences of
> setting vs not setting -I.  Perhaps the documentation could
> be amended somehow?

I've added some notes about this to the manual and to the etags man
page, thanks.

> I assume this is a trade-off between speed and robustness?

No, I think it's more about the correctness of the produced TAGS file
than about speed.  etags's C/C++ parser is extremely naïve and largely
ignores the complicated syntax of the C dialects.  So using the
"closing brace in column zero ends all top-level definitions"
heuristic is useful for preventing 'etags' from being utterly confused
by some sophisticated use of C/C++ facilities, such as macros and the
more arcane syntactic constructs in modern C++: it makes sure the
confusion ends as early as possible.

> I do not have objections to closing this bug
> report, but I wonder why etags treats closing braces in
> the first column special if it does not speed up things?

See above.  Whether this is a real problem, I don't know.  I think
the only way to tell is to try.  At least with our test suite for
'etags', using -I causes regressions, e.g. in cp-src/c.C some tags are
not created.  So I think having this heuristic on by default is a good
thing, overall.
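
As a rough illustration of "the confusion ends as early as possible",
consider a scanner that does not expand macros and therefore miscounts
braces.  The sketch below is a toy model, not etags code.  The function
names levels_at_line_start and macro_case are hypothetical, and END_IF
is an invented macro assumed to expand to a closing brace.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model: returns the brace level the scanner believes it is at
// when each line starts.  Macros are not expanded, as in a naive parser.
std::vector<int> levels_at_line_start(const std::vector<std::string>& lines,
                                      bool ignore_indentation) {
    std::vector<int> levels;
    int level = 0;
    for (const std::string& line : lines) {
        levels.push_back(level);
        for (std::size_t col = 0; col < line.size(); ++col) {
            if (line[col] == '{') {
                ++level;
            } else if (line[col] == '}') {
                if (!ignore_indentation && col == 0)
                    level = 0;  // column-zero brace: resynchronize
                else if (level > 0)
                    --level;
            }
        }
    }
    return levels;
}

// END_IF hides a '}' from the scanner, so f() looks unbalanced.
std::vector<std::string> macro_case() {
    return {
        "void f(int x) {",
        "    if (x) {",
        "        g();",
        "    END_IF",  // hypothetical macro that expands to '}'
        "}",
        "void h(void) {",
        "}",
    };
}
```

With the heuristic on, the model is back at level 0 when it reaches h(), so
h() still looks like a top-level definition.  With the heuristic off, the
hidden brace leaves the model at level 1 and h() appears nested.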






* bug#45246: 28.0.50; etags assertion error
  2022-06-09 22:33               ` Gregor Zattler
  2022-06-10  7:25                 ` Eli Zaretskii
@ 2022-06-10  7:26                 ` Eli Zaretskii
  2022-06-10 14:01                   ` Francesco Potortì
  1 sibling, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2022-06-10  7:26 UTC (permalink / raw)
  To: Gregor Zattler; +Cc: larsi, 45246-done

> From: Gregor Zattler <grfz@gmx.de>
> Cc: 45246@debbugs.gnu.org
> Date: Fri, 10 Jun 2022 00:33:07 +0200
> 
> >> > And indeed, invoking "etags -I" compiled with --enable-checking with
> >> > the original file avoids the assertion violation.  And in a production
> >> > build, etags produces a valid TAGS file even if -I is omitted.
> >> >
> >> > So I think there's nothing to do here, and we should close this bug as
> >> > notabug.  Does anyone disagree?
> >>
> >> I think that sounds correct.
> 
> I confirm -I avoids the assertion.
> 
> > Gregor, any objections to closing this bug?
> 
> no.

OK, done.

> I must admit, I did not read the man page closely but
> anyway I wouldn't have understood the consequences of
> setting vs not setting -I.  Perhaps the documentation could
> be amended somehow?

I've added some notes about this to the manual and to the etags man
page, thanks.

> I assume this is a trade-off between speed and robustness?

No, I think it's more about the correctness of the produced TAGS file
than about speed.  etags's C/C++ parser is extremely naïve and largely
ignores the complicated syntax of the C dialects.  So using the
"closing brace in column zero ends all top-level definitions"
heuristic is useful for preventing 'etags' from being utterly confused
by some sophisticated use of C/C++ facilities, such as macros and the
more arcane syntactic constructs in modern C++: it makes sure the
confusion ends as early as possible.

> I do not have objections to closing this bug
> report, but I wonder why etags treats closing braces in
> the first column special if it does not speed up things?

See above.  Whether this is a real problem, I don't know.  I think
the only way to tell is to try.  At least with our test suite for
'etags', using -I causes regressions, e.g. in cp-src/c.C some tags are
not created.  So I think having this heuristic on by default is a good
thing, overall.






* bug#45246: 28.0.50; etags assertion error
  2022-06-10  7:26                 ` Eli Zaretskii
@ 2022-06-10 14:01                   ` Francesco Potortì
  0 siblings, 0 replies; 17+ messages in thread
From: Francesco Potortì @ 2022-06-10 14:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, 45246-done, Gregor Zattler

Gregor:
>> I confirm -I avoids the assertion.
>> I must admit, I did not read the man page closely but
>> anyway I wouldn't have understood the consequences of
>> setting vs not setting -I.  Perhaps the documentation could
>> be amended somehow?
>
>> I assume this is a trade-off between speed and robustness?

Eli:
>No, I think it's more about the correctness of the produced TAGS file
>than about speed.  etags's C/C++ parser is extremely naïve and largely
>ignores the complicated syntax of the C dialects.  So using the
>"closing brace in column zero ends all top-level definitions"
>heuristic is useful for preventing 'etags' from being utterly confused
>by some sophisticated use of C/C++ facilities, such as macros and the
>more arcane syntactic constructs in modern C++: it makes sure the
>confusion ends as early as possible.
>
>> I do not have objections to closing this bug
>> report, but I wonder why etags treats closing braces in
>> the first column special if it does not speed up things?
>
>See above.  Whether this is a real problem, I don't know.  I think
>the only way to tell is to try.  At least with our test suite for
>'etags', using -I causes regressions, e.g. in cp-src/c.C some tags are
>not created.  So I think having this heuristic on by default is a good
>thing, overall.

I second all of Eli's statements.






end of thread, other threads:[~2022-06-10 14:01 UTC | newest]

Thread overview: 17+ messages
2020-12-14 23:38 bug#45246: 28.0.50; etags assertion error Gregor Zattler
2022-06-07 11:35 ` Lars Ingebrigtsen
2022-06-07 14:26   ` Gregor Zattler
2022-06-07 15:58     ` Eli Zaretskii
2022-06-07 16:38       ` Andreas Schwab
2022-06-07 17:15         ` Eli Zaretskii
2022-06-07 17:34           ` Andreas Schwab
2022-06-07 18:25             ` Eli Zaretskii
2022-06-07 17:08       ` Eli Zaretskii
2022-06-09 17:42         ` Eli Zaretskii
2022-06-09 18:43           ` Lars Ingebrigtsen
2022-06-09 18:59             ` Eli Zaretskii
2022-06-09 22:33               ` Gregor Zattler
2022-06-10  7:25                 ` Eli Zaretskii
2022-06-10  7:26                 ` Eli Zaretskii
2022-06-10 14:01                   ` Francesco Potortì
2022-06-07 17:13       ` Lars Ingebrigtsen
