From: Gregor Zattler <grfz@gmx.de>
To: Lars Ingebrigtsen <larsi@gnus.org>
Cc: 45246@debbugs.gnu.org
Subject: bug#45246: 28.0.50; etags assertion error
Date: Tue, 07 Jun 2022 16:26:42 +0200
Message-ID: <87a6ao7dnh.fsf@no.workgroup>
In-Reply-To: <87o7z4k8ob.fsf@gnus.org>
Hi Lars,
* Lars Ingebrigtsen <larsi@gnus.org> [2022-06-07; 13:35]:
> Gregor Zattler <grfz@gmx.de> writes:
>
>> and I get an assertion error when executing the following line:
>>
>> ~/src$ find . -type f -print0 | egrep -zZ -- '(\.el|\.c|\.h)(\.gz)?$'
>> | xargs -0IXXXXX sh -c "/home/grfz/src/emacs-master/lib-src/etags
>> XXXXX || echo XXXXX"
>> etags: etags.c:4153: C_entries: Assertion `bracelev == typdefbracelev' failed.
>> Aborted
>
> (I'm going through old bug reports that unfortunately weren't resolved
> at the time.)
>
> I tried saying
>
> etags unicode.h
>
> on the supplied file, but I didn't see any assertion errors, either with
> the etags from Emacs 28 or 29.
>
> Do you still see this problem in recent Emacs versions?
Yes:
$ /home/grfz/src/emacs/lib-src/etags /usr/include/xapian/unicode.h
etags: etags.c:4188: C_entries: Assertion `bracelev == typdefbracelev' failed.
Aborted
This is on debian/bullseye. etags was built in the same
process as this Emacs:
In GNU Emacs 29.0.50 (build 3, x86_64-pc-linux-gnu, X toolkit, cairo version 1.16.0)
of 2022-05-15 built on no
Repository revision: b26574d7d7c458fec7494484ea5bceeed45f2f02
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Debian GNU/Linux 11 (bullseye)
Configured using:
'configure -C --prefix=/usr/local/stow/emacs-snapshot
--enable-locallisppath=/etc/emacs:/usr/local/share/emacs/29.0/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/29.0/site-lisp:/usr/share/emacs/site-lisp
--with-sound=yes --without-gconf --with-mailutils --build
x86_64-linux-gnu
--infodir=/usr/local/share/info:/usr/share/info --with-json
--with-file-notification=yes --with-cairo --with-x=yes
--with-x-toolkit=lucid --without-toolkit-scroll-bars
--enable-checking=yes,glyphs
--enable-check-lisp-object-type --with-native-compilation
'CFLAGS=-g3 -O3
-ffile-prefix-map=/home/grfz/src/emacs=. -fstack-protector-strong
-Wformat -Werror=format-security ''
Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS
HARFBUZZ JPEG JSON LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD
LIBXML2 M17N_FLT MODULES NATIVE_COMP NOTIFY INOTIFY PDUMPER
PNG RSVG SECCOMP SOUND THREADS TIFF X11 XAW3D XDBE XIM
XINPUT2 XPM LUCID ZLIB
Since there is no unicode.h under ~/src/ ATM, I used a
different unicode.h file this time. It's attached.
For me this is not an important bug. If you want to
investigate: Is there anything I can do to help you?
Ciao,
--
Gregor
[-- Attachment #2: unicode.h --]
[-- Type: text/plain, Size: 15197 bytes --]
/** @file
* @brief Unicode and UTF-8 related classes and functions.
*/
/* Copyright (C) 2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2019 Olly Betts
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/
#ifndef XAPIAN_INCLUDED_UNICODE_H
#define XAPIAN_INCLUDED_UNICODE_H
#if !defined XAPIAN_IN_XAPIAN_H && !defined XAPIAN_LIB_BUILD
# error Never use <xapian/unicode.h> directly; include <xapian.h> instead.
#endif
#include <xapian/attributes.h>
#include <xapian/visibility.h>
#include <string>
namespace Xapian {
/** An iterator which returns Unicode character values from a UTF-8 encoded
* string.
*/
class XAPIAN_VISIBILITY_DEFAULT Utf8Iterator {
const unsigned char* p;
const unsigned char* end;
mutable unsigned seqlen;
bool XAPIAN_NOTHROW(calculate_sequence_length() const);
unsigned get_char() const;
Utf8Iterator(const unsigned char* p_,
const unsigned char* end_,
unsigned seqlen_)
: p(p_), end(end_), seqlen(seqlen_) { }
public:
/** Return the raw const char* pointer for the current position. */
const char* raw() const {
return reinterpret_cast<const char*>(p ? p : end);
}
/** Return the number of bytes left in the iterator's buffer. */
size_t left() const { return p ? end - p : 0; }
/** Assign a new string to the iterator.
*
* The iterator will forget the string it was iterating through, and
* return characters from the start of the new string when next called.
* The string is not copied into the iterator, so it must remain valid
* while the iteration is in progress.
*
* @param p_ A pointer to the start of the string to read.
*
* @param len The length of the string to read.
*/
void assign(const char* p_, size_t len) {
if (len) {
p = reinterpret_cast<const unsigned char*>(p_);
end = p + len;
seqlen = 0;
} else {
p = NULL;
}
}
/** Assign a new string to the iterator.
*
* The iterator will forget the string it was iterating through, and
* return characters from the start of the new string when next called.
* The string is not copied into the iterator, so it must remain valid
* while the iteration is in progress.
*
* @param s The string to read. Must not be modified while the iteration
* is in progress.
*/
void assign(const std::string& s) { assign(s.data(), s.size()); }
/** Create an iterator given a pointer to a null terminated string.
*
* The iterator will return characters from the start of the string when
* next called. The string is not copied into the iterator, so it must
* remain valid while the iteration is in progress.
*
* @param p_ A pointer to the start of the null terminated string to read.
*/
explicit Utf8Iterator(const char* p_);
/** Create an iterator given a pointer and a length.
*
* The iterator will return characters from the start of the string when
* next called. The string is not copied into the iterator, so it must
* remain valid while the iteration is in progress.
*
* @param p_ A pointer to the start of the string to read.
*
* @param len The length of the string to read.
*/
Utf8Iterator(const char* p_, size_t len) { assign(p_, len); }
/** Create an iterator given a string.
*
* The iterator will return characters from the start of the string when
* next called. The string is not copied into the iterator, so it must
* remain valid while the iteration is in progress.
*
* @param s The string to read. Must not be modified while the iteration
* is in progress.
*/
Utf8Iterator(const std::string& s) { assign(s.data(), s.size()); }
/** Create an iterator which is at the end of its iteration.
*
* This can be compared to another iterator to check if the other iterator
* has reached its end.
*/
XAPIAN_NOTHROW(Utf8Iterator())
: p(NULL), end(0), seqlen(0) { }
/** Get the current Unicode character value pointed to by the iterator.
*
* If an invalid UTF-8 sequence is encountered, then the byte values
* comprising it are returned until valid UTF-8 or the end of the input is
* reached.
*
* Returns unsigned(-1) if the iterator has reached the end of its buffer.
*/
unsigned XAPIAN_NOTHROW(operator*() const) XAPIAN_PURE_FUNCTION;
/** @private @internal Get the current Unicode character
* value pointed to by the iterator.
*
* If an invalid UTF-8 sequence is encountered, then the byte values
* comprising it are returned with the top bit set (so the caller can
* differentiate these from the same values arising from valid UTF-8)
* until valid UTF-8 or the end of the input is reached.
*
* Returns unsigned(-1) if the iterator has reached the end of its buffer.
*/
unsigned XAPIAN_NOTHROW(strict_deref() const) XAPIAN_PURE_FUNCTION;
/** Move forward to the next Unicode character.
*
* @return An iterator pointing to the position before the move.
*/
Utf8Iterator operator++(int) {
// If we've not calculated seqlen yet, do so.
if (seqlen == 0) calculate_sequence_length();
const unsigned char* old_p = p;
unsigned old_seqlen = seqlen;
p += seqlen;
if (p == end) p = NULL;
seqlen = 0;
return Utf8Iterator(old_p, end, old_seqlen);
}
/** Move forward to the next Unicode character.
*
* @return A reference to this object.
*/
Utf8Iterator& operator++() {
if (seqlen == 0) calculate_sequence_length();
p += seqlen;
if (p == end) p = NULL;
seqlen = 0;
return *this;
}
/** Test two Utf8Iterators for equality.
*
* @param other The Utf8Iterator to compare this one with.
* @return true iff the iterators point to the same position.
*/
bool XAPIAN_NOTHROW(operator==(const Utf8Iterator& other) const) {
return p == other.p;
}
/** Test two Utf8Iterators for inequality.
*
* @param other The Utf8Iterator to compare this one with.
* @return true iff the iterators do not point to the same position.
*/
bool XAPIAN_NOTHROW(operator!=(const Utf8Iterator& other) const) {
return p != other.p;
}
/// We implement the semantics of an STL input_iterator.
//@{
typedef std::input_iterator_tag iterator_category;
typedef unsigned value_type;
typedef size_t difference_type;
typedef const unsigned* pointer;
typedef const unsigned& reference;
//@}
};
/// Functions associated with handling Unicode characters.
namespace Unicode {
/** Each Unicode character is in exactly one of these categories.
*
* The Unicode standard calls this the "General Category", and uses a
* "Major, minor" convention to derive a two letter code.
*/
typedef enum {
UNASSIGNED, /**< Other, not assigned (Cn) */
UPPERCASE_LETTER, /**< Letter, uppercase (Lu) */
LOWERCASE_LETTER, /**< Letter, lowercase (Ll) */
TITLECASE_LETTER, /**< Letter, titlecase (Lt) */
MODIFIER_LETTER, /**< Letter, modifier (Lm) */
OTHER_LETTER, /**< Letter, other (Lo) */
NON_SPACING_MARK, /**< Mark, nonspacing (Mn) */
ENCLOSING_MARK, /**< Mark, enclosing (Me) */
COMBINING_SPACING_MARK, /**< Mark, spacing combining (Mc) */
DECIMAL_DIGIT_NUMBER, /**< Number, decimal digit (Nd) */
LETTER_NUMBER, /**< Number, letter (Nl) */
OTHER_NUMBER, /**< Number, other (No) */
SPACE_SEPARATOR, /**< Separator, space (Zs) */
LINE_SEPARATOR, /**< Separator, line (Zl) */
PARAGRAPH_SEPARATOR, /**< Separator, paragraph (Zp) */
CONTROL, /**< Other, control (Cc) */
FORMAT, /**< Other, format (Cf) */
PRIVATE_USE, /**< Other, private use (Co) */
SURROGATE, /**< Other, surrogate (Cs) */
CONNECTOR_PUNCTUATION, /**< Punctuation, connector (Pc) */
DASH_PUNCTUATION, /**< Punctuation, dash (Pd) */
OPEN_PUNCTUATION, /**< Punctuation, open (Ps) */
CLOSE_PUNCTUATION, /**< Punctuation, close (Pe) */
INITIAL_QUOTE_PUNCTUATION, /**< Punctuation, initial quote (Pi) */
FINAL_QUOTE_PUNCTUATION, /**< Punctuation, final quote (Pf) */
OTHER_PUNCTUATION, /**< Punctuation, other (Po) */
MATH_SYMBOL, /**< Symbol, math (Sm) */
CURRENCY_SYMBOL, /**< Symbol, currency (Sc) */
MODIFIER_SYMBOL, /**< Symbol, modified (Sk) */
OTHER_SYMBOL /**< Symbol, other (So) */
} category;
namespace Internal {
/** @private @internal Extract the information about a character from the
* Unicode character tables.
*
* Characters outside of the Unicode range (i.e. ch >= 0x110000) are
* treated as UNASSIGNED with no case variants.
*/
XAPIAN_VISIBILITY_DEFAULT
int XAPIAN_NOTHROW(get_character_info(unsigned ch)) XAPIAN_CONST_FUNCTION;
/** @private @internal Extract how to convert the case of a Unicode
* character from its info.
*/
inline int get_case_type(int info) { return ((info & 0xe0) >> 5); }
/** @private @internal Extract the category of a Unicode character from its
* info.
*/
inline category get_category(int info) {
return static_cast<category>(info & 0x1f);
}
/** @private @internal Extract the delta to use for case conversion of a
* character from its info.
*/
inline int get_delta(int info) {
/* It's implementation defined if sign extension happens when right
* shifting a signed int, although in practice sign extension is what
* most compilers implement.
*
* Some compilers are smart enough to spot common idioms for sign
* extension, but not all (e.g. GCC < 7 doesn't spot the one used
* below), so check what the implementation-defined behaviour is with
* a constant conditional which should get optimised away.
*
* We use the ternary operator here to avoid various compiler
* warnings which writing this as an `if` results in.
*/
return ((-1 >> 1) == -1 ?
// Right shift sign-extends.
info >> 8 :
// Right shift shifts in zeros so bitwise-not before and after
// the shift for negative values.
(info >= 0) ? (info >> 8) : (~(~info >> 8)));
}
}
/** Convert a single non-ASCII Unicode character to UTF-8.
*
* This is intended mainly as a helper method for to_utf8().
*
* @param ch The character (which must be > 128) to write to @a buf.
* @param buf The buffer to write the character to - it must have
* space for (at least) 4 bytes.
*
* @return The length of the resultant UTF-8 character in bytes.
*/
XAPIAN_VISIBILITY_DEFAULT
unsigned nonascii_to_utf8(unsigned ch, char* buf);
/** Convert a single Unicode character to UTF-8.
*
* @param ch The character to write to @a buf.
* @param buf The buffer to write the character to - it must have
* space for (at least) 4 bytes.
*
* @return The length of the resultant UTF-8 character in bytes.
*/
inline unsigned to_utf8(unsigned ch, char* buf) {
if (ch < 128) {
*buf = static_cast<unsigned char>(ch);
return 1;
}
return Xapian::Unicode::nonascii_to_utf8(ch, buf);
}
/** Append the UTF-8 representation of a single Unicode character to a
* std::string.
*/
inline void append_utf8(std::string& s, unsigned ch) {
char buf[4];
s.append(buf, to_utf8(ch, buf));
}
/// Return the category which a given Unicode character falls into.
inline category get_category(unsigned ch) {
return Internal::get_category(Internal::get_character_info(ch));
}
/// Test if a given Unicode character is "word character".
inline bool is_wordchar(unsigned ch) {
const unsigned int WORDCHAR_MASK =
(1 << Xapian::Unicode::UPPERCASE_LETTER) |
(1 << Xapian::Unicode::LOWERCASE_LETTER) |
(1 << Xapian::Unicode::TITLECASE_LETTER) |
(1 << Xapian::Unicode::MODIFIER_LETTER) |
(1 << Xapian::Unicode::OTHER_LETTER) |
(1 << Xapian::Unicode::NON_SPACING_MARK) |
(1 << Xapian::Unicode::ENCLOSING_MARK) |
(1 << Xapian::Unicode::COMBINING_SPACING_MARK) |
(1 << Xapian::Unicode::DECIMAL_DIGIT_NUMBER) |
(1 << Xapian::Unicode::LETTER_NUMBER) |
(1 << Xapian::Unicode::OTHER_NUMBER) |
(1 << Xapian::Unicode::CONNECTOR_PUNCTUATION);
return ((WORDCHAR_MASK >> get_category(ch)) & 1);
}
/// Test if a given Unicode character is a whitespace character.
inline bool is_whitespace(unsigned ch) {
const unsigned int WHITESPACE_MASK =
(1 << Xapian::Unicode::CONTROL) | // For TAB, CR, LF, FF.
(1 << Xapian::Unicode::SPACE_SEPARATOR) |
(1 << Xapian::Unicode::LINE_SEPARATOR) |
(1 << Xapian::Unicode::PARAGRAPH_SEPARATOR);
return ((WHITESPACE_MASK >> get_category(ch)) & 1);
}
/// Test if a given Unicode character is a currency symbol.
inline bool is_currency(unsigned ch) {
return (get_category(ch) == Xapian::Unicode::CURRENCY_SYMBOL);
}
/// Convert a Unicode character to lowercase.
inline unsigned tolower(unsigned ch) {
int info = Xapian::Unicode::Internal::get_character_info(ch);
if (!(Internal::get_case_type(info) & 2))
return ch;
return ch + Internal::get_delta(info);
}
/// Convert a Unicode character to uppercase.
inline unsigned toupper(unsigned ch) {
int info = Xapian::Unicode::Internal::get_character_info(ch);
if (!(Internal::get_case_type(info) & 4))
return ch;
return ch - Internal::get_delta(info);
}
/// Convert a UTF-8 std::string to lowercase.
inline std::string
tolower(const std::string& term)
{
std::string result;
result.reserve(term.size());
for (Utf8Iterator i(term); i != Utf8Iterator(); ++i) {
append_utf8(result, tolower(*i));
}
return result;
}
/// Convert a UTF-8 std::string to uppercase.
inline std::string
toupper(const std::string& term)
{
std::string result;
result.reserve(term.size());
for (Utf8Iterator i(term); i != Utf8Iterator(); ++i) {
append_utf8(result, toupper(*i));
}
return result;
}
}
}
#endif // XAPIAN_INCLUDED_UNICODE_H
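The comment inside get_delta() in the attached header explains why the code does not simply rely on `info >> 8` for negative values. As a rough, standalone sketch of that fallback branch only (the names portable_shift_right_8 and self_check are hypothetical and are not part of Xapian or etags), the bitwise-not-before-and-after trick can be checked like this:

```cpp
#include <cassert>

// Hypothetical helper mirroring the fallback branch of get_delta():
// shift right by 8 with sign extension, without depending on how the
// compiler implements >> for negative signed values.
inline int portable_shift_right_8(int info) {
    // For negative values, ~info is non-negative, so shifting it is
    // well defined; the outer ~ restores the sign-extended result.
    return (info >= 0) ? (info >> 8) : ~(~info >> 8);
}

// Hypothetical self-check covering positive and negative inputs.
inline void self_check() {
    assert(portable_shift_right_8(0x1234) == 0x12);
    assert(portable_shift_right_8(0) == 0);
    assert(portable_shift_right_8(-1) == -1);
    assert(portable_shift_right_8(-256) == -1);
    assert(portable_shift_right_8(-257) == -2);
}
```

This only illustrates the arithmetic described in the header's comment. It says nothing about what makes etags hit the assertion on this file.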
Thread overview: 17+ messages
2020-12-14 23:38 bug#45246: 28.0.50; etags assertion error Gregor Zattler
2022-06-07 11:35 ` Lars Ingebrigtsen
2022-06-07 14:26 ` Gregor Zattler [this message]
2022-06-07 15:58 ` Eli Zaretskii
2022-06-07 16:38 ` Andreas Schwab
2022-06-07 17:15 ` Eli Zaretskii
2022-06-07 17:34 ` Andreas Schwab
2022-06-07 18:25 ` Eli Zaretskii
2022-06-07 17:08 ` Eli Zaretskii
2022-06-09 17:42 ` Eli Zaretskii
2022-06-09 18:43 ` Lars Ingebrigtsen
2022-06-09 18:59 ` Eli Zaretskii
2022-06-09 22:33 ` Gregor Zattler
2022-06-10 7:25 ` Eli Zaretskii
2022-06-10 7:26 ` Eli Zaretskii
2022-06-10 14:01 ` Francesco Potortì
2022-06-07 17:13 ` Lars Ingebrigtsen