Hi Eli,

Eli Zaretskii writes:

>> From: Robert Pluim
>> Cc: Po Lu
>> Date: Thu, 30 Mar 2023 11:34:42 +0200
>>
>> Fstring_lessp has:
>>
>>   /* Check whether the platform allows access to unaligned addresses for
>>      size_t integers without trapping or undue penalty (a few cycles is OK).
>>
>>      This whitelist is incomplete but since it is only used to improve
>>      performance, omitting cases is safe.  */
>>   #if defined __x86_64__|| defined __amd64__ \
>>       || defined __i386__ || defined __i386 \
>>       || defined __arm64__ || defined __aarch64__ \
>>       || defined __powerpc__ || defined __powerpc \
>>       || defined __ppc__ || defined __ppc \
>>       || defined __s390__ || defined __s390x__
>>   #define HAVE_FAST_UNALIGNED_ACCESS 1
>>   #else
>>   #define HAVE_FAST_UNALIGNED_ACCESS 0
>>   #endif
>>
>> but even if unaligned access is normally permitted by a machine, it is
>> still undefined behavior to dereference an unaligned pointer.
>
> This is incorrect. There's nothing undefined about x86 unaligned
> accesses. C standards can regard this as UB, but we are using
> machine-specific knowledge here

You're making a faulty assumption here: there's no guarantee that such
an access happens at all.  You're, of course, right that an x86 CPU
will have no (visible) qualms about executing such a mov, but you're
also assuming that the compiler emits a mov.  That is not guaranteed
anywhere, and guaranteeing it would be terrible for optimization in
general.  For instance, the compiler is free to vectorize a loop,
emitting instructions that do check alignment even on x86 (such as
movdqa, which faults on a misaligned operand).  The loop in question
is very much parallelizable and vectorizable; it looks like a textbook
example of such operations.

> (and Emacs cannot be built with a strict adherence to C standards
> anyway).

That is indeed correct; there is, however, a difference in how
necessary deviating from the standard is in this particular case (and
I argue it isn't necessary here, for the reasons below).
>> Instead, HAVE_FAST_UNALIGNED_ACCESS and UNALIGNED_LOAD_SIZE should be
>> removed and memcpy used instead:
>>
>>   word_t a, c;
>>
>>   memcpy (&a, w1 + b / ws, sizeof a);
>>   memcpy (&c, w2 + b / ws, sizeof c);
>>
>> doing so will make the compiler itself generate the right sequence of
>> instructions for performing unaligned accesses, normally with only a few
>> cycles penalty.
>
> We don't want that penalty here, that's all.

With optimization enabled, you don't get one (on x86_64).  I haven't
checked -O0, as it's not worth building with (one should use -O2, -O3,
-Og or -Oz instead).

>> I would like to install such a change on emacs-29.
>
> No, please don't.
>
>> Emacs currently crashes when built with various compilers performing
>> pointer alignment checks.
>
> Details, please. Which compilers, on what platforms, for what target
> architectures, etc.

Sam presented a decent example (though sanitizers seem to have been
taken into account in that particular case).

> Unconditionally removing the fast copy there is a non-starter.

You're assuming that the alternatives to these "fast" accesses are
slow; they are not.  The following code...

    #include <string.h>

    int
    f_broken (void *x)
    {
      return *((int *) x);
    }

    int
    f (void *x)
    {
      int v;
      memcpy (&v, x, sizeof (v));
      return v;
    }

... generates the following code with GCC 12.2.0 and -O1 (x86_64):

    f_broken:
            movl    (%rdi), %eax
            ret
    f:
            movl    (%rdi), %eax
            ret

As a matter of fact, implementing a "skip common prefix" loop with just
chars results in /shorter/ code on the same compiler (and does not
violate the aliasing rules, since the data FAM is of char type).  Other
portable methods could include Duff's device (using memcpy loads), or
word-size memcmp calls in a loop.

IMO, it would be a real fault in the compiler if Emacs needed to resort
to such hacks (and even if we accept working around that as our
problem, the workaround should sit behind an abstraction boundary).
Note that I did not try hacking on the Emacs code to benchmark the
actual function being discussed (as I am not in a position to do so
conveniently at the moment), but I invite you to try it and reconsider
removing this code.

Even if this change does carry a penalty, I'd argue it is far better
for us to fix that in GCC, or to implement a "skip common prefix"
function in Gnulib (so that it's behind a layer of abstraction), than
to bake this assumption implicitly into this function.

I suspect the least intrusive change possible, namely merely using
memcpy to load the words rather than dereferencing pointers directly,
would emit the same code as the current implementation, except in the
cases where the current code is entirely broken and the corrected code
isn't.

Thanks in advance, have a lovely day.

-- 
Arsen Arsenović