Optimize glyph row clearing and copying routines

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

* Optimize glyph row clearing and copying routines
@ 2013-09-24  6:35 Eli Zaretskii
  2013-09-24 10:10 ` Dmitry Antipov
  2013-09-24 13:40 ` John Yates
  0 siblings, 2 replies; 10+ messages in thread
From: Eli Zaretskii @ 2013-09-24  6:35 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

Does this change really speed up the code?  AFAIU, previously the
struct assignment could use word-size copies (because a struct is
always aligned), but now you cast the arguments to 'char *' and use
memcpy, which could fall back on copying single bytes or shorter
words.

Did you measure the impact?  Was it significant enough to justify this
change?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Optimize glyph row clearing and copying routines
  2013-09-24  6:35 Optimize glyph row clearing and copying routines Eli Zaretskii
@ 2013-09-24 10:10 ` Dmitry Antipov
  2013-09-24 11:50   ` Eli Zaretskii
  2013-09-24 13:40 ` John Yates
  1 sibling, 1 reply; 10+ messages in thread
From: Dmitry Antipov @ 2013-09-24 10:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On 09/24/2013 10:35 AM, Eli Zaretskii wrote:

> Does this change really speed up the code?

YMMV, as usual with benchmarks. Time is in CPU cycles, smaller is better:

It was:

                          gcc 4.8.1  gcc 4.8.1           Intel C 13.1.3   clang 3.3
                          -O2 -g3    -O3 -march=native   -O3 -xHOST       -O3 -march=native
------------------------------------------------------------------------------------------
clear_glyph_row          100        140                 90               40
copy_row_except_pointers 90         160                 80               100


Now it is:

                          gcc 4.8.1  gcc 4.8.1           Intel C 13.1.3   clang 3.3
                          -O2 -g3    -O3 -march=native   -O3 -xHOST       -O3 -march=native
------------------------------------------------------------------------------------------
clear_glyph_row          75         60                  35               35
copy_row_except_pointers 95         150                 50               70

(Intel C and clang makes heavy use of SSE, but gcc isn't).

It would be interesting to add MSVC to this table :-).

In short, I believe that a good compiler should get more optimization opportunities from
new code rather than from old. For this particular case, Intel C is "definitely good",
and gcc, hm...looks controversial at least.

Dmitry




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Optimize glyph row clearing and copying routines
  2013-09-24 10:10 ` Dmitry Antipov
@ 2013-09-24 11:50   ` Eli Zaretskii
  2013-09-24 12:42     ` Dmitry Antipov
  0 siblings, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2013-09-24 11:50 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

> Date: Tue, 24 Sep 2013 14:10:07 +0400
> From: Dmitry Antipov <dmantipov@yandex.ru>
> CC: emacs-devel@gnu.org
> 
> On 09/24/2013 10:35 AM, Eli Zaretskii wrote:
> 
> > Does this change really speed up the code?
> 
> YMMV, as usual with benchmarks. Time is in CPU cycles, smaller is better:
> 
> It was:
> 
>                           gcc 4.8.1  gcc 4.8.1           Intel C 13.1.3   clang 3.3
>                           -O2 -g3    -O3 -march=native   -O3 -xHOST       -O3 -march=native
> ------------------------------------------------------------------------------------------
> clear_glyph_row          100        140                 90               40
> copy_row_except_pointers 90         160                 80               100
> 
> 
> Now it is:
> 
>                           gcc 4.8.1  gcc 4.8.1           Intel C 13.1.3   clang 3.3
>                           -O2 -g3    -O3 -march=native   -O3 -xHOST       -O3 -march=native
> ------------------------------------------------------------------------------------------
> clear_glyph_row          75         60                  35               35
> copy_row_except_pointers 95         150                 50               70
> 
> (Intel C and clang makes heavy use of SSE, but gcc isn't).
> 
> It would be interesting to add MSVC to this table :-).
> 
> In short, I believe that a good compiler should get more optimization opportunities from
> new code rather than from old. For this particular case, Intel C is "definitely good",
> and gcc, hm...looks controversial at least.

I think -O3 is not interesting in this case.  For -O2 with GCC, which
is the most important case (as Emacs is normally built like that), you
get net loss, because AFAIR copy_row_except_pointers is called much
more often than clear_glyph_row.  The important comparison is, of
course, as part of some redisplay scenario, not just cycle comparison.




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Optimize glyph row clearing and copying routines
  2013-09-24 11:50   ` Eli Zaretskii
@ 2013-09-24 12:42     ` Dmitry Antipov
  0 siblings, 0 replies; 10+ messages in thread
From: Dmitry Antipov @ 2013-09-24 12:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On 09/24/2013 03:50 PM, Eli Zaretskii wrote:

> I think -O3 is not interesting in this case.  For -O2 with GCC, which
> is the most important case (as Emacs is normally built like that), you
> get net loss, because AFAIR copy_row_except_pointers is called much
> more often than clear_glyph_row.  The important comparison is, of
> course, as part of some redisplay scenario, not just cycle comparison.

1) I do not believe in 5 CPU cycles difference too much. This is
    just ~5%, which may be explained by some irregular inaccuracy.

2) I'm talking about GCC 4.8.1, but people from this list still
    uses 4.2.x. or even older. Unfortunately GCC is known to have
    some serious regressions from time to time.

3) As it's proven by icc/clang, good compiler _really_ gets more
    optimization opportunities from new code rather than from old.
    IMHO we should provide workaround if it's known that gcc-X.Y.Z
    has serious bug; but we should not reject generally better
    code if we have just an unconvincing suspicion that gcc-P.Q.R
    introduces some miserable regression.

Dmitry




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Optimize glyph row clearing and copying routines
  2013-09-24  6:35 Optimize glyph row clearing and copying routines Eli Zaretskii
  2013-09-24 10:10 ` Dmitry Antipov
@ 2013-09-24 13:40 ` John Yates
  2013-09-24 17:04   ` Eli Zaretskii
  1 sibling, 1 reply; 10+ messages in thread
From: John Yates @ 2013-09-24 13:40 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Dmitry Antipov, Emacs developers

[-- Attachment #1: Type: text/plain, Size: 1017 bytes --]

It is true that the standard's definition of memcpy is in terms of copying
a sequence of bytes.  It is also true that memcpy is one of the most
important and most heavily optimized library functions.

These days any credible compiler has a means of determining that an
invocation of a function named 'memcpy' is actually an invocation of the
standard's memcpy.  E.g. gcc's exposed memcpy is an inline whose body
simply calls __builtin_memcpy.

With such knowledge a compiler can bring to bear all kinds of
optimizations.  Dmitry's measurements seem to bear this out.

/john

On Tue, Sep 24, 2013 at 2:35 AM, Eli Zaretskii <eliz@gnu.org> wrote:

> Does this change really speed up the code?  AFAIU, previously the
> struct assignment could use word-size copies (because a struct is
> always aligned), but now you cast the arguments to 'char *' and use
> memcpy, which could fall back on copying single bytes or shorter
> words.
>
> Did you measure the impact?  Was it significant enough to justify this
> change?
>
>

[-- Attachment #2: Type: text/html, Size: 1446 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Optimize glyph row clearing and copying routines
  2013-09-24 13:40 ` John Yates
@ 2013-09-24 17:04   ` Eli Zaretskii
  2013-09-24 17:28     ` John Yates
  2013-09-24 18:03     ` Paul Eggert
  0 siblings, 2 replies; 10+ messages in thread
From: Eli Zaretskii @ 2013-09-24 17:04 UTC (permalink / raw)
  To: John Yates; +Cc: dmantipov, emacs-devel

> Date: Tue, 24 Sep 2013 09:40:53 -0400
> From: John Yates <john@yates-sheets.org>
> Cc: Dmitry Antipov <dmantipov@yandex.ru>, Emacs developers <emacs-devel@gnu.org>
> 
> It is true that the standard's definition of memcpy is in terms of copying
> a sequence of bytes.  It is also true that memcpy is one of the most
> important and most heavily optimized library functions.
> 
> These days any credible compiler has a means of determining that an
> invocation of a function named 'memcpy' is actually an invocation of the
> standard's memcpy.  E.g. gcc's exposed memcpy is an inline whose body
> simply calls __builtin_memcpy.

That's not the issue here.  Because this:

  struct foo *foo, *bar;
  ...
  *foo = *bar;

(which was the old code) is also implemented with a memcpy, whether
builtin or not.

The issue here is a _partial_ copy of a struct, starting with some
offset.  That offset can be aligned less favorably than the struct
itself, which will cause memcpy be less efficient.

The old code copied the entire struct, and then "fixed" a small number
of members that cannot be copied.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Optimize glyph row clearing and copying routines
  2013-09-24 17:04   ` Eli Zaretskii
@ 2013-09-24 17:28     ` John Yates
  2013-09-24 18:03     ` Paul Eggert
  1 sibling, 0 replies; 10+ messages in thread
From: John Yates @ 2013-09-24 17:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Dmitry Antipov, Emacs developers

[-- Attachment #1: Type: text/plain, Size: 1296 bytes --]

Got it.  I jumped to a conclusion based on too little evidence.

/john


On Tue, Sep 24, 2013 at 1:04 PM, Eli Zaretskii <eliz@gnu.org> wrote:

> > Date: Tue, 24 Sep 2013 09:40:53 -0400
> > From: John Yates <john@yates-sheets.org>
> > Cc: Dmitry Antipov <dmantipov@yandex.ru>, Emacs developers <
> emacs-devel@gnu.org>
> >
> > It is true that the standard's definition of memcpy is in terms of
> copying
> > a sequence of bytes.  It is also true that memcpy is one of the most
> > important and most heavily optimized library functions.
> >
> > These days any credible compiler has a means of determining that an
> > invocation of a function named 'memcpy' is actually an invocation of the
> > standard's memcpy.  E.g. gcc's exposed memcpy is an inline whose body
> > simply calls __builtin_memcpy.
>
> That's not the issue here.  Because this:
>
>   struct foo *foo, *bar;
>   ...
>   *foo = *bar;
>
> (which was the old code) is also implemented with a memcpy, whether
> builtin or not.
>
> The issue here is a _partial_ copy of a struct, starting with some
> offset.  That offset can be aligned less favorably than the struct
> itself, which will cause memcpy be less efficient.
>
> The old code copied the entire struct, and then "fixed" a small number
> of members that cannot be copied.
>
>

[-- Attachment #2: Type: text/html, Size: 1911 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Optimize glyph row clearing and copying routines
  2013-09-24 17:04   ` Eli Zaretskii
  2013-09-24 17:28     ` John Yates
@ 2013-09-24 18:03     ` Paul Eggert
  2013-09-24 18:32       ` Eli Zaretskii
  1 sibling, 1 reply; 10+ messages in thread
From: Paul Eggert @ 2013-09-24 18:03 UTC (permalink / raw)
  To: emacs-devel

On 09/24/13 10:04, Eli Zaretskii wrote:
> The issue here is a _partial_ copy of a struct, starting with some
> offset.  That offset can be aligned less favorably than the struct
> itself, which will cause memcpy be less efficient.

Yes, I can verify that this is indeed an issue on my platform
(Fedora 19 x86-64, bundled GCC 4.8.1).  On my platform, this code:

  enum { off = offsetof (struct glyph_row, x) };
  memcpy (&to->x, &from->x, sizeof *to - off);

generates a complicated sequence of 176 instructions, whereas this:

  *to = *from;

generates just two:

   mov 0x20,%ecx
   rep movsq %ds:(%rsi),%es:(%rdi)

I don't know whether this is faster, but it sure is simpler.
One possible way to get the simplification at the assembly level,
might be to package the part of the structure that
one wants to copy inside another structure designed just for
that purpose, so that one could write (to->substruct = from->substruct).

The new code does have an advantage of clarity.
The old code squirreled away some stuff in a temporary copy
and then copied it back, which was hard to follow,
whereas the new code simply copies from source to
destination.  This may be worth any minor performance loss.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Optimize glyph row clearing and copying routines
  2013-09-24 18:03     ` Paul Eggert
@ 2013-09-24 18:32       ` Eli Zaretskii
  2013-09-26 12:35         ` Juanma Barranquero
  0 siblings, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2013-09-24 18:32 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel

> Date: Tue, 24 Sep 2013 11:03:32 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> 
> The new code does have an advantage of clarity.

I don't think I agree.  The new code is _simpler_, allright, but it
obfuscates the meaning:

  enum { off = offsetof (struct glyph_row, x) };

  memcpy (&to->x, &from->x, sizeof *to - off);

You will not understand what is going on, until you look up the
definition of 'struct glyph_row' (on another file), and find out that
the members mentioned in the commentary (IF you've read the
commentary) are all before 'x' in the struct layout.

In the old, all this was explicit in the source code.

> The old code squirreled away some stuff in a temporary copy
> and then copied it back, which was hard to follow,
> whereas the new code simply copies from source to
> destination.  This may be worth any minor performance loss.

I don't see any advantages either way, so I think we are talking
stylistic preferences here.  Making changes just because one doesn't
like the existing style is maintenance burden and waste of energy.  It
does have one clear disadvantage: it makes the familiar code no longer
that, and makes it harder to merge from long-living branches.

In general, I'm amazed how much of the changes lately are minor
changes due to obscure or stylistic issues, with no real gain in
functionality.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Optimize glyph row clearing and copying routines
  2013-09-24 18:32       ` Eli Zaretskii
@ 2013-09-26 12:35         ` Juanma Barranquero
  0 siblings, 0 replies; 10+ messages in thread
From: Juanma Barranquero @ 2013-09-26 12:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Paul Eggert, Emacs developers

On Tue, Sep 24, 2013 at 8:32 PM, Eli Zaretskii <eliz@gnu.org> wrote:

> In general, I'm amazed how much of the changes lately are minor
> changes due to obscure or stylistic issues, with no real gain in
> functionality.

+1



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2013-09-26 12:35 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-09-24  6:35 Optimize glyph row clearing and copying routines Eli Zaretskii
2013-09-24 10:10 ` Dmitry Antipov
2013-09-24 11:50   ` Eli Zaretskii
2013-09-24 12:42     ` Dmitry Antipov
2013-09-24 13:40 ` John Yates
2013-09-24 17:04   ` Eli Zaretskii
2013-09-24 17:28     ` John Yates
2013-09-24 18:03     ` Paul Eggert
2013-09-24 18:32       ` Eli Zaretskii
2013-09-26 12:35         ` Juanma Barranquero

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).