unofficial mirror of emacs-devel@gnu.org 
* Optimizing memory footprint of bidi code
@ 2010-06-07 14:30 Eli Zaretskii
  2010-06-07 15:14 ` Tom Tromey
  2010-06-07 16:01 ` Stefan Monnier
  0 siblings, 2 replies; 5+ messages in thread
From: Eli Zaretskii @ 2010-06-07 14:30 UTC (permalink / raw)
  To: emacs-devel

In a recent bug report ("bug#6365: bidi data structure
inefficiencies") Dan Nicolaescu writes:

> Some bidi data structures are bigger than they need to be, which
> probably results in additional cache misses.
> Examples:
> 
> struct bidi_saved_info could use bitfields for the bidi_type_t members
> Same for bidi_stack
> 
> bidi_it could use bitfields for a lot of its members.

I see a couple of boolean flags that could indeed become one-bit
fields.

I see a bidi_dir_t enumeration that has only 3 possible values.  Can
we trust modern compilers to use the smallest integer type for
enumerations?  If so, this part is already taken care of.  If not, how
to make this type smaller without losing the benefits of an enumerated
type (I don't want to manipulate magic values or macros)?

The other members which could be converted to bitfields are small
integers, such as bidi_type_t (20 possible values) or stack_idx (up to
62).  While it would be possible to use 5 bits for the former and 6
bits for the latter, at the time this code was designed, 9 years ago,
there was a significant performance hit for manipulating bits on the
then popular architectures.  Or at least that's what I thought.  These
members use `int' deliberately, because I was (and still am) afraid of
the slow-down from replacing a simple increment of 2 integer variables
(in the unidirectional display engine) with the 1500-line reordering
code.  After all, this code runs in the innermost loops of Emacs
display engine, and it runs for each character we display in each
window.

Maybe manipulating bitfields is no longer an issue with current
architectures and compiler technology.  Maybe this problem didn't
exist, except in my head, even back when I designed this.  But before
I change the code, I'd like to know for sure.  Because if it's still
more expensive today to extract bitfields, then it may be worth our
while to give up some memory for extra performance.  And if the
benefits from using bitfields are insignificant, perhaps it's not
worth the hassle to change the code for gaining a few hundred bytes.

Can someone "in the know" tell us what the current state of the art is
in these matters?  Failing that, can someone measure the differences
and publish the results?

A few more things we may wish to consider:

  . struct bidi_it weighs in at 712 bytes on a 64-bit machine.  Can
    this amount of memory cause more frequent cache misses, with
    today's cache sizes?

  . Out of 712 bytes, the level stack takes 512.  The Unicode Standard
    requires support for 62 levels (I will remove the extra 2 when I
    have time), but I have never seen more than 4 used in any
    real-life text.  I don't want to give up full compliance with
    UAX#9, and I don't want to allocate memory from the heap for each
    extra level.  But maybe starting with a 4-level stack and
    allocating the other 58 when the 5th is required will be a good
    enough optimization, leaving us with a 250-byte structure for all
    but the very extreme use-cases?

  . OTOH, the code in bidi.c consults a much larger char-table of
    bidirectional properties _for_each_character_it_processes_.  Given
    this, how significant is the impact of a few hundred bytes in
    struct bidi_it for causing a possible increase in cache misses?
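The "start with 4 levels, spill to the heap at the 5th" idea from the
second point above could be sketched like this; all the names here
(bidi_level, level_stack, level_push) are illustrative, not the actual
bidi.c identifiers:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for a level-stack entry.  */
struct bidi_level { int level; int override; };

#define INLINE_LEVELS 4		/* covers nearly all real-life text */
#define MAX_LEVELS 62		/* the UAX#9 requirement */

struct level_stack
{
  struct bidi_level inline_buf[INLINE_LEVELS];
  struct bidi_level *heap_buf;	/* allocated only on first overflow */
  int depth;
};

static struct bidi_level *
level_ref (struct level_stack *s, int i)
{
  return (i < INLINE_LEVELS
	  ? &s->inline_buf[i]
	  : &s->heap_buf[i - INLINE_LEVELS]);
}

static void
level_push (struct level_stack *s, struct bidi_level v)
{
  if (s->depth == INLINE_LEVELS && s->heap_buf == NULL)
    /* 5th level: allocate the remaining 58 entries in one shot.  */
    s->heap_buf = calloc (MAX_LEVELS - INLINE_LEVELS, sizeof *s->heap_buf);
  assert (s->depth < MAX_LEVELS);
  *level_ref (s, s->depth++) = v;
}
```

(The heap buffer is allocated once per overflow-depth stack and would
be freed when the iterator is discarded; error handling for a failed
calloc is omitted here.)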

Comments welcome.  I will add the implementation of whatever
conclusions we reach here to my todo list (and I encourage anyone with
enough time to beat me to it).

Thanks in advance.




* Re: Optimizing memory footprint of bidi code
  2010-06-07 14:30 Optimizing memory footprint of bidi code Eli Zaretskii
@ 2010-06-07 15:14 ` Tom Tromey
  2010-06-07 15:46   ` Andreas Schwab
  2010-06-07 16:01 ` Stefan Monnier
  1 sibling, 1 reply; 5+ messages in thread
From: Tom Tromey @ 2010-06-07 15:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

>>>>> "Eli" == Eli Zaretskii <eliz@gnu.org> writes:

Eli> I see a bidi_dir_t enumeration that has only 3 possible values.  Can
Eli> we trust modern compilers to use the smallest integer type for
Eli> enumerations?  If so, this part is already taken care of.  If not, how
Eli> to make this type smaller without losing the benefits of an enumerated
Eli> type (I don't want to manipulate magic values or macros)?

In C an enum is always an int.  In a struct the compiler will not
automatically shrink this to a bitfield.

However, you can declare it as a bitfield.  GCC will check to make sure
that all the enumeration values fit in the bitfield, so you will get an
error if you add a constant to the enum causing it to no longer fit.
Bitfield enums are not portable.  So, gcc and gdb do this:

#if defined(__GNUC__) && (__GNUC__ >= 2)
#define ENUM_BITFIELD(TYPE) enum TYPE
#else
#define ENUM_BITFIELD(TYPE) unsigned int
#endif

Then in the definition:

struct main_type
{
  ...
  ENUM_BITFIELD(type_code) code : 8;
  ...
};

I think this approach results in a good tradeoff of type safety, size,
and portability, as long as a significant fraction of developers are
using GCC.

Eli> Maybe manipulating bitfields is no longer an issue with current
Eli> architectures and compiler technology.

It increases code size, which can matter.  The only way to know for sure
is to try it and profile.

I can't answer the rest of your questions.

Tom




* Re: Optimizing memory footprint of bidi code
  2010-06-07 15:14 ` Tom Tromey
@ 2010-06-07 15:46   ` Andreas Schwab
  0 siblings, 0 replies; 5+ messages in thread
From: Andreas Schwab @ 2010-06-07 15:46 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Eli Zaretskii, emacs-devel

Tom Tromey <tromey@redhat.com> writes:

> In C an enum is always an int.

This is not true.  Only the enum constants are always of type int.
The underlying type of an enum can be any type that is wide enough to
represent all defined values.

> In a struct the compiler will not automatically shrink this to a
> bitfield.

The width is the same as outside of structs.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: Optimizing memory footprint of bidi code
  2010-06-07 14:30 Optimizing memory footprint of bidi code Eli Zaretskii
  2010-06-07 15:14 ` Tom Tromey
@ 2010-06-07 16:01 ` Stefan Monnier
  2010-06-07 23:10   ` Eli Zaretskii
  1 sibling, 1 reply; 5+ messages in thread
From: Stefan Monnier @ 2010-06-07 16:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

> I see a bidi_dir_t enumeration that has only 3 possible values.  Can
> we trust modern compilers to use the smallest integer type for
> enumerations?

Not that I know, no.

> If so, this part is already taken care of.  If not, how to make this
> type smaller without losing the benefits of an enumerated type (I
> don't want to manipulate magic values or macros)?

I'm not sure what you're worried about.  What's wrong with adding a ":2"
to the field?

> bits for the latter, at the time this code was designed, 9 years ago,
> there was a significant performance hit for manipulating bits on the
> then popular architectures.  Or at least that's what I thought.  These

Accessing and modifying a bitfield is and has always been slower than an
"int", yes.  Some machines (typically the P4) have relatively slow
bit-shifting operations, among other things.

> Maybe manipulating bitfields is no longer an issue with current
> architectures and compiler technology.  Maybe this problem didn't

I think the issue is not so much whether bitfield access has gotten
faster, but that processors in general have gotten faster, while memory
has sped up much less.  So in many cases it's worthwhile to do a bit
more work if it lets you access less memory.  In many cases (though
obviously not all), time can be estimated fairly well by looking at the
amount of memory that's accessed, treating the actual computation as
being completely free.

> Can someone "in the know" tell what is the current state of art in
> these matters?

I'm not sure I'm "in the know" enough, but the general rule is that if
those fields amount to a significant quantity of memory (as compared to
the "working set" within the various levels of caches) and if those
fields are often accessed at about the same time (at the time-scale of
the corresponding level of cache), then packing them into bitfields can be
a win.
If they're not accessed at the same time, then you don't save any memory
(you still bring in a whole cache line to access either the int or the
bitfield).
Also if you almost constantly access the same fields (i.e. you'd almost
want them in registers), then bitfields are probably not a good choice.

>   . struct bidi_it weighs in at 712 bytes on a 64-bit machine.  Can
>     this amount of memory cause more frequent cache misses, with
>     today's cache sizes?

If there's only one such struct used, then it's probably small enough
that it doesn't matter.

Typical L1 caches are in the 16KB range ("have been and always will be"
it seems: apparently smaller would make them useless and larger makes
them slow enough that it's worthwhile to add a smaller lower-level
cache), so 712 bytes are not lost in the noise at that scale, but they
fit well.

>   . Out of 712 bytes, the level stack takes 512.  The Unicode Standard
>     requires support for 62 levels (I will remove the extra 2 when I
>     have time), but I have never seen more than 4 used in any
>     real-life text.  I don't want to give up full compliance with
>     UAX#9, and I don't want to allocate memory from the heap for each
>     extra level.  But maybe starting with a 4-level stack and
>     allocating the other 58 when the 5th is required will be a good
>     enough optimization, leaving us with a 250-byte structure for all
>     but the very extreme use-cases?

That's not worth the trouble: as long as these extra bytes are contiguous
and are not accessed they only cost "main memory" but don't slow us down
(they won't be brought into the cache at all).

>   . OTOH, the code on bidi.c consults a much larger char-table of
>     bidirectional properties _for_each_character_it_processes_.  Given
>     this, how significant is the impact of a few hundred bytes in
>     struct bidi_it for causing a possible increase in cache misses?

For a single bidi_it, with fields accessed "all the time", you're
probably better off with int than with bitfields (bitfields will slow
you down every time you access them, and they'll mostly stay in cache if
they're used often enough, so their cost is only in terms of pushing out
other useful things from the cache).

Of course, only measurements can give reliable answers.


        Stefan




* Re: Optimizing memory footprint of bidi code
  2010-06-07 16:01 ` Stefan Monnier
@ 2010-06-07 23:10   ` Eli Zaretskii
  0 siblings, 0 replies; 5+ messages in thread
From: Eli Zaretskii @ 2010-06-07 23:10 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: emacs-devel@gnu.org
> Date: Mon, 07 Jun 2010 12:01:26 -0400
> 
> > If so, this part is already taken care of.  If not, how to make this
> > type smaller without losing the benefits of an enumerated type (I
> > don't want to manipulate magic values or macros)?
> 
> I'm not sure what you're worried about.  What's wrong with adding a ":2"
> to the field?

Nothing's wrong.  I was saying that I would like to still be able to
use enumerated values and not just literal numerical constants.  But I
think Tom explained that with GCC this is not a problem.

> >   . struct bidi_it weighs in at 712 bytes on a 64-bit machine.  Can
> >     this amount of memory cause more frequent cache misses, with
> >     today's cache sizes?
> 
> If there's only one such struct used, then it's probably small enough
> that it doesn't matter.

There's only one such struct; it is part of struct it, the display
iterator used by redisplay.  Since windows are redisplayed one at a
time, there's only one instance of struct it and correspondingly only
one instance of struct bidi_it.

There's a cache inside bidi.c that caches bidi_it states.  That one
holds snapshots of struct bidi_it, but it is actively used only when
text actually needs reordering; otherwise, there's only one instance
there.  When reordering _is_ needed, the number of instances there
could be potentially large, but the intent of this cache is to prevent
repeated application of UAX#9 rules to characters we already saw and
processed.  This cache is on the heap.

> Of course, only measurements can give reliable answers.

I'd surely love to see some.




