Date:   Wed, 10 Oct 2018 08:59:13 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Eric Dumazet' <eric.dumazet@...il.com>,
        Heiner Kallweit <hkallweit1@...il.com>,
        David Ahern <dsahern@...il.com>,
        David Miller <davem@...emloft.net>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [PATCH net-next v2] net: core: change bool members of struct
 net_device to bitfield members

From: Eric Dumazet
> Sent: 09 October 2018 21:52
> 
> On 10/09/2018 01:24 PM, Heiner Kallweit wrote:
> 
> > Reordering the struct members to fill the holes could be a little tricky
> > and could have side effects because it may make a performance difference
> > whether certain members are in one cacheline or not.
> > And is it worth spending this effort (incl. the related risks)
> > just to save a few bytes (also considering that we typically have quite
> > few instances of struct net_device)?
> 
> Not really.
> 
> In fact we probably should spend time reordering fields for performance,
> since some new fields were added a bit randomly, breaking the goal of data locality.
> 
> Some fields are used in control path only and could be moved out of the cache lines
> needed in data path (fast path).
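
That could look roughly like this - a minimal sketch with a made-up
structure and field names, not the actual net_device layout - where the
control-path-only fields are pushed onto a fresh cache line:

#include <linux/cache.h>

struct example_dev {
	/* fast path: fields touched for every packet */
	unsigned int		hot_flags;
	void			*hot_handler_data;
	unsigned long		hot_dropped;

	/* control path only: read rarely, kept off the hot lines */
	char			cfg_name[16] ____cacheline_aligned_in_smp;
	unsigned int		cfg_min_mtu;
	unsigned int		cfg_max_mtu;
};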

Interesting thought....
The memory allocator rounds sizes up to a power of 2 and gives out memory
aligned to that value.
This means that the cache lines just above a power-of-2 boundary are used far
more frequently than those just below one.
This will be made worse because the commonly used fields are normally at
the start of a structure.
This ought to be measurable?
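
A rough userspace illustration of that effect, assuming 64-byte cache
lines and a simple power-of-2 size class (the real slab allocator also
has 96- and 192-byte caches, which this ignores), using a made-up
1344-byte object:

#include <stdio.h>

static size_t roundup_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

int main(void)
{
	size_t obj = 1344;			/* hypothetical structure size */
	size_t alloc = roundup_pow2(obj);	/* 2048-byte allocation */
	size_t used = (obj + 63) / 64;		/* 21 cache lines touched */
	size_t total = alloc / 64;		/* 32 cache lines handed out */

	printf("%zu-byte object: %zu of %zu cache lines ever used\n",
	       obj, used, total);
	return 0;
}

Since every allocation starts on a 2048-byte boundary, the 21 lines just
above each boundary are the ones that get used, while the 11 below the
next boundary stay cold.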

Has anyone tried randomly splitting the padding between the start
and end of the allocation (while maintaining cache alignment)?
(Not sure how this would affect kfree().)
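
For what it's worth, a userspace sketch of the idea (alloc_shifted() and
the separate 'raw' pointer are invented for illustration; in the kernel,
kfree() would need some way to recover the start of the underlying
block, which is exactly the problem above):

#include <stdio.h>
#include <stdlib.h>

#define CACHE_LINE 64

/* Place a size-byte object at a random cache-line offset inside a
 * power-of-2 sized block; the caller keeps *raw for freeing. */
static void *alloc_shifted(size_t size, void **raw)
{
	size_t alloc = 1;
	size_t slack, shift;

	while (alloc < size)
		alloc <<= 1;

	slack = (alloc - size) / CACHE_LINE;	/* spare cache lines */
	shift = slack ? (size_t)rand() % (slack + 1) : 0;

	*raw = aligned_alloc(alloc, alloc);
	if (!*raw)
		return NULL;
	return (char *)*raw + shift * CACHE_LINE;
}

int main(void)
{
	void *raw;
	void *obj = alloc_shifted(1344, &raw);

	printf("block at %p, object at %p\n", raw, obj);
	free(raw);
	return 0;
}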

Or splitting pages (or small groups of pages) into non-power-of-2 sized
blocks?
For instance you can fit three 1344 (21*64) byte blocks, or five 768 byte
blocks, into a 4k page.
These could give a significant footprint reduction as well as balance
out cache line usage.
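
(For comparison, power-of-2 kmalloc classes would round a 1344-byte
object up to 2048 bytes, giving two per page instead of three, and a
768-byte object up to 1024 bytes, giving four per page instead of five.)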

I also wonder whether it is right to add a lot of padding to cache-line
align structure members on systems with large cache lines.
The intention is probably to get a few fields into the same cache line,
not to add padding that may be larger than the aggregate size of the fields.
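
As a made-up example, assuming a kernel built with 128-byte cache lines,
the aligned member below gets 112 bytes of padding in front of it and
the whole structure is rounded up to 256 bytes, all for 24 bytes of
actual fields:

#include <linux/cache.h>

struct pad_example {
	unsigned long	a;	/* offset 0 */
	unsigned long	b;	/* offset 8 */
	/* starts a new cache line: offset 128 with 128-byte lines */
	unsigned long	hot ____cacheline_aligned_in_smp;
};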

Oh - and it is somewhat pointless because kmalloc() isn't guaranteed
to give out cache-line aligned buffers.

	David

