lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 12 Feb 2013 11:49:37 -0500
From:	Paul Gortmaker <paul.gortmaker@...driver.com>
To:	Claudiu Manoil <claudiu.manoil@...escale.com>
CC:	<netdev@...r.kernel.org>, "David S. Miller" <davem@...emloft.net>
Subject: Re: [PATCH net-next 2/5] gianfar: Cleanup and optimize struct gfar_private

On 13-02-12 10:58 AM, Claudiu Manoil wrote:
> On 2/12/2013 5:11 PM, Paul Gortmaker wrote:
>> On 13-02-12 07:47 AM, Claudiu Manoil wrote:
>>> * group run-time critical fields within the 1st cacheline
>>>    followed by the tx|rx_queue reference arrays and the interrupt
>>>    group instances (gfargrp) (all cacheline aligned)
>>> * change 'padding' from unsigned short to u16
>>> * clear 20+ byte memory hole
>>
>> Per prev. mail, it gets harder to see which change is where,
>> when they are all lumped together like this.  For example, it
>> wasn't obvious to me where the 20 byte hole was.  Also, it
>> doesn't look like you changed the padding, but rather instead
>> totally re-purposed it, leaving no alignment padding after the
>> uchar bitfields (where it was originally).
>>
>> P.
> 
> The 20 byte hole was here:
> 
> struct gfar_private {
>          unsigned int               num_tx_queues;
>          unsigned int               num_rx_queues;
>          unsigned int               num_grps;
>          unsigned int               mode;
>          unsigned int               total_tx_ring_size;
>          unsigned int               total_rx_ring_size;
>          struct device_node *       node;
>          struct net_device *        ndev;
>          /* --- cacheline 1 boundary (32 bytes) --- */
>          struct platform_device *   ofdev;
>          enum gfar_errata           errata;
> 
>          /* XXX 24 bytes hole, try to pack */

OK, so this is good information, that wasn't in your commit log.
(Is it 20 or 24 though?  I'm guessing 24, and original commit
log was wrong in saying 20.)

Something like:

"The default gcc layout of gfar_private leaves an implicit 24
byte hole (bytes XYZ --> ABC in the 1st cache line) after the
errata enum...."

> 
>          /* --- cacheline 2 boundary (64 bytes) --- */
>          struct gfar_priv_grp       gfargrp[2];
> 
> 
> At the end of the patch series the first cacheline is without holes.
> Please note that the re-grouping of members and their order is most
> important. For instance why keep in the first cacheline something

The importance of ordering and alignment was never in question.  What
was missing, was conveying to the reviewer exactly how you'd made it
better, without forcing them to manually deconstruct gfar_priv in the
before and after cases, esp. when you'd already done that analysis.

> as unimportant as total_rx_ring_size? Members like rx_buffer_size,
> or padding, or even errata, are critical however for the fast path.
> Rx processing (gfar_poll + clean_rx_ring) is the bottleneck here,
> keeping the CPU to 100%. So the main goal is to optimize this path,
> including memory access/cache optimizations. For instance, better 
> results were obtained by inverting rx|tx_queue[] with gfargrp[], originally:
>          /* --- cacheline 1 boundary (32 bytes) --- */
>          struct gfar_priv_tx_q *    tx_queue[8];  /*    32    32 */
>          /* --- cacheline 2 boundary (64 bytes) --- */
>          struct gfar_priv_rx_q *    rx_queue[8]; /*    64    32 */
>          /* --- cacheline 3 boundary (96 bytes) --- */
>          struct gfar_priv_grp       gfargrp[2];  /*    96   192 */
> 
> The uchar bitfields are unimportant here (used at "init time"), and
> they take 4 bytes including padding anyway. So whether it's uchar or
> uint, it's the same, maybe better left uchar to discourage the abuse
> of these bitfields. (A 2-3byte hole here doesn't change anything
> to the whole structure size, which is padded to be at least a 32B
> multiple.)

This here too, i.e. why the padding after the bitfields is unimportant
and hence removed should also be in the long log.

Thanks,
Paul.
--

> 
> Thanks,
> Claudiu
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ