netdev - Re: [PATCH net-next 2/5] gianfar: Cleanup and optimize struct gfar

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <511A66A3.7040704@freescale.com>
Date:	Tue, 12 Feb 2013 17:58:27 +0200
From:	Claudiu Manoil <claudiu.manoil@...escale.com>
To:	Paul Gortmaker <paul.gortmaker@...driver.com>
CC:	<netdev@...r.kernel.org>, "David S. Miller" <davem@...emloft.net>
Subject: Re: [PATCH net-next 2/5] gianfar: Cleanup and optimize struct gfar_private

On 2/12/2013 5:11 PM, Paul Gortmaker wrote:
> On 13-02-12 07:47 AM, Claudiu Manoil wrote:
>> * group run-time critical fields within the 1st cacheline
>>    followed by the tx|rx_queue reference arrays and the interrupt
>>    group instances (gfargrp) (all cacheline aligned)
>> * change 'padding' from unsigned short to u16
>> * clear 20+ byte memory hole
>
> Per prev. mail, it gets harder to see which change is where,
> when they are all lumped together like this.  For example, it
> wasn't obvious to me where the 20 byte hole was.  Also, it
> doesn't look like you changed the padding, but rather instead
> totally re-purposed it, leaving no alignment padding after the
> uchar bitfields (where it was originally).
>
> P.

The 20 byte hole was here:

struct gfar_private {
         unsigned int               num_tx_queues;
         unsigned int               num_rx_queues;
         unsigned int               num_grps;
         unsigned int               mode;
         unsigned int               total_tx_ring_size;
         unsigned int               total_rx_ring_size;
         struct device_node *       node;
         struct net_device *        ndev;
         /* --- cacheline 1 boundary (32 bytes) --- */
         struct platform_device *   ofdev;
         enum gfar_errata           errata;

         /* XXX 24 bytes hole, try to pack */

         /* --- cacheline 2 boundary (64 bytes) --- */
         struct gfar_priv_grp       gfargrp[2];


At the end of the patch series the first cacheline is without holes.
Please note that the re-grouping of members and their order is most
important. For instance why keep in the first cacheline something
as unimportant as total_rx_ring_size? Members like rx_buffer_size,
or padding, or even errata, are critical however for the fast path.
Rx processing (gfar_poll + clean_rx_ring) is the bottleneck here,
keeping the CPU to 100%. So the main goal is to optimize this path,
including memory access/cache optimizations. For instance, better 
results were obtained by inverting rx|tx_queue[] with gfargrp[], originally:
         /* --- cacheline 1 boundary (32 bytes) --- */
         struct gfar_priv_tx_q *    tx_queue[8];  /*    32    32 */
         /* --- cacheline 2 boundary (64 bytes) --- */
         struct gfar_priv_rx_q *    rx_queue[8]; /*    64    32 */
         /* --- cacheline 3 boundary (96 bytes) --- */
         struct gfar_priv_grp       gfargrp[2];  /*    96   192 */

The uchar bitfields are unimportant here (used at "init time"), and
they take 4 bytes including padding anyway. So whether it's uchar or
uint, it's the same, maybe better left uchar to discourage the abuse
of these bitfields. (A 2-3byte hole here doesn't change anything
to the whole structure size, which is padded to be at least a 32B
multiple.)

Thanks,
Claudiu


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html