Message-ID: <alpine.LSU.2.00.1103161123360.14076@sister.anvils>
Date:	Wed, 16 Mar 2011 11:42:14 -0700 (PDT)
From:	Hugh Dickins <hughd@...gle.com>
To:	George Spelvin <linux@...izon.com>
cc:	herbert@...dor.hengli.com.au, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, mpm@...enic.com, penberg@...helsinki.fi
Subject: Re: [PATCH 1/8] drivers/random: Cache align ip_random better

On Wed, 16 Mar 2011, George Spelvin wrote:

> > I'm intrigued: please educate me.  On what architectures does cache-
> > aligning a 48-byte buffer (previously offset by 4 bytes) speed up
> > copying from it, and why?  Does the copying involve 8-byte or 16-byte
> > instructions that benefit from that alignment, rather than cacheline
> > alignment?
> 
> I had two thoughts in my head when I wrote that:
> 1) A smart compiler could note the alignment and issue wider copy
>    instructions.  (Especially on alignment-required architectures.)

Right, that part of it would benefit from stronger alignment,
but does not generally need cacheline alignment.
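
A rough sketch of that distinction, in Python for brevity. It is a toy
model, not kernel code: it assumes the enclosing structure is at least
16-byte aligned and that 16 bytes is the widest copy instruction available.
The function names are hypothetical.

```python
# Toy model, not kernel code: which access widths a copy could use when
# only the buffer's offset within an aligned structure is known.

BUF_BYTES = 48  # the 3x16-byte buffer discussed in this thread


def widest_access(offset, length=BUF_BYTES, widths=(16, 8, 4, 2, 1)):
    """Return the widest access size (bytes) that divides both the
    offset and the length, so every access is naturally aligned."""
    for width in widths:
        if offset % width == 0 and length % width == 0:
            return width
    raise ValueError("widths must include 1")


def accesses_needed(offset, length=BUF_BYTES):
    """Number of naturally aligned accesses to copy the whole buffer."""
    return length // widest_access(offset, length)


if __name__ == "__main__":
    # Offset 4 is the old layout; 16 is 16-byte but not cacheline
    # aligned; 64 is cacheline aligned (assuming 64-byte lines).
    for offset in (4, 16, 64):
        print(offset, widest_access(offset), accesses_needed(offset))
```

In this model an offset of 16 does exactly as well as an offset of 64:
both allow three 16-byte accesses, against twelve 4-byte accesses at
offset 4. That is the sense in which stronger alignment helps without
cacheline alignment being required.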

> 2) The cacheline fetch would get more data faster.  The data would
>    be transferred in the first 6 beats of the load from RAM (assuming a
>    64-bit data bus) rather than waiting for 7, so you'd finish the copy
>    1 ns sooner or so.  Similar 1-cycle win on a 128-bit Ln->L(n-1) cache
>    transfer.

That argument worries me.  I don't know enough to say whether you are
correct or not.  But if you are correct, then it worries me that your
patch will be the first of a trickle growing to a stream to an avalanche
of patches where people align and reorder structures so that the most
commonly accessed fields are at the beginning of the cacheline, so that
those can then be accessed minutely faster.

Aargh, and now I am setting off the avalanche with that remark.
Please, someone, save us by discrediting George's argument.
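
For reference, the beat counts in George's point 2 can be reproduced with
a few lines of arithmetic. This is only a sketch: it assumes a 64-byte
cacheline filled in order from its first byte, which ignores memory
systems that return the requested word first. The function name is
hypothetical.

```python
LINE_BYTES = 64  # assumed cacheline size


def beats_until_complete(offset, length, bus_bytes):
    """Beats that must arrive before the whole buffer is present,
    assuming the line is transferred in order from byte 0."""
    if offset + length > LINE_BYTES:
        raise ValueError("buffer must fit within one cacheline")
    last_byte = offset + length - 1
    return last_byte // bus_bytes + 1


if __name__ == "__main__":
    for bus_bytes in (8, 16):      # 64-bit and 128-bit transfers
        for offset in (4, 0):      # old layout, then aligned layout
            print(bus_bytes, offset,
                  beats_until_complete(offset, 48, bus_bytes))
```

Under those assumptions the 48-byte buffer completes after 7 beats at
offset 4 and 6 beats at offset 0 on a 64-bit bus, and after 4 versus 3
beats on a 128-bit path, which matches the one-beat difference George
describes.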

> 
> As I said, "infinitesimal".  The main reason that I bothered to
> generate a patch was that it appealed to my sense of neatness to
> keep the 3x16-byte buffer 16-byte aligned.

Ah, now you come clean!  Yes, it does feel neater to me too;
but I doubt that would be sufficient justification by itself.

Hugh
