lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:	Mon, 23 Apr 2007 15:56:28 +0200
From:	"Francis Moreau" <francis.moro@...il.com>
To:	"Roland Dreier" <rdreier@...co.com>
Cc:	"Herbert Xu" <herbert@...dor.apana.org.au>,
	helge.hafting@...el.hist.no, linux-kernel@...r.kernel.org,
	linux-crypto@...r.kernel.org
Subject: Re: [CRYPTO] is it really optimized ?

Hi

[Sorry for the late answer]

On 4/19/07, Francis Moreau <francis.moro@...il.com> wrote:
> On 4/17/07, Roland Dreier <rdreier@...co.com> wrote:
> >  > > It seems trivial to keep the last key you were given and do a quick
> >  > > memcmp in your setkey method to see if it's different from the last
> >  > > key you pushed to hardware, and set a flag if it is.  Then only do
> >  > > your set_key() if you have a new key to pass to hardware.
> >  > >
> >  > > I'm assuming the expense is in the aes_write() calls, and you could
> >  > > avoid them if you know you're not writing something new.
> >
> >  > that's a wrong assumption. aes_write()/aes_read() are both used to
> >  > access to the controller and are slow (no cache involved).
> >
> > Sorry, I wasn't clear.  I meant that the hardware access is what is
> > slow, and that anything you do on the CPU is relatively cheap compared
> > to that.
> >
> > So my suggestion is just to keep a cache (in CPU memory) of what you
> > have already loaded into the HW, and before reloading the HW just
> > check the cache and don't do the actual HW access if you're not going
> > to change the HW contents.  So you avoid any extra aes_write and
> > aes_read calls in the cache hit case.
> >
> > This would have the advantage of making anything that does lots of
> > bulk encryption fast without special casing ecryptfs.
> >
>
> I'm not sure how "memcmp(key, cache, KEY_SIZE)" would impact AES
> performance. I need to give it a test but can't today. I'll do
> tomorrow and give you back the result.
>

OK, I gave it a test and it appears that the cache hit case is
slightly worse than unconditionnal key loading. So it means that
testing that hte key is cached is as long as loading the key into the
controller. Here is what I did in set_key() function:

static void set_key(const char *key)
{
	static u32 my_key[4] __cacheline_aligned;

	u32 key0 = *(const u32 *)(key + 12);
	u32 key1 = *(const u32 *)(key + 8);
	u32 key2 = *(const u32 *)(key + 4);
	u32 key3 = *(const u32 *)(key);
	int timeout = 100;
	u32 miss = 0;

	miss |= key0 ^ my_key[0];
	miss |= key1 ^ my_key[1];
	miss |= key2 ^ my_key[2];
	miss |= key3 ^ my_key[3];
	if (miss == 0)
		return;

	my_key[0] = key0;
	my_key[1] = key1;
	my_key[2] = key2;
	my_key[3] = key3;
	
	aes_write(be32_to_cpu(key0), AES_KEY0);
	aes_write(be32_to_cpu(key1), AES_KEY1);
	aes_write(be32_to_cpu(key2), AES_KEY2);
	aes_write(be32_to_cpu(key3), AES_KEY3);

	/* generate dkey: should take 11 cycles */
	aes_write(aes_read(AES_CR) | CR_DKEYGEN, AES_CR);

	while (aes_read(AES_CR) & CR_DKEYGEN) {
		if (--timeout == 0)
			break;
	}
}

So I was wrong, hardware access is not so expensive as I thought. But
it also means that all instructions executed in the drivers'
encrypt()/decrypt() methods have a real cost and skipping key loadings
is a win.

Using the driver exclusively doesn't seem to be the right solution,
but I don't see another way to do that...
-- 
Francis
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists