Message-ID: <20130107232115.GF26407@google.com>
Date:	Mon, 7 Jan 2013 15:21:15 -0800
From:	Kent Overstreet <koverstreet@...gle.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	linux-kernel@...r.kernel.org, linux-aio@...ck.org,
	linux-fsdevel@...r.kernel.org, zab@...hat.com, bcrl@...ck.org,
	jmoyer@...hat.com, axboe@...nel.dk, viro@...iv.linux.org.uk,
	tytso@....edu
Subject: Re: [PATCH 25/32] aio: use xchg() instead of completion_lock

On Thu, Jan 03, 2013 at 03:34:14PM -0800, Andrew Morton wrote:
> On Wed, 26 Dec 2012 18:00:04 -0800
> Kent Overstreet <koverstreet@...gle.com> wrote:
> 
> > So, for sticking kiocb completions on the kioctx ringbuffer, we need a
> > lock - it unfortunately can't be lockless.
> > 
> > When the kioctx is shared between threads on different cpus and the rate
> > of completions is high, this lock sees quite a bit of contention - in
> > terms of cacheline contention it's the hottest thing in the aio
> > subsystem.
> > 
> > That means, with a regular spinlock, we're going to take a cache miss
> > to grab the lock, then another cache miss when we touch the data the
> > lock protects - if it's on the same cacheline as the lock, other cpus
> > spinning on the lock are going to be pulling it out from under us as
> > we're using it.
> > 
> > So, we use an old trick to get rid of this second forced cache miss -
> > make the data the lock protects be the lock itself, so we grab them both
> > at once.
> 
> Boy I hope you got that right.
> 
> Did you consider using bit_spin_lock() on the upper bit of `tail'? 
> We've done that in other places and we at least know that it works. 
> And it has the optimisations for CONFIG_SMP=n, understands
> CONFIG_DEBUG_SPINLOCK, has arch-specific optimisations, etc.

I hadn't thought of that - I think it'd suffer from the same problem as
a regular spinlock, where you grab the lock, then go to grab your data
but a different CPU grabbed the cacheline you need...

But the lock debugging would be nice. It'd probably work to make
something generic like bit_spinlock() that also returns some value - or,
the recent patches for making spinlocks back off will also help with
this problem. So maybe between that and batch completion this patch
could be dropped at some point.

So, yeah. The code's plenty tested and I went over the barriers; it
already had all the needed barriers due to the ringbuffer... and I've
done this sort of thing elsewhere too. But it certainly is a hack and I
wouldn't be sad to see it go.