linux-kernel - Re: Announce: Semaphore-Removal tree

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20080430110159.GN108924158@sgi.com>
Date:	Wed, 30 Apr 2008 21:01:59 +1000
From:	David Chinner <dgc@....com>
To:	Matthew Wilcox <matthew@....cx>
Cc:	David Chinner <dgc@....com>, linux-kernel@...r.kernel.org,
	Stephen Rothwell <sfr@...b.auug.org.au>
Subject: Re: Announce: Semaphore-Removal tree

On Wed, Apr 30, 2008 at 04:06:13AM -0600, Matthew Wilcox wrote:
> On Tue, Apr 29, 2008 at 10:09:30AM +1000, David Chinner wrote:
> > > 2. l_flushsema
> > > 
> > > This seems to be a completion.  ie you're using it to wait for the log
> > > to be flushed.
> > 
> > Yes, that could probably be a completion. I'm assuming that a completion
> > can handle several thousand waiting processes, right?
> 
> By the way ... is it common that you get several thousand waiting
> processes?  I ask because you wake them all up, then the herd thunders
> into the l_icloglock spinlock.  Or is this a worst-case scenario that
> happens once in a blue moon?

I'm thinking of a certain 2048p machine at NASA where they run
MPI jobs that do synchronised exit() calls with about 6-7 open
file descriptors that all run into ->release at the same time
and try to do EOF truncation transcations all at the same time.

We get the first 100-200 processes filling all the log buffers
and forcing them to disk, and then the rest start waiting on
the flush sema.

Given that we're already pushing 4096p support for the next gen
machines, this problem isn't going to get any better...

> If l_flushsema does typically get more than one waiter, we can instead
> wake the waiters one at a time.

The current code does them one at a time, but in such a way that is
not any better than a thundering herd. wake_up_all() is probably
a more efficient thundering herd as it doesn't require picking up
and dropping a spinlock for every task being woken.

I think that we need to redesign this code to prevent the thundering
herd problem - it's not problem solvable by just changing the wakeup
call. Besides, the evidence points to the fact that a thundering
herd on the flush sema is not the limiting factor in thoughput.
There's still several layers of the onion to peel before
we get to that point.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/