linux-kernel - Re: [bisected] pre-3.16 regression on open() scalability


Message-ID: <20140618010423.GW4669@linux.vnet.ibm.com>
Date:	Tue, 17 Jun 2014 18:04:23 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Andi Kleen <ak@...ux.intel.com>
Cc:	Dave Hansen <dave.hansen@...el.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Josh Triplett <josh@...htriplett.org>,
	"Chen, Tim C" <tim.c.chen@...el.com>,
	Christoph Lameter <cl@...ux.com>, peterz@...radead.org
Subject: Re: [bisected] pre-3.16 regression on open() scalability

On Tue, Jun 17, 2014 at 05:15:17PM -0700, Andi Kleen wrote:
> > It also ends up eating a new cacheline in a bunch of pretty hot paths.
> > It would be nice to be able to keep the fast path part of this as at
> > least read-only.
> > 
> > Could we do something (functionally) like the attached patch?  Instead
> > of counting cond_resched() calls, we could just specify some future time
> > by which we want to have a quiescent state.  We could even push the time to
> > be something _just_ before we would have declared a stall.
> 
> I still think it's totally the wrong place. cond_resched() is in so
> many fast paths (every lock, every allocation). It just doesn't
> make sense to add non essential things like this to it.
> 
> I would rather just revert the original patch.

OK.  What would you suggest instead?  If all we do is to revert the
original patch, we once again end up with long-running in-kernel code
paths stalling the RCU grace period.  The cond_resched() calls sprinkled
through them once again won't help with this.

Or are you suggesting leveraging the now-deprecated set_need_resched()
so that the checks happen deeper in the scheduler?  Looks like grabbing
the offending CPU's task and doing set_tsk_need_resched() on that task
is the replacement.

CCing Peter Zijlstra for his thoughts on this.

							Thanx, Paul
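
For readers following the thread, the deadline idea quoted above (pick a
future time by which a quiescent state is wanted, instead of counting
cond_resched() calls) can be modeled roughly as follows. This is only a
userspace sketch under stated assumptions: struct qs_state,
model_cond_resched(), and model_arm() are hypothetical names invented for
illustration, not kernel APIs, and the attached patch itself is not
reproduced here. Time is a plain integer supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU state; not a kernel structure. */
struct qs_state {
	uint64_t deadline;         /* time by which a quiescent state is wanted */
	unsigned long qs_reported; /* number of quiescent states reported */
};

/*
 * Model of the check made from a cond_resched()-like call.
 * Before the deadline, the state is only read, never written.
 * Returns true if a quiescent state was reported on this call.
 */
static bool model_cond_resched(struct qs_state *s, uint64_t now)
{
	if (now < s->deadline)
		return false;          /* read-only fast path */

	s->qs_reported++;          /* slow path: report a quiescent state */
	s->deadline = UINT64_MAX;  /* disarm until the next model_arm() */
	return true;
}

/*
 * Arm the deadline when a grace period starts, placing it `margin`
 * time units before a stall would have been declared.
 */
static void model_arm(struct qs_state *s, uint64_t now,
		      uint64_t stall_timeout, uint64_t margin)
{
	assert(margin <= stall_timeout);
	s->deadline = now + stall_timeout - margin;
}
```

In this model the hot path performs one load and one compare and writes
nothing until the deadline passes, which is the read-only property asked
for in the quoted text. It does not address the objection that
cond_resched() is the wrong place for the check at all.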
