linux-kernel - Re: [git pull] x86 fixes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20090126202844.GB8867@elte.hu>
Date:	Mon, 26 Jan 2009 21:28:44 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	torvalds@...ux-foundation.org, linux-kernel@...r.kernel.org,
	tglx@...utronix.de, hpa@...or.com,
	Rusty Russell <rusty@...tcorp.com.au>,
	Mike Travis <travis@....com>
Subject: Re: [git pull] x86 fixes


* Andrew Morton <akpm@...ux-foundation.org> wrote:

> On Mon, 26 Jan 2009 20:59:10 +0100
> Ingo Molnar <mingo@...e.hu> wrote:
> 
> > * Andrew Morton <akpm@...ux-foundation.org> wrote:
> > 
> > > On Mon, 26 Jan 2009 20:20:16 +0100
> > > Ingo Molnar <mingo@...e.hu> wrote:
> > > 
> > > > 
> > > > * Andrew Morton <akpm@...ux-foundation.org> wrote:
> > > > 
> > > > > On Mon, 26 Jan 2009 18:17:23 +0100
> > > > > Ingo Molnar <mingo@...e.hu> wrote:
> > > > > 
> > > > > > Rusty Russell (2):
> > > > > >       ...
> > > > > >       work_on_cpu: Use our own workqueue.
> > > > > 
> > > > > wtf?
> > > > 
> > > > an x86 fix depends on it:
> > > > 
> > > > 7285908: cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
> > > > 
> > > 
> > > Right.  And these are currently under active (albeit rather slow) 
> > > discussion.
> > > 
> > > The changelogs suck, nobody can be assed actually telling us what the 
> > > bug is and the patches just casually toss yet another gaggle of kernel 
> > > threads into there.
> > 
> > One problem is that for example do_dbs_timer() [used both in the 
> > cpufreq_ondemand and cpufreq_conservative cpufreq drivers] gets queued 
> > into the generic kevent workqueue via schedule_work(). But 
> > work_on_cpu() needs to serialize on the worklet - i.e. it needs to do 
> > a flush_work() - and does this with the cpufreq lock held.
> 
> We've discovered many times that doing a workqueue flush with a lock 
> held is a bad idea.  If any callback takes that lock, or takes some 
> other lock which someone else takes while holding the original lock, 
> etc, we deadlock.

This is a bona fide deadlock in the _same_ callpath though - lockdep will 
catch it and it is straightforward to avoid.

The problem we had with work_on_cpu() was different: the "drive-by 
shooting" aspect of the keventd workqueue: it creates 'casual' (and hence 
hard to get right) dependencies between various, seemingly detached 
worklets.

Were were lucky that the tester was both willing to go through a dozen 
bisection steps to pin down the bug and also was kind enough to try 
countless rfc patches to figure it all out.

The only other maintainable way would be to remove work_on_cpu() - but 
it's a useful facility IMO.

> > So we have a 'worklet inversion' bug here - and this got reported as a 
> > hard to debug boot hang on some systems.
> > The root cause is that kevents is not that do_dbs_timer() uses 
> > schedule_work() - the root cause is that kevents workqueue is a too 
> > generic workqueue that is the union of all casual workqueue users in the 
> > kernel. That is fine (and it is its purpose) but it should not be used for 
> > core kernel facilities such as work_on_cpu() - precisely because doing so 
> > would limit that facility's genericity.
> 
> Yes, that's a point.
> 
> > This bug was always there but dormant until work_on_cpu() was used 
> > from a deep enough codepath.
> > 
> > So the solution here is to isolate work_on_cpu()s mechanisms from the 
> > 'misc' workqueue that schedule_work() deals with - this is what 
> > Rusty's patch does.
> 
> But it's still deadlockable, isn't it?  If you do a work_on_cpu() while 
> holding a lock, and the callback function takes that lock, deadlock? If 
> so, things didn't get much better?

Yes, but it's much different: the situation you describe is that if we 
think of work_on_cpu() as a 'remote function call' it can deadlock if a 
called function takes the same lock.

That's not news at all and developers should be careful not to take the 
same lock twice. work_on_cpu() does not change or extend that aspect of 
lock programming at all.

But the previous, fragile version of work_on_cpu() had a much more subtle 
dependency not related to the function that was called but related to 
work_on_cpu() _itself_ running in a context (the keventd workqueue 
threads) that have casual dependencies.

That's much harder to debug, much less intuitive, and also, IMO, a bug in 
the facility.

> > Your observation about there being too many workqueue threads is 
> > correct but this commit is IMO a valid use of the workqueue facility. 
> > This workqueue it only gets created on CONFIG_SMP so there cost is 
> > about ~10K RAM per CPU.
> 
> mm..  This is why I'd like to expend a little more effort trying to 
> avoid having to make this change.

work_on_cpu() clearly cannot be in the kevent workqueue. I can only see 
two other solutions:

 1) turn it into a singlethreaded workqueue. (Probably fine as all users 
    are not performance sensitive.)

 2) remove it altogether and convert all users to atomic call_on_cpu() 
    calls. Probably doable albeit a bit restrictive IMO.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/