Message-ID: <20150214083555.GA30176@opentech.at>
Date:	Sat, 14 Feb 2015 09:35:55 +0100
From:	Nicholas Mc Guire <der.herr@...r.at>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	Davidlohr Bueso <dave@...olabs.net>, paulmck@...ux.vnet.ibm.com,
	linux-kernel@...r.kernel.org, waiman.long@...com,
	peterz@...radead.org, raghavendra.kt@...ux.vnet.ibm.com
Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9

On Fri, 13 Feb 2015, Oleg Nesterov wrote:

> On 02/13, Nicholas Mc Guire wrote:
> >
> > On Thu, 12 Feb 2015, Oleg Nesterov wrote:
> >
> > > Nicholas, sorry, I sent the patch but forgot to CC you.
> > > See https://lkml.org/lkml/2015/2/12/587
> > >
> > > And please note that "completion" was specially designed to guarantee
> > > that complete() can't play with this memory after wait_for_completion/etc
> > > returns.
> > >
> >
> > hmmm.... I guess that "falling out of context" can happen in a number of cases
> > with completion - any of the timeout/interruptible variants e.g:
> >
> > 	void xxx(void)
> > 	{
> > 		struct completion c;
> >
> > 		init_completion(&c);
> >
> > 		expose_this_completion(&c);
> >
> > 		wait_for_completion_timeout(&c,A_FEW_JIFFIES);
> > 	}
> >
> > and if the other side did not call complete() within A_FEW_JIFFIES then
> > it would result in the same failure - I don't think the API can prevent
> > this type of bug.
> 
> Yes sure, but in this case the user of wait_for_completion_timeout() should
> blame itself, it is simply buggy.
> 
> > It has to be ensured by additional locking
> 
> Yes, but
> 
> > drivers/misc/tifm_7xx1.c:tifm_7xx1_resume() resolve this issue by resetting
> > the completion to NULL and testing for !NULL before calling complete()
> > with appropriate locking protection access.
> 
> I don't understand this code, I can be easily wrong. but at first glance it
> doesn't need completion at all. Exactly because it relies on the additional
> fm->lock. ->finish_me could be "task_struct *", the tifm_7xx1_resume() could
> simply do schedule_timeout(), tifm_7xx1_isr() could do wake_up_process().
> Nevermind, this is off-topic and most probably I misread this code.
>

This is unfortunately true for a few other places as well, so the problem of
going out of scope with the _timeout/_interruptible variants is quite general
and there is no clean solution. You are right that it is the user's code that
is buggy if the struct completion drops out of scope. I was just wondering
whether it would be a reasonable extension of the completion API to add a
cancellation mechanism, so that this is resolved at the API level and callers
no longer need to mess with locks.

Basically, if you call wait_for_completion_timeout() and the timeout expires,
you always need some way of notifying the completing end that it should no
longer call complete()/complete_all().

> > Nevertheless, of course, the proposed change in completion_done() was a bug -
> > many thanks for catching that so quickly !
> 
> OK, perhaps you can ack the fix I sent?
>

The only question I still have is that there seems to be no smp_wmb()
matching the smp_rmb() you are using (at least I could not figure out where
it is).


thx!
hofrat