lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Thu, 29 May 2008 12:13:54 +0200 From: Jens Axboe <jens.axboe@...cle.com> To: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com> Cc: Alexey Dobriyan <adobriyan@...il.com>, torvalds@...l.org, Andrew Morton <akpm@...ux-foundation.org>, linux-kernel@...r.kernel.org Subject: Re: 2.6.26-rc4: RIP __call_for_each_cic+0x20/0x50 On Thu, May 29 2008, Paul E. McKenney wrote: > On Thu, May 29, 2008 at 08:42:02AM +0200, Jens Axboe wrote: > > On Thu, May 29 2008, Jens Axboe wrote: > > > > But one additional question... > > > > > > > > static void cfq_cic_free_rcu(struct rcu_head *head) > > > > { > > > > struct cfq_io_context *cic; > > > > > > > > cic = container_of(head, struct cfq_io_context, rcu_head); > > > > > > > > kmem_cache_free(cfq_ioc_pool, cic); > > > > elv_ioc_count_dec(ioc_count); > > > > > > > > if (ioc_gone && !elv_ioc_count_read(ioc_count)) > > > > complete(ioc_gone); > > > > } > > > > > > > > Suppose that a pair of tasks both execute the elv_ioc_count_dec() > > > > at the same time, so that all counters are now zero. Both then > > > > find that there is still an ioc_gone, and that the count is > > > > now zero. One of the tasks invokes complete(ioc_gone). This > > > > awakens the corresponding cfq_exit(), which now returns, getting > > > > rid of its stack frame -- and corrupting the all_gone auto variable > > > > that ioc_gone references. > > > > > > > > Now the second task gets a big surprise when it tries to invoke > > > > complete(ioc_gone). > > > > > > > > Or is there something else that I am missing here? > > > > > > No, I think that's a problem spot as well. To my knowledge, nobody has > > > ever hit that. The anticipatory scheduler has the same code. > > > > > > What we want to avoid here is making cfq_cic_free_rcu() a lot more > > > expensive, which is why the elv_ioc_count_read() is behind that > > > ioc_gone check. I'll need to think a bit on how to handle that > > > better :-) > > > > So how about this? Add a spinlock for checking and clearing ioc_gone > > back to NULL. It doesn't matter if we make the ioc_gone != NULL > > case a little more expensive, as it will only happen on cfq-iosched > > module unload. And it seems the clearest way of making this safe. > > The last hunk should really not be necessary, as ioc_gone wont be > > set back to NULL before wait_for_completion() is entered. > > Looks better! I do have one scenario that seems troublesome, but > it should be easy to fix, see below. (Assuming it really is a > problem, that is...) > > Thanx, Paul > > > An identical patch is needed in AS as well. > > > > diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c > > index d01b411..32aa367 100644 > > --- a/block/cfq-iosched.c > > +++ b/block/cfq-iosched.c > > @@ -48,6 +48,7 @@ static struct kmem_cache *cfq_ioc_pool; > > > > static DEFINE_PER_CPU(unsigned long, ioc_count); > > static struct completion *ioc_gone; > > +static DEFINE_SPINLOCK(ioc_gone_lock); > > > > #define CFQ_PRIO_LISTS IOPRIO_BE_NR > > #define cfq_class_idle(cfqq) ((cfqq)->ioprio_class == IOPRIO_CLASS_IDLE) > > @@ -1177,8 +1178,19 @@ static void cfq_cic_free_rcu(struct rcu_head *head) > > kmem_cache_free(cfq_ioc_pool, cic); > > elv_ioc_count_dec(ioc_count); > > > > - if (ioc_gone && !elv_ioc_count_read(ioc_count)) > > - complete(ioc_gone); > > + if (ioc_gone) { > > + /* > > + * CFQ scheduler is exiting, grab exit lock and check > > + * the pending io context count. If it hits zero, > > + * complete ioc_gone and set it back to NULL > > + */ > > Suppose that at this point some other CPU does the last complete(). > They have set ioc_gone to NULL, so everything is fine. But suppose > that in the meantime, some other CPU sets up a cfq and then starts > tearing it down. Then ioc_gone would be non-NULL, and we would cause > this new teardown to end prematurely. > > If this is a real problem, one way to get around it is to have a > generation number. We capture this before doing the elv_ioc_count_dec() > (alas, with a memory barrier between the capture and the elv_ioc_count_dec()), > and then check it under the lock. If it has changed, we know someone else > has already done the awakening for us. Increment the generation number > in the same place that ioc_gone is set to NULL. > > Seem reasonable? This isn't a problem, since cfq_exit() cannot be called before all block queues in the system have been detached from CFQ. cfq_exit() calls elv_unregister() before setting ioc_gone, so when elv_unregister() has returned, CFQ is in its own little world. Do we need an smp_wmb() between elv_unregister() and the ioc_gone assignment to ensure this ordering as well? IIRC, the spin_lock and spin_unlock in elv_unregister() isn't enough to guarentee this. We are really down to splitting hairs now, but better safe than sorry :-) -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists