lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 26 Apr 2007 14:59:18 +0200
From:	Jarek Poplawski <jarkao2@...pl>
To:	Oleg Nesterov <oleg@...sign.ru>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org,
	David Howells <dhowells@...hat.com>
Subject: Re: Fw: [PATCH -mm] workqueue: debug possible endless loop in cancel_rearming_delayed_work

On Wed, Apr 25, 2007 at 06:47:59PM +0400, Oleg Nesterov wrote:
> On 04/25, Oleg Nesterov wrote:
> >
> > On 04/25, Jarek Poplawski wrote:
> > >
> > >                   Probably this is also possible without timer i.e.
> > >  with queue_work.
> > 
> > Yes, thanks. While adding cpu-hotplug check I forgot to add ->current_work
> > check, which is needed to actually implement this
> > 
> > 	> > Note that cancel_rearming_delayed_work() now can handle the works
> > 	> > which re-arm itself via queue_work(), not only queue_delayed_work().
> > 
> > part. I'll resend after fix.
> 
> Hm. But can't we do better? Looks like we don't need to check ->current_work,
> 
> 	void cancel_rearming_delayed_work(struct delayed_work *dwork)
> 	{
> 		struct work_struct *work = &dwork->work;
> 		struct cpu_workqueue_struct *cwq = get_wq_data(work);
> 		int done;

I don't understand, why you think cwq cannot be NULL here.

> 
> 		do {
> 			done = 1;
> 			spin_lock_irq(&cwq->lock);
> 
> 			if (!list_empty(&work->entry))
> 				list_del_init(&work->entry);

BTW, isn't needs_a_good_name needles after this and after del_timer positive?

> 			else if (test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work)))
> 				done = del_timer(&dwork->timer)

If this runs while a work function is fired in run_workqueue,
it sets _PENDING bit, but if the work skips rearming, we have probably
endless loop, again.

> 
> 			spin_unlock_irq(&cwq->lock);
> 		} while (!done);
> 
> 		/*
> 		 * Nobody can clear WORK_STRUCT_PENDING. This means that the
> 		 * work can't be re-queued and the timer can't be re-started.
> 		 */
> 		needs_a_good_name(cwq->wq, work);
> 		work_clear_pending(work);
> 	}
> 
> Jarek, I didn't think much about this, just a new idea. I am posting this code
> in a hope you can review it while I sleep on this... CPU-hotplug is ignored for
> now. Note that this version doesn't need the change in run_workqueue().

It's very interesting proposal, but also hard to analyse - the locks
are taken and given away, and there is hard to forsee, when and where
the loop regains the lock again. It is something alike to the current
way, with some added measures: you try to shoot a work on the run,
while queued or timer_pending, plus the _PENDING flag set, so it seems,
there is some risk of longer than planed looping.

I have to look at this more, at home and, if something new, I'll write
tomorrow. So, the good news, is you should have enough sleep this time!

Cheers,
Jarek P.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists