lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 18 Mar 2013 14:23:56 -0400
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Tejun Heo <tj@...nel.org>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	RT <linux-rt-users@...r.kernel.org>,
	Clark Williams <clark@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: workqueue code needing preemption disabled

On Mon, 2013-03-18 at 09:43 -0700, Tejun Heo wrote:
> Hello, Steven.
> 
> On Mon, Mar 18, 2013 at 12:30:43PM -0400, Steven Rostedt wrote:
> > If you happen to know the critical areas that require preemption to be
> > disabled for real, we can encapsulate them with:
> > 
> > 	preempt_disable_rt();
> > 
> > 	preempt_enable_rt();
> > 
> > These are currently only in the -rt patch, but it annotates locations
> > that require preemption to be disabled even when -rt converts spin_locks
> > into mutexes. These obviously can not contain spin_locks() as
> > spin_locks() can block and schedule out.
> 
> Making gcwq locks disable preemption would be much safer / easier, but
> if that's not desirable, anything touching gcwq->idle_list would be a
> good place to start - worker_enter_idle() and worker_leave_idle().
> Hmmm... ignoring CPU hotplug, I think those two might just do it.
> Give it a try?  How reproducible is the problem?
> 

Hmm, the issue is that a "use to be" idle thread got migrated, and is
now being woken up by another worker. What can cause an established
worker to migrate without HOTPLUG being active?

Thomas,

I'm thinking that we should also modify the scheduler for -rt:

        if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
                if (unlikely(signal_pending_state(prev->state, prev))) {
                        prev->state = TASK_RUNNING;
                } else {
                        deactivate_task(rq, prev, DEQUEUE_SLEEP);
                        prev->on_rq = 0;

                        /*
                         * If a worker went to sleep, notify and ask workqueue
                         * whether it wants to wake up a task to maintain
                         * concurrency.
                         */
                        if (prev->flags & PF_WQ_WORKER) {
                                struct task_struct *to_wakeup;

                                to_wakeup = wq_worker_sleeping(prev, cpu);
                                if (to_wakeup)
                                        try_to_wake_up_local(to_wakeup);
                        }
                }
                switch_count = &prev->nvcsw;
        }


The code calls another worker when the previous worker sleeps,
presumably on IO or something. But in -rt, it could be sleeping on the
gcwq->lock itself, and this could cause many more wakeups. Easily where
the task being woken up will try to grab the same lock and sleep again.

Perhaps we should check:

	if (prev->flags & PF_WQ_WORKER && !prev->saved_state)

To keep the worker thread from waking up other workers just because it
blocked on a sleeping spin lock.

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ