Message-ID: <20130318190616.GC3042@htj.dyndns.org>
Date:	Mon, 18 Mar 2013 12:06:16 -0700
From:	Tejun Heo <tj@...nel.org>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	RT <linux-rt-users@...r.kernel.org>,
	Clark Williams <clark@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: workqueue code needing preemption disabled

On Mon, Mar 18, 2013 at 02:57:30PM -0400, Steven Rostedt wrote:
> I like the theory, but it has one flaw. I agree that the update should
> be wrapped in preempt_disable() but since this bug happens on the same
> CPU, the state of the list will be the same from when it was preempted
> to when it bugged. That said:
> 
> static inline int list_empty(const struct list_head *head)
> {
> 	return head->next == head;
> }

Dang... right.  For some reason, I was thinking it was doing
head->next == head->prev.

> That means when the task was preempted, head->next will either be
> pointing to the next element or back to the list head. Which means if we
> get preempted while updating the list, it will either see the head->next
> == head or head->next == the next element.
> 
> first_worker() returns list_first_entry() which returns head->next. I
> can't see how it would see the list_head and have list_empty() return
> false.

Me neither.  Unfortunately, I'm out of ideas at the moment.
Hmm... last year, there was a similar issue, I think it was in AMD
cpufreq, which was caused by work function doing
set_cpus_allowed_ptr(), so the idle worker was on the correct CPU but
the one issuing the local wake up was on the wrong one.  It could be
that there's another such usage in the kernel which doesn't trigger
easily w/o RT.  Preemption doesn't trigger a concurrency-management
wakeup, so as long as such a user doesn't do anything explicitly
blocking and restores affinity before finishing, upstream would be
fine; but in RT, spinlocks become mutexes and can trigger local
wakeups, so...
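
[The failure pattern described above might look like the following -- a
hypothetical, non-compilable sketch of a work function, not the actual
cpufreq code; my_work_fn, some_other_cpu and some_lock are made-up names.]

```c
static void my_work_fn(struct work_struct *work)
{
	/* The worker starts out bound to its pool's CPU. */
	set_cpus_allowed_ptr(current, cpumask_of(some_other_cpu));

	/* Now running on the wrong CPU.  Upstream this is harmless as
	 * long as nothing here blocks and affinity is restored before
	 * returning.  Under RT, though, this spin_lock() is really a
	 * sleeping mutex: acquiring it can block, which makes the
	 * workqueue code issue a concurrency-management wakeup from
	 * the wrong CPU. */
	spin_lock(&some_lock);
	/* ... */
	spin_unlock(&some_lock);
}
```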

Anyways, having a crashdump would go a long way towards identifying
what's going on.  All we need to know are the work function which was
being executed, whether the worker was on the right CPU and which
worker it was trying to wake up.

-- 
tejun
