linux-kernel - Re: crash in 3.12.51 (likely in 3.12.52 as well) in timer code

Message-ID: <1454588264.3407.142.camel@gmail.com>
Date:	Thu, 04 Feb 2016 13:17:44 +0100
From:	Mike Galbraith <umgwanakikbuti@...il.com>
To:	Nikolay Borisov <kernel@...p.com>,
	"Linux-Kernel@...r. Kernel. Org" <linux-kernel@...r.kernel.org>
Cc:	Jiri Slaby <jslaby@...e.cz>, Oleg Nesterov <oleg@...hat.com>,
	tglx@...utronix.de,
	SiteGround Operations <operations@...eground.com>
Subject: Re: crash in 3.12.51 (likely in 3.12.52 as well) in timer code

On Thu, 2016-02-04 at 13:51 +0200, Nikolay Borisov wrote:
> 
> On 02/04/2016 01:32 PM, Mike Galbraith wrote:
> > On Wed, 2016-02-03 at 12:58 +0200, Nikolay Borisov wrote:
> > > 
> > > So in this case the prev/next entries do not look corrupted, whereas
> > > when manipulating the list inside detach_timer they do. This is really
> > > odd. Any ideas how to debug this further?
> > 
> > Suspiciously similar to https://lkml.org/lkml/2016/2/4/247
> 
> Right, I've been cursorily following this thread, but I was left with the
> impression that this only occurs on machines where a CPU can go offline.
> The server on which this happened should never offline any of its CPUs,
> since power management is disabled (though I will have to double-check
> this).

AFAIU, hotplug isn't required, only mod_delayed_work() being called
from a different CPU than where the timer was born, migrating it at a
bad time.

> On a different note - is there a way to safely reproduce this so I can
> test the suggested fix by Thomas?

Hm, write a module to beat mod_delayed_work() to pulp with a NR_CPUS
horde, and run it in a vm where you don't care about shrapnel?
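
Something along these lines, as an untested sketch only. It starts one
kthread per online CPU, each pinned to its CPU, and all of them call
mod_delayed_work() on the same delayed_work as fast as they can. The names
(victim, hammer_fn, mdw_hammer) are made up for illustration, the one-jiffy
delay is an arbitrary guess, and there is no guarantee this actually hits
the race. get_online_cpus()/put_online_cpus() match 3.12-era kernels; much
newer trees renamed them to cpus_read_lock()/cpus_read_unlock().

```c
/* mdw_hammer.c: hypothetical stress module, for throwaway VMs only. */
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kthread.h>
#include <linux/workqueue.h>
#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/sched.h>
#include <linux/err.h>

static struct delayed_work victim;		/* the one timer everybody fights over */
static struct task_struct *hammer_task[NR_CPUS];

static void victim_fn(struct work_struct *work)
{
	/* Nothing to do; only the timer/queue handling is of interest. */
}

static int hammer_fn(void *unused)
{
	while (!kthread_should_stop()) {
		/* Re-arm the shared timer from whatever CPU we are bound to. */
		mod_delayed_work(system_wq, &victim, 1);
		cond_resched();
	}
	return 0;
}

static void stop_hammers(void)
{
	int cpu;

	for_each_possible_cpu(cpu) {
		if (hammer_task[cpu]) {
			kthread_stop(hammer_task[cpu]);
			hammer_task[cpu] = NULL;
		}
	}
}

static int __init mdw_hammer_init(void)
{
	struct task_struct *t;
	int cpu, err = 0;

	INIT_DELAYED_WORK(&victim, victim_fn);

	get_online_cpus();	/* keep the online mask stable while binding */
	for_each_online_cpu(cpu) {
		t = kthread_create(hammer_fn, NULL, "mdw_hammer/%d", cpu);
		if (IS_ERR(t)) {
			err = PTR_ERR(t);
			break;
		}
		kthread_bind(t, cpu);
		hammer_task[cpu] = t;
		wake_up_process(t);
	}
	put_online_cpus();

	if (err) {
		stop_hammers();
		cancel_delayed_work_sync(&victim);
	}
	return err;
}

static void __exit mdw_hammer_exit(void)
{
	stop_hammers();
	cancel_delayed_work_sync(&victim);
}

module_init(mdw_hammer_init);
module_exit(mdw_hammer_exit);
MODULE_LICENSE("GPL");	/* mod_delayed_work_on() is a GPL-only export */
```

Build it out of tree with the usual one-line obj-m Makefile, insmod it in
the vm, let it run, and rmmod it if the box survives.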

	-Mike