linux-kernel - Re: [PATCH] Fix a complex race in hrtimer code.


Message-ID: <AANLkTimd=Zu85hdPLzZ8dpGJKpx=0SWJTrniMyRVstdz@mail.gmail.com>
Date:	Tue, 12 Oct 2010 10:38:45 -0700
From:	Salman Qazi <sqazi@...gle.com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] Fix a complex race in hrtimer code.

On Tue, Oct 12, 2010 at 9:54 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
> On Tue, 12 Oct 2010, Salman Qazi wrote:
>> On Tue, Oct 12, 2010 at 1:49 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
>> > On Mon, 11 Oct 2010, Salman Qazi wrote:
>> >> /* There are other issues, like deadlocks between multiple hrtimer_start observed
>> >>  * calls, at least in 2.6.34 that this lock works around.  Will look into
>> >>  * those later.
>> >
>> > Well, we don't have to work around callsites not serializing themself
>> > in the core code, right ?
>>
>> I assumed that the semantics were that hrtimer_starts are serialized
>> with respect to each other and with respect to cancels.   You seem to
>> disagree.
>
> Yes, I disagree. The code makes sure that cancel/start does not
> conflict with a running callback, but it's not responsible for random
> code fiddling with the same timer, really.

Sounds reasonable.

>
> The outcome of random start/cancel operations on two cpus of the same
> timer is just unpredictible, so where is the point of caring about
> that in the core code ?
>
>> In any case, I have to rerun that test without this lock with the
>> patch present.  It's possible that it was a symptom of the same bug
>> that we just didn't observe in production.
>
> Which bug did you observe in production and what's the code which is
> triggering this?

The initial observation in production was a lockup (noticed through the
NMI watchdog) in rb_insert_color when running certain networking
workloads that use the net cgroup subsystem.  Further down in the stack
trace, we noticed that the hrtimer for qdisc_watchdog (see:
net/sched/sch_api.c) was the one being started when this happened.  I
then remembered that I had already fixed something similar in our tree
for 2.6.26 and had basically forgotten about that patch.  So I went
back and dug out the patch, and it fixed the problem.  Looking back,
the symptom was almost identical in 2.6.26, except that it would crash
in rb_erase instead of locking up in rb_insert_color.

I wrote the above test to mimic the behaviour of the qdisc_watchdog
timer (except taken to extremes): a long-running callback function
that returns with NORESTART.  It reproduces the symptom reliably.
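
For illustration only, the shape of that test can be modelled in
userspace.  This is a rough sketch, not the kernel code and not the test
module itself.  The names model_timer, model_remove, model_start and
run_scenario are hypothetical.  The state bits are modelled on the
kernel's HRTIMER_STATE_* flags.  The sketch also assumes that the trouble
comes from a second start clearing the "callback running" bit while the
long callback is still in progress, which would let the timer move to
another CPU's base; that mechanism is not spelled out in this message.

```c
/* Userspace model only; none of this is kernel code. */

/* State bits modelled on the kernel's HRTIMER_STATE_* flags. */
enum {
	ST_INACTIVE = 0x0,
	ST_ENQUEUED = 0x1,
	ST_CALLBACK = 0x2
};

struct model_timer {
	unsigned int state;
	int base_cpu;		/* CPU whose timer base currently owns the timer */
};

/*
 * Hypothetical remove step.  If keep_callback_bit is 0, dequeuing
 * resets the whole state and so forgets a running callback; if it is 1,
 * only the ENQUEUED bit is dropped.
 */
static void model_remove(struct model_timer *t, int keep_callback_bit)
{
	if (!(t->state & ST_ENQUEUED))
		return;		/* not queued: nothing to do */
	t->state = keep_callback_bit ? (t->state & ST_CALLBACK) : ST_INACTIVE;
}

/*
 * Hypothetical start step: dequeue, switch to the caller's base only
 * when no callback appears to be running, then enqueue.
 */
static void model_start(struct model_timer *t, int cpu, int keep_callback_bit)
{
	model_remove(t, keep_callback_bit);
	if (!(t->state & ST_CALLBACK))
		t->base_cpu = cpu;
	t->state |= ST_ENQUEUED;
}

/*
 * The timer expires on CPU 0 and its long callback begins; CPU 1 then
 * starts the timer twice before the callback returns.  Returns the base
 * the timer ends up on.
 */
static int run_scenario(int keep_callback_bit)
{
	struct model_timer t = { ST_ENQUEUED, 0 };

	t.state = ST_CALLBACK;	/* dequeued for expiry, callback running */
	model_start(&t, 1, keep_callback_bit);
	model_start(&t, 1, keep_callback_bit);
	return t.base_cpu;
}
```

In this model run_scenario(0) returns 1: the timer has moved to CPU 1's
base while its callback is still running on CPU 0.  run_scenario(1)
returns 0, because the callback bit survives the second start and the
timer stays on its original base.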

>
> Thanks,
>
>        tglx