Date:	Fri, 30 Aug 2013 17:48:54 -0700
From:	Stephen Boyd <sboyd@...eaurora.org>
To:	John Stultz <john.stultz@...aro.org>,
	Gerlando Falauto <gerlando.falauto@...mile.com>
CC:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Richard Cochran <richardcochran@...il.com>,
	Prarit Bhargava <prarit@...hat.com>,
	"Brunck, Holger" <Holger.Brunck@...mile.com>,
	"Longchamp, Valentin" <Valentin.Longchamp@...mile.com>,
	"Bigler, Stefan" <Stefan.Bigler@...mile.com>
Subject: Re: kernel deadlock

On 08/30/13 16:10, John Stultz wrote:
> On 08/30/2013 04:04 PM, Gerlando Falauto wrote:
>> Hi,
>>
>> sorry, it took me a while to narrow it down...
>>
>> On 08/30/2013 01:45 AM, John Stultz wrote:
>>> On 08/29/2013 01:56 PM, Falauto, Gerlando wrote:
>>>> Hi everyone,
>>>>
>>>> I ran into the deadlock situation reported at the bottom.
>>>> Actually, on my latest 3.10 kernel for some reason I don't get the
>>>> report (the kernel just hangs for some reason), so it took me quite
>>>> some time to track it down.
>>>>
>>>> Once I figured the trigger to the machine hanging was adjtimex(), I
>>>> reverted everything (between 3.9 and 3.10) that was touching
>>>> kernel/time/timekeeping.c and kernel/time/ntp.c, I double
>>>> checked that indeed the problem was not happening anymore, and finally
>>>> started bisecting, landing on the following offending commit.
>>>> THEN, and ONLY THEN, did I get the &%""ç+"% deadlock report.
>>>>
>>>> Do you guys have any ideas what could be wrong and how to fix it?
>>> Thanks for the report!
>>>
>>> What exactly is your process for reproducing the issue?
>> Now (well, now...), it's quite easy.
>>
>> Three ingredients:
>>
>> 1) Kernel 3.10
>>
>> 2) Enable HRTICK
>>
>> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
>> index 99399f8..294e3ca 100644
>> --- a/kernel/sched/features.h
>> +++ b/kernel/sched/features.h
>> @@ -41,7 +41,7 @@ SCHED_FEAT(WAKEUP_PREEMPTION, true)
>>   */
>>  SCHED_FEAT(ARCH_POWER, true)
>>
>> -SCHED_FEAT(HRTICK, false)
>> +SCHED_FEAT(HRTICK, true)
>>  SCHED_FEAT(DOUBLE_TICK, false)
>>  SCHED_FEAT(LB_BIAS, true)
>>
>> 3) Run the following:
>>
>> #include <stdio.h>
>> #include <sys/timex.h>
>>
>> int main(void)
>> {
>>     int i;
>>
>>     for (i = 0 ; ; i++) {
>>         struct timex adj = {};
>>         printf("%d\r", i);
>>         fflush(stdout);
>>         adjtimex(&adj);
>>     }
>>     return 0;
>> }
>>
>> Notice how:
>> 1) The original issue (with a bit more complicated scenario) was seen
>> on ARM and PowerPC platforms
>> 2) Under the above test conditions (on ARM) I *don't* get any deadlock
>> report printed, the machine just hangs
>> 3) The offending commit (below) I had found through a weird (manual)
>> process of reverting and re-reverting (where some commits could have
>> been reverted out of order), so I'm not 100% sure you'd come to the
>> same conclusions.
>>
>> commit 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40
>> Author: John Stultz <john.stultz@...aro.org>
>> Date:   Fri Mar 22 11:37:28 2013 -0700
>>
>>      timekeeping: Hold timekeepering locks in do_adjtimex and hardpps
>>
>> I'm not able to perform any further testing at this very moment, but
>> if needed, I can try bisecting again sometime next week, so to make an
>> even more reliable statement.
>>
>>

Just curious. Do you have this patch from 3.11 applied to your 3.10
kernel tree?

commit 971ee28cbd1ccd87b3164facd9359a534c1d2892
Author: Peter Zijlstra <peterz@...radead.org>
Date:   Fri Jun 28 11:18:53 2013 +0200

    sched: Fix HRTICK
   
    David reported that the HRTICK sched feature was borken; which was enough
    motivation for me to finally fix it ;-)

    We should not allow hrtimer code to do softirq wakeups while holding scheduler
    locks. The hrtimer code only needs this when we accidentally try to program an
    expired time. We don't much care about those anyway since we have the regular
    tick to fall back to.

    Reported-by: David Ahern <dsahern@...il.com>
    Tested-by: David Ahern <dsahern@...il.com>
    Signed-off-by: Peter Zijlstra <peterz@...radead.org>
    Link: http://lkml.kernel.org/r/20130628091853.GE29209@dyad.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar <mingo@...nel.org>
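
If it helps when retesting with that commit applied, a bounded variant of
the reproducer quoted above may be easier to script than the endless loop.
This is only a sketch: probe_adjtimex() is a hypothetical helper name that
does not appear in the original test, and the iteration count is left to
the caller. It leaves modes at 0, so adjtimex() only queries the clock state
and needs no privileges, the same as the original loop.

```c
#include <string.h>
#include <sys/timex.h>

/*
 * Hypothetical helper (not part of the original reproducer):
 * call adjtimex() in query-only mode `iterations` times and
 * return how many of those calls failed.
 */
long probe_adjtimex(long iterations)
{
    long failures = 0;

    for (long i = 0; i < iterations; i++) {
        struct timex adj;

        /* modes == 0: read the current state, change nothing */
        memset(&adj, 0, sizeof(adj));
        if (adjtimex(&adj) == -1)
            failures++;
    }
    return failures;
}
```

Calling probe_adjtimex() from main() with a large count (the number needed
to trigger the hang is not known from this thread) and printing the result
gives a run that either returns or hangs, which should be enough to compare
a tree with and without the HRTICK fix.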

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
