linux-kernel - Re: [PATCH] Timekeeping: Fix dead lock in update_wall

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <4e5ebad51003161958n6db293f0y6da99483bf96ada0@mail.gmail.com>
Date:	Wed, 17 Mar 2010 10:58:59 +0800
From:	Sonic Zhang <sonic.adi@...il.com>
To:	john stultz <johnstul@...ibm.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] Timekeeping: Fix dead lock in update_wall_time by correct 
	shift convertion.

On Wed, Mar 17, 2010 at 2:18 AM, john stultz <johnstul@...ibm.com> wrote:
> On Tue, 2010-03-16 at 18:33 +0800, sonic zhang wrote:
>> update_wall_time() runs into dead lock after kernel traps into kgdb and exits
>> per user's request some seconds layer. This is root caused to be wrong
>> calculation of maxshift in update_wall_time().
>>
>> The shift in update_wall_time() and logarithmic_accumulation() is
>> clock shift. In order to generate ntp_error and maxshift correctly,
>> shift convertion between clock and ntp should be done properly.
>
> Hmmm. I don't believe this patch is correct, as it redefines the units
> that ntp_error accumulates (which will cause problems in ntp correction)
>

What units does the ntp_error accumulates? NTP tick or clock source
tick? shift varable here is clock source shift.


> Could you provide some more details on how you triggered the issue (what
> hardware, and what clocksource was being used at the time), as well as
> the analysis you did that suggested this solution?
>
If you enabled KGDB over ethernet debugging in kernel hacking, you
will see the dead lock after you connect your gdb to kernel and set a
breakpoint, wait for 30 seconds and continue kernel. The direct cause
is the clock shift calculated after kernel resumes is bigger than the
maxshift calculated based on NTP tick_length. So, the loop "while
(offset >= timekeeper.cycle_interval)" never exists, because offset is
too big to be reduced properly before shift(limited to maxshift)
becomes negative value. The offset value in my test is 1553274374.

The hardware is blackfin and any hardware clock source, gptimer or core counter.

> It may be that we are hitting an overflow in the ntp_error value, which
> could possibly cause some strange clock steering. Not sure about a
> deadlock though.
>
> Can you trigger this issue when you replace:
>
>        offset = logarithmic_accumulation(offset, shift);
> with
>        offset = logarithmic_accumulation(offset, 0);
>
> ?

Yes, this workaround never dead loop because the shift is never
decreased to negative value and need about 3883 loops to finish when
offset is 1553274374.

>
> thanks
> -john
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/