linux-kernel - Re: [PATCH] acpi_pm: Reduce PMTMR counter read contention

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <8ffea578-5eb6-f479-7bd4-668df84d930f@oracle.com>
Date:   Mon, 18 Feb 2019 11:48:37 +0800
From:   Zhenzhong Duan <zhenzhong.duan@...cle.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     linux-kernel@...r.kernel.org, x86@...nel.org,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        Waiman Long <longman@...hat.com>,
        Srinivas Eeda <srinivas.eeda@...cle.com>, kin.cho@...cle.com
Subject: Re: [PATCH] acpi_pm: Reduce PMTMR counter read contention

On 2019/2/11 5:08, Thomas Gleixner wrote:
> On Sat, 2 Feb 2019, Zhenzhong Duan wrote:
>> On 2019/1/31 22:26, Thomas Gleixner wrote:
> 
>>>>> I'm not against the change per se, but I really want to understand
>>>>> why we need all the complexity for something which should never be
>>>>> used in a real world deployment.
>>>>>
>>>> Hmm, it's a strong word of "never be used". Customers may happen to
>>>> use nohpet(sanity test?) and report bug to us. Sometimes they does
>>>> report a bug that reproduce with their customed config. There may
>>>> also be BIOS setting HPET disabled.
>>>
>>> And because the customer MAY do completely nonsensical things (and there
>>> are a lot more than the HPET) the kernel has to handle all of them?
>>
>> Ok, then. I don't have more suggestion to convince you.
> 
> You give up too fast :)

Ah, because I thought of a simple fix.
> 
> The point is, that we really want proper justifications for changes like
> this. Some 'may, could and more handwaving' simply does not cut it.
> 
> So if you can just describe a realistic scenario, which does not involve
> thoughtless flipping of BIOS options, then this becomes way more palatable.

I indeed don't see a realistic scenario in a product env needing to use 
nohpet. My only justification is now that we have nohpet as kernel 
parameter, we should fix the softlockup in large machines for enterprise 
use.
> 
>> I just think of a simple fix as below. I think it will work for both hpet
>> and pmtmr. We will test it when the env is available.
> 
>> --- a/kernel/time/timekeeping.c
>> +++ b/kernel/time/timekeeping.c
>> @@ -1353,6 +1353,7 @@ static int change_clocksource(void *data)
>>
>>          write_seqcount_end(&tk_core.seq);
>>          raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
>> +       tick_clock_notify();
>>
>>          return 0;
>>   }
>> @@ -1371,7 +1372,6 @@ int timekeeping_notify(struct clocksource *clock)
>>          if (tk->tkr_mono.clock == clock)
>>                  return 0;
>>          stop_machine(change_clocksource, clock, NULL);
>> -       tick_clock_notify();
>>          return tk->tkr_mono.clock == clock ? 0 : -1;
>>   }
> 
> This won't resolve the concurrency issues of HPET or PMTIMER in any
> way.

Just got chance to test and Kin confirmed it fix the softlockup of 
PMTMR(with nohpet) and HPET(without nohpet, revert previous hpet commit) 
at bootup stage.

My understandig is, at bootup stage tick device is firstly initialized 
in periodic mode and then switch to one-shot mode. In periodic mode 
clock event interrupt is triggered every 1ms(HZ=1000), contention in 
HPET or PMTIMER exceeds 1ms and delayed the clock interrupt. Then CPUs 
continue to process interrupt one by one without a break, 
tick_clock_notify() have no chance to be called and we never switch to 
one-shot mode.

In one-shot mode, the contention is still there but next event is always 
set with a future value. We may missed some ticks, but the timer code is 
smart enough to pick up those missed ticks.

By moving tick_clock_notify() in stop_machine, kernel changes to 
one-shot mode early before the contention accumulate and lockup system.

> Instead it breaks the careful orchestrated mechanism of clocksource
> change.

Sorry, I didn't get a idea how it breaks, tick_clock_notify() is a 
simple function setting bitmask in percpu variable. Could you explain a bit?

Thanks
Zhenzhong