linux-kernel - Re: [PATCH] acpi_pm: Reduce PMTMR counter read contention

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <0a0d304a-3612-879a-435e-27336e1b67ce@oracle.com>
Date:   Thu, 28 Feb 2019 11:33:10 +0800
From:   Zhenzhong Duan <zhenzhong.duan@...cle.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     linux-kernel@...r.kernel.org, x86@...nel.org,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        Waiman Long <longman@...hat.com>,
        Srinivas Eeda <srinivas.eeda@...cle.com>, kin.cho@...cle.com,
        linux_lkml_grp@...cle.com
Subject: Re: [PATCH] acpi_pm: Reduce PMTMR counter read contention

On 2019/2/18 11:48, Zhenzhong Duan wrote:
> On 2019/2/11 5:08, Thomas Gleixner wrote:
>> On Sat, 2 Feb 2019, Zhenzhong Duan wrote:
>>> On 2019/1/31 22:26, Thomas Gleixner wrote:
>>
>>>>>> I'm not against the change per se, but I really want to understand
>>>>>> why we need all the complexity for something which should never be
>>>>>> used in a real world deployment.
>>>>>>
>>>>> Hmm, it's a strong word of "never be used". Customers may happen to
>>>>> use nohpet(sanity test?) and report bug to us. Sometimes they does
>>>>> report a bug that reproduce with their customed config. There may
>>>>> also be BIOS setting HPET disabled.
>>>>
>>>> And because the customer MAY do completely nonsensical things (and 
>>>> there
>>>> are a lot more than the HPET) the kernel has to handle all of them?
>>>
>>> Ok, then. I don't have more suggestion to convince you.
>>
>> You give up too fast :)
> 
> Ah, because I thought of a simple fix.
>>
>> The point is, that we really want proper justifications for changes like
>> this. Some 'may, could and more handwaving' simply does not cut it.
>>
>> So if you can just describe a realistic scenario, which does not involve
>> thoughtless flipping of BIOS options, then this becomes way more 
>> palatable.
> 
> I indeed don't see a realistic scenario in a product env needing to use 
> nohpet. My only justification is now that we have nohpet as kernel 
> parameter, we should fix the softlockup in large machines for enterprise 
> use.
>>
>>> I just think of a simple fix as below. I think it will work for both 
>>> hpet
>>> and pmtmr. We will test it when the env is available.
>>
>>> --- a/kernel/time/timekeeping.c
>>> +++ b/kernel/time/timekeeping.c
>>> @@ -1353,6 +1353,7 @@ static int change_clocksource(void *data)
>>>
>>>          write_seqcount_end(&tk_core.seq);
>>>          raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
>>> +       tick_clock_notify();
>>>
>>>          return 0;
>>>   }
>>> @@ -1371,7 +1372,6 @@ int timekeeping_notify(struct clocksource *clock)
>>>          if (tk->tkr_mono.clock == clock)
>>>                  return 0;
>>>          stop_machine(change_clocksource, clock, NULL);
>>> -       tick_clock_notify();
>>>          return tk->tkr_mono.clock == clock ? 0 : -1;
>>>   }
>>
>> This won't resolve the concurrency issues of HPET or PMTIMER in any
>> way.
> 
> Just got chance to test and Kin confirmed it fix the softlockup of 
> PMTMR(with nohpet) and HPET(without nohpet, revert previous hpet commit) 
> at bootup stage.
> 
> My understandig is, at bootup stage tick device is firstly initialized 
> in periodic mode and then switch to one-shot mode. In periodic mode 
> clock event interrupt is triggered every 1ms(HZ=1000), contention in 
> HPET or PMTIMER exceeds 1ms and delayed the clock interrupt. Then CPUs 
> continue to process interrupt one by one without a break, 
> tick_clock_notify() have no chance to be called and we never switch to 
> one-shot mode.
> 
> In one-shot mode, the contention is still there but next event is always 
> set with a future value. We may missed some ticks, but the timer code is 
> smart enough to pick up those missed ticks.
> 
> By moving tick_clock_notify() in stop_machine, kernel changes to 
> one-shot mode early before the contention accumulate and lockup system.
> 
>> Instead it breaks the careful orchestrated mechanism of clocksource
>> change.
> 
> Sorry, I didn't get a idea how it breaks, tick_clock_notify() is a 
> simple function setting bitmask in percpu variable. Could you explain a 
> bit?

Hi Thomas,

May I have your further comments? I think applying a simple patch to fix 
both hpet and pmtmr softlockup is better?

Thanks
Zhenzhong