linux-kernel - Re: [PATCH RFC] time: validate watchdog clocksource using second best candidate

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <6fd42b2b-e29a-1fd6-03d1-e9da9192e6c5@yandex-team.ru>
Date:   Wed, 23 Oct 2019 13:18:34 +0300
From:   Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Stephen Boyd <sboyd@...nel.org>,
        John Stultz <john.stultz@...aro.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Andy Lutomirski <luto@...nel.org>,
        Arseny Smalyuk <smalukav@...dex-team.ru>
Subject: Re: [PATCH RFC] time: validate watchdog clocksource using second best
 candidate

On 20/05/2019 18.04, Konstantin Khlebnikov wrote:
> On 18.05.2019 21:26, Thomas Gleixner wrote:
>> On Sat, 18 May 2019, Konstantin Khlebnikov wrote:
>>
>>> On 18.05.2019 18:17, Thomas Gleixner wrote:
>>>> On Wed, 15 May 2019, Konstantin Khlebnikov wrote:
>>>>
>>>>> Timekeeping watchdog verifies doubtful clocksources using more reliable
>>>>> candidates. For x86 it likely verifies 'tsc' using 'hpet'. But 'hpet'
>>>>> is far from perfect too. It's better to have second opinion if possible.
>>>>>
>>>>> We're seeing sudden jumps of hpet counter to 0xffffffff:
>>>>
>>>> On which kind of hardware? A particular type of CPU or random ones?
>>>
>>> In general this is very rare event.
>>>
>>> This exact pattern have been seen ten times or so on several servers with
>>> Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz
>>> (this custom built platform with chipset Intel C610)
>>>
>>> and haven't seen for previous generation
>>> Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>>> (this is another custom built platform)
>>
>> Same chipset? Note the HPET is part of the chipset not of the CPU.
> 
> Almost the same. Intel C600.
> 
>>
>>> Link was in patch: https://lore.kernel.org/patchwork/patch/667413/
>>
>> Hmm. Not really helpful either.
>>
>>>>> This patch uses second reliable clocksource as backup for validation.
>>>>> For x86 this is usually 'acpi_pm'. If watchdog and backup are not consent
>>>>> then other clocksources will not be marked as unstable at this iteration.
>>>>
>>>> The mess you add to the watchdog code is unholy and that's broken as there
>>>> is no guarantee for acpi_pm (or any other secondary watchdog) being
>>>> available.
>>>
>>> ACPI power management timer is a pretty standard x86 hardware.
>>
>> Used to be.
>>
>>> But my patch should work for any platform with any second reliable
>>> clocksource.
>>
>> Which is close to zero if PM timer is not exposed.
>>
>>> If there is no second clocksource my patch does noting:
>>> watchdog_backup stays NULL and backup_consent always true.
>>
>> That still does not justify the extra complexity for a few custom built
>> systems.
> 
>  >
>  > Aside of that this leaves the HPET in a half broken state. HPET is not only
>  > used as a clock event device it's also exposed by HPET device. So no, if we
>  > figure out that HPET is broken on some platforms we have to blacklist and
>  > disable it completely and not just duct tape the place which exposes the
>  > wreckage.
>  >
> 
> If re-reading helps then HPET is fine.
> This is temporary failure, probably bus issue.
> 
> 
> I'll add re-reading with debug logging and try to collect more information this year.

Good news, everyone! We've found conditions when HPET counter returns '-1'.

clocksource: timekeeping watchdog on CPU21: Marking clocksource 'tsc' as unstable because the skew is too large:
clocksource:                       'hpet' wd_now: ffffffff wd_last: 40bc1ee3 mask: ffffffff
clocksource:                       'tsc' cs_now: 919b39935acdaa cs_last: 919b3957c0ec24 mask: ffffffffffffffff
clocksource: Switched to clocksource hpet

This happens when user-space does inappropriate access to directly mapped HPET.
For example dumping whole "vvar" area: memcpy(buf, addr("vvar"), 0x3000).

In our case sequence was triggered by bug in "atop" crashed at "\n" in task comm. =)

In upstream everything is fine. Direct access to HPET was sealed in 4.7 (we seen bug in 4.4)
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1ed95e52d902035e39a715ff3a314a893a96e5b7

Kudos to Arseny Smalyuk for (accidental) reproducing and investigation.

---

#include <stdio.h>
#include <string.h>

char tmp[0x3000];

int main() {
     void* vvar = NULL;
     FILE* fp = fopen("/proc/self/maps", "r");

     char buf[4096];
     while (fgets(buf, 4096, fp)) {
         size_t slen = strlen(buf);
         if (slen > 0 && buf[slen - 1] == '\n') {
             --slen;
         }
         if (slen > 6 && !memcmp(buf + slen - 6, "[vvar]", 6)) {
             sscanf(buf, "%p", &vvar);
             break;
         }
     }

     fclose(fp);

     memcpy(tmp, vvar, 0x3000);
     return 0;
}