[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <6fd42b2b-e29a-1fd6-03d1-e9da9192e6c5@yandex-team.ru>
Date: Wed, 23 Oct 2019 13:18:34 +0300
From: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Stephen Boyd <sboyd@...nel.org>,
John Stultz <john.stultz@...aro.org>,
LKML <linux-kernel@...r.kernel.org>,
Andy Lutomirski <luto@...nel.org>,
Arseny Smalyuk <smalukav@...dex-team.ru>
Subject: Re: [PATCH RFC] time: validate watchdog clocksource using second best
candidate
On 20/05/2019 18.04, Konstantin Khlebnikov wrote:
> On 18.05.2019 21:26, Thomas Gleixner wrote:
>> On Sat, 18 May 2019, Konstantin Khlebnikov wrote:
>>
>>> On 18.05.2019 18:17, Thomas Gleixner wrote:
>>>> On Wed, 15 May 2019, Konstantin Khlebnikov wrote:
>>>>
>>>>> Timekeeping watchdog verifies doubtful clocksources using more reliable
>>>>> candidates. For x86 it likely verifies 'tsc' using 'hpet'. But 'hpet'
>>>>> is far from perfect too. It's better to have second opinion if possible.
>>>>>
>>>>> We're seeing sudden jumps of hpet counter to 0xffffffff:
>>>>
>>>> On which kind of hardware? A particular type of CPU or random ones?
>>>
>>> In general this is very rare event.
>>>
>>> This exact pattern have been seen ten times or so on several servers with
>>> Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz
>>> (this custom built platform with chipset Intel C610)
>>>
>>> and haven't seen for previous generation
>>> Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>>> (this is another custom built platform)
>>
>> Same chipset? Note the HPET is part of the chipset not of the CPU.
>
> Almost the same. Intel C600.
>
>>
>>> Link was in patch: https://lore.kernel.org/patchwork/patch/667413/
>>
>> Hmm. Not really helpful either.
>>
>>>>> This patch uses second reliable clocksource as backup for validation.
>>>>> For x86 this is usually 'acpi_pm'. If watchdog and backup are not consent
>>>>> then other clocksources will not be marked as unstable at this iteration.
>>>>
>>>> The mess you add to the watchdog code is unholy and that's broken as there
>>>> is no guarantee for acpi_pm (or any other secondary watchdog) being
>>>> available.
>>>
>>> ACPI power management timer is a pretty standard x86 hardware.
>>
>> Used to be.
>>
>>> But my patch should work for any platform with any second reliable
>>> clocksource.
>>
>> Which is close to zero if PM timer is not exposed.
>>
>>> If there is no second clocksource my patch does noting:
>>> watchdog_backup stays NULL and backup_consent always true.
>>
>> That still does not justify the extra complexity for a few custom built
>> systems.
>
> >
> > Aside of that this leaves the HPET in a half broken state. HPET is not only
> > used as a clock event device it's also exposed by HPET device. So no, if we
> > figure out that HPET is broken on some platforms we have to blacklist and
> > disable it completely and not just duct tape the place which exposes the
> > wreckage.
> >
>
> If re-reading helps then HPET is fine.
> This is temporary failure, probably bus issue.
>
>
> I'll add re-reading with debug logging and try to collect more information this year.
Good news, everyone! We've found conditions when HPET counter returns '-1'.
clocksource: timekeeping watchdog on CPU21: Marking clocksource 'tsc' as unstable because the skew is too large:
clocksource: 'hpet' wd_now: ffffffff wd_last: 40bc1ee3 mask: ffffffff
clocksource: 'tsc' cs_now: 919b39935acdaa cs_last: 919b3957c0ec24 mask: ffffffffffffffff
clocksource: Switched to clocksource hpet
This happens when user-space does inappropriate access to directly mapped HPET.
For example dumping whole "vvar" area: memcpy(buf, addr("vvar"), 0x3000).
In our case sequence was triggered by bug in "atop" crashed at "\n" in task comm. =)
In upstream everything is fine. Direct access to HPET was sealed in 4.7 (we seen bug in 4.4)
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1ed95e52d902035e39a715ff3a314a893a96e5b7
Kudos to Arseny Smalyuk for (accidental) reproducing and investigation.
---
#include <stdio.h>
#include <string.h>
char tmp[0x3000];
int main() {
void* vvar = NULL;
FILE* fp = fopen("/proc/self/maps", "r");
char buf[4096];
while (fgets(buf, 4096, fp)) {
size_t slen = strlen(buf);
if (slen > 0 && buf[slen - 1] == '\n') {
--slen;
}
if (slen > 6 && !memcmp(buf + slen - 6, "[vvar]", 6)) {
sscanf(buf, "%p", &vvar);
break;
}
}
fclose(fp);
memcpy(tmp, vvar, 0x3000);
return 0;
}
Powered by blists - more mailing lists