Message-ID: <0aff3f62-a8a5-4358-ae3f-2ded339aface@zhaoxin.com>
Date: Wed, 5 Jun 2024 14:23:32 +0800
From: Tony W Wang-oc <TonyWWang-oc@...oxin.com>
To: Thomas Gleixner <tglx@...utronix.de>, Linus Torvalds
<torvalds@...ux-foundation.org>
CC: Dave Hansen <dave.hansen@...el.com>, <mingo@...hat.com>, <bp@...en8.de>,
<dave.hansen@...ux.intel.com>, <x86@...nel.org>, <hpa@...or.com>,
<keescook@...omium.org>, <tony.luck@...el.com>, <gpiccoli@...lia.com>,
<mat.jonczyk@...pl>, <rdunlap@...radead.org>,
<alexandre.belloni@...tlin.com>, <mario.limonciello@....com>,
<yaolu@...inos.cn>, <bhelgaas@...gle.com>, <justinstitt@...gle.com>,
<linux-kernel@...r.kernel.org>, <linux-hardening@...r.kernel.org>,
<CobeChen@...oxin.com>, <TimGuo@...oxin.com>, <LeoLiu-oc@...oxin.com>
Subject: Re: [PATCH] x86/hpet: Read HPET directly if panic in progress
On 2024/5/29 15:42, Thomas Gleixner wrote:
>
>
> [This email is from an external sender. Beware of risks.]
>
> Linus!
>
> On Tue, May 28 2024 at 16:22, Linus Torvalds wrote:
>> On Tue, 28 May 2024 at 15:12, Thomas Gleixner <tglx@...utronix.de> wrote:
>> I see the smiley, but yeah, I don't think we really care about it.
>
> Indeed. But the same problem exists on other architectures as
> well. drivers/clocksource alone has 4 examples aside from i8253.
>
>>> 1) Should we provide a panic mode read callback for clocksources which
>>> are affected by this?
>>
>> The current patch under discussion may be ugly, but looks workable.
>> Local ugliness isn't necessarily a show-stopper.
>>
>> So if the HPET is the *only* case which has this situation, I vote for
>> just doing the ugly thing.
>>
>> Now, if *other* cases exist, and can't be worked around in similar
>> ways, then that argues for a more "proper" fix.
>>
>> And no, I don't think i8253 is a strong enough argument. I don't
>> actually believe you can realistically find a machine that doesn't
>> have HPET or the TSC and really falls back on the i8253 any more. And
>> if you *do* find hw like that, is it SMP-capable? And can you find
>> somebody who cares?
>
> Probably not.
>
>>> 2) Is it correct to claim that a MCE which hits user space and ends up in
>>> mce_panic() is still just a regular exception or should we upgrade to
>>> NMI class context when we enter mce_panic() or even go as far to
>>> upgrade to NMI class context for any panic() invocation?
>>

After an MCE has occurred, the MCE handler can still call add_taint()
without panicking, for example when fake_panic is configured.

So the approach in the patch above does not seem to cover the printk
deadlock that add_taint() can cause in the MCE handler when an MCE
occurs in user space.
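
To make the concern concrete, here is a minimal userspace sketch of the
logic only. It is not kernel code: struct model, model_guarded_print()
and model_mce_handler() are hypothetical names invented for this
illustration, and the single lock_held flag stands in for whatever lock
the interrupted context owns. It assumes the guard is keyed on a panic
being in progress, and shows that such a guard never fires on a path
that prints without calling panic().

```c
#include <assert.h>
#include <stdbool.h>

#define PANIC_CPU_INVALID (-1)

/* Hypothetical stand-ins for kernel state; this is a toy model. */
struct model {
	int  panic_cpu;   /* CPU that entered panic(), or PANIC_CPU_INVALID */
	bool lock_held;   /* a lock the interrupted context already owns */
	bool fake_panic;  /* handler reports, but does not really panic */
};

enum outcome { TOOK_LOCK, BYPASSED_LOCK, WOULD_DEADLOCK };

/* A print path guarded by "is a panic in progress?". */
static enum outcome model_guarded_print(const struct model *m)
{
	if (m->panic_cpu != PANIC_CPU_INVALID)
		return BYPASSED_LOCK;	/* guard fires: skip the lock */

	/* No panic in progress: the normal, lock-taking path is used. */
	return m->lock_held ? WOULD_DEADLOCK : TOOK_LOCK;
}

/* A handler that always prints, but only sets the guard on a real panic. */
static enum outcome model_mce_handler(struct model *m, int cpu)
{
	assert(cpu >= 0);

	if (!m->fake_panic)
		m->panic_cpu = cpu;	/* only a real panic sets the guard */

	/* Stands in for the message printed on the add_taint() path. */
	return model_guarded_print(m);
}
```

With fake_panic set and lock_held true, model_mce_handler() returns
WOULD_DEADLOCK, because panic_cpu is never set. With fake_panic clear it
returns BYPASSED_LOCK.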

Sincerely,
TonyWWang-oc

>> I do think that an NMI in user space should be considered mostly just
>> a normal exception. From a kernel perspective, the NMI'ness just
>> doesn't matter.
>
> That's correct. I don't want to change that at all especially not for
> recoverable MCEs.
>
>> That said, I find your suggestion of making 'panic()' just basically
>> act as an NMI context intriguing. And cleaner than the
>> atomic_read(&panic_cpu) thing.
>>
>> Are there any other situations than this odd HPET thing where that
>> would change semantics?
>
> I need to go and stare at this some more.
>
> Thanks,
>
> tglx