linux-kernel - Re: [PATCH] x86/hpet: Read HPET directly if panic in progress

Message-ID: <0aff3f62-a8a5-4358-ae3f-2ded339aface@zhaoxin.com>
Date: Wed, 5 Jun 2024 14:23:32 +0800
From: Tony W Wang-oc <TonyWWang-oc@...oxin.com>
To: Thomas Gleixner <tglx@...utronix.de>, Linus Torvalds
	<torvalds@...ux-foundation.org>
CC: Dave Hansen <dave.hansen@...el.com>, <mingo@...hat.com>, <bp@...en8.de>,
	<dave.hansen@...ux.intel.com>, <x86@...nel.org>, <hpa@...or.com>,
	<keescook@...omium.org>, <tony.luck@...el.com>, <gpiccoli@...lia.com>,
	<mat.jonczyk@...pl>, <rdunlap@...radead.org>,
	<alexandre.belloni@...tlin.com>, <mario.limonciello@....com>,
	<yaolu@...inos.cn>, <bhelgaas@...gle.com>, <justinstitt@...gle.com>,
	<linux-kernel@...r.kernel.org>, <linux-hardening@...r.kernel.org>,
	<CobeChen@...oxin.com>, <TimGuo@...oxin.com>, <LeoLiu-oc@...oxin.com>
Subject: Re: [PATCH] x86/hpet: Read HPET directly if panic in progress



On 2024/5/29 15:42, Thomas Gleixner wrote:
> 
> 
> [This email is from an external sender. Beware of risks.]
> 
> Linus!
> 
> On Tue, May 28 2024 at 16:22, Linus Torvalds wrote:
>> On Tue, 28 May 2024 at 15:12, Thomas Gleixner <tglx@...utronix.de> wrote:
>> I see the smiley, but yeah, I don't think we really care about it.
> 
> Indeed. But the same problem exists on other architectures as
> well. drivers/clocksource alone has 4 examples aside from i8253.
> 
>>>    1) Should we provide a panic mode read callback for clocksources which
>>>       are affected by this?
>>
>> The current patch under discussion may be ugly, but looks workable.
>> Local ugliness isn't necessarily a show-stopper.
>>
>> So if the HPET is the *only* case which has this situation, I vote for
>> just doing the ugly thing.
>>
>> Now, if *other* cases exist, and can't be worked around in similar
>> ways, then that argues for a more "proper" fix.
>>
>> And no, I don't think i8253 is a strong enough argument. I don't
>> actually believe you can realistically find a machine that doesn't
>> have HPET or the TSC and really falls back on the i8253 any more. And
>> if you *do* find hw like that, is it SMP-capable? And can you find
>> somebody who cares?
> 
> Probably not.
> 
>>>    2) Is it correct to claim that a MCE which hits user space and ends up in
>>>       mce_panic() is still just a regular exception or should we upgrade to
>>>       NMI class context when we enter mce_panic() or even go as far as to
>>>       upgrade to NMI class context for any panic() invocation?
>>

After an MCE has occurred, the MCE handler can call add_taint() without
panicking, for example when fake_panic is configured.

So the approach in the patch above does not appear to cover the printk
deadlock triggered by add_taint() in the MCE handler when an MCE occurs
in user space.
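
To illustrate the gap, here is a minimal user-space sketch in C. It is
not kernel code: every model_* name is hypothetical, the lock is a plain
atomic flag, and the assumed chain add_taint() -> printk -> timestamp
read is taken from the description above. The ordering of the panic
flag and the taint call is also an assumption of the model. It only
shows that a lock bypass keyed on "panic in progress" is never taken
when the handler reaches add_taint() with fake_panic set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_PANIC_CPU_INVALID (-1)

/* Hypothetical user-space stand-ins; none of these are real kernel symbols. */
static atomic_int  model_panic_cpu = MODEL_PANIC_CPU_INVALID;
static atomic_flag model_hpet_lock = ATOMIC_FLAG_INIT;

enum model_read_path {
    MODEL_READ_DIRECT,      /* lock bypassed because a panic is in progress */
    MODEL_READ_LOCKED,      /* lock was free, normal path */
    MODEL_READ_WOULD_SPIN,  /* lock held by a stalled context: deadlock */
};

/* Timestamp read whose lock bypass is keyed only on "panic in progress". */
static enum model_read_path model_read_hpet(void)
{
    if (atomic_load(&model_panic_cpu) != MODEL_PANIC_CPU_INVALID)
        return MODEL_READ_DIRECT;
    if (atomic_flag_test_and_set(&model_hpet_lock))
        return MODEL_READ_WOULD_SPIN; /* a real spin loop would never exit */
    atomic_flag_clear(&model_hpet_lock);
    return MODEL_READ_LOCKED;
}

/* Assumed call chain from the message: add_taint() -> printk -> timestamp. */
static enum model_read_path model_add_taint(void)
{
    return model_read_hpet();
}

/*
 * MCE handler arriving while another context holds the lock and cannot run.
 * With fake_panic set, no panic is flagged before the taint message prints.
 */
static enum model_read_path model_mce_while_lock_held(int cpu, bool fake_panic)
{
    enum model_read_path path;
    bool was_held = atomic_flag_test_and_set(&model_hpet_lock);

    assert(!was_held);  /* the model starts from an unlocked state */
    (void)was_held;     /* silence unused warning when NDEBUG is defined */

    if (!fake_panic)
        atomic_store(&model_panic_cpu, cpu);
    path = model_add_taint();

    /* Reset the model so it can be reused. */
    atomic_store(&model_panic_cpu, MODEL_PANIC_CPU_INVALID);
    atomic_flag_clear(&model_hpet_lock);
    return path;
}
```

In this model, model_mce_while_lock_held(0, false) takes the direct
path, while model_mce_while_lock_held(0, true) ends up on the path that
would spin on the lock, which corresponds to the uncovered case.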

Sincerely
TonyWWang-oc

>> I do think that an NMI in user space should be considered mostly just
>> a normal exception. From a kernel perspective, the NMI'ness just
>> doesn't matter.
> 
> That's correct. I don't want to change that at all especially not for
> recoverable MCEs.
> 
>> That said, I find your suggestion of making 'panic()' just basically
>> act as an NMI context intriguing. And cleaner than the
>> atomic_read(&panic_cpu) thing.
>>
>> Are there any other situations than this odd HPET thing where that
>> would change semantics?
> 
> I need to go and stare at this some more.
> 
> Thanks,
> 
>          tglx