linux-kernel - Re: Unknown NMI after S4 resume

Message-ID: <ZqIMWDWn5W9+9RMA@duo.ucw.cz>
Date: Thu, 25 Jul 2024 10:27:04 +0200
From: Pavel Machek <pavel@....cz>
To: Tony W Wang-oc <TonyWWang-oc@...oxin.com>
Cc: rafael@...nel.org, len.brown@...el.com, linux-pm@...r.kernel.org,
	akpm@...ux-foundation.org, dianders@...omium.org,
	tglx@...utronix.de, song@...nel.org, liusong@...ux.alibaba.com,
	yaoma@...ux.alibaba.com, kjlx@...pleofstupid.com,
	lizhe.67@...edance.com, linux@...ssschuh.net,
	j.granados@...sung.com, linux-kernel@...r.kernel.org,
	"CobeChen@...oxin.com" <CobeChen@...oxin.com>,
	"TimGuo@...oxin.com" <TimGuo@...oxin.com>, SilviaZhao@...oxin.com,
	"Linda Chai(BJ-RD)" <LindaChai@...oxin.com>, Felixzhang@...oxin.com
Subject: Re: Unknown NMI after S4 resume

Hi!

> When running an S4 test on a Zhaoxin platform with Ubuntu 22.04 and kernel
> 6.10, we got unknown NMI messages after S4 resume:
> 
> [  115.792224] Uhhuh. NMI received for unknown reason 2d on CPU 0.
> [  115.792226] Do you have a strange power saving mode enabled?
> [  115.792228] Dazed and confused, but trying to continue
> 
> This also reproduces on an Intel platform.
> 
> After tracing, we found that the unknown NMI occurs as follows:
> a, The 1st kernel starts normally and the NMI watchdog is enabled on all
> cores;
> b, The NMI watchdog is disabled on all cores through the sys interface, so
> the variable active_events goes to zero;
> c, Hibernation starts: the hibernation image is created and saved, then the
> system hibernates;
> d, An S4 resume event happens; the 2nd kernel starts normally and the NMI
> watchdog is enabled on all cores;
> e, The 2nd kernel finds the S4 image and tries to restore it;
> f, The 2nd kernel disables the non-boot CPUs, which disables the NMI
> watchdog for the APs;
> g, The S4 image saved at step c is restored;
> h, The 1st (hibernated) kernel is restored and re-enables the non-boot CPUs;
> as the NMI watchdog was disabled in step b, the APs' NMI watchdog stays
> disabled. Besides, the variable active_events is restored to zero.
> 
> But the BSP NMI watchdog is still enabled, and the PMC will trigger an NMI
> periodically.
> When a PMC NMI triggers, perf_event_nmi_handler is called, but it sees
> that active_events is zero, so it exits immediately and returns NMI_DONE.
> This then leads to the unknown NMI messages.
> 
> static int
> perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
> {
>        u64 start_clock;
>        u64 finish_clock;
>        int ret;
> 
>        /*
>        * All PMUs/events that share this PMI handler should make sure to
>        * increment active_events for their events.
>        */
>        if (!atomic_read(&active_events))
>               return NMI_DONE;
> ......
> 
> It seems that, when doing S4 resume, the BSP does not configure the NMI
> watchdog according to the NMI watchdog sys interface settings previously
> saved in the S4 image.
> Should this situation be considered and patched?

Yes, please.

The watchdog driver should get suspend/resume hooks, and probably do
the same init on boot and on resume.
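
To illustrate the mismatch and the effect of a resume hook, here is a
user-space toy model of steps a to h from the report. It is only a sketch,
not kernel code: struct model, model_nmi_handler(), model_watchdog_resume()
and simulate_s4_cycle() are hypothetical names, and a single integer stands
in for the real active_events counter. It assumes that software state
travels in the hibernation image while the BSP counter state is whatever
the boot kernel left behind.

```c
#include <assert.h>
#include <stdbool.h>

#define NMI_DONE    0   /* handler did not claim the NMI */
#define NMI_HANDLED 1   /* handler claimed the NMI */

/* Hypothetical model state; none of these names exist in the kernel. */
struct model {
	int  active_events;     /* software state, saved in the S4 image */
	bool watchdog_enabled;  /* user setting, saved in the S4 image */
	bool bsp_pmc_armed;     /* hardware state, NOT restored from the image */
};

/* Mirrors the early exit quoted from perf_event_nmi_handler(). */
static int model_nmi_handler(const struct model *m)
{
	if (m->active_events == 0)
		return NMI_DONE;
	return NMI_HANDLED;
}

/* Hypothetical resume hook: reprogram hardware from the restored setting. */
static void model_watchdog_resume(struct model *m)
{
	m->bsp_pmc_armed = m->watchdog_enabled;
}

/* Returns how many of 'ticks' counter overflows end up as unknown NMIs. */
int simulate_s4_cycle(bool use_resume_hook, int ticks)
{
	assert(ticks >= 0);

	/* a: 1st kernel boots with the watchdog enabled. */
	struct model first = {
		.active_events = 1, .watchdog_enabled = true, .bsp_pmc_armed = true,
	};

	/* b: watchdog disabled through the sys interface. */
	first.watchdog_enabled = false;
	first.active_events = 0;
	first.bsp_pmc_armed = false;

	/* c: the image captures the software view only. */
	struct model image = first;

	/* d-f: 2nd kernel boots, arms the watchdog, then offlines the APs;
	 * the BSP counter stays armed. */
	bool hw_bsp_armed = true;

	/* g-h: the image is restored on top of the hardware state left by
	 * the 2nd kernel. */
	struct model resumed = image;
	resumed.bsp_pmc_armed = hw_bsp_armed;

	if (use_resume_hook)
		model_watchdog_resume(&resumed);

	int unknown = 0;
	for (int i = 0; i < ticks; i++) {
		if (resumed.bsp_pmc_armed &&
		    model_nmi_handler(&resumed) == NMI_DONE)
			unknown++;
	}
	return unknown;
}
```

In this model, simulate_s4_cycle(false, 3) counts every overflow as an
unknown NMI, while simulate_s4_cycle(true, 3) counts none, because the hook
brings the counter back in line with the restored setting. A real fix
would have to do the equivalent from the kernel's suspend/resume path.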

Best regards,

								Pavel
-- 
People of Russia, stop Putin before his war on Ukraine escalates.
