Message-ID: <ZqIMWDWn5W9+9RMA@duo.ucw.cz>
Date: Thu, 25 Jul 2024 10:27:04 +0200
From: Pavel Machek <pavel@....cz>
To: Tony W Wang-oc <TonyWWang-oc@...oxin.com>
Cc: rafael@...nel.org, len.brown@...el.com, linux-pm@...r.kernel.org,
akpm@...ux-foundation.org, dianders@...omium.org,
tglx@...utronix.de, song@...nel.org, liusong@...ux.alibaba.com,
yaoma@...ux.alibaba.com, kjlx@...pleofstupid.com,
lizhe.67@...edance.com, linux@...ssschuh.net,
j.granados@...sung.com, linux-kernel@...r.kernel.org,
"CobeChen@...oxin.com" <CobeChen@...oxin.com>,
"TimGuo@...oxin.com" <TimGuo@...oxin.com>, SilviaZhao@...oxin.com,
"Linda Chai(BJ-RD)" <LindaChai@...oxin.com>, Felixzhang@...oxin.com
Subject: Re: Unknown NMI after S4 resume
Hi!
> When running an S4 test on a Zhaoxin platform with Ubuntu 22.04 and kernel
> 6.10, we got unknown NMI messages after S4 resume:
>
> [ 115.792224] Uhhuh. NMI received for unknown reason 2d on CPU 0.
> [ 115.792226] Do you have a strange power saving mode enabled?
> [ 115.792228] Dazed and confused, but trying to continue
>
> The issue also reproduces on an Intel platform.
>
> After tracing, we found that this unknown NMI occurs for the following
> reason:
> a, The 1st kernel starts normally and the NMI watchdog is enabled on all cores;
> b, The NMI watchdog is disabled on all cores through the sys interface, so
> the variable active_events goes to zero;
> c, Hibernation starts: the hibernation image is created and saved, then the
> system hibernates;
> d, An S4 resume event happens, the 2nd kernel starts normally and the NMI
> watchdog is enabled on all cores;
> e, The 2nd kernel finds the S4 image and tries to restore it;
> f, The 2nd kernel disables the non-boot CPUs, which disables the NMI watchdog
> for the APs;
> g, The S4 image saved at step c is restored;
> h, The 1st (hibernated) kernel is restored and re-enables the non-boot CPUs;
> since the NMI watchdog was disabled in step b, the APs' NMI watchdog stays
> disabled. Besides, the variable active_events is restored to zero;
>
> But the BSP's NMI watchdog is still enabled, so its PMC keeps triggering
> NMIs periodically.
> When a PMC NMI triggers, perf_event_nmi_handler is called, but it sees that
> active_events is zero, so it exits immediately and returns NMI_DONE.
> This then leads to the unknown NMI messages.
>
> static int
> perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
> {
> 	u64 start_clock;
> 	u64 finish_clock;
> 	int ret;
>
> 	/*
> 	 * All PMUs/events that share this PMI handler should make sure to
> 	 * increment active_events for their events.
> 	 */
> 	if (!atomic_read(&active_events))
> 		return NMI_DONE;
> 	......
>
> It seems that, when doing S4 resume, the BSP does not configure its NMI
> watchdog according to the NMI watchdog sys interface settings that were
> saved in the S4 image.
> Should we consider this situation and patch it?
Yes, please.
The watchdog driver should get suspend/resume hooks, and probably do the
same init on boot and on resume.
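
To illustrate, here is a minimal userspace sketch of the a-h sequence quoted
above and of what a resume hook would change. It is not kernel code: the
sw_state/hw_state split, model_nmi_handler(), model_watchdog_resume() and
simulate_s4_resume() are hypothetical names made up for this model. The only
part taken from the report is the early exit on a zero active_events.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS  4
#define BOOT_CPU 0

enum nmi_result { NMI_DONE, NMI_HANDLED };

/* Software state: lives in memory, so restoring the S4 image overwrites it. */
struct sw_state {
	bool watchdog_enabled;	/* models the sys interface setting */
	int  active_events;	/* models the perf active_events counter */
};

/* Hardware state: per-CPU PMC programming, not part of the image. */
struct hw_state {
	bool pmc_armed[NR_CPUS];
};

static void watchdog_enable_cpu(struct sw_state *sw, struct hw_state *hw, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!hw->pmc_armed[cpu]) {
		hw->pmc_armed[cpu] = true;
		sw->active_events++;
	}
}

static void watchdog_disable_cpu(struct sw_state *sw, struct hw_state *hw, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (hw->pmc_armed[cpu]) {
		hw->pmc_armed[cpu] = false;
		sw->active_events--;
	}
}

/* Models the early exit in the quoted perf_event_nmi_handler(). */
static enum nmi_result model_nmi_handler(const struct sw_state *sw)
{
	return sw->active_events ? NMI_HANDLED : NMI_DONE;
}

/* Hypothetical resume hook: make the boot CPU match the restored setting. */
static void model_watchdog_resume(const struct sw_state *sw, struct hw_state *hw)
{
	hw->pmc_armed[BOOT_CPU] = sw->watchdog_enabled;
}

/* Returns true if the boot CPU ends up raising an NMI that nobody claims. */
static bool simulate_s4_resume(bool use_resume_hook)
{
	struct sw_state first  = { .watchdog_enabled = true, .active_events = 0 };
	struct sw_state second = { .watchdog_enabled = true, .active_events = 0 };
	struct sw_state image;
	struct hw_state hw = { { false } };
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)		/* a: 1st kernel boots */
		watchdog_enable_cpu(&first, &hw, cpu);

	first.watchdog_enabled = false;			/* b: disabled via sys */
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		watchdog_disable_cpu(&first, &hw, cpu);

	image = first;					/* c: save image, power off */
	hw = (struct hw_state){ { false } };

	for (cpu = 0; cpu < NR_CPUS; cpu++)		/* d: 2nd kernel boots */
		watchdog_enable_cpu(&second, &hw, cpu);

	for (cpu = 1; cpu < NR_CPUS; cpu++)		/* f: non-boot CPUs offlined */
		watchdog_disable_cpu(&second, &hw, cpu);

	second = image;					/* g: image restored */

	for (cpu = 1; cpu < NR_CPUS; cpu++)		/* h: APs back, setting is off */
		if (second.watchdog_enabled)
			watchdog_enable_cpu(&second, &hw, cpu);

	if (use_resume_hook)
		model_watchdog_resume(&second, &hw);

	return hw.pmc_armed[BOOT_CPU] &&
	       model_nmi_handler(&second) == NMI_DONE;
}
```

In this model simulate_s4_resume(false) ends with the boot CPU's counter
still armed while active_events is zero, which is the mismatch from the
report; with the hook the boot CPU is brought in line with the restored
setting. How a real hook would be wired into the watchdog code is left open
here.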
Best regards,
Pavel
--
People of Russia, stop Putin before his war on Ukraine escalates.