lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <fb1e5c6f-cb0e-49d0-9e96-8e778573f49c@amd.com>
Date: Mon, 1 Dec 2025 13:20:40 -0500
From: Jason Andryuk <jason.andryuk@....com>
To: Marek Marczykowski-Górecki
	<marmarek@...isiblethingslab.com>
CC: Juergen Gross <jgross@...e.com>, Stefano Stabellini
	<sstabellini@...nel.org>, Oleksandr Tyshchenko
	<oleksandr_tyshchenko@...m.com>, Yann Sionneau <yann.sionneau@...es.tech>,
	<xen-devel@...ts.xenproject.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] xenbus: Use .freeze/.thaw to handle xenbus devices

On 2025-11-29 21:03, Marek Marczykowski-Górecki wrote:
> On Wed, Nov 19, 2025 at 05:47:29PM -0500, Jason Andryuk wrote:
>> The goal is to fix s2idle and S3 for Xen PV devices.
> 
> Can you give a little more context of this? We do have working S3 in
> qubes with no need for such change. We trigger it via the toolstack (libxl_domain_suspend_only()).
> Are you talking about guest-initiated suspend here?

This is intended to help domU s2idle/S3 and resume.  I guess that is 
what you mean by guest-initiated?  The domU can use 'echo mem > 
/sys/power/state' to enter s2idle/S3.  We also have the domU react to 
the ACPI sleep button from `xl trigger $dom sleep`.

AIUI, libxl_domain_suspend_only() triggers xenstore writes which Linux 
drivers/xen/manage.c:do_suspend() acts on.  `xl save/suspend/migrate` 
all use this path.

The terminology gets confusing.  Xen uses "suspend" for 
save/suspend/migrate, but the Linux power management codes uses 
freeze/thaw/restore.  AIUI, Linux's PMSG_SUSPEND/.suspend is for runtime 
power management.

When you call libxl_domain_suspend_only()/libxl_domain_resume(), you 
pass suspend_cancel==1.
  *  1. (fast=1) Resume the guest without resetting the domain
        environment.
  *     The guests's call to SCHEDOP_shutdown(SHUTDOWN_suspend) will
        return 1.

That ends up in Linux do_suspend() as si.cancelled = 1, which calls 
PMSG_THAW -> .thaw -> xenbus_dev_cancel() which is a no-op.  So it does 
not change the PV devices.

We needed guest user space to perform actions before entering s2idle. 
libxl_domain_suspend_only() triggers the Linux kernel path which does 
not notify user space.  The ACPI power buttons let user space perform 
actions (lock and blank the screen) before entering the idle state.

> We also have kinda working (host) s2idle. You may want to take a look at this
> work (some/most of it was posted upstream, but not all got
> committed/reviewed):
> https://github.com/QubesOS/qubes-issues/issues/6411#issuecomment-1538089344
> https://github.com/QubesOS/qubes-linux-kernel/pull/910 (some patches
> changed since that PR, see the current main too).

This would not affect host s2idle - it changes PV frontend devices.

Do you libxl_domain_suspend_only() all domUs and then put dom0 into s0ix?

>> A domain resuming
>> from s3 or s2idle disconnects its PV devices during resume.  The
>> backends are not expecting this and do not reconnect.
>>
>> b3e96c0c7562 ("xen: use freeze/restore/thaw PM events for suspend/
>> resume/chkpt") changed xen_suspend()/do_suspend() from
>> PMSG_SUSPEND/PMSG_RESUME to PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE, but the
>> suspend/resume callbacks remained.
>>
>> .freeze/restore are used with hiberation where Linux restarts in a new
>> place in the future.  .suspend/resume are useful for runtime power
>> management for the duration of a boot.
>>
>> The current behavior of the callbacks works for an xl save/restore or
>> live migration where the domain is restored/migrated to a new location
>> and connecting to a not-already-connected backend.
>>
>> Change xenbus_pm_ops to use .freeze/thaw/restore and drop the
>> .suspend/resume hook.  This matches the use in drivers/xen/manage.c for
>> save/restore and live migration.  With .suspend/resume empty, PV devices
>> are left connected during s2idle and s3, so PV devices are not changed
>> and work after resume.
> 
> Is that intended? While it might work for suspend by a chance(*), I'm
> pretty sure not disconnecting + re-reconnecting PV devices across
> save/restore/live migration will break them.

save/restore/live migration keep using .freeze/thaw/restore, which 
disconnects and reconnects today.  Nothing changes there as 
xen_suspend()/do_suspend() call the power management code with 
PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE.

This patches makes .suspend/resume no-ops for PMSG_SUSPEND/PMSG_RESUME. 
When a domU goes into s2idle/S3, the backend state remains connected. 
With this patch, when the domU wakes up, the frontends do nothing and 
remain connected.

> (*) and even that I'm not sure - with driver domains, depending on
> suspend order this feels like might result in a deadlock...

I'm not sure.  I don't think this patch changes anything with respect to 
them.

Thanks for testing.

Maybe the commit messages should change to highlight this is for domU PV 
devices?  struct xen_bus_type xenbus_backend does not define dev_pm_ops.

Regards,
Jason

>> Signed-off-by: Jason Andryuk <jason.andryuk@....com>
>> ---
>>   drivers/xen/xenbus/xenbus_probe_frontend.c | 4 +---
>>   1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
>> index 6d1819269cbe..199917b6f77c 100644
>> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c
>> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
>> @@ -148,11 +148,9 @@ static void xenbus_frontend_dev_shutdown(struct device *_dev)
>>   }
>>   
>>   static const struct dev_pm_ops xenbus_pm_ops = {
>> -	.suspend	= xenbus_dev_suspend,
>> -	.resume		= xenbus_frontend_dev_resume,
>>   	.freeze		= xenbus_dev_suspend,
>>   	.thaw		= xenbus_dev_cancel,
>> -	.restore	= xenbus_dev_resume,
>> +	.restore	= xenbus_frontend_dev_resume,
>>   };
>>   
>>   static struct xen_bus_type xenbus_frontend = {
>> -- 
>> 2.34.1
>>
>>
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ