lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 14 Sep 2020 20:24:22 -0400
To:     Anchal Agarwal <>
Subject: Re: [PATCH v3 01/11] xen/manage: keep track of the on-going suspend

On 9/14/20 5:47 PM, Anchal Agarwal wrote:
> On Sun, Sep 13, 2020 at 11:43:30AM -0400, wrote:
>> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>> On 8/21/20 6:25 PM, Anchal Agarwal wrote:
>>> Though, accquirng pm_mutex is still right thing to do, we may
>>> see deadlock if PM hibernation is interrupted by Xen suspend.
>>> PM hibernation depends on xenwatch thread to process xenbus state
>>> transactions, but the thread will sleep to wait pm_mutex which is
>>> already held by PM hibernation context in the scenario. Xen shutdown
>>> code may need some changes to avoid the issue.
>> Is it Xen's shutdown or suspend code that needs to address this? (Or I
>> may not understand what the problem is that you are describing)
> Its Xen suspend code I think. If we do not take the system_transition_mutex
> in do_suspend then if hibernation is triggered in parallel to xen suspend there
> could be issues. 

But you *are* taking this mutex to avoid this exact race, aren't you?

> Now this is still theoretical in my case and I havent been able
> to reproduce such a race. So the approach the original author took was to take
> this lock which to me seems right.
> And its Xen suspend and not Xen Shutdown. So basically if this scenario
> happens I am of the view one of other will fail to occur then how do we recover
> or avoid this at all.
> Does that answer your question?

>>> +
>>> +static int xen_setup_pm_notifier(void)
>>> +{
>>> +     if (!xen_hvm_domain() || xen_initial_domain())
>>> +             return -ENODEV;
>> I don't think this works anymore.
> What do you mean?
> The first check is for xen domain types and other is for architecture support. 
> The reason I put this check here is because I wanted to segregate the two.
> I do not want to register this notifier at all for !hmv guest and also if its
> an initial control domain.
> The arm check only lands in notifier because once hibernate() api is called ->
> calls pm_notifier_call_chain for PM_HIBERNATION_PREPARE this will fail for
> aarch64. 
> Once we have support for aarch64 this notifier can go away altogether. 
> Is there any other reason I may be missing why we should move this check to
> notifier?

Not registering this notifier is equivalent to having it return NOTIFY_OK.

In your earlier versions just returning NOTIFY_OK was not sufficient for
hibernation to proceed since the notifier would also need to set
suspend_mode appropriately. But now your notifier essentially filters
out unsupported configurations. And so if it is not called your
configuration (e.g. PV domain) will be considered supported.

>> In the past your notifier would set suspend_mode (or something) but now
>> it really doesn't do anything except reports an error in some (ARM) cases.
>> So I think you should move this check into the notifier.
>> (And BTW I still think PM_SUSPEND_PREPARE should return an error too.
>> The fact that we are using "suspend" in xen routine names is irrelevant)
> I may have send "not-updated" version of the notifier's function change.
> +    switch (pm_event) {
> +        /* Guest hibernation is not supported for aarch64 currently*/
> +        if (IS_ENABLED(CONFIG_ARM64)) {
> +             ret = NOTIFY_BAD;                                                                                                                                                                                                                                                    
> +             break;                                                                                                                                                                                                                                                               
> +     }               
> +       case PM_RESTORE_PREPARE:
> +       case PM_POST_RESTORE:
> +       case PM_POST_HIBERNATION:
> +       default:
> +           ret = NOTIFY_OK;
> +    }

There is no difference on x86 between this code and what you sent
earlier. In both instances PM_SUSPEND_PREPARE will return NOTIFY_OK.

On ARM this code will allow suspend to proceed (which is not what we want).


> With the above path PM_SUSPEND_PREPARE will go all together. Does that
> resolves this issue? I wanted to get rid of all SUSPEND_* as they are not needed
> here clearly.
> The only reason I kept it there is if someone tries to trigger hibernation on
> ARM instances they should get an error. As I am not sure about the current
> behavior. There may be a better way to not invoke hibernation on ARM DomU's and
> get rid of this block all together.
> Again, sorry for sending in the half baked fix. My workspace switch may have
> caused the error.
>> -boris
> Anchal
>>> +     return register_pm_notifier(&xen_pm_notifier_block);
>>> +}
>>> +

Powered by blists - more mailing lists