[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e9b94104-d20a-b6b2-cbe0-f79b1ed09c98@oracle.com>
Date: Mon, 14 Sep 2020 20:24:22 -0400
From: boris.ostrovsky@...cle.com
To: Anchal Agarwal <anchalag@...zon.com>
Cc: tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, hpa@...or.com,
x86@...nel.org, jgross@...e.com, linux-pm@...r.kernel.org,
linux-mm@...ck.org, kamatam@...zon.com, sstabellini@...nel.org,
konrad.wilk@...cle.com, roger.pau@...rix.com, axboe@...nel.dk,
davem@...emloft.net, rjw@...ysocki.net, len.brown@...el.com,
pavel@....cz, peterz@...radead.org, eduval@...zon.com,
sblbir@...zon.com, xen-devel@...ts.xenproject.org,
vkuznets@...hat.com, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, dwmw@...zon.co.uk,
benh@...nel.crashing.org
Subject: Re: [PATCH v3 01/11] xen/manage: keep track of the on-going suspend
mode
On 9/14/20 5:47 PM, Anchal Agarwal wrote:
> On Sun, Sep 13, 2020 at 11:43:30AM -0400, boris.ostrovsky@...cle.com wrote:
>> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>>
>>
>>
>> On 8/21/20 6:25 PM, Anchal Agarwal wrote:
>>> Though, accquirng pm_mutex is still right thing to do, we may
>>> see deadlock if PM hibernation is interrupted by Xen suspend.
>>> PM hibernation depends on xenwatch thread to process xenbus state
>>> transactions, but the thread will sleep to wait pm_mutex which is
>>> already held by PM hibernation context in the scenario. Xen shutdown
>>> code may need some changes to avoid the issue.
>>
>>
>> Is it Xen's shutdown or suspend code that needs to address this? (Or I
>> may not understand what the problem is that you are describing)
>>
> Its Xen suspend code I think. If we do not take the system_transition_mutex
> in do_suspend then if hibernation is triggered in parallel to xen suspend there
> could be issues.
But you *are* taking this mutex to avoid this exact race, aren't you?
> Now this is still theoretical in my case and I havent been able
> to reproduce such a race. So the approach the original author took was to take
> this lock which to me seems right.
> And its Xen suspend and not Xen Shutdown. So basically if this scenario
> happens I am of the view one of other will fail to occur then how do we recover
> or avoid this at all.
>
> Does that answer your question?
>
>>> +
>>> +static int xen_setup_pm_notifier(void)
>>> +{
>>> + if (!xen_hvm_domain() || xen_initial_domain())
>>> + return -ENODEV;
>>
>> I don't think this works anymore.
> What do you mean?
> The first check is for xen domain types and other is for architecture support.
> The reason I put this check here is because I wanted to segregate the two.
> I do not want to register this notifier at all for !hmv guest and also if its
> an initial control domain.
> The arm check only lands in notifier because once hibernate() api is called ->
> calls pm_notifier_call_chain for PM_HIBERNATION_PREPARE this will fail for
> aarch64.
> Once we have support for aarch64 this notifier can go away altogether.
>
> Is there any other reason I may be missing why we should move this check to
> notifier?
Not registering this notifier is equivalent to having it return NOTIFY_OK.
In your earlier versions just returning NOTIFY_OK was not sufficient for
hibernation to proceed since the notifier would also need to set
suspend_mode appropriately. But now your notifier essentially filters
out unsupported configurations. And so if it is not called your
configuration (e.g. PV domain) will be considered supported.
>> In the past your notifier would set suspend_mode (or something) but now
>> it really doesn't do anything except reports an error in some (ARM) cases.
>>
>> So I think you should move this check into the notifier.
>> (And BTW I still think PM_SUSPEND_PREPARE should return an error too.
>> The fact that we are using "suspend" in xen routine names is irrelevant)
>>
> I may have send "not-updated" version of the notifier's function change.
>
> + switch (pm_event) {
> + case PM_HIBERNATION_PREPARE:
> + /* Guest hibernation is not supported for aarch64 currently*/
> + if (IS_ENABLED(CONFIG_ARM64)) {
> + ret = NOTIFY_BAD;
> + break;
> + }
> + case PM_RESTORE_PREPARE:
> + case PM_POST_RESTORE:
> + case PM_POST_HIBERNATION:
> + default:
> + ret = NOTIFY_OK;
> + }
There is no difference on x86 between this code and what you sent
earlier. In both instances PM_SUSPEND_PREPARE will return NOTIFY_OK.
On ARM this code will allow suspend to proceed (which is not what we want).
-boris
>
> With the above path PM_SUSPEND_PREPARE will go all together. Does that
> resolves this issue? I wanted to get rid of all SUSPEND_* as they are not needed
> here clearly.
> The only reason I kept it there is if someone tries to trigger hibernation on
> ARM instances they should get an error. As I am not sure about the current
> behavior. There may be a better way to not invoke hibernation on ARM DomU's and
> get rid of this block all together.
>
> Again, sorry for sending in the half baked fix. My workspace switch may have
> caused the error.
>>
>>
>> -boris
>>
> Anchal
>>
>>> + return register_pm_notifier(&xen_pm_notifier_block);
>>> +}
>>> +
Powered by blists - more mailing lists