Message-ID: <8b742188-d8a2-cf4e-e9de-0ca6f3d829b3@linux.ibm.com>
Date: Thu, 1 Jul 2021 10:28:52 -0400
From: Tony Krowiak <akrowiak@...ux.ibm.com>
To: Halil Pasic <pasic@...ux.ibm.com>
Cc: linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org,
pasic@...ux.vnet.ibm.com, jjherne@...ux.ibm.com,
Jason Gunthorpe <jgg@...dia.com>
Subject: Re: [PATCH] s390/vfio-ap: do not use open locks during
VFIO_GROUP_NOTIFY_SET_KVM notification
On 6/30/21 6:39 PM, Halil Pasic wrote:
> On Wed, 30 Jun 2021 10:31:22 -0400
> Tony Krowiak <akrowiak@...ux.ibm.com> wrote:
>
>> On 6/28/21 4:29 PM, Halil Pasic wrote:
>>> On Fri, 25 Jun 2021 18:07:58 -0400
>>> Tony Krowiak <akrowiak@...ux.ibm.com> wrote:
>>>
>>> What is a suitable base for this patch? I've tried the usual suspects,
>>> but none of them worked.
>> I discovered what the problem is here. The patch is based on our
>> master branch along with the two pre-requisite patches that were
>> recently reviewed and are currently being merged. The two patches
>> of which I speak are:
>> * [PATCH v6 1/2] s390/vfio-ap: clean up mdev resources when remove
>> callback invoked
>> Message ID: <20210621155714.1198545-2-akrowiak@...ux.ibm.com>
>>
>> * [PATCH v6 2/2] s390/vfio-ap: r/w lock for PQAP interception handler
>> function pointer
>> <20210621155714.1198545-3-akrowiak@...ux.ibm.com>
>>
>> I probably should have included those along with this one.
> Either that, or state in the cover letter that those are prerequisites.
>
>>>
>>>> The fix to resolve a lockdep splat while handling the
>>>> VFIO_GROUP_NOTIFY_SET_KVM event introduced a kvm_busy flag indicating that
>>>> the vfio_ap device driver is busy setting or unsetting the KVM pointer.
>>>> A wait queue was employed to allow functions requiring access to the KVM
>>>> pointer to wait for the kvm_busy flag to be cleared. For the duration of
>>>> the wait period, the mdev lock was unlocked then acquired again after the
>>>> kvm_busy flag was cleared. This got rid of the lockdep report, but didn't
>>>> really resolve the problem.
>>> Can you please elaborate on the last point. You mean that we can have
>>> circular locking even after 0cc00c8d4050, but instead of getting stuck
>>> on a lock we will get stuck on wait_event_cmd()? If that is it, please
>>> state it clearly in the description, and if you can, do it in the short
>>> description.
I created this patch in response to Jason G's review comments, which I
have copied below. He did not mention anything about getting stuck in a
wait_event_cmd(), so you may want to ask him about that. To answer your
question, I don't see how we can get stuck in a wait_event_cmd() unless
one of the functions that sets the matrix_mdev->kvm_busy flag fails to
complete for some reason.
You asked me to elaborate on the last point in the preceding paragraph;
I did so in my response to your comments/question above.
>>>
>> This patch was in response to the following review comments made by Jason
>> Gunthorpe:
>>
>> * Message ID: <20210525162927.GC1002214@...dia.com>
>> "... the kvm_busy should be replaced by a proper rwsem,
>> don't try to open code locks like that - it just defeats lockdep
>> analysis".
>>
>> * Message ID: <20210527112433.GX1002214@...dia.com>
>> "Usually when people start open coding locks it is often
>> because lockdep complained. Open coding a lock makes
>> lockdep stop because the lockdep code
>> is removed, but it doesn't fix anything. The kvm_busy
>> should be replaced by a proper rwsem, don't try to
>> open code locks like that - it just defeats lockdep
>> analysis."
>>
>> I will paraphrase and include the information from Jason's
>> comments in the description.
>>
> This does not answer my questions.
See above.
>
> I'm in favor of Jason's proposal, because it is much easier to
> comprehend simple rwsem protected than a mutex + wait_queue dance.
That is a matter of opinion. I have no trouble understanding
the "mutex + wait_queue dance", but then again, I wrote the
code, so maybe that is why.
>
>
> I think Jason was talking about open coding locks in general.
That may be so, but his comments were in support of his
statement that the mutex + wait_queue did not resolve
the issue reported via the lockdep splat because it turned
off lockdep.
> I don't
> consider it as proof of commit 0cc00c8d4050 not doing what it
> advertised.
I think I agree with this statement. Maybe I misunderstood what
Jason meant by "open coding locks like that". Since that comment
directly related to replacing the kvm_busy flag, I assumed he
was referring to the "mutex + wait_queue dance", as you called it.
I probably should have probed deeper to discern exactly what
Jason meant by "open coding locks". I took Jason at his word
that the kvm_busy flag "just defeats lockdep analysis" because
I don't have deep knowledge of lockdep.
Even if the kvm_busy flag does defeat lockdep analysis, I still believe
it fixes the problem for which commit 0cc00c8d4050 was created.
If we don't hold the matrix_dev->lock while the kvm->lock is held
during the update of the guest's matrix, lockdep does not report a
problem. That is proven by the tests of this patch.
If commit 0cc00c8d4050 does in fact resolve the issue for which
it was created, then there really is no need for this patch. It
certainly would reduce the amount of change that will be required
to integrate this patch with the "s390/vfio-ap: dynamic configuration
support" patch series currently under review.
> You can add a Suggested-by tag if you like, but you should
> be able to tell us what is the merit of your patch.
The patch removes the matrix_mdev->kvm_busy flag and the
wait_queue. The reason for removing them is that, as Jason
stated, the kvm_busy flag "just defeats lockdep analysis".
>
> Regards,
> Halil
>