[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0d623c98-a714-1639-cc53-f58ba3f08212@suse.com>
Date: Mon, 8 Feb 2021 11:22:52 +0100
From: Jürgen Groß <jgross@...e.com>
To: Julien Grall <julien@....org>, xen-devel@...ts.xenproject.org,
linux-kernel@...r.kernel.org, linux-block@...r.kernel.org,
netdev@...r.kernel.org, linux-scsi@...r.kernel.org
Cc: Boris Ostrovsky <boris.ostrovsky@...cle.com>,
Stefano Stabellini <sstabellini@...nel.org>,
stable@...r.kernel.org,
Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
Roger Pau Monné <roger.pau@...rix.com>,
Jens Axboe <axboe@...nel.dk>, Wei Liu <wei.liu@...nel.org>,
Paul Durrant <paul@....org>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>
Subject: Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
On 08.02.21 10:54, Julien Grall wrote:
>
>
> On 08/02/2021 09:41, Jürgen Groß wrote:
>> On 08.02.21 10:11, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> On 07/02/2021 12:58, Jürgen Groß wrote:
>>>> On 06.02.21 19:46, Julien Grall wrote:
>>>>> Hi Juergen,
>>>>>
>>>>> On 06/02/2021 10:49, Juergen Gross wrote:
>>>>>> The first three patches are fixes for XSA-332. The avoid WARN splats
>>>>>> and a performance issue with interdomain events.
>>>>>
>>>>> Thanks for helping to figure out the problem. Unfortunately, I
>>>>> still see reliably the WARN splat with the latest Linux master
>>>>> (1e0d27fce010) + your first 3 patches.
>>>>>
>>>>> I am using Xen 4.11 (1c7d984645f9) and dom0 is forced to use the 2L
>>>>> events ABI.
>>>>>
>>>>> After some debugging, I think I have an idea what's went wrong. The
>>>>> problem happens when the event is initially bound from vCPU0 to a
>>>>> different vCPU.
>>>>>
>>>>> From the comment in xen_rebind_evtchn_to_cpu(), we are masking the
>>>>> event to prevent it being delivered on an unexpected vCPU. However,
>>>>> I believe the following can happen:
>>>>>
>>>>> vCPU0 | vCPU1
>>>>> |
>>>>> | Call xen_rebind_evtchn_to_cpu()
>>>>> receive event X |
>>>>> | mask event X
>>>>> | bind to vCPU1
>>>>> <vCPU descheduled> | unmask event X
>>>>> |
>>>>> | receive event X
>>>>> |
>>>>> | handle_edge_irq(X)
>>>>> handle_edge_irq(X) | -> handle_irq_event()
>>>>> | -> set IRQD_IN_PROGRESS
>>>>> -> set IRQS_PENDING |
>>>>> | -> evtchn_interrupt()
>>>>> | -> clear IRQD_IN_PROGRESS
>>>>> | -> IRQS_PENDING is set
>>>>> | -> handle_irq_event()
>>>>> | -> evtchn_interrupt()
>>>>> | -> WARN()
>>>>> |
>>>>>
>>>>> All the lateeoi handlers expect a ONESHOT semantic and
>>>>> evtchn_interrupt() is doesn't tolerate any deviation.
>>>>>
>>>>> I think the problem was introduced by 7f874a0447a9 ("xen/events:
>>>>> fix lateeoi irq acknowledgment") because the interrupt was disabled
>>>>> previously. Therefore we wouldn't do another iteration in
>>>>> handle_edge_irq().
>>>>
>>>> I think you picked the wrong commit for blaming, as this is just
>>>> the last patch of the three patches you were testing.
>>>
>>> I actually found the right commit for blaming but I copied the
>>> information from the wrong shell :/. The bug was introduced by:
>>>
>>> c44b849cee8c ("xen/events: switch user event channels to lateeoi model")
>>>
>>>>
>>>>> Aside the handlers, I think it may impact the defer EOI mitigation
>>>>> because in theory if a 3rd vCPU is joining the party (let say vCPU
>>>>> A migrate the event from vCPU B to vCPU C). So info->{eoi_cpu,
>>>>> irq_epoch, eoi_time} could possibly get mangled?
>>>>>
>>>>> For a fix, we may want to consider to hold evtchn_rwlock with the
>>>>> write permission. Although, I am not 100% sure this is going to
>>>>> prevent everything.
>>>>
>>>> It will make things worse, as it would violate the locking hierarchy
>>>> (xen_rebind_evtchn_to_cpu() is called with the IRQ-desc lock held).
>>>
>>> Ah, right.
>>>
>>>>
>>>> On a first glance I think we'll need a 3rd masking state ("temporarily
>>>> masked") in the second patch in order to avoid a race with lateeoi.
>>>>
>>>> In order to avoid the race you outlined above we need an "event is
>>>> being
>>>> handled" indicator checked via test_and_set() semantics in
>>>> handle_irq_for_port() and reset only when calling clear_evtchn().
>>>
>>> It feels like we are trying to workaround the IRQ flow we are using
>>> (i.e. handle_edge_irq()).
>>
>> I'm not really sure this is the main problem here. According to your
>> analysis the main problem is occurring when handling the event, not when
>> handling the IRQ: the event is being received on two vcpus.
>
> I don't think we can easily divide the two because we rely on the IRQ
> framework to handle the lifecycle of the event. So...
>
>>
>> Our problem isn't due to the IRQ still being pending, but due it being
>> raised again, which should happen for a one shot IRQ the same way.
>
> ... I don't really see how the difference matter here. The idea is to
> re-use what's already existing rather than trying to re-invent the wheel
> with an extra lock (or whatever we can come up).
The difference is that the race is occurring _before_ any IRQ is
involved. So I don't see how modification of IRQ handling would help.
Juergen
Download attachment "OpenPGP_0xB0DE9DD628BF132F.asc" of type "application/pgp-keys" (3092 bytes)
Download attachment "OpenPGP_signature" of type "application/pgp-signature" (496 bytes)
Powered by blists - more mailing lists