linux-kernel - Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids

Message-ID: <063eff75-56a5-1af7-f684-a2ed4b13c9a7@xen.org>
Date:   Mon, 8 Feb 2021 13:09:27 +0000
From:   Julien Grall <julien@....org>
To:     Jürgen Groß <jgross@...e.com>,
        xen-devel@...ts.xenproject.org, linux-kernel@...r.kernel.org,
        linux-block@...r.kernel.org, netdev@...r.kernel.org,
        linux-scsi@...r.kernel.org
Cc:     Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Stefano Stabellini <sstabellini@...nel.org>,
        stable@...r.kernel.org,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
        Roger Pau Monné <roger.pau@...rix.com>,
        Jens Axboe <axboe@...nel.dk>, Wei Liu <wei.liu@...nel.org>,
        Paul Durrant <paul@....org>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>
Subject: Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids

Hi Juergen,

On 08/02/2021 12:31, Jürgen Groß wrote:
> On 08.02.21 13:16, Julien Grall wrote:
>>
>>
>> On 08/02/2021 12:14, Jürgen Groß wrote:
>>> On 08.02.21 11:40, Julien Grall wrote:
>>>> Hi Juergen,
>>>>
>>>> On 08/02/2021 10:22, Jürgen Groß wrote:
>>>>> On 08.02.21 10:54, Julien Grall wrote:
>>>>>> ... I don't really see how the difference matters here. The idea is 
>>>>>> to re-use what already exists rather than trying to re-invent 
>>>>>> the wheel with an extra lock (or whatever else we can come up with).
>>>>>
>>>>> The difference is that the race is occurring _before_ any IRQ is
>>>>> involved. So I don't see how modification of IRQ handling would help.
>>>>
>>>> Roughly our current IRQ handling flow (handle_eoi_irq()) looks like:
>>>>
>>>> if ( irq in progress )
>>>> {
>>>>    set IRQS_PENDING
>>>>    return;
>>>> }
>>>>
>>>> do
>>>> {
>>>>    clear IRQS_PENDING
>>>>    handle_irq()
>>>> } while (IRQS_PENDING is set)
>>>>
>>>> An IRQ handling flow like handle_fasteoi_irq() looks like:
>>>>
>>>> if ( irq in progress )
>>>>    return;
>>>>
>>>> handle_irq()
>>>>
>>>> The latter flow would catch "spurious" interrupts and ignore them. So 
>>>> it would nicely handle the race when changing the event affinity.
>>>
>>> Sure? Isn't "irq in progress" being reset way before our "lateeoi" is
>>> issued, thus having the same problem again? 
>>
>> Sorry I can't parse this.
> 
> handle_fasteoi_irq() will do nothing "if ( irq in progress )". When is
> this condition being reset again in order to be able to process another
> IRQ?

It is reset after the handler has been called. See handle_irq_event().
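
To make the two flows quoted above easier to compare, here is a toy user-space model in C. It is only a sketch under stated assumptions: struct toy_irq, toy_handler(), toy_flow_replay(), toy_flow_drop() and toy_run() are hypothetical names, not kernel code, and the "second event arriving while the handler runs" is simulated by re-entering the flow from inside the handler. The in-progress flag is set just before the handler is called and cleared right after it returns, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of an interrupt descriptor; NOT the kernel's struct irq_desc. */
struct toy_irq {
    bool in_progress;               /* models "irq in progress" */
    bool pending;                   /* models IRQS_PENDING */
    int  handled;                   /* number of handler invocations */
    int  inject;                    /* events to inject while the handler runs */
    void (*flow)(struct toy_irq *); /* flow under test */
};

/* Hypothetical handler: counts calls and simulates a racing event. */
static void toy_handler(struct toy_irq *irq)
{
    irq->handled++;
    if (irq->inject > 0) {
        irq->inject--;
        irq->flow(irq);  /* second event arrives while in_progress is set */
    }
}

/* First quoted flow: remember an event seen during handling, then replay it. */
static void toy_flow_replay(struct toy_irq *irq)
{
    assert(irq != NULL);
    if (irq->in_progress) {
        irq->pending = true;
        return;
    }
    do {
        irq->pending = false;
        irq->in_progress = true;
        toy_handler(irq);
        irq->in_progress = false;  /* reset once the handler has returned */
    } while (irq->pending);
}

/* Second quoted flow: an event seen during handling is simply ignored. */
static void toy_flow_drop(struct toy_irq *irq)
{
    assert(irq != NULL);
    if (irq->in_progress)
        return;
    irq->in_progress = true;
    toy_handler(irq);
    irq->in_progress = false;      /* reset once the handler has returned */
}

/* Deliver one event through the given flow; return how often the handler ran. */
static int toy_run(void (*flow)(struct toy_irq *), int inject)
{
    struct toy_irq irq = { .inject = inject, .flow = flow };

    flow(&irq);
    return irq.handled;
}
```

With one event injected during handling, toy_run(toy_flow_replay, 1) runs the handler twice, while toy_run(toy_flow_drop, 1) runs it once because the racing event is dropped. The model says nothing about events that arrive after the flag has been cleared.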

> I believe this will be the case before our "lateeoi" handling
> becomes active (more precisely: when our IRQ handler returns to
> handle_fasteoi_irq()), resulting in the possibility of the same race we
> are experiencing now.

I am a bit confused about what you mean by "lateeoi" handling becoming 
active. Can you clarify?

Note that there are other IRQ flows. We should have a look at 
them before trying to fix things ourselves.

That said, the other issue I can see so far is that handle_irq_for_port() 
will update info->{eoi_cpu, irq_epoch, eoi_time} without any locking. But it 
is not clear that this is what you mean by "becoming active".

Cheers,

-- 
Julien Grall