linux-kernel - Re: [PATCHv4 2/2] iommu/vt-d: Limit number of faults to clear in irq handler

Date:   Thu, 3 May 2018 07:49:24 +0800
From:   Lu Baolu <baolu.lu@...ux.intel.com>
To:     Dmitry Safonov <dima@...sta.com>, linux-kernel@...r.kernel.org,
        joro@...tes.org, "Raj, Ashok" <ashok.raj@...el.com>
Cc:     0x7f454c46@...il.com, Alex Williamson <alex.williamson@...hat.com>,
        David Woodhouse <dwmw2@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        iommu@...ts.linux-foundation.org
Subject: Re: [PATCHv4 2/2] iommu/vt-d: Limit number of faults to clear in irq
 handler

Hi,

On 05/02/2018 08:38 PM, Dmitry Safonov wrote:
> Hi Lu,
>
> On Wed, 2018-05-02 at 14:34 +0800, Lu Baolu wrote:
>> Hi,
>>
>> On 03/31/2018 08:33 AM, Dmitry Safonov wrote:
>>> Theoretically, on some machines faults might be generated faster
>>> than
>>> they're cleared by CPU.
>> Is this a real case?
> No. 1/2 is a real case and this one was discussed on v3:
> lkml.kernel.org/r/<20180215191729.15777-1-dima@...sta.com>
>
> It's not possible on my hw as far as I tried, but the discussion result
> was to fix this theoretical issue too.

If faults are generated faster than the CPU can clear them, the PCIe
device must be in a very bad state. How about disabling the PCIe
device and asking the administrator to replace it? Anyway, I don't
think that's the goal of this patch series. :-)

>
>>>  Let's limit the cleaning-loop by number of hw
>>> fault registers.
>> Will this cause the fault recording registers full of faults, hence
>> new faults will be dropped without logging?
> If faults come faster than they're being cleared - some of them will be
> dropped without logging. Not sure if it's worth reporting all faults in
> such a theoretical(!) situation.
> If the number of reported faults for such a situation is not enough and
> it's worth keeping all the faults, then probably we should introduce a
> workqueue here (which I did in v1, but it was rejected for the reason
> that it would introduce some latency in fault reporting).
>
>> And even worse, new faults will not generate interrupts?
> They will, we clear page fault overflow outside of the loop, so any new
> fault will raise interrupt, iiuc.
>

I am afraid that they might not generate interrupts any more.

Say the fault registers are full of events that have not been cleared,
and then a new fault comes. There is no room for this event, and
hence the hardware might drop it silently.
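
To make the scenario concrete, here is a toy userspace model in C. It is
only a sketch and not the driver code: NUM_FRR, struct toy_iommu,
toy_raise_fault() and toy_handle_faults() are all made-up names, and
the "drop when full" behaviour is my assumption about the hardware, not
something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_FRR 4	/* hypothetical number of fault recording registers */

struct toy_iommu {
	bool pending[NUM_FRR];	/* models the per-register fault bit */
	bool overflow;		/* models the overflow status bit */
	unsigned int logged;	/* faults the handler has reported */
	unsigned int dropped;	/* faults lost because no register was free */
};

/* Hardware side: record a fault in a free register, otherwise drop it. */
static bool toy_raise_fault(struct toy_iommu *m)
{
	assert(m != NULL);
	for (size_t i = 0; i < NUM_FRR; i++) {
		if (!m->pending[i]) {
			m->pending[i] = true;
			return true;
		}
	}
	m->overflow = true;	/* assumed: a full register file sets overflow */
	m->dropped++;
	return false;
}

/*
 * Handler side: visit at most NUM_FRR registers, then clear overflow
 * once outside the loop. 'refill' is the number of new faults that
 * arrive while each register is being processed.
 */
static unsigned int toy_handle_faults(struct toy_iommu *m, unsigned int refill)
{
	unsigned int cleared = 0;
	size_t idx = 0;

	assert(m != NULL);
	for (unsigned int budget = NUM_FRR; budget > 0; budget--) {
		if (!m->pending[idx])
			break;
		m->pending[idx] = false;
		m->logged++;
		cleared++;
		for (unsigned int r = 0; r < refill; r++)
			toy_raise_fault(m);
		idx = (idx + 1) % NUM_FRR;
	}
	m->overflow = false;
	return cleared;
}
```

In this model, if all NUM_FRR registers are filled and
toy_handle_faults() runs with refill = 1, the handler stops after
NUM_FRR iterations with every register pending again and overflow
clear. The next toy_raise_fault() call then finds no free register and
the fault is dropped. Whether real hardware still raises an interrupt
at that point is exactly the open question.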

Best regards,
Lu Baolu