Message-ID: <87v9pcw55q.fsf@nanos.tec.linutronix.de>
Date:   Thu, 16 Jan 2020 02:39:45 +0100
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Ramon Fried <rfried.dev@...il.com>
Cc:     hkallweit1@...il.com, Bjorn Helgaas <bhelgaas@...gle.com>,
        maz@...nel.org, lorenzo.pieralisi@....com,
        linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
        maz@...nel.org
Subject: Re: MSI irqchip configured as IRQCHIP_ONESHOT_SAFE causes spurious IRQs

Ramon,

Ramon Fried <rfried.dev@...il.com> writes:
> On Wed, Jan 15, 2020 at 12:54 AM Thomas Gleixner <tglx@...utronix.de> wrote:
>> Ramon Fried <rfried.dev@...il.com> writes:
>> Due to the semantics of MSI this is perfectly fine and aside from your
>> problem this has worked perfectly fine so far and it's an actual
>> performance win because it avoids fiddling with the MSI mask which is
>> slow.
>>
> fiddling with MSI masks is a configuration space write, which is
> non-posted, so it does come with a price.
> The question is whether a test was ever conducted to see if it's better
> than spurious IRQs.

The point is that there are no spurious interrupts in the sane cases and
the tests we did showed a real performance improvement in high
frequency interrupt situations due to avoiding the config space access.

Please stop claiming that this spurious interrupt problem is there by
design. It's not. Read the MSI spec.

Also boot your laptop/workstation with 'threadirqs' on the kernel
command line and check how many spurious interrupts come in. On a test
machine which has that command line parameter set I see exactly ONE with
an uptime of several days and heavy MSI interrupt activity. The ONE is
even there without 'threadirqs' on the command line, so I really can't
be bothered to analyze that.

>> You still have not told which driver/hardware is affected by this. Can
>> you please provide that information so we can finally look at the actual
>> hardware/driver combo?
>>
> Sure,
> I'm writing an MSI IRQ controller, it's basically a MIPS GIC interrupt
> line onto which several MSIs are multiplexed.

I assume you write the driver, not the VHDL for the actual hardware,
right? If so, you still did not tell which hardware that is and where we
can find information about it.

I further assume that 'multiplexed' means that the hardware is something
like an MSI receiver on the CPU/chipset which handles multiple MSI
messages and forwards them to a single shared interrupt line on the MIPS
GIC. Right?

Can you please provide a pointer to the hardware documentation?

> It's configured with handle_level_irq() as the GIC is level IRQ.

Which is completely bonkers. MSI has edge semantics and sharing an
interrupt line for edge type interrupts is broken by design, unless the
hardware which handles the incoming MSIs and forwards them to the level
type interrupt line is designed properly and the driver does the right
thing.
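
To illustrate what "the driver does the right thing" could look like,
here is a minimal user-space sketch in C. It is a simulation, not
kernel code and not a description of the hardware in question. It
assumes a hypothetical MSI receiver which latches every incoming MSI
in a per-vector status bit and keeps the shared level line asserted
while any bit is set. All names in it (msi_status, msi_arrive,
msi_read_and_ack, gic_line_asserted, demux_shared_line) are invented
for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NR_MSI 32u

/* Hypothetical latched status register: bit n is set when MSI n arrived. */
static uint32_t msi_status;

/* Per-vector counters standing in for the real per-MSI handlers. */
static unsigned int handled[NR_MSI];

/* An incoming MSI write latches its bit, so the edge event is not lost. */
static void msi_arrive(unsigned int n)
{
    assert(n < NR_MSI);
    msi_status |= UINT32_C(1) << n;
}

/* Assumed level output: asserted as long as any latched bit is pending. */
static int gic_line_asserted(void)
{
    return msi_status != 0;
}

/* Read the pending bits and acknowledge exactly those which were seen. */
static uint32_t msi_read_and_ack(void)
{
    uint32_t pending = msi_status;

    msi_status &= ~pending;
    return pending;
}

static void handle_msi(unsigned int n)
{
    handled[n]++;
}

/*
 * Demultiplex the shared line: dispatch every latched MSI and re-read
 * the status until nothing is pending, so an MSI which arrives during
 * handling is picked up instead of being dropped.
 */
static unsigned int demux_shared_line(void)
{
    unsigned int dispatched = 0;
    uint32_t pending;

    while ((pending = msi_read_and_ack()) != 0) {
        for (unsigned int n = 0; n < NR_MSI; n++) {
            if (pending & (UINT32_C(1) << n)) {
                handle_msi(n);
                dispatched++;
            }
        }
    }
    return dispatched;
}
```

Calling msi_arrive(3), msi_arrive(17) and msi_arrive(3) followed by
demux_shared_line() dispatches two handlers in this model, because the
two arrivals of vector 3 collapse into one latched bit. That is the
edge semantics part: the device handler has to process all outstanding
work per invocation. Nothing in this model touches the MSI mask of the
PCI device.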

> The ack callback acks the GIC irq. The mask/unmask callbacks call
> pci_msi_mask_irq() / pci_msi_unmask_irq()

What? How is that supposed to work with multiple MSIs?

Either the hardware is a trainwreck or the driver or both.

I can't tell as I can't find my crystal ball. Maybe I should replace it
with a Mobileye :)

Thanks,

        tglx
