linux-kernel - Re: MSI irqchip configured as IRQCHIP_ONESHOT

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAGi-RUJtqdLtFBVMxL8TOQ3LGRqqrV4Ge7Fu9mTyDoQVYxtA5g@mail.gmail.com>
Date:   Fri, 17 Jan 2020 15:32:07 +0200
From:   Ramon Fried <rfried.dev@...il.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     hkallweit1@...il.com, Bjorn Helgaas <bhelgaas@...gle.com>,
        maz@...nel.org, lorenzo.pieralisi@....com,
        linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: MSI irqchip configured as IRQCHIP_ONESHOT_SAFE causes spurious IRQs

Thomas,
Thanks a lot for the detailed answer.

> > Basically, 32 MSI vectors are represented by a single GIC irq.
> > There's a status registers which every bit correspond to an MSI vector, and
> > individual MSI needs to be acked on that registers. in any case where
> > there's asserted bit the GIC IRQ level is high.
>
> Which is not that bad.
>
> >> > It's configured with handle_level_irq() as the GIC is level IRQ.
> >>
> >> Which is completely bonkers. MSI has edge semantics and sharing an
> >> interrupt line for edge type interrupts is broken by design, unless the
> >> hardware which handles the incoming MSIs and forwards them to the level
> >> type interrupt line is designed properly and the driver does the right
> >> thing.
> >
> > Yes, the design of the HW is sort of broken. I concur.
>
> As you describe it, it's not that bad.
>
> >> > The ack callback acks the GIC irq.  the mask/unmask calls
> >> > pci_msi_mask_irq() / pci_msi_unmask_irq()
> >>
> >> What? How is that supposed to work with multiple MSIs?
> > Acking is per MSI vector as I described above, so it should work.
>
> No. This is the wrong approach. Lets look at the hardware:
>
> | GIC   line X |------| MSI block | <--- Messages from devices
>
> The MSI block latches the incoming message up to the point where it is
> acknowledged in the MSI block. This makes sure that the level semantics
> of the GIC are met.
>
> So from a software perspective you want to do something like this:
>
> gic_irq_handler()
> {
>    mask_ack(gic_irqX);
>
>    pending = read(msi_status);
>    for_each_bit(bit, pending) {
>        ack(msi_status, bit);  // This clears the latch in the MSI block
>        handle_irq(irqof(bit));
>    }
>    unmask(gic_irqX);
> }
>
> And that works perfectly correct without masking the MSI interrupt at
> the PCI level for a threaded handler simply because the PCI device will
> not send another interrupt until the previous one has been handled by
> the driver unless the PCI device is broken.
I'm missing something here, isn't this implementation blocks IRQ's only during
the HW handler and not during the threaded handler ? (Assuming that I selected
handle_level_irq() as the default handler)

Actually my implementation current implementation is very similar to what
you just described:

static void eq_msi_isr(struct irq_desc *desc)
{
        struct irq_chip *chip = irq_desc_get_chip(desc);
        struct eq_msi *msi;
        u16 status;
        unsigned long bitmap;
        u32 bit;
        u32 virq;

        chained_irq_enter(chip, desc);
        msi = irq_desc_get_handler_data(desc);

        while ((status = readw(msi->gcsr_regs_base + LINK_GCSR5_OFFSET)
                & MSI_IRQ_REQ) != 0) {
                pr_debug("MSI: %x\n", status >> 12);

                bitmap = status >> 12;
                for_each_set_bit(bit, &bitmap, msi->num_of_vectors) {
                        virq = irq_find_mapping(msi->inner_domain, bit);
                        if (virq) {
                                generic_handle_irq(virq);
                        } else {
                                pr_err("unexpected MSI\n");
                                handle_bad_irq(desc);
                        }
                }
        }
        chained_irq_exit(chip, desc);
}

Additionally the domain allocation is defined like:
static int eq_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
                                 unsigned int nr_irqs, void *args)
{
        struct eq_msi *msi = domain->host_data;
        unsigned long bit;
        u32 mask;

        /* We only allow 32 MSI per device */
        WARN_ON(nr_irqs > 32);
        if (nr_irqs > 32)
                return -ENOSPC;

        bit = find_first_zero_bit(msi->used, msi->num_of_vectors);
        if (bit >= msi->num_of_vectors)
                return -ENOSPC;

        set_bit(bit, msi->used);

        mask = readw(msi->gcsr_regs_base + LINK_GCSR6_OFFSET);
        mask |= BIT(bit) << 12;
        writew(mask, msi->gcsr_regs_base + LINK_GCSR6_OFFSET);

        irq_domain_set_info(domain, virq, bit, &eq_msi_bottom_irq_chip,
                            domain->host_data, handle_level_irq,
                            NULL, NULL);

        pr_debug("Enabling MSI irq: %lu\n", bit);

        return 0;
}

>
> If that's the case, then you have to work around that at the device
> driver level, not at the irq chip level, by installing a primary handler
> which quiesces the device (not the MSI part).
>
> Thanks,
>
>         tglx