[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <063D6719AE5E284EB5DD2968C1650D6DB027CB6A@AcuExch.aculab.com>
Date: Mon, 6 Feb 2017 17:23:54 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Alexander Duyck' <alexander.duyck@...il.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>
Subject: RE: Disabling msix interrupts
From: Alexander
> Sent: 06 February 2017 16:27
> To: David Laight
> On Mon, Feb 6, 2017 at 7:33 AM, David Laight <David.Laight@...lab.com> wrote:
> > netdev probably isn't the right list for this, but I suspect people
> > reading it understand what happens.
> >
> > I'm fairly sure that an msix interrupt can get raised after
> > the kernel thinks it has masked it.
> >
> > When an msix interrupt is disabled I think msi_set_mask_bit()
> > (in drivers/pci/msi.c) is called to write a '1' to the card's
> > hardware MSIX mask register (the last 32bit word of the entry).
> > This function carefully reads back the mask register to flush
> > the write through the pcie bus.
> > Except it doesn't, it reads the 'address_lo' register instead! [1]
>
> Reading any register will force the write to be flushed as all PCI
> writes must be completed before a PCI read. For example in the Intel
> drivers we read register 0 to flush a write of any of the other
> registers.
Sorry brain fade on that one.
Although the 'store buffer' on the sparc cpus I used to use would
let reads overtake writes. So you did have to read back the address
of the last write - not sure about modern sparc cpus.
> > While this will stop the hardware raising any more interrupts,
> > it could easily be in the process of raising one.
> > ie have read the mask, found it zero, read the address and
> > data, and be in the process of issuing the pcie write.
> >
> > The pcie write (to disable the interrupt) and readback are seen
> > by the hardware as (more or less) back to back transfers, so can
> > both easily overtake the request to raise the interrupt.
>
> I don't believe this is correct. On the PCI bus what you should see
> is the device aware that interrupts are disabled before the completion
> arrives and any MSI messages should arrive before the last read
> completion.
You are making the assumption that it takes the hardware zero
time to raise msix interrupts, and/or that the hardware logic that
raises the interrupts is tightly coupled with that which generates
the read completions.
The fpga logic we use for raising interrupts reads the mask last,
but then has to do three 'bus' cycles to get the 64bit address
configured and request the pcie write.
Meanwhile the pcie <=> msix table logic will be running asynchronously
so will quite easily complete the readback before the request itself
is sent.
I'm pretty sure there is enough async in the 'write' path that even if
I aborted raising the interrupt if there was a write to the msix table
the read completion could still overtake the write.
> > The pcie bus is also allowed to make a read completion tlp
> > overtake a write tlp.
> > Add in any host-side delays in raising the hardware interrupt
> > itself, and an interrupt could happen well after it was masked.
>
> That is only if relaxed ordering is enabled, which I am not sure is
> supported for MSI-X interrupts.
The PCIE book on my desk is not helpful wrt reordering.
I know reads and read completion can be reordered - I fixed the fpga
logic (from the manufacturer) so that it correctly processed interleaved
read completions.
I don't see anything obvious that would require writes and read completions
to be kept in order.
> That being said though I believe
> there are some platforms that could end up seeing a delay in handling
> the interrupt if for example the CPU the interrupt was delivered to
> was in a sleep state.
>
> > More worrying would be any code that tries to change the address
> > and data associated with an interrupt.
> > You'd need moderate guard times after the disable and before the
> > enable to ensure the hardware didn't raise an interrupt with
> > a mismatch of the old and new values.
> >
> > [1] Maybe I'll look at the order those cycles actually arrive in.
> >
> > David
>
> So I have thrown in my $.02 on this, and added the linux-pci mailing
> list. That is a much better place to bring this up rather than
> netdev.
>
> Thanks.
>
> - Alex
Powered by blists - more mailing lists