Message-ID: <11dfbc12d3b8451aad1226d185d44228@AcuMS.aculab.com>
Date: Mon, 6 Nov 2023 08:56:38 +0000
From: David Laight <David.Laight@...LAB.COM>
To: David Laight <David.Laight@...LAB.COM>,
'David Epping' <david.epping@...singlinkelectronics.com>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
Dinh Nguyen <dinguyen@...nel.org>,
Ley Foon Tan <ley.foon.tan@...el.com>,
Lorenzo Pieralisi <lorenzo.pieralisi@....com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Krzysztof Wilczyński <kw@...ux.com>
Subject: RE: mach-socfpga: PCIe Root IO TLP support for Cyclone V
From: David Laight
> Sent: 05 November 2023 11:20
>
> From: David Epping
> > Sent: 31 October 2023 10:58
> >
> > Hello ARM PCIe and especially Intel Altera SOCFPGA maintainers,
> >
> > the Intel Altera Cyclone V PCIe Root Complex drivers afaik currently
> > don’t support sending IO TLPs.
> > The Root Complex IP Core, seemingly unlike many other ARM Root Complexes,
>
> It isn't an ARM root complex ...
> I didn't think any of the Cyclone V had embedded arm cpu.
> I know some of the more recent Altera FPGA do, but the Cyclone V
> is pretty old now - although we are still using them in new cards.
> (Only as PCIe endpoints though.)
>
> > does not offer a memory mapping for the IO address space, but instead relies
> > on indirect addressing via address and data registers.
>
> If you are building the FPGA image then all the logic to convert the
> memory mapped slave cycles (into the fpga logic) is supplied as
> verilog source.
> So you should be able to 'fix' it to generate IO TLP instead of data
> TLP for certain addresses.
> (A few years back we had to fix it to correctly process multiple
> data TLP in response to a single read TLP - not a problem now.)

Another idea would be to write an Avalon slave that converts
a single read/write into the required sequence of transfers into the
Cyclone V PCIe block to generate the IO TLP from a cpu memory access.
That isn't hard to write.
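
For reference, the header such a slave has to produce differs from a
32-bit memory TLP mainly in the Fmt/Type byte, plus a few fixed
constraints (length of one DWORD, last-DW byte enables of zero).
A rough sketch in Python, not taken from any driver or from the
Altera IP: io_tlp_header() and its arguments are hypothetical names,
the default requester ID and tag are placeholders, and the Fmt/Type
values are as I read them from the PCIe base specification.

```python
# Fmt/Type byte values for 3-DWORD headers (PCIe base specification).
FMT_TYPE_MEM_RD32 = 0x00  # shown only for comparison
FMT_TYPE_MEM_WR32 = 0x40  # shown only for comparison
FMT_TYPE_IO_RD = 0x02
FMT_TYPE_IO_WR = 0x42


def io_tlp_header(write, addr, size, requester_id=0, tag=0):
    """Return the three header DWORDs of an IO read or write TLP.

    Hypothetical helper for illustration only.
    """
    offset = addr & 3
    if size not in (1, 2, 4) or offset + size > 4:
        raise ValueError("IO access must fit within one aligned DWORD")
    # One byte-enable bit per byte touched inside the DWORD.
    first_be = ((1 << size) - 1) << offset
    fmt_type = FMT_TYPE_IO_WR if write else FMT_TYPE_IO_RD
    dw0 = (fmt_type << 24) | 1  # length = 1 DWORD, TC and attributes 0
    dw1 = (requester_id << 16) | (tag << 8) | first_be  # last-DW BE = 0
    dw2 = addr & ~3 & 0xFFFFFFFF  # DWORD-aligned IO address
    return dw0, dw1, dw2


# One-byte IO write to port 0x3F9: byte lane 1 of the DWORD at 0x3F8.
print([hex(dw) for dw in io_tlp_header(True, 0x3F9, 1)])
```

A write TLP carries one data DWORD after these three header words;
a read carries none and the data comes back in a completion TLP.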
It would also let you implement posted writes and asynchronous reads.
Although the drivers won't expect async reads, PCIe is slow
enough that they really do make sense.
In my measurements a Cyclone V endpoint (the root will be much the
same) takes about 128 clocks (of the 125MHz PCIe clock) to process
an incoming read TLP.
This stalls a 3GHz host for about 3000 clocks.
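
The arithmetic behind those two figures, as a quick check. The
function names are made up for this sketch; the only inputs are the
128 clocks, the 125MHz PCIe clock and the 3GHz host clock quoted
above.

```python
def latency_ns(clocks, clock_hz):
    """Time taken by `clocks` cycles of a `clock_hz` clock, in ns."""
    return clocks * 1_000_000_000 // clock_hz


def host_stall_clocks(clocks, pcie_clock_hz, host_clock_hz):
    """Host clock cycles that elapse during `clocks` PCIe clock cycles."""
    return clocks * host_clock_hz // pcie_clock_hz


print(latency_ns(128, 125_000_000))                        # 1024
print(host_stall_clocks(128, 125_000_000, 3_000_000_000))  # 3072
```

So a single read costs roughly a microsecond in the endpoint alone,
before any link or root complex latency is added.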
IIRC the time for an outgoing read is much the same. The local
cpu will be slower (ours are 62.5MHz Nios) but the stall is still
significant.
Most of the PCIe transfers we do are from a locally written
multi-channel DMA block that generates 128-byte TLPs.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)