linux-kernel - Re: [PATCH RFC v2] net: add PCINet driver

Message-Id: <200811051451.00065.arnd@arndb.de>
Date:	Wed, 5 Nov 2008 14:50:59 +0100
From:	Arnd Bergmann <arnd@...db.de>
To:	Ira Snyder <iws@...o.caltech.edu>
Cc:	linuxppc-dev@...abs.org, Stephen Hemminger <shemminger@...tta.com>,
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
	Jan-Bernd Themann <THEMANN@...ibm.com>
Subject: Re: [PATCH RFC v2] net: add PCINet driver

On Tuesday 04 November 2008, Ira Snyder wrote:
> On Tue, Nov 04, 2008 at 09:23:03PM +0100, Arnd Bergmann wrote:
> > On Tuesday 04 November 2008, Ira Snyder wrote:
> > > I don't really know how to do that. I got a warning here from sparse
> > > telling me something about expensive pointer subtraction. Adding a dummy
> > > 32bit padding variable got rid of the warning, but I didn't change the
> > > driver.
> > 
> > Ok, I see. However, adding the packed attribute makes it more expensive
> > to use.
> > 
> 
> Ok. Is there any way to make sure that the structure compiles to the
> same representation on the host and agent system without using packed?

Only knowledge about the alignment on all the possible architectures ;-)
As a simplified rule, always pad every struct member to the largest
other member in the struct and always use explicitly sized types like
__u8 or __le32.
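
As a rough illustration of that rule (the descriptor below is hypothetical,
not the PCINet layout, and uint8_t/uint32_t stand in for __u8/__le32 so the
sketch builds in user space), explicit padding plus compile-time checks can
pin the layout down without the packed attribute:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical shared descriptor. In kernel code the members would be
 * declared as __le32 / __u8; the fixed-width stdint types stand in here.
 */
struct demo_desc {
	uint32_t addr;    /* would be __le32 */
	uint32_t len;     /* would be __le32 */
	uint8_t  flags;   /* would be __u8 */
	uint8_t  pad[3];  /* explicit padding up to the 32-bit member size */
};

/* Fail the build if a compiler lays the struct out differently. */
static_assert(offsetof(struct demo_desc, addr) == 0, "addr offset");
static_assert(offsetof(struct demo_desc, len) == 4, "len offset");
static_assert(offsetof(struct demo_desc, flags) == 8, "flags offset");
static_assert(sizeof(struct demo_desc) == 12, "descriptor size");

/* Helper so the size can also be checked at run time. */
size_t demo_desc_size(void)
{
	return sizeof(struct demo_desc);
}
```

Because every member is naturally aligned and the tail padding is spelled
out, host and agent should agree on the layout; the assertions catch the
case where they do not.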
 
> Hopefully that's a good description. :) It seems to me that both sides
> of the connection need to read the descriptors (to get packet length,
> clean up dirty packets, etc.) and write them (to set packet length, mark
> packets dirty, etc.) I just can't come up with something that is
> local-read / remote-write only.

If I understand your description correctly, the only remote read is
when the host accesses the buffer descriptors to find free space.
Avoiding this read access may improve the latency a bit. In our ring
buffer concept, both host and endpoint allocate a memory buffer that
gets ioremapped into the remote side. Since you always need to read
the descriptors from powerpc, you should probably keep them in powerpc
memory, but you can change the code so that for finding the next
free entry, the host will look in its own memory for the number of the
next entry, and the powerpc side will write that when it consumes a
descriptor to mark it as free.
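
A minimal user-space model of that idea, assuming a simple counter scheme
(the names RING_SIZE, host_ring_state and the functions are made up for
illustration; in the real driver the "consumed" store would come from the
powerpc side across PCI):

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8u  /* hypothetical ring size, power of two */

/* Lives in host memory, so the host never reads it across PCI. */
struct host_ring_state {
	uint32_t produced;           /* host-private: descriptors handed out */
	volatile uint32_t consumed;  /* written remotely by the endpoint */
};

/* Host side: purely local reads to find out how much room is left. */
uint32_t ring_free_slots(const struct host_ring_state *s)
{
	return RING_SIZE - (s->produced - s->consumed);
}

/* Host side: claim the next descriptor index, or -1 if the ring is full. */
int ring_claim(struct host_ring_state *s)
{
	if (ring_free_slots(s) == 0)
		return -1;
	return (int)(s->produced++ % RING_SIZE);
}

/* Endpoint side (modelled locally): report one descriptor as free again. */
void ring_consume(struct host_ring_state *s)
{
	assert(s->consumed != s->produced);  /* nothing outstanding otherwise */
	s->consumed = s->consumed + 1;
}
```

The descriptors themselves stay in powerpc memory; only the single counter
is pushed to the host, which turns the host's remote read into a remote
write from the other side.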

> > Which side allocates them anyway? Since you use ioread32/iowrite32
> > on the ppc side, it looks like they are on the PCI host, which does
> > not seem to make much sense, because the ppc memory is much closer
> > to the DMA engine?
> > 
> 
> The PowerPC allocates them. They are accessible via PCI BAR1. They live
> in regular RAM on the PowerPC. I can't remember why I used
> ioread32/iowrite32 anymore. I'll try again with in_le32()/out_le32() on
> the PowerPC system, and see what happens.

Actually, if they are in powerpc RAM, you must use neither in_le32 nor
ioread32. Both are only well-defined on I/O devices (local bus or PCI,
respectively). Instead, you should directly access the buffer using
pointer dereferences, and use rmb()/wmb() to make sure anything you
access is synchronized with the host.

> > Obviously, you want the DMA engine to do the data transfers, but here, you
> > use ioread32 for mmio transfers to the descriptors, which is slow.
> > 
> 
> I didn't know it was slow :) Maybe this is why I had to make the MTU
> very large to get good speed. Using a standard 1500 byte MTU I get
> <10 MB/sec transfer speed. Using a 64K MTU, I get ~45MB/sec transfer
> speed.
> 
> Do I need to do any sort of flushing to make sure that the read has
> actually gone out of cache and into memory? When the host accesses the
> buffer descriptors over PCI, it can only view memory. If a write is
> still in the PowerPC cache, the host will get stale data.

The access over the bus is cache-coherent, unless you are on one of the
more obscure powerpc implementations. This means you do not have a
problem with data still being in cache. However, you need to make
sure that data arrives in the right order. DMA read accesses over PCI
may be reordered, and you need a wmb() between two memory stores if you
want to be sure that the host sees them in the correct order.
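
The store ordering can be sketched in user space as below. This is only a
model: demo_wmb()/demo_rmb() are stand-ins built on C11 fences, whereas
kernel code would call wmb()/rmb() directly, and the descriptor layout and
the DESC_READY value are invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel barriers (assumption of this sketch). */
#define demo_wmb() atomic_thread_fence(memory_order_release)
#define demo_rmb() atomic_thread_fence(memory_order_acquire)

#define DESC_READY 1u  /* hypothetical flag value */

/* Descriptor in ordinary RAM, accessed through plain pointers. */
struct shared_desc {
	volatile uint32_t len;
	volatile uint32_t flags;
};

/* Writer: fill in the payload field first, then publish the flag. */
void desc_publish(struct shared_desc *d, uint32_t len)
{
	d->len = len;
	demo_wmb();            /* len must be visible before flags */
	d->flags = DESC_READY;
}

/* Reader: check the flag first, then read the payload field. */
int desc_poll(struct shared_desc *d, uint32_t *len)
{
	assert(len != NULL);
	if (d->flags != DESC_READY)
		return 0;
	demo_rmb();            /* do not read len ahead of flags */
	*len = d->len;
	return 1;
}
```

Without the barrier between the two stores, the other side could observe
the ready flag while the length field still holds an old value.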

> > > Yep, I tried to do this. I couldn't figure out a sane ordering that
> > > would work. I tried to keep the network and uart as separate as possible
> > > in the code.
> > 
> > I'd suggest splitting the uart code into a separate driver then.
> > 
> 
> How? In Linux we can only have one driver for a certain set of hardware.
> I use the messaging unit to do both network (interrupts and status bits)
> and uart (interrupts and message transfer).
> 
> Both the network and uart _must_ run at the same time. This way I can
> type into the bootloader prompt to start a network transfer, and watch
> it complete.
> 
> Remember, I can't have a real serial console plugged into this board.
> I'll be using this with about 150 boards in 8 separate chassis, which
> makes cabling a nightmare. I'm trying to do as much as possible with the
> PCI backplane.
 
When splitting out the hardware specific parts, I would write a device
driver for the messaging unit that knows about neither the uart nor the
network (or any other high-level protocol). It's a bit more complicated to
load the two high-level drivers in that case, but one clean way to do
it would be to instantiate a new bus-type from the MU driver and have
that driver register devices for itself. Then you can load the high-level
drivers through udev or have them built into the kernel.
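
The layering can be modelled in plain C as follows. This is a user-space
sketch of the separation only, not the kernel bus API: mu_core,
mu_register_client and mu_dispatch are hypothetical names, and a real
implementation would register a bus type and devices with the driver core
instead of filling in a table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MU_MAX_CLIENTS 4  /* hypothetical number of message channels */

typedef void (*mu_handler_t)(uint32_t msg, void *priv);

struct mu_client {
	mu_handler_t handler;
	void *priv;
};

/* The MU core knows only about channels, not about uart or network. */
struct mu_core {
	struct mu_client clients[MU_MAX_CLIENTS];
};

/* A high-level driver attaches itself to one channel. */
int mu_register_client(struct mu_core *mu, unsigned int id,
		       mu_handler_t handler, void *priv)
{
	assert(mu != NULL);
	if (id >= MU_MAX_CLIENTS || handler == NULL ||
	    mu->clients[id].handler != NULL)
		return -1;
	mu->clients[id].handler = handler;
	mu->clients[id].priv = priv;
	return 0;
}

/* Called by the core when a message arrives for a channel. */
int mu_dispatch(struct mu_core *mu, unsigned int id, uint32_t msg)
{
	assert(mu != NULL);
	if (id >= MU_MAX_CLIENTS || mu->clients[id].handler == NULL)
		return -1;
	mu->clients[id].handler(msg, mu->clients[id].priv);
	return 0;
}

/* Two example clients: a uart that stores a character, a net irq counter. */
void uart_rx(uint32_t msg, void *priv) { *(uint32_t *)priv = msg; }
void net_irq(uint32_t msg, void *priv) { *(uint32_t *)priv += msg; }
```

Both clients can be attached at the same time, which matches the
requirement that network and uart run concurrently on one messaging unit.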

To get really fancy, you could find a way for the host to announce what
protocols are supported through the MU. A use case for that, which I
have been thinking about before, would be to allow the host to set up
direct virtual point-to-point networks between two endpoints, not involving
the host at all once the device is up.

	Arnd <><