lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Fri, 01 Jun 2007 17:13:34 -0400 From: Jeff Garzik <jgarzik@...ox.com> To: David Acker <dacker@...net.com> CC: Milton Miller <miltonm@....com>, "Kok, Auke" <auke-jan.h.kok@...el.com>, e1000-devel@...ts.sourceforge.net, Jeff Kirsher <jeffrey.t.kirsher@...el.com>, John Ronciak <john.ronciak@...el.com>, Jesse Brandeburg <jesse.brandeburg@...el.com>, Scott Feldman <sfeldma@...ox.com>, netdev@...r.kernel.org Subject: Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits) David Acker wrote: > Milton Miller wrote: >> Your logic works, this adds some conditional branching but saves a >> pointer, not sure which is better. Mine can be used for initializing >> without special casing since it will just calculate rx_to_unmark as >> rx[n-2] and rx_to_mark as rx[n-2] != rx[n-2]->prev; unmarking a never >> marked still works, where as for yours the "was marked" must be >> explicitly set to something (hmm, rxs = rx[0] might work for that >> initial value until we mark -2??) >> >> again, the compare of rx->el instead of rx->rfd->el is to save cache >> traffic to the rfd under io. The rx->was_0 is required, so the el >> flag is free. >> > > Ok, I took a stab at coding and testing these ideas. Below is a patch > against 2.6.22-rc3. > Let me know what you think. Testing shows that we can hit any of the > following scenarios: > > Find a buffer not complete with rx->el and rx->s0 set. I read back the > status and see if > the receiver is still running. > Find a buffer not complete with rx->el not set and rx->s0 set. I check > the next buffer > and if it is complete, I free the skb and return 0. If the next buffer > is not > complete, I read the receiver's status to see if he is still running. > Find a buffer that is complete with rx->el not set and rx->s0 set. It > appears that hardware > can read the rfd's el-bit, then software can clear the rfd el-bit and > set the rfd size, and > then hardware can come in and read the size. I am reading the status > back, although > I don't think that I have to in this instance. > > I am testing a version of this code patched against 2.6.18.4 on my PXA > 255 based system. I will let you all know how it goes. > > On the ARM, their is a race condition between software allocating a new > receive > buffer and hardware writing into a buffer. The two race on touching the > last > Receive Frame Descriptor (RFD). It has its el-bit set and its next link > equal > to 0. When hardware encounters this buffer it attempts to write data to it > and then update Status Word bits and Actual Count in the RFD. At the > same time > software may try to clear the el-bit and set the link address to a new > buffer. > Since the entire RFD is once cache-line, the two write operations can > collide. This can lead to the receive unit stalling or freed receive > buffers > getting written to. > > The fix is to set the el-bit on and the size to 0 on the next to last > buffer > in the chain. When the hardware encounters this buffer it stops and does > not write to it at all. The hardware issues an RNR interrupt with the > receive unit in the No Resources state. When software allocates buffers, > it can update the tail of the list when either it knows the hardware > has stopped or the previous to the new one to mark marked. > Once it has a new next to last buffer prepared, it can clear the el-bit > and set the size on the previous one. The race on this buffer is safe > since the link already points to a valid next buffer. We keep flags > in our software descriptor to note if the el bit is set and if the size > was 0. When we clear the RFD's el bit and set its size, we also clear > the el flag but we leave the size was 0 bit set. This was we can find > this buffer again later. > > If the hardware sees the el-bit cleared without the size set, it will > move on to the next buffer and skip this one. If it sees > the size set but the el-bit still set, it will complete that buffer > and then RNR interrupt and wait. > > > Signed-off-by: David Acker <dacker@...net.com> That seems to vaguely match my memory of what eepro100 was doing (or trying to do). I _really_ appreciate you working on this problem. Getting e100 driver stable for the long term, and ditching eepro100, is a big hurdle to cross. Getting this right is really one of the last steps. The patch looks OK at quick glance. Thanks, Jeff - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists