Message-ID: <463BA906.30205@roinet.com>
Date: Fri, 04 May 2007 17:43:34 -0400
From: David Acker <dacker@...net.com>
To: Milton Miller <miltonm@....com>
CC: Scott Feldman <sfeldma@...ox.com>, Jeff Garzik <jeff@...zik.org>,
e1000-devel@...ts.sourceforge.net, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
John Ronciak <john.ronciak@...el.com>,
Jesse Brandeburg <jesse.brandeburg@...el.com>,
Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
Auke Kok <auke-jan.h.kok@...el.com>
Subject: Re: [PATCH] e100 rx: or s and el bits
David Acker wrote:
> So far my testing has shown both the original and the new version of the
> S-bit patch work in that no corruption seemed to occur over long term
> runs.
I spoke too soon. Further testing has not gone well. If I use the
default settings for CPU saver and drop the receive pool down to 16
buffers I can cause problems with various forms of the patch. With the
original S-bit patch I can get:
e100: eth0: e100_update_stats: exec cuc_dump_reset failed
At this point sends and receives are failing and their counters are not
changing. When a raw socket opened on the port tries to send it gets
errno 11, Resource temporarily unavailable.
When this error occurred, I added a reset call through the tx timeout:
    schedule_work(&nic->tx_timeout_task);
This made the port come back, but obviously this is somewhat of a kludge
and eventually the entire system hangs or gets an oops from memory
corruption.
Apparently my earlier tests got lucky with a large receive pool size.
The only way the original patch was stable for me was by disabling CPU
saver, and setting the NAPI weight (default of 16) as large as the
default receive pool size (256) so that all buffers were claimed and
reallocated in one NAPI call. None of this should be needed so there is
definitely a problem with the original patch.
The updated patch produced a different issue. We got an RNR interrupt
indicating the receive unit got ahead of the software. The S-bit patch
removed any handling of this issue as it assumed the hardware would spin
on the S-bit. Apparently if both the S-bit and the EL-bit are set on the
same RFD, it follows the EL-bit handling. Printing the stat/ack and
status bytes on the RNR interrupts I get:
status:
    01001000 = 0x48 = RUS of 0010 = No Resources, CUS of 01 = Suspended
stat/ack:
    01010000 = 0x50 = FR, RNR
or
    00010000 = 0x10 = RNR
Notice that the RUS went into No Resources and not suspended. Thus
clearing the S-bit does not wake it up; it needs a new start command.
I could not find documentation that states that the S-bit need only be
cleared to take the RU out of suspended state. Before the S-bit patch
the driver tried to track this need but that version of the driver
didn't work for me either. By the way, I am using, "Intel 8255x 10/100
Mbps Ethernet Controller Family, Open Source Software Developer Manual,
January 2006" as my documentation.
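To make the decoding of those dumps concrete, here is a minimal sketch of how the RNR path could tell the two RU states apart. The bit positions for CUS/RUS and the FR/RNR values come straight from the dumps above; the names, the enum, and the helper function are hypothetical and not taken from the driver, and the Idle/Suspended/Ready encodings are my reading of the manual.

```c
#include <stdint.h>

/* Stat/ack bits seen in the dumps above (0x50 = FR | RNR). */
#define STAT_ACK_FR   0x40  /* frame received */
#define STAT_ACK_RNR  0x10  /* receive unit not ready */

/* Status byte layout implied by the 0x48 dump:
 * CUS in bits 7:6, RUS in bits 5:2. */
#define CUS(status)  (((status) >> 6) & 0x3)
#define RUS(status)  (((status) >> 2) & 0xF)

/* Hypothetical names; 0x2 is from the dump, the rest are assumed. */
enum ru_state {
	RU_IDLE         = 0x0,
	RU_SUSPENDED    = 0x1,
	RU_NO_RESOURCES = 0x2,
	RU_READY        = 0x4,
};

enum ru_action { RU_NONE, RU_RESUME, RU_START };

/* Decide how the RU would have to be restarted after an RNR interrupt. */
static enum ru_action ru_action_after_rnr(uint8_t stat_ack, uint8_t status)
{
	if (!(stat_ack & STAT_ACK_RNR))
		return RU_NONE;

	switch (RUS(status)) {
	case RU_SUSPENDED:
		return RU_RESUME;  /* a resume is enough */
	case RU_NO_RESOURCES:
		return RU_START;   /* needs a new start command */
	default:
		return RU_NONE;
	}
}
```

With the values I printed (stat/ack 0x50, status 0x48) this picks the start path, which is the case the S-bit patch no longer handles.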
This got me looking at just how in the world this worked on the old
eepro100 driver. It had another difference; it did not reap the last rx
buffer in the chain. It set a postponed bit and then picked it up on
the next interrupt after more buffers had been allocated. It then
noticed that the RU was in a suspended or no resources state and did a
softreset.
I don't believe this avoid-the-last-buffer trick really fixes the race.
Imagine the following:
1. 4 buffers in receive pool, all freshly allocated
2. Hardware consumes 3 buffers
3. Software processes 3 buffers, begins to allocate new buffers
4. Hardware writes status bits into buffer 4 while software updates link
and command word bits in buffer 4. They share a cache line and corrupt
each other.
This appears to be possible with any of the versions of this driver I
have seen. The problem is one of packet ownership. Once the driver
gives a list of buffers to hardware, hardware owns them all. The driver
cannot safely change these buffers. Sadly, this means that the idea of
the driver "staying ahead" of the hardware such that the hardware never
runs out of resources will not work here. Once the driver gives the
hardware a packet with S or EL bits set, it must let the hardware
encounter the packet and return it to software.
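As a rough illustration of why step 4 above is dangerous, the sketch below lays out an RFD header and checks which cache line each field lands in. The struct mirrors my understanding of the driver's RFD layout, and the 32-byte line size is an assumption for the PXA255; CACHE_LINE, line_of() and rfd_fields_share_line() are hypothetical helpers, not driver code.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32  /* assumed cache line size in bytes */

/* Assumed RFD header layout: hardware and driver fields are interleaved. */
struct rfd {
	uint16_t status;       /* written by hardware on completion */
	uint16_t command;      /* EL and S bits, written by the driver */
	uint32_t link;         /* next RFD, written by the driver */
	uint32_t rbd;
	uint16_t actual_size;  /* written by hardware */
	uint16_t size;
};

/* Cache line index of a field at 'offset' in an RFD placed at 'base'. */
static size_t line_of(size_t base, size_t offset)
{
	return (base + offset) / CACHE_LINE;
}

/* Nonzero if the hardware-written status word starts in the same cache
 * line as the driver-written command and link words. */
static int rfd_fields_share_line(size_t base)
{
	size_t s = line_of(base, offsetof(struct rfd, status));

	return s == line_of(base, offsetof(struct rfd, command)) &&
	       s == line_of(base, offsetof(struct rfd, link));
}
```

For an RFD that starts on a cache line boundary the check returns nonzero, so a driver write-back of the command/link words and a hardware write of the status word hit the same line.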
I think the driver needs to protect the last entry in the ring by
putting the S-bit on the entry before it. The first time the driver
allocates a block of packets, it writes a new S-bit out on the next to
last packet. As buffers complete it allocates more packets in the chain
but does not set a new S-bit since the old one will stop hardware. It
cannot clear the old S-bit because the driver does not own the buffer,
hardware does. After processing the S-bit packet the hardware will
interrupt with a stat/ack of RNR and RUS of suspended.
When software processes a packet with an old S-bit, it allocates new
buffers and sets the S-bit on the new next-to-last packet.
The above case changes now:
1. 4 buffers numbered 1-4 in a receive pool, all freshly allocated.
S-bit is on buffer 3.
2. Hardware consumes 3 buffers, hits S-bit, RNR interrupts
3. Software processes 3 buffers, begins to allocate new buffers
4. Software sends resume once buffers are allocated, S-bit is on buffer 2.
5. Hardware gets resume. When it processed buffer 3, it saved the link
to buffer 4 and thus resumes at buffer 4.
Here is a different flow where the software stays ahead:
1. 4 buffers numbered 1-4 in a receive pool, all freshly allocated.
S-bit is on buffer 3.
2. Hardware consumes 2 buffers (1, 2).
3. Software processes buffers 1, 2, begins to allocate new buffers
4. Software finishes allocating replacements for buffers 1, 2
5. Hardware consumes 1 buffer (#3) and hits S-bit, RNR interrupts.
6. Software consumes 1 buffer (#3) and finds the old S-bit. It
allocates a new buffer 3 and sets the S-bit on buffer 2.
7. Software sends resume, hardware continues at buffer 4.
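The two flows above can be sketched as a small user-space model. This is only a toy under my own assumptions, not driver code: struct slot, struct sim, hw_step(), sw_step() and run() are hypothetical names, "hardware" and "software" take turns in fixed bursts, and a resume is issued as soon as the old S-bit buffer is reaped. The asserts encode the two properties I care about: hardware never reaches a buffer software has not replaced, and software only moves the S-bit while the RU is suspended.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 4

struct slot {
	bool done;   /* hardware has written status into this buffer */
	bool s_bit;  /* suspend bit in the command word */
};

struct sim {
	struct slot ring[RING_SIZE];
	int hw;          /* next slot hardware will fill */
	int sw;          /* next slot software will reap */
	bool suspended;  /* RU stopped after an S-bit buffer */
	int packets;     /* packets received so far */
	int resumes;     /* resume commands issued by software */
};

static void sim_init(struct sim *s)
{
	*s = (struct sim){0};
	s->ring[RING_SIZE - 2].s_bit = true;  /* next-to-last buffer */
}

/* Hardware receives one packet, if it is allowed to run. */
static bool hw_step(struct sim *s)
{
	struct slot *b = &s->ring[s->hw];

	if (s->suspended)
		return false;
	assert(!b->done);  /* the S-bit must stop hardware before this */
	b->done = true;
	s->packets++;
	s->hw = (s->hw + 1) % RING_SIZE;  /* link is saved before suspending */
	if (b->s_bit)
		s->suspended = true;  /* RNR interrupt, RUS = suspended */
	return true;
}

/* Software reaps one completed buffer and replaces it with a fresh one. */
static bool sw_step(struct sim *s)
{
	struct slot *b = &s->ring[s->sw];
	bool old_s_bit;

	if (!b->done)
		return false;
	old_s_bit = b->s_bit;
	b->done = false;   /* fresh buffer, S-bit clear */
	b->s_bit = false;
	if (old_s_bit) {
		assert(s->suspended);  /* hardware is parked, ring is ours */
		s->ring[(s->sw + RING_SIZE - 1) % RING_SIZE].s_bit = true;
		s->suspended = false;
		s->resumes++;
	}
	s->sw = (s->sw + 1) % RING_SIZE;
	return true;
}

/* Alternate hw_burst hardware steps with sw_burst software steps. */
static void run(struct sim *s, int hw_burst, int sw_burst, int rounds)
{
	sim_init(s);
	for (int r = 0; r < rounds; r++) {
		for (int i = 0; i < hw_burst; i++)
			hw_step(s);
		for (int i = 0; i < sw_burst; i++)
			sw_step(s);
	}
}
```

Calling run(&s, 8, 1, 100) models hardware outrunning software, and run(&s, 1, 8, 100) models software staying ahead. In this model the S-bit moves back one slot per cycle, so a resume is needed after every RING_SIZE - 1 packets, which is roughly once per ring.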
In this setup, software will send a resume command every RING_SIZE
packets. RNR interrupts will also occur every RING_SIZE packets. When
hardware is faster than software, it will process RING_SIZE packets, RNR
interrupt and wait for software to process all of them. When software
is faster than hardware, hardware will still process RING_SIZE packets
before interrupting but software will only need to allocate 1 packet or
so before sending the resume so hardware will wait much less time.
This will probably slow things down since on a fast CPU, software will
normally stay ahead of the hardware and the only PCI operations from the
driver would be interrupt acks. With this change, we have PCI
operations every 256 packets. I don't see how else to do this in a safe
way on ARM (at least PXA255).
I am testing this over the weekend with a 16-buffer receive pool. If
all goes well, I will send a patch early next week. It will basically
back out the S-bit patch and then make the changes noted above.
-Ack