[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20131008145331.GA4767@order.stressinduktion.org>
Date: Tue, 8 Oct 2013 16:53:31 +0200
From: Hannes Frederic Sowa <hannes@...essinduktion.org>
To: Jon Mason <jdmason@...zu.us>
Cc: Jiri Pirko <jiri@...nulli.us>, netdev <netdev@...r.kernel.org>,
yoshfuji@...ux-ipv6.org, David Miller <davem@...emloft.net>,
kuznet@....inr.ac.ru, jmorris@...ei.org, kaber@...sh.net,
herbert@...dor.apana.org.au, Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO]
On Tue, Oct 08, 2013 at 01:07:29AM -0700, Jon Mason wrote:
> On Wed, Oct 2, 2013 at 9:27 AM, Hannes Frederic Sowa
> <hannes@...essinduktion.org> wrote:
> > Hi!
> >
> > I have a question regarding UFO and the neterion driver, which as the only one
> > advertises hardware UFO support:
> >
> > The patch discusses in this thread
> > http://thread.gmane.org/gmane.linux.network/284348/focus=285405 could change
> > some semantics how packets are constructed before submitted to the driver.
> >
> > We currently guarantee that we have the MAC/IP/UDP header in skb->data and the
> > payload is attached in the skb's frags. With the changes discussed in this
> > thread it is possible that we also append to skb->data some amount of data
> > which is not targeted for the header. From reading the driver sources it seems
> > the hardware interprets the skb->data to skb_headlen as the header, so we
> > could include some data in the fragments more than once.
>
> From my reading of the HW Spec and a quick look at the driver, it
> appears that the driver is using one entry in the TX ring for the
> header and another for the body of the packet to be fragmented (which
> is what the hardware wants). I don't understand what you are saying,
> but if you are asking if simply appending a new header & data to the
> end of skb->data will get it out on the wire correct, I don't believe
> it will.
No this is not what I tried to say. I'll try to be more clear this
time. ;)
We start with an UDP socket which is corked. As soon as we write the
first few bytes (smaller than the mtu) onto this socket we put the
header in place and the rest of the data is just appended behind the
header directly in skb->data via plain ip_append_data.
Now a second write with a length > mtu happens: The ip(6)_append_data
will branch to ufo_append. This will fetch the first skb and append
to skb->frags. gso_type and gso_size will be updated on this skb (this
currently does not happen but will with the patches discussed in this
thread).
If this packet is transmitted down to the device driver we have the udp
header in skb->data *and* also the payload from the first write. The
payload from the second write is appended as a frag and gso_type and
gso_size are set. This header+payload seem to be mapped just after the
ufo_in_band_v descriptor as the header in the first tx descriptor:
4174 txdp->Buffer_Pointer = pci_map_single(sp->pdev, skb->data,
4175 frg_len, PCI_DMA_TODEVICE);
frg_len is set to skb_headlen(skb). This happens right after setting up
the descriptor for the in-band ufo data.
My guess is that this data isn't split currently by the neterion driver
(at least I could not find it in the driver as Eric showed it for bnx2x)
so it might reappear in the packets when the hardware fragments the
packet and places the first tx ring in front of every packet.
Before these changes we never updated the gso_type and gso_size even when
we did append via UFO. So we never had payload in an UFO marked skb->data,
only the headers. Now we could also end up with a some payload in the
first TX ring, which you said is only for the header.
> I do have hardware that I can try the patch on, if you can walk me
> through the use case (unless it is as easy as setup an IPv6 connection
> and ping).
Ok, testing this should not be that complicated:
We can test this with plain IPv4/UDP sockets. I would suggest a net-next kernel
with this patch from Jiri applied: http://patchwork.ozlabs.org/patch/279691/
--- >8 ---
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <linux/udp.h>
#include <stdio.h>
int test(int mtu)
{
int fd;
const int one = 1;
const int off = 0;
struct sockaddr_in addr = {.sin_family = AF_INET, .sin_port = htons(53) };
unsigned char buffer[3701];
inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
fd = socket(AF_INET, SOCK_DGRAM, 0);
connect(fd, (struct sockaddr *) &addr, sizeof(addr));
setsockopt(fd, IPPROTO_UDP, UDP_CORK, &one, sizeof(one));
write(fd, " ", 4);
write(fd, buffer, sizeof(buffer));
write(fd, " ", 1);
setsockopt(fd, IPPROTO_UDP, UDP_CORK, &off, sizeof(off));
close(fd);
}
int main() {
test(1280);
}
--- >8 ---
I left out error handling so it is better observed with strace if
something went wrong.
You should change the port number and ip address to something reasonable
for your network. My guess would be that the spaces (0x20) of the first
write is now placed between UDP header and payload of every packet
fragmented by the hardware. Would be nice to hear that I am wrong. ;)
Be aware that the above program can cause memory corruption in the kernel
if you did not apply Jiri's patch.
Thanks for helping!
Hannes
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists