Message-Id: <1204898997.4220.41.camel@moonstone.uk.level5networks.com>
Date: Fri, 07 Mar 2008 14:09:57 +0000
From: Kieran Mansley <kmansley@...arflare.com>
To: netdev@...r.kernel.org
Subject: LRO/GSO interaction when packets are forwarded
We've seen a couple of problems when using a bridge or IP forwarding
combined with LRO packets generated by a network device driver. As you
know, LRO packets can be either page based (and passed up with
lro_receive_page()) or use the skb frag_list (and passed up with
lro_receive_skb()). In both cases it is likely that the device driver
will have set CHECKSUM_UNNECESSARY to indicate that the packet has been
checksummed by the device, and gso_size to mark it as an LRO packet and
indicate the actual received MSS.
If this skb goes directly to the network stack everything is fine. The
problem comes when this packet instead goes into a bridge and is then
retransmitted on another device. The skb seems to pass through the
bridge relatively unmodified and because it has gso_size set the
transmit path will attempt to segment it. If page-based allocation has
been used, this is fine, but if the skb frag_list has been used the
transmit path BUGs in skb_gso_segment():
http://lxr.linux.no/linux+v2.6.24.3/net/core/dev.c#L1410
Secondly, the same function hopes that a GSO packet will have
CHECKSUM_PARTIAL set - if this packet had originated from a stack rather
than from an LRO device this would be the case - but instead it will
most likely have CHECKSUM_UNNECESSARY.
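
To make the two checks concrete, here is a rough userspace model of them.
It is only a sketch, not kernel code: struct model_skb, the MODEL_*
constants and model_tx_verdict() are hypothetical names invented for
illustration, and the rule "gso_size set means the transmit path will
segment" ignores device features, which is an assumption.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the ip_summed values discussed above. */
enum model_csum {
    MODEL_CSUM_NONE,
    MODEL_CSUM_UNNECESSARY, /* receive side: device already verified it */
    MODEL_CSUM_PARTIAL      /* transmit side: checksum still to be completed */
};

/* Only the three pieces of skb state this message talks about. */
struct model_skb {
    unsigned int gso_size;  /* non-zero marks an LRO/GSO packet */
    bool has_frag_list;     /* true if built via the skb frag_list */
    enum model_csum ip_summed;
};

enum model_verdict {
    MODEL_NOT_GSO, /* gso_size is zero, no segmentation attempted */
    MODEL_OK,      /* segmentation proceeds quietly */
    MODEL_WARN,    /* ip_summed is not CHECKSUM_PARTIAL */
    MODEL_BUG      /* frag_list in use */
};

/* Mirrors the order of the checks as described: frag_list first,
 * then the checksum state. Assumption: any packet with gso_size set
 * reaches the segmentation code on transmit. */
static enum model_verdict model_tx_verdict(const struct model_skb *skb)
{
    if (skb->gso_size == 0)
        return MODEL_NOT_GSO;
    if (skb->has_frag_list)
        return MODEL_BUG;
    if (skb->ip_summed != MODEL_CSUM_PARTIAL)
        return MODEL_WARN;
    return MODEL_OK;
}
```

In this model an LRO packet built with the frag_list gives MODEL_BUG, a
page-based LRO packet gives MODEL_WARN, and a GSO packet that came from a
stack gives MODEL_OK.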
Both of these problems are essentially caused by gso_size and the
ip_summed field having slightly different meanings on the receive and
transmit paths, and the bridge/IP forwarding code not translating from
one to the other. To be fair to the bridge, it would not be obvious to
it whether it will be passing the packet to a real device (that will
invoke the transmit path) or to a stack.
This leads me to my questions:
- any idea why other drivers aren't hitting this problem? One
possibility is that they're using lro_receive_page rather than
lro_receive_skb, but I'd still expect to see the CHECKSUM_PARTIAL
warning. I'm wondering if having LRO and forwarding between devices is
a relatively rare thing, and so it just hasn't been tested.
- any suggestion as to the best place to try and fix this up? My
preference is making the transmit path cope with a packet that has the
frag_list in use. Making it cope with CHECKSUM_UNNECESSARY should also
be possible but to be honest I'm finding skb_gso_segment's handling of
CHECKSUM_PARTIAL a bit hard to follow. The alternative, I suppose,
would be to get the bridge and IP forwarding code to fix the socket
buffer up before transmitting it, or for the driver to somehow know that
this packet will be forwarded and so it shouldn't use LRO.
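
As a toy illustration of the last two alternatives (forwarding code fixing
the buffer up, or the driver avoiding LRO), here is a userspace sketch. All
names in it are hypothetical, and it only records the intended state
change: in a real kernel, flattening a frag_list and preparing a packet for
CHECKSUM_PARTIAL would take real work on the buffer and its checksum
fields, none of which is modelled here.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the ip_summed values discussed above. */
enum model_csum {
    MODEL_CSUM_NONE,
    MODEL_CSUM_UNNECESSARY,
    MODEL_CSUM_PARTIAL
};

/* Only the skb state relevant to this discussion. */
struct model_skb {
    unsigned int gso_size;
    bool has_frag_list;
    enum model_csum ip_summed;
};

/* Forwarding-side fix-up: translate receive-side (LRO) metadata into
 * what the transmit path expects. Returns true if anything changed.
 * Assumption: the received MSS in gso_size can be reused unchanged. */
static bool model_fixup_for_forward(struct model_skb *skb)
{
    bool changed = false;

    if (skb->gso_size == 0)
        return false;                 /* not an LRO packet, leave it alone */
    if (skb->has_frag_list) {
        skb->has_frag_list = false;   /* stands in for flattening the list */
        changed = true;
    }
    if (skb->ip_summed != MODEL_CSUM_PARTIAL) {
        skb->ip_summed = MODEL_CSUM_PARTIAL;
        changed = true;
    }
    return changed;
}

/* Driver-side alternative: only aggregate when received packets cannot
 * end up on another device's transmit path. How a driver would learn
 * this is exactly the open question. */
static bool model_driver_may_use_lro(bool rx_may_be_forwarded)
{
    return !rx_may_be_forwarded;
}
```

The fix-up is idempotent in this model: a second call on the same buffer
changes nothing and returns false.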
Of course, if we're hitting this because we're doing something wrong and
you're confident it's not a problem in Linux, I'd be grateful to know!
Here's a stack trace showing the path a packet that hits this might
take:
[<c0106831>] die+0x111/0x210
[<c0106d67>] do_trap+0x97/0xf0
[<c0107149>] do_invalid_op+0x89/0xa0
[<c033c2fa>] error_code+0x72/0x78
[<c02d41de>] dev_hard_start_xmit+0x1ae/0x2c0
[<c02e276f>] __qdisc_run+0x4f/0x1d0
[<c02d45c1>] dev_queue_xmit+0x2d1/0x350
[<f8ae4054>] br_dev_queue_push_xmit+0x64/0xb0 [bridge]
[<f8ae8bd3>] br_nf_dev_queue_xmit+0x13/0x40 [bridge]
[<f8ae90b0>] br_nf_post_routing+0x1b0/0x1f0 [bridge]
[<c02e724b>] nf_iterate+0x5b/0x90
[<c02e72ca>] nf_hook_slow+0x4a/0xc0
[<f8ae41b6>] br_forward_finish+0x46/0x60 [bridge]
[<f8ae9317>] br_nf_forward_finish+0xc7/0x160 [bridge]
[<f8ae98e7>] br_nf_forward_ip+0x137/0x1b0 [bridge]
[<c02e724b>] nf_iterate+0x5b/0x90
[<c02e72ca>] nf_hook_slow+0x4a/0xc0
[<f8ae4225>] __br_forward+0x55/0x80 [bridge]
[<f8ae4307>] br_forward+0x27/0x30 [bridge]
[<f8ae4cfd>] br_handle_frame_finish+0xed/0x150 [bridge]
[<f8ae960e>] br_nf_pre_routing_finish+0x1be/0x360 [bridge]
[<f8ae9f15>] br_nf_pre_routing+0x425/0x6e0 [bridge]
[<c02e724b>] nf_iterate+0x5b/0x90
[<c02e72ca>] nf_hook_slow+0x4a/0xc0
[<f8ae4ecb>] br_handle_frame+0x16b/0x210 [bridge]
[<c02d4856>] netif_receive_skb+0x216/0x310
[<c02d49b6>] process_backlog+0x66/0xd0
[<c02d0c72>] net_rx_action+0xd2/0x170
[<c0131f72>] __do_softirq+0x82/0x100
[<c0107f11>] do_softirq+0x71/0xc0
skb_gso_segment is called from dev_gso_segment, which is called from
dev_hard_start_xmit, which is shown in the stack trace.
Thanks
Kieran