Message-ID: <1409673647.1808.25.camel@jlt4.sipsolutions.net>
Date: Tue, 02 Sep 2014 18:00:47 +0200
From: Johannes Berg <johannes@...solutions.net>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: netdev <netdev@...r.kernel.org>,
linux-wireless <linux-wireless@...r.kernel.org>,
Ido Yariv <ido@...ery.com>,
Emmanuel Grumbach <egrumbach@...il.com>
Subject: Re: truesize for pages shared between SKBs
On Tue, 2014-09-02 at 08:51 -0700, Eric Dumazet wrote:
> > In our driver, we have 4k receive buffers, but usually ~1500 byte
> > packets.
>
> Which driver exactly is that ?
iwlwifi/iwlmvm, of course :)
> Can you elaborate on 'they share the page' ?
>
> If a 4K page is really split into 2 2KB subpages, then yes, truesize can
> be 2KB + skb->head + sizeof(struct sk_buff)
What do you mean by "split"?
There could be any number of packets (though usually two in the
interesting case) in the page, and we call skb_add_rx_frag() for both
packets, pointing to the same page, with different offsets.
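Roughly like this, as a simplified sketch (hypothetical helper name,
not the actual iwlmvm code):

#include <linux/skbuff.h>

/* hypothetical helper: attach one received frame, sitting somewhere
 * inside the 4k RX page, to an skb */
static void iwl_rx_attach_frame(struct sk_buff *skb, struct page *page,
				unsigned int offset, unsigned int len)
{
	/* every skb pointing into the page holds its own reference;
	 * the driver drops its original one when it's done with the page */
	get_page(page);

	/* the last argument is the truesize contribution in question:
	 * with two frames in the page, do we report PAGE_SIZE / 2 for
	 * each, or the full PAGE_SIZE to cover the worst case? */
	skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
			offset, len, PAGE_SIZE / 2);
}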
> Some drivers do that (Intel IGBVF for example)
It seems to split the page into two halves unconditionally, which is
interesting.
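I.e. effectively (paraphrasing from memory, not the actual igbvf code):

/* each 4k page serves as two fixed 2k buffers, alternating between
 * the halves, so reporting 2k per frame is always honest */
skb_add_rx_frag(skb, 0, page, use_second_half ? PAGE_SIZE / 2 : 0,
		len, PAGE_SIZE / 2);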
> If a single 4KB page can be used by a single 1500 frame, then it's not
> shared ;)
Right, obviously :)
> > How do other drivers handle this? Should the truesize maybe be aware of
> > this kind of sharing? Should we just lie about it and risk that the
> > truesize is accounted erroneously if some but not all of the packets are
> > freed?
>
> Lies are not worth crashing hosts under memory pressure.
>
> skb->truesize is really how many bytes are consumed by an skb. This
> serves in TCP stack to trigger collapses when a socket reaches its
> limits.
Right - the question is more what "consumed" means in this case. Should
it be correct at the time of SKB creation (in which case we should split
by the number of packets created from the page) or should it be correct
for the worst case (all but one packet are freed quickly, one remains
stuck on some socket with the full 4k page allocated to it) ...
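In code terms, with pkts_in_page as a stand-in for however many frames
ended up in the page, the choice for the value we pass to
skb_add_rx_frag() is between

	truesize = PAGE_SIZE / pkts_in_page;	/* honest at skb creation */

and

	truesize = PAGE_SIZE;			/* honest in the worst case */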
Or maybe we should make it even more complex and check the page sharing
in conjunction with GRO...
> Your performance is better when you lie, because for the same
> sk->sk_rcvbuf value (typically tcp_rmem[2]), TCP window can be bigger,
> and allows TCP sender to send more packets (bigger cwnd)
>
> Workaround : make tcp_rmem[2] larger, so that we still have an
> appropriate memory limit per socket, acting for OOM prevention, and
> allowing better performance for large BDP flows.
>
> Current value is 6MB, which is already quite big IMO for well behaving
> drivers.
>
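(For reference, that's the third value of the net.ipv4.tcp_rmem sysctl,
so the workaround would be something like

	# example value only - the current default max is 6291456 (6MB)
	net.ipv4.tcp_rmem = 4096 87380 16777216

in sysctl.conf on the affected hosts, if I follow you.)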
> Real fix would be to make your skb as slim as possible of course.
> It helps even if GRO or TCP coalescing can reduce the memory
> requirements for bulk flows.
Sure, it's always a trade-off though. If we wanted to *actually* make
the skb smaller we'd have to copy the data, which wouldn't buy us much
either.
The hardware is limited in the RX buffer handling, and there's actually
a chance we might receive close to 4k in a single frame with A-MSDU.
If you wanted to use the medium more efficiently you'd set the MTU
higher, to a little under 2000 (a perfectly valid value for 802.11),
which would also give better page utilisation. Nobody really does that
though :-)
johannes