Date:	Tue, 02 Sep 2014 08:51:53 -0700
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Johannes Berg <johannes@...solutions.net>
Cc:	netdev <netdev@...r.kernel.org>,
	linux-wireless <linux-wireless@...r.kernel.org>,
	Ido Yariv <ido@...ery.com>,
	Emmanuel Grumbach <egrumbach@...il.com>
Subject: Re: truesize for pages shared between SKBs

On Tue, 2014-09-02 at 14:20 +0200, Johannes Berg wrote:
> Hi,
> 
> In our driver, we have 4k receive buffers, but usually ~1500 byte
> packets.

Which driver exactly is that?

> 
> How do other drivers handle this? We currently set up the truesize of
> each SKB to be its size plus the 4k page size, but we see performance
> improvements when we lie and pretend the truesize is just 4k/(# of
> packets in the page), which is correct as long as the packets are all
> pending in the stack since they share the page.

Can you elaborate on 'they share the page'?

If a 4KB page is really split into two 2KB subpages, then yes, truesize can
be 2KB + skb->head + sizeof(struct sk_buff).

Some drivers do that (Intel IGBVF, for example).
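
Untested sketch of that pattern (hypothetical helper, placeholder names;
the last argument of skb_add_rx_frag() is the real per-fragment truesize):

	/* Attach one 2KB half of a 4KB RX page to a fresh skb.
	 * The caller must already hold a page reference for this half
	 * (get_page() when the other half is handed out too).
	 */
	static struct sk_buff *rx_half_page_skb(struct net_device *dev,
						struct page *page,
						unsigned int offset,
						unsigned int len)
	{
		struct sk_buff *skb;

		skb = netdev_alloc_skb(dev, 128);	/* small linear head */
		if (!skb)
			return NULL;

		/* Charge PAGE_SIZE/2, the real half-page cost, to
		 * skb->truesize, not just 'len'.
		 */
		skb_add_rx_frag(skb, 0, page, offset, len, PAGE_SIZE / 2);
		return skb;
	}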

If a single 4KB page is used by a single 1500-byte frame, then it's not
shared ;)

> 
> How do other drivers handle this? Should the truesize maybe be aware of
> this kind of sharing? Should we just lie about it and risk that the
> truesize is accounted erroneously if some but not all of the packets are
> freed?

Lying is not worth crashing hosts under memory pressure.

skb->truesize is really how many bytes are consumed by an skb. The TCP
stack uses it to trigger collapses when a socket reaches its limits.
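
For reference, a simplified sketch of where that accounting happens (the
real logic lives in skb_set_owner_r() and the TCP rmem scheduling code;
queue_rx_skb() here is made up):

	static void queue_rx_skb(struct sock *sk, struct sk_buff *skb)
	{
		/* Charges skb->truesize, not just the payload length,
		 * to sk->sk_rmem_alloc.
		 */
		skb_set_owner_r(skb, sk);

		if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf) {
			/* Over budget: TCP prunes/collapses its
			 * receive queue at this point.
			 */
		}

		skb_queue_tail(&sk->sk_receive_queue, skb);
	}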

Your performance is better when you lie because, for the same
sk->sk_rcvbuf value (typically tcp_rmem[2]), the TCP window can be bigger,
which allows the TCP sender to send more packets (bigger cwnd).
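
Back-of-envelope, with illustrative numbers: at tcp_rmem[2] = 6MB, an
honest truesize of ~4.5KB per 1500-byte frame (4KB page + skb head +
struct sk_buff) caps the usable window near 6MB * 1448 / 4608 ~= 1.9MB of
payload; claiming ~2.3KB instead roughly doubles that, which is where the
apparent speedup comes from.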

Workaround: make tcp_rmem[2] larger, so that we still have an appropriate
per-socket memory limit for OOM prevention, while allowing better
performance for large BDP flows.
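
If you go that route, it is a one-line sysctl; the 16MB max below is
purely illustrative, not a recommendation:

	# min / default / max TCP receive buffer, in bytes
	net.ipv4.tcp_rmem = 4096 87380 16777216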

The current value is 6MB, which is already quite big IMO for well-behaved
drivers.

The real fix would of course be to make your skbs as slim as possible.
That helps even though GRO and TCP coalescing can already reduce the
memory requirements for bulk flows.
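
One way to get there, as an untested sketch: carve 2KB chunks out of the
RX page and wrap them with build_skb(), so truesize ends up near 2KB +
sizeof(struct sk_buff) instead of the full 4KB buffer. Layout assumptions
(headroom, room for struct skb_shared_info inside the 2KB) are the
driver's responsibility:

	static struct sk_buff *slim_rx_skb(void *data, unsigned int len)
	{
		/* 'data' points into a page the driver holds a
		 * reference on; the 2KB must include space for
		 * struct skb_shared_info at the end.
		 */
		struct sk_buff *skb = build_skb(data, 2048);

		if (!skb)
			return NULL;

		skb_reserve(skb, NET_SKB_PAD);	/* headroom for the stack */
		skb_put(skb, len);	/* frame was DMA'd at this offset */
		return skb;
	}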



