lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1336569529.25514.109.camel@zakaz.uk.xensource.com>
Date:	Wed, 9 May 2012 14:18:49 +0100
From:	Ian Campbell <Ian.Campbell@...rix.com>
To:	"Michael S. Tsirkin" <mst@...hat.com>
CC:	David Miller <davem@...emloft.net>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"eric.dumazet@...il.com" <eric.dumazet@...il.com>
Subject: Re: [PATCH RFC 0/6] copy aside frags with destructors (was [PATCH
 7/9] net: add skb_orphan_frags to copy aside frags with destructors)

On Mon, 2012-05-07 at 14:53 +0100, Michael S. Tsirkin wrote:
> On Fri, May 04, 2012 at 11:51:31AM +0100, Ian Campbell wrote:
> > On Fri, 2012-05-04 at 11:03 +0100, Michael S. Tsirkin wrote:
> > > On Fri, May 04, 2012 at 02:54:33AM -0400, David Miller wrote:
> > > > From: "Michael S. Tsirkin" <mst@...hat.com>
> > > > Date: Fri, 4 May 2012 00:10:24 +0300
> > > > 
> > > > > Hmm we orphan skbs when we loop them back so how about reusing the
> > > > > skb->destructor for this?
> > > > 
> > > > That's one possibility.
> 
> So originally I thought it would work: destructor would wake the
> original owner which would do data copy and modify the fragments.  But
> then I unfortunately realized that would be racy: the new owner could be

By "new owner" you mean the owner of some other skb which refernences
the same shinfo (so either clone or the original if we currently hold a
clone).

> using the old frags and there appears no way for us to
> make sure it doesn't so that we can put the original page.

Right, the key issue is, I think, that cloning etc take an extra dataref
on the shinfo but do not take references on the frags. This is obviously
sane but does cause the issue above.

> And the same logic applies to modifying the frags at any
> other time if the skb is cloned. So it seems we must copy if we
> want to clone the skb.

Right. I had been hoping that the frag destructors could avoid this but
it seems like this is not going to be the case.

Just as a thought experiment would taking a frag ref on clone help us?
(lets assume for now that, iff there are destructors, this is cheaper
than the copy). I think this doesn't work -- since although the ref will
mean that the page doesn't go away under anyone elses feet after we've
orphaned it we will have lost the pointer to the original page by the
time that other ref is dropped, so freeing it will be tricky.

So following that train of thought a step further -- would creating a
new shinfo entirely on frag orphan buy us anything (and at what cost)?
Obviously it an extra allocation -- but we are already allocating N
pages of new frag. The others skb's referencing the shared shinfo would
still continue to see the old shinfo, pages and refs until they
themselves needed to orphan the frags (if they do, they might avoid it).

I think that could work, but I'm not sure it is worth the complexity.
Basically all it means that if you are bridging + flooding then you
would potentially save some number of a copies depending on the types of
devices on each port.

[...]

> Second option is what macvtap zero copy uses and it already
> does copy on clone too. So I hacked that to make it support tcp/udp
> used by sunrpc.

This does seem like it is the preferable solution.

I haven't read the patches yet, so maybe you already support this, but
it would be good to allow the creator of an skb to declare that it
doesn't want this behaviour, e.g. because it has some other mechanism
for dealing with this copy out for the case where an skb gets held
somewhere.

Ian.

> > > > 
> > > > But I fear we're about to toss Ian into yet another rabbit hole. :-)
> > > > 
> > > > Let's try to converge on something quickly as I think integration of
> > > > his work has been delayed enough as-is.
> 
> ...
> 
> > > It's weekend here, I'll work on a patch like this
> > > Sunday.
> > 
> > Thanks, I was starting to feel my nose twitching and my ears beginning
> > to elongate ;-)
> > 
> > Ian.
> 
> Here's a first stub at a fix. Basically to be able to modify frags on
> the fly we must make sure the skb isn't cloned, so the moment someone
> clones the skb we need to trigger the frag copy logic.  And this is
> exactly what happens with SKBTX_DEV_ZEROCOPY so it seems to make sense
> to reuse that logic.
> 
> The below patchset replaces patch 7
> ([PATCH 7/9] net: add skb_orphan_frags to copy aside frags with destructors)
> in Ian's patchset and needs to be applied there.
> 
> 
> Compiled only but I'd like to hear what people think all the same
> because it does add a couple of branches on fast path.  On the other
> hand this makes it generic so the same logic will be reusable for packet
> sockets (which IIRC are currently buggy in the same way as sunrpc) and
> for adding zero copy support to tun.
> 
> Please comment,
> Thanks!
> 
> -- 
> MST
> 
> 
> Michael S. Tsirkin (6):
>   skbuff: support per-page destructors in copy_ubufs
>   skbuff: add an api to orphan frags
>   skbuff: convert to skb_orphan_frags
>   tun: orphan frags on xmit
>   net: orphan frags on receive
>   skbuff: set zerocopy flag on frag destructor
> 
>  drivers/net/tun.c      |    2 ++
>  include/linux/skbuff.h |   41 +++++++++++++++++++++++++++++++++++++++++
>  net/core/dev.c         |    2 ++
>  net/core/skbuff.c      |   43 +++++++++++++++++++++++++------------------
>  4 files changed, 70 insertions(+), 18 deletions(-)
> 


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ