[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAF=yD-KoUX=suZaaV3xzixSM2Kds65YRxCR4HyXXkKp8=aN+XQ@mail.gmail.com>
Date: Mon, 26 Nov 2018 14:49:07 -0500
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: Network Development <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>,
Willem de Bruijn <willemb@...gle.com>
Subject: Re: [PATCH net-next v2 1/2] udp: msg_zerocopy
On Mon, Nov 26, 2018 at 1:19 PM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> On Mon, Nov 26, 2018 at 1:04 PM Paolo Abeni <pabeni@...hat.com> wrote:
> >
> > On Mon, 2018-11-26 at 12:59 -0500, Willem de Bruijn wrote:
> > > The callers of this function do flush the queue of the other skbs on
> > > error, but only after the call to sock_zerocopy_put_abort.
> > >
> > > sock_zerocopy_put_abort depends on total rollback to revert the
> > > sk_zckey increment and suppress the completion notification (which
> > > must not happen on return with error).
> > >
> > > I don't immediately have a fix. Need to think about this some more..
> >
> > [still out of sheer ignorance] How about tacking a refcnt for the whole
> > ip_append_data() scope, like in the tcp case? that will add an atomic
> > op per loop (likely, hitting the cache) but will remove some code hunk
> > in sock_zerocopy_put_abort() and sock_zerocopy_alloc().
>
> The atomic op pair is indeed what I was trying to avoid. But I also need
> to solve the problem that the final decrement will happen from the freeing
> of the other skbs in __ip_flush_pending_frames, and will not suppress
> the notification.
>
> Freeing the entire queue inside __ip_append_data, effectively making it
> a true noop on error is one approach. But that is invasive, also to non
> zerocopy codepaths, so I would rather avoid that.
>
> Perhaps I need to handle the abort logic in udp_sendmsg directly,
> after both __ip_append_data and __ip_flush_pending_frames.
Actually,
(1) the notification suppression is handled correctly, as .._abort
decrements uarg->len. If now zero, this suppresses the notification
in sock_zerocopy_callback, regardless whether that callback is
called right away or from a later kfree_skb.
(2) if moving skb_zcopy_set below getfrag, then no kfree_skb
will be called on a zerocopy skb inside __ip_append_data. So on
abort the refcount is exactly the number of zerocopy skbs on the
queue that will call sock_zerocopy_put later. Abort then only needs
to handle special case zero, and call sock_zerocopy_put right away.
Tentative fix on top of v2 (I'll squash into v3):
---
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2179ef84bb44..4b21a58329d1 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1099,12 +1099,13 @@ void sock_zerocopy_put_abort(struct ubuf_info *uarg)
- if (sk->sk_type != SOCK_STREAM && !refcount_read(&uarg->refcnt))
+ if (sk->sk_type != SOCK_STREAM &&
!refcount_read(&uarg->refcnt)) {
refcount_set(&uarg->refcnt, 1);
-
- sock_zerocopy_put(uarg);
+ sock_zerocopy_put(uarg);
+ }
}
}
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 7504da2f33d6..a19396e21b35 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1014,13 +1014,6 @@ static int __ip_append_data(struct sock *sk,
skb->csum = 0;
skb_reserve(skb, hh_len);
- /* only the initial fragment is time stamped */
- skb_shinfo(skb)->tx_flags = cork->tx_flags;
- cork->tx_flags = 0;
- skb_shinfo(skb)->tskey = tskey;
- tskey = 0;
- skb_zcopy_set(skb, uarg);
-
/*
* Find where to start putting bytes.
*/
@@ -1053,6 +1046,13 @@ static int __ip_append_data(struct sock *sk,
exthdrlen = 0;
csummode = CHECKSUM_NONE;
+ /* only the initial fragment is time stamped */
+ skb_shinfo(skb)->tx_flags = cork->tx_flags;
+ cork->tx_flags = 0;
+ skb_shinfo(skb)->tskey = tskey;
+ tskey = 0;
+ skb_zcopy_set(skb, uarg);
+
if ((flags & MSG_CONFIRM) && !skb_prev)
skb_set_dst_pending_confirm(skb, 1);
---
This patch moves all the skb_shinfo touches operations after the copy,
to avoid touching that twice.
Instead of the refcnt trick, I could also refactor sock_zerocopy_put
and call __sock_zerocopy_put
---
-void sock_zerocopy_put(struct ubuf_info *uarg)
+static void __sock_zerocopy_put(struct ubuf_info *uarg)
{
- if (uarg && refcount_dec_and_test(&uarg->refcnt)) {
- if (uarg->callback)
- uarg->callback(uarg, uarg->zerocopy);
- else
- consume_skb(skb_from_uarg(uarg));
- }
+ if (uarg->callback)
+ uarg->callback(uarg, uarg->zerocopy);
+ else
+ consume_skb(skb_from_uarg(uarg));
+}
+
+void sock_zerocopy_put(struct ubuf_info *uarg)
+ if (uarg && refcount_dec_and_test(&uarg->refcnt))
+ __sock_zerocopy_put(uarg);
}
EXPORT_SYMBOL_GPL(sock_zerocopy_put);
---
Powered by blists - more mailing lists