[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAF=yD-Jpceeq5dzL3usLKBs+S5Ky9gou4bifKAaCNXZZJcr2gA@mail.gmail.com>
Date: Mon, 19 Nov 2018 16:51:42 -0500
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: "Anand H. Krishnan" <anandhkrishnan@...il.com>
Cc: Network Development <netdev@...r.kernel.org>
Subject: Re: VETH & AF_PACKET problem
On Mon, Nov 19, 2018 at 12:54 PM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> On Mon, Nov 19, 2018 at 6:00 AM Anand H. Krishnan
> <anandhkrishnan@...il.com> wrote:
> >
> > Hello,
> >
> > I tried the 4.19.2 kernel without success. You were probably right
> > that skb_orphan
> > is indeed called from somewhere in the receive path. I had an
> > instrumented kernel
> > and the following is what I see:
>
> Thanks for verifying.
>
> I was also able to reproduce it without veth by looping a packet
> to another packet socket that delays reading from the socket.
>
> A very preliminary patch is to mark the socket as zerocopy in
> tpacket_fill_skb to trigger skb_copy_ubufs whenever the
> packet gets cloned or enters the receive path:
>
> @@ -2462,6 +2462,9 @@ static int tpacket_fill_skb(struct packet_sock
> *po, struct sk_buff *skb,
> skb->tstamp = sockc->transmit_time;
> sock_tx_timestamp(&po->sk, sockc->tsflags, &skb_shinfo(skb)->tx_flags);
> skb_shinfo(skb)->destructor_arg = ph.raw;
> + skb_shinfo(skb)->tx_flags |= SKBTX_ZEROCOPY_FRAG;
>
> This requires at the least a fixup in skb_zcopy_clear to handle this
> special case and not call uarg->callback (because
> skb_shinfo(skb)->destructor_arg is of a different type for
> tpacket_snd). I'll share a patch to test when I have something
> more concrete.
These packets already call an skb destructor when the original skb is
released, tpacket_destruct_skb, to release the slot and record the
timestamp of original submission.
We cannot change that timestamp until all skbs are freed. And, we also
do not want two indirect function calls, one for tpacket on the original
skb and one for zcopy on the last skb clone.
Instead, disallow cloning and refcounted ubufs, similar to the
existing tun/tap zerocopy driver. Call skb_copy_ubufs when
cloning, looping into the receive path, ..
Mark these packets as being zerocopy pages (SKBTX_ZEROCOP_FRAG)
But, we cannot assign skb_shinfo(skb)->destructor_arg to hold a uarg,
because tpacket already uses that for tpacket_destruct_skb. We also do
not want or need the extra alloc/free. So introduce a zerocopy variant that
expects this field to not be dereferencable.
I did not find a cleaner way to do this so far than with pointer
tricks. Tentative
patch that fixes the issue in my test:
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a2e8297a5b00..c500e3c86a12 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1334,6 +1334,22 @@ static inline void skb_zcopy_set(struct sk_buff
*skb, struct ubuf_info *uarg)
}
}
+static inline void skb_zcopy_set_nouarg(struct sk_buff *skb, void *val)
+{
+ skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t) val | 0x1UL);
+ skb_shinfo(skb)->tx_flags |= SKBTX_ZEROCOPY_FRAG;
+}
+
+static inline bool skb_zcopy_is_nouarg(struct sk_buff *skb)
+{
+ return (uintptr_t) skb_shinfo(skb)->destructor_arg & 0x1UL;
+}
+
+static inline void * skb_zcopy_get_nouarg(struct sk_buff *skb)
+{
+ return (void *)((uintptr_t) skb_shinfo(skb)->destructor_arg & ~0x1UL);
+}
+
/* Release a reference on a zerocopy structure */
static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy)
{
@@ -1343,7 +1359,7 @@ static inline void skb_zcopy_clear(struct
sk_buff *skb, bool zerocopy)
if (uarg->callback == sock_zerocopy_callback) {
uarg->zerocopy = uarg->zerocopy && zerocopy;
sock_zerocopy_put(uarg);
- } else {
+ } else if (!skb_zcopy_is_nouarg(skb)) {
uarg->callback(uarg, zerocopy);
}
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index ec3095f13aae..a74650e98f42 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2394,7 +2394,7 @@ static void tpacket_destruct_skb(struct sk_buff *skb)
void *ph;
__u32 ts;
- ph = skb_shinfo(skb)->destructor_arg;
+ ph = skb_zcopy_get_nouarg(skb);
packet_dec_pending(&po->tx_ring);
ts = __packet_set_timestamp(po, ph, skb);
@@ -2461,7 +2461,7 @@ static int tpacket_fill_skb(struct packet_sock
*po, struct sk_buff *skb,
skb->mark = po->sk.sk_mark;
skb->tstamp = sockc->transmit_time;
sock_tx_timestamp(&po->sk, sockc->tsflags, &skb_shinfo(skb)->tx_flags);
- skb_shinfo(skb)->destructor_arg = ph.raw;
+ skb_zcopy_set_nouarg(skb, ph.raw);
skb_reserve(skb, hlen);
skb_reset_network_header(skb);
Powered by blists - more mailing lists