[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CACeb84u9pYsxTDLzjRtkcQQMxzYPhkvrRt7G76YLNhG728mZsg@mail.gmail.com>
Date: Tue, 20 Nov 2018 13:06:59 +0530
From: "Anand H. Krishnan" <anandhkrishnan@...il.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Network Development <netdev@...r.kernel.org>
Subject: Re: VETH & AF_PACKET problem
Hello Willem,
I was able to do scp without any problems with the patch. I agree with
your reasoning of
the patch.
Thanks for your attention to this problem.
Thanks,
Anand
On Tue, Nov 20, 2018 at 3:22 AM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> On Mon, Nov 19, 2018 at 12:54 PM Willem de Bruijn
> <willemdebruijn.kernel@...il.com> wrote:
> >
> > On Mon, Nov 19, 2018 at 6:00 AM Anand H. Krishnan
> > <anandhkrishnan@...il.com> wrote:
> > >
> > > Hello,
> > >
> > > I tried the 4.19.2 kernel without success. You were probably right
> > > that skb_orphan
> > > is indeed called from somewhere in the receive path. I had an
> > > instrumented kernel
> > > and the following is what I see:
> >
> > Thanks for verifying.
> >
> > I was also able to reproduce it without veth by looping a packet
> > to another packet socket that delays reading from the socket.
> >
> > A very preliminary patch is to mark the socket as zerocopy in
> > tpacket_fill_skb to trigger skb_copy_ubufs whenever the
> > packet gets cloned or enters the receive path:
> >
> > @@ -2462,6 +2462,9 @@ static int tpacket_fill_skb(struct packet_sock
> > *po, struct sk_buff *skb,
> > skb->tstamp = sockc->transmit_time;
> > sock_tx_timestamp(&po->sk, sockc->tsflags, &skb_shinfo(skb)->tx_flags);
> > skb_shinfo(skb)->destructor_arg = ph.raw;
> > + skb_shinfo(skb)->tx_flags |= SKBTX_ZEROCOPY_FRAG;
> >
> > This requires at the least a fixup in skb_zcopy_clear to handle this
> > special case and not call uarg->callback (because
> > skb_shinfo(skb)->destructor_arg is of a different type for
> > tpacket_snd). I'll share a patch to test when I have something
> > more concrete.
>
> These packets already call an skb destructor when the original skb is
> released, tpacket_destruct_skb, to release the slot and record the
> timestamp of original submission.
>
> We cannot change that timestamp until all skbs are freed. And, we also
> do not want two indirect function calls, one for tpacket on the original
> skb and one for zcopy on the last skb clone.
>
> Instead, disallow cloning and refcounted ubufs, similar to the
> existing tun/tap zerocopy driver. Call skb_copy_ubufs when
> cloning, looping into the receive path, ..
>
> Mark these packets as being zerocopy pages (SKBTX_ZEROCOP_FRAG)
> But, we cannot assign skb_shinfo(skb)->destructor_arg to hold a uarg,
> because tpacket already uses that for tpacket_destruct_skb. We also do
> not want or need the extra alloc/free. So introduce a zerocopy variant that
> expects this field to not be dereferencable.
>
> I did not find a cleaner way to do this so far than with pointer
> tricks. Tentative
> patch that fixes the issue in my test:
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index a2e8297a5b00..c500e3c86a12 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1334,6 +1334,22 @@ static inline void skb_zcopy_set(struct sk_buff
> *skb, struct ubuf_info *uarg)
> }
> }
>
> +static inline void skb_zcopy_set_nouarg(struct sk_buff *skb, void *val)
> +{
> + skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t) val | 0x1UL);
> + skb_shinfo(skb)->tx_flags |= SKBTX_ZEROCOPY_FRAG;
> +}
> +
> +static inline bool skb_zcopy_is_nouarg(struct sk_buff *skb)
> +{
> + return (uintptr_t) skb_shinfo(skb)->destructor_arg & 0x1UL;
> +}
> +
> +static inline void * skb_zcopy_get_nouarg(struct sk_buff *skb)
> +{
> + return (void *)((uintptr_t) skb_shinfo(skb)->destructor_arg & ~0x1UL);
> +}
> +
> /* Release a reference on a zerocopy structure */
> static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy)
> {
> @@ -1343,7 +1359,7 @@ static inline void skb_zcopy_clear(struct
> sk_buff *skb, bool zerocopy)
> if (uarg->callback == sock_zerocopy_callback) {
> uarg->zerocopy = uarg->zerocopy && zerocopy;
> sock_zerocopy_put(uarg);
> - } else {
> + } else if (!skb_zcopy_is_nouarg(skb)) {
> uarg->callback(uarg, zerocopy);
> }
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index ec3095f13aae..a74650e98f42 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -2394,7 +2394,7 @@ static void tpacket_destruct_skb(struct sk_buff *skb)
> void *ph;
> __u32 ts;
>
> - ph = skb_shinfo(skb)->destructor_arg;
> + ph = skb_zcopy_get_nouarg(skb);
> packet_dec_pending(&po->tx_ring);
>
> ts = __packet_set_timestamp(po, ph, skb);
> @@ -2461,7 +2461,7 @@ static int tpacket_fill_skb(struct packet_sock
> *po, struct sk_buff *skb,
> skb->mark = po->sk.sk_mark;
> skb->tstamp = sockc->transmit_time;
> sock_tx_timestamp(&po->sk, sockc->tsflags, &skb_shinfo(skb)->tx_flags);
> - skb_shinfo(skb)->destructor_arg = ph.raw;
> + skb_zcopy_set_nouarg(skb, ph.raw);
>
> skb_reserve(skb, hlen);
> skb_reset_network_header(skb);
Powered by blists - more mailing lists