lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CACeb84u9pYsxTDLzjRtkcQQMxzYPhkvrRt7G76YLNhG728mZsg@mail.gmail.com>
Date:   Tue, 20 Nov 2018 13:06:59 +0530
From:   "Anand H. Krishnan" <anandhkrishnan@...il.com>
To:     Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc:     Network Development <netdev@...r.kernel.org>
Subject: Re: VETH & AF_PACKET problem

Hello Willem,

I was able to do scp without any problems with the patch. I agree with
your reasoning of
the patch.

Thanks for your attention to this problem.

Thanks,
Anand
On Tue, Nov 20, 2018 at 3:22 AM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> On Mon, Nov 19, 2018 at 12:54 PM Willem de Bruijn
> <willemdebruijn.kernel@...il.com> wrote:
> >
> > On Mon, Nov 19, 2018 at 6:00 AM Anand H. Krishnan
> > <anandhkrishnan@...il.com> wrote:
> > >
> > > Hello,
> > >
> > > I tried the 4.19.2 kernel without success. You were probably right
> > > that skb_orphan
> > > is indeed called from somewhere in the receive path. I had an
> > > instrumented kernel
> > > and the following is what I see:
> >
> > Thanks for verifying.
> >
> > I was also able to reproduce it without veth by looping a packet
> > to another packet socket that delays reading from the socket.
> >
> > A very preliminary patch is to mark the socket as zerocopy in
> > tpacket_fill_skb to trigger skb_copy_ubufs whenever the
> > packet gets cloned or enters the receive path:
> >
> > @@ -2462,6 +2462,9 @@ static int tpacket_fill_skb(struct packet_sock
> > *po, struct sk_buff *skb,
> >         skb->tstamp = sockc->transmit_time;
> >         sock_tx_timestamp(&po->sk, sockc->tsflags, &skb_shinfo(skb)->tx_flags);
> >         skb_shinfo(skb)->destructor_arg = ph.raw;
> > +       skb_shinfo(skb)->tx_flags |= SKBTX_ZEROCOPY_FRAG;
> >
> > This requires at the least a fixup in skb_zcopy_clear to handle this
> > special case and not call uarg->callback (because
> > skb_shinfo(skb)->destructor_arg is of a different type for
> > tpacket_snd). I'll share a patch to test when I have something
> > more concrete.
>
> These packets already call an skb destructor when the original skb is
> released, tpacket_destruct_skb, to release the slot and record the
> timestamp of original submission.
>
> We cannot change that timestamp until all skbs are freed. And, we also
> do not want two indirect function calls, one for tpacket on the original
> skb and one for zcopy on the last skb clone.
>
> Instead, disallow cloning and refcounted ubufs, similar to the
> existing tun/tap zerocopy driver. Call skb_copy_ubufs when
> cloning, looping into the receive path, ..
>
> Mark these packets as being zerocopy pages (SKBTX_ZEROCOP_FRAG)
> But, we cannot assign skb_shinfo(skb)->destructor_arg to hold a uarg,
> because tpacket already uses that for tpacket_destruct_skb. We also do
> not want or need the extra alloc/free. So introduce a zerocopy variant that
> expects this field to not be dereferencable.
>
> I did not find a cleaner way to do this so far than with pointer
> tricks. Tentative
> patch that fixes the issue in my test:
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index a2e8297a5b00..c500e3c86a12 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1334,6 +1334,22 @@ static inline void skb_zcopy_set(struct sk_buff
> *skb, struct ubuf_info *uarg)
>         }
>  }
>
> +static inline void skb_zcopy_set_nouarg(struct sk_buff *skb, void *val)
> +{
> +       skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t) val | 0x1UL);
> +       skb_shinfo(skb)->tx_flags |= SKBTX_ZEROCOPY_FRAG;
> +}
> +
> +static inline bool skb_zcopy_is_nouarg(struct sk_buff *skb)
> +{
> +       return (uintptr_t) skb_shinfo(skb)->destructor_arg & 0x1UL;
> +}
> +
> +static inline void * skb_zcopy_get_nouarg(struct sk_buff *skb)
> +{
> +       return (void *)((uintptr_t) skb_shinfo(skb)->destructor_arg & ~0x1UL);
> +}
> +
>  /* Release a reference on a zerocopy structure */
>  static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy)
>  {
> @@ -1343,7 +1359,7 @@ static inline void skb_zcopy_clear(struct
> sk_buff *skb, bool zerocopy)
>                 if (uarg->callback == sock_zerocopy_callback) {
>                         uarg->zerocopy = uarg->zerocopy && zerocopy;
>                         sock_zerocopy_put(uarg);
> -               } else {
> +               } else if (!skb_zcopy_is_nouarg(skb)) {
>                         uarg->callback(uarg, zerocopy);
>                 }
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index ec3095f13aae..a74650e98f42 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -2394,7 +2394,7 @@ static void tpacket_destruct_skb(struct sk_buff *skb)
>                 void *ph;
>                 __u32 ts;
>
> -               ph = skb_shinfo(skb)->destructor_arg;
> +               ph = skb_zcopy_get_nouarg(skb);
>                 packet_dec_pending(&po->tx_ring);
>
>                 ts = __packet_set_timestamp(po, ph, skb);
> @@ -2461,7 +2461,7 @@ static int tpacket_fill_skb(struct packet_sock
> *po, struct sk_buff *skb,
>         skb->mark = po->sk.sk_mark;
>         skb->tstamp = sockc->transmit_time;
>         sock_tx_timestamp(&po->sk, sockc->tsflags, &skb_shinfo(skb)->tx_flags);
> -       skb_shinfo(skb)->destructor_arg = ph.raw;
> +       skb_zcopy_set_nouarg(skb, ph.raw);
>
>         skb_reserve(skb, hlen);
>         skb_reset_network_header(skb);

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ