lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJ+HfNhb=wn1XymE4L6QaE--Z_i2s+bDH=wjCBhHMK4gnVw0wg@mail.gmail.com>
Date:   Fri, 3 Nov 2017 11:47:24 +0100
From:   Björn Töpel <bjorn.topel@...il.com>
To:     Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc:     "Karlsson, Magnus" <magnus.karlsson@...el.com>,
        Alexander Duyck <alexander.h.duyck@...el.com>,
        Alexander Duyck <alexander.duyck@...il.com>,
        John Fastabend <john.fastabend@...il.com>,
        Alexei Starovoitov <ast@...com>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        michael.lundkvist@...csson.com, ravineet.singh@...csson.com,
        Daniel Borkmann <daniel@...earbox.net>,
        Network Development <netdev@...r.kernel.org>,
        Björn Töpel <bjorn.topel@...el.com>,
        jesse.brandeburg@...el.com, anjali.singhai@...el.com,
        rami.rosen@...el.com, jeffrey.b.shaw@...el.com,
        ferruh.yigit@...el.com, qi.z.zhang@...el.com
Subject: Re: [RFC PATCH 07/14] packet: wire up zerocopy for AF_PACKET V4

2017-11-03 4:17 GMT+01:00 Willem de Bruijn <willemdebruijn.kernel@...il.com>:
> On Tue, Oct 31, 2017 at 9:41 PM, Björn Töpel <bjorn.topel@...il.com> wrote:
>> From: Björn Töpel <bjorn.topel@...el.com>
>>
>> This commits adds support for zerocopy mode. Note that zerocopy mode
>> requires that the network interface has been bound to the socket using
>> the bind syscall, and that the corresponding netdev implements the
>> AF_PACKET V4 ndos.
>>
>> Signed-off-by: Björn Töpel <bjorn.topel@...el.com>
>> ---
>> +
>> +static void packet_v4_disable_zerocopy(struct net_device *dev,
>> +                                      struct tp4_netdev_parms *zc)
>> +{
>> +       struct tp4_netdev_parms params;
>> +
>> +       params = *zc;
>> +       params.command  = TP4_DISABLE;
>> +
>> +       (void)dev->netdev_ops->ndo_tp4_zerocopy(dev, &params);
>
> Don't ignore error return codes.
>

Will fix!

>> +static int packet_v4_zerocopy(struct sock *sk, int qp)
>> +{
>> +       struct packet_sock *po = pkt_sk(sk);
>> +       struct socket *sock = sk->sk_socket;
>> +       struct tp4_netdev_parms *zc = NULL;
>> +       struct net_device *dev;
>> +       bool if_up;
>> +       int ret = 0;
>> +
>> +       /* Currently, only RAW sockets are supported.*/
>> +       if (sock->type != SOCK_RAW)
>> +               return -EINVAL;
>> +
>> +       rtnl_lock();
>> +       dev = packet_cached_dev_get(po);
>> +
>> +       /* Socket needs to be bound to an interface. */
>> +       if (!dev) {
>> +               rtnl_unlock();
>> +               return -EISCONN;
>> +       }
>> +
>> +       /* The device needs to have both the NDOs implemented. */
>> +       if (!(dev->netdev_ops->ndo_tp4_zerocopy &&
>> +             dev->netdev_ops->ndo_tp4_xmit)) {
>> +               ret = -EOPNOTSUPP;
>> +               goto out_unlock;
>> +       }
>
> Inconsistent error handling with above test.
>

Will fix.

>> +
>> +       if (!(po->rx_ring.pg_vec && po->tx_ring.pg_vec)) {
>> +               ret = -EOPNOTSUPP;
>> +               goto out_unlock;
>> +       }
>
> A ring can be unmapped later with packet_set_ring. Should that operation
> fail if zerocopy is enabled? After that, it can also change version with
> PACKET_VERSION.
>

You're correct, I've missed this. I need to revisit the scenario when
a ring is unmapped, and recreated. Thanks for pointing this out.

>> +
>> +       if_up = dev->flags & IFF_UP;
>> +       zc = rtnl_dereference(po->zc);
>> +
>> +       /* Disable */
>> +       if (qp <= 0) {
>> +               if (!zc)
>> +                       goto out_unlock;
>> +
>> +               packet_v4_disable_zerocopy(dev, zc);
>> +               rcu_assign_pointer(po->zc, NULL);
>> +
>> +               if (if_up) {
>> +                       spin_lock(&po->bind_lock);
>> +                       register_prot_hook(sk);
>> +                       spin_unlock(&po->bind_lock);
>> +               }
>
> There have been a bunch of race conditions in this bind code. We need
> to be very careful with adding more states to the locking, especially when
> open coding in multiple locations, as this patch does. I counted at least
> four bind locations. See for instance also
> http://patchwork.ozlabs.org/patch/813945/
>

Yeah, the locking schemes in AF_PACKET is pretty convoluted. I'll
document and make the locking more explicit (and avoiding open coding
it).

>
>> +
>> +               goto out_unlock;
>> +       }
>> +
>> +       /* Enable */
>> +       if (!zc) {
>> +               zc = kzalloc(sizeof(*zc), GFP_KERNEL);
>> +               if (!zc) {
>> +                       ret = -ENOMEM;
>> +                       goto out_unlock;
>> +               }
>> +       }
>> +
>> +       if (zc->queue_pair >= 0)
>> +               packet_v4_disable_zerocopy(dev, zc);
>
> This calls disable even if zc was freshly allocated.
> Shoud be > 0?
>

Good catch. It should be > 0.

>>  static int packet_release(struct socket *sock)
>>  {
>> +       struct tp4_netdev_parms *zc;
>>         struct sock *sk = sock->sk;
>> +       struct net_device *dev;
>>         struct packet_sock *po;
>>         struct packet_fanout *f;
>>         struct net *net;
>> @@ -3337,6 +3541,20 @@ static int packet_release(struct socket *sock)
>>         sock_prot_inuse_add(net, sk->sk_prot, -1);
>>         preempt_enable();
>>
>> +       rtnl_lock();
>> +       zc = rtnl_dereference(po->zc);
>> +       dev = packet_cached_dev_get(po);
>> +       if (zc && dev)
>> +               packet_v4_disable_zerocopy(dev, zc);
>> +       if (dev)
>> +               dev_put(dev);
>> +       rtnl_unlock();
>> +
>> +       if (zc) {
>> +               synchronize_rcu();
>> +               kfree(zc);
>> +       }
>
> Please use a helper function for anything this complex.

Will fix.


Thanks,
Björn

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ