Date:   Fri, 28 Jun 2019 11:15:09 -0400
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Neil Horman <nhorman@...driver.com>
Cc:     Network Development <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Willem de Bruijn <willemb@...gle.com>,
        Daniel Borkmann <daniel@...earbox.net>
Subject: Re: [PATCH net-next] af_packet: convert pending frame counter to atomic_t

On Fri, Jun 28, 2019 at 10:53 AM Neil Horman <nhorman@...driver.com> wrote:
>
> The AF_PACKET protocol, when running as a memory-mapped socket, uses
> a pending frame counter to track the number of skbs in flight during
> transmission.  It is incremented during the sendmsg call (via
> tpacket_snd), and decremented (possibly asynchronously) in the skb
> destructor when the skb is freed.
>
> The counter is currently implemented as a per-cpu variable for each
> open socket, but reads (via packet_read_pending) must iterate over
> every per-cpu instance, accumulating the total pending count.
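
For context, the read path being removed here walks every possible
cpu, roughly like this (a sketch from memory, not a verbatim copy of
the tree):

    static unsigned int packet_read_pending(const struct packet_ring_buffer *rb)
    {
            unsigned int refcnt = 0;
            int cpu;

            /* accumulate the pending count across all possible cpus */
            for_each_possible_cpu(cpu)
                    refcnt += *per_cpu_ptr(rb->pending_refcnt, cpu);

            return refcnt;
    }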
>
> Given that the socket transmit path is an exclusive path (serialized
> by the pg_vec_lock mutex), we never increment this counter on
> multiple cpus in parallel.  This implementation also seems
> potentially broken: should an skb be freed on a cpu other than the
> one it was transmitted from, we may decrement a per-cpu counter that
> was never incremented, leading to underflow.
>
> As such, adjust the packet socket struct to convert the per-cpu counter
> to an atomic_t variable (to enforce consistency between the send path
> and the skb free path).  This saves us some space in the packet_sock
> structure, prevents the possibility of underflow, and should reduce the
> run time of packet_read_pending, as we only need to read a single
> variable, instead of having to loop over every available cpu variable
> instance.
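
With the conversion, the read side presumably collapses to a single
load, along the lines of (guessing at the shape, since that hunk is
not quoted below):

    static unsigned int packet_read_pending(const struct packet_ring_buffer *rb)
    {
            /* one shared counter: a single atomic load, no per-cpu walk */
            return atomic_read(&rb->pending_refcnt);
    }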
>
> Tested by running a small program that sends frames via AF_PACKET on
> multiple cpus in parallel, with good results.
>
> Signed-off-by: Neil Horman <nhorman@...driver.com>
> CC: "David S. Miller" <davem@...emloft.net>
> CC: Willem de Bruijn <willemb@...gle.com>
> ---

This is essentially a revert of commit b013840810c2 ("packet: use
percpu mmap tx frame pending refcount"). That commit has some
benchmark numbers and also discusses the overflow issue.
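
To spell out the underflow point: assuming the per-cpu slots are
plain unsigned ints (as in internal.h), a cross-cpu free only wraps
one slot, and the wraparound cancels in the sum. Sketch:

    /* hypothetical interleaving, both slots start at 0 */
    CPU0: this_cpu_inc(*rb->pending_refcnt);  /* cpu0 slot: 0 -> 1        */
    CPU1: this_cpu_dec(*rb->pending_refcnt);  /* cpu1 slot: 0 -> UINT_MAX */

    /* packet_read_pending() sums the slots:
     * 1 + UINT_MAX == 0 (mod 2^32), so the aggregate is still correct
     * even though one slot wrapped.
     */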

I think it would be more interesting to eschew the counter entirely
when MSG_DONTWAIT is set, as it is only used to decide when to exit
the loop if need_wait is true.
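
The only reader is the exit condition of the send loop in
tpacket_snd(), roughly:

    } while (likely((ph != NULL) ||
             (need_wait && packet_read_pending(&po->tx_ring))));

With MSG_DONTWAIT, need_wait is false and the value is never read.
Skipping the inc/dec entirely in that case would also require
teaching the skb destructor not to decrement, so it is not a one-line
change.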

But IMHO packet sockets are deprecated in favor of AF_XDP and should
be limited to bug fixes at this point.

>  net/packet/af_packet.c | 40 +++++-----------------------------------
>  net/packet/internal.h  |  2 +-
>  2 files changed, 6 insertions(+), 36 deletions(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 8d54f3047768..25ffb486fac9 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -1154,43 +1154,17 @@ static void packet_increment_head(struct packet_ring_buffer *buff)
>
>  static void packet_inc_pending(struct packet_ring_buffer *rb)
>  {
> -       this_cpu_inc(*rb->pending_refcnt);
> +       atomic_inc(&rb->pending_refcnt);
>  }

If making this change, we can also remove these helper functions; the
layer of indirection just hinders readability.
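That is, just open-code the atomic ops at the two call sites,
something like:

    atomic_inc(&po->tx_ring.pending_refcnt);   /* in tpacket_snd() */
    ...
    atomic_dec(&po->tx_ring.pending_refcnt);   /* in tpacket_destruct_skb() */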
