Date:   Fri, 28 Jun 2019 15:54:58 -0400
From:   Neil Horman <nhorman@...driver.com>
To:     Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc:     Network Development <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Willem de Bruijn <willemb@...gle.com>,
        Daniel Borkmann <daniel@...earbox.net>
Subject: Re: [PATCH net-next] af_packet: convert pending frame counter to
 atomic_t

On Fri, Jun 28, 2019 at 11:15:09AM -0400, Willem de Bruijn wrote:
> On Fri, Jun 28, 2019 at 10:53 AM Neil Horman <nhorman@...driver.com> wrote:
> >
> > The AF_PACKET protocol, when running as a memory mapped socket uses a
> > pending frame counter to track the number of skbs in flight during
> > transmission.  It is incremented during the sendmsg call (via
> > tpacket_snd), and decremented (possibly asynchronously) in the skb
> > destructor when the skb is freed.
> >
> > The counter is currently implemented as a percpu variable for each open
> > socket, but for reads (via packet_read_pending), we iterate over every
> > cpu variable, accumulating the total pending count.
> >
> > Given that the socket transmit path is an exclusive path (locked via the
> > pg_vec_lock mutex), we do not have the ability to increment this counter
> > on multiple cpus in parallel.  This implementation also seems to have
> > the potential to be broken, in that, should an skb be freed on a cpu
> > other than the one that it was initially transmitted on, we may
> > decrement a counter that was not initially incremented, leading to
> > underflow.
> >
> > As such, adjust the packet socket struct to convert the per-cpu counter
> > to an atomic_t variable (to enforce consistency between the send path
> > and the skb free path).  This saves us some space in the packet_sock
> > structure, prevents the possibility of underflow, and should reduce the
> > run time of packet_read_pending, as we only need to read a single
> > variable, instead of having to loop over every available cpu variable
> > instance.
> >
> > Tested by running a small program which sends frames via AF_PACKET
> > on multiple cpus in parallel, with good results.
> >
> > Signed-off-by: Neil Horman <nhorman@...driver.com>
> > CC: "David S. Miller" <davem@...emloft.net>
> > CC: Willem de Bruijn <willemb@...gle.com>
> > ---
> 
> This essentially is a revert of commit b013840810c2 ("packet: use
> percpu mmap tx frame pending refcount"). That has some benchmark
> numbers and also discusses the overflow issue.
> 
Yes, it is.  I was looking at it thinking that the accumulation issue was
more serious now that we actually sleep until we get a completion, rather
than just schedule.  I.e., if we're freeing frames in parallel, I thought it
was possible for the ordering to result in neither instance getting a return
value of 0 from packet_read_pending, which would lead to a hang/maximal
timeout.  But looking back, I think I was wrong about that.

As for the performance, you're right.  They're almost the same, but I did
some perf runs, and for 10000 iterations this patch is about 0.1% slower
(measuring system time).  I kind of wonder if the code and data savings
aren't worth that, but faster is faster.  Though that's on a 6 cpu system.
I suppose some more testing might be warranted on high cpu count systems
(i.e., is there a point at which looping over the cpu_possible_mask becomes
more expensive), though perhaps that difference is just so small it doesn't
matter.
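
For reference, the two read paths being compared look roughly like this
(paraphrased from the pre- and post-patch code, not copied verbatim):

	/* pre-patch: accumulate the per-cpu counters on every read */
	static unsigned int packet_read_pending(const struct packet_ring_buffer *rb)
	{
		unsigned int refcnt = 0;
		int cpu;

		for_each_possible_cpu(cpu)
			refcnt += *per_cpu_ptr(rb->pending_refcnt, cpu);

		return refcnt;
	}

	/* post-patch: one shared atomic counter */
	static unsigned int packet_read_pending(const struct packet_ring_buffer *rb)
	{
		return atomic_read(&rb->pending_refcnt);
	}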

> I think more interesting would be to eschew the counter when
> MSG_DONTWAIT, as it is only used to know when to exit the loop if
> need_wait.
> 
Yeah, that might be interesting.  Though that would mean needing to have
tpacket_destruct_skb be aware of whether or not the frame being freed was
sent as part of a WAIT/DONTWAIT flagged call.
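
For context, the only place the counter is consumed is the exit condition of
the transmit loop in tpacket_snd, which looks roughly like this (paraphrased,
not copied verbatim):

	} while (likely((ph != NULL) ||
		 /* only consult the pending counter when the caller is
		  * willing to wait for completions (need_wait)
		  */
		 (need_wait && packet_read_pending(&po->tx_ring))));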


> But, IMHO packet sockets are deprecated in favor of AF_XDP and
> should be limited to bug fixes at this point.
> 
I don't see any documentation listing AF_PACKET as deprecated.

People can use AF_XDP to do raw frame sends if they like,
but AF_PACKET will still be around (presumably in perpetuity) to support
existing applications.  I see no need to avoid improving it when we can.

Neil


> >  net/packet/af_packet.c | 40 +++++-----------------------------------
> >  net/packet/internal.h  |  2 +-
> >  2 files changed, 6 insertions(+), 36 deletions(-)
> >
> > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> > index 8d54f3047768..25ffb486fac9 100644
> > --- a/net/packet/af_packet.c
> > +++ b/net/packet/af_packet.c
> > @@ -1154,43 +1154,17 @@ static void packet_increment_head(struct packet_ring_buffer *buff)
> >
> >  static void packet_inc_pending(struct packet_ring_buffer *rb)
> >  {
> > -       this_cpu_inc(*rb->pending_refcnt);
> > +       atomic_inc(&rb->pending_refcnt);
> >  }
> 
> If making this change, can also remove these helper functions. The
> layer of indirection just hinders readability.
> 
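
For illustration, with the helpers dropped the call sites would just use the
atomic ops directly, roughly:

	/* in tpacket_snd(), instead of packet_inc_pending(&po->tx_ring) */
	atomic_inc(&po->tx_ring.pending_refcnt);

	/* in tpacket_destruct_skb(), instead of packet_dec_pending(&po->tx_ring) */
	atomic_dec(&po->tx_ring.pending_refcnt);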
