netdev - Re: [PATCH bpf 4/4] xsk: fix potential race in SKB TX completion code

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAJ8uoz1VkaRz-q0gykDAZ2dtL=uTk0yqz562GbS3pmAYv0HXCg@mail.gmail.com>
Date:   Thu, 28 Jun 2018 10:31:35 +0200
From:   Magnus Karlsson <magnus.karlsson@...il.com>
To:     eric.dumazet@...il.com
Cc:     "Karlsson, Magnus" <magnus.karlsson@...el.com>,
        Björn Töpel <bjorn.topel@...el.com>,
        Alexei Starovoitov <ast@...com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Network Development <netdev@...r.kernel.org>,
        "Zhang, Qi Z" <qi.z.zhang@...el.com>, pavel@...tnetmon.com
Subject: Re: [PATCH bpf 4/4] xsk: fix potential race in SKB TX completion code

On Wed, Jun 27, 2018 at 5:57 PM Eric Dumazet <eric.dumazet@...il.com> wrote:
>
>
>
> On 06/27/2018 07:02 AM, Magnus Karlsson wrote:
> > There was a potential race in the TX completion code for
> > the SKB case when the TX napi thread and the error path
> > of the sendmsg code could both call the SKB destructor
> > at the same time. Fixed by introducing a spin_lock in the
> > destructor.
>
>
> Wow, what is the impact on performance ?
>
> Please describe a bit more what is the problem.

One process enters the sendmsg code of an AF_XDP socket in order to
send a frame. The execution eventually trickles down to the driver
that is told to send the packet. However, it decides to drop the
packet due to some error condition (e.g., rings full) and frees the
SKB. This will trigger the SKB destructor and a completion will be
sent to the AF_XDP user space through its
single-producer/single-consumer queues.

At the same time a TX interrupt has fired on another core and it
dispatches the TX completion code in the driver. It does its HW
specific things and ends up freeing the SKB associated with the
transmitted packet. This will trigger the SKB destructor and a
completion will be sent to the AF_XDP user space through its
single-producer/single-consumer queues. With a pseudo call stack,
it would look like this:

Core 1:
sendmsg() being called in the application
  netdev_start_xmit()
    Driver entered through ndo_start_xmit
      Driver decides to free the SKB for some reason (e.g., rings full)
        Destructor of SKB called
          xskq_produce_addr() is called to signal completion to user space

Core 2:
TX completion irq
  NAPI loop
    Driver irq handler for TX completions
      Frees the SKB
        Destructor of SKB called
          xskq_produce_addr() is called to signal completion to user space

We now have a violation of the single-producer/single-consumer
principle for our queues as there are two threads trying to produce at
the same time on the same queue.

Let me know if this is clear enough and I will put it in the commit
message and submit a V2.

In regards to the performance, I get around 1.74 Mpps for txonly
before and after the introduction of the spinlock on my machine. There
is of course some impact due to the spin lock but it is in the less
significant digits that are too noisy for me to measure. But let us
say that the version without the spin lock got 1.745 Mpps in the best
case and the version with 1.735 Mpps in the worst case, then that
would mean a maximum drop in performance of 0.5%.

Will had some ideas in
https://www.spinics.net/lists/netdev/msg497859.html on how to improve
the performance of the TX SKB path for AF_XDP. We could always
implement these in order to try to recoup the performance loss due to
the spin lock.

/Magnus