netdev - RE: [PATCH v4 bpf-next 09/10] selftests: xsk: rely on pkts_in_flight in wait_for_tx


Message-ID: <62ad3ed172224_24b342084d@john.notmuch>
Date:   Fri, 17 Jun 2022 19:56:17 -0700
From:   John Fastabend <john.fastabend@...il.com>
To:     Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
        bpf@...r.kernel.org, ast@...nel.org, daniel@...earbox.net
Cc:     netdev@...r.kernel.org, magnus.karlsson@...el.com,
        bjorn@...nel.org, kuba@...nel.org,
        Maciej Fijalkowski <maciej.fijalkowski@...el.com>
Subject: RE: [PATCH v4 bpf-next 09/10] selftests: xsk: rely on pkts_in_flight
 in wait_for_tx_completion()

Maciej Fijalkowski wrote:
> Some of the drivers that implement support for AF_XDP Zero Copy (like
> ice) can have lazy approach for cleaning Tx descriptors. For ZC, when
> descriptor is cleaned, it is placed onto AF_XDP completion queue. This
> means that current implementation of wait_for_tx_completion() in
> xdpxceiver can get onto infinite loop, as some of the descriptors can
> never reach CQ.
> 
> This function can be changed to rely on pkts_in_flight instead.
> 
> Acked-by: Magnus Karlsson <magnus.karlsson@...el.com>
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
> ---

Sorry, I'm going to need more details to follow what's going on here.

In send_pkts() we do the expected thing and send all the pkts and
then call wait_for_tx_completion().

The wait for completion is obvious:

 static void wait_for_tx_completion(struct xsk_socket_info *xsk)
 {
         while (xsk->outstanding_tx)
                 complete_pkts(xsk, BATCH_SIZE);
 }

The 'outstanding_tx' counter appears to be decremented in complete_pkts().
This is done by looking at xsk_ring_cons__peek(), which makes sense to me:
until a descriptor shows up there, we don't know that the pkt has been
completely sent and that we can release the resources.
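
To make that accounting concrete, here is a minimal standalone sketch. It does not use the real xsk ring API: the mock_* types and helpers are hypothetical stand-ins I made up for illustration, and only the shape of the logic (peek the completion queue, release, decrement the counter) is meant to mirror what complete_pkts() appears to do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the AF_XDP completion queue: the "driver"
 * publishes completed descriptors by advancing 'produced'; the
 * application consumes them by advancing 'consumed'. */
struct mock_cq {
	uint32_t produced;
	uint32_t consumed;
};

struct mock_sock {
	struct mock_cq cq;
	uint32_t outstanding_tx; /* sent, but not yet seen on the CQ */
};

/* Like a ring peek: report how many entries (at most 'batch') are ready. */
static uint32_t mock_cq_peek(const struct mock_cq *cq, uint32_t batch)
{
	uint32_t avail = cq->produced - cq->consumed;

	return avail < batch ? avail : batch;
}

/* Like a ring release: mark 'n' entries as consumed. */
static void mock_cq_release(struct mock_cq *cq, uint32_t n)
{
	cq->consumed += n;
}

/* Only descriptors that actually showed up on the CQ reduce the count. */
static uint32_t mock_complete_pkts(struct mock_sock *s, uint32_t batch)
{
	uint32_t done = mock_cq_peek(&s->cq, batch);

	assert(done <= s->outstanding_tx);
	mock_cq_release(&s->cq, done);
	s->outstanding_tx -= done;
	return done;
}
```

With 8 packets sent and only 5 completions published, one call to mock_complete_pkts() leaves outstanding_tx at 3, and further calls leave it there until the "driver" publishes more. That is exactly the state in which a loop on the counter keeps spinning.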

Now if you just zero the counter on exit and call it good, how do you know
the resources are safe to clean up? Or that you don't have a real bug
in the driver that isn't correctly releasing the resource?

How are users expected to deal with a lazy approach to tx descriptor
cleaning, e.g. on exit as in this case? It seems we need to fix the root
cause of ice not putting things on the completion queue, or I
misunderstood the patch.
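
To illustrate the concern, a rough sketch of a wait that reports what is still outstanding instead of zeroing a counter. Everything here is hypothetical and not taken from xdpxceiver: struct lazy_tx models a driver that cleans descriptors whenever it gets around to it, and the 'poll' callback stands in for whatever would prompt the driver to make progress.

```c
#include <stdint.h>

/* Hypothetical model of a Tx path with lazy descriptor cleaning. */
struct lazy_tx {
	uint32_t sent;    /* descriptors handed to the driver */
	uint32_t cleaned; /* descriptors the driver has completed */
};

/* Poll at most 'max_polls' times. Returns the number of descriptors
 * still not completed; 0 means every buffer is safe to reuse. */
static uint32_t wait_bounded(struct lazy_tx *tx, unsigned int max_polls,
			     void (*poll)(struct lazy_tx *))
{
	unsigned int i;

	for (i = 0; i < max_polls && tx->cleaned < tx->sent; i++)
		poll(tx);

	return tx->sent - tx->cleaned;
}

/* Example driver behaviour: cleans up to four descriptors per poll. */
static void poll_cleans_four(struct lazy_tx *tx)
{
	uint32_t left = tx->sent - tx->cleaned;

	tx->cleaned += left < 4 ? left : 4;
}

/* Example driver behaviour: never cleans anything. */
static void poll_cleans_none(struct lazy_tx *tx)
{
	(void)tx;
}
```

A non-zero return value at least tells the caller that buffers are still owned by the driver, whether that is laziness or a real bug. Simply clearing the count on exit loses that information.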


>  tools/testing/selftests/bpf/xdpxceiver.c | 3 ++-
>  tools/testing/selftests/bpf/xdpxceiver.h | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/xdpxceiver.c b/tools/testing/selftests/bpf/xdpxceiver.c
> index de4cf0432243..13a3b2ac2399 100644
> --- a/tools/testing/selftests/bpf/xdpxceiver.c
> +++ b/tools/testing/selftests/bpf/xdpxceiver.c
> @@ -965,7 +965,7 @@ static int __send_pkts(struct ifobject *ifobject, u32 *pkt_nb)
>  
>  static void wait_for_tx_completion(struct xsk_socket_info *xsk)
>  {
> -	while (xsk->outstanding_tx)
> +	while (pkts_in_flight)
>  		complete_pkts(xsk, BATCH_SIZE);
>  }
>  
> @@ -1269,6 +1269,7 @@ static void *worker_testapp_validate_rx(void *arg)
>  		pthread_mutex_unlock(&pacing_mutex);
>  	}
>  
> +	pkts_in_flight = 0;
>  	pthread_exit(NULL);
>  }