Date:   Wed, 19 Aug 2020 08:44:32 +0200
From:   Björn Töpel
To:     "Li,Rongqing",
        Björn Töpel
Cc:     Netdev,
        intel-wired-lan,
        "Karlsson, Magnus",
        bpf,
        Maciej Fijalkowski,
        Piotr,
        Maciej
Subject: Re: Reply: [Intel-wired-lan] [PATCH 0/2] intel/xdp fixes for fliping rx buffer

On 2020-08-19 03:37, Li,Rongqing wrote:
 > Hi:
 > Thanks for your explanation.
 > But we can reproduce this bug
 > We use ebpf to redirect only-Vxlan packets to non-zerocopy AF_XDP,
 > First we see panic on tcp stack, in tcp_collapse: BUG_ON(offset < 0); it
 > is very hard to reproduce.
 > Then we use the scp to do test, and has lots of vxlan packet at the
 > same time, scp will be broken frequently.

Ok! Just so that I'm certain of your setup. You receive packets to an
i40e netdev where there's an XDP program. The program does XDP_PASS or
XDP_REDIRECT to e.g. devmap for non-vxlan packets. However, vxlan
packets are redirected to AF_XDP socket(s) in *copy-mode*. Am I
understanding that correctly?

I'm assuming this is an x86-64 with 4k page size, right? :-) The page
flipping is a bit different if the PAGE_SIZE is not 4k.
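
To illustrate why the page size matters, here is a minimal userspace
sketch of the flip. It is a model, not the driver code: the struct and
function names are hypothetical, and the 8192 threshold and the
half-page split reflect my reading of the i40e Rx path, so treat them
as assumptions.

```c
#include <assert.h>

/* Hypothetical model of an Rx buffer; only the field needed here. */
struct flip_buf_model {
	unsigned int page_offset;	/* offset of the chunk given to HW */
};

/*
 * Move the buffer to the next chunk of its page.
 * Small pages (assumed: < 8192 bytes) are split in two halves and the
 * offset toggles between them. Larger pages are consumed linearly,
 * truesize bytes at a time.
 */
static void rx_buf_flip(struct flip_buf_model *buf,
			unsigned int page_size, unsigned int truesize)
{
	assert(truesize > 0 && truesize <= page_size);

	if (page_size < 8192)
		buf->page_offset ^= page_size / 2;
	else
		buf->page_offset += truesize;
}
```

With a 4k page the offset alternates between 0 and 2048, so the half
that was used two flips ago is the one handed back to the hardware.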

 > With this fixes, scp has not been broken again, and kernel is not
 > panic again

Let's dig into your scenario.

Are you saying the following:

Page A:
| "first skb" ----> Rx HW ring entry X
| "second skb"----> Rx HW ring entry X+1 (or X+n)

This is a scenario that shouldn't be allowed, because there are now
two users of the page. If that's the case, the refcounting is
broken. Is that the case?

Check out i40e_can_reuse_rx_page(). The idea with page flipping/reuse
is that the page is only reused if there is only one user.
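
The "only one user" rule can be sketched roughly as below. This is a
simplified userspace model, not i40e_can_reuse_rx_page() itself: the
struct names are hypothetical, and the comparison against a bias
counter is an assumption based on how I read the 4k-page case in the
driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; refcount models page_count(). */
struct reuse_page_model {
	int refcount;
};

/* Hypothetical stand-in for the driver's Rx buffer bookkeeping. */
struct reuse_buf_model {
	struct reuse_page_model *page;
	int pagecnt_bias;	/* references the driver still holds in reserve */
};

/*
 * refcount - pagecnt_bias is the number of references held outside
 * the driver's reserve. More than one means somebody else still uses
 * the page, so it must not be recycled.
 */
static bool rx_buf_can_reuse(const struct reuse_buf_model *buf)
{
	assert(buf->page != NULL);

	return (buf->page->refcount - buf->pagecnt_bias) <= 1;
}
```

For example, refcount 2 with a bias of 1 counts as a single user and
the page may be reused, while refcount 3 with the same bias means a
second user exists and the page is not reused.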

 > Seem your explanation is unable to solve my analysis:
 >         1. first skb is not for xsk, and forwarded to another device
 >            or socket queue

The data for the "first skb" resides on a page:
| "first skb"
| to be reused
refcount >>1

 >         2. seconds skb is for xsk, copy data to xsk memory, and page
 >            of skb->data is released

Note that page B != page A.

| to be reused/or used by the stack
| "second skb for xsk"
refcount >>1

data is copied to socket, page_frag_free() is called, and the page
count is decreased. The driver will then check if the page can be
reused. If not, it's freed to the page allocator.
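
The copy-mode sequence above, as a rough userspace sketch. All names
are hypothetical; the decrement stands in for page_frag_free(), and
the recycle test is the same assumed "at most one outside user" rule,
not the driver's actual code.

```c
#include <assert.h>
#include <string.h>

enum put_result { BUF_RECYCLED, BUF_RELEASED };

/* Hypothetical model of the page backing one Rx buffer. */
struct copy_page_model {
	int refcount;		/* models page_count(page) */
	int pagecnt_bias;	/* references the driver holds in reserve */
};

/* Copy the frame to socket memory, then drop the frame's page reference. */
static void copy_and_free(struct copy_page_model *page,
			  void *dst, const void *src, size_t len)
{
	assert(page->refcount > 0);

	memcpy(dst, src, len);
	page->refcount--;	/* stands in for page_frag_free() */
}

/* Decide what happens to the buffer once the frame has been handled. */
static enum put_result rx_buf_put(const struct copy_page_model *page)
{
	if (page->refcount - page->pagecnt_bias <= 1)
		return BUF_RECYCLED;	/* goes back on the Rx ring */

	return BUF_RELEASED;		/* returned to the page allocator */
}
```

The point of the sketch is only the ordering: the reference is dropped
during the redirect, before the driver makes its reuse decision.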

 >         3. rx_buff is reusable since only first skb is in it, but
 >            *_rx_buffer_flip will make that page_offset is set to
 >            first skb data

I'm having trouble grasping how this is possible. More than one user
implies that it won't be reused. If this is possible, the
refcounting/reuse mechanism is broken, and that is what should be

The AF_XDP redirect should not have semantics different from, say,
devmap redirect. It's just that the page_frag_free() is called earlier
for AF_XDP, instead of from i40e_clean_tx_irq() as is the case for

 >         4. then reuse rx buffer, first skb which still is living
 >            will be corrupted.
 > The root cause is difference you said upper, so I only fixes for
 > non-zerocopy AF_XDP

I have only addressed non-zerocopy, so we're on the same page (pun
intended) here!


 > -Li
