[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZGJEGr+ZgBqOhjDf@corigine.com>
Date: Mon, 15 May 2023 16:39:22 +0200
From: Simon Horman <simon.horman@...igine.com>
To: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
Cc: intel-wired-lan@...ts.osuosl.org, netdev@...r.kernel.org,
bpf@...r.kernel.org, anthony.l.nguyen@...el.com,
magnus.karlsson@...el.com
Subject: Re: [PATCH iwl-net] ice: recycle/free all of the fragments from
multi-buffer packet
On Mon, May 15, 2023 at 03:17:33PM +0200, Maciej Fijalkowski wrote:
> On Fri, May 12, 2023 at 05:37:39PM +0200, Simon Horman wrote:
> > On Fri, May 12, 2023 at 03:23:30PM +0200, Maciej Fijalkowski wrote:
> > > The ice driver caches next_to_clean value at the beginning of
> > > ice_clean_rx_irq() in order to remember the first buffer that has to be
> > > freed/recycled after main Rx processing loop. The end is indicated by
> > > first descriptor of packet that the Rx processing loop has ended. Note
> > > that if mentioned loop ended in the middle of gathering a multi-buffer
> > > packet, next_to_clean would be pointing to the descriptor in the middle
> > > of the packet BUT freeing/recycling stage will stop at the first
> > > descriptor. This means that next iteration of ice_clean_rx_irq() will
> > > miss the (first_desc, next_to_clean - 1) entries.
> > >
> > > When running various 9K MTU workloads, such splats were observed,
> > > mostly related to rx_buf->page being NULL as behavior described in the
> > > paragraph above breaks ICE_RX_DESC_UNUSED() macro which is used when
> > > refilling Rx buffers:
> > >
> > > [ 540.780716] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > > [ 540.787787] #PF: supervisor read access in kernel mode
> > > [ 540.793002] #PF: error_code(0x0000) - not-present page
> > > [ 540.798218] PGD 0 P4D 0
> > > [ 540.800801] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > > [ 540.805231] CPU: 18 PID: 3984 Comm: xskxceiver Tainted: G W 6.3.0-rc7+ #96
> > > [ 540.813619] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
> > > [ 540.824209] RIP: 0010:ice_clean_rx_irq+0x2b6/0xf00 [ice]
> > > [ 540.829678] Code: 74 24 10 e9 aa 00 00 00 8b 55 78 41 31 57 10 41 09 c4 4d 85 ff 0f 84 83 00 00 00 49 8b 57 08 41 8b 4f 1c 65 8b 35 1a fa 4b 3f <48> 8b 02 48 c1 e8 3a 39 c6 0f 85 a2 00 00 00 f6 42 08 02 0f 85 98
> > > [ 540.848717] RSP: 0018:ffffc9000f42fc50 EFLAGS: 00010282
> > > [ 540.854029] RAX: 0000000000000004 RBX: 0000000000000002 RCX: 000000000000fffe
> > > [ 540.861272] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00000000ffffffff
> > > [ 540.868519] RBP: ffff88984a05ac00 R08: 0000000000000000 R09: dead000000000100
> > > [ 540.875760] R10: ffff88983fffcd00 R11: 000000000010f2b8 R12: 0000000000000004
> > > [ 540.883008] R13: 0000000000000003 R14: 0000000000000800 R15: ffff889847a10040
> > > [ 540.890253] FS: 00007f6ddf7fe640(0000) GS:ffff88afdf800000(0000) knlGS:0000000000000000
> > > [ 540.898465] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 540.904299] CR2: 0000000000000000 CR3: 000000010d3da001 CR4: 00000000007706e0
> > > [ 540.911542] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [ 540.918789] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [ 540.926032] PKRU: 55555554
> > > [ 540.928790] Call Trace:
> > > [ 540.931276] <TASK>
> > > [ 540.933418] ice_napi_poll+0x4ca/0x6d0 [ice]
> > > [ 540.937804] ? __pfx_ice_napi_poll+0x10/0x10 [ice]
> > > [ 540.942716] napi_busy_loop+0xd7/0x320
> > > [ 540.946537] xsk_recvmsg+0x143/0x170
> > > [ 540.950178] sock_recvmsg+0x99/0xa0
> > > [ 540.953729] __sys_recvfrom+0xa8/0x120
> > > [ 540.957543] ? do_futex+0xbd/0x1d0
> > > [ 540.961008] ? __x64_sys_futex+0x73/0x1d0
> > > [ 540.965083] __x64_sys_recvfrom+0x20/0x30
> > > [ 540.969155] do_syscall_64+0x38/0x90
> > > [ 540.972796] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> > > [ 540.977934] RIP: 0033:0x7f6de5f27934
> > >
> > > To fix this, compare next_to_clean with first_desc at the beginning of
> > > ice_clean_rx_irq(). In the case they are not the same, set cached_ntc to
> > > first_desc so that at the end, when freeing/recycling buffers,
> > > descriptors from first to ntc are not missed.
> > >
> > > Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side")
> > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
> > > ---
> > > drivers/net/ethernet/intel/ice/ice_txrx.c | 3 +++
> > > 1 file changed, 3 insertions(+)
> > >
> > > diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
> > > index 4fcf2d07eb85..4d1fc047767f 100644
> > > --- a/drivers/net/ethernet/intel/ice/ice_txrx.c
> > > +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
> > > @@ -1162,6 +1162,9 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
> > > bool failure;
> > > u32 first;
> > >
> > > + if (ntc != rx_ring->first_desc)
> > > + cached_ntc = rx_ring->first_desc;
> > > +
> >
> > Above ntc is initialised as rx_ring->next_to_clean.
> > And cached_ntc is initialised as ntc, i.e. rx_ring->next_to_clean.
> >
> > u32 ntc = rx_ring->next_to_clean;
> > ...
> > u32 cached_ntc = ntc;
> >
> > So in effect we have:
> >
> > if (rx_ring->next_to_clean != rx_ring->first_desc)
> > cached_ntc = rx_ring->first_desc;
> > else
> > cached_ntc = rx_ring->next_to_clean;
> >
> > I wonder if we can simplify this by simply initialising cached_ntc as:
> >
> > cached_ntc = rx_ring->first_desc;
>
> Thanks for taking a fresh look over this, indeed this can be simplified as
> you pointed out :) being stuck on some problem makes us sometimes unable
> to see such obvious things ;)
Indeed it can. Happy to help :)
Powered by blists - more mailing lists