[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YfO1q52G7GKl+P40@myrica>
Date: Fri, 28 Jan 2022 09:21:47 +0000
From: Jean-Philippe Brucker <jean-philippe@...aro.org>
To: Yunsheng Lin <linyunsheng@...wei.com>
Cc: linuxarm@...neuler.org, ilias.apalodimas@...aro.org,
salil.mehta@...wei.com, netdev@...r.kernel.org,
moyufeng@...wei.com, alexanderduyck@...com, brouer@...hat.com,
kuba@...nel.org
Subject: Re: [PATCH net-next v2 4/4] net: hns3: support skb's frag page
recycling based on page pool
On Fri, Jan 28, 2022 at 12:00:35PM +0800, Yunsheng Lin wrote:
> On 2022/1/26 22:30, Jean-Philippe Brucker wrote:
> > Hi,
> >
> > On Fri, Aug 06, 2021 at 10:46:22AM +0800, Yunsheng Lin wrote:
> >> This patch adds skb's frag page recycling support based on
> >> the frag page support in page pool.
> >>
> >> The performance improves above 10~20% for single thread iperf
> >> TCP flow with IOMMU disabled when iperf server and irq/NAPI
> >> have a different CPU.
> >>
> >> The performance improves about 135%(14Gbit to 33Gbit) for single
> >> thread iperf TCP flow when IOMMU is in strict mode and iperf
> >> server shares the same cpu with irq/NAPI.
> >>
> >> Signed-off-by: Yunsheng Lin <linyunsheng@...wei.com>
> >
> > This commit is giving me some trouble, but I haven't managed to pinpoint
> > the exact problem.
>
> Hi,
> Thanks for reporting the problem.
>
> We also hit a similiar problem during internal CI testing, but was not
> able to trigger it manually, so was not able to find the root case yet.
>
> Is your test case more likely to trigger the problem?
The problem shows up shortly after boot for me, when I'm not doing
anything special, just ssh'ing into the server. I did manage to trigger it
faster with a "netperf -T TCP_MAERTS" job. Maybe I have something enabled
in my config that makes it easier to trigger? Attached the .config to
this reply, but I think it corresponds pretty much to debian's config.
> > Symptoms are:
> > * A page gets unmapped twice from page_pool_release_page(). The second
> > time, dma-iommu.c warns about the empty PTE [1]
> > * The rx ring still accesses the page after the first unmap, causing SMMU
> > translation faults [2]
> > * That leads to APEI errors and reset of the device, at which time
> > page_pool_inflight() complains about "Negative(-x) inflight packet-pages".
> >
> > After some debugging, it looks like the page gets released three times
> > instead of two:
> >
> > (1) first in page_pool_drain_frag():
> >
> > page_pool_alloc_frag+0x1fc/0x248
> > hns3_alloc_and_map_buffer+0x30/0x170
> > hns3_nic_alloc_rx_buffers+0x9c/0x170
> > hns3_clean_rx_ring+0x854/0x950
> > hns3_nic_common_poll+0xa0/0x218
> > __napi_poll+0x38/0x1b0
> > net_rx_action+0xe8/0x248
> > __do_softirq+0x120/0x284
> >
> > (2) Then later by page_pool_return_skb_page(), which (I guess) unmaps the
> > page:
> >
> > page_pool_put_page+0x214/0x308
> > page_pool_return_skb_page+0x48/0x60
> > skb_release_data+0x168/0x188
> > skb_release_all+0x28/0x38
> > kfree_skb+0x30/0x90
> > packet_rcv+0x4c/0x410
> > __netif_receive_skb_list_core+0x1f4/0x218
> > netif_receive_skb_list_internal+0x18c/0x2a8
> >
> > (3) And finally, soon after, by clean_rx_ring() which causes pp_frag_count
> > underflow (seen after removing the optimization in
> > page_pool_atomic_sub_frag_count_return):
> >
> > page_pool_put_page+0x2a0/0x308
> > page_pool_put_full_page
> > hns3_alloc_skb
> > hns3_handle_rx_bd
> > hns3_clean_rx_ring+0x744/0x950
> > hns3_nic_common_poll+0xa0/0x218
> > __napi_poll+0x38/0x1b0
> > net_rx_action+0xe8/0x248
> >
> > So I'm guessing (2) happens too early while the RX ring is still using the
> > page, but I don't know more. I'd be happy to add more debug and to test
>
> If the reference counting or pp_frag_count of the page is manipulated correctly,
> I think step 2&3 does not have any dependency between each other.
>
> > fixes if you have any suggestions.
>
> My initial thinking is to track if the reference counting or pp_frag_count of
> the page is manipulated correctly.
It looks like pp_frag_count is dropped too many times: after (1),
pp_frag_count only has 1 ref, so (2) drops it to 0 and (3) results in
underflow. I turned page_pool_atomic_sub_frag_count_return() into
"atomic_long_sub_return(nr, &page->pp_frag_count)" to make sure (the
atomic_long_read() bit normally hides this). Wasn't entirely sure if this
is expected behavior, though.
Thanks,
Jean
>
> Perhaps using the newly added reference counting tracking infrastructure?
> Will look into how to use the reference counting tracking infrastructure
> for the above problem.
>
> >
> > Thanks,
> > Jean
> >
> >
> > [1] ------------[ cut here ]------------
> > WARNING: CPU: 71 PID: 0 at drivers/iommu/dma-iommu.c:848 iommu_dma_unmap_page+0xbc/0xd8
> > Modules linked in: fuse overlay ipmi_si hisi_hpre hisi_zip ecdh_generic hisi_trng_v2 ecc ipmi_d>
> > CPU: 71 PID: 0 Comm: swapper/71 Not tainted 5.16.0-g3813c61fbaad #22
> > Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V5.B133.01 03/25/2021
> > pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > pc : iommu_dma_unmap_page+0xbc/0xd8
> > lr : iommu_dma_unmap_page+0x38/0xd8
> > sp : ffff800010abb8d0
> > x29: ffff800010abb8d0 x28: ffff20200ee80000 x27: 0000000000000042
> > x26: ffff20201a7ed800 x25: ffff20200be7a5c0 x24: 0000000000000002
> > x23: 0000000000000020 x22: 0000000000001000 x21: 0000000000000000
> > x20: 0000000000000002 x19: ffff002086b730c8 x18: 0000000000000001
> > x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> > x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000008
> > x11: 000000000000ffff x10: 0000000000000001 x9 : 0000000000000004
> > x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff202006274800
> > x5 : 0000000000000009 x4 : 0000000000000001 x3 : 000000000000001e
> > x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
> > Call trace:
> > iommu_dma_unmap_page+0xbc/0xd8
> > dma_unmap_page_attrs+0x30/0x1d0
> > page_pool_release_page+0x40/0x88
> > page_pool_return_page+0x18/0x80
> > page_pool_put_page+0x248/0x288
> > hns3_clean_rx_ring+0x744/0x950
> > hns3_nic_common_poll+0xa0/0x218
> > __napi_poll+0x38/0x1b0
> > net_rx_action+0xe8/0x248
> > __do_softirq+0x120/0x284
> > irq_exit_rcu+0xe0/0x100
> > el1_interrupt+0x3c/0x88
> > el1h_64_irq_handler+0x18/0x28
> > el1h_64_irq+0x78/0x7c
> > arch_cpu_idle+0x18/0x28
> > default_idle_call+0x20/0x68
> > do_idle+0x214/0x260
> > cpu_startup_entry+0x28/0x70
> > secondary_start_kernel+0x160/0x170
> > __secondary_switched+0x90/0x94
> > ---[ end trace 432d1737b4b96ed9 ]---
> >
> > (please ignore the kernel version, I can reproduce this with v5.14 and
> > v5.17-rc1, and bisected to this commit.)
> >
> > [2] arm-smmu-v3 arm-smmu-v3.6.auto: event 0x10 received:
> > arm-smmu-v3 arm-smmu-v3.6.auto: 0x0000bd0000000010
> > arm-smmu-v3 arm-smmu-v3.6.auto: 0x000012000000007c
> > arm-smmu-v3 arm-smmu-v3.6.auto: 0x00000000ff905800
> > arm-smmu-v3 arm-smmu-v3.6.auto: 0x00000000ff905000
View attachment "config" of type "text/plain" (249004 bytes)
Powered by blists - more mailing lists