netdev - Re: [BUG REPORT]net: page_pool: kernel crash at iommu_get_dma_domain+0xc/0x20

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <CAKgT0Uf4xBJDLMxa3awSnzgZvbb-LN82APkPi7uVpWw+j7wqRA@mail.gmail.com>
Date: Fri, 2 Aug 2024 09:38:37 -0700
From: Alexander Duyck <alexander.duyck@...il.com>
To: Yonglong Liu <liuyonglong@...wei.com>
Cc: "David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, pabeni@...hat.com, 
	hawk@...nel.org, ilias.apalodimas@...aro.org, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>, 
	linyunsheng <linyunsheng@...wei.com>, "shenjian (K)" <shenjian15@...wei.com>, 
	Salil Mehta <salil.mehta@...wei.com>
Subject: Re: [BUG REPORT]net: page_pool: kernel crash at iommu_get_dma_domain+0xc/0x20

On Tue, Jul 30, 2024 at 6:08 AM Yonglong Liu <liuyonglong@...wei.com> wrote:
>
> I found a bug when running the hns3 driver with page pool enabled; the log
> is below:
>
> [ 4406.956606] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a8
> [ 4406.965379] Mem abort info:
> [ 4406.968160]   ESR = 0x0000000096000004
> [ 4406.971906]   EC = 0x25: DABT (current EL), IL = 32 bits
> [ 4406.977218]   SET = 0, FnV = 0
> [ 4406.980258]   EA = 0, S1PTW = 0
> [ 4406.983404]   FSC = 0x04: level 0 translation fault
> [ 4406.988273] Data abort info:
> [ 4406.991154]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> [ 4406.996632]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [ 4407.001681]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [ 4407.006985] user pgtable: 4k pages, 48-bit VAs, pgdp=0000202828326000
> [ 4407.013430] [00000000000000a8] pgd=0000000000000000, p4d=0000000000000000
> [ 4407.020212] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> [ 4407.026454] Modules linked in: hclgevf xt_CHECKSUM ipt_REJECT
> nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle
> ip6table_filter ip6_tables hns_roce_hw_v2 hns3 hclge hnae3 xt_addrtype
> iptable_filter xt_conntrack overlay arm_spe_pmu arm_smmuv3_pmu
> hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu
> hisi_uncore_pmu fuse rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi
> scsi_transport_iscsi crct10dif_ce hisi_sec2 hisi_hpre hisi_zip
> hisi_sas_v3_hw xhci_pci sbsa_gwdt hisi_qm hisi_sas_main hisi_dma
> xhci_pci_renesas uacce libsas [last unloaded: hnae3]
> [ 4407.076027] CPU: 48 PID: 610 Comm: kworker/48:1
> [ 4407.093343] Workqueue: events page_pool_release_retry
> [ 4407.098384] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 4407.105316] pc : iommu_get_dma_domain+0xc/0x20
> [ 4407.109744] lr : iommu_dma_unmap_page+0x38/0xe8
> [ 4407.114255] sp : ffff80008bacbc80
> [ 4407.117554] x29: ffff80008bacbc80 x28: 0000000000000000 x27: ffffc31806be7000
> [ 4407.124659] x26: ffff2020002b6ac0 x25: 0000000000000000 x24: 0000000000000002
> [ 4407.131762] x23: 0000000000000022 x22: 0000000000001000 x21: 00000000fcd7c000
> [ 4407.138865] x20: ffff0020c9882800 x19: ffff0020856f60c8 x18: ffff8000d3503c58
> [ 4407.145968] x17: 0000000000000000 x16: 1fffe00419521061 x15: 0000000000000001
> [ 4407.153073] x14: 0000000000000003 x13: 00000401850ae012 x12: 000006b10004e7fb
> [ 4407.160177] x11: 0000000000000067 x10: 0000000000000c70 x9 : ffffc3180405cd20
> [ 4407.167280] x8 : fefefefefefefeff x7 : 0000000000000001 x6 : 0000000000000010
> [ 4407.174382] x5 : ffffc3180405cce8 x4 : 0000000000000022 x3 : 0000000000000002
> [ 4407.181485] x2 : 0000000000001000 x1 : 00000000fcd7c000 x0 : 0000000000000000
> [ 4407.188589] Call trace:
> [ 4407.191027]  iommu_get_dma_domain+0xc/0x20
> [ 4407.195105]  dma_unmap_page_attrs+0x38/0x1d0
> [ 4407.199361]  page_pool_return_page+0x48/0x180
> [ 4407.203699]  page_pool_release+0xd4/0x1f0
> [ 4407.207692]  page_pool_release_retry+0x28/0xe8
> [ 4407.212119]  process_one_work+0x164/0x3e0
> [ 4407.216116]  worker_thread+0x310/0x420
> [ 4407.219851]  kthread+0x120/0x130
> [ 4407.223066]  ret_from_fork+0x10/0x20
> [ 4407.226630] Code: ffffc318 aa1e03e9 d503201f f9416c00 (f9405400)
> [ 4407.232697] ---[ end trace 0000000000000000 ]---

The issue as I see it is that we aren't unmapping the pages when we
call page_pool_destroy. No page that still needs a DMA unmapping can
remain *after* that is called; otherwise we will hit this issue
regularly.
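
To make the ordering problem concrete, here is a toy user-space model in C. Everything in it (toy_device, toy_pool, toy_dma_unmap(), toy_return_page(), toy_destroy_is_safe()) is hypothetical and only loosely mirrors the kernel: a torn-down device is modeled as a NULL pointer, and where the real unmap would dereference stale device state, the toy simply reports failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are not kernel types. */
struct toy_device {
    int iommu_group_id;
};

struct toy_pool {
    struct toy_device *dev; /* set to NULL once the device has been torn down */
    int inflight;           /* pages handed out that still hold a DMA mapping */
};

/*
 * Models a DMA unmap: it needs a live device. Where the kernel would
 * dereference stale device state, the toy just returns false.
 */
static bool toy_dma_unmap(const struct toy_pool *pool)
{
    return pool->dev != NULL;
}

/* Models an in-flight page finally coming back to the pool. */
static bool toy_return_page(struct toy_pool *pool)
{
    assert(pool->inflight > 0);
    pool->inflight--;
    return toy_dma_unmap(pool);
}

/* The invariant: destroying the pool is only safe with nothing left to unmap. */
static bool toy_destroy_is_safe(const struct toy_pool *pool)
{
    return pool->inflight == 0;
}
```

With two pages in flight, toy_destroy_is_safe() is already false; one page returned while the device exists unmaps cleanly, but after dev is set to NULL the next return reports an unsafe unmap, which is the situation the deferred page_pool_release_retry worker ran into in the log.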

What we probably need to look at doing is beefing up page_pool_release
with a step that takes an additional reference on the inflight pages
and then calls __page_pool_put_page to switch them to ordinary
reference counted pages.

The worst case seems to be that we would have to walk the page table
to do the above for any inflight pages, but that would certainly take
a much more deterministic amount of time than waiting on a page that
may or may not ever return.
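
The proposed release step might look roughly like the following toy model. The names (toy_page, toy_pool, toy_unmap(), toy_pool_release_inflight(), toy_put_page()) are hypothetical, not kernel APIs. Unlike the real page_pool, the toy keeps an explicit array of every page it handed out; the real pool would have to find those pages some other way, hence the concern above about walking the page table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; names are illustrative, not kernel APIs. */
struct toy_page {
    int refcount;
    bool dma_mapped;
    bool owned_by_pool; /* false once converted to an ordinary refcounted page */
};

struct toy_pool {
    bool device_alive;
    struct toy_page *pages; /* toy-only: every page the pool handed out */
    size_t npages;
};

static void toy_unmap(struct toy_pool *pool, struct toy_page *page)
{
    assert(pool->device_alive); /* unmapping requires the device */
    page->dma_mapped = false;
}

/*
 * Sketch of the extra release step: pin each in-flight page with an extra
 * reference, unmap it while the device still exists, and hand it over to
 * plain refcounting. Returns the number of pages converted.
 */
static size_t toy_pool_release_inflight(struct toy_pool *pool)
{
    size_t converted = 0;

    for (size_t i = 0; i < pool->npages; i++) {
        struct toy_page *page = &pool->pages[i];

        if (!page->owned_by_pool || !page->dma_mapped)
            continue;
        page->refcount++;           /* keep the page alive during conversion */
        toy_unmap(pool, page);
        page->owned_by_pool = false;
        page->refcount--;           /* the put consumes our extra reference */
        converted++;
    }
    return converted;
}

/* Returns true when the last reference is dropped and the page is free. */
static bool toy_put_page(struct toy_pool *pool, struct toy_page *page)
{
    assert(page->refcount > 0);
    page->refcount--;
    /* Only a still pool-owned, still mapped page needs the device here. */
    if (page->refcount == 0 && page->owned_by_pool && page->dma_mapped)
        toy_unmap(pool, page);
    return page->refcount == 0;
}
```

Once toy_pool_release_inflight() has run, device_alive can go false and any later toy_put_page() on a converted page only touches the refcount, so the cost is bounded by the walk rather than by when the last page comes back.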

Alternatively, a quick hack that would probably also address this would
be to clear pool->dma_map in page_pool_destroy or maybe in
page_pool_unreg_netdev, so that any of those residual mappings would
essentially get leaked, but we wouldn't have to worry about trying to
unmap while the device doesn't exist.
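
The quick hack can be modeled the same way. In this sketch the names (toy_pool, toy_pool_destroy(), toy_return_page()) are hypothetical, and the dma_map field only loosely mirrors the pool's flag: clearing it at destroy time makes later returns skip the unmap and count the mapping as leaked instead of touching a device that may be gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of the "leak instead of unmap" hack. */
struct toy_pool {
    bool dma_map;      /* loosely mirrors the pool's dma_map flag */
    bool device_alive;
    size_t unmapped;   /* mappings torn down normally */
    size_t leaked;     /* mappings abandoned after destroy */
};

static void toy_pool_destroy(struct toy_pool *pool)
{
    /* The hack: stop unmapping returned pages from here on. */
    pool->dma_map = false;
}

/* Models a pool page (mapped at allocation time) being returned. */
static void toy_return_page(struct toy_pool *pool)
{
    if (pool->dma_map) {
        assert(pool->device_alive); /* the real unmap needs the device */
        pool->unmapped++;
    } else {
        pool->leaked++;             /* mapping is left behind on purpose */
    }
}
```

The trade-off shows up in the counters: no late return ever needs the device, but every page returned after destroy leaves its DMA mapping behind.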