Message-ID: <e255e7862c29c80174455fc587219badfbd3076f.camel@linux.ibm.com>
Date: Tue, 06 Aug 2024 15:35:37 +0200
From: Niklas Schnelle <schnelle@...ux.ibm.com>
To: Yunsheng Lin <linyunsheng@...wei.com>,
    Somnath Kotur <somnath.kotur@...adcom.com>,
    Jesper Dangaard Brouer <hawk@...nel.org>
Cc: Yonglong Liu <liuyonglong@...wei.com>,
    "David S. Miller" <davem@...emloft.net>,
    Jakub Kicinski <kuba@...nel.org>, pabeni@...hat.com,
    ilias.apalodimas@...aro.org, netdev@...r.kernel.org,
    linux-kernel@...r.kernel.org,
    Alexander Duyck <alexander.duyck@...il.com>,
    Alexei Starovoitov <ast@...nel.org>,
    "shenjian (K)" <shenjian15@...wei.com>,
    Salil Mehta <salil.mehta@...wei.com>, joro@...tes.org, will@...nel.org,
    robin.murphy@....com, iommu@...ts.linux.dev
Subject: Re: [BUG REPORT]net: page_pool: kernel crash at iommu_get_dma_domain+0xc/0x20
On Mon, 2024-08-05 at 20:19 +0800, Yunsheng Lin wrote:
> On 2024/7/31 16:42, Somnath Kotur wrote:
> > On Tue, Jul 30, 2024 at 10:51 PM Jesper Dangaard Brouer <hawk@...nel.org> wrote:
> > >
>
> +cc iommu maintainers and list
>
> > >
> > > On 30/07/2024 15.08, Yonglong Liu wrote:
> > > > I found a bug when running hns3 driver with page pool enabled, the log
> > > > as below:
> > > >
> > > > [ 4406.956606] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a8
> > >
> > > struct iommu_domain *iommu_get_dma_domain(struct device *dev)
> > > {
> > > return dev->iommu_group->default_domain;
> > > }
> > >
> > > $ pahole -C iommu_group --hex | grep default_domain
> > > struct iommu_domain * default_domain; /* 0xa8 0x8 */
> > >
> > > Looks like iommu_group is a NULL pointer, so dereferencing its
> > > 'default_domain' member (at offset 0xa8) causes this fault.
> > >
> > >
> > > > [ 4406.965379] Mem abort info:
> > > > [ 4406.968160] ESR = 0x0000000096000004
> > > > [ 4406.971906] EC = 0x25: DABT (current EL), IL = 32 bits
> > > > [ 4406.977218] SET = 0, FnV = 0
> > > > [ 4406.980258] EA = 0, S1PTW = 0
> > > > [ 4406.983404] FSC = 0x04: level 0 translation fault
> > > > [ 4406.988273] Data abort info:
> > > > [ 4406.991154] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> > > > [ 4406.996632] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > > > [ 4407.001681] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > > > [ 4407.006985] user pgtable: 4k pages, 48-bit VAs, pgdp=0000202828326000
> > > > [ 4407.013430] [00000000000000a8] pgd=0000000000000000, p4d=0000000000000000
> > > > [ 4407.020212] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> > > > [ 4407.026454] Modules linked in: hclgevf xt_CHECKSUM ipt_REJECT
> > > > nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle
> > > > ip6table_filter ip6_tables hns_roce_hw_v2 hns3 hclge hnae3 xt_addrtype
> > > > iptable_filter xt_conntrack overlay arm_spe_pmu arm_smmuv3_pmu
> > > > hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu
> > > > hisi_uncore_pmu fuse rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi
> > > > scsi_transport_iscsi crct10dif_ce hisi_sec2 hisi_hpre hisi_zip
> > > > hisi_sas_v3_hw xhci_pci sbsa_gwdt hisi_qm hisi_sas_main hisi_dma
> > > > xhci_pci_renesas uacce libsas [last unloaded: hnae3]
> > > > [ 4407.076027] CPU: 48 PID: 610 Comm: kworker/48:1
> > > > [ 4407.093343] Workqueue: events page_pool_release_retry
> > > > [ 4407.098384] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > > [ 4407.105316] pc : iommu_get_dma_domain+0xc/0x20
> > > > [ 4407.109744] lr : iommu_dma_unmap_page+0x38/0xe8
> > > > [ 4407.114255] sp : ffff80008bacbc80
> > > > [ 4407.117554] x29: ffff80008bacbc80 x28: 0000000000000000 x27: ffffc31806be7000
> > > > [ 4407.124659] x26: ffff2020002b6ac0 x25: 0000000000000000 x24: 0000000000000002
> > > > [ 4407.131762] x23: 0000000000000022 x22: 0000000000001000 x21: 00000000fcd7c000
> > > > [ 4407.138865] x20: ffff0020c9882800 x19: ffff0020856f60c8 x18: ffff8000d3503c58
> > > > [ 4407.145968] x17: 0000000000000000 x16: 1fffe00419521061 x15: 0000000000000001
> > > > [ 4407.153073] x14: 0000000000000003 x13: 00000401850ae012 x12: 000006b10004e7fb
> > > > [ 4407.160177] x11: 0000000000000067 x10: 0000000000000c70 x9 : ffffc3180405cd20
> > > > [ 4407.167280] x8 : fefefefefefefeff x7 : 0000000000000001 x6 : 0000000000000010
> > > > [ 4407.174382] x5 : ffffc3180405cce8 x4 : 0000000000000022 x3 : 0000000000000002
> > > > [ 4407.181485] x2 : 0000000000001000 x1 : 00000000fcd7c000 x0 : 0000000000000000
> > > > [ 4407.188589] Call trace:
> > > > [ 4407.191027] iommu_get_dma_domain+0xc/0x20
> > > > [ 4407.195105] dma_unmap_page_attrs+0x38/0x1d0
> > > > [ 4407.199361] page_pool_return_page+0x48/0x180
> > > > [ 4407.203699] page_pool_release+0xd4/0x1f0
> > > > [ 4407.207692] page_pool_release_retry+0x28/0xe8
> > >
> > > I suspect that the DMA IOMMU part was deallocated and freed by the
> > > driver even though page_pool still has in-flight packets.
> > When you say driver, which 'driver' do you mean?
> > I suspect this could be because the VF instance goes away when the VF
> > is disabled with: echo 0 >
> > /sys/class/net/eno1/device/sriov_numvfs. What do you think?
> > >
> > > The page_pool bumps the refcnt via get_device() + put_device() on the
> > > DMA 'struct device' to avoid it going away, but I guess there is also
> > > some IOMMU code that we need to make sure doesn't go away (until all
> > > in-flight pages are returned)?
>
> I guess the above is why things went wrong here; the question is which
> IOMMU code needs to be called to stop that state from going away.
>
> What I am also curious about: there should be a pool of IOVAs allocated
> in the IOMMU corresponding to the page_pool's in-flight pages. Shouldn't
> the IOMMU wait for those IOVAs to be freed, similar to how page_pool
> waits for its in-flight pages?
>
Is it possible you're using an IOMMU whose driver doesn't yet support a
blocking domain? I'm currently working on an issue on s390 that also
occurs during device removal and is fixed by implementing a blocking
domain in the s390 IOMMU driver (patch forthcoming). The root cause
there is that our domain->ops->attach_dev() fails when, during
hot-unplug, the device is already gone from the platform's point of
view. We then end up with a NULL domain unless a blocking domain, which
can handle nonexistent devices, is available and gets set as the
fallback in __iommu_device_set_domain(). In the case I can reproduce,
the backtrace is different[0], but we also saw at least two cases with
the exact same call trace as in the first mail of this thread. So far I
have suspected those to be due to the blocking-domain issue, but they
could be a separate problem too.
Thanks,
Niklas
[0]
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0000000000000000 TEID: 0000000000000483
Fault in home space mode while using kernel ASCE.
AS:0000000159f00007 R3:00000003fe900007 S:00000003fe8ff800 P:000000000000013d
Oops: 0004 ilc:2 [#1] SMP
Modules linked in: ...
CPU: 15 UID: 0 PID: 139 Comm: kmcheck Kdump: loaded ...
Tainted: [W]=WARN
Hardware name: IBM 3931 A01 701 (LPAR)
Krnl PSW : 0404e00180000000 00000109fc6c2d98 (s390_iommu_release_device+0x58/0xf0)
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000010 0000000000000000 0000000000000000 00000109fd18eb40
00000109fcf78c34 0000000000000037 0000000000000000 00000109fd33ae60
0000000804768748 0000000804768700 00000109fd133f10 0700000000000000
0000000000000000 000000080474a400 00000109fc6b9d9a 00000089fef6b968
Krnl Code: 00000109fc6c2d8c: a51e0000 llilh %r1,0
0109fc6c2d90: 580013ac l %r0,940(%r1)
0109fc6c2d94: a7180000 lhi %r1,0
0109fc6c2d98: ba10c0b8 cs %r1,%r0,184(%r12)
0109fc6c2d9c: ec180008007e cij %r1,0,8,00000109fc6c2dac
0109fc6c2da2: 4120c0b8 la %r2,184(%r12)
0109fc6c2da6: c0e50023b3f5 brasl %r14,00000109fcb39590
0109fc6c2dac: e310d0200004 lg %r1,32(%r13)
Call Trace:
[<00000109fc6c2d98>] s390_iommu_release_device+0x58/0xf0
[<00000109fc6b9d9a>] iommu_deinit_device+0x7a/0x1c0
[<00000109fc6b9920>] iommu_release_device+0x160/0x240
[<00000109fc6b97ae>] iommu_bus_notifier+0x9e/0xb0
[<00000109fbbb1be2>] blocking_notifier_call_chain+0x72/0x130
[<00000109fc6d2c0c>] bus_notify+0x10c/0x130
[<00000109fc6ccc16>] device_del+0x4c6/0x700
[<00000109fc6349fa>] pci_remove_bus_device+0xfa/0x1c0
[<00000109fc634af8>] pci_stop_and_remove_bus_device_locked+0x38/0x50
[<00000109fbb6cb86>] zpci_bus_remove_device+0x66/0xa0
[<00000109fbb6a9ac>] zpci_event_availability+0x15c/0x270
[<00000109fc77b16a>] chsc_process_crw+0x48a/0xca0
[<00000109fc7842c2>] crw_collect_info+0x1d2/0x310
[<00000109fbbaf85c>] kthread+0x1bc/0x1e0
[<00000109fbb0f5fa>] __ret_from_fork+0x3a/0x60
[<00000109fcb5807a>] ret_from_fork+0xa/0x40
Last Breaking-Event-Address:
[<00000109fc6b9d98>] iommu_deinit_device+0x78/0x1c0
Kernel panic - not syncing: Fatal exception: panic_on_oops