lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YN/7xcxt/XGAKceZ@Ryzen-9-3900X.localdomain>
Date:   Fri, 2 Jul 2021 22:55:17 -0700
From:   Nathan Chancellor <nathan@...nel.org>
To:     Robin Murphy <robin.murphy@....com>
Cc:     Will Deacon <will@...nel.org>, Claire Chang <tientzu@...omium.org>,
        Rob Herring <robh+dt@...nel.org>, mpe@...erman.id.au,
        Joerg Roedel <joro@...tes.org>,
        Frank Rowand <frowand.list@...il.com>,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
        boris.ostrovsky@...cle.com, jgross@...e.com,
        Christoph Hellwig <hch@....de>,
        Marek Szyprowski <m.szyprowski@...sung.com>,
        benh@...nel.crashing.org, paulus@...ba.org,
        "list@....net:IOMMU DRIVERS" <iommu@...ts.linux-foundation.org>,
        Stefano Stabellini <sstabellini@...nel.org>,
        grant.likely@....com, xypron.glpk@....de,
        Thierry Reding <treding@...dia.com>, mingo@...nel.org,
        bauerman@...ux.ibm.com, peterz@...radead.org,
        Greg KH <gregkh@...uxfoundation.org>,
        Saravana Kannan <saravanak@...gle.com>,
        "Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
        heikki.krogerus@...ux.intel.com,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Randy Dunlap <rdunlap@...radead.org>,
        Dan Williams <dan.j.williams@...el.com>,
        Bartosz Golaszewski <bgolaszewski@...libre.com>,
        linux-devicetree <devicetree@...r.kernel.org>,
        lkml <linux-kernel@...r.kernel.org>,
        linuxppc-dev@...ts.ozlabs.org, xen-devel@...ts.xenproject.org,
        Nicolas Boichat <drinkcat@...omium.org>,
        Jim Quinlan <james.quinlan@...adcom.com>,
        Tomasz Figa <tfiga@...omium.org>, bskeggs@...hat.com,
        Bjorn Helgaas <bhelgaas@...gle.com>, chris@...is-wilson.co.uk,
        Daniel Vetter <daniel@...ll.ch>, airlied@...ux.ie,
        dri-devel@...ts.freedesktop.org, intel-gfx@...ts.freedesktop.org,
        jani.nikula@...ux.intel.com, Jianxiong Gao <jxgao@...gle.com>,
        joonas.lahtinen@...ux.intel.com, linux-pci@...r.kernel.org,
        maarten.lankhorst@...ux.intel.com, matthew.auld@...el.com,
        rodrigo.vivi@...el.com, thomas.hellstrom@...ux.intel.com,
        Tom Lendacky <thomas.lendacky@....com>,
        Qian Cai <quic_qiancai@...cinc.com>
Subject: Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for
 swiotlb data bouncing

Hi Will and Robin,

On Fri, Jul 02, 2021 at 04:13:50PM +0100, Robin Murphy wrote:
> On 2021-07-02 14:58, Will Deacon wrote:
> > Hi Nathan,
> > 
> > On Thu, Jul 01, 2021 at 12:52:20AM -0700, Nathan Chancellor wrote:
> > > On 7/1/2021 12:40 AM, Will Deacon wrote:
> > > > On Wed, Jun 30, 2021 at 08:56:51AM -0700, Nathan Chancellor wrote:
> > > > > On Wed, Jun 30, 2021 at 12:43:48PM +0100, Will Deacon wrote:
> > > > > > On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote:
> > > > > > > `BUG: unable to handle page fault for address: 00000000003a8290` and
> > > > > > > the fact it crashed at `_raw_spin_lock_irqsave` look like the memory
> > > > > > > (maybe dev->dma_io_tlb_mem) was corrupted?
> > > > > > > The dev->dma_io_tlb_mem should be set here
> > > > > > > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528)
> > > > > > > through device_initialize.
> > > > > > 
> > > > > > I'm less sure about this. 'dma_io_tlb_mem' should be pointing at
> > > > > > 'io_tlb_default_mem', which is a page-aligned allocation from memblock.
> > > > > > The spinlock is at offset 0x24 in that structure, and looking at the
> > > > > > register dump from the crash:
> > > > > > 
> > > > > > Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:ffffadb4013db9e8 EFLAGS: 00010006
> > > > > > Jun 29 18:28:42 hp-4300G kernel: RAX: 00000000003a8290 RBX: 0000000000000000 RCX: ffff8900572ad580
> > > > > > Jun 29 18:28:42 hp-4300G kernel: RDX: ffff89005653f024 RSI: 00000000000c0000 RDI: 0000000000001d17
> > > > > > Jun 29 18:28:42 hp-4300G kernel: RBP: 000000000a20d000 R08: 00000000000c0000 R09: 0000000000000000
> > > > > > Jun 29 18:28:42 hp-4300G kernel: R10: 000000000a20d000 R11: ffff89005653f000 R12: 0000000000000212
> > > > > > Jun 29 18:28:42 hp-4300G kernel: R13: 0000000000001000 R14: 0000000000000002 R15: 0000000000200000
> > > > > > Jun 29 18:28:42 hp-4300G kernel: FS:  00007f1f8898ea40(0000) GS:ffff890057280000(0000) knlGS:0000000000000000
> > > > > > Jun 29 18:28:42 hp-4300G kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > Jun 29 18:28:42 hp-4300G kernel: CR2: 00000000003a8290 CR3: 00000001020d0000 CR4: 0000000000350ee0
> > > > > > Jun 29 18:28:42 hp-4300G kernel: Call Trace:
> > > > > > Jun 29 18:28:42 hp-4300G kernel:  _raw_spin_lock_irqsave+0x39/0x50
> > > > > > Jun 29 18:28:42 hp-4300G kernel:  swiotlb_tbl_map_single+0x12b/0x4c0
> > > > > > 
> > > > > > Then that correlates with R11 holding the 'dma_io_tlb_mem' pointer and
> > > > > > RDX pointing at the spinlock. Yet RAX is holding junk :/
> > > > > > 
> > > > > > I agree that enabling KASAN would be a good idea, but I also think we
> > > > > > probably need to get some more information out of swiotlb_tbl_map_single()
> > > > > > to see see what exactly is going wrong in there.
> > > > > 
> > > > > I can certainly enable KASAN and if there is any debug print I can add
> > > > > or dump anything, let me know!
> > > > 
> > > > I bit the bullet and took v5.13 with swiotlb/for-linus-5.14 merged in, built
> > > > x86 defconfig and ran it on my laptop. However, it seems to work fine!
> > > > 
> > > > Please can you share your .config?
> > > 
> > > Sure thing, it is attached. It is just Arch Linux's config run through
> > > olddefconfig. The original is below in case you need to diff it.
> > > 
> > > https://raw.githubusercontent.com/archlinux/svntogit-packages/9045405dc835527164f3034b3ceb9a67c7a53cd4/trunk/config
> > > 
> > > If there is anything more that I can provide, please let me know.
> > 
> > I eventually got this booting (for some reason it was causing LD to SEGV
> > trying to link it for a while...) and sadly it works fine on my laptop. Hmm.

Seems like it might be something specific to the amdgpu module?

> > Did you manage to try again with KASAN?

Yes, it took a few times to reproduce the issue but I did manage to get
a dmesg, please find it attached. I build from commit 7d31f1c65cc9 ("swiotlb:
fix implicit debugfs declarations") in Konrad's tree.

> > It might also be worth taking the IOMMU out of the equation, since that
> > interfaces differently with SWIOTLB and I couldn't figure out the code path
> > from the log you provided. What happens if you boot with "amd_iommu=off
> > swiotlb=force"?
> 
> Oh, now there's a thing... the chat from the IOMMU API in the boot log
> implies that the IOMMU *should* be in the picture - we see that default
> domains are IOMMU_DOMAIN_DMA default and the GPU 0000:0c:00.0 was added to a
> group. That means dev->dma_ops should be set and DMA API calls should be
> going through iommu-dma, yet the callstack in the crash says we've gone
> straight from dma_map_page_attrs() to swiotlb_map(), implying the inline
> dma_direct_map_page() path.
> 
> If dev->dma_ops didn't look right in the first place, it's perhaps less
> surprising that dev->dma_io_tlb_mem might be wild as well. It doesn't seem
> plausible that we should have a race between initialising the device and
> probing its driver, so maybe the whole dev pointer is getting trampled
> earlier in the callchain (or is fundamentally wrong to begin with, but from
> a quick skim of the amdgpu code it did look like adev->dev and adev->pdev
> are appropriately set early on by amdgpu_pci_probe()).
> 
> > (although word of warning here: i915 dies horribly on my laptop if I pass
> > swiotlb=force, even with the distro 5.10 kernel)
> 
> FWIW I'd imagine you probably need to massively increase the SWIOTLB buffer
> size to have hope of that working.

Is it worth trying this still then?

Cheers,
Nathan

View attachment "7d31f1c65cc9-kasan.log" of type "text/plain" (132064 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ