lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200619011449.GC135965@carbon.dhcp.thefacebook.com>
Date:   Thu, 18 Jun 2020 18:14:49 -0700
From:   Roman Gushchin <guro@...com>
To:     Yang Shi <shy828301@...il.com>
CC:     <iommu@...ts.linux-foundation.org>, Joerg Roedel <joro@...tes.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux MM <linux-mm@...ck.org>,
        Michal Hocko <mhocko@...nel.org>,
        Johannes Weiner <hannes@...xchg.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Wei Yang <richardw.yang@...ux.intel.com>
Subject: Re: kernel BUG at mm/huge_memory.c:2613!

On Thu, Jun 18, 2020 at 05:46:24PM -0700, Yang Shi wrote:
> On Thu, Jun 18, 2020 at 5:19 PM Roman Gushchin <guro@...com> wrote:
> >
> > Hi!
> >
> > I was consistently hitting a VM_BUG_ON_PAGE() in split_huge_page_to_list()
> > when running vanilla 5.8-rc1 on my desktop. It was happening on every boot
> > during the system start. I haven't seen this issue on 5.7.
> >
> > It looks like split_huge_page() expects the page to be locked,
> > but it hasn't been changed from 5.7. I do not see any suspicious
> > commits around the call side either.
> >
> > I've tried the following patch:
> >
> > --
> > From 4af38fbf06a9354fadf22a78f1a42dfbb24fbc3a Mon Sep 17 00:00:00 2001
> > From: Roman Gushchin <guro@...com>
> > Date: Thu, 18 Jun 2020 16:33:47 -0700
> > Subject: [PATCH] iommu/dma: lock page before calling split_huge_page()
> >
> > split_huge_page() expects a locked page. The following stacktrace
> > is generated if debug is on. Fix this by locking the page before
> > passing it to split_huge_page().
> >
> > [   24.861385] page:ffffef044fb1fa00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 head:ffffef044fb1fa00 order:2 compound_mapcount:0 compound_pincount:0
> > [   24.861389] flags: 0x17ffffc0010000(head)
> > [   24.861393] raw: 0017ffffc0010000 dead000000000100 dead000000000122 0000000000000000
> > [   24.861395] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> > [   24.861396] page dumped because: VM_BUG_ON_PAGE(!PageLocked(head))
> > [   24.861411] ------------[ cut here ]------------
> > [   24.861413] kernel BUG at mm/huge_memory.c:2613!
> > [   24.861428] invalid opcode: 0000 [#1] SMP NOPTI
> > [   24.861432] CPU: 10 PID: 1505 Comm: pulseaudio Not tainted 5.8.0-rc1+ #689
> > [   24.861433] Hardware name: Gigabyte Technology Co., Ltd. AB350-Gaming/AB350-Gaming-CF, BIOS F25 01/16/2019
> > [   24.861441] RIP: 0010:split_huge_page_to_list+0x731/0xae0
> > [   24.861444] Code: 44 00 00 8b 47 34 85 c0 0f 84 b4 02 00 00 f0 ff 4f 34 75 c2 e8 e0 12 f7 ff eb bb 48 c7 c6 d0 16 39 ba 4c 89 c7 e8 ef 85 f9 ff <0f> 0b 48 c7 44 24 10 ff ff ff ff 31 db e9 bb fa ff ff 48 8b 7c 24
> > [   24.861446] RSP: 0018:ffffc1030254bb50 EFLAGS: 00010286
> > [   24.861449] RAX: 0000000000000000 RBX: 0000000000000002 RCX: ffff9b54cee98d08
> > [   24.861451] RDX: 00000000ffffffd8 RSI: 0000000000000000 RDI: ffff9b54cee98d00
> > [   24.861452] RBP: ffffef044fb1fa00 R08: 0000000000000547 R09: 0000000000000003
> > [   24.861454] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9b54df37f188
> > [   24.861455] R13: ffff9b54df355000 R14: ffffef044fb1fa00 R15: ffffef044fb1fa00
> > [   24.861458] FS:  00007fd2dc132880(0000) GS:ffff9b54cee80000(0000) knlGS:0000000000000000
> > [   24.861460] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   24.861461] CR2: 00007fd2cb100000 CR3: 00000003feb16000 CR4: 00000000003406e0
> > [   24.861464] Call Trace:
> > [   24.861473]  ? __mod_lruvec_state+0x41/0xf0
> > [   24.861478]  ? __alloc_pages_nodemask+0x15c/0x320
> > [   24.861483]  iommu_dma_alloc+0x316/0x580
> > [   24.861496]  snd_dma_alloc_pages+0xdf/0x160 [snd_pcm]
> > [   24.861508]  snd_dma_alloc_pages_fallback+0x5d/0x80 [snd_pcm]
> > [   24.861516]  snd_malloc_sgbuf_pages+0x166/0x380 [snd_pcm]
> > [   24.861523]  ? snd_pcm_hw_refine+0x29d/0x310 [snd_pcm]
> > [   24.861529]  ? _cond_resched+0x16/0x40
> > [   24.861535]  snd_dma_alloc_pages+0x64/0x160 [snd_pcm]
> > [   24.861542]  snd_pcm_lib_malloc_pages+0x136/0x1d0 [snd_pcm]
> > [   24.861550]  ? snd_pcm_lib_ioctl+0x167/0x210 [snd_pcm]
> > [   24.861556]  snd_pcm_hw_params+0x3c0/0x490 [snd_pcm]
> > [   24.861563]  snd_pcm_common_ioctl+0x1c5/0x1110 [snd_pcm]
> > [   24.861571]  ? snd_pcm_info_user+0x64/0x80 [snd_pcm]
> > [   24.861578]  snd_pcm_ioctl+0x23/0x30 [snd_pcm]
> > [   24.861583]  ksys_ioctl+0x82/0xc0
> > [   24.861587]  __x64_sys_ioctl+0x16/0x20
> > [   24.861593]  do_syscall_64+0x4d/0x90
> > [   24.861597]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > Signed-off-by: Roman Gushchin <guro@...com>
> > ---
> >  drivers/iommu/dma-iommu.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index 4959f5df21bd..31e4e305d8d5 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -24,6 +24,7 @@
> >  #include <linux/scatterlist.h>
> >  #include <linux/vmalloc.h>
> >  #include <linux/crash_dump.h>
> > +#include <linux/pagemap.h>
> >
> >  struct iommu_dma_msi_page {
> >         struct list_head        list;
> > @@ -549,8 +550,15 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
> >                         if (!PageCompound(page)) {
> >                                 split_page(page, order);
> >                                 break;
> > -                       } else if (!split_huge_page(page)) {
> > -                               break;
> > +                       } else {
> > +                               int err;
> > +
> > +                               lock_page(page);
> > +                               err = split_huge_page(page);
> > +                               unlock_page(page);
> 
> Yes, THP split does need the page locked, in addition it needs the
> caller hold a pin on the page too (refcount bump).
> 
> But, I don't get how the code could even really work by a quick look.
> Actually split_huge_page() assumes the passed in THP is user THP (anon
> or file cache) and the order is PMD order However, it looks the iommu
> driver just wants to allocate a bunch of base pages by allocating a
> huge page (could by any order if I read the code correctly) then split
> them to base pages. I don't think this is the correct approach IMO.
> Anyway I'm not iommu expert, if I miss anything please feel free to
> let me know.

I agree. The whole

	page = alloc_pages_node(nid, alloc_flags, order);
	if (!page)
		continue;
	if (!order)
		break;
	if (!PageCompound(page)) {
		split_page(page, order);
		break;
	} else if (!split_huge_page(page)) {
		break;
	}

looks very suspicious to me.
My wild guess is that gfp flags changed somewhere above, so we hit
the branch which was never hit before.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ