Message-ID: <20220719013419.knzzx7thm5qkskcd@revolver>
Date: Tue, 19 Jul 2022 01:39:28 +0000
From: Liam Howlett <liam.howlett@...cle.com>
To: Hugh Dickins <hughd@...gle.com>
CC: David Hildenbrand <david@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Yu Zhao <yuzhao@...gle.com>,
Matthew Wilcox <willy@...radead.org>,
"maple-tree@...ts.infradead.org" <maple-tree@...ts.infradead.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] maple_tree: Fix sparse reported issues
* Hugh Dickins <hughd@...gle.com> [220718 17:34]:
> On Mon, 18 Jul 2022, Liam Howlett wrote:
> > >
> > > I said before that I expected the test run to hit the swapops.h
> > > migration entry !PageLocked BUG, but it did not. It ran for
> > > nearly 7 hours, and then one of its builds terminated with
> > >
> > > {standard input}: Assembler messages:
> > > {standard input}: Error: open CFI at the end of file;
> > > missing .cfi_endproc directive
> > > gcc: fatal error: Killed signal terminated program cc1
> > > compilation terminated.
> > >
> > > which I've never seen before. Usually I'd put something like that down
> > > to an error in swap, or a TLB flushing error (but I include Nadav's fix
> > > in my kernel, and wouldn't get very far without it): neither related to
> > > the maple tree patchset.
> > >
> > > But on this occasion, my guess is that it's actually an example of what
> > > the swapops.h migration entry !PageLocked BUG is trying to alert us to.
> > >
> > > Imagine when such a "stale" migration entry is found, but the page it
> > > points to (now reused for something else) just happens to be PageLocked
> > > at that instant. Then the BUG won't fire, and we proceed to use the
> > > page as if it's ours, but it's not. I think that's what happened.
> > >
> > > I must get on with the day: more testing, and thinking.
> >
> >
> > I think this is the same issue seen here:
> > https://lore.kernel.org/linux-mm/YsQt3IHbJnAhsSWl@casper.infradead.org/
>
> Yes, that's a swapops.h migration entry !PageLocked BUG on brk.
>
> >
> > Note that on 20220616, the maple tree was in linux-next.
> >
> > I suspect I am doing something wrong in do_brk_munmap(). I am using a
> > false VMA to munmap a partial vma by setting it up like the part of the
> > VMA that would have been split, inserted into the tree, then removed and
> > freed. I must be missing something necessary for this to function
> > correctly.
>
> Thanks for pointing to that, yes, the vma_init(&unmap, mm) etc in
> do_brk_munmap(): I hadn't noticed struct vma_area_struct unmap before.
>
> And almost coincident with your mail, my next test run crashed on
> kernel BUG at mm/huge_memory.c:2013, VM_BUG_ON_VMA(vma->vm_start > haddr),
> in __split_huge_pmd_locked(), while servicing do_brk_munmap():
> no doubt for a (different but) related reason.
>
> Presumably you noticed an opportunity to optimize out some maple tree
> operations by giving do_brk_munmap() its own processing. But I think
> you're going to have to undo all that, and make do_brk_munmap() do
> things in the standard, less direct, munmap() way - using
> do_mas_align_munmap() I presume.
>
> (Oh dear, I see that doing mas_preallocate() at the beginning,
> but then __split_vma()s inside, and __split_vma() will do a
> vma_adjust(), which will do a mas_preallocate(). Ah, but they
> are on distinct "mas"es at different levels, so it's probably okay.)
Yes, this is not as efficient as I'd like, and I'd like to work out
which preallocations are actually needed, but right now this is the
safest approach, and it turns out allocating is very fast, especially
when you get 16 nodes per page.  I think any function that already
allocates *should* be safe to use a standard mas_store_gfp() and let
the tree allocate what is necessary.  I had to add the preallocations
due to fs_reclaim lockdep warnings.
Think about what we do today: insertion/removal of a vma in the linked
list + the gaps calculation + allocation of a new vma + rebalancing the
rbtree - and how all of that plays out on the CPU cache.
>
> If rmap is to be sure to find migration entries that it inserted
> earlier, you must hold anon_vma_lock_write() (in the brk case) across
> the tricky vma manipulations. No doubt you could write an optimized
> do_brk_munmap() which does everything under anon_vma_lock_write(), but
> you'd then be duplicating far too much of the care in __vma_adjust()
> for my taste (I'm already not so happy to find it duplicated in
> vma_expand()).
We already have a simplified version of do_mmap() for do_brk_flags(),
according to the comment, so I thought a simplified version of the
reverse would be acceptable as well.
>
> I'll continue with some testing, but expect it to keep hitting these
> issues until do_brk_munmap() is rewritten - perhaps it reduces to
> little more than a wrapper like do_munmap() or do_mas_munmap().
I appreciate the effort on this.
Thanks,
Liam