linux-kernel - Re: [PATCH] maple_tree: Fix sparse reported issues

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fc9c2193-f5d7-d494-8e4e-c9f340ae8625@google.com>
Date:   Sun, 17 Jul 2022 13:57:47 -0700 (PDT)
From:   Hugh Dickins <hughd@...gle.com>
To:     Liam Howlett <liam.howlett@...cle.com>
cc:     Hugh Dickins <hughd@...gle.com>,
        David Hildenbrand <david@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Yu Zhao <yuzhao@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        "maple-tree@...ts.infradead.org" <maple-tree@...ts.infradead.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] maple_tree: Fix sparse reported issues

On Fri, 15 Jul 2022, Liam Howlett wrote:
> * Liam R. Howlett <Liam.Howlett@...cle.com> [220713 13:50]:
> > * Hugh Dickins <hughd@...gle.com> [220713 11:56]:
> > > On Wed, 13 Jul 2022, Liam Howlett wrote:
> > > > * David Hildenbrand <david@...hat.com> [220713 04:34]:
> > > > > On 12.07.22 16:24, Liam Howlett wrote:
> > > > > > When building with C=1, the maple tree had some rcu type mismatch &
> > > > > > locking mismatches in the destroy functions.  There were cosmetic only
> > > > > > since this happens after the nodes are removed from the tree.
> > > > > > 
> > > > > > Fixes: f8acc5e9581e (Maple Tree: add new data structure)
> > > > > > Reported-by: kernel test robot <lkp@...el.com>
> > > > > > Signed-off-by: Liam R. Howlett <Liam.Howlett@...cle.com>
> > > > > 
> > > > > Sorry to say, but the fixes become hard to follow (what/where/why). :)
> > > > > 
> > > > > I guess it's time for a new series soon. Eventually it makes sense to
> > > > > send the fixes as reply to the individual problematic patches. (instead
> > > > > of fixes to commit ids that are not upstream)
> > > > > 
> > > > > [yes, I'll do more review soon :) ]
> > > > 
> > > > I appreciate the feedback, it's much better than yelling into the void.
> > > > I have one more fix in the works - for __vma_adjust() of all functions
> > > > so that'll be impossible to follow anyways :)  I'll work on a v11 to
> > > > include that last one.
> > > 
> > > Please do also post the incremental for that "one more fix" once it's
> > > ready: I have been keeping up with what you've been posting so far,
> > > folding them into my debugging here, and believe we have made some but
> > > still not enough progress on the bugs I hit.  Folding in one more fix
> > > will be easy for me, advancing to v11 of a 69-part patchset will be...
> > > dispiriting.
> > 
> > 
> > Okay, thanks.  I don't think it will fix your outstanding issues but it
> > is necessary to fix case 6 of vma_merge() on memory allocation failure
> > as reported by syzbot.
> 
> Hugh,
> 
> Please find attached the last outstanding fix for this series.  It
> covers a rare failure scenario that one of the build bots reported.
> Ironically, it changes __vma_adjust() more than the rest of the series,
> but leaves the locking in the existing order.

Thanks, I folded that in to my testing on next-20220715, along with
other you posted on Friday (mas_dead_leaves() walk fix); but have not
even glanced at the v11 you posted Saturday, though I see from responses
that v11 has some other changes, including __vma_adjust() again, but I
just have not looked.  (I've had my own experiments in __vma_adjust()).

You'll be wanting my report, I'll give it here rather than in the v11
thread.  In short, some progress, but still frustratingly none on the
main crashes.

1. swapops.h BUG on !PageLocked migration entry.  This is *not* the
most urgent to fix, I'm just listing it first to get it out of the way
here.  This is the BUG that terminates every tmpfs swapping load test
on the laptop: only progress was that, just one time, the workstation
hit it before hitting its usual problems: nice to see it there too.
I'll worry about this bug when the rest is running stably.  I've only
ever noticed it when it's brk being unmapped, I expect that's a clue.

2. Silly in do_mas_mumap():
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2513,7 +2513,7 @@ int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm,
 	arch_unmap(mm, start, end);
 
 	/* Find the first overlapping VMA */
-	vma = mas_find(mas, end - 1);
+	vma = mas_find(mas, start);
 	if (!vma)
 		return 0;
 
Fixing that does fix some bad behaviors seen - I'd been having crashes in
unmap_vmas() and unlink_anon_vmas(), and put "if (WARN_ON(!vma)) return;"
in unmap_region(); but that no longer seems necessary now do_mas_munmap()
is fixed thus.  I had high hopes that it would fix all the rest, but no.

3. vma_adjust_trans_huge().  Skip this paragraph, I think there
is actually no problem here, but for safety I have rearranged the
vma_adjust_trans_huge()s inside the rmap locks, because when things
aren't working right, best to rule out some possibilities.  Why am
I even mentioning it here?  In case I send any code snippets and
you're puzzled by vma_adjust_trans_huge() having moved.

4. unlink_anon_vmas() in __vma_adjust().  Not urgent to fix (can only
be an issue when GFP_KERNEL preallocation fails, which I think means
when current is being OOM-killed), but whereas vma_expand() has careful
anon_cloned flag, __vma_adjust() does not, and I think could be
unlinking a pre-existing anon_vma.  Aside from that, I am nervous about
using unlink_anon_vmas() there like that (and in vma_expand()): IIUC it's
an unnecessary "optimization" for a very unlikely case, that's in danger
of doing more harm than good; and should be removed from them both (then
they can both simply return -ENOMEM when mas_preallocate() fails).

5. I was horrified/thrilled to notice last night that mas_store_prealloc()
does a mas_destroy() at the end.  So only the first vma_mas_store() in
__vma_adjust() gets to use the carefully preallocated nodes.  I thought
that might be responsible for lots of nastiness, but again sadly no.
Maybe it just falls back to GFP_ATOMIC when the preallocateds are gone
(I didn't look) and that often works okay.  Whether the carefully
chosen prealloc count allows for __vma_adjust() use would be another
question.  (Earlier on I did try doubling its calculation, in case it
was preallocating too few, but that made no difference either.)

6. The main problem, crashes on the workstation (never seen on the
laptop).  I keep hacking around with the debug info (and, please,
%px not %p, %lx not %lu in the debug info: I've changed them all,
and a couple of %lds, in my copy of lib/maple_tree.c).  I forget
how the typical crashes appeared with the MAPLE_DEBUGs turned off
(the BUG_ON(count != mm->map_count) in exit_mmap() perhaps); I've
turned them on, but usually comment out the mt_validate() and
mt_dump(), which generate far too much noise for non-specialists,
and delay the onset of crash tenfold (but re-enabled to give you
the attached messages.xz).

Every time, it's the cc1 (9.3.1) doing some munmapping (cc1 is
mostly what's running in this load test, so that's not surprising;
but surprising that even when I switched the laptop to using same
gcc-9, couldn't reproduce the problem there).  Typically, that
munmapping has involved splitting a small, say three page, vma
into two pages below and one page above (the "insert", and I've
hacked the debug to dump that too, see "mmap: insert" - ah,
looking at the messages now, the insert is bigger this time).

And what has happened each time I've studied it (I don't know
if this is evident from the mt dumps in the messages, I hope so)
is that the vma and the insert have been correctly placed in the
tree, except that the vma has *also* been placed several pages
earlier, and a linear search of the tree finds that misplaced
instance first, vm_start not matching mt index.

The map_count in these cases has always been around 59, 60, 61:
maybe just typical for cc1, or maybe significant for maple tree?

I won't give up quite yet, but I'm hoping you'll have an idea for
the misplaced tree entry.  Something going wrong in rebalancing?

For a long time I assumed the problem would be at the mm/mmap.c end,
and I've repeatedly tried different things with the vma_mas stuff
in __vma_adjust() (for example, using vma_mas_remove() to remove
vmas before re-adding them, and/or doing mas_reset() in more places);
but none of those attempts actually fixed the issue.  So now I'm
thinking the problem is really at the lib/maple_tree.c end.

7. If you get to do cleanups later, please shrink those double blank
lines to single blank lines.  And find a better name for the strange
vma_mas_szero() - vma_mas_erase(), or is erase something different?
I'm not even sure that it's needed: that's a special case for exec's
shift_arg_pages() VM_STACK_INCOMPLETE_SETUP, which uses __vma_adjust()
in a non-standard way.  And trace_vma_mas_szero() in vma_mas_remove()
looks wrong.

Hugh
Download attachment "messages.xz" of type "application/x-xz" (4564 bytes)