linux-kernel - Re: [PATCH] maple_tree: Fix sparse reported issues

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7db5a8c5-9084-a7fe-6e83-713e52ed8539@google.com>
Date:   Mon, 18 Jul 2022 10:30:49 -0700 (PDT)
From:   Hugh Dickins <hughd@...gle.com>
To:     Liam Howlett <liam.howlett@...cle.com>
cc:     Hugh Dickins <hughd@...gle.com>,
        David Hildenbrand <david@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Yu Zhao <yuzhao@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        "maple-tree@...ts.infradead.org" <maple-tree@...ts.infradead.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] maple_tree: Fix sparse reported issues

On Mon, 18 Jul 2022, Liam Howlett wrote:
> * Liam R. Howlett <Liam.Howlett@...cle.com> [220718 08:56]:
> > * Hugh Dickins <hughd@...gle.com> [220718 00:28]:
> > > On Mon, 18 Jul 2022, Liam Howlett wrote:
> > > > * Hugh Dickins <hughd@...gle.com> [220717 16:58]:
> > > > > On Fri, 15 Jul 2022, Liam Howlett wrote:
> > > > > > 
> > > > > > Please find attached the last outstanding fix for this series.  It
> > > > > > covers a rare failure scenario that one of the build bots reported.
> > > > > > Ironically, it changes __vma_adjust() more than the rest of the series,
> > > > > > but leaves the locking in the existing order.
> > > > > 
> > > > > Thanks, I folded that in to my testing on next-20220715, along with
> > > > > other you posted on Friday (mas_dead_leaves() walk fix);
> > > > 
> > > >   Please drop that patch, it needs more testing.
> > > 
> > > Drop the mas_dead_leaves() walk fix, or the __vma_adjust() changes
> > > which you attached in that mail to me? please let me know a.s.a.p,
> > > since I cannot proceed without knowing which.
> > 
> > Drop the mas_dead_leaves() walk fix please.  I responded to the patch
> > that it's not tested enough.  I'll respond to the rest of this email
> > soon.

Your mail situation gets even stranger.  You sent the mas_dead_leaves()
walk fix on 12th July, followed quickly by that response that it's not
tested enough, so I ignored it completely then.  But you sent the same
fix (I've only compared visually, it looks the same) again on 15th July,
so then I took it in.

I wonder whether Oracle's mail server decided to send out a repeat of
that patch in place of the missing all-important stale data copy one!

> 
> So the mas_dead_leaves() patch will most likely produce memory leaks on
> big-endian systems.

I'll take out the mas_dead_leaves() walk patch, but since I was only
testing on x86, it won't have mattered.

...

> I expect it is because your search skipped the badness inserted by the
> bug from #6.  I would think the workload that this was crashing on would
> split the nodes in a certain way that bad VMAs ended up ahead of the
> correct data?

Maybe; I can't afford to speculate further on it.

...

> So the only time I've even seen __vma_adjust() fail is with a fault
> injector failing mas_preallocate() allocations.  If it's safe to not
> unwind, I'm happy to drop both unwinds but I was concerned in the path
> of a vma_merge() calling __vma_adjust() and failing out on allocations
> then OOM recovering, leaving a VMA with a 1/2 merged vma with anon
> incorrectly set.. which is an even more unlikely scenario.

It's not half-merged, it is correctly set up (just like if a write fault
had occurred somewhere in that extent before the merge), so no need to
unwind.

...

> Right, the __split_vma() never adjusts anything but one side of the
> 'vma' VMA by inserting the 'insert' VMA.  This will result in two writes
> to the tree - but one will exactly fit in an existing range which will
> be placed without an allocation via the mas_wr_slot_store() function in
> the maple tree.  Exact fits are nice - they are fast.

I'll have to come back and think about this again later on: "Exact fits
are nice" may answer my concern in the end, but I still have the worry
that the first store destroys the prealloc, when it might be the second
store which needs the prealloc.

...

> > > > Do you have the patch
> > > > "maple_tree-Fix-stale-data-copy-in-mas_wr_node_store.patch"? It sounds
> > > > like your issue fits this fix exactly. I was seeing the same issue with
> > > > gcc 9.3.1 20200408 and this bug doesn't happen for me now.  The logs
> > > > you sent also fit the situation. I went through the same exercise
> > > > (exorcism?) of debugging the various additions and removals of the VMA
> > > > only to find the issue in the tree itself.  The fix also modified the
> > > > test code to detect the issue - which was actually hit but not detected
> > > > in the existing test cases from a live capture of VMA activities.  It is
> > > > difficult to spot in the tree dump as well.  I am sure I sent this to
> > > > Andrew as it is included in v11 and did not show up in his diff, but I
> > > > cannot find it on lore, perhaps I forgot to CC you?  I've attached it
> > > > here for you in case you missed it.
> > > 
> > > Thanks!  No, I never received that patch, nor can I see it on lore
> > > or marc.info; but I (still) haven't looked at v11, and don't know
> > > about Andrew's diff.  Anyway, sounds exciting, I'm eager to stop
> > > writing this mail and get to testing with that in - but please
> > > let me know whether it's the mas_dead_leaves() or the __vma_adjust()
> > > mods you attached previously, which you want me to leave out.

The overnight test run ended in an unexpected way, but I believe we can
count it as a success - a big success for your stale data copy fix.

(If only that fix had got through the mail system on Friday,
my report on Sunday would have been much more optimistic.)

I said before that I expected the test run to hit the swapops.h
migration entry !PageLocked BUG, but it did not.  It ran for
nearly 7 hours, and then one of its builds terminated with

{standard input}: Assembler messages:
{standard input}: Error: open CFI at the end of file;
 missing .cfi_endproc directive
gcc: fatal error: Killed signal terminated program cc1
compilation terminated.

which I've never seen before.  Usually I'd put something like that down
to a error in swap, or a TLB flushing error (but I include Nadav's fix
in my kernel, and wouldn't get very far without it): neither related to
the maple tree patchset.

But on this occasion, my guess is that it's actually an example of what
the swapops.h migration entry !PageLocked BUG is trying to alert us to.

Imagine when such a "stale" migration entry is found, but the page it
points to (now reused for something else) just happens to be PageLocked
at that instant.  Then the BUG won't fire, and we proceed to use the
page as if it's ours, but it's not.  I think that's what happened.

I must get on with the day: more testing, and thinking.

Hugh