linux-kernel - Re: [PATCH] mm: swapfile: avoid split_swap


Message-ID: <20200924063038.GD1023012@optiplex-lnx>
Date:   Thu, 24 Sep 2020 02:30:38 -0400
From:   Rafael Aquini <aquini@...hat.com>
To:     "Huang, Ying" <ying.huang@...el.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        akpm@...ux-foundation.org
Subject: Re: [PATCH] mm: swapfile: avoid split_swap_cluster() NULL pointer
 dereference

On Thu, Sep 24, 2020 at 11:51:17AM +0800, Huang, Ying wrote:
> Rafael Aquini <aquini@...hat.com> writes:
> > The bug here is quite simple: split_swap_cluster() misses checking for
> > lock_cluster() returning NULL before committing to change cluster_info->flags.
> 
> I don't think so.  We shouldn't run into this situation firstly.  So the
> "fix" hides the real bug instead of fixing it.  Just like we call
> VM_BUG_ON_PAGE(!PageLocked(head), head) in split_huge_page_to_list()
> instead of returning if !PageLocked(head) silently.
>

Not the same thing, obviously, as you are going for an apples-to-carrots
comparison, but since you mentioned it:

split_huge_page_to_list() asserts (in debug builds) that *page is locked,
and later checks whether *head bears the SwapCache flag.
deferred_split_scan(), OTOH, doesn't hand down the compound head locked,
but the 2nd page in the group instead.
This doesn't necessarily mean it's a problem, but it might help
in hitting the issue.

 
> > The fundamental problem has nothing to do with allocating, or not allocating
> > a swap cluster, but it has to do with the fact that the THP deferred split scan
> > can transiently race with swapcache insertion, and the fact that when you run
> > your swap area on rotational storage cluster_info is _always_ NULL.
> > split_swap_cluster() needs to check for lock_cluster() returning NULL because
> > that's one possible case, and it clearly fails to do so.
> 
> If there's a race, we should fix the race.  But the code path for
> swapcache insertion is,
> 
> add_to_swap()
>   get_swap_page() /* Return if fails to allocate */
>   add_to_swap_cache()
>     SetPageSwapCache()
> 
> While the code path to split THP is,
> 
> split_huge_page_to_list()
>   if PageSwapCache()
>     split_swap_cluster()
> 
> Both code paths are protected by the page lock.  So there should be some
> other reasons to trigger the bug.

As mentioned above, no, they do not seem to be protected (at least, not by
a lock on the same page, depending on the case). While add_to_swap() ensures
the page lock is held on the compound head, split_huge_page_to_list() does not.


> And again, for HDD, a THP shouldn't have PageSwapCache() set at the
> first place.  If so, the bug is that the flag is set and we should fix
> the setting.
> 

I fail to follow your claim here. Where is the guarantee, in the code, that 
you'll never have a compound head in the swapcache? 

> > Run a workload that cause multiple THP COW, and add a memory hogger to create
> > memory pressure so you'll force the reclaimers to kick the registered
> > shrinkers. The trigger is not heavy swapping, and that's probably why
> > most swap test cases don't hit it. The window is tight, but you will get the
> > NULL pointer dereference.
> 
> Do you have a script to reproduce the bug?
> 

Nope, a convoluted set of internal regression tests we have usually
triggers it. In the wild, customers running HANA are occasionally
seeing it.

> > Regardless of whether you find further bugs, this patch is needed to correct a
> > blunt coding mistake.
> 
> As above.  I don't agree with that.
> 

It's OK to disagree; split_swap_cluster() still misses the cluster_info NULL
check, though.
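
For readers following the thread, here is a minimal userspace sketch of the check being argued about. It is not the patch itself and not kernel code: the structs are simplified stand-ins, the lock is modelled as a plain bool, the constant values are picked for the model, and the name split_swap_cluster_checked() as well as the choice to return 0 when there is no cluster are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Model values only; the kernel derives its own constants. */
#define SWAPFILE_CLUSTER  512
#define CLUSTER_FLAG_HUGE 0x4u

/* Simplified stand-in for the per-cluster bookkeeping. */
struct swap_cluster_info {
	bool locked;        /* stands in for the real spinlock */
	unsigned int flags;
};

/* Simplified stand-in: cluster_info stays NULL on rotational storage. */
struct swap_info_struct {
	struct swap_cluster_info *cluster_info;
};

/* Returns the locked cluster covering offset, or NULL if there are none. */
static struct swap_cluster_info *lock_cluster(struct swap_info_struct *si,
					      unsigned long offset)
{
	struct swap_cluster_info *ci = si->cluster_info;

	if (ci) {
		ci += offset / SWAPFILE_CLUSTER;
		ci->locked = true;
	}
	return ci;
}

static void unlock_cluster(struct swap_cluster_info *ci)
{
	if (ci)
		ci->locked = false;
}

/*
 * Hypothetical variant of split_swap_cluster(): bail out before touching
 * ci->flags when lock_cluster() hands back NULL.
 */
int split_swap_cluster_checked(struct swap_info_struct *si,
			       unsigned long offset)
{
	struct swap_cluster_info *ci = lock_cluster(si, offset);

	if (!ci)
		return 0;	/* assumption: no cluster, nothing to split */

	ci->flags &= ~CLUSTER_FLAG_HUGE;
	unlock_cluster(ci);
	return 0;
}
```

With cluster_info set to NULL (the rotational-storage case), the function returns without dereferencing anything; with a populated array, it clears only the huge flag on the cluster covering the offset. Whether returning quietly is the right behaviour, as opposed to asserting, is exactly the point of disagreement above.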