linux-kernel - Re: [PATCH] mm/hmm: Fix bad subpage pointer in try_to_unmap

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <0ee5166a-26cd-a504-b9db-cffd082ecd38@nvidia.com>
Date:   Mon, 15 Jul 2019 16:34:22 -0700
From:   John Hubbard <jhubbard@...dia.com>
To:     Andrew Morton <akpm@...ux-foundation.org>,
        Ralph Campbell <rcampbell@...dia.com>
CC:     <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
        Jérôme Glisse <jglisse@...hat.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Jason Gunthorpe <jgg@...lanox.com>
Subject: Re: [PATCH] mm/hmm: Fix bad subpage pointer in try_to_unmap_one

On 7/15/19 3:00 PM, Andrew Morton wrote:
> On Tue, 9 Jul 2019 18:24:57 -0700 Ralph Campbell <rcampbell@...dia.com> wrote:
> 
>>
>> On 7/9/19 5:28 PM, Andrew Morton wrote:
>>> On Tue, 9 Jul 2019 15:35:56 -0700 Ralph Campbell <rcampbell@...dia.com> wrote:
>>>
>>>> When migrating a ZONE device private page from device memory to system
>>>> memory, the subpage pointer is initialized from a swap pte which computes
>>>> an invalid page pointer. A kernel panic results such as:
>>>>
>>>> BUG: unable to handle page fault for address: ffffea1fffffffc8
>>>>
>>>> Initialize subpage correctly before calling page_remove_rmap().
>>>
>>> I think this is
>>>
>>> Fixes:  a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
>>> Cc: stable
>>>
>>> yes?
>>>
>>
>> Yes. Can you add this or should I send a v2?
> 
> I updated the patch.  Could we please have some review input?
> 
> 
> From: Ralph Campbell <rcampbell@...dia.com>
> Subject: mm/hmm: fix bad subpage pointer in try_to_unmap_one
> 
> When migrating a ZONE device private page from device memory to system
> memory, the subpage pointer is initialized from a swap pte which computes
> an invalid page pointer. A kernel panic results such as:
> 
> BUG: unable to handle page fault for address: ffffea1fffffffc8
> 
> Initialize subpage correctly before calling page_remove_rmap().
> 
> Link: http://lkml.kernel.org/r/20190709223556.28908-1-rcampbell@nvidia.com
> Fixes: a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
> Signed-off-by: Ralph Campbell <rcampbell@...dia.com>
> Cc: "Jérôme Glisse" <jglisse@...hat.com>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
> Cc: Mike Kravetz <mike.kravetz@...cle.com>
> Cc: Jason Gunthorpe <jgg@...lanox.com>
> Cc: <stable@...r.kernel.org>
> Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
> ---
> 
>  mm/rmap.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> --- a/mm/rmap.c~mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one
> +++ a/mm/rmap.c
> @@ -1476,6 +1476,7 @@ static bool try_to_unmap_one(struct page
>  			 * No need to invalidate here it will synchronize on
>  			 * against the special swap migration pte.
>  			 */
> +			subpage = page;
>  			goto discard;
>  		}
>  

Hi Ralph and everyone,

While the above prevents a crash, I'm concerned that it is still not
an accurate fix. This fix leads to repeatedly removing the rmap, against the
same struct page, which is odd, and also doesn't directly address the
root cause, which I understand to be: this routine can't handle migrating
the zero page properly--over and back, anyway. (We should also mention more 
about how this is triggered, in the commit description.)

I'll take a closer look at possible fixes (I have to step out for a bit) soon, 
but any more experienced help is also appreciated here.

thanks,
-- 
John Hubbard
NVIDIA