lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Wed, 22 May 2019 23:32:53 -0700
From:   John Hubbard <jhubbard@...dia.com>
To:     Jason Gunthorpe <jgg@...pe.ca>,
        Ralph Campbell <rcampbell@...dia.com>
CC:     <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
        Ira Weiny <ira.weiny@...el.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Arnd Bergmann <arnd@...db.de>,
        Balbir Singh <bsingharora@...il.com>,
        Dan Carpenter <dan.carpenter@...cle.com>,
        Matthew Wilcox <willy@...radead.org>,
        Souptick Joarder <jrdr.linux@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 5/5] mm/hmm: Fix mm stale reference use in hmm_free()

On 5/22/19 6:25 PM, Jason Gunthorpe wrote:
> On Wed, May 22, 2019 at 05:54:17PM -0700, Ralph Campbell wrote:
>>
>> On 5/22/19 4:36 PM, Jason Gunthorpe wrote:
>>> On Mon, May 06, 2019 at 04:35:14PM -0700, rcampbell@...dia.com wrote:
>>>> From: Ralph Campbell <rcampbell@...dia.com>
>>>>
>>>> The last reference to struct hmm may be released long after the mm_struct
>>>> is destroyed because the struct hmm_mirror memory may be part of a
>>>> device driver open file private data pointer. The file descriptor close
>>>> is usually after the mm_struct is destroyed in do_exit(). This is a good
>>>> reason for making struct hmm a kref_t object [1] since its lifetime spans
>>>> the life time of mm_struct and struct hmm_mirror.
>>>
>>>> The fix is to not use hmm->mm in hmm_free() and to clear mm->hmm and
>>>> hmm->mm pointers in hmm_destroy() when the mm_struct is
>>>> destroyed.
>>>
>>> I think the right way to fix this is to have the struct hmm hold a
>>> mmgrab() on the mm so its memory cannot go away until all of the hmm
>>> users release the struct hmm, hmm_ranges/etc
>>>
>>> Then we can properly use mmget_not_zero() instead of the racy/abnormal
>>> 'if (hmm->xmm == NULL || hmm->dead)' pattern (see the other
>>> thread). Actually looking at this, all these tests look very
>>> questionable. If we hold the mmget() for the duration of the range
>>> object, as Jerome suggested, then they all get deleted.
>>>
>>> That just leaves mmu_notifier_unregister_no_relase() as the remaining
>>> user of hmm->mm (everyone else is trying to do range->mm) - and it
>>> looks like it currently tries to call
>>> mmu_notifier_unregister_no_release on a NULL hmm->mm and crashes :(
>>>
>>> Holding the mmgrab fixes this as we can safely call
>>> mmu_notifier_unregister_no_relase() post exit_mmap on a grab'd mm.
>>>
>>> Also we can delete the hmm_mm_destroy() intrustion into fork.c as it
>>> can't be called when the mmgrab is active.
>>>
>>> This is the basic pattern we used in ODP when working with mmu
>>> notifiers, I don't know why hmm would need to be different.

+1 for the mmgrab() approach. I have never been able to see how these
various checks can protect anything, and refcounting it into place definitely
sounds like the right answer.


thanks,
-- 
John Hubbard
NVIDIA

Powered by blists - more mailing lists