[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2938d2da-424d-786e-5486-1e4fa9f58425@nvidia.com>
Date: Wed, 22 May 2019 17:54:17 -0700
From: Ralph Campbell <rcampbell@...dia.com>
To: Jason Gunthorpe <jgg@...pe.ca>
CC: <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
John Hubbard <jhubbard@...dia.com>,
Ira Weiny <ira.weiny@...el.com>,
Dan Williams <dan.j.williams@...el.com>,
Arnd Bergmann <arnd@...db.de>,
Balbir Singh <bsingharora@...il.com>,
Dan Carpenter <dan.carpenter@...cle.com>,
Matthew Wilcox <willy@...radead.org>,
Souptick Joarder <jrdr.linux@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 5/5] mm/hmm: Fix mm stale reference use in hmm_free()
On 5/22/19 4:36 PM, Jason Gunthorpe wrote:
> On Mon, May 06, 2019 at 04:35:14PM -0700, rcampbell@...dia.com wrote:
>> From: Ralph Campbell <rcampbell@...dia.com>
>>
>> The last reference to struct hmm may be released long after the mm_struct
>> is destroyed because the struct hmm_mirror memory may be part of a
>> device driver open file private data pointer. The file descriptor close
>> is usually after the mm_struct is destroyed in do_exit(). This is a good
>> reason for making struct hmm a kref_t object [1] since its lifetime spans
>> the life time of mm_struct and struct hmm_mirror.
>
>> The fix is to not use hmm->mm in hmm_free() and to clear mm->hmm and
>> hmm->mm pointers in hmm_destroy() when the mm_struct is
>> destroyed.
>
> I think the right way to fix this is to have the struct hmm hold a
> mmgrab() on the mm so its memory cannot go away until all of the hmm
> users release the struct hmm, hmm_ranges/etc
>
> Then we can properly use mmget_not_zero() instead of the racy/abnormal
> 'if (hmm->xmm == NULL || hmm->dead)' pattern (see the other
> thread). Actually looking at this, all these tests look very
> questionable. If we hold the mmget() for the duration of the range
> object, as Jerome suggested, then they all get deleted.
>
> That just leaves mmu_notifier_unregister_no_relase() as the remaining
> user of hmm->mm (everyone else is trying to do range->mm) - and it
> looks like it currently tries to call
> mmu_notifier_unregister_no_release on a NULL hmm->mm and crashes :(
>
> Holding the mmgrab fixes this as we can safely call
> mmu_notifier_unregister_no_relase() post exit_mmap on a grab'd mm.
>
> Also we can delete the hmm_mm_destroy() intrustion into fork.c as it
> can't be called when the mmgrab is active.
>
> This is the basic pattern we used in ODP when working with mmu
> notifiers, I don't know why hmm would need to be different.
>
>> index 2aa75dbed04a..4e42c282d334 100644
>> +++ b/mm/hmm.c
>> @@ -43,8 +43,10 @@ static inline struct hmm *mm_get_hmm(struct mm_struct *mm)
>> {
>> struct hmm *hmm = READ_ONCE(mm->hmm);
>>
>> - if (hmm && kref_get_unless_zero(&hmm->kref))
>> + if (hmm && !hmm->dead) {
>> + kref_get(&hmm->kref);
>> return hmm;
>> + }
>
> hmm->dead and mm->hmm are not being read under lock, so this went from
> something almost thread safe to something racy :(
>
>> @@ -53,25 +55,28 @@ static inline struct hmm *mm_get_hmm(struct mm_struct *mm)
>> * hmm_get_or_create - register HMM against an mm (HMM internal)
>> *
>> * @mm: mm struct to attach to
>> - * Returns: returns an HMM object, either by referencing the existing
>> - * (per-process) object, or by creating a new one.
>> + * Return: an HMM object reference, either by referencing the existing
>> + * (per-process) object, or by creating a new one.
>> *
>> - * This is not intended to be used directly by device drivers. If mm already
>> - * has an HMM struct then it get a reference on it and returns it. Otherwise
>> - * it allocates an HMM struct, initializes it, associate it with the mm and
>> - * returns it.
>> + * If the mm already has an HMM struct then return a new reference to it.
>> + * Otherwise, allocate an HMM struct, initialize it, associate it with the mm,
>> + * and return a new reference to it. If the return value is not NULL,
>> + * the caller is responsible for calling hmm_put().
>> */
>> static struct hmm *hmm_get_or_create(struct mm_struct *mm)
>> {
>> - struct hmm *hmm = mm_get_hmm(mm);
>> - bool cleanup = false;
>> + struct hmm *hmm = mm->hmm;
>>
>> - if (hmm)
>> - return hmm;
>> + if (hmm) {
>> + if (hmm->dead)
>> + goto error;
>
> Create shouldn't fail just because it is racing with something doing
> destroy
>
> The flow should be something like:
>
> spin_lock(&mm->page_table_lock); // or write side mmap_sem if you prefer
> if (mm->hmm)
> if (kref_get_unless_zero(mm->hmm))
> return mm->hmm;
> mm->hmm = NULL
>
>
>> + goto out;
>> + }
>>
>> hmm = kmalloc(sizeof(*hmm), GFP_KERNEL);
>> if (!hmm)
>> - return NULL;
>> + goto error;
>> +
>> init_waitqueue_head(&hmm->wq);
>> INIT_LIST_HEAD(&hmm->mirrors);
>> init_rwsem(&hmm->mirrors_sem);
>> @@ -83,47 +88,32 @@ static struct hmm *hmm_get_or_create(struct mm_struct *mm)
>> hmm->dead = false;
>> hmm->mm = mm;
>>
>> - spin_lock(&mm->page_table_lock);
>> - if (!mm->hmm)
>> - mm->hmm = hmm;
>> - else
>> - cleanup = true;
>> - spin_unlock(&mm->page_table_lock);
>
> BTW, Jerome this needs fixing too, it shouldn't fail the function just
> because it lost the race.
>
> More like
>
> spin_lock(&mm->page_table_lock);
> if (mm->hmm)
> if (kref_get_unless_zero(mm->hmm)) {
> kfree(hmm);
> return mm->hmm;
> }
> mm->hmm = hmm
>
>> - if (cleanup)
>> - goto error;
>> -
>> /*
>> - * We should only get here if hold the mmap_sem in write mode ie on
>> - * registration of first mirror through hmm_mirror_register()
>> + * The mmap_sem should be held for write so no additional locking
>
> Please let us have proper lockdep assertions for this kind of stuff.
>
>> + * is needed. Note that struct_mm holds a reference to hmm.
>> + * It is cleared in hmm_release().
>> */
>> + mm->hmm = hmm;
>
> Actually using the write side the mmap_sem seems sort of same if it is
> assured the write side is always held for this call..
>
>
> Hmm, there is a race with hmm_destroy touching mm->hmm that does
> hold the write lock.
>
>> +
>> hmm->mmu_notifier.ops = &hmm_mmu_notifier_ops;
>> if (__mmu_notifier_register(&hmm->mmu_notifier, mm))
>> goto error_mm;
>
> And the error unwind here is problematic as it should do
> kref_put. Actually after my patch to use container_of this
> mmu_notifier_register should go before the mm->hmm = hmm to avoid
> having to do the sketchy error unwind at all.
>
>> +out:
>> + /* Return a separate hmm reference for the caller. */
>> + kref_get(&hmm->kref);
>> return hmm;
>>
>> error_mm:
>> - spin_lock(&mm->page_table_lock);
>> - if (mm->hmm == hmm)
>> - mm->hmm = NULL;
>> - spin_unlock(&mm->page_table_lock);
>> -error:
>> + mm->hmm = NULL;
>> kfree(hmm);
>> +error:
>> return NULL;
>> }
>>
>> static void hmm_free(struct kref *kref)
>> {
>> struct hmm *hmm = container_of(kref, struct hmm, kref);
>> - struct mm_struct *mm = hmm->mm;
>> -
>> - mmu_notifier_unregister_no_release(&hmm->mmu_notifier, mm);
>
> Where did the unregister go?
>
>> -
>> - spin_lock(&mm->page_table_lock);
>> - if (mm->hmm == hmm)
>> - mm->hmm = NULL;
>> - spin_unlock(&mm->page_table_lock);
>
> Well, we still need to NULL mm->hmm if the hmm was put before the mm
> is destroyed.
>
>> kfree(hmm);
>> }
>> @@ -135,25 +125,18 @@ static inline void hmm_put(struct hmm *hmm)
>>
>> void hmm_mm_destroy(struct mm_struct *mm)
>> {
>> - struct hmm *hmm;
>> + struct hmm *hmm = mm->hmm;
>>
>> - spin_lock(&mm->page_table_lock);
>> - hmm = mm_get_hmm(mm);
>> - mm->hmm = NULL;
>> if (hmm) {
>> + mm->hmm = NULL;
>
> At this point The kref on mm is 0, so any other thread reading mm->hmm
> has a use-after-free bug. Not much point in doing this assignment , it
> is just confusing.
>
>> hmm->mm = NULL;
>> - hmm->dead = true;
>> - spin_unlock(&mm->page_table_lock);
>> hmm_put(hmm);
>> - return;
>> }
>> -
>> - spin_unlock(&mm->page_table_lock);
>> }
>>
>> static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
>> {
>> - struct hmm *hmm = mm_get_hmm(mm);
>> + struct hmm *hmm = mm->hmm;
>
> container_of is much safer/better
>
>> @@ -931,20 +909,14 @@ int hmm_range_register(struct hmm_range *range,
>> return -EINVAL;
>> if (start >= end)
>> return -EINVAL;
>> + hmm = mm_get_hmm(mm);
>> + if (!hmm)
>> + return -EFAULT;
>>
>> range->page_shift = page_shift;
>> range->start = start;
>> range->end = end;
>> -
>> - range->hmm = mm_get_hmm(mm);
>> - if (!range->hmm)
>> - return -EFAULT;
>> -
>> - /* Check if hmm_mm_destroy() was call. */
>> - if (range->hmm->mm == NULL || range->hmm->dead) {
>
> This comment looks bogus too, we can't race with hmm_mm_destroy as the
> caller MUST have a mmgrab or mmget on the mm already to call this API
> - ie can't be destroyed.
>
> As discussed in the other thread this should probably be
> mmget_not_zero.
>
> Jason
I think you missed the main points which are:
1) mm->hmm holds a reference to struct hmm so hmm isn't going away until
__mmdrop() is called. hmm->mm is not a reference to mm,
just a "backward" pointer.
Trying to make struct hmm hold a *reference* to mm seems wrong to me.
2) mm->hmm is only set with mm->mmap_sem held for write.
mm->hmm is only cleared when __mmdrop() is called.
hmm->mm is only cleared when __mmdrop() is called so it is long
after the call to hmm_release().
3) The mmu notifier unregister happens only as part of exit_mmap().
The hmm->dead and hmm->mm == NULL checks are more for sanity checking
since hmm_mirror_register() shouldn't be called without holding mmap_sem.
A VM_WARN or other check makes sense like you said.
Anyway, I'll wait for Jerome to weigh in as to how to proceed.
Powered by blists - more mailing lists