Message-ID: <59d4ea9e-3f6b-11c2-75d1-5baecd5b4ae2@nvidia.com>
Date: Tue, 17 Dec 2019 13:50:24 -0800
From: Ralph Campbell <rcampbell@...dia.com>
To: Jason Gunthorpe <jgg@...lanox.com>
CC: "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
Jerome Glisse <jglisse@...hat.com>,
"John Hubbard" <jhubbard@...dia.com>,
Christoph Hellwig <hch@....de>,
Andrew Morton <akpm@...ux-foundation.org>,
Shuah Khan <shuah@...nel.org>
Subject: Re: [PATCH v5 1/2] mm/mmu_notifier: make interval notifier updates
safe
On 12/17/19 12:51 PM, Jason Gunthorpe wrote:
> On Mon, Dec 16, 2019 at 11:57:32AM -0800, Ralph Campbell wrote:
>> mmu_interval_notifier_insert() and mmu_interval_notifier_remove() can't
>> be called safely from inside the invalidate() callback. This is fine for
>> devices with explicit memory region register and unregister calls but it
>> is desirable from a programming model standpoint to not require explicit
>> memory region registration. Regions can be registered based on device
>> address faults but without a mechanism for updating or removing the mmu
>> interval notifiers in response to munmap(), the invalidation callbacks
will be for regions that are stale or apply to different mmapped regions.
>
> What we do in RDMA is drive the removal from a work queue, as we need
> a synchronize_srcu anyhow to serialize everything to do with
> destroying a part of the address space mirror.
>
> Is it really necessary to have all this stuff just to save doing
> something like a work queue?
Well, the invalidates already have to use the driver lock to synchronize,
so handling the range tracking updates semi-synchronously seems more
straightforward to me.
Do you feel strongly that adding a work queue is the right way to handle
this?
> Also, I think we are not taking core kernel APIs like this without an
> in-kernel user?
Right. I was looking for feedback before updating nouveau to use it.
>> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
>> index 9e6caa8ecd19..55fbefcdc564 100644
>> +++ b/include/linux/mmu_notifier.h
>> @@ -233,11 +233,18 @@ struct mmu_notifier {
>> * @invalidate: Upon return the caller must stop using any SPTEs within this
>> * range. This function can sleep. Return false only if sleeping
>> * was required but mmu_notifier_range_blockable(range) is false.
>> + * @release: This function will be called when the mmu_interval_notifier
>> + * is removed from the interval tree. Defining this function also
>> + * allows mmu_interval_notifier_remove() and
>> + * mmu_interval_notifier_update() to be called from the
>> + * invalidate() callback function (i.e., they won't block waiting
>> + * for invalidations to finish).
>
> Having a function called remove that doesn't block seems like a very
> poor choice of language; we've tended to use put to describe that
> operation.
>
> The difference is meaningful as people often create use after free
> bugs in drivers when presented with interfaces named 'remove' or
> 'destroy' that don't actually guarantee there is not going to be
> continued accesses to the memory.
OK. I can rename it to put().
>> */
>> struct mmu_interval_notifier_ops {
>> bool (*invalidate)(struct mmu_interval_notifier *mni,
>> const struct mmu_notifier_range *range,
>> unsigned long cur_seq);
>> + void (*release)(struct mmu_interval_notifier *mni);
>> };
>>
>> struct mmu_interval_notifier {
>> @@ -246,6 +253,8 @@ struct mmu_interval_notifier {
>> struct mm_struct *mm;
>> struct hlist_node deferred_item;
>> unsigned long invalidate_seq;
>> + unsigned long deferred_start;
>> + unsigned long deferred_last;
>
> I couldn't quite understand how something like this can work, what is
> preventing parallel updates?
It is serialized by the struct mmu_notifier_mm lock.
If there are no tasks walking the interval tree, the update
happens synchronously under the lock. If there are walkers,
the start/last values are stored under the lock and the last caller's
values are used to update the interval tree when the last walker
finishes (under the lock again).
>> +/**
>> + * mmu_interval_notifier_update - Update interval notifier range
>> + * @mni: Interval notifier to update
>> + * @start: New starting virtual address to monitor
>> + * @length: New length of the range to monitor
>> + *
>> + * This function updates the range being monitored.
>> + * If there is no release() function defined, the call will wait for the
>> + * update to finish before returning.
>> + */
>> +int mmu_interval_notifier_update(struct mmu_interval_notifier *mni,
>> + unsigned long start, unsigned long length)
>> +{
>
> Update should probably be its own patch
>
> Jason
OK.
Thanks for the review.