[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <ea141f88-a0d3-df97-5141-706fa18b7ad3@linux.ibm.com>
Date: Wed, 24 Apr 2019 09:57:05 +0200
From: Laurent Dufour <ldufour@...ux.ibm.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: akpm@...ux-foundation.org, mhocko@...nel.org, kirill@...temov.name,
ak@...ux.intel.com, dave@...olabs.net, jack@...e.cz,
Matthew Wilcox <willy@...radead.org>,
aneesh.kumar@...ux.ibm.com, benh@...nel.crashing.org,
mpe@...erman.id.au, paulus@...ba.org,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, hpa@...or.com,
Will Deacon <will.deacon@....com>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
sergey.senozhatsky.work@...il.com,
Andrea Arcangeli <aarcange@...hat.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
kemi.wang@...el.com, Daniel Jordan <daniel.m.jordan@...cle.com>,
David Rientjes <rientjes@...gle.com>,
Jerome Glisse <jglisse@...hat.com>,
Ganesh Mahendran <opensource.ganesh@...il.com>,
Minchan Kim <minchan@...nel.org>,
Punit Agrawal <punitagrawal@...il.com>,
vinayak menon <vinayakm.list@...il.com>,
Yang Shi <yang.shi@...ux.alibaba.com>,
zhong jiang <zhongjiang@...wei.com>,
Haiyan Song <haiyanx.song@...el.com>,
Balbir Singh <bsingharora@...il.com>, sj38.park@...il.com,
Michel Lespinasse <walken@...gle.com>,
Mike Rapoport <rppt@...ux.ibm.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
haren@...ux.vnet.ibm.com, npiggin@...il.com,
paulmck@...ux.vnet.ibm.com, Tim Chen <tim.c.chen@...ux.intel.com>,
linuxppc-dev@...ts.ozlabs.org, x86@...nel.org
Subject: Re: [PATCH v12 21/31] mm: Introduce find_vma_rcu()
Le 23/04/2019 à 11:27, Peter Zijlstra a écrit :
> On Tue, Apr 16, 2019 at 03:45:12PM +0200, Laurent Dufour wrote:
>> This allows to search for a VMA structure without holding the mmap_sem.
>>
>> The search is repeated while the mm seqlock is changing and until we found
>> a valid VMA.
>>
>> While under the RCU protection, a reference is taken on the VMA, so the
>> caller must call put_vma() once it not more need the VMA structure.
>>
>> At the time a VMA is inserted in the MM RB tree, in vma_rb_insert(), a
>> reference is taken to the VMA by calling get_vma().
>>
>> When removing a VMA from the MM RB tree, the VMA is not release immediately
>> but at the end of the RCU grace period through vm_rcu_put(). This ensures
>> that the VMA remains allocated until the end the RCU grace period.
>>
>> Since the vm_file pointer, if valid, is released in put_vma(), there is no
>> guarantee that the file pointer will be valid on the returned VMA.
>
> What I'm missing here, and in the previous patch introducing the
> refcount (also see refcount_t), is _why_ we need the refcount thing at
> all.
The need for the VMA's refcount is to ensure that the VMA will remain
until the end of the SPF handler. This is a consequence of the use of
RCU instead of SRCU to protect the RB tree.
I was not aware of the refcount_t type, it would be better here to avoid
wrapping.
> My original plan was to use SRCU, which at the time was not complete
> enough so I abused/hacked preemptible RCU, but that is no longer the
> case, SRCU has all the required bits and pieces.
When I did test using SRCU it was involving a lot a scheduling to run
the SRCU callback mechanism. In some workload the impact on the
perfomance was significant [1].
I can't see this overhead using RCU.
>
> Also; the initial motivation was prefaulting large VMAs and the
> contention on mmap was killing things; but similarly, the contention on
> the refcount (I did try that) killed things just the same.
Doing prefaulting should be doable, I'll try to think further about that.
Regarding the refcount, I should I missed something, this is an atomic
counter, so there should not be contention on it but cache exclusivity,
not ideal I agree but I can't see what else to use here.
> So I'm really sad to see the refcount return; and without any apparent
> justification.
I'm not opposed to use another mechanism here, but SRCU didn't show good
performance with some workload, and I can't see how to use RCU without a
reference counter here. So please, advise.
Thanks,
Laurent.
[1]
https://lore.kernel.org/linux-mm/7ca80231-fe02-a3a7-84bc-ce81690ea051@intel.com/
Powered by blists - more mailing lists