Message-Id: <ce7a039a-2697-f16e-b0b3-f6ae41391682@linux.vnet.ibm.com>
Date: Thu, 6 Jul 2017 17:29:26 +0200
From: Laurent Dufour <ldufour@...ux.vnet.ibm.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: paulmck@...ux.vnet.ibm.com, akpm@...ux-foundation.org,
kirill@...temov.name, ak@...ux.intel.com, mhocko@...nel.org,
dave@...olabs.net, jack@...e.cz,
Matthew Wilcox <willy@...radead.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
haren@...ux.vnet.ibm.com, khandual@...ux.vnet.ibm.com,
npiggin@...il.com, bsingharora@...il.com,
Tim Chen <tim.c.chen@...ux.intel.com>
Subject: Re: [RFC v5 09/11] mm: Try spin lock in speculative path
On 06/07/2017 16:48, Peter Zijlstra wrote:
> On Thu, Jul 06, 2017 at 03:46:59PM +0200, Laurent Dufour wrote:
>> On 05/07/2017 20:50, Peter Zijlstra wrote:
>>> On Fri, Jun 16, 2017 at 07:52:33PM +0200, Laurent Dufour wrote:
>>>> @@ -2294,8 +2295,19 @@ static bool pte_map_lock(struct vm_fault *vmf)
>>>> if (vma_has_changed(vmf->vma, vmf->sequence))
>>>> goto out;
>>>>
>>>> - pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
>>>> - vmf->address, &ptl);
>
>>>> + ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
>>>> + pte = pte_offset_map(vmf->pmd, vmf->address);
>>>> + if (unlikely(!spin_trylock(ptl))) {
>>>> + pte_unmap(pte);
>>>> + goto out;
>>>> + }
>>>> +
>>>> if (vma_has_changed(vmf->vma, vmf->sequence)) {
>>>> pte_unmap_unlock(pte, ptl);
>>>> goto out;
>>>
>>> Right, so if you look at my earlier patches you'll see I did something
>>> quite disgusting here.
>>>
>>> Not sure that wants repeating, but I cannot remember why I thought this
>>> deadlock didn't exist anymore.
>>
>> Regarding the deadlock, I did hit it on my Power victim node, so I guess it
>> is still there, and the stack traces are quite explicit.
>> Am I missing something here?
>
> No, you are right in that the deadlock is quite real. What I cannot
> remember is what made me think to remove the really 'wonderful' code I
> had to deal with it.
>
> That said, you might want to look at how often you terminate the
> speculation because of your trylock failing. If that shows up at all we
> might need to do something about it.
Based on the benchmarks I have run, it doesn't fail very often, but I was
thinking about adding some counters here. The system already accounts for
major and minor page faults in current->maj_flt and current->min_flt,
respectively. I was wondering whether an additional type like async_flt
would be welcome, or whether there is a smarter way to get that metric.
Feel free to advise.
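
To make the counting idea concrete, here is a minimal userspace sketch, not
kernel code. It assumes a toy spinlock built on C11 atomic_flag in place of
the PTE lock, and every name in it (toy_spinlock, spf_stats, spf_try_lock)
is hypothetical. It only illustrates the "try the lock, count the failure,
fall back" pattern from the patch above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for the page table spinlock (assumption, not kernel API). */
struct toy_spinlock {
	atomic_flag locked;
};

/* Hypothetical per-task counters, in the spirit of maj_flt/min_flt. */
struct spf_stats {
	unsigned long attempts;       /* speculative lock attempts */
	unsigned long trylock_failed; /* attempts aborted on contention */
};

/* Returns true if the lock was free and is now held by the caller. */
static bool toy_spin_trylock(struct toy_spinlock *lock)
{
	return !atomic_flag_test_and_set_explicit(&lock->locked,
						  memory_order_acquire);
}

static void toy_spin_unlock(struct toy_spinlock *lock)
{
	atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}

/*
 * Never spin: if the lock is contended, record the failure and return
 * false so the caller can give up on speculation and use the regular path.
 */
static bool spf_try_lock(struct toy_spinlock *lock, struct spf_stats *stats)
{
	stats->attempts++;
	if (!toy_spin_trylock(lock)) {
		stats->trylock_failed++;
		return false;
	}
	return true;
}
```

The counters are plain integers because the sketch assumes each task updates
only its own spf_stats, as with current->min_flt. The ratio
trylock_failed / attempts then shows how often speculation is abandoned
because of the trylock.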
Thanks
Laurent.