[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0590862b-0705-fb9b-be1b-ed0745ca1b76@linux.alibaba.com>
Date: Wed, 29 Jul 2020 10:27:34 +0800
From: Alex Shi <alex.shi@...ux.alibaba.com>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Mel Gorman <mgorman@...hsingularity.net>,
Tejun Heo <tj@...nel.org>, Hugh Dickins <hughd@...gle.com>,
Konstantin Khlebnikov <khlebnikov@...dex-team.ru>,
Daniel Jordan <daniel.m.jordan@...cle.com>,
Yang Shi <yang.shi@...ux.alibaba.com>,
Matthew Wilcox <willy@...radead.org>,
Johannes Weiner <hannes@...xchg.org>,
kbuild test robot <lkp@...el.com>,
linux-mm <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>, cgroups@...r.kernel.org,
Shakeel Butt <shakeelb@...gle.com>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Wei Yang <richard.weiyang@...il.com>,
"Kirill A. Shutemov" <kirill@...temov.name>,
Rong Chen <rong.a.chen@...el.com>,
Michal Hocko <mhocko@...nel.org>,
Vladimir Davydov <vdavydov.dev@...il.com>
Subject: Re: [PATCH v17 17/21] mm/lru: replace pgdat lru_lock with lruvec lock
在 2020/7/29 上午9:27, Alexander Duyck 写道:
> On Tue, Jul 28, 2020 at 6:00 PM Alex Shi <alex.shi@...ux.alibaba.com> wrote:
>>
>>
>>
>> 在 2020/7/28 下午10:54, Alexander Duyck 写道:
>>> On Tue, Jul 28, 2020 at 4:20 AM Alex Shi <alex.shi@...ux.alibaba.com> wrote:
>>>>
>>>>
>>>>
>>>> 在 2020/7/28 上午7:34, Alexander Duyck 写道:
>>>>>> @@ -1876,6 +1876,12 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
>>>>>> * list_add(&page->lru,)
>>>>>> * list_add(&page->lru,) //corrupt
>>>>>> */
>>>>>> + new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
>>>>>> + if (new_lruvec != lruvec) {
>>>>>> + if (lruvec)
>>>>>> + spin_unlock_irq(&lruvec->lru_lock);
>>>>>> + lruvec = lock_page_lruvec_irq(page);
>>>>>> + }
>>>>>> SetPageLRU(page);
>>>>>>
>>>>>> if (unlikely(put_page_testzero(page))) {
>>>>> I was going through the code of the entire patch set and I noticed
>>>>> these changes in move_pages_to_lru. What is the reason for adding the
>>>>> new_lruvec logic? My understanding is that we are moving the pages to
>>>>> the lruvec provided are we not?If so why do we need to add code to get
>>>>> a new lruvec? The code itself seems to stand out from the rest of the
>>>>> patch as it is introducing new code instead of replacing existing
>>>>> locking code, and it doesn't match up with the description of what
>>>>> this function is supposed to do since it changes the lruvec.
>>>>
>>>> this new_lruvec is the replacement of removed line, as following code:
>>>>>> - lruvec = mem_cgroup_page_lruvec(page, pgdat);
>>>> This recheck is for the page move the root memcg, otherwise it cause the bug:
>>>
>>> Okay, now I see where the issue is. You moved this code so now it has
>>> a different effect than it did before. You are relocking things before
>>> you needed to. Don't forget that when you came into this function you
>>> already had the lock. In addition the patch is broken as it currently
>>> stands as you aren't using similar logic in the code just above this
>>> addition if you encounter an evictable page. As a result this is
>>> really difficult to review as there are subtle bugs here.
>>
>> Why you think its a bug? the relock only happens if locked lruvec is different.
>> and unlock the old one.
>
> The section I am talking about with the bug is this section here:
> while (!list_empty(list)) {
> + struct lruvec *new_lruvec = NULL;
> +
> page = lru_to_page(list);
> VM_BUG_ON_PAGE(PageLRU(page), page);
> list_del(&page->lru);
> if (unlikely(!page_evictable(page))) {
> - spin_unlock_irq(&pgdat->lru_lock);
> + spin_unlock_irq(&lruvec->lru_lock);
> putback_lru_page(page);
> - spin_lock_irq(&pgdat->lru_lock);
> + spin_lock_irq(&lruvec->lru_lock);
It would be still fine. The lruvec->lru_lock will be checked again before
we take and use it.
And this lock will optimized in patch 19th which did by Hugh Dickins.
> continue;
> }
>
> Basically it probably is not advisable to be retaking the
> lruvec->lru_lock directly as the lruvec may have changed so it
> wouldn't be correct for the next page. It would make more sense to be
> using your API and calling unlock_page_lruvec_irq and
> lock_page_lruvec_irq instead of using the lock directly.
>
>>>
>>> I suppose the correct fix is to get rid of this line, but it should
>>> be placed everywhere the original function was calling
>>> spin_lock_irq().
>>>
>>> In addition I would consider changing the arguments/documentation for
>>> move_pages_to_lru. You aren't moving the pages to lruvec, so there is
>>> probably no need to pass that as an argument. Instead I would pass
>>> pgdat since that isn't going to be moving and is the only thing you
>>> actually derive based on the original lruvec.
>>
>> yes, The comments should be changed with the line was introduced from long ago. :)
>> Anyway, I am wondering if it worth a v18 version resend?
>
> So I have been looking over the function itself and I wonder if it
> isn't worth looking at rewriting this to optimize the locking behavior
> to minimize the number of times we have to take the LRU lock. I have
> some code I am working on that I plan to submit as an RFC in the next
> day or so after I can get it smoke tested. The basic idea would be to
> defer returning the evictiable pages or freeing the compound pages
> until after we have processed the pages that can be moved while still
> holding the lock. I would think it should reduce the lock contention
> significantly while improving the throughput.
>
I had tried once, but the freeing page cross onto release_pages which hard to deal with.
I am very glad to wait your patch, and hope it could be resovled. :)
Thanks
Alex
Powered by blists - more mailing lists