Message-ID: <eb099842-ed15-2d86-2a65-67f1f41999fb@linux.alibaba.com>
Date: Tue, 3 Nov 2020 12:58:29 +0800
From: Alex Shi <alex.shi@...ux.alibaba.com>
To: Johannes Weiner <hannes@...xchg.org>
Cc: akpm@...ux-foundation.org, mgorman@...hsingularity.net,
tj@...nel.org, hughd@...gle.com, khlebnikov@...dex-team.ru,
daniel.m.jordan@...cle.com, willy@...radead.org, lkp@...el.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
cgroups@...r.kernel.org, shakeelb@...gle.com,
iamjoonsoo.kim@....com, richard.weiyang@...il.com,
kirill@...temov.name, alexander.duyck@...il.com,
rong.a.chen@...el.com, mhocko@...e.com, vdavydov.dev@...il.com,
shy828301@...il.com, Michal Hocko <mhocko@...nel.org>,
Yang Shi <yang.shi@...ux.alibaba.com>
Subject: Re: [PATCH v20 18/20] mm/lru: replace pgdat lru_lock with lruvec lock

On 2020/11/3 4:41 AM, Johannes Weiner wrote:
> On Fri, Oct 30, 2020 at 10:49:41AM +0800, Alex Shi wrote:
>>
>>
>> @@ -1367,6 +1380,51 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd
>>  	return lruvec;
>>  }
>>
>> +struct lruvec *lock_page_lruvec(struct page *page)
>> +{
>> +	struct lruvec *lruvec;
>> +	struct pglist_data *pgdat = page_pgdat(page);
>> +
>> +	rcu_read_lock();
>> +	lruvec = mem_cgroup_page_lruvec(page, pgdat);
>> +	spin_lock(&lruvec->lru_lock);
>> +	rcu_read_unlock();
>> +
>> +	lruvec_memcg_debug(lruvec, page);
>> +
>> +	return lruvec;
>> +}
>> +
>> +struct lruvec *lock_page_lruvec_irq(struct page *page)
>> +{
>> +	struct lruvec *lruvec;
>> +	struct pglist_data *pgdat = page_pgdat(page);
>> +
>> +	rcu_read_lock();
>> +	lruvec = mem_cgroup_page_lruvec(page, pgdat);
>> +	spin_lock_irq(&lruvec->lru_lock);
>> +	rcu_read_unlock();
>> +
>> +	lruvec_memcg_debug(lruvec, page);
>> +
>> +	return lruvec;
>> +}
>> +
>> +struct lruvec *lock_page_lruvec_irqsave(struct page *page, unsigned long *flags)
>> +{
>> +	struct lruvec *lruvec;
>> +	struct pglist_data *pgdat = page_pgdat(page);
>> +
>> +	rcu_read_lock();
>> +	lruvec = mem_cgroup_page_lruvec(page, pgdat);
>> +	spin_lock_irqsave(&lruvec->lru_lock, *flags);
>> +	rcu_read_unlock();
>> +
>> +	lruvec_memcg_debug(lruvec, page);
>> +
>> +	return lruvec;
>> +}
>
> As these are used quite widely throughout the VM code now, it would be
> good to give them kerneldoc comments that explain the interface.
>
> In particular, I think it's necessary to explain the contexts from
> which this is safe to use (in particular wrt pages moving between
> memcgs - see the comment in commit_charge()).
>
> I'm going to go through the callsites that follow and try to identify
> what makes them safe. It's mostly an exercise to double check our
> thinking here.
>
> Most of them are straight-forward, and I don't think they warrant
> individual comments. But some do, IMO. And it appears at least one
> actually isn't safe yet:
Thanks a lot for the reminder. Is the following comment fine?
/**
 * lock_page_lruvec - lock and return the lruvec for a page
 * @page: the page
 *
 * This series of functions is only safe to use under either of these
 * conditions, which guarantee the page cannot move to another memcg:
 * 1) PageLRU is cleared or was never set, or
 * 2) page->_refcount is zero
 */
struct lruvec *lock_page_lruvec(struct page *page)
{
....
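
For illustration, here is a minimal usage sketch of how a caller would
satisfy that precondition and pair the lock with the matching unlock
helper from this series (the isolation step below is hypothetical, not
code from the patch):

	unsigned long flags;
	struct lruvec *lruvec;

	/*
	 * Clearing PageLRU first meets the documented precondition:
	 * the page can no longer move to another memcg while we hold
	 * its lruvec lock.
	 */
	if (TestClearPageLRU(page)) {
		lruvec = lock_page_lruvec_irqsave(page, &flags);
		del_page_from_lru_list(page, lruvec, page_lru(page));
		unlock_page_lruvec_irqrestore(lruvec, flags);
	}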
>> @@ -274,9 +270,8 @@ void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages)
>>  {
>>  	do {
>>  		unsigned long lrusize;
>> -		struct pglist_data *pgdat = lruvec_pgdat(lruvec);
>>
>> -		spin_lock_irq(&pgdat->lru_lock);
>> +		spin_lock_irq(&lruvec->lru_lock);
>>  		/* Record cost event */
>>  		if (file)
>>  			lruvec->file_cost += nr_pages;
>> @@ -300,7 +295,7 @@ void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages)
>>  			lruvec->file_cost /= 2;
>>  			lruvec->anon_cost /= 2;
>>  		}
>> -		spin_unlock_irq(&pgdat->lru_lock);
>> +		spin_unlock_irq(&lruvec->lru_lock);
>>  	} while ((lruvec = parent_lruvec(lruvec)));
>>  }
>
> This is safe because it either comes from
>
> 1) the pinned lruvec in reclaim, or
>
> 2) from a pre-LRU page during refault (which also holds the
> rcu lock, so would be safe even if the page was on the LRU
> and could move simultaneously to a new lruvec).
>
> The second one seems a bit tricky. It could be good to add a comment
> to lru_note_cost_page() that explains why its mem_cgroup_page_lruvec()
> is safe.
Thanks for pointing that out. Is the following comment fine?
diff --git a/mm/swap.c b/mm/swap.c
index 9fe5ff9a8111..55ccc93ffb49 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -264,6 +264,13 @@ void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages)
 	do {
 		unsigned long lrusize;
 
+		/*
+		 * Holding lruvec->lru_lock is safe here, since the lruvec is
+		 * 1) pinned by the caller in reclaim, or
+		 * 2) from a pre-LRU page during refault (which also holds the
+		 *    rcu lock, so it's safe even if the page was on the LRU
+		 *    and could move simultaneously to a new lruvec).
+		 */
 		spin_lock_irq(&lruvec->lru_lock);
 		/* Record cost event */
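
Or, if the comment fits better in lru_note_cost_page() as you
suggested, maybe something like this sketch (assuming the current
lru_note_cost_page() body stays unchanged):

void lru_note_cost_page(struct page *page)
{
	/*
	 * Called on a pre-LRU page during refault, with the rcu read
	 * lock held by the caller, so mem_cgroup_page_lruvec() is safe
	 * even though the page could otherwise move to a new lruvec.
	 */
	lru_note_cost(mem_cgroup_page_lruvec(page, page_pgdat(page)),
		      page_is_file_lru(page), thp_nr_pages(page));
}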
>
>> @@ -364,13 +359,13 @@ static inline void activate_page_drain(int cpu)
>>
>>  static void activate_page(struct page *page)
>>  {
>> -	pg_data_t *pgdat = page_pgdat(page);
>> +	struct lruvec *lruvec;
>>
>>  	page = compound_head(page);
>> -	spin_lock_irq(&pgdat->lru_lock);
>> +	lruvec = lock_page_lruvec_irq(page);
>
> I don't see what makes this safe. There is nothing that appears to
> lock out a concurrent page move between memcgs/lruvecs, which means
> the following could manipulate an unlocked lru list:
>
This function is for !CONFIG_SMP; could the CPU still be preempted here on an RT kernel?
>>  	if (PageLRU(page))
>> -		__activate_page(page, mem_cgroup_page_lruvec(page, pgdat));
>> -	spin_unlock_irq(&pgdat->lru_lock);
>> +		__activate_page(page, lruvec);
>> +	unlock_page_lruvec_irq(lruvec);
>>  }
>>  #endif
>
> Shouldn't this be something like this?
>
> 	if (TestClearPageLRU(page)) {
> 		lruvec = lock_page_lruvec_irq(page);
> 		__activate_page(page, lruvec);
> 		unlock_page_lruvec_irq(lruvec);
> 		SetPageLRU(page);
> 	}
Anyway, your new version is cleaner and more logical; I will change to
it.
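
Concretely, the !CONFIG_SMP activate_page() would then look something
like the following sketch of your suggestion (untested):

static void activate_page(struct page *page)
{
	struct lruvec *lruvec;

	page = compound_head(page);
	if (TestClearPageLRU(page)) {
		/*
		 * With PageLRU cleared, the page cannot move to another
		 * memcg, so the lruvec taken below remains the right
		 * one for as long as we hold its lock.
		 */
		lruvec = lock_page_lruvec_irq(page);
		__activate_page(page, lruvec);
		unlock_page_lruvec_irq(lruvec);
		SetPageLRU(page);
	}
}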
Thanks a lot!
Alex