linux-kernel - Re: [PATCH v3 3/7] mm/lru: replace pgdat lru

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20191116043806.GD20752@bombadil.infradead.org>
Date:   Fri, 15 Nov 2019 20:38:06 -0800
From:   Matthew Wilcox <willy@...radead.org>
To:     Alex Shi <alex.shi@...ux.alibaba.com>
Cc:     cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, akpm@...ux-foundation.org,
        mgorman@...hsingularity.net, tj@...nel.org, hughd@...gle.com,
        khlebnikov@...dex-team.ru, daniel.m.jordan@...cle.com,
        yang.shi@...ux.alibaba.com, Johannes Weiner <hannes@...xchg.org>,
        Michal Hocko <mhocko@...nel.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Roman Gushchin <guro@...com>,
        Shakeel Butt <shakeelb@...gle.com>,
        Chris Down <chris@...isdown.name>,
        Thomas Gleixner <tglx@...utronix.de>,
        Vlastimil Babka <vbabka@...e.cz>, Qian Cai <cai@....pw>,
        Andrey Ryabinin <aryabinin@...tuozzo.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Jérôme Glisse <jglisse@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        David Rientjes <rientjes@...gle.com>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
        swkhack <swkhack@...il.com>,
        "Potyra, Stefan" <Stefan.Potyra@...ktrobit.com>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        Stephen Rothwell <sfr@...b.auug.org.au>,
        Colin Ian King <colin.king@...onical.com>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Mauro Carvalho Chehab <mchehab+samsung@...nel.org>,
        Peng Fan <peng.fan@....com>,
        Nikolay Borisov <nborisov@...e.com>,
        Ira Weiny <ira.weiny@...el.com>,
        Kirill Tkhai <ktkhai@...tuozzo.com>,
        Yafang Shao <laoar.shao@...il.com>
Subject: Re: [PATCH v3 3/7] mm/lru: replace pgdat lru_lock with lruvec lock

On Sat, Nov 16, 2019 at 11:15:02AM +0800, Alex Shi wrote:
> This is the main patch to replace per node lru_lock with per memcg
> lruvec lock. It also fold the irqsave flags into lruvec.

I have to say, I don't love the part where we fold the irqsave flags
into the lruvec.  I know it saves us an argument, but it opens up the
possibility of mismatched expectations.  eg we currently have:

static void __split_huge_page(struct page *page, struct list_head *list,
			struct lruvec *lruvec, pgoff_t end)
{
...
	spin_unlock_irqrestore(&lruvec->lru_lock, lruvec->irqflags);

so if we introduce a new caller, we have to be certain that this caller
is also using lock_page_lruvec_irqsave() and not lock_page_lruvec_irq().
I can't think of a way to make the compiler enforce that, and if we don't,
then we can get some odd crashes with interrupts being unexpectedly
enabled or disabled, depending on how ->irqflags was used last.

So it makes the code more subtle.  And that's not a good thing.

> +static inline struct lruvec *lock_page_lruvec_irq(struct page *page,
> +						struct pglist_data *pgdat)
> +{
> +	struct lruvec *lruvec = mem_cgroup_page_lruvec(page, pgdat);
> +
> +	spin_lock_irq(&lruvec->lru_lock);
> +
> +	return lruvec;
> +}

...

> +static struct lruvec *lock_page_lru(struct page *page, int *isolated)
>  {
>  	pg_data_t *pgdat = page_pgdat(page);
> +	struct lruvec *lruvec = lock_page_lruvec_irq(page, pgdat);
>  
> -	spin_lock_irq(&pgdat->lru_lock);
>  	if (PageLRU(page)) {
> -		struct lruvec *lruvec;
>  
> -		lruvec = mem_cgroup_page_lruvec(page, pgdat);
>  		ClearPageLRU(page);
>  		del_page_from_lru_list(page, lruvec, page_lru(page));
>  		*isolated = 1;
>  	} else
>  		*isolated = 0;
> +
> +	return lruvec;
>  }

But what if the page is !PageLRU?  What lruvec did we just lock?
According to the comments on mem_cgroup_page_lruvec(),

 * This function is only safe when following the LRU page isolation
 * and putback protocol: the LRU lock must be held, and the page must
 * either be PageLRU() or the caller must have isolated/allocated it.

and now it's being called in order to find out which LRU lock to take.
So this comment needs to be updated, if it's wrong, or this patch has
a race.