Message-ID: <CY8PR11MB7134A31039FA79E85300DA2A8996A@CY8PR11MB7134.namprd11.prod.outlook.com>
Date: Wed, 20 Dec 2023 08:56:45 +0000
From: "Zhuo, Qiuxu" <qiuxu.zhuo@...el.com>
To: Miaohe Lin <linmiaohe@...wei.com>
CC: "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>, "Luck, Tony"
<tony.luck@...el.com>, "Huang, Ying" <ying.huang@...el.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, HORIGUCHI NAOYA <naoya.horiguchi@....com>,
"Yin, Fengwei" <fengwei.yin@...el.com>
Subject: RE: [PATCH 1/1] mm: memory-failure: Re-split hw-poisoned huge page on
-EAGAIN
Hi Miaohe,
Thanks for the review.
Please see the comments below.
> From: Miaohe Lin <linmiaohe@...wei.com>
> ...
> > +
> > +static void split_thp_work_fn(struct work_struct *work)
> > +{
> > +	struct split_thp_req *req = container_of(work, typeof(*req), work.work);
> > +	int ret;
> > +
> > +	/* Split the thp. */
> > +	get_page(req->thp);
>
> Can req->thp be freed when split_thp_work_fn is scheduled ?
Yes, it's possible. Thanks for catching this.
Instead of adding a new work item to re-split the thp, I'll leverage the
existing memory_failure_queue() to re-split the thp in v2.
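Roughly, the idea is sketched below (untested and only illustrative; the
helper name, the flags value, and the retry-limit bookkeeping are all
placeholders, not the final v2 code):

	/*
	 * Illustrative sketch: if splitting the hw-poisoned thp fails,
	 * re-queue its pfn through the existing memory_failure_queue()
	 * machinery so that memory_failure() gets another chance to
	 * split it later, instead of scheduling a dedicated delayed
	 * work for the split.
	 */
	static int split_and_requeue_on_failure(struct page *page)
	{
		int ret;

		lock_page(page);
		ret = split_huge_page(page);
		unlock_page(page);

		if (ret)
			/* Let the existing memory-failure work retry later. */
			memory_failure_queue(page_to_pfn(page), 0);

		return ret;
	}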
>
> > +	lock_page(req->thp);
> > +	ret = split_huge_page(req->thp);
> > +	unlock_page(req->thp);
> > +	put_page(req->thp);
> > +
> > +	/* Retry with an exponential backoff. */
> > +	if (ret && ++req->retries < SPLIT_THP_MAX_RETRY_CNT) {
> > +		schedule_delayed_work(to_delayed_work(work),
> > +				      msecs_to_jiffies(SPLIT_THP_INIT_DELAYED_MS << req->retries));
> > +		return;
> > +	}
> > +
> > +	pr_err("%#lx: split unsplit thp %ssuccessfully.\n",
> > +	       page_to_pfn(req->thp), ret ? "un" : "");
> > +	kfree(req);
> > +	split_thp_pending = false;
>
> split_thp_pending is not protected against split_thp_delayed? Though this
> race should be benign.
Thanks for being concerned about this.
The read-check-modify of "split_thp_pending" is protected by the mutex
"&mf_mutex", and the worker only sets it to false (it never reads it),
so in theory there is no race here.
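To spell that out, the v1 pattern is roughly the following (simplified,
for illustration only; the queuing site is paraphrased from the patch):

	/* In memory_failure(), under mf_mutex (read-check-modify): */
	mutex_lock(&mf_mutex);
	if (!split_thp_pending) {
		split_thp_pending = true;
		/* ... queue the split work ... */
	}
	mutex_unlock(&mf_mutex);

	/* In split_thp_work_fn(), only a plain store, never a read: */
	split_thp_pending = false;

So the only unsynchronized access is the worker's store of "false", which,
as you noted, is benign.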
That said, I will leverage the existing memory_failure_queue() in v2, so
there should be no such concern about this race. 😊
-Qiuxu