Message-ID: <20241213072651.1475826-1-lizhi.xu@windriver.com>
Date: Fri, 13 Dec 2024 15:26:51 +0800
From: Lizhi Xu <lizhi.xu@...driver.com>
To: <dhowells@...hat.com>
CC: <asmadeus@...ewreck.org>, <brauner@...nel.org>, <ericvh@...nel.org>,
<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
<linux_oss@...debyte.com>, <lizhi.xu@...driver.com>,
<lucho@...kov.net>,
<syzbot+1fc6f64c40a9d143cfb6@...kaller.appspotmail.com>,
<syzkaller-bugs@...glegroups.com>, <v9fs@...ts.linux.dev>
Subject: Re: [PATCH] netfs: If didn't read new data then abandon retry
On Mon, 09 Dec 2024 15:53:04 +0000, David Howells wrote:
> David
> ---
> commit d0906b4a4611709c02de610d3c34d6172aa28aaf
> Author: David Howells <dhowells@...hat.com>
> Date: Fri Nov 8 11:40:20 2024 +0800
>
> netfs: Work around recursion by abandoning retry if nothing read
>
> syzkaller reported recursion with a loop of three calls (netfs_rreq_assess,
> netfs_retry_reads and netfs_rreq_terminated) hitting the limit of the stack
> during an unbuffered or direct I/O read.
>
> There are a number of issues:
>
> (1) There is no limit on the number of retries.
>
> (2) A subrequest is supposed to be abandoned if it does not transfer
> anything (NETFS_SREQ_NO_PROGRESS), but that isn't checked under all
> circumstances.
>
> (3) The actual root cause, which is this:
>
>     if (atomic_dec_and_test(&rreq->nr_outstanding))
>         netfs_rreq_terminated(rreq, ...);
>
> When we do a retry, we bump the rreq->nr_outstanding counter to
> prevent the final cleanup phase running before we've finished
> dispatching the retries. The problem is if we hit 0, we have to do
> the cleanup phase - but we're in the cleanup phase and end up
> repeating the retry cycle, hence the recursion.
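
As a rough user-space illustration of the cycle described in (3), the sketch
below models a guard reference that is dropped from inside the cleanup phase.
It is not the netfs code: every name (model_rreq, model_retry,
model_terminated, model_run) and the depth_cap guard are assumptions made up
for the demo, and the "subrequest" simply completes synchronously without
transferring anything.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct model_rreq {
	atomic_int nr_outstanding;
	int depth;      /* current nesting of model_terminated() */
	int max_depth;  /* deepest nesting observed */
	int depth_cap;  /* demo-only guard so the model stops recursing */
};

static void model_terminated(struct model_rreq *rreq);

/*
 * Retry step: take a guard reference, "dispatch" one subrequest that
 * completes synchronously with no progress, then drop the guard.
 */
static void model_retry(struct model_rreq *rreq)
{
	atomic_fetch_add(&rreq->nr_outstanding, 1);  /* guard reference */

	atomic_fetch_add(&rreq->nr_outstanding, 1);  /* subrequest issued */
	atomic_fetch_sub(&rreq->nr_outstanding, 1);  /* ...and done at once */

	/* Same shape as atomic_dec_and_test(): true when we hit zero. */
	if (atomic_fetch_sub(&rreq->nr_outstanding, 1) == 1)
		model_terminated(rreq);  /* cleanup re-entered from cleanup */
}

/* Cleanup phase: the failed subrequest still "needs" a retry. */
static void model_terminated(struct model_rreq *rreq)
{
	rreq->depth++;
	if (rreq->depth > rreq->max_depth)
		rreq->max_depth = rreq->depth;

	if (rreq->depth < rreq->depth_cap)
		model_retry(rreq);

	rreq->depth--;
}

/* Runs the model and reports how deep the cleanup phase nested. */
int model_run(int depth_cap)
{
	struct model_rreq rreq = { .depth_cap = depth_cap };

	assert(depth_cap > 0);
	atomic_init(&rreq.nr_outstanding, 0);
	model_terminated(&rreq);
	return rreq.max_depth;
}
```

In this model the nesting depth always reaches whatever depth_cap is passed
to model_run(), which suggests that without some limit the stack is the only
thing that stops the loop.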
>
> Work around the problem by limiting the number of retries. This is based
> on Lizhi Xu's patch[1], and makes the following changes:
>
> (1) Replace NETFS_SREQ_NO_PROGRESS with NETFS_SREQ_MADE_PROGRESS and make
> the filesystem set it if it managed to read or write at least one byte
> of data. Clear this bit before issuing a subrequest.
Will there be a conflict if reads and writes both use the same flag to mark progress?
>
> (2) Add a ->retry_count member to the subrequest and increment it any time
> we do a retry.
>
> (3) Remove the NETFS_SREQ_RETRYING flag as it is superfluous with
> ->retry_count. If the latter is non-zero, we're doing a retry.
>
> (4) Abandon a subrequest if retry_count is non-zero and we made no
> progress.
>
> (5) Use ->retry_count in both the write-side and the read-side.
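
To check my reading of changes (1), (2) and (4), here is a minimal user-space
sketch of the abandon rule. It is only a model under stated assumptions: the
struct, the MODEL_SREQ_MADE_PROGRESS bit value, and the model_issue() backend
that transfers a fixed chunk per call are all hypothetical and are not taken
from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for NETFS_SREQ_MADE_PROGRESS; bit value is made up. */
#define MODEL_SREQ_MADE_PROGRESS (1u << 0)

/* Hypothetical subrequest; only the fields the rule needs. */
struct model_subreq {
	unsigned int flags;
	unsigned int retry_count;  /* change (2): bumped on every retry */
	unsigned int transferred;
	unsigned int len;
};

/* Hypothetical backend: moves at most `chunk` bytes per call. */
static void model_issue(struct model_subreq *s, unsigned int chunk)
{
	unsigned int remaining = s->len - s->transferred;
	unsigned int n = chunk < remaining ? chunk : remaining;

	/* Change (1): clear the bit before issuing... */
	s->flags &= ~MODEL_SREQ_MADE_PROGRESS;

	/* ...and set it only if at least one byte was transferred. */
	if (n > 0) {
		s->transferred += n;
		s->flags |= MODEL_SREQ_MADE_PROGRESS;
	}
}

/* Change (4): abandon if this was a retry and it made no progress. */
static bool model_should_abandon(const struct model_subreq *s)
{
	return s->retry_count != 0 &&
	       !(s->flags & MODEL_SREQ_MADE_PROGRESS);
}

/* Issues the subrequest, retrying until done or abandoned.
 * Returns the number of retries performed. */
unsigned int model_read(struct model_subreq *s, unsigned int chunk)
{
	model_issue(s, chunk);
	while (s->transferred < s->len) {
		if (model_should_abandon(s))
			break;
		s->retry_count++;
		model_issue(s, chunk);
	}
	return s->retry_count;
}
```

With a 10-byte subrequest and a 4-byte chunk the model completes after two
retries; with a chunk of 0 it retries once and then gives up, because the
first attempt has retry_count == 0 and so is never abandoned by this rule.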
BR,
Lizhi