linux-kernel - Re: [PATCH] netfs: If didn't read new data then abandon retry

Message-ID: <20241213072651.1475826-1-lizhi.xu@windriver.com>
Date: Fri, 13 Dec 2024 15:26:51 +0800
From: Lizhi Xu <lizhi.xu@...driver.com>
To: <dhowells@...hat.com>
CC: <asmadeus@...ewreck.org>, <brauner@...nel.org>, <ericvh@...nel.org>,
        <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
        <linux_oss@...debyte.com>, <lizhi.xu@...driver.com>,
        <lucho@...kov.net>,
        <syzbot+1fc6f64c40a9d143cfb6@...kaller.appspotmail.com>,
        <syzkaller-bugs@...glegroups.com>, <v9fs@...ts.linux.dev>
Subject: Re: [PATCH] netfs: If didn't read new data then abandon retry

On Mon, 09 Dec 2024 15:53:04 +0000, David Howells wrote:
> David
> ---
> commit d0906b4a4611709c02de610d3c34d6172aa28aaf
> Author: David Howells <dhowells@...hat.com>
> Date:   Fri Nov 8 11:40:20 2024 +0800
> 
>     netfs: Work around recursion by abandoning retry if nothing read
>     
>     syzkaller reported recursion with a loop of three calls (netfs_rreq_assess,
>     netfs_retry_reads and netfs_rreq_terminated) hitting the limit of the stack
>     during an unbuffered or direct I/O read.
>     
>     There are a number of issues:
>     
>      (1) There is no limit on the number of retries.
>     
>      (2) A subrequest is supposed to be abandoned if it does not transfer
>          anything (NETFS_SREQ_NO_PROGRESS), but that isn't checked under all
>          circumstances.
>     
>      (3) The actual root cause, which is this:
>     
>             if (atomic_dec_and_test(&rreq->nr_outstanding))
>                     netfs_rreq_terminated(rreq, ...);
>     
>          When we do a retry, we bump the rreq->nr_outstanding counter to
>          prevent the final cleanup phase running before we've finished
>          dispatching the retries.  The problem is if we hit 0, we have to do
>          the cleanup phase - but we're in the cleanup phase and end up
>          repeating the retry cycle, hence the recursion.
>     
>     Work around the problem by limiting the number of retries.  This is based
>     on Lizhi Xu's patch[1], and makes the following changes:
>     
>      (1) Replace NETFS_SREQ_NO_PROGRESS with NETFS_SREQ_MADE_PROGRESS and make
>          the filesystem set it if it managed to read or write at least one byte
>          of data.  Clear this bit before issuing a subrequest.
Could conflicts arise if the read side and the write side both use this same flag to mark progress?
>     
>      (2) Add a ->retry_count member to the subrequest and increment it any time
>          we do a retry.
>     
>      (3) Remove the NETFS_SREQ_RETRYING flag as it is superfluous with
>          ->retry_count.  If the latter is non-zero, we're doing a retry.
>     
>      (4) Abandon a subrequest if retry_count is non-zero and we made no
>          progress.
>     
>      (5) Use ->retry_count in both the write-side and the read-side.
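
For illustration, the retry policy described in (1)-(5) can be modelled in
userspace roughly as below. This is only a sketch under assumptions: struct
sketch_subreq, sketch_issue(), sketch_assess() and sketch_run() are
hypothetical names made up for this example and are not the netfs code. The
made_progress field stands in for NETFS_SREQ_MADE_PROGRESS, and the loop is
iterative on purpose, so it does not reproduce the recursion itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's struct netfs_io_subrequest. */
struct sketch_subreq {
	size_t len;               /* bytes the subrequest should transfer */
	size_t transferred;       /* bytes transferred so far */
	unsigned int retry_count; /* models ->retry_count */
	bool made_progress;       /* models NETFS_SREQ_MADE_PROGRESS */
};

enum sketch_verdict { SKETCH_DONE, SKETCH_RETRY, SKETCH_ABANDON };

/* Issue one attempt that manages to move at most 'got' bytes. */
static void sketch_issue(struct sketch_subreq *s, size_t got)
{
	size_t room = s->len - s->transferred;

	s->made_progress = false;         /* clear the bit before issuing */
	if (got > room)
		got = room;
	if (got > 0) {
		s->transferred += got;
		s->made_progress = true;  /* at least one byte was moved */
	}
}

/* Decide what to do with a subrequest after an attempt completes. */
static enum sketch_verdict sketch_assess(const struct sketch_subreq *s)
{
	if (s->transferred >= s->len)
		return SKETCH_DONE;
	/* Point (4): a retry that made no progress is abandoned. */
	if (s->retry_count > 0 && !s->made_progress)
		return SKETCH_ABANDON;
	return SKETCH_RETRY;
}

/* Drive a subrequest to completion or abandonment without recursing. */
static enum sketch_verdict sketch_run(struct sketch_subreq *s, size_t per_attempt)
{
	enum sketch_verdict v;

	sketch_issue(s, per_attempt);
	while ((v = sketch_assess(s)) == SKETCH_RETRY) {
		s->retry_count++;         /* point (2): count every retry */
		sketch_issue(s, per_attempt);
	}
	return v;
}
```

In this model, a subrequest that never transfers anything gets exactly one
retry before being abandoned, while one that keeps transferring data is
retried until it completes.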

BR,
Lizhi