Message-ID: <CAH2r5mv2m3z+PHC_t1AaFAoV0+tU3fHU+HvX1HeK5S11u_KspA@mail.gmail.com>
Date: Sat, 28 Jun 2025 00:16:04 -0500
From: Steve French <smfrench@...il.com>
To: David Howells <dhowells@...hat.com>
Cc: Christian Brauner <christian@...uner.io>, Paulo Alcantara <pc@...guebit.com>, netfs@...ts.linux.dev, 
	linux-afs@...ts.infradead.org, linux-cifs@...r.kernel.org, 
	linux-nfs@...r.kernel.org, ceph-devel@...r.kernel.org, v9fs@...ts.linux.dev, 
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Paulo Alcantara <pc@...guebit.org>
Subject: Re: [PATCH v2 01/16] netfs: Fix hang due to missing case in final DIO
 read result collection

You can add my Tested-by to the first 11 patches in the series.  I have
verified that they fix the netfs regression (e.g. the hangs in xfstests
generic/013).  It looks important to get this series into 6.16.

On Wed, Jun 25, 2025 at 11:44 AM David Howells <dhowells@...hat.com> wrote:
>
> When doing a DIO read, if the subrequests we issue fail and cause the
> request PAUSE flag to be set to put a pause on subrequest generation, we
> may complete collection of the subrequests (possibly discarding them) prior
> to the ALL_QUEUED flag being set.
>
> In such a case, netfs_read_collection() doesn't see ALL_QUEUED being set
> after netfs_collect_read_results() returns and will just return to the app
> (the collector can be seen unpausing the generator in the trace log).
>
> The subrequest generator can then set ALL_QUEUED and the app thread reaches
> netfs_wait_for_request().  This causes netfs_collect_in_app() to be called
> to see if we're done yet, but there's a missing case here.
>
> netfs_collect_in_app() will see that a thread is active and set inactive to
> false, but won't see any subrequests in the read stream, and so won't set
> need_collect to true.  The function will then just return 0, indicating
> that the caller should just sleep until further activity (which won't be
> forthcoming) occurs.
>
> Fix this by making netfs_collect_in_app() check to see if an active thread
> is complete - i.e. that ALL_QUEUED is set and the subrequests list is empty
> - and to skip the sleep return path.  The collector will then be called
> which will clear the request IN_PROGRESS flag, allowing the app to
> progress.
>
> Fixes: 2b1424cd131c ("netfs: Fix wait/wake to be consistent about the waitqueue used")
> Reported-by: Steve French <sfrench@...ba.org>
> Signed-off-by: David Howells <dhowells@...hat.com>
> Reviewed-by: Paulo Alcantara <pc@...guebit.org>
> cc: linux-cifs@...r.kernel.org
> cc: netfs@...ts.linux.dev
> cc: linux-fsdevel@...r.kernel.org
> ---
>  fs/netfs/misc.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
> index 43b67a28a8fa..0a54b1203486 100644
> --- a/fs/netfs/misc.c
> +++ b/fs/netfs/misc.c
> @@ -381,7 +381,7 @@ void netfs_wait_for_in_progress_stream(struct netfs_io_request *rreq,
>  static int netfs_collect_in_app(struct netfs_io_request *rreq,
>                                 bool (*collector)(struct netfs_io_request *rreq))
>  {
> -       bool need_collect = false, inactive = true;
> +       bool need_collect = false, inactive = true, done = true;
>
>         for (int i = 0; i < NR_IO_STREAMS; i++) {
>                 struct netfs_io_subrequest *subreq;
> @@ -400,9 +400,11 @@ static int netfs_collect_in_app(struct netfs_io_request *rreq,
>                         need_collect = true;
>                         break;
>                 }
> +               if (subreq || !test_bit(NETFS_RREQ_ALL_QUEUED, &rreq->flags))
> +                       done = false;
>         }
>
> -       if (!need_collect && !inactive)
> +       if (!need_collect && !inactive && !done)
>                 return 0; /* Sleep */
>
>         __set_current_state(TASK_RUNNING);
>
>
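
For anyone following along, the missing case described in the commit
message can be modelled in userspace.  The sketch below is only an
illustration under stated assumptions: struct model_request, struct
model_stream, model_collect_in_app() and the with_fix switch are
hypothetical stand-ins and are not kernel code.  The real function also
deals with tracing, task state and calling the collector, all of which
are left out here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the netfs structures. */
#define MODEL_NR_STREAMS 2

struct model_stream {
	bool active;		/* stream is in use by this request */
	size_t nr_subreqs;	/* subrequests still on the stream's list */
	bool front_ready;	/* front subrequest is ready to be collected */
};

struct model_request {
	bool all_queued;	/* models NETFS_RREQ_ALL_QUEUED */
	struct model_stream streams[MODEL_NR_STREAMS];
};

enum model_action { MODEL_SLEEP, MODEL_COLLECT };

/*
 * Decide whether the waiting app thread should sleep or run the collector.
 * with_fix == false models the old behaviour, where "done" was not tracked.
 */
static enum model_action model_collect_in_app(const struct model_request *rreq,
					      bool with_fix)
{
	bool need_collect = false, inactive = true, done = true;

	for (int i = 0; i < MODEL_NR_STREAMS; i++) {
		const struct model_stream *s = &rreq->streams[i];

		if (!s->active)
			continue;
		inactive = false;
		if (s->nr_subreqs > 0 && s->front_ready) {
			need_collect = true;
			break;
		}
		/* Not done while work remains or more may still be queued. */
		if (s->nr_subreqs > 0 || !rreq->all_queued)
			done = false;
	}

	if (!with_fix)
		done = false;	/* old code had no "done" escape hatch */

	if (!need_collect && !inactive && !done)
		return MODEL_SLEEP;
	return MODEL_COLLECT;
}
```

With an active stream, an empty subrequest list and all_queued set, the
model sleeps when with_fix is false and goes on to collection when it is
true, which corresponds to the hang and the fix described above.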


-- 
Thanks,

Steve
