lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aRW7yWziEnN_kkk5@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com>
Date: Thu, 13 Nov 2025 16:36:49 +0530
From: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
To: "Darrick J. Wong" <djwong@...nel.org>
Cc: zlang@...hat.com, fstests@...r.kernel.org, linux-ext4@...r.kernel.org
Subject: Re: [PATCH 3/7] generic/778: fix background loop control with
 sentinel files

On Mon, Nov 10, 2025 at 10:26:48AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@...nel.org>
> 
> This test fails on my slowish QA VM with 32k-fsblock xfs:
> 
>  --- /run/fstests/bin/tests/generic/778.out      2025-10-20 10:03:43.432910446 -0700
>  +++ /var/tmp/fstests/generic/778.out.bad        2025-11-04 12:01:31.137813652 -0800
>  @@ -1,2 +1,137 @@
>   QA output created by 778
>  -Silence is golden
>  +umount: /opt: target is busy.
>  +mount: /opt: /dev/sda4 already mounted on /opt.
>  +       dmesg(1) may have more information after failed mount system call.
>  +cycle mount failed
>  +(see /var/tmp/fstests/generic/778.full for details)
> 
> Injecting a 'ps auxfww' into the _scratch_cycle_mount helper reveals
> that this process is still sitting on /opt:
> 
> root     1804418  9.0  0.8 144960 134368 pts/0   Dl+  12:01   0:00 /run/fstests/xfsprogs/io/xfs_io -i -c open -fsd /opt/testfile -c pwrite -S 0x61 -DA -V1 -b 134217728 134217728 134217728
> 
> Yes, that's the xfs_io process started by atomic_write_loop.
> Inexplicably, the awloop killing code terminates the subshell running
> the for loop in atomic_write_loop but only waits for the subshell itself
> to exit.  It doesn't wait for any of that subshell's children, and
> that's why the unmount fails.

Ouch, thanks for catching this. This approach looks good to me.

Feel free to add:
Reviewed-by: Ojaswin Mujoo <ojaswin@...ux.ibm.com>

Regards,
ojaswin

> 
> A bare "wait" (without the $awloop_pid parameter) also doesn't wait for
> the xfs_io because the parent shell sees the subshell exit and treats
> that as job completion.  We can't use killall here because the system
> could be running check-parallel, nor can we use pkill here because the
> pid namespace containment code was removed.
> 
> The simplest stupid answer is to use sentinel files to control the loop.
> 
> Cc: <fstests@...r.kernel.org> # v2025.10.20
> Fixes: ca954527ff9d97 ("generic: Add sudden shutdown tests for multi block atomic writes")
> Signed-off-by: "Darrick J. Wong" <djwong@...nel.org>
> ---
>  tests/generic/778 |   36 ++++++++++++++++++++++++++++++------
>  1 file changed, 30 insertions(+), 6 deletions(-)
> 
> 
> diff --git a/tests/generic/778 b/tests/generic/778
> index 7cfabc3a47a521..715de458268ebc 100755
> --- a/tests/generic/778
> +++ b/tests/generic/778
> @@ -21,6 +21,9 @@ _scratch_mount >> $seqres.full
>  testfile=$SCRATCH_MNT/testfile
>  touch $testfile
>  
> +awloop_runfile=$tmp.awloop_running
> +awloop_killfile=$tmp.awloop_kill
> +
>  awu_max=$(_get_atomic_write_unit_max $testfile)
>  blksz=$(_get_block_size $SCRATCH_MNT)
>  echo "Awu max: $awu_max" >> $seqres.full
> @@ -31,25 +34,48 @@ num_blocks=$((awu_max / blksz))
>  filesize=$(( 10 * 1024 * 1024 * 1024 ))
>  
>  _cleanup() {
> -	[ -n "$awloop_pid" ] && kill $awloop_pid &> /dev/null
> -	wait
> +	kill_awloop
>  }
>  
>  atomic_write_loop() {
>  	local off=0
>  	local size=$awu_max
> +
> +	rm -f $awloop_killfile
> +	touch $awloop_runfile
> +
>  	for ((i=0; i<$((filesize / $size )); i++)); do
>  		# Due to sudden shutdown this can produce errors so just
>  		# redirect them to seqres.full
>  		$XFS_IO_PROG -c "open -fsd $testfile" -c "pwrite -S 0x61 -DA -V1 -b $size $off $size" >> /dev/null 2>>$seqres.full
> +		if [ ! -w "$testfile" ] || [ -e "$awloop_killfile" ]; then
> +			break
> +		fi
>  		echo "Written to offset: $((off + size))" >> $tmp.aw
>  		off=$((off + size))
>  	done
> +
> +	rm -f $awloop_runfile
> +}
> +
> +# Use sentinel files to control the loop execution because we don't know the
> +# pid of the xfs_io process and so we can't wait for it directly.  A bare
> +# wait command won't wait for a D-state xfs_io process so we can't do that
> +# either.  We can't use killall because check-parallel, and we can't pkill
> +# because the pid namespacing code was removed withotu fixing check-parallel.
> +kill_awloop() {
> +	test -e $awloop_runfile || return
> +
> +	touch $awloop_killfile
> +
> +	for ((i=0;i<300;i++)); do
> +		test -e $awloop_runfile || break
> +		sleep 0.1
> +	done
>  }
>  
>  start_atomic_write_and_shutdown() {
>  	atomic_write_loop &
> -	awloop_pid=$!
>  	local max_loops=100
>  
>  	local i=0
> @@ -70,9 +96,7 @@ start_atomic_write_and_shutdown() {
>  	echo "# Shutting down filesystem while write is running" >> $seqres.full
>  	_scratch_shutdown
>  
> -	kill $awloop_pid 2>/dev/null  # the process might have finished already
> -	wait $awloop_pid
> -	unset $awloop_pid
> +	kill_awloop
>  }
>  
>  # This test has the following flow:
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ