lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: 
 <IA1PR10MB7240890CA0B9A1697674D84698F02@IA1PR10MB7240.namprd10.prod.outlook.com>
Date: Mon, 27 May 2024 09:56:19 +0000
From: Gulam Mohamed <gulam.mohamed@...cle.com>
To: Shinichiro Kawasaki <shinichiro.kawasaki@....com>
CC: "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "axboe@...nel.dk" <axboe@...nel.dk>,
        "chaitanyak@...dia.com"
	<chaitanyak@...dia.com>,
        "hch@....de" <hch@....de>
Subject: RE: [PATCH blktests] loop: Test to detect a race condition between
 loop detach and open

Hi,

> -----Original Message-----
> From: Shinichiro Kawasaki <shinichiro.kawasaki@....com>
> Sent: Thursday, May 23, 2024 4:45 PM
> To: Gulam Mohamed <gulam.mohamed@...cle.com>
> Cc: linux-block@...r.kernel.org; linux-kernel@...r.kernel.org;
> axboe@...nel.dk; chaitanyak@...dia.com; hch@....de
> Subject: Re: [PATCH 2/2] loop: Test to detect a race condition between loop
> detach and open
> 
> On May 21, 2024 / 22:42, Gulam Mohamed wrote:
> > When one process opens a loop device partition and another process
> > detaches it, there will be a race condition due to which stale loop
> > partitions are created causing IO errors. This test will detect the
> > race
> 
> Hello Gulam, thanks for the patch. I ran the new test case on my system and
> observed it failed. It looks working as expected.
> 
> This is the patch not for Linux kernel but for blktests, then it's the better to
> separate it out. I suggest to add the subject title prefix "[PATCH blktests]"
> to this patch so that reviewers can tell the target repository easily.
> 
Done. Changed the subject line prefix to "[PATCH blktests]"
 
> I suggest to take a look in the "new" script. It has some guides for blktests,
> such as "Indent with tabs" or the global variable "TMPDIR". Also, please run
> "make check" which runs shellcheck.
> 
>   $ make check
>   shellcheck -x -e SC2119 -f gcc check common/* \
>           tests/*/rc tests/*/[0-9]*[0-9] src/*.sh
>   tests/loop/010:23:17: note: Instead of '[ 1 ]', use 'true'. [SC2161]
>   tests/loop/010:32:17: note: Instead of '[ 1 ]', use 'true'. [SC2161]
>   tests/loop/010:84:7: note: Double quote to prevent globbing and word
> splitting. [SC2086]
>   make: *** [Makefile:21: check] Error 1
> 
> Please address the shellcheck warnings above. It will help us to keep script
> quality at some level.
Done. Resolved the warnings
> 
> >
> > Signed-off-by: Gulam Mohamed <gulam.mohamed@...cle.com>
> > ---
> >  tests/loop/010     | 90
> ++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/loop/010.out |  2 ++
> >  2 files changed, 92 insertions(+)
> >  create mode 100755 tests/loop/010
> >  create mode 100644 tests/loop/010.out
> >
> > diff --git a/tests/loop/010 b/tests/loop/010 new file mode 100755
> > index 000000000000..ea93a120d51a
> > --- /dev/null
> > +++ b/tests/loop/010
> > @@ -0,0 +1,90 @@
> > +#!/bin/bash
> > +# SPDX-License-Identifier: GPL-3.0+
> > +# Copyright (C) 2024, Oracle and/or its affiliates.
> > +#
> > +# Test to detect a race between loop detach and loop open which
> > +creates # stale loop partitions when one process opens the loop
> > +partition and # another process detaches the loop device # .
> > +tests/loop/rc DESCRIPTION="check stale loop partition"
> > +TIMED=1
> > +
> > +requires() {
> > +        _have_program parted
> > +        _have_program mkfs.xfs
> > +}
> > +
> > +image_file="/tmp/loopImg"
> 
> $TMPDIR is recommended instead of /tmp. $TMPDIR is cleaned up after test
> run.
Done. Using $TMPDIR now.
> 
> > +line1="partition scan of loop0 failed (rc=-16)"
> > +
> > +function create_loop()
> 
> Nit: the "function" keyword is not used in blktests (there is one exception
> though).
> 
Done. Removed the keyword "function" now
> > +{
> > +        while [ 1 ]
> > +        do
> > +                loop_device="$(losetup -P -f --show "${image_file}")"
> > +                blkid /dev/loop0p1 >> /dev/null 2>&1
> > +        done
> > +}
> > +
> > +function detach_loop()
> > +{
> > +        while [ 1 ]
> > +        do
> > +                if [ -e /dev/loop0 ]; then
> > +                        losetup -d /dev/loop0 > /dev/null 2>&1
> > +                fi
> > +        done
> > +}
> > +
> > +test() {
> > +	echo "Running ${TEST_NAME}"
> > +	local failure="/tmp/failure"
> 
> Nit: some local variables are declared with "local" keyword, and some other
> are not.
Done. Using the "local" keyword for all local variables now
> 
> > +	touch $failure
> > +	echo 0 > $failure
> 
> Why the file is required to keep the fail/pass status? Just a bash variable looks
> enough.
In the beginning I tried with a bash variable but it was not working as the variable need to be accessed by other two child threads also.
> 
> > +	dmesg -c > /dev/null 2>&1
> 
> This kernel ring buffer clear will likely affect test harnesses which run blktests.
> To check the kernel message generated during each test case run, please use
> _dmesg_since_test_start(): ref nbd/004, nvme/003.
Done. Used the above function
> 
> > +
> > +	truncate -s 2G "${image_file}"
> 
> Other test cases in loop group have 1G file size. Is 2G required for this test?
> If not, 1G size would be the better to reduce disk shortage risk.
> 
Using 1G is not an issue. Done
> > +	parted -a none -s "${image_file}" mklabel gpt
> > +	loop_device="$(losetup -P -f --show "${image_file}")"
> > +	parted -a none -s "${loop_device}" mkpart primary 64s 109051s
> > +
> > +	udevadm settle
> > +        if [ ! -e "${loop_device}" ]; then
> > +		return 1
> > +        fi
> > +
> > +        mkfs.xfs -f "${loop_device}p1" > /dev/null 2>&1
> > +
> > +        losetup -d "${loop_device}" >  /dev/null 2>&1
> > +
> > +        create_loop &
> > +	create_pid=$!
> > +        detach_loop &
> > +	detach_pid=$!
> > +
> > +	sleep "${TIMEOUT:-180}"
> 
> The default 180 seconds sounds a bit long. How about 30 seconds?
Using 180 seconds is just to give more time to reproduce the issue as 30 seconds may not be sufficient for the race condition to occur

Will send the version 2 as soon as possible, with all these changes
> 
> > +        {
> > +                kill -9 $create_pid
> > +		kill -9 $detach_pid
> > +                wait
> > +                sleep 1
> > +        } 2>/dev/null
> > +
> > +        losetup -D > /dev/null 2>&1
> > +        dmesg  | while IFS= read -r line2
> > +	do
> > +        	match=$(echo "$line2" | grep -o "$line1")
> > +                if [ "$line1" == "$match" ]; then
> > +			echo 1 > "/tmp/failure"
> > +			break
> > +                fi
> > +        done
> > +	failed=$(cat $failure)
> > +	if [ $failed -eq 0 ]; then
> > +		echo "Test complete"
> > +	else
> > +		echo "Test failed"
> > +	fi
> > +	rm -f $failure
> > +}
> > diff --git a/tests/loop/010.out b/tests/loop/010.out new file mode
> > 100644 index 000000000000..64a6aee00b8a
> > --- /dev/null
> > +++ b/tests/loop/010.out
> > @@ -0,0 +1,2 @@
> > +Running loop/010
> > +Test complete
> > --
> > 2.39.3
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ