linux-kernel - Re: [PATCH] Fix loop device flush before configure v2

Message-ID: <1496898372.8617.27.camel@gmx.de>
Date:   Thu, 08 Jun 2017 07:06:12 +0200
From:   Mike Galbraith <efault@....de>
To:     James Wang <jnwang@...e.com>, axboe@...com, ming.lei@...hat.com
Cc:     hare@...e.com, linux-block@...r.kernel.org,
        linux-kernel@...r.kernel.org, mgorman@...e.com
Subject: Re: [PATCH] Fix loop device flush before configure v2

On Thu, 2017-06-08 at 10:17 +0800, James Wang wrote:
> This condition check was exist at before commit b5dd2f6047ca ("block: loop:
> improve performance via blk-mq") When add MQ support to loop device, it be
> removed because the member of '->lo_thread' be removed. And then upstream
> add '->worker_task', I think they forget add it to here.
> 
> When I install SLES-12 product is base on 4.4 kernel I found installer will
> hang +60 second at scan disks. and I found LVM tools would take this action.
> finally I found this problem is more obvious on AMD platform. This problem
> will impact all scenarios that scan loop devcie.
> 
> When the loop device didn't configure backing file or Request Queue, we
> shouldn't to cost a lot of time to flush it.

The changelog sounds odd to me, perhaps reword/condense a bit?...

While installing SLES-12 (based on v4.4), I found that the installer
will stall for 60+ seconds during LVM disk scan.  The root cause was
determined to be the removal of a bound device check in loop_flush()
by commit b5dd2f6047ca ("block: loop: improve performance via blk-mq").

Restoring this check, by examining ->lo_state as set by loop_set_fd(),
eliminates the bad behavior.

Test method:
modprobe loop max_loop=64
dd if=/dev/zero of=disk bs=512 count=200K
for((i=0;i<4;i++))do losetup -f disk; done
mkfs.ext4 -F /dev/loop0
for((i=0;i<4;i++))do mkdir t$i; mount /dev/loop$i t$i;done
for f in `ls /dev/loop[0-9]*|sort`; do \
	echo $f; dd if=$f of=/dev/null  bs=512 count=1; \
	done

Test output:  stock          patched
/dev/loop0    18.1217e-05    8.3842e-05
/dev/loop1     6.1114e-05    0.000147979
/dev/loop10    0.414701      0.000116564
/dev/loop11    0.7474        6.7942e-05
/dev/loop12    0.747986      8.9082e-05
/dev/loop13    0.746532      7.4799e-05
/dev/loop14    0.480041      9.3926e-05
/dev/loop15    1.26453       7.2522e-05

Note that from loop10 onward, the devices are neither bound nor
mounted, yet the stock kernel consumes several orders of magnitude
more wall time than it does for a mounted device.

Reviewed-by: Hannes Reinecke <hare@...e.com>
Signed-off-by: James Wang <jnwang@...e.com>
Fixes: b5dd2f6047ca ("block: loop: improve performance via blk-mq")
---
>  drivers/block/loop.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 48f6fa6f810e..2e5b8538760c 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -625,6 +625,9 @@ static int loop_switch(struct loop_device *lo, struct file *file)
>   */
>  static int loop_flush(struct loop_device *lo)
>  {
> +	/* loop not yet configured, no running thread, nothing to flush */
> +	if (lo->lo_state != Lo_bound)
> +		return 0;
>  	return loop_switch(lo, NULL);
>  }
>
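
For readers who want to see the effect of the guard in isolation, here is
a minimal userspace sketch of the hunk above. It is a model, not kernel
code: the struct is reduced to the one field the check reads, the
switch_calls counter is a hypothetical addition for illustration, and
loop_switch() is a stand-in that only records that the slow path was
taken. The state names are assumed to match those used by the loop driver.

```c
/* Userspace model of the guard in loop_flush(); not kernel code. */
#include <stddef.h>

/* State names assumed to mirror the loop driver's device states. */
enum { Lo_unbound, Lo_bound, Lo_rundown };

struct loop_device {
	int lo_state;
	unsigned int switch_calls;	/* hypothetical counter, illustration only */
};

/* Stand-in for loop_switch(): only records that the slow path ran. */
static int loop_switch(struct loop_device *lo, void *file)
{
	(void)file;
	lo->switch_calls++;
	return 0;
}

static int loop_flush(struct loop_device *lo)
{
	/* loop not yet configured, no running thread, nothing to flush */
	if (lo->lo_state != Lo_bound)
		return 0;
	return loop_switch(lo, NULL);
}
```

Calling loop_flush() on a device in Lo_unbound returns 0 without ever
reaching loop_switch(), while a device in Lo_bound still goes through it
exactly as before the patch.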