linux-kernel - RE: [PATCH V2 for-6.10/block 1/2] loop: Fix a race between loop detach and loop open

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: 
 <IA1PR10MB7240AB2157B46F325669859D98EB2@IA1PR10MB7240.namprd10.prod.outlook.com>
Date: Wed, 22 May 2024 19:12:59 +0000
From: Gulam Mohamed <gulam.mohamed@...cle.com>
To: Yu Kuai <yukuai1@...weicloud.com>, Jens Axboe <axboe@...nel.dk>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC: "shinichiro.kawasaki@....com" <shinichiro.kawasaki@....com>,
        "chaitanyak@...dia.com" <chaitanyak@...dia.com>,
        "hch@....de" <hch@....de>, "yukuai (C)" <yukuai3@...wei.com>
Subject: RE: [PATCH V2 for-6.10/block 1/2] loop: Fix a race between loop
 detach and loop open

Hi Kuai,

> -----Original Message-----
> From: Yu Kuai <yukuai1@...weicloud.com>
> Sent: Wednesday, May 22, 2024 8:01 AM
> To: Jens Axboe <axboe@...nel.dk>; Gulam Mohamed
> <gulam.mohamed@...cle.com>; linux-block@...r.kernel.org; linux-
> kernel@...r.kernel.org
> Cc: shinichiro.kawasaki@....com; chaitanyak@...dia.com; hch@....de;
> yukuai (C) <yukuai3@...wei.com>
> Subject: Re: [PATCH V2 for-6.10/block 1/2] loop: Fix a race between loop
> detach and loop open
> 
> Hi,
> 
> 在 2024/05/22 9:39, Jens Axboe 写道:
> > On 5/21/24 4:42 PM, Gulam Mohamed wrote:
> >> Description
> >> ===========
> >>
> >> 1. Userspace sends the command "losetup -d" which uses the open() call
> >>     to open the device
> >> 2. Kernel receives the ioctl command "LOOP_CLR_FD" which calls the
> >>     function loop_clr_fd()
> >> 3. If LOOP_CLR_FD is the first command received at the time, then the
> >>     AUTOCLEAR flag is not set and deletion of the
> >>     loop device proceeds ahead and scans the partitions (drop/add
> >>     partitions)
> >>
> >> 	if (disk_openers(lo->lo_disk) > 1) {
> >> 		lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
> >> 		loop_global_unlock(lo, true);
> >> 		return 0;
> >> 	}
> >>
> >>   4. Before scanning partitions, it will check to see if any partition of
> >>      the loop device is currently opened
> >>   5. If any partition is opened, then it will return EBUSY:
> >>
> >>      if (disk->open_partitions)
> >> 		return -EBUSY;
> >>   6. So, after receiving the "LOOP_CLR_FD" command and just before the
> above
> >>      check for open_partitions, if any other command
> >>      (like blkid) opens any partition of the loop device, then the partition
> >>      scan will not proceed and EBUSY is returned as shown in above code
> >>   7. But in "__loop_clr_fd()", this EBUSY error is not propagated
> >>   8. We have noticed that this is causing the partitions of the loop to
> >>      remain stale even after the loop device is detached resulting in the
> >>      IO errors on the partitions
> >>
> >> Fix
> >> ---
> >> Re-introduce the lo_open() call to restrict any process to open the
> >> loop device when its being detached
> >>
> >> Test case
> >> =========
> >> Test case involves the following two scripts:
> >>
> >> script1.sh
> >> ----------
> >> while [ 1 ];
> >> do
> >> 	losetup -P -f /home/opt/looptest/test10.img
> >> 	blkid /dev/loop0p1
> >> done
> >>
> >> script2.sh
> >> ----------
> >> while [ 1 ];
> >> do
> >> 	losetup -d /dev/loop0
> >> done
> >>
> >> Without fix, the following IO errors have been observed:
> >>
> >> kernel: __loop_clr_fd: partition scan of loop0 failed (rc=-16)
> >> kernel: I/O error, dev loop0, sector 20971392 op 0x0:(READ) flags 0x80700
> >>          phys_seg 1 prio class 0
> >> kernel: I/O error, dev loop0, sector 108868 op 0x0:(READ) flags 0x0
> >>          phys_seg 1 prio class 0
> >> kernel: Buffer I/O error on dev loop0p1, logical block 27201, async page
> >>          read
> >>
> >> V1->V2:
> >> 	Added a test case, 010, in blktests in tests/loop/
> >> Signed-off-by: Gulam Mohamed <gulam.mohamed@...cle.com>
> >> ---
> >>   drivers/block/loop.c | 19 +++++++++++++++++++
> >>   1 file changed, 19 insertions(+)
> >>
> >> diff --git a/drivers/block/loop.c b/drivers/block/loop.c index
> >> 28a95fd366fe..9a235d8c062d 100644
> >> --- a/drivers/block/loop.c
> >> +++ b/drivers/block/loop.c
> >> @@ -1717,6 +1717,24 @@ static int lo_compat_ioctl(struct block_device
> *bdev, blk_mode_t mode,
> >>   }
> >>   #endif
> >>
> >> +static int lo_open(struct gendisk *disk, blk_mode_t mode) {
> >> +        struct loop_device *lo = disk->private_data;
> >> +        int err;
> >> +
> >> +        if (!lo)
> >> +                return -ENXIO;
> >> +
> >> +        err = mutex_lock_killable(&lo->lo_mutex);
> >> +        if (err)
> >> +                return err;
> >> +
> >> +        if (lo->lo_state == Lo_rundown)
> >> +                err = -ENXIO;
> >> +        mutex_unlock(&lo->lo_mutex);
> 
> This doesn't fix the problem completely, there is still a race window.
> 
> lo_release
>   if (disk_openers(disk) > 0)
>    reutrn
>    -> no openers now
> 		lo_open
> 		 if (lo->lo_state == Lo_rundown)
> 		 -> no set yet
> 		 open succeed
>   mutex_lock(lo_mutex)
>   lo->lo_state = Lo_rundown
>   mutex_unlock(lo_mutex)
>   __loop_clr_fd
We have noticed that, at block layer, both open() and release() are protected by gendisk->open_mutex.
So, this race may not happen. Can you please let me know if I understand correctly?
> 
> And with the respect, loop_clr_fd() has the same problem.
> 
> I think probably loop need a open counter for itself.
We are looking to see how to handle this case
> 
> Thanks,
> Kuai
> 
> >> +	return err;
> >> +}
> >
> > Most of this function uses spaces rather than tabs.
> >