lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 20 Oct 2010 09:33:16 +1100
From:	Neil Brown <neilb@...e.de>
To:	Mikulas Patocka <mpatocka@...hat.com>
Cc:	linux-kernel@...r.kernel.org, dm-devel@...hat.com,
	linux-fsdevel@...r.kernel.org, jaxboe@...ionio.com,
	Milan Broz <mbroz@...hat.com>,
	Alasdair G Kergon <agk@...hat.com>
Subject: Re: i_size misuses

On Tue, 19 Oct 2010 14:55:31 -0400 (EDT)
Mikulas Patocka <mpatocka@...hat.com> wrote:

> 
> 
> On Sun, 12 Sep 2010, Neil Brown wrote:
> 
> > On Wed, 8 Sep 2010 09:32:13 -0400 (EDT)
> > Mikulas Patocka <mpatocka@...hat.com> wrote:
> > 
> > > Hi
> > > 
> > > when reviewing some i_size problem, I searched the kernel for i_size usage 
> > > (the variable should really be written with i_size_write and read with 
> > > i_size_read).
> > > 
> > > Properly locked direct use of "i_size" inside memory management or 
> > > filesystems may not be a problem, but there are many problems in general 
> > > code outside mm.
> > > 
> > > The misuses are:
> > > SOUND/SOUND_FIRMWARE.C:l = filp->f_path.dentry->d_inode->i_size;
> > > KERNEL/RELAY.C:buf->dentry->d_inode->i_size = buf->early_bytes;
> > > KERNEL/RELAY.C:buf->dentry->d_inode->i_size += buf->chan->subbuf_size 
> > > -buf->padding[old_subbuf];
> > > DRIVERS/USB/CORE/INODE.C:dev->usbfs_dentry->d_inode->i_size = i_size;
> > > DRIVERS/MTD/DEVICES/BLOCK2MTD.C:dev->mtd.size = 
> > > dev->blkdev->bd_inode->i_size & PAGE_MASK;
> > > DRIVERS/MD/MD.C: many reads of i_size 
> > > DRIVERS/BLOCK/NBD.C: many writes to i_size without i_size_write
> > > DRIVERS/BLOCK/DRBD/DRBD_INT.H: return bdev ? bdev->bd_inode->i_size >> 9 : 0;
> > > DRIVERS/BLOCK/DRBD/DRBD_WRAPPERS.H: mdev->this_bdev->bd_inode->i_size = 
> > > (loff_t)size << 9;
> > > BLOCK/BLK-CORE.C:printk(KERN_INFO "%s: rw=%ld, want=%Lu, limit=%Lu\n",
> > >                 bdevname(bio->bi_bdev, b),
> > >                 bio->bi_rw,
> > >                 (unsigned long long)bio->bi_sector + bio_sectors(bio),
> > >                 (long long)(bio->bi_bdev->bd_inode->i_size >> 9));
> > > maxsector = bio->bi_bdev->bd_inode->i_size >> 9;
> > > BLOCK/COMPAT_IOCTL.C: size = bdev->bd_inode->i_size;
> > > return compat_put_u64(arg, bdev->bd_inode->i_size);
> > > BLOCK/IOCTL.C: if (start + len > (bdev->bd_inode->i_size >> 9))
> > > size = bdev->bd_inode->i_size;
> > > return put_u64(arg, bdev->bd_inode->i_size);
> > > 
> > > The problem with this code is that if you read i_size without i_size_read 
> > > and the size wraps around 32 bits, for example from 0xffffffff to 
> > > 0x100000000 , there is a possibility on 32-bit machines to read an invalid 
> > > value (either 0 or 0x1ffffffff). Similarly, if you write i_size without 
> > > i_size_write, the readers can see intermediate invalid values.
> > > 
> > > 
> > > The original problem that caused this investigation is the question, how a 
> > > block device driver can change the size of its device. Normal method (used 
> > > in a few drivers, including dm), consists of
> > > mutex_lock(&inode->i_mutex);
> > > i_size_write(inode, new_size);
> > > mutex_unlock(&inode->i_mutex);
> > 
> > Don't you just do
> > 
> >   set_capacity(gendisk, sectors);
> >   revalidate(gendisk);
> > 
> > ??
> > 
> > NeilBrown
> 
> revalidate_disk() has still the problem that it changes i_size without 
> holding i_mutex and other kernel parts (for example generic_file_llseek) 
> assume that if we hold the lock, i_size_can't be changed.

generic_file_llseek is not used for block devices.  They use block_llseek
which uses i_size_read, so I think it is safe.

> 
> The rules for accessing i_size must be specified and followed.

I agree.  However the rules can be different for different file systems and
file types.
A filesystem that used the generic_* function would need to only change
i_size under i_mutex as you say.
For block devices it appears that the rule is that it can only be changed
under bd_mutex.
For 'relay' (which you mentioned above), it seem the relevant mutex is the
global relay_channels_mutex, though I didn't read the code thoroughly to be
sure.

It still would probably be useful to review all the i_size related code to
ensure that it is safe, but you should not assume that everything follows the
same rules.  So first you need to work out the rule for a given subsystem,
then audit it against that rule (and maybe document that rule if it isn't
already documented!)

NeilBrown


> 
> Mikulas

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ