linux-kernel - Re: Kernels v4.9+ cause short reads of block devices

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <FCA4A132-E6E1-4568-8A89-3DE441F189AB@dilger.ca>
Date:   Wed, 23 Aug 2017 15:01:38 -0600
From:   Andreas Dilger <adilger@...ger.ca>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Doug Nazar <nazard@...ar.ca>, Al Viro <viro@...iv.linux.org.uk>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Wei Fang <fangwei1@...wei.com>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Mark Fasheh <mfasheh@...sity.com>,
        Joel Becker <jlbec@...lplan.org>,
        Dave Kleikamp <shaggy@...nel.org>
Subject: Re: Kernels v4.9+ cause short reads of block devices

On Aug 23, 2017, at 2:13 PM, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> 
> On Wed, Aug 23, 2017 at 12:53 PM, Doug Nazar <nazard@...ar.ca> wrote:
>> 
>> It's compiling now, but I think it's already set to MAX_LFS_FILESIZE.
>> 
>> [  169.095127] ppos=80180006000, s_maxbytes=7ffffffffff, magic=0x62646576,
>> type=bdev
> 
> Oh, right you are - I'm much too used to 64-bit, where
> MAX_LFS_FILESIZE is basically infinite, and was jusr assuming that it
> was something like the UFS bug we had not that long ago that was due
> to the 32-bit limit.
> 
> But yes, on 32-bit, we are limited by the 32-bit index into the page
> cache, and we limit the index to 31 bits too, so we have (PAGE_SIZE <<
> 31) -1, which is that 7ffffffffff.
> 
> And that also explains why people haven't seen it. You do need
> 
> (a) 32-bit environment
> 
> (b) a disk larger than that 8TB in size
> 
> The *hard* limit for the page cache on a 32-bit environment should
> actually be (PAGE_SIZE << 32)-PAGE_SIZE (that final PAGE_SIZE
> subtraction is to make sure we don't generate that page cache with
> index -1), so having a disk that is 16TB or larger is not going to
> work, but your disk is right in that 8TB-16TB hole that used to work
> and was broken by that check.
> 
> Anyway, that makes me feel better. I should have looked at your disk
> size more, now I at least understand why nobody noticed before.
> 
> So just throw away my patch. That's wrong, and garbage.
> 
> The *right* patch is likely to just this instead:
> 
>  -#define MAX_LFS_FILESIZE       (((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)
>  +#define MAX_LFS_FILESIZE       (((loff_t)PAGE_SIZE <<
> BITS_PER_LONG)-PAGE_SIZE)
> 
> which should make MAX_LFS_FILESIZE be 0xffffffff000 and you disk size
> should be ok.

Doug,
I noticed while checking for other implications of changing MAX_LFS_FILESIZE
that fs/jfs/super.c is also working around this limit.  If you are going
to submit a patch for this, it also makes sense to fix jfs_fill_super() to
use MAX_LFS_FILESIZE instead of JFS rolling its own, something like:

	/* logical blocks are represented by 40 bits in pxd_t, etc.
	 * and page cache is indexed by long. */
	sb->s_maxbytes = min((u64)sb->s_blocksize) << 40,
                             MAX_LFS_FILESIZE);

It also looks like ocfs2_max_file_offset() is trying to avoid overflowing
the old 31-bit limit, and isn't using MAX_LFS_FILESIZE directly, so it will
now be wrong.  It looks like it could use "bitshift = 32; trim = bytes;",
but Joel or Mark should confirm.

Finally, there is a check in fs/super.c::mount_fs() that is verifying
s_maxbytes is not set too large, but this has been present since 2.6.32
and should probably be removed at this point, or changed to a BUG_ON()
(see commit 42cb56ae2ab for details).

Cheers, Andreas






Download attachment "signature.asc" of type "application/pgp-signature" (196 bytes)