Message-Id: <FCA4A132-E6E1-4568-8A89-3DE441F189AB@dilger.ca>
Date: Wed, 23 Aug 2017 15:01:38 -0600
From: Andreas Dilger <adilger@...ger.ca>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Doug Nazar <nazard@...ar.ca>, Al Viro <viro@...iv.linux.org.uk>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Wei Fang <fangwei1@...wei.com>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Mark Fasheh <mfasheh@...sity.com>,
Joel Becker <jlbec@...lplan.org>,
Dave Kleikamp <shaggy@...nel.org>
Subject: Re: Kernels v4.9+ cause short reads of block devices
On Aug 23, 2017, at 2:13 PM, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
>
> On Wed, Aug 23, 2017 at 12:53 PM, Doug Nazar <nazard@...ar.ca> wrote:
>>
>> It's compiling now, but I think it's already set to MAX_LFS_FILESIZE.
>>
>> [ 169.095127] ppos=80180006000, s_maxbytes=7ffffffffff, magic=0x62646576,
>> type=bdev
>
> Oh, right you are - I'm much too used to 64-bit, where
> MAX_LFS_FILESIZE is basically infinite, and was just assuming that it
> was something like the UFS bug we had not that long ago that was due
> to the 32-bit limit.
>
> But yes, on 32-bit, we are limited by the 32-bit index into the page
> cache, and we limit the index to 31 bits too, so we have (PAGE_SIZE <<
> 31) -1, which is that 7ffffffffff.
>
> And that also explains why people haven't seen it. You do need
>
> (a) 32-bit environment
>
> (b) a disk larger than that 8TB in size
>
> The *hard* limit for the page cache on a 32-bit environment should
> actually be (PAGE_SIZE << 32)-PAGE_SIZE (that final PAGE_SIZE
> subtraction is to make sure we don't generate that page cache with
> index -1), so having a disk that is 16TB or larger is not going to
> work, but your disk is right in that 8TB-16TB hole that used to work
> and was broken by that check.
>
> Anyway, that makes me feel better. I should have looked at your disk
> size more, now I at least understand why nobody noticed before.
>
> So just throw away my patch. That's wrong, and garbage.
>
> The *right* patch is likely to be just this instead:
>
> -#define MAX_LFS_FILESIZE (((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)
> +#define MAX_LFS_FILESIZE (((loff_t)PAGE_SIZE << BITS_PER_LONG)-PAGE_SIZE)
>
> which should make MAX_LFS_FILESIZE be 0xffffffff000 and your disk size
> should be ok.
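
For anyone who wants to check the numbers quoted above, here is a small
userspace sketch (not kernel code). It assumes 4 KiB pages and a 32-bit long,
and the sketch_* names are made up for illustration. It also assumes the read
path rejects an offset when ppos >= s_maxbytes, which is my reading of "that
check" rather than something stated in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions for a typical 32-bit build; not taken from a real config. */
#define SKETCH_PAGE_SIZE     4096ULL
#define SKETCH_BITS_PER_LONG 32

/* Old definition: (PAGE_SIZE << (BITS_PER_LONG - 1)) - 1 */
static uint64_t sketch_old_max_lfs_filesize(void)
{
	return (SKETCH_PAGE_SIZE << (SKETCH_BITS_PER_LONG - 1)) - 1;
}

/* Proposed definition: (PAGE_SIZE << BITS_PER_LONG) - PAGE_SIZE */
static uint64_t sketch_new_max_lfs_filesize(void)
{
	return (SKETCH_PAGE_SIZE << SKETCH_BITS_PER_LONG) - SKETCH_PAGE_SIZE;
}

/* Assumed form of the limit check: offsets at or past the limit are refused. */
static bool sketch_past_limit(uint64_t ppos, uint64_t s_maxbytes)
{
	return ppos >= s_maxbytes;
}
```

With ppos=0x80180006000 from Doug's log line, sketch_past_limit() is true
against the old limit (0x7ffffffffff) and false against the proposed one
(0xffffffff000), which matches the 8TB-16TB hole described above.
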
Doug,

I noticed while checking for other implications of changing MAX_LFS_FILESIZE
that fs/jfs/super.c is also working around this limit. If you are going
to submit a patch for this, it also makes sense to fix jfs_fill_super() to
use MAX_LFS_FILESIZE instead of JFS rolling its own, something like:

	/* logical blocks are represented by 40 bits in pxd_t, etc.
	 * and page cache is indexed by long. */
	sb->s_maxbytes = min(((u64)sb->s_blocksize) << 40,
			     MAX_LFS_FILESIZE);
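
To illustrate which of the two limits wins in the snippet above, here is a
rough userspace sketch. The function name and the constant are hypothetical;
the 32-bit value assumes 4 KiB pages and the proposed MAX_LFS_FILESIZE
definition, and it is passed in as a parameter so both cases can be tried.

```c
#include <stdint.h>

/* Assumed 32-bit limit: (PAGE_SIZE << 32) - PAGE_SIZE with 4 KiB pages. */
#define SKETCH_MAX_LFS_FILESIZE_32 0xffffffff000ULL

/*
 * Smaller of the on-disk limit (40-bit logical block numbers) and the
 * page cache limit supplied by the caller.
 */
static uint64_t sketch_jfs_maxbytes(uint32_t blocksize, uint64_t max_lfs_filesize)
{
	uint64_t fs_limit = (uint64_t)blocksize << 40;

	return fs_limit < max_lfs_filesize ? fs_limit : max_lfs_filesize;
}
```

With 4096-byte blocks the on-disk limit is 2^52 bytes, so under these
assumptions the 32-bit page cache limit is always the smaller one, and the
40-bit block limit only matters when the page cache limit is much larger, as
on 64-bit.
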

It also looks like ocfs2_max_file_offset() is trying to avoid overflowing
the old 31-bit limit, and isn't using MAX_LFS_FILESIZE directly, so it will
now be wrong. It looks like it could use "bitshift = 32; trim = bytes;",
but Joel or Mark should confirm.

Finally, there is a check in fs/super.c::mount_fs() that is verifying
s_maxbytes is not set too large, but this has been present since 2.6.32
and should probably be removed at this point, or changed to a BUG_ON()
(see commit 42cb56ae2ab for details).

Cheers, Andreas