linux-kernel - Re: regression introduced by "block: Add support for DAX reads/writes to block devices"


Message-ID: <55C2BB9E.3040709@hp.com>
Date:	Wed, 05 Aug 2015 21:42:54 -0400
From:	Linda Knippers <linda.knippers@...com>
To:	Dave Chinner <david@...morbit.com>, Jeff Moyer <jmoyer@...hat.com>
CC:	"matthew r. wilcox" <matthew.r.wilcox@...el.com>,
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: regression introduced by "block: Add support for DAX reads/writes
 to block devices"

On 08/05/2015 06:01 PM, Dave Chinner wrote:
> On Wed, Aug 05, 2015 at 04:19:08PM -0400, Jeff Moyer wrote:
>> Hi, Matthew,
>>
>> Linda Knippers noticed that commit (bbab37ddc20b) breaks mkfs.xfs:
>>
>> # mkfs -t xfs -f /dev/pmem0
>> meta-data=/dev/pmem0             isize=256    agcount=4, agsize=524288 blks
>>          =                       sectsz=512   attr=2, projid32bit=1
>>          =                       crc=0        finobt=0
>> data     =                       bsize=4096   blocks=2097152, imaxpct=25
>>          =                       sunit=0      swidth=0 blks
>> naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
>> log      =internal log           bsize=4096   blocks=2560, version=2
>>          =                       sectsz=512   sunit=0 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>> mkfs.xfs: read failed: Numerical result out of range
>>
>> I sat down with Linda to look into it, and the problem is that mkfs.xfs
>> sets the blocksize of the device to 512 (via BLKBSZSET), and then reads
>> from the last sector of the device.  This results in dax_io trying to do
>> a page-sized I/O at 512 bytes from the end of the device.
> 
> Right - we have to be able to do IO to that last sector, so this is
> a sanity check to tell if the block dev is large enough. The XFS
> kernel code does the same end-of-device sector read when the
> filesystem is mounted, too.
> 
>> bdev_direct_access, receiving this bogus pos/size combo, returns
>> -ERANGE:
>>
>> 	if ((sector + DIV_ROUND_UP(size, 512)) >
>> 					part_nr_sects_read(bdev->bd_part))
>> 		return -ERANGE;
>>
>> Given that file systems supporting dax refuse to mount with a blocksize
>> != page size, I'm guessing this is sort of expected behavior.  However,
>> we really shouldn't be breaking direct I/O on pmem devices.
> 
> If the device is advertising 512 byte sector size support, then this
> needs to work, especially as DAX is completely transparent on the
> block device. Remember that DAX through a filesystem works on
> filesystem data block size boundaries, so a 512 byte sector/4k block
> size filesystem will be able to use DAX for mmapped files just fine.
> 
>> So, what do you want to do?  We could make the pmem device's logical
>> block size fixed at the system page size.  Or, we could modify the dax
>> code to work with blocksize < pagesize.  Or, we could continue using the
>> direct I/O codepath for direct block device access.  What do you think?
> 
> I don't know how the pmem device sets up its limits. Can you post
> the output of:
> 
> 	/sys/block/pmem0/queue/logical_block_size
512

> 	/sys/block/pmem0/queue/physical_block_size
512

> 	/sys/block/pmem0/queue/hw_sector_size
512

> 	/sys/block/pmem0/queue/minimum_io_size
512

> 	/sys/block/pmem0/queue/optimal_io_size
0
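
[Editorial sketch, not part of the original message.] The same queue limits can also be collected programmatically. The helpers below are a minimal sketch under stated assumptions: `read_queue_attr` and `read_queue_limit` are hypothetical names, and each attribute file is assumed to hold a single unsigned decimal number, as in the values above.

```c
#include <stdio.h>

/*
 * Read one unsigned integer from a sysfs-style attribute file.
 * Returns 0 on success, -1 if the file cannot be opened or does not
 * start with a decimal number.
 */
static int read_queue_attr(const char *path, unsigned long *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%lu", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Build /sys/block/<disk>/queue/<attr> and read it.
 * Returns -1 if the path does not fit in the buffer or cannot be read.
 */
static int read_queue_limit(const char *disk, const char *attr,
			    unsigned long *value)
{
	char path[256];
	int n = snprintf(path, sizeof(path), "/sys/block/%s/queue/%s",
			 disk, attr);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return read_queue_attr(path, value);
}
```

For example, `read_queue_limit("pmem0", "logical_block_size", &v)` would be expected to store 512 in `v` on the system described here.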

Let me know if you need anything else.

-- ljk


> As these all affect how mkfs.xfs configures the filesystem being
> made and so influences the size and alignment of the IO it does....
> 
> Cheers,
> 
> Dave.
> 
