Date:	Thu, 6 Aug 2015 08:01:13 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Jeff Moyer <jmoyer@...hat.com>
Cc:	"matthew r. wilcox" <matthew.r.wilcox@...el.com>,
	linda.knippers@...com, linux-kernel@...r.kernel.org,
	linux-fsdevel@...r.kernel.org
Subject: Re: regression introduced by "block: Add support for DAX
 reads/writes to block devices"

On Wed, Aug 05, 2015 at 04:19:08PM -0400, Jeff Moyer wrote:
> Hi, Matthew,
> 
> Linda Knippers noticed that commit (bbab37ddc20b) breaks mkfs.xfs:
> 
> # mkfs -t xfs -f /dev/pmem0
> meta-data=/dev/pmem0             isize=256    agcount=4, agsize=524288 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=0        finobt=0
> data     =                       bsize=4096   blocks=2097152, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
> log      =internal log           bsize=4096   blocks=2560, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> mkfs.xfs: read failed: Numerical result out of range
> 
> I sat down with Linda to look into it, and the problem is that mkfs.xfs
> sets the blocksize of the device to 512 (via BLKBSZSET), and then reads
> from the last sector of the device.  This results in dax_io trying to do
> a page-sized I/O at 512 bytes from the end of the device.

Right - we have to be able to do IO to that last sector, so this is
a sanity check to tell if the block dev is large enough. The XFS
kernel code does the same end-of-device sector read when the
filesystem is mounted, too.
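
For illustration, an end-of-device read of this kind can be sketched
from userspace as below. This is a minimal sketch, not what mkfs.xfs
or the XFS mount code actually does: the function name
read_last_sector and the 512 byte default are assumptions made for
the example, and it uses ordinary buffered reads rather than direct
I/O.

```python
import os


def read_last_sector(path, sector_size=512):
    """Read the final sector_size bytes of a block device or regular file.

    Hypothetical helper for illustration only: if the read succeeds, the
    device is at least as large as its reported size.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end returns the size in bytes; this works for
        # block devices as well as regular files.
        size = os.lseek(fd, 0, os.SEEK_END)
        if size < sector_size:
            raise ValueError(f"{path} is smaller than one sector")
        data = os.pread(fd, sector_size, size - sector_size)
        if len(data) != sector_size:
            raise OSError(f"short read at end of {path}")
        return data
    finally:
        os.close(fd)
```

Running it against /dev/pmem0 (as root) would exercise the same
last-sector offset that the failing mkfs.xfs read targets.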

> bdev_direct_access, receiving this bogus pos/size combo, returns
> -ERANGE:
> 
> 	if ((sector + DIV_ROUND_UP(size, 512)) >
> 					part_nr_sects_read(bdev->bd_part))
> 		return -ERANGE;
> 
> Given that file systems supporting dax refuse to mount with a blocksize
> != page size, I'm guessing this is sort of expected behavior.  However,
> we really shouldn't be breaking direct I/O on pmem devices.

If the device is advertising 512 byte sector size support, then this
needs to work, especially as DAX is completely transparent on the
block device. Remember that DAX through a filesystem works on
filesystem data block size boundaries, so a 512 byte sector/4k block
size filesystem will be able to use DAX for mmapped files just fine.
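
To make the arithmetic behind the quoted -ERANGE check concrete, here
is a userspace model of it. It is a sketch under stated assumptions,
not kernel code: range_check and div_round_up are stand-ins written
for this example, and the device size is assumed to be exactly the
blocks=2097152 x bsize=4096 reported in the mkfs output above.

```python
import errno

SECTOR_SIZE = 512  # the quoted check counts in 512-byte sectors


def div_round_up(n, d):
    """Integer equivalent of the kernel's DIV_ROUND_UP macro."""
    return (n + d - 1) // d


def range_check(sector, size, nr_sects):
    """Model of the quoted check: 0 if the access fits, -ERANGE if not."""
    if sector + div_round_up(size, SECTOR_SIZE) > nr_sects:
        return -errno.ERANGE
    return 0


# Assumed device size, taken from the mkfs output (2097152 4k blocks).
NR_SECTS = 2097152 * 4096 // SECTOR_SIZE
LAST_SECTOR = NR_SECTS - 1

if __name__ == "__main__":
    # A 512 byte access at the last sector fits; a page-sized one does not.
    for size in (512, 4096):
        rc = range_check(LAST_SECTOR, size, NR_SECTS)
        print(f"sector={LAST_SECTOR} size={size} -> {rc}")
```

With these numbers a 512 byte access at the last sector passes, while
a page-sized access starting there extends 7 sectors past the end of
the device and is rejected, which matches the failure Jeff describes.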

> So, what do you want to do?  We could make the pmem device's logical
> block size fixed at the system page size.  Or, we could modify the dax
> code to work with blocksize < pagesize.  Or, we could continue using the
> direct I/O codepath for direct block device access.  What do you think?

I don't know how the pmem device sets up its limits. Can you post
the contents of:

	/sys/block/pmem0/queue/logical_block_size
	/sys/block/pmem0/queue/physical_block_size
	/sys/block/pmem0/queue/hw_sector_size
	/sys/block/pmem0/queue/minimum_io_size
	/sys/block/pmem0/queue/optimal_io_size

These all affect how mkfs.xfs configures the filesystem being
made, and so influence the size and alignment of the IO it does.
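
One way to collect those values in one go is sketched below. The
sysfs attribute names are the ones listed above; the helper name
read_queue_limits, its parameters, and the choice to report a missing
attribute as None are assumptions made for this example.

```python
import os

# The queue attributes requested above.
QUEUE_ATTRS = (
    "logical_block_size",
    "physical_block_size",
    "hw_sector_size",
    "minimum_io_size",
    "optimal_io_size",
)


def read_queue_limits(dev="pmem0", sysfs_root="/sys/block"):
    """Return {attribute: int or None} for the device's queue limits."""
    limits = {}
    for attr in QUEUE_ATTRS:
        path = os.path.join(sysfs_root, dev, "queue", attr)
        try:
            with open(path) as f:
                limits[attr] = int(f.read().strip())
        except (OSError, ValueError):
            limits[attr] = None  # attribute missing or unreadable
    return limits


if __name__ == "__main__":
    for name, value in read_queue_limits().items():
        print(f"{name}: {value}")
```

The sysfs_root parameter exists only so the helper can be pointed at a
test directory; on a real system the default is what you want.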

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com