Message-ID: <55C8D208.1070903@hp.com>
Date: Mon, 10 Aug 2015 12:32:08 -0400
From: Linda Knippers <linda.knippers@...com>
To: Boaz Harrosh <boaz@...xistor.com>,
Dave Chinner <david@...morbit.com>
Cc: Jeff Moyer <jmoyer@...hat.com>,
"matthew r. wilcox" <matthew.r.wilcox@...el.com>,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
Vishal Verma <vishal.l.verma@...el.com>
Subject: Re: regression introduced by "block: Add support for DAX reads/writes
to block devices"
On 8/9/2015 4:52 AM, Boaz Harrosh wrote:
> On 08/06/2015 11:34 PM, Dave Chinner wrote:
>> On Thu, Aug 06, 2015 at 10:52:47AM +0300, Boaz Harrosh wrote:
>>> On 08/06/2015 06:24 AM, Dave Chinner wrote:
>>>> On Wed, Aug 05, 2015 at 09:42:54PM -0400, Linda Knippers wrote:
>>>>> On 08/05/2015 06:01 PM, Dave Chinner wrote:
>>>>>> On Wed, Aug 05, 2015 at 04:19:08PM -0400, Jeff Moyer wrote:
>>> <>
>>>>>>>
>>>>>>> I sat down with Linda to look into it, and the problem is that mkfs.xfs
>>>>>>> sets the blocksize of the device to 512 (via BLKBSZSET), and then reads
>>>>>>> from the last sector of the device. This results in dax_io trying to do
>>>>>>> a page-sized I/O at 512 bytes from the end of the device.
>>>>>>
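[For anyone trying to reproduce this, here is a minimal sketch of the
syscall sequence Jeff describes -- illustrative only, not mkfs.xfs's
actual code. BLKBSZSET and BLKGETSIZE64 are the real ioctls; the
device path matches the pmem device used below. The expectation is
that, before the fix, the final pread fails with "Numerical result
out of range".

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>

    int main(void)
    {
        unsigned long long devsize;
        int bsz = 512;
        void *buf;
        int fd = open("/dev/pmem3", O_RDONLY | O_DIRECT);

        if (fd < 0)
            return 1;
        /* set the block device's soft block size to 512 ... */
        if (ioctl(fd, BLKBSZSET, &bsz) < 0)
            return 1;
        if (ioctl(fd, BLKGETSIZE64, &devsize) < 0)
            return 1;
        /* ... then read the last 512-byte sector; with DAX this
         * became a page-sized I/O 512 bytes before end-of-device */
        if (posix_memalign(&buf, 4096, 512))
            return 1;
        if (pread(fd, buf, 512, devsize - 512) < 0)
            perror("pread");
        close(fd);
        return 0;
    }
]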
>>>
>>> This part I do not understand. How is mkfs.xfs reading the sector?
>>> Is it through open(/dev/pmem0, ...)? With O_DIRECT?
>>
>> mkfs.xfs uses O_DIRECT. Only if open(O_DIRECT) fails or mkfs.xfs is
>> told that it is working on an image file does it fall back to
>> buffered IO. All of the XFS userspace tools work this way to prevent
>> page cache pollution issues with read-once or write-once data during
>> operation.
>>
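[A hedged sketch of that open-with-fallback pattern -- the helper name
is illustrative, not the actual xfsprogs code:

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>

    /* prefer O_DIRECT; fall back to buffered IO only if the
     * device or filesystem refuses direct IO */
    static int open_maybe_direct(const char *path, int flags)
    {
        int fd = open(path, flags | O_DIRECT);

        if (fd < 0 && errno == EINVAL)
            fd = open(path, flags);
        return fd;
    }

On a pmem device the O_DIRECT open succeeds, so mkfs.xfs never falls
back to buffered IO here.]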
>
> Thanks, yes, that makes sense. This is a bug in the DAX implementation
> for bdev. Since, as you know, with DAX there is no difference between
> O_DIRECT and buffered I/O, we must support any aligned IO. I suspect
> the problem is bdev not handing 4K buffer-heads to dax.c.
>
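[To make the failure mode concrete: with the soft block size set to
512, the last sector starts 512 bytes before end-of-device, but DAX
issues the request in page-sized units. A sketch of the arithmetic
(the device size is illustrative, chosen to match blocks=2097152 *
bsize=4096 below):

    #include <stdio.h>

    int main(void)
    {
        unsigned long long devsize = 8ULL << 30;  /* e.g. 8 GiB pmem */
        unsigned long long pos = devsize - 512;   /* last 512B sector */
        unsigned long long page = 4096;           /* DAX I/O unit */

        if (pos + page > devsize)
            printf("request overruns device by %llu bytes\n",
                   pos + page - devsize);
        return 0;
    }

"Numerical result out of range" is strerror(ERANGE), which is what
bdev_direct_access() returns when a request runs past the end of the
device.]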
> Or ... it might just be the infamous bug where the actual partition
> they used was not 4K-aligned on its start sector, so the last-sector
> IO came out wrong after partition translation. That bug should be
> fixed by Vishal Verma's patch:
> https://lists.01.org/pipermail/linux-nvdimm/2015-July/001555.html
>
> Vishal, I think we should add CC: stable@...r.kernel.org to your patch
> because of these fdisk alignment bugs.
That patch does make 'mkfs -t xfs' work.
Before:
$ sudo mkfs -t xfs -f /dev/pmem3
meta-data=/dev/pmem3             isize=256    agcount=4, agsize=524288 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=2097152, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
mkfs.xfs: read failed: Numerical result out of range
After:
$ sudo mkfs -t xfs -f /dev/pmem3
meta-data=/dev/pmem3             isize=256    agcount=4, agsize=524288 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=2097152, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$ cat /sys/block/pmem3/queue/logical_block_size
512
$ cat /sys/block/pmem3/queue/physical_block_size
4096
$ cat /sys/block/pmem3/queue/hw_sector_size
512
$ cat /sys/block/pmem3/queue/minimum_io_size
4096
Previously physical_block_size was 512 and minimum_io_size was 0.
What about logical_block_size and hw_sector_size still being 512?
So do we want to change pmem rather than changing DAX?
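[For reference, all four of those sysfs values derive from the request
queue limits the driver sets. A sketch, assuming the referenced patch
sets pmem's limits roughly like this -- I have not verified the exact
calls, and this is not the actual pmem driver code:

    #include <linux/blkdev.h>

    static void pmem_queue_limits(struct request_queue *q)
    {
        blk_queue_logical_block_size(q, 512);        /* logical_block_size,
                                                        hw_sector_size */
        blk_queue_physical_block_size(q, PAGE_SIZE); /* physical_block_size */
        blk_queue_io_min(q, PAGE_SIZE);              /* minimum_io_size */
    }

hw_sector_size is a legacy alias for logical_block_size, so those two
will always read the same; raising the logical block size to 4K would
be the more invasive change.]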
-- ljk
>
>> Cheers,
>> Dave.
>
> Thanks
> Boaz
>