[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPcyv4hsicQybj1091n1n9aKDtQ1JB2fEhjK+_21mi4ta5S46Q@mail.gmail.com>
Date: Fri, 4 Aug 2017 11:24:49 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Ross Zwisler <ross.zwisler@...ux.intel.com>
Cc: Minchan Kim <minchan@...nel.org>,
Matthew Wilcox <willy@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"karam . lee" <karam.lee@....com>,
Jerome Marchand <jmarchan@...hat.com>,
Nitin Gupta <ngupta@...are.org>, seungho1.park@....com,
Christoph Hellwig <hch@....de>,
Dave Chinner <david@...morbit.com>, Jan Kara <jack@...e.cz>,
Jens Axboe <axboe@...nel.dk>,
Vishal Verma <vishal.l.verma@...el.com>,
"linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>,
Dave Jiang <dave.jiang@...el.com>
Subject: Re: [PATCH 0/3] remove rw_page() from brd, pmem and btt
On Fri, Aug 4, 2017 at 11:21 AM, Ross Zwisler
<ross.zwisler@...ux.intel.com> wrote:
> On Fri, Aug 04, 2017 at 11:01:08AM -0700, Dan Williams wrote:
>> [ adding Dave who is working on a blk-mq + dma offload version of the
>> pmem driver ]
>>
>> On Fri, Aug 4, 2017 at 1:17 AM, Minchan Kim <minchan@...nel.org> wrote:
>> > On Fri, Aug 04, 2017 at 12:54:41PM +0900, Minchan Kim wrote:
>> [..]
>> >> Thanks for the testing. Your testing number is within noise level?
>> >>
>> >> I cannot understand why PMEM doesn't have enough gain while BTT is significant
>> >> win(8%). I guess no rw_page with BTT testing had more chances to wait bio dynamic
>> >> allocation and mine and rw_page testing reduced it significantly. However,
>> >> in no rw_page with pmem, there wasn't many cases to wait bio allocations due
>> >> to the device is so fast so the number comes from purely the number of
>> >> instructions has done. At a quick glance of bio init/submit, it's not trivial
>> >> so indeed, i understand where the 12% enhancement comes from but I'm not sure
>> >> it's really big difference in real practice at the cost of maintaince burden.
>> >
>> > I tested pmbench 10 times in my local machine(4 core) with zram-swap.
>> > In my machine, even, on-stack bio is faster than rw_page. Unbelievable.
>> >
>> > I guess it's really hard to get stable result in severe memory pressure.
>> > It would be a result within noise level(see below stddev).
>> > So, I think it's hard to conclude rw_page is far faster than onstack-bio.
>> >
>> > rw_page
>> > avg 5.54us
>> > stddev 8.89%
>> > max 6.02us
>> > min 4.20us
>> >
>> > onstack bio
>> > avg 5.27us
>> > stddev 13.03%
>> > max 5.96us
>> > min 3.55us
>>
>> The maintenance burden of having alternative submission paths is
>> significant especially as we consider the pmem driver ising more
>> services of the core block layer. Ideally, I'd want to complete the
>> rw_page removal work before we look at the blk-mq + dma offload
>> reworks.
>>
>> The change to introduce BDI_CAP_SYNC is interesting because we might
>> have use for switching between dma offload and cpu copy based on
>> whether the I/O is synchronous or otherwise hinted to be a low latency
>> request. Right now the dma offload patches are using "bio_segments() >
>> 1" as the gate for selecting offload vs cpu copy which seem
>> inadequate.
>
> Okay, so based on the feedback above and from Jens[1], it sounds like we want
> to go forward with removing the rw_page() interface, and instead optimize the
> regular I/O path via on-stack BIOS and dma offload, correct?
>
> If so, I'll prepare patches that fully remove the rw_page() code, and let
> Minchan and Dave work on their optimizations.
I think the conversion to on-stack-bio should be done in the same
patchset that removes rw_page. We don't want to leave a known
performance regression while the on-stack-bio work is in-flight.
Powered by blists - more mailing lists