Date: Mon, 8 Aug 2016 12:30:18 +0000
From: "Boylston, Brian" <brian.boylston@....com>
To: Jan Kara <jack@...e.cz>
CC: Dave Chinner <david@...morbit.com>,
"Kani, Toshimitsu" <toshi.kani@....com>,
"linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>,
"xfs@....sgi.com" <xfs@....sgi.com>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
Ross Zwisler <ross.zwisler@...ux.intel.com>
Subject: RE: Subtle races between DAX mmap fault and write path
Jan Kara wrote on 2016-08-08:
> On Fri 05-08-16 19:58:33, Boylston, Brian wrote:
>> Dave Chinner wrote on 2016-08-05:
>>> [ cut to just the important points ]
>>> On Thu, Aug 04, 2016 at 06:40:42PM +0000, Kani, Toshimitsu wrote:
>>>> On Tue, 2016-08-02 at 10:21 +1000, Dave Chinner wrote:
>>>>> If I drop the fsync from the
>>>>> buffered IO path, bandwidth remains the same but runtime drops to
>>>>> 0.55-0.57s, so again the buffered IO write path is faster than DAX
>>>>> while doing more work.
>>>>
>>>> I do not think the test results are relevant on this point because both
>>>> buffered and DAX write() paths use uncached copy to avoid clflush. The
>>>> buffered path uses cached copy to the page cache and then uses uncached copy to
>>>> PMEM via writeback. Therefore, the buffered IO path also benefits from using
>>>> uncached copy to avoid clflush.
>>>
>>> Except that I tested without the writeback path for buffered IO, so
>>> there was a direct comparison for single cached copy vs single
>>> uncached copy.
>>>
>>> The undeniable fact is that a write() with a single cached copy with
>>> all the overhead of dirty page tracking is /faster/ than a much
>>> shorter, simpler IO path that uses an uncached copy. That's what the
>>> numbers say....
>>>
>>>> Cached copy (rep movsq) is slightly faster than uncached copy,
>>>
>>> Not according to Boaz - he claims that uncached is 20% faster than
>>> cached. How about you two get together, do some benchmarking and get
>>> your story straight, eh?
>>>
>>>> and should be
>>>> used for writing to the page cache. For writing to PMEM, however, additional
>>>> clflush can be expensive, and allocating cachelines for PMEM leads to evicting
>>>> the application's cachelines.
>>>
>>> I keep hearing people tell me why cached copies are slower, but
>>> no-one is providing numbers to back up their statements. The only
>>> numbers we have are the ones I've published showing cached copies w/
>>> full dirty tracking is faster than uncached copy w/o dirty tracking.
>>>
>>> Show me the numbers that back up your statements, then I'll listen
>>> to you.
>>
>> Here are some numbers for a particular scenario, and the code is below.
>>
>> Time (in seconds) to copy a 16KiB buffer 1M times to a 4MiB NVDIMM buffer
>> (1M total memcpy()s). For the cached+clflush case, the flushes are done
>> every 4MiB (which seems slightly faster than flushing every 16KiB):
>>
>>                   NUMA local   NUMA remote
>>   Cached+clflush     13.5          37.1
>>   movnt               1.0           1.3
>
> Thanks for the test Brian. But looking at the current source of libpmem
> this seems to be comparing apples to oranges. Let me explain the details
> below:
>
>> In the code below, pmem_persist() does the CLFLUSH(es) on the given range,
>> and pmem_memcpy_persist() does non-temporal MOVs with an SFENCE:
>
> Yes. libpmem does what you describe above and the name
> pmem_memcpy_persist() is thus currently misleading because it is not
> guaranteed to be persistent with the current implementation of DAX in
> the kernel.
>
> It is important to know which kernel version and which filesystem you
> used for the test to be able to judge the details, but generally pmem_persist()
> does properly tell the filesystem to flush all metadata associated with the
> file, commit open transactions etc. That's the full cost of persistence.

I used NVML 1.1 for the measurements. In this version, and with the hardware
that I used, the pmem_persist() flow is:

    pmem_persist()
        pmem_flush()
            Func_flush() == flush_clflush
                CLFLUSH
        pmem_drain()
            Func_predrain_fence() == predrain_fence_empty
                no-op

So I don't think pmem_persist() does anything to cause the filesystem to
flush metadata, since it doesn't make any system calls. Or am I missing
something?

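
To make that concrete, here is a rough C model of the flow above. It is a
sketch, not libpmem's actual source: the model_* names are hypothetical, the
64-byte cache line size is an assumption, and the function pointers are fixed
to the CLFLUSH variants listed above, whereas libpmem picks its flush and
fence routines at initialization based on CPU features. Nothing in it enters
the kernel:

```c
#include <assert.h>
#include <emmintrin.h> /* _mm_clflush (SSE2) */
#include <stddef.h>
#include <stdint.h>

#define MODEL_CACHELINE 64 /* assumed x86 cache line size */

/* Hypothetical stand-in for flush_clflush: CLFLUSH every cache line that
 * overlaps [addr, addr + len). Returns the number of lines flushed. */
static size_t model_flush_clflush(const void *addr, size_t len)
{
    if (len == 0)
        return 0;
    uintptr_t p = (uintptr_t)addr & ~((uintptr_t)MODEL_CACHELINE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    size_t n = 0;
    for (; p < end; p += MODEL_CACHELINE, n++)
        _mm_clflush((const void *)p);
    return n;
}

/* Hypothetical stand-in for predrain_fence_empty: deliberately a no-op. */
static void model_predrain_fence_empty(void)
{
}

/* Function pointers mirroring Func_flush / Func_predrain_fence, fixed here
 * to the CLFLUSH variants instead of being chosen from CPU features. */
static size_t (*model_func_flush)(const void *, size_t) = model_flush_clflush;
static void (*model_func_predrain_fence)(void) = model_predrain_fence_empty;

/* pmem_persist() as modeled: flush, then drain, all in user space.
 * No system call is issued, so the filesystem never sees it. */
static size_t model_persist(const void *addr, size_t len)
{
    assert(model_func_flush != NULL && model_func_predrain_fence != NULL);
    size_t lines = model_func_flush(addr, len); /* pmem_flush() */
    model_func_predrain_fence();                 /* pmem_drain() */
    return lines;
}
```

For example, persisting 200 bytes starting 10 bytes into a 64-byte-aligned
buffer touches four cache lines, so model_persist() issues four CLFLUSHes
and returns 4.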
> pmem_memcpy_persist() makes sure the data writes have reached persistent
> storage but nothing guarantees associated metadata changes have reached
> persistent storage as well.

While metadata is certainly important, my goal with this specific test was
to measure the "raw" performance of cached+flush vs. uncached, without
anything else in the way.

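
The benchmark code isn't quoted in this reply, so as a rough sketch of the two
strategies being timed: the cached path does an ordinary memcpy() and then
CLFLUSHes each touched line, while the movnt path uses SSE2 non-temporal
stores (_mm_stream_si128, i.e. MOVNTDQ) followed by a single SFENCE. The
function names are hypothetical, this is not the libpmem implementation, the
movnt version assumes a 16-byte-aligned destination and a length that is a
multiple of 16, and on ordinary DRAM it only demonstrates the instruction
sequences, not persistence:

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_stream_si128, _mm_clflush, _mm_sfence */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHELINE 64 /* assumed x86 cache line size */

/* "Cached+clflush": normal cached stores, then write back and evict every
 * cache line the copy touched. */
static void copy_cached_clflush(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len); /* lines are allocated in the CPU cache */
    if (len == 0)
        return;
    uintptr_t p = (uintptr_t)dst & ~((uintptr_t)CACHELINE - 1);
    for (; p < (uintptr_t)dst + len; p += CACHELINE)
        _mm_clflush((const void *)p);
}

/* "movnt": non-temporal 16-byte stores that bypass the cache, then one
 * SFENCE so the weakly ordered stores are ordered before later stores. */
static void copy_movnt_sfence(void *dst, const void *src, size_t len)
{
    assert(((uintptr_t)dst & 15) == 0 && (len & 15) == 0);
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < len / 16; i++)
        _mm_stream_si128(&d[i], _mm_loadu_si128(&s[i]));
    _mm_sfence();
}
```

In the measured runs the cached+clflush flushes were batched every 4MiB;
flushing inside each copy, as this sketch does, corresponds to the
per-16KiB variant, which was slightly slower.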
> To assure that, fsync() (or pmem_persist()
> if you wish) is currently the only way from userspace.

Perhaps you mean pmem_msync() here? pmem_msync() calls msync(), but
pmem_persist() does not.

> At which point
> you've lost most of the advantages of using movnt. Ross is researching
> possibilities for a more efficient userspace implementation, but
> currently there are none.

Apart from the current performance discussion: if the metadata for a file
is already established (file created, space allocated by explicit write()
calls, and everything synced), and I then map it and use
pmem_memcpy_persist(), are there any "ongoing" metadata updates that would
need to be flushed (besides timestamps)?

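
For concreteness, a minimal POSIX sketch of that setup (plain system calls,
not libpmem): create the file, allocate its blocks with explicit write()
calls, fsync(), then mmap() it, copy into it, and sync just the touched range.
msync() needs a page-aligned start address, so the hypothetical msync_range()
helper rounds down first, which any msync()-based call such as pmem_msync()
also has to do. prealloc_file(), msync_range() and store_mapped() are
illustrative names, and the sketch runs on any POSIX filesystem, not only on
a DAX mount:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create the file, allocate its blocks with explicit
 * write()s of zeros, then fsync() so the allocation is durable before the
 * file is ever mapped. Returns an open fd, or -1 on error. */
static int prealloc_file(const char *path, size_t size)
{
    static const char zeros[4096];
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    for (size_t off = 0; off < size;) {
        size_t n = size - off < sizeof zeros ? size - off : sizeof zeros;
        ssize_t w = write(fd, zeros, n);
        if (w <= 0) {
            close(fd);
            return -1;
        }
        off += (size_t)w;
    }
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: round addr down to its page and extend len by the
 * same amount, since msync() rejects unaligned start addresses. */
static int msync_range(void *addr, size_t len)
{
    uintptr_t pagesz = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)addr & ~(pagesz - 1);
    return msync((void *)start, len + ((uintptr_t)addr - start), MS_SYNC);
}

/* Map the preallocated file, copy src to offset off, and sync only the
 * touched range. Returns 0 on success. */
static int store_mapped(int fd, size_t size, size_t off,
                        const void *src, size_t len)
{
    assert(off + len <= size);
    char *m = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (m == MAP_FAILED)
        return -1;
    memcpy(m + off, src, len);
    int rc = msync_range(m + off, len);
    munmap(m, size);
    return rc;
}
```

This only illustrates the sequence in question; whether the filesystem still
has metadata to flush after the stores is exactly what is being asked.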
Brian