linux-kernel - Re: [PATCH 0/2] "big hammer" for DAX msync/fsync correctness

Message-ID: <CAPcyv4iptRuGb0O13+LN0Qv7XUDdJYG6TCJNOVMASDTLw90gtw@mail.gmail.com>
Date:	Fri, 6 Nov 2015 15:17:27 -0800
From:	Dan Williams <dan.j.williams@...el.com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Ross Zwisler <ross.zwisler@...ux.intel.com>,
	Jeff Moyer <jmoyer@...hat.com>,
	linux-nvdimm <linux-nvdimm@...1.01.org>, X86 ML <x86@...nel.org>,
	Dave Chinner <david@...morbit.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>, Jan Kara <jack@...e.com>
Subject: Re: [PATCH 0/2] "big hammer" for DAX msync/fsync correctness

On Fri, Nov 6, 2015 at 9:35 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
> On Fri, 6 Nov 2015, Dan Williams wrote:
>> On Fri, Nov 6, 2015 at 12:06 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
>> > Just for the record. Such a flush mechanism with
>> >
>> >      on_each_cpu()
>> >         wbinvd()
>> >         ...
>> >
>> > will make that stuff completely unusable on Real-Time systems. We've
>> > been there with the big hammer approach of the intel graphics
>> > driver.
>>
>> Noted.  This means RT systems either need to disable DAX or avoid
>> fsync.  Yes, this is a wart, but not an unexpected one in a first
>> generation persistent memory platform.
>
> And it's not just only RT. The folks who are aiming for 100%
> undisturbed user space (NOHZ_FULL) will be massively unhappy about
> that as well.
>
> Is it really required to do that on all cpus?
>

I believe it is, but I'll double check.

I assume the folks who want undisturbed userspace are OK with the
mitigation of modifying their applications to flush individual cache
lines if they want to use DAX without fsync.  At least until the
platform can provide a cheaper fsync implementation.
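
For illustration only, flushing by individual cache lines from
userspace might look roughly like the sketch below.  It assumes x86
with SSE2 (for the _mm_clflush and _mm_mfence intrinsics) and a
64-byte cache line; the names CACHELINE, lines_spanned and flush_range
are hypothetical and not taken from the patches under discussion.
Whether the fence is enough for durability on a given persistent
memory platform is a separate question that this sketch does not
answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>          /* _mm_clflush, _mm_mfence */
#define HAVE_CLFLUSH 1
#endif

/* Assumed cache-line size; real code should query the CPU instead. */
#define CACHELINE 64u

/* Number of cache lines touched by the byte range [addr, addr + len). */
static size_t lines_spanned(uintptr_t addr, size_t len)
{
    uintptr_t mask = ~(uintptr_t)(CACHELINE - 1);

    if (len == 0)
        return 0;
    return (size_t)((((addr + len - 1) & mask) - (addr & mask)) / CACHELINE) + 1;
}

/*
 * Flush every cache line covering [buf, buf + len) and return the
 * number of lines visited.  On builds without SSE2 the loop only
 * counts lines, so the arithmetic can still be exercised.
 */
static size_t flush_range(const void *buf, size_t len)
{
    uintptr_t p = (uintptr_t)buf & ~(uintptr_t)(CACHELINE - 1);
    size_t n = lines_spanned((uintptr_t)buf, len);

    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < n; i++, p += CACHELINE) {
#ifdef HAVE_CLFLUSH
        _mm_clflush((const void *)p);   /* write back and invalidate one line */
#endif
    }
#ifdef HAVE_CLFLUSH
    _mm_mfence();                       /* order the flushes before later accesses */
#endif
    return n;
}
```

The cost is proportional to the number of lines the application knows
it dirtied, and nothing here interrupts other CPUs.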

The option to drive cache flushing from the radix tree is at least
interruptible, but it might be long running depending on how much
virtual address space is dirty.  Altogether, the options in the
current generation are:

1/ wbinvd driven: quick flush O(size of cache), but long interrupt-off latency

2/ radix tree driven: long flush O(size of dirty range), but at least preemptible

3/ DAX without calling fsync: userspace takes direct responsibility
for cache management of DAX mappings

4/ DAX disabled: fsync is the standard page cache writeback latency

We could potentially argue about 1 vs 2 ad nauseam, but I wonder if
there is room to punt it to a configuration option or make it
dynamic?  My stance is do 1 with the hope of riding options 3 and 4
until the platform happens to provide a better alternative.
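
As a rough sketch of what a dynamic choice between 1 and 2 could look
like: since option 1 costs O(size of cache) and option 2 costs O(size
of dirty range), one plausible heuristic is to use the whole-cache
flush only when the dirty range is at least as large as the cache, and
never when the system has opted out of it (RT, NOHZ_FULL).  Everything
below, including the struct flush_policy knob and the pick_flush
helper, is hypothetical and shown in plain C rather than as kernel
code; the threshold is an assumption, not a measured crossover point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum flush_strategy {
    FLUSH_WBINVD,   /* option 1: whole-cache flush on every CPU */
    FLUSH_RANGE,    /* option 2: walk the dirty range, preemptible */
};

/* Hypothetical tunables, e.g. set from a config option or sysfs knob. */
struct flush_policy {
    bool allow_wbinvd;      /* false where long interrupt-off latency is unacceptable */
    size_t cache_bytes;     /* assumed total cache size used as the crossover point */
};

/* Pick a strategy for an fsync/msync covering dirty_bytes of mappings. */
static enum flush_strategy pick_flush(const struct flush_policy *pol,
                                      size_t dirty_bytes)
{
    assert(pol != NULL);

    if (!pol->allow_wbinvd)
        return FLUSH_RANGE;

    /* Flushing more than a cache's worth by range is assumed to be slower. */
    return dirty_bytes >= pol->cache_bytes ? FLUSH_WBINVD : FLUSH_RANGE;
}
```

With allow_wbinvd cleared this degenerates to option 2 everywhere,
which is what a latency-sensitive configuration would want.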