Message-ID: <alpine.DEB.2.11.1511070919160.4032@nanos>
Date:	Sat, 7 Nov 2015 09:38:36 +0100 (CET)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Dan Williams <dan.j.williams@...el.com>
cc:	"H. Peter Anvin" <hpa@...or.com>,
	Ross Zwisler <ross.zwisler@...ux.intel.com>,
	Jeff Moyer <jmoyer@...hat.com>,
	linux-nvdimm <linux-nvdimm@...1.01.org>, X86 ML <x86@...nel.org>,
	Dave Chinner <david@...morbit.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...hat.com>, Jan Kara <jack@...e.com>
Subject: Re: [PATCH 0/2] "big hammer" for DAX msync/fsync correctness

On Sat, 7 Nov 2015, Dan Williams wrote:
> On Fri, Nov 6, 2015 at 10:50 PM, Thomas Gleixner <tglx@...utronix.de> wrote:
> > On Fri, 6 Nov 2015, H. Peter Anvin wrote:
> >> On 11/06/15 15:17, Dan Williams wrote:
> >> >>
> >> >> Is it really required to do that on all cpus?
> >> >
> >> > I believe it is, but I'll double check.
> >> >
> >>
> >> It's required on all CPUs on which the DAX memory may have been dirtied.
> >>  This is similar to the way we flush TLBs.
> >
> > Right. And that's exactly the problem: "may have been dirtied"
> >
> > If DAX is used on 50% of the CPUs and the other 50% are plumming away
> > happily in user space or run low latency RT tasks w/o ever touching
> > it, then having an unconditional flush on ALL CPUs is just wrong
> > because you penalize the uninvolved cores with a completely pointless
> > SMP function call and drain their caches.
> >
> 
> It's not wrong and pointless, it's all we have available outside of
> having the kernel remember every virtual address that might have been
> touched since the last fsync and sit in a loop flushing those virtual
> address cache line by cache line.
> 
> There is a crossover point where wbinvd is better than a clwb loop
> that needs to be determined.

This is a totally different issue, and I'm well aware that there is a
tradeoff between wbinvd() and a clwb loop. wbinvd() might be more
efficient performance-wise above some number of cache lines, but then
again it drains all the unrelated stuff as well, which can result in an
even larger performance hit.
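
For reference, the clwb side of that tradeoff is roughly the loop
below (a minimal sketch, not the code from the posted patches; it
assumes x86-64, a 64-byte cache line, assembler support for the clwb
mnemonic, and that somebody tracked the dirtied range in the first
place):

#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE	64UL

/* Write back (without invalidating) every cache line in [addr, addr + len). */
static void wb_range_clwb(void *addr, size_t len)
{
	uintptr_t p   = (uintptr_t)addr & ~(CACHE_LINE_SIZE - 1);
	uintptr_t end = (uintptr_t)addr + len;

	for (; p < end; p += CACHE_LINE_SIZE)
		/* write the line back to memory, keep it valid in the cache */
		asm volatile("clwb %0" : "+m" (*(volatile char *)p));

	/* make the write-backs globally visible before whatever depends on them */
	asm volatile("sfence" ::: "memory");
}

The crossover Dan mentions is between iterating that over a large
number of lines and issuing a single wbinvd, which writes back and
invalidates everything, related or not.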

Now what really concerns me more is that you just unconditionally
flush on all CPUs whether they were involved in that DAX stuff or not.

Assume a DAX-using application running on CPUs 0-3 and some other
unrelated workload on CPUs 4-7. That flush (sketched below the list)
will

  - Interrupt CPUs 4-7 for no reason (whether you use clwb or wbinvd)

  - Drain the caches of CPUs 4-7 for no reason if done with wbinvd()

  - Render Cache Allocation useless if done with wbinvd()
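
To be concrete about the first point, the unconditional variant is
essentially the following shape (kernel-style sketch only, with made-up
names, not the posted code; x86 already has a wbinvd_on_all_cpus()
helper that does more or less this):

#include <linux/smp.h>
#include <asm/special_insns.h>

/* IPI callback: dump this CPU's entire cache hierarchy. */
static void dax_wbinvd_ipi(void *unused)
{
	wbinvd();
}

/* Hits every online CPU, whether it ever touched the DAX mapping or not. */
static void dax_flush_big_hammer(void)
{
	on_each_cpu(dax_wbinvd_ipi, NULL, 1);
}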

And we are not talking about a few microseconds here. Assume that
CPUs 4-7 have cache allocated and it's mostly dirty. We measured the
wbinvd() impact on RT back when the graphics folks used it as a big
hammer. The maximum latency spike was way above one millisecond.

We have similar issues with TLB flushing, but there

  - we track where the mm was used and never flush on innocent CPUs
    (roughly the cpumask approach sketched below the list)

  - an application can be designed to use separate processes so that
    cross-CPU flushing does not happen
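
A DAX flush that behaved the same way would need something along these
lines (purely hypothetical sketch, no such dirty-CPU tracking exists
today; dax_wbinvd_ipi() is the made-up helper from the sketch above):

#include <linux/cpumask.h>
#include <linux/smp.h>

/*
 * Hypothetical: confine the flush to the CPUs recorded as having
 * touched the mapping, the way mm_cpumask() confines TLB shootdowns.
 */
static void dax_flush_targeted(const struct cpumask *dirty_cpus)
{
	on_each_cpu_mask(dirty_cpus, dax_wbinvd_ipi, NULL, 1);
}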

I know that this is not an easy problem to solve, but you should be
aware that various application scenarios are going to be massively
unhappy about that.

Thanks,

	tglx