Date:	Wed, 2 Sep 2015 07:23:15 -0700
From:	Dave Hansen <dave.hansen@...ux.intel.com>
To:	Boaz Harrosh <boaz@...xistor.com>,
	Dave Chinner <david@...morbit.com>,
	Ross Zwisler <ross.zwisler@...ux.intel.com>,
	Christoph Hellwig <hch@....de>, linux-kernel@...r.kernel.org,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Andrew Morton <akpm@...l.org>,
	"H. Peter Anvin" <hpa@...or.com>, Hugh Dickins <hughd@...gle.com>,
	Ingo Molnar <mingo@...hat.com>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
	linux-nvdimm@...ts.01.org, Matthew Wilcox <willy@...ux.intel.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org
Subject: Re: [PATCH] dax, pmem: add support for msync

On 09/02/2015 03:27 AM, Boaz Harrosh wrote:
>> > Yet you're ignoring the fact that flushing the entire range of the
>> > relevant VMAs may not be very efficient. It may be a very
>> > large mapping with only a few pages that need flushing from the
>> > cache, but you still iterate the mappings flushing GB ranges from
>> > the cache at a time.
>> > 
> So actually you are wrong about this. We have a working system and as part
> of our testing rig we do performance measurements, constantly. Our random
> mmap 4k writes test performs very well and is on par with the random-direct-write
> implementation even though on every unmap, we do a VMA->start/end cl_flushing.
> 
> The cl_flush operation is a no-op if the cacheline is not dirty and is a
> memory bus storm with all the CLs that are dirty. So the only cost
> is the iteration of vma->start-to-vma->end i+=64

I'd be curious what the cost is in practice.  Do you have any actual
numbers for the cost of doing it this way?

Even if the instruction is a no-op, I'd still expect the overhead to
add up for a tens-of-gigabytes mapping, no matter how much the CPU
optimizes it.
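
For a rough sense of scale, the iteration count of such a loop can be
sketched as below. This is back-of-envelope arithmetic only, not a
measurement: the 64-byte stride is taken from the quoted "i+=64", while
the function name and the example mapping sizes are hypothetical and
chosen purely for illustration. It says nothing about how long each
flush of a clean line takes, which is the number still missing.

```python
# Back-of-envelope count of cache-line flush instructions issued when
# walking a whole mapping, as in the quoted vma->start..vma->end loop.
# Assumption: 64-byte cache lines (the "i+=64" stride quoted above).

CACHELINE_BYTES = 64
GIB = 1 << 30


def flushes_for_range(start: int, end: int, line: int = CACHELINE_BYTES) -> int:
    """Number of line-sized steps needed to cover the byte range [start, end)."""
    if end <= start:
        return 0
    first = start - (start % line)   # round start down to a line boundary
    last = end + (-end % line)       # round end up to a line boundary
    return (last - first) // line


if __name__ == "__main__":
    for size_gib in (1, 10, 64):     # hypothetical mapping sizes
        n = flushes_for_range(0, size_gib * GIB)
        print(f"{size_gib:>3} GiB mapping -> {n:,} flush instructions")
```

The count grows linearly with the size of the mapping and is independent
of how many lines are actually dirty, which is the concern with flushing
the full VMA range.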