Message-ID: <CA+55aFzLgPZoLKRK5rPk8hpCS=Y8CNh59K_tzEZEVKpt1VyBWg@mail.gmail.com>
Date:	Tue, 10 Feb 2015 12:49:46 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	"Wang, Yalin" <Yalin.Wang@...ymobile.com>,
	"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"Gao, Neil" <Neil.Gao@...ymobile.com>
Subject: Re: [RFC V2] test_bit before clear files_struct bits

On Tue, Feb 10, 2015 at 12:22 PM, Andrew Morton
<akpm@...ux-foundation.org> wrote:
>
> The patch is good but I'm still wondering if any CPUs can do this
> speedup for us.  The CPU has to pull in the target word to modify the
> bit and what it *could* do is to avoid dirtying the cacheline if it
> sees that the bit is already in the desired state.

Sadly, no CPU I know of actually does this.  Probably because it would
take a bit more core resources, and conditional writes to memory are
not normally part of an x86 core (it might be more natural for
something like old-style ARM that has conditional writes).

Also, even if the store were to be conditional, the cacheline would
have been acquired in exclusive state, and in many cache protocols the
state machine is from exclusive to dirty (since normally the only
reason to get a cacheline for exclusive use is in order to write to
it). So a "read, test, conditional write" ends up actually being more
complicated than you'd think - because you *want* that
exclusive->dirty state for the case where you really are going to
change the bit, and to avoid extra cache protocol stages you don't
generally want to read the cacheline into a shared read mode first
(only to then have to turn it into exclusive/dirty as a second stage).

So at least on current x86 (and, for the reasons above, likely on
future x86 and on other architectures with read-modify-write memory
access models), the default assumption is that a bit operation will
actually change the bit. Unlikely bit sets/clears on cachelines that
would otherwise very likely stay clean should probably be made
conditional in software. Like in this patch.
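
As a rough illustration of "conditionalized in software", here is a
minimal userspace sketch of the test-before-clear idea. It is not the
patch itself: bit_is_set() and clear_bit_if_set() are hypothetical
stand-ins for the kernel's test_bit() and its bit-clearing helpers, the
bitmap is a plain array of unsigned long, and the operations are not
atomic, so a caller would have to hold whatever lock protects the
bitmap.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of bits in one bitmap word (assumption: unsigned long words). */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Read-only test: a load alone does not dirty the cacheline. */
static bool bit_is_set(const unsigned long *map, size_t nr)
{
    return (map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}

/*
 * Clear bit nr only if it is currently set.
 * Returns true if a store was issued, false if the word was left alone.
 * Not atomic: the caller is assumed to serialize access to the bitmap.
 */
static bool clear_bit_if_set(unsigned long *map, size_t nr)
{
    assert(map != NULL);

    if (!bit_is_set(map, nr))
        return false;   /* common case: nothing to do, no write */

    map[nr / BITS_PER_WORD] &= ~(1UL << (nr % BITS_PER_WORD));
    return true;
}
```

When the bit is usually already clear, the function returns after the
load and never stores to the word, which is the case the discussion
above is trying to keep cheap. When the bit usually does need to change,
the extra test is pure overhead, so the pattern only pays off for
"unlikely" updates.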

                         Linus