lists.openwall.net — Open Source and information security mailing list archives
Message-ID: <Pine.LNX.4.64.0804030050290.15850@artax.karlin.mff.cuni.cz>
Date:	Thu, 3 Apr 2008 00:53:14 +0200 (CEST)
From:	Mikulas Patocka <mikulas@...ax.karlin.mff.cuni.cz>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
cc:	Andrew Morton <akpm@...ux-foundation.org>, viro@...iv.linux.org.uk,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH]: Fix SMP-reordering race in mark_buffer_dirty



On Wed, 2 Apr 2008, Linus Torvalds wrote:

> 
> 
> On Wed, 2 Apr 2008, Andrew Morton wrote:
> > 
> > But then the test-and-set of an already-set flag would newly cause the
> > cacheline to be dirtied, requiring additional bus usage to write it back?
> > 
> > The CPU's test-and-set-bit operation could of course optimise that away in
> > this case.  But does it?
> 
> No, afaik no current x86 uarch will optimize away the write on a locked 
> instruction if it turns out to be unnecessary. 

No, it doesn't optimize it away. Try this:

#include <string.h>
#include <pthread.h>

/* Each thread does 100M locked bit-test-and-set operations on an
 * already-set bit, so the "set" half never actually changes memory. */
void *pth(void *p)
{
        int i;
        for (i = 0; i < 100000000; i++)
                __asm__ volatile ("lock; btsl $0, %0" : "+m"(*(int *)p) : : "cc");
        return NULL;
}

int args[2000];

int main(void)
{
        pthread_t t1, t2, t3, t4;
        memset(args, -1, sizeof args);  /* set every bit up front */
        pthread_create(&t1, NULL, pth, &args[0]);
        pthread_create(&t2, NULL, pth, &args[16]);
        pthread_create(&t3, NULL, pth, &args[32]);
        pthread_create(&t4, NULL, pth, &args[48]);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        pthread_join(t3, NULL);
        pthread_join(t4, NULL);
        return 0;
}

--- when the &args[] indices are changed so that they fall in the same 
cache line, I get 9 times slower execution. I tried it on two dual-core 
Core 2 Xeons.

Mikulas

> Can somebody find a timing reason to have the ugly code?
> 
> 		Linus
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
