Date:	Wed, 16 Sep 2015 00:17:15 +0000
From:	Zhu Jefferry <Jefferry.Zhu@...escale.com>
To:	Thomas Gleixner <tglx@...utronix.de>
CC:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"bigeasy@...utronix.de" <bigeasy@...utronix.de>
Subject: RE: [PATCH v2] futex: lower the lock contention on the HB lock during
 wake up

Thanks for your detailed guidance and explanations. Please see my questions in-line.

> -----Original Message-----
> From: Thomas Gleixner [mailto:tglx@...utronix.de]
> Sent: Wednesday, September 16, 2015 8:01 AM
> To: Zhu Shuangjun-R65879
> Cc: linux-kernel@...r.kernel.org; bigeasy@...utronix.de
> Subject: RE: [PATCH v2] futex: lower the lock contention on the HB lock
> during wake up
> 
> On Tue, 15 Sep 2015, Zhu Jefferry wrote:
> 
> Please configure your e-mail client properly and follow the basic rules:
> 
> - Choose a meaningful subject for your questions
> 
>   You just copied a random subject line from some other mail thread,
>   which makes your mail look like a patch. But it's not a patch. You
>   have a question about futexes and patches related to them.
> 
> - Make sure your lines are no longer than 72 to 76 characters in length.
> 
>   You can't assume that all e-mail and news clients behave like yours,
>   and while yours might wrap lines automatically when the text reaches
>   the right of the window containing it, not all do.
> 
>   For the sentence above I need a 190 character wide display ....
> 
> - Do not use colors or other gimmicks. They just make the mail
>   unreadable in simple text based readers.
> 
> > Just in the list, I see the patch "[PATCH v2] futex: lower the lock
> > contention on the HB lock during wake up" at
> > http://www.gossamer-threads.com/lists/linux/kernel/2199938?search_string=futex;#2199938.
> 
> > But I see another patch with same name, different content here,
> >
> > 23b7776290b10297fe2cae0fb5f166a4f2c68121
> > (http://code.metager.de/source/xref/linux/stable/kernel/futex.c?r=23b7776290b10297fe2cae0fb5f166a4f2c68121)
> 
> I have no idea what that metager thing tells you and I really don't want
> to know. Plain git tells me:
> 
> # git show 23b7776290b10297fe2cae0fb5f166a4f2c68121
> Merge: 6bc4c3ad3619 6fab54101923
> Author: Linus Torvalds <torvalds@...ux-foundation.org>
> Date:   Mon Jun 22 15:52:04 2015 -0700
> 
>     Merge branch 'sched-core-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> 
> So that's a merge commit where Linus pulled a pile of changes into the
> mainline kernel. And that merge does not contain the patch above, but it
> contains a different change to the futex code.
> 
> > Could you please help to give a little bit more explanation on this:
> > why do they have the same name but different modifications to futex.c?
> > I'm a newbie in the community.
> 
> Use the proper tools and not some random web interface. The commit you
> are looking for is a completely different one.
> 
> # git log kernel/futex.c
> ....
> commit 802ab58da74bb49ab348d2872190ef26ddc1a3e0
> Author: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> Date:   Wed Jun 17 10:33:50 2015 +0200
> 
>     futex: Lower the lock contention on the HB lock during wake up ....
> 
> And that's the same as the one in the LKML thread plus a fixup.
> 
> > Actually, I encountered a customer issue related to the glibc code
> > "pthread_mutex_lock", which uses the futex service in the kernel,
> > without the patches above.
> 
> The patches above are merely an optimization and completely unrelated to
> your problem.
> 
> You fail to provide the real interesting information here:
> 
>  - Which architecture/SoC
>  - Which kernel version and which extra patches
>  - Which glibc version and which extra patches
> 
> > After much discussion with the customer (I could not reproduce the
> > failure in my office), I seriously suspect there might be some
> > particular corner cases in the futex code.
> 
> The futex code is more or less a conglomerate of corner cases.
> 
> But again you fail to provide the real interesting information:
> 
>  - What is the actual failure ?
> 
> The information that you discussed that with your customer is completely
> irrelevant and your suspicion does not clarify the issue either.
> 
> > In the unlock flow, the user space code (pthread_mutex_unlock) will
> > check the FUTEX_WAITERS flag first, then wake up the waiters on the
> > kernel list. In the lock flow, the kernel code (futex) will set
> > FUTEX_WAITERS first too, then try to get the waiter from the list.
> > They follow the same sequence: flag first, list entry second. But
> > there might be some timing problem on an SMP system, if the query
> > (unlock flow) executes just before the list-adding action (lock
> > flow).
> 
> There might be some timing problem, if the code would look like the
> scheme you painted below, but it does not.
> 
> > It might cause the mutex to never be really released, and other
> > threads will wait forever. Could you please help to take a look at it?
> >
> > CPU 0 (thread 0)                                CPU 1 (thread 1)
> >
> >  mutex_lock
> >  val = *futex;
> >  sys_futex(LOCK_PI, futex, val);
> >
> >  return to user space
> 
> If the futex is uncontended then you don't enter the kernel for acquiring
> the futex.
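[Editor's note: Thomas's point that an uncontended futex never enters the kernel can be sketched in userspace terms. The sketch below is a minimal illustration assuming the usual glibc-style convention (futex word 0 = free, owner TID = locked); the function names and the exact fast-path logic are illustrative stand-ins, not the real glibc internals.]

```c
/* Minimal sketch of the uncontended futex fast path: userspace takes
 * the lock with a single atomic compare-and-swap on the futex word
 * and only calls sys_futex() when that fails.  Illustrative only. */
#include <stdatomic.h>

#define FUTEX_WAITERS 0x80000000u

/* Returns 1 if the lock was taken without entering the kernel. */
static int futex_trylock_fast(_Atomic unsigned int *futex, unsigned int tid)
{
    unsigned int expected = 0;
    /* cmpxchg 0 -> tid: on success the kernel is never involved */
    return atomic_compare_exchange_strong(futex, &expected, tid);
}

/* Returns 1 if the unlock needed no wake syscall (no waiters). */
static int futex_unlock_fast(_Atomic unsigned int *futex, unsigned int tid)
{
    unsigned int expected = tid;
    /* fails if FUTEX_WAITERS is set: then we must wake via sys_futex */
    return atomic_compare_exchange_strong(futex, &expected, 0);
}
```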
> 
> >  after acquire the lock                           mutex_lock
> >                                                   val = *futex;
> >                                                   sys_futex(LOCK_PI, futex, val);
> 
> The futex FUTEX_LOCK_PI operation does not take the user space value.
> That's what FUTEX_WAIT does.
> 
> >
> >                                                   lock(hash_bucket(futex));
> >                                                   set FUTEX_WAITERS flag
> >
> >                                                   unlock(hash_bucket(futex)) and retry due to page fault
> 
> So here you are completely off the track. If the 'set FUTEX_WAITERS bit'
> operation fails due to a page fault, then the FUTEX_WAITERS bit is not
> set. So it cannot be observed on the other core.
> 
> The flow is:
> 
> sys_futex(LOCK_PI, futex, ...)
> 
>  retry:
>   lock(hb(futex));
>   ret = set_waiter_bit(futex);
>   if (ret == -EFAULT) {
>     unlock(hb(futex));
>     handle_fault();
>     goto retry;
>   }
> 
>   list_add();
>   unlock(hb(futex));
>   schedule();
> 
> So when set_waiter_bit() succeeds, then the hash bucket lock is held and
> blocks the waker. So it's guaranteed that the waker will see the waiter
> on the list.
> 
> If set_waiter_bit() faults, then the waiter bit is not set and therefor
> there is nothing to wake. So the waker will not enter the kernel because
> the futex is uncontended.
> 
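[Editor's note: the flow Thomas lays out above can be modeled single-threaded to show the two guarantees he names: on -EFAULT the waiter bit is NOT set, and the list_add() only ever happens under the hash-bucket lock. Everything here (`struct hb`, `set_waiter_bit`, the error value) is an illustrative stand-in for the real kernel structures, not the actual futex.c code.]

```c
/* Single-threaded model of the LOCK_PI waiter-side flow. */
#include <stdbool.h>

#define FUTEX_WAITERS 0x80000000u

struct hb {              /* stand-in for a futex hash bucket */
    bool locked;         /* the hb spinlock */
    int  nr_waiters;     /* stand-in for the plist of waiters */
};

static void hb_lock(struct hb *hb)   { hb->locked = true; }
static void hb_unlock(struct hb *hb) { hb->locked = false; }

/* May "fault" before the bit is set, like the cmpxchg on the
 * user space futex word.  On fault the bit is NOT set. */
static int set_waiter_bit(unsigned int *uval, bool fault)
{
    if (fault)
        return -14;              /* -EFAULT: nothing observable */
    *uval |= FUTEX_WAITERS;
    return 0;
}

/* The waiter-side flow from the mail, minus schedule(). */
static int lock_pi_slowpath(struct hb *hb, unsigned int *uval, bool fault_once)
{
    for (;;) {
        hb_lock(hb);
        if (set_waiter_bit(uval, fault_once)) {
            hb_unlock(hb);       /* drop hb lock, handle fault, retry */
            fault_once = false;
            continue;
        }
        hb->nr_waiters++;        /* list_add() under the hb lock */
        hb_unlock(hb);
        return 0;
    }
}
```

In this model, whenever FUTEX_WAITERS is visible the waiter is already on the list (or the waker is blocked on the hb lock until it is), which is the serialization the mail describes.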

I assume your pseudo code set_waiter_bit is mapped to the real code
"futex_lock_pi_atomic". It is possible for futex_lock_pi_atomic to set the
FUTEX_WAITERS bit successfully but still return with a page fault, for
example by failing in lookup_pi_state().


> So now, lets assume that the waiter failed to set the waiter bit and the
> waker unlocked the futex. When the waiter retries then it actually checks
> whether the futex still has an owner. So it observes the owner has been
> cleared, it acquires the futex and returns.
> 
> It's a bit more complex than that due to handling of the gazillion of
> corner cases, but that's the basic synchronization mechanism and there is
> no hidden timing issue on SMP.
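[Editor's note: the retry case Thomas describes (the waiter faulted, so the waiter bit was never set, and the owner unlocked meanwhile) can be sketched the same way; `retry_acquire` and the 0 = unowned convention are assumptions for illustration only.]

```c
/* Sketch of the retry after a fault: the waiter bit was never set, so
 * the waker released the futex without entering the kernel.  When the
 * waiter retries it checks for an owner; seeing none, it simply takes
 * the futex.  Illustrative names and conventions only. */
static int retry_acquire(unsigned int *uval, unsigned int tid)
{
    if (*uval == 0) {   /* owner cleared the futex while we faulted */
        *uval = tid;    /* acquire it and return to user space */
        return 0;
    }
    return -1;          /* still owned: go back to the slowpath */
}
```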
> 
> Random speculation is not helping here.
> 
> Thanks,
> 
> 	tglx