Date:   Fri, 24 Jun 2022 08:42:22 +0200
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     Waiman Long <longman@...hat.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
        Boqun Feng <boqun.feng@...il.com>,
        linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        Juri Lelli <juri.lelli@...hat.com>,
        Mike Stowell <mstowell@...hat.com>
Subject: Re: [PATCH v2] locking/rtmutex: Limit # of lock stealing for non-RT
 waiters

On 2022-06-23 10:41:17 [-0400], Waiman Long wrote:
> 
> On 6/23/22 09:32, Sebastian Andrzej Siewior wrote:
> > Do you have more insight on how this was tested/ created? Based on that,
> > systemd and a random kworker waited on a lock for more than 10 minutes.
> 
> The hang happens when our QE team runs their kernel tier 1 test which, I
> think, lasts several hours. The hang happens in some runs but not all of
> them. So it is kind of opportunistic. Mike should be able to provide a
> better idea about frequency and so on.

So are we talking about 64+ CPUs here, or more than that?

> > I added a trace-printk each time a non-RT waiter got the lock stolen,
> > kicked a kernel build and a package upgrade and took a look at the stats
> > an hour later:
> > - sh got its lock stolen 3416 times. I didn't log the pid so I can't
> >    look back and check how long it waited since the first time.
> > - the median number of stolen locks is 173.
> Maybe we should also allow more lock stealing per waiter than the 10 that I
> used in the patch. I am open to suggestions as to what is a good value to use.

I have no idea either. I just looked at a run to see what the numbers
actually are. I have no numbers in terms of performance. What most
likely happens is that on an unlock operation the waiter gets a wake-up,
but before it gets a chance to acquire the lock, the lock is already taken
and it goes back to sleep again. While this looks painful, it might be
better performance-wise because the other task was able to acquire the
lock without waiting. But then it is not fair, and a hang like this can
happen.
One thing that I'm curious about is which lock it is (one or two global
hot spots, or many), and how to benchmark this…
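
To make the wake-up race above a bit more concrete, here is a rough toy
model. It is not the rtmutex code and not the patch; simulate(),
steal_prob and steal_limit are made-up names, and treating every unlock
as an independent coin flip is purely an assumption for illustration.
It only counts how often the waiter at the head of the queue loses the
race before it finally gets the lock, with and without a steal limit.

```python
import random


def simulate(unlocks, steal_prob, steal_limit=None, seed=0):
    """Toy model of the top non-RT waiter losing the lock to a stealer.

    Each unlock wakes the top waiter. With probability steal_prob another
    task takes the lock first and the waiter goes back to sleep. Once the
    waiter has been bypassed steal_limit times, stealing is refused and
    the waiter acquires the lock. Returns, per successful acquisition,
    how many times that waiter had the lock stolen beforehand.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    steals_per_waiter = []
    stolen = 0
    for _ in range(unlocks):
        limit_hit = steal_limit is not None and stolen >= steal_limit
        if not limit_hit and rng.random() < steal_prob:
            stolen += 1  # woken waiter lost the race, sleeps again
        else:
            steals_per_waiter.append(stolen)  # waiter finally acquires
            stolen = 0  # next waiter becomes top waiter
    return steals_per_waiter


if __name__ == "__main__":
    unlimited = simulate(100_000, steal_prob=0.99)
    limited = simulate(100_000, steal_prob=0.99, steal_limit=10)
    print("no limit : worst case steals =", max(unlimited))
    print("limit=10 : worst case steals =", max(limited))
```

With a limit the worst case is bounded by construction; without one it
depends only on how contended the lock is. This says nothing about
throughput, which is the part I have no numbers for.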

> > > Fixes: 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock stealing")
> > > Reported-by: Mike Stowell <mstowell@...hat.com>
> > > Signed-off-by: Waiman Long <longman@...hat.com>
> 
> Thanks for your time looking at the patch.

No problem.

> Cheers,
> Longman

Sebastian
