lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 21 Sep 2023 09:42:47 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Waiman Long <longman@...hat.com>
Cc:     Ingo Molnar <mingo@...hat.com>, Will Deacon <will.deacon@....com>,
        Boqun Feng <boqun.feng@...il.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] locking/semaphore: Use wake_q to wake up processes
 outside lock critical section

On Fri, Sep 09, 2022 at 03:28:48PM -0400, Waiman Long wrote:
> It was found that a circular lock dependency can happen with the
> following locking sequence:
> 
>    +--> (console_sem).lock --> &p->pi_lock --> &rq->__lock --+
>    |                                                         |
>    +---------------------------------------------------------+
> 
> The &p->pi_lock --> &rq->__lock sequence is very common in all the
> task_rq_lock() calls.
> 
> The &rq->__lock --> (console_sem).lock sequence happens when the
> scheduler code calling printk() or more likely the various WARN*()
> macros while holding the rq lock. The (console_sem).lock is actually
> a raw spinlock guarding the semaphore. In the particular lockdep splat
> that I saw, it was caused by SCHED_WARN_ON() call in update_rq_clock().
> To work around this locking sequence, we may have to ban all WARN*()
> calls when the rq lock is held, which may be too restrictive, or we
> may have to add a WARN_DEFERRED() call and modify all the call sites
> to use it.

No, this is all because printk() is pure garbage -- but I believe it's
being worked on.

And I despise that whole deferred thing -- that's just worse garbage.

If you map printk to early_printk none of this is a problem (and this is
what i've been doing for something close to a decade).

Printk should not do synchronous, or in-context, printing to non-atomic
consoles. Doubly so when atomic console are actually available.

As long as it does this printk is fundamentally unreliable and any of
these hacks are just that.

> Even then, a deferred printk or WARN function may still call
> console_trylock() which may, in turn, calls up_console_sem() leading
> to this locking sequence.
> 
> The other ((console_sem).lock --> &p->pi_lock) locking sequence
> was caused by the fact that the semaphore up() function is calling
> wake_up_process() while holding the semaphore raw spinlock. This lockiing
> sequence can be easily eliminated by moving the wake_up_processs()
> call out of the raw spinlock critical section using wake_q which is
> what this patch implements. That is the easiest and the most certain
> way to break this circular locking sequence.

So I don't mind the patch, but I hate everything about your
justification for it.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ