linux-kernel - Re: [PATCH v2] locking/semaphore: Use wake_q to wake up processes outside lock critical section

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <06163c13-03b2-bfe0-90b4-5267a039a02c@redhat.com>
Date:   Fri, 22 Sep 2023 14:45:04 -0400
From:   Waiman Long <longman@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...hat.com>, Will Deacon <will.deacon@....com>,
        Boqun Feng <boqun.feng@...il.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] locking/semaphore: Use wake_q to wake up processes
 outside lock critical section

On 9/21/23 03:42, Peter Zijlstra wrote:
> On Fri, Sep 09, 2022 at 03:28:48PM -0400, Waiman Long wrote:
>> It was found that a circular lock dependency can happen with the
>> following locking sequence:
>>
>>     +--> (console_sem).lock --> &p->pi_lock --> &rq->__lock --+
>>     |                                                         |
>>     +---------------------------------------------------------+
>>
>> The &p->pi_lock --> &rq->__lock sequence is very common in all the
>> task_rq_lock() calls.
>>
>> The &rq->__lock --> (console_sem).lock sequence happens when the
>> scheduler code calling printk() or more likely the various WARN*()
>> macros while holding the rq lock. The (console_sem).lock is actually
>> a raw spinlock guarding the semaphore. In the particular lockdep splat
>> that I saw, it was caused by SCHED_WARN_ON() call in update_rq_clock().
>> To work around this locking sequence, we may have to ban all WARN*()
>> calls when the rq lock is held, which may be too restrictive, or we
>> may have to add a WARN_DEFERRED() call and modify all the call sites
>> to use it.
> No, this is all because printk() is pure garbage -- but I believe it's
> being worked on.
>
> And I despise that whole deferred thing -- that's just worse garbage.
>
> If you map printk to early_printk none of this is a problem (and this is
> what i've been doing for something close to a decade).
>
> Printk should not do synchronous, or in-context, printing to non-atomic
> consoles. Doubly so when atomic console are actually available.
>
> As long as it does this printk is fundamentally unreliable and any of
> these hacks are just that.

Thanks for the explanation.

I believe early_printk should only be used in the non-SMP boot process. 
The use of printk() is frequently used for debugging purpose and the 
insertion of printk at some lock critical section can cause the lockdep 
splat to come out obscuring the debugging process.

>
>> Even then, a deferred printk or WARN function may still call
>> console_trylock() which may, in turn, calls up_console_sem() leading
>> to this locking sequence.
>>
>> The other ((console_sem).lock --> &p->pi_lock) locking sequence
>> was caused by the fact that the semaphore up() function is calling
>> wake_up_process() while holding the semaphore raw spinlock. This lockiing
>> sequence can be easily eliminated by moving the wake_up_processs()
>> call out of the raw spinlock critical section using wake_q which is
>> what this patch implements. That is the easiest and the most certain
>> way to break this circular locking sequence.
> So I don't mind the patch, but I hate everything about your
> justification for it.
>
OK, I can reword the commit log and see you are OK with that.

Cheers,
Longman