lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <06163c13-03b2-bfe0-90b4-5267a039a02c@redhat.com>
Date:   Fri, 22 Sep 2023 14:45:04 -0400
From:   Waiman Long <longman@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...hat.com>, Will Deacon <will.deacon@....com>,
        Boqun Feng <boqun.feng@...il.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] locking/semaphore: Use wake_q to wake up processes
 outside lock critical section

On 9/21/23 03:42, Peter Zijlstra wrote:
> On Fri, Sep 09, 2022 at 03:28:48PM -0400, Waiman Long wrote:
>> It was found that a circular lock dependency can happen with the
>> following locking sequence:
>>
>>     +--> (console_sem).lock --> &p->pi_lock --> &rq->__lock --+
>>     |                                                         |
>>     +---------------------------------------------------------+
>>
>> The &p->pi_lock --> &rq->__lock sequence is very common in all the
>> task_rq_lock() calls.
>>
>> The &rq->__lock --> (console_sem).lock sequence happens when the
>> scheduler code calling printk() or more likely the various WARN*()
>> macros while holding the rq lock. The (console_sem).lock is actually
>> a raw spinlock guarding the semaphore. In the particular lockdep splat
>> that I saw, it was caused by SCHED_WARN_ON() call in update_rq_clock().
>> To work around this locking sequence, we may have to ban all WARN*()
>> calls when the rq lock is held, which may be too restrictive, or we
>> may have to add a WARN_DEFERRED() call and modify all the call sites
>> to use it.
> No, this is all because printk() is pure garbage -- but I believe it's
> being worked on.
>
> And I despise that whole deferred thing -- that's just worse garbage.
>
> If you map printk to early_printk none of this is a problem (and this is
> what i've been doing for something close to a decade).
>
> Printk should not do synchronous, or in-context, printing to non-atomic
> consoles. Doubly so when atomic console are actually available.
>
> As long as it does this printk is fundamentally unreliable and any of
> these hacks are just that.

Thanks for the explanation.

I believe early_printk should only be used in the non-SMP boot process. 
The use of printk() is frequently used for debugging purpose and the 
insertion of printk at some lock critical section can cause the lockdep 
splat to come out obscuring the debugging process.

>
>> Even then, a deferred printk or WARN function may still call
>> console_trylock() which may, in turn, calls up_console_sem() leading
>> to this locking sequence.
>>
>> The other ((console_sem).lock --> &p->pi_lock) locking sequence
>> was caused by the fact that the semaphore up() function is calling
>> wake_up_process() while holding the semaphore raw spinlock. This lockiing
>> sequence can be easily eliminated by moving the wake_up_processs()
>> call out of the raw spinlock critical section using wake_q which is
>> what this patch implements. That is the easiest and the most certain
>> way to break this circular locking sequence.
> So I don't mind the patch, but I hate everything about your
> justification for it.
>
OK, I can reword the commit log and see you are OK with that.

Cheers,
Longman

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ