linux-kernel - Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in _

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <688eadd7-4ca3-3e32-3520-25977ff059a6@bytedance.com>
Date:   Mon, 28 Sep 2020 18:04:23 +0800
From:   Chengming Zhou <zhouchengming@...edance.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        maarten.lankhorst@...ux.intel.com, mripard@...nel.org,
        tzimmermann@...e.de, airlied@...ux.ie, daniel@...ll.ch,
        dri-devel@...ts.freedesktop.org
Cc:     linux-kernel@...r.kernel.org, pmladek@...e.com,
        sergey.senozhatsky@...il.com, rostedt@...dmis.org,
        mingo@...hat.com, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        bsegall@...gle.com, mgorman@...e.de, songmuchun@...edance.com
Subject: Re: [External] Re: [PATCH 2/2] sched: mark
 PRINTK_DEFERRED_CONTEXT_MASK in __schedule()


在 2020/9/28 下午5:01, Peter Zijlstra 写道:
> On Mon, Sep 28, 2020 at 04:54:53PM +0800, Chengming Zhou wrote:
>> 在 2020/9/28 下午3:32, Peter Zijlstra 写道:
>>> On Mon, Sep 28, 2020 at 12:11:30AM +0800, Chengming Zhou wrote:
>>>> The WARN_ON/WARN_ON_ONCE with rq lock held in __schedule() should be
>>>> deferred by marking the PRINTK_DEFERRED_CONTEXT_MASK, or will cause
>>>> deadlock on rq lock in the printk path.
>>> It also shouldn't happen in the first place, so who bloody cares.
>> Yes, but if our box deadlock just because a WARN_ON_ONCE, we have to
>> reboot : (
> You have to reboot anyway to get into the fixed kernel.

Mostly, WARN_ON_ONCE happened in the perf code on our machines, we actually

don't care too much about the perf function works : )   Looks like we
have to find and

fix that perf bug before go on...

>> So these WARN_ON_ONCE have BUG_ON effect ?  Or we should change to use
>> BUG_ON ?
> Or use a sane printk implementation, I never suffer this.

Well, you are lucky. So it's a problem in our printk implementation.

The deadlock path is:

printk

  vprintk_emit

    console_unlock

      vt_console_print

        hide_cursor

          bit_cursor

            soft_cursor

              queue_work_on

                __queue_work

                  try_to_wake_up

                    _raw_spin_lock

                      native_queued_spin_lock_slowpath

Looks like it's introduced by this commit:

eaa434defaca1781fb2932c685289b610aeb8b4b

"drm/fb-helper: Add fb_deferred_io support"

    This adds deferred io support to drm_fb_helper.
    The fbdev framebuffer changes are flushed using the callback
    (struct drm_framebuffer *)->funcs->dirty() by a dedicated worker
    ensuring that it always runs in process context.
   
    For those wondering why we need to be able to handle atomic calling
    contexts: Both panic paths and cursor code and fbcon blanking can run
    from atomic. See
   
    commit bcb39af4486be07e896fc374a2336bad3104ae0a
    Author: Dave Airlie <airlied@...hat.com>
    Date:   Thu Feb 7 11:19:15 2013 +1000
   
        drm/udl: make usage as a console safer
   
    for where this was originally discovered.