linux-kernel - Re: [PATCH v2] sched/debug: Use sched_debug_lock to serialize use of cgroup


Message-ID: <626d056e-1489-d406-62cc-4b981ff94175@redhat.com>
Date:   Tue, 30 Mar 2021 13:43:23 -0400
From:   Waiman Long <longman@...hat.com>
To:     Daniel Thompson <daniel.thompson@...aro.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Bharata B Rao <bharata@...ux.vnet.ibm.com>,
        Phil Auld <pauld@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] sched/debug: Use sched_debug_lock to serialize use of
 cgroup_path[] only

On 3/30/21 6:42 AM, Daniel Thompson wrote:
> On Mon, Mar 29, 2021 at 03:32:35PM -0400, Waiman Long wrote:
>> The handling of sysrq keys should normally be done in a user context
>> except when MAGIC_SYSRQ_SERIAL is set and the magic sequence is typed
>> in a serial console.
> This seems to be a poor summary of the typical calling context for
> handle_sysrq() except in the trivial case of using
> /proc/sysrq-trigger.
>
> For example, on my system the backtrace when I do sysrq-h on a USB
> keyboard shows us running from a softirq handler with interrupts
> locked. Note also that the interrupt lock is present even on systems that
> handle keyboard input from a kthread due to the interrupt lock in
> report_input_key().
I will reword this part of the patch. I don't have a deep understanding
of how the different ways of keyboard input work; thanks for showing me
that there are other ways of getting keyboard input.
>
>> Currently in print_cpu() of kernel/sched/debug.c, sched_debug_lock is taken
>> with interrupts disabled for the whole duration of the calls to print_*_stats()
>> and print_rq(), which could last for quite some time if the information dump
>> happens on the serial console.
>>
>> If the system has many cpus and the sched_debug_lock is somehow busy
>> (e.g. parallel sysrq-t), the system may hit a hard lockup panic, like
> <snip>
>
>> The purpose of sched_debug_lock is to serialize the use of the global
>> cgroup_path[] buffer in print_cpu(). The rest of the printk() calls
>> don't need serialization from sched_debug_lock.
>>
>> Calling printk() with interrupts disabled can still be
>> problematic. Allocating a stack buffer of PATH_MAX bytes is not
>> feasible. So a compromise solution is used where a small stack buffer
>> is allocated for the pathname. If the actual pathname is short enough, it
>> is copied to the stack buffer with sched_debug_lock released afterward
>> before printk().  Otherwise, the global group_path[] buffer will be
>> used with sched_debug_lock held until after printk().
> Does this actually fix the problem in any circumstance except when the
> sysrq is triggered using /proc/sysrq-trigger?

I have a reproducer that generates a hard lockup panic when there are
multiple instances of sysrq-t via /proc/sysrq-trigger. This is probably
less of a problem on the console, as I don't think we can do multiple
simultaneous sysrq-t there. Anyway, my goal is to limit the amount of
time that irqs are disabled. Doing a printk can take a while depending on
whether there is contention in the underlying locks or resources. Even
if I limit the critical sections to just those printk() calls that output
the cgroup path, I can still cause the panic.

The approach used by this patch should minimize the chance of a panic
happening. However, if there are many tasks with very long cgroup paths,
I suppose that panic may still happen under some extreme conditions. So
I won't say this will completely fix the problem until the printk()
rework that makes printk work more like printk_deferred() is merged.

Cheers,
Longman
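
For readers following the thread, the compromise described in the quoted
patch text (copy a short path to a small stack buffer and drop the lock
before printing, otherwise print from the shared buffer with the lock
still held) can be sketched in user-space C. This is only an illustration
under stated assumptions, not the code from the patch: a pthread mutex
stands in for sched_debug_lock, snprintf() into a caller buffer stands in
for printk(), and the names debug_lock, global_path, SMALL_BUF_LEN,
fill_global_path() and print_group_path() are hypothetical. Interrupt
disabling cannot be modelled in user space and is left out.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define GLOBAL_PATH_MAX 4096 /* stands in for PATH_MAX */
#define SMALL_BUF_LEN   128  /* assumed size of the on-stack buffer */

/* Stand-ins: a mutex for sched_debug_lock, a static array for the shared path buffer. */
static pthread_mutex_t debug_lock = PTHREAD_MUTEX_INITIALIZER;
static char global_path[GLOBAL_PATH_MAX];

/* Hypothetical helper: fills the shared buffer. Caller must hold debug_lock. */
static void fill_global_path(const char *src)
{
	snprintf(global_path, sizeof(global_path), "%s", src);
}

/*
 * Format "cgroup: <path>\n" into out (snprintf() plays the role of printk()).
 * Returns true if the short-path branch was taken, i.e. the lock was
 * released before the output call.
 */
static bool print_group_path(const char *src, char *out, size_t outlen)
{
	char small[SMALL_BUF_LEN];
	bool used_small;

	pthread_mutex_lock(&debug_lock);
	fill_global_path(src);

	used_small = strlen(global_path) < sizeof(small);
	if (used_small) {
		/* Short path: take a private copy, then print without the lock. */
		strcpy(small, global_path);
		pthread_mutex_unlock(&debug_lock);
		snprintf(out, outlen, "cgroup: %s\n", small);
	} else {
		/* Long path: the shared buffer must stay protected while printing. */
		snprintf(out, outlen, "cgroup: %s\n", global_path);
		pthread_mutex_unlock(&debug_lock);
	}
	return used_small;
}
```

In this sketch the lock is held across the output call only for paths of
SMALL_BUF_LEN bytes or more, which mirrors the point made above: the
long-path case still prints under the lock, so the window is reduced
rather than eliminated.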