Message-ID: <4014fe97-5875-f64a-7b68-854a2b08394e@redhat.com>
Date:   Sun, 4 Apr 2021 21:27:00 -0400
From:   Waiman Long <longman@...hat.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Bharata B Rao <bharata@...ux.vnet.ibm.com>,
        Phil Auld <pauld@...hat.com>,
        Daniel Thompson <daniel.thompson@...aro.org>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] sched/debug: Use sched_debug_lock to serialize use of
 cgroup_path[] only

On 4/4/21 12:02 PM, Steven Rostedt wrote:
> On Fri, 2 Apr 2021 23:09:09 -0400
> Waiman Long <longman@...hat.com> wrote:
>
>> The main problem with sched_debug_lock is that under certain
>> circumstances, a lock waiter may wait a long time to acquire the lock
>> (in seconds). We can't insert touch_nmi_watchdog() while the cpu is
>> waiting for the spinlock.
> The problem I have with the patch is that it seems to be a hack (as it
> doesn't fix the issue in all cases). Since sched_debug_lock is
> "special", perhaps we can add wrappers to take it, and instead of doing
> the spin_lock_irqsave(), do a trylock loop. Add lockdep annotation to
> tell lockdep that this is not a try lock (so that it can still detect
> deadlocks).
>
> Then have the strategically placed touch_nmi_watchdog() also increment
> a counter. Then in that trylock loop, if it sees the counter get
> incremented, it knows that forward progress is being made by the lock
> holder, and it too can call touch_nmi_watchdog().
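
If I read the suggestion right, the wrappers would be roughly along these
lines (just a sketch to restate the idea; sched_debug_progress and the
helper names are made up, and the lockdep annotation is left out):

	#include <linux/spinlock.h>
	#include <linux/atomic.h>
	#include <linux/nmi.h>

	/* the existing lock in kernel/sched/debug.c */
	static DEFINE_SPINLOCK(sched_debug_lock);
	static atomic_t sched_debug_progress = ATOMIC_INIT(0);

	/* would replace the strategically placed touch_nmi_watchdog() calls */
	static inline void sched_debug_touch_nmi(void)
	{
		atomic_inc(&sched_debug_progress);
		touch_nmi_watchdog();
	}

	/*
	 * Trylock loop standing in for spin_lock_irqsave(&sched_debug_lock,
	 * flags).  A real version would add a lockdep annotation so this is
	 * not treated as a trylock.
	 */
	#define sched_debug_lock_irqsave(flags)					\
	do {									\
		int __seen = atomic_read(&sched_debug_progress);		\
										\
		while (!spin_trylock_irqsave(&sched_debug_lock, flags)) {	\
			int __now = atomic_read(&sched_debug_progress);		\
										\
			if (__now != __seen) {					\
				/* holder made progress; pet watchdog too */	\
				__seen = __now;					\
				touch_nmi_watchdog();				\
			}							\
			cpu_relax();						\
		}								\
	} while (0)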

Thanks for the suggestion, but it also sounds complicated.

I think we can fix this lockup problem if we are willing to lose some
information in case of contention. As you suggested, a trylock will be
used to acquire sched_debug_lock. If it succeeds, all is good.
Otherwise, a shorter stack buffer will be used for the cgroup path,
which may then be truncated. If we detect that the full length of the
buffer has been used, we assume truncation and append, e.g., "...", to
indicate there is more to the actual path.
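
Roughly something like this, in the context of kernel/sched/debug.c
(just a sketch, not the actual patch; the helper name, buffer size and
printing are illustrative only):

	/* existing shared state in kernel/sched/debug.c */
	static DEFINE_SPINLOCK(sched_debug_lock);
	static char group_path[PATH_MAX];

	static void print_tg_path(struct seq_file *m, struct task_group *tg)
	{
		unsigned long flags;
		char buf[128];

		if (spin_trylock_irqsave(&sched_debug_lock, flags)) {
			/* Uncontended: use the full-size shared buffer. */
			cgroup_path(tg->css.cgroup, group_path, sizeof(group_path));
			seq_printf(m, "%s", group_path);
			spin_unlock_irqrestore(&sched_debug_lock, flags);
		} else {
			/* Contended: fall back to the short on-stack buffer. */
			cgroup_path(tg->css.cgroup, buf, sizeof(buf));
			/*
			 * If the path fills the whole buffer, assume it was
			 * truncated and mark it with a trailing "...".
			 */
			if (strlen(buf) == sizeof(buf) - 1)
				strcpy(buf + sizeof(buf) - 4, "...");
			seq_printf(m, "%s", buf);
		}
	}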

Do you think this is an acceptable compromise?

Cheers,
Longman
