linux-kernel - Re: [PATCH] watchdog/hardlockup: Avoid large stack frames in watchdog_hardlockup


Message-ID: <CAD=FV=UFwBO-haMTFyZZbPqf-B0HHQ_aPqiaoVXPK-cQX3pnUg@mail.gmail.com>
Date:   Thu, 3 Aug 2023 16:10:24 -0700
From:   Doug Anderson <dianders@...omium.org>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Petr Mladek <pmladek@...e.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        kernel test robot <lkp@...el.com>,
        Lecopzer Chen <lecopzer.chen@...iatek.com>,
        Pingfan Liu <kernelfans@...il.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] watchdog/hardlockup: Avoid large stack frames in watchdog_hardlockup_check()

Hi,

On Thu, Aug 3, 2023 at 1:30 AM Michal Hocko <mhocko@...e.com> wrote:
>
> On Thu 03-08-23 10:12:12, Petr Mladek wrote:
> > On Wed 2023-08-02 07:12:29, Doug Anderson wrote:
> > > Hi,
> > >
> > > On Wed, Aug 2, 2023 at 12:27 AM Michal Hocko <mhocko@...e.com> wrote:
> > > >
> > > > On Tue 01-08-23 08:41:49, Doug Anderson wrote:
> > > > [...]
> > > > > Ah, I see what you mean. The one issue I have with your solution is
> > > > > that the ordering of the stack crawls is less ideal in the "dump all"
> > > > > case when cpu != this_cpu. We really want to see the stack crawl of
> > > > > the locked up CPU first and _then_ see the stack crawls of other CPUs.
> > > > > With your solution the locked up CPU will be interspersed with all the
> > > > > others and will be harder to find in the output (you've got to match
> > > > > it up with the "Watchdog detected hard LOCKUP on cpu N" message).
> > > > > While that's probably not a huge deal, it's nicer to make the output
> > > > > easy to understand for someone trying to parse it...
> > > >
> > > > Is it worth wasting memory for this arguably nicer output? Identifying
> > > > the stack of the locked up CPU is trivial.
> > >
> > > I guess it's debatable, but as someone who has spent time trawling
> > > through reports generated like this, I'd say "yes", it's
> > > super helpful in understanding the problem to have the hung CPU first.
> > > Putting the memory usage in perspective:
> >
> > nmi_trigger_cpumask_backtrace() has its own copy of the cpu mask.
> > What about changing the @exclude_self parameter to @exclude_cpu
> > and do:
> >
> >       if (exclude_cpu >= 0)
> >               cpumask_clear_cpu(exclude_cpu, to_cpumask(backtrace_mask));
> >
> >
> > It would require changing also arch_trigger_cpumask_backtrace() to
> >
> >       void arch_trigger_cpumask_backtrace(const struct cpumask *mask,
> >                                   int exclude_cpu);
> >
> > but it looks doable.
>
> Yes, but sparc is doing its own thing so it would require changing that
> as well. But this looks reasonable as well.

OK. I've tried a v3 with that:

https://lore.kernel.org/r/20230803160649.v3.2.I501ab68cb926ee33a7c87e063d207abf09b9943c@changeid

-Doug
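
For readers following the thread, here is a minimal userspace sketch of the
@exclude_cpu idea quoted above. It is not the kernel code and not the v3
patch: toy_cpumask_t, toy_mask_clear_cpu() and toy_backtrace_targets() are
hypothetical names invented for illustration, and a 64-bit integer stands in
for struct cpumask. It only shows how a negative value can mean "exclude
nobody" while a valid CPU number is cleared from the function's own copy of
the mask, so the caller does not need a second mask of its own.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for struct cpumask: one bit per CPU, at most 64 CPUs. */
typedef uint64_t toy_cpumask_t;

#define TOY_NR_CPUS 64

/* Return a copy of mask with the bit for cpu cleared. */
toy_cpumask_t toy_mask_clear_cpu(int cpu, toy_cpumask_t mask)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	return mask & ~(UINT64_C(1) << cpu);
}

/*
 * Return the set of CPUs that would be asked for a backtrace.
 * exclude_cpu < 0 means "exclude nobody" (assumed convention, taken
 * from the "if (exclude_cpu >= 0)" check in the quoted proposal).
 */
toy_cpumask_t toy_backtrace_targets(toy_cpumask_t mask, int exclude_cpu)
{
	/* Private copy, so the caller's mask is never modified. */
	toy_cpumask_t backtrace_mask = mask;

	if (exclude_cpu >= 0)
		backtrace_mask = toy_mask_clear_cpu(exclude_cpu, backtrace_mask);

	return backtrace_mask;
}
```

With CPUs 0-3 online (mask 0xF) and CPU 2 reported as locked up, a caller
could print CPU 2's stack first and then request the rest with
toy_backtrace_targets(0xF, 2), which leaves CPUs 0, 1 and 3 in the mask.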