lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1b9b0482-9a6c-4182-ae87-9202f15725ca@quicinc.com>
Date: Wed, 11 Dec 2024 18:45:39 +0800
From: Zhongqiu Han <quic_zhonhan@...cinc.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: <mingo@...hat.com>, <juri.lelli@...hat.com>, <vincent.guittot@...aro.org>,
        <dietmar.eggemann@....com>, <rostedt@...dmis.org>,
        <bsegall@...gle.com>, <mgorman@...e.de>, <vschneid@...hat.com>,
        <linux-kernel@...r.kernel.org>,
        Zhongqiu Han <quic_zhonhan@...cinc.com>
Subject: Re: [RFC PATCH] sched/core: Enhanced debug logs in do_task_dead()

On 12/10/2024 10:18 PM, Peter Zijlstra wrote:
> On Tue, Dec 10, 2024 at 09:45:13PM +0800, Zhongqiu Han wrote:
>> If BUG() is a NOP, dump the problematic stack for debugging purposes.
>>
>> Signed-off-by: Zhongqiu Han <quic_zhonhan@...cinc.com>
>> ---
>> If BUG() is a NOP, it should make sense for debugging purposes. However,
>> just arising the patch with RFC, because at least for now, I haven't found
>> a definition of BUG() as NOP in various architectures. Thanks~
> 
> Yeah, this don't make sense. If you want a stack-trace you shouldn't
> have killed BUG.
> 
> And yeah, having done a quick peek, I don't see how you can kill BUG
> these days other than explicitly modyfing your source, in which case you
> get to keep the pieces.

Hi Peter,
Thanks a lot for your review and reply!

My initial thought was that if BUG() is a NOP, we should dump the stack
for debugging purposes before executing cpu_relax(), because according
to the current do_task_dead() code comments, BUG() might be a NOP in
some cases.
void __noreturn do_task_dead(void)
{
         /* Causes final put_task_struct in finish_task_switch(): */
         set_special_state(TASK_DEAD);

         /* Tell freezer to ignore us: */
         current->flags |= PF_NOFREEZE;

         __schedule(SM_NONE);
         BUG();

         /* Avoid "noreturn function does return" - but don't continue if
BUG() is a NOP: */-------------------> This comments
         for (;;)
                 cpu_relax();
}

For example, in the code _before_ 7/4/2014 commit a4b5d580e078 ('bug:
Make BUG() always stop the machine')., BUG() is indeed a NOP if (
!CONFIG_BUG && !HAVE_ARCH_BUG)

  #else /* !CONFIG_BUG */
  #ifndef HAVE_ARCH_BUG
-#define BUG() do {} while (0)
+#define BUG() do {} while (1)
  #endif



However, since 10/4/2024 commit 5284984a4fba ('bug: Fix no-return
statement warning with !CONFIG_BUG'), with !CONFIG_BUG, the default
BUG() is implemented to call unreachable() to ensure it does not return,
and if all architecture-specific implementations of BUG() also do not
return, I now think my thought should be meaningless.

commit 5284984a4fba ('bug: Fix no-return-statement warning with 
!CONFIG_BUG')
  #else /* !CONFIG_BUG */
  #ifndef HAVE_ARCH_BUG
-#define BUG() do {} while (1)
+#define BUG() do {             \
+       do {} while (1);        \
+       unreachable();          \
+} while (0)
  #endif


By the way, do you think it is reasonable to remove the cpu_relax code
by evaluating whether BUG() does not return in all architectures and
configurations? Additionally, _if_ some architecture-specific
implementations of BUG() do not have the capability to output the thread
stack, should we consider using the panic() function as a replacement?

Thanks for your time.



-- 
Thx and BRs,
Zhongqiu Han

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ