lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240301105931.GB5795@aspen.lan>
Date: Fri, 1 Mar 2024 10:59:31 +0000
From: Daniel Thompson <daniel.thompson@...aro.org>
To: Liuye <liu.yeC@....com>
Cc: "jason.wessel@...driver.com" <jason.wessel@...driver.com>,
	"dianders@...omium.org" <dianders@...omium.org>,
	"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
	"jirislaby@...nel.org" <jirislaby@...nel.org>,
	"kgdb-bugreport@...ts.sourceforge.net" <kgdb-bugreport@...ts.sourceforge.net>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-serial@...r.kernel.org" <linux-serial@...r.kernel.org>
Subject: Re: 答复: [PATCH] kdb: Fix the
 deadlock issue in KDB debugging.

On Fri, Mar 01, 2024 at 03:30:25AM +0000, Liuye wrote:
> >On Wed, Feb 28, 2024 at 10:56:02AM +0800, LiuYe wrote:
> >> master cpu : After executing the go command, a deadlock occurs.
> >> slave cpu: may be performing thread migration, acquiring the
> >> running queue lock of master CPU.  Then it was interrupted by kdb
> >> NMI and entered the nmi_handler process.  (nmi_handle->
> >> kgdb_nmicallback-> kgdb_cpu_enter while(1){ touch wathcdog}.)
> >
> >I think this description is a little short and doesn't clearly
> >explain the cause. How about:
> >
> >Currently, if kgdboc includes 'kdb', then kgdboc will attempt to use
> >schedule_work() to provoke a keyboard reset when transitioning out of
> >the debugger and back to normal operation. This can cause deadlock
> >because schedule_work() is not NMI-safe.
> >
> >The stack trace below shows an example of the problem. In this case
> >the master cpu is not running from NMI but it has parked the slace
> >CPUs using an NMI and the parked CPUs is holding spinlocks needed by
> >schedule_work().
>
> Due to the brevity of the description, there may be some
> misunderstanding, so a detailed description is provided as follows:

So, there is a small mistake in the example description I provided.
After double checking the code it should start slightly differently:
  "Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
  attempt to use schedule_work() ...".

However other than that I think it is correct.

The important bit of feedback here is that the patch description should
commence with a description of the bug rather than a description of the
symptom. In this case the bug is kgdboc calls a function that is not
safe to call from this calling context.

It is really useful to describe the symptom as part of the patch
description. However if we focus on the symptom without additional
code review then we can end up with the wrong fix. That is what
happened here. It is unsafe to call schedule_work() and checking
the runqueue locks is insufficient to make it safe because we are
still calling a function from an inappropriate calling context..


Daniel.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ