Message-ID: <CAHp75VdZF-Qi-9ahhXTLxdQqVb7wBJu7KqjD8xj6byVC5W+-yw@mail.gmail.com>
Date: Wed, 10 Apr 2024 06:59:30 +0300
From: Andy Shevchenko <andy.shevchenko@...il.com>
To: liu.yec@....com
Cc: daniel.thompson@...aro.org, gregkh@...uxfoundation.org,
dianders@...omium.org, jason.wessel@...driver.com, jirislaby@...nel.org,
kgdb-bugreport@...ts.sourceforge.net, linux-kernel@...r.kernel.org,
linux-serial@...r.kernel.org
Subject: Re: [PATCH V10] kdb: Fix the deadlock issue in KDB debugging.
On Wed, Apr 10, 2024 at 5:07 AM <liu.yec@....com> wrote:
>
> From: LiuYe <liu.yeC@....com>
>
> Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
> attempt to use schedule_work() to provoke a keyboard reset when
> transitioning out of the debugger and back to normal operation.
> This can cause deadlock because schedule_work() is not NMI-safe.
>
> The stack trace below shows an example of the problem. In this
> case the master CPU is not running from NMI but it has parked
> the slave CPUs using an NMI and the parked CPUs are holding
> spinlocks needed by schedule_work().
>
> Example:
> BUG: spinlock lockup suspected on CPU#0. owner_cpu: 1
> CPU1: Call Trace:
> __schedule
> schedule
> schedule_hrtimeout_range_clock
> mutex_unlock
> ep_scan_ready_list
> schedule_hrtimeout_range
> ep_poll
> wake_up_q
> SyS_epoll_wait
> entry_SYSCALL_64_fastpath
>
> CPU0: Call Trace:
> dump_stack
> spin_dump
> do_raw_spin_lock
> _raw_spin_lock
> try_to_wake_up
> wake_up_process
> insert_work
> __queue_work
> queue_work_on
> kgdboc_post_exp_handler
> kgdb_cpu_enter
> kgdb_handle_exception
> __kgdb_notify
> kgdb_notify
> notifier_call_chain
> notify_die
> do_int3
> int3
>
> We fix the problem by using irq_work to call schedule_work()
> instead of calling it directly. The work queue is still needed
> because we cannot resynchronize the keyboard state from the
> hardirq context provided by irq_work. This must be done from
> the task context in order to call the input subsystem.
>
> Therefore, we have to defer the work twice. First, safely
> switch from the debug trap context (similar to NMI) to the
> hardirq, and then switch from the hardirq to the system work queue.
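
The two-stage deferral described in the quoted text can be modelled in plain user-space C. This is only a rough sketch and not the patch itself: every name below (the ctx enum, debugger_exit(), run_hardirq(), run_worker() and the pending flags) is hypothetical, and the flags merely stand in for what irq_work and the system work queue would do in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Execution contexts of this user-space model (hypothetical names). */
enum ctx { CTX_DEBUG_TRAP, CTX_HARDIRQ, CTX_TASK };

static enum ctx current_ctx = CTX_TASK;
static bool irq_work_pending;	/* stage 1: stands in for a queued irq_work */
static bool work_pending;	/* stage 2: stands in for a scheduled work item */
static int keyboard_resets;	/* number of completed resynchronizations */

/* Stage 2 handler: must run in task context to reach the input layer. */
static void restore_input_work(void)
{
	assert(current_ctx == CTX_TASK);
	keyboard_resets++;
}

/* Stage 1 handler: runs in hardirq and only hands off to the work queue. */
static void queue_restore_input(void)
{
	assert(current_ctx == CTX_HARDIRQ);
	work_pending = true;
}

/* Leaving the debugger: set a flag only, take no locks, wake no tasks. */
static void debugger_exit(void)
{
	current_ctx = CTX_DEBUG_TRAP;
	irq_work_pending = true;
	current_ctx = CTX_TASK;	/* normal execution resumes */
}

/* Models the interrupt that later drains the pending irq_work. */
static void run_hardirq(void)
{
	current_ctx = CTX_HARDIRQ;
	if (irq_work_pending) {
		irq_work_pending = false;
		queue_restore_input();
	}
	current_ctx = CTX_TASK;
}

/* Models a worker thread draining the work queue in task context. */
static void run_worker(void)
{
	current_ctx = CTX_TASK;
	if (work_pending) {
		work_pending = false;
		restore_input_work();
	}
}
```

Calling debugger_exit(), then run_hardirq(), then run_worker() performs exactly one reset. Calling run_worker() before run_hardirq() does nothing, because the work item is only queued from the hardirq stage.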
..
> Signed-off-by: Greg KH <gregkh@...uxfoundation.org>
> Signed-off-by: Andy Shevchenko <andy.shevchenko@...il.com>
> V9 -> V10 : Add Signed-off-by of Greg KH and Andy Shevchenko, Acked-by of Daniel Thompson
Huh?!
--
With Best Regards,
Andy Shevchenko