linux-kernel - Re: [PATCH 4/5] kernel/watchdog: Adapt the watchdog

Message-ID: <YhygkafOHc6eeP9f@alley>
Date:   Mon, 28 Feb 2022 11:14:41 +0100
From:   Petr Mladek <pmladek@...e.com>
To:     Lecopzer Chen <lecopzer.chen@...iatek.com>
Cc:     acme@...nel.org, akpm@...ux-foundation.org,
        alexander.shishkin@...ux.intel.com, catalin.marinas@....com,
        davem@...emloft.net, jolsa@...hat.com, jthierry@...hat.com,
        keescook@...omium.org, kernelfans@...il.com,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        linux-mediatek@...ts.infradead.org,
        linux-perf-users@...r.kernel.org, mark.rutland@....com,
        masahiroy@...nel.org, matthias.bgg@...il.com, maz@...nel.org,
        mcgrof@...nel.org, mingo@...hat.com, namhyung@...nel.org,
        nixiaoming@...wei.com, peterz@...radead.org,
        sparclinux@...r.kernel.org, sumit.garg@...aro.org,
        wangqing@...o.com, will@...nel.org, yj.chiang@...iatek.com
Subject: Re: [PATCH 4/5] kernel/watchdog: Adapt the watchdog_hld interface
 for async model

On Sat 2022-02-26 18:52:29, Lecopzer Chen wrote:
> > On Sat 2022-02-12 18:43:48, Lecopzer Chen wrote:
> > > From: Pingfan Liu <kernelfans@...il.com>
> > > 
> > > from: Pingfan Liu <kernelfans@...il.com>
> > > 
> > > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > > yet. E.g. on arm64, PMU is not ready until
> > > device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
> > > with the driver model and cpuhp. Hence it is hard to push this
> > > initialization before smp_init().
> > > 
> > > But it is easy to take an opposite approach by enabling watchdog_hld to
> > > get the capability of PMU async.
> > > 
> > > The async model is achieved by expanding watchdog_nmi_probe() with
> > > -EBUSY, and a re-initializing work_struct which waits on a wait_queue_head.
> > > 
> > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > > index b71d434cf648..fa8490cfeef8 100644
> > > --- a/kernel/watchdog.c
> > > +++ b/kernel/watchdog.c
> > > @@ -839,16 +843,64 @@ static void __init watchdog_sysctl_init(void)
> > >  #define watchdog_sysctl_init() do { } while (0)
> > >  #endif /* CONFIG_SYSCTL */
> > >  
> > > +static void lockup_detector_delay_init(struct work_struct *work);
> > > +enum hld_detector_state detector_delay_init_state __initdata;
> > 
> > I would call this "lockup_detector_init_state" to use the same
> > naming scheme everywhere.
> > 
> > > +
> > > +struct wait_queue_head hld_detector_wait __initdata =
> > > +		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
> > > +
> > > +static struct work_struct detector_work __initdata =
> > 
> > I would call this "lockup_detector_work" to use the same naming scheme
> > everywhere.
> 
> For the naming part, I'll revise both of them in next patch.
> 
> > 
> > > +		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
> > > +
> > > +static void __init lockup_detector_delay_init(struct work_struct *work)
> > > +{
> > > +	int ret;
> > > +
> > > +	wait_event(hld_detector_wait,
> > > +			detector_delay_init_state == DELAY_INIT_READY);
> > 
> > DELAY_INIT_READY is defined in the 5th patch.
> > 
> > There are many other build errors because this patch uses something
> > that is defined in the 5th patch.
> 
> Thanks for pointing this out, I'll fix the 4th and 5th patches to correct the order.
> 
> > 
> > > +	ret = watchdog_nmi_probe();
> > > +	if (!ret) {
> > > +		nmi_watchdog_available = true;
> > > +		lockup_detector_setup();
> > > +	} else {
> > > +		WARN_ON(ret == -EBUSY);
> > 
> > Why WARN_ON(), please?
> > 
> > Note that it might cause panic() when "panic_on_warn" command line
> > parameter is used.
> > 
> > Also the backtrace will not help much. The context is well known.
> > This code is called from a workqueue worker.
>  
> The motivation to WARN should be:
> 
> lockup_detector_init
> -> watchdog_nmi_probe return -EBUSY
> -> lockup_detector_delay_init checks (detector_delay_init_state == DELAY_INIT_READY)
> -> watchdog_nmi_probe checks
> +	if (detector_delay_init_state != DELAY_INIT_READY)
> +		return -EBUSY;
> 
> Since we first check that detector_delay_init_state equals DELAY_INIT_READY,
> then go into watchdog_nmi_probe() and check detector_delay_init_state again
> because we now move from the common code to the arch code,
> there shouldn't be any race on detector_delay_init_state in this condition.
> If an unknown race does happen, we show a warning for it.

There should not be any race.

     wait_event(hld_detector_wait,
		detector_delay_init_state == DELAY_INIT_READY);

waits until it is woken by lockup_detector_check(). Well, it could
wait forever when lockup_detector_check() is called earlier, see below.


> I think it makes sense to remove the WARN now because it looks verbose...
> However, I would rather change the following printk to
> "Delayed init for lockup detector failed."

I would print both messages. The above message says what failed.


> > > +		pr_info("Perf NMI watchdog permanently disabled\n");

And this message explains the result of the above failure, which is
not obvious otherwise.

> > > +	}
> > > +}
> > > +
> > > +/* Ensure the check is called after the initialization of PMU driver */
> > > +static int __init lockup_detector_check(void)
> > > +{
> > > +	if (detector_delay_init_state < DELAY_INIT_WAIT)
> > > +		return 0;
> > > +
> > > +	if (WARN_ON(detector_delay_init_state == DELAY_INIT_WAIT)) {
> > 
> > Again. Is WARN_ON() needed?
> > 
> > Also the condition looks wrong. IMHO, this is the expected state.
> > 
> 
> This does expect DELAY_INIT_READY here, which means that
> everyone who comes here to be checked should be READY. We WARN if you're
> still in the WAIT state, which means the previous lockup_detector_delay_init()
> failed.

No, DELAY_INIT_READY is set below. DELAY_INIT_WAIT is a valid value here.
It means that the lockup_detector_delay_init() work is queued.


> IMO, either keeping or removing WARN is fine with me.
> 
> I think I'll remove WARN and add
> pr_info("Delayed init checking for lockup detector failed, retry for once.");
> inside the `if (detector_delay_init_state == DELAY_INIT_WAIT)`
> 
> Or would you have any other suggestion? thanks.
> 
> > > +		detector_delay_init_state = DELAY_INIT_READY;
> > > +		wake_up(&hld_detector_wait);

I see another problem now. We should always call the wake up here
when the work was queued. Otherwise, the worker will stay blocked
forever.

The worker will also get blocked when the late_initcall is called
before the work is processed by a worker.

> > > +	}
> > > +	flush_work(&detector_work);
> > > +	return 0;
> > > +}
> > > +late_initcall_sync(lockup_detector_check);


OK, I think that the three states are too complicated. I suggest
using only a single bool. Something like:

static bool lockup_detector_pending_init __initdata;

struct wait_queue_head lockup_detector_wait __initdata =
		__WAIT_QUEUE_HEAD_INITIALIZER(lockup_detector_wait);

static void lockup_detector_delay_init(struct work_struct *work);

static struct work_struct lockup_detector_work __initdata =
		__WORK_INITIALIZER(lockup_detector_work,
				   lockup_detector_delay_init);

static void __init lockup_detector_delay_init(struct work_struct *work)
{
	int ret;

	wait_event(lockup_detector_wait, !lockup_detector_pending_init);

	ret = watchdog_nmi_probe();
	if (ret) {
		pr_info("Delayed init of the lockup detector failed: %d\n", ret);
		pr_info("Perf NMI watchdog permanently disabled\n");
		return;
	}

	nmi_watchdog_available = true;
	lockup_detector_setup();
}

/*
 * Trigger the delayed init. Ensure the check is called after the
 * initialization of the PMU driver.
 */
static int __init lockup_detector_check(void)
{
	if (!lockup_detector_pending_init)
		return 0;

	lockup_detector_pending_init = false;
	wake_up(&lockup_detector_wait);
	return 0;
}
late_initcall_sync(lockup_detector_check);

void __init lockup_detector_init(void)
{
	int ret;

	if (tick_nohz_full_enabled())
		pr_info("Disabling watchdog on nohz_full cores by default\n");

	cpumask_copy(&watchdog_cpumask,
		     housekeeping_cpumask(HK_FLAG_TIMER));

	ret = watchdog_nmi_probe();
	if (!ret) {
		nmi_watchdog_available = true;
	} else if (ret == -EBUSY) {
		lockup_detector_pending_init = true;
		/* Init must be done in a process context on a bound CPU. */
		queue_work_on(smp_processor_id(), system_wq,
			      &lockup_detector_work);
	}

	lockup_detector_setup();
	watchdog_sysctl_init();
}

The result is that the lockup_detector_delay_init() worker will never
stay blocked forever. There are two possibilities:

1. lockup_detector_delay_init() is called before lockup_detector_check().
   In this case, wait_event() will wait until lockup_detector_check()
   clears lockup_detector_pending_init and calls wake_up().

2. lockup_detector_check() is called before lockup_detector_delay_init().
   In this case, wait_event() will immediately continue because
   it will see lockup_detector_pending_init already cleared.
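
The two orderings above can be tried out in user space. The following is
only a rough sketch using pthreads, not kernel code: the mutex/condvar pair
stands in for the wait queue, and all names in it (pending_init, wait_q,
delay_init_worker(), check(), run_scenario()) are made up for the
illustration. It assumes the worker re-checks the flag under the lock, which
is what makes the "check first" ordering safe.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wait_q = PTHREAD_COND_INITIALIZER;  /* hypothetical stand-in for the wait queue */
static pthread_cond_t started = PTHREAD_COND_INITIALIZER; /* test helper: worker reached its wait */
static bool pending_init;   /* hypothetical stand-in for the single bool */
static bool worker_waiting; /* set by the worker right before it may block */
static bool init_done;      /* set once the simulated delayed init has run */

/* Stand-in for the delayed-init work: block until the flag is cleared. */
static void *delay_init_worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	worker_waiting = true;
	pthread_cond_signal(&started);
	while (pending_init)               /* condition is re-checked after every wake-up */
		pthread_cond_wait(&wait_q, &lock);
	init_done = true;                  /* the real code would probe the PMU here */
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Stand-in for the late check: clear the flag, then wake the waiter. */
static void check(void)
{
	pthread_mutex_lock(&lock);
	if (pending_init) {
		pending_init = false;
		pthread_cond_broadcast(&wait_q);
	}
	pthread_mutex_unlock(&lock);
}

/* Run one ordering; returns 1 if the worker finished instead of staying blocked. */
static int run_scenario(bool check_first)
{
	pthread_t worker;
	int rc;

	pending_init = true;
	worker_waiting = false;
	init_done = false;

	if (check_first)
		check();                   /* case 2: flag is already clear when the worker starts */

	rc = pthread_create(&worker, NULL, delay_init_worker, NULL);
	assert(rc == 0);

	if (!check_first) {
		/* case 1: make sure the worker is blocked before the check runs */
		pthread_mutex_lock(&lock);
		while (!worker_waiting)
			pthread_cond_wait(&started, &lock);
		pthread_mutex_unlock(&lock);
		check();
	}

	pthread_join(worker, NULL);
	return init_done ? 1 : 0;
}
```

Calling run_scenario(false) exercises case 1 and run_scenario(true)
exercises case 2. In both orderings the worker is expected to finish, so
pthread_join() returns rather than hanging.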


Best Regards,
Petr