lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.22.394.2106262136340.2111427@anisinha-lenovo>
Date:   Sat, 26 Jun 2021 21:48:40 +0530 (IST)
From:   Ani Sinha <ani@...sinha.ca>
To:     Thomas Gleixner <tglx@...utronix.de>
cc:     Ani Sinha <ani@...sinha.ca>, linux-kernel@...r.kernel.org,
        anirban.sinha@...ia.com, Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Frederic Weisbecker <fweisbec@...il.com>
Subject: Re: [PATCH v3] Add kernel logs when sched clock unstable and NO_HZ_FULL
 is not possible



On Tue, 22 Jun 2021, Thomas Gleixner wrote:

> On Sun, Jun 20 2021 at 15:43, Ani Sinha wrote:
>
> > Commit 4f49b90abb4aca ("sched-clock: Migrate to use new tick
> > dependency mask model") had also removed the kernel warning
> > message informing the user that it was not possible to turn
> > on NO_HZ_FULL. Adding back that log message here. It is
> > unhelpful when the kernel turns off NO_HZ_FULL silently
> > without informing anyone.
> > Also added a kernel log when sched clock is marked as unstable.
>
> Don't do two things at once. See Documentation/process/....

Ok I will split up this patch into two.

>
> Also your subject line want's a proper prefix.

OK will fix.

>
> > diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> > index c2b2859ddd82..9f9fe658f8a5 100644
> > --- a/kernel/sched/clock.c
> > +++ b/kernel/sched/clock.c
> > @@ -192,8 +192,11 @@ void clear_sched_clock_stable(void)
> >
> >  	smp_mb(); /* matches sched_clock_init_late() */
> >
> > -	if (static_key_count(&sched_clock_running.key) == 2)
> > +	if (static_key_count(&sched_clock_running.key) == 2) {
> > +		WARN_ONCE(sched_clock_stable(),
> > +			  "sched clock is now marked unstable.");
>
> What's the WARN for here? That backtrace is largely uninteresting.

OK to be consistent with other parts of the code, I will replace this with
pr_warn()

>
> > -	if (can_stop_full_tick(cpu, ts))
> > +	if (can_stop_full_tick(cpu, ts)) {
> >  		tick_nohz_stop_sched_tick(ts, cpu);
> > -	else if (ts->tick_stopped)
> > -		tick_nohz_restart_sched_tick(ts, ktime_get());
> > +	} else {
> > +		/*
> > +		 * Don't allow the user to think they can get
> > +		 * full NO_HZ with this machine.
> > +		 */
> > +		WARN_ONCE(tick_nohz_full_running,
> > +			  "NO_HZ_FULL will not work for the current system.");
>
> can_stop_full_tick() returning false can be transient and then the user
> still has no idea _why_ this is printed.
>
> Also assume the user/admin starts perf and knows he's going to disturb
> NOHZ full, then _why_ would he be interested in that warning.
>

If NOHZ is disabled intentionally, that is not an interesting case. I am
worried about the situation where the user specifies NOHZ option in the
kernel commandline and the kernel silently disabled this because of one or
more limitations in the system. I want to address this. All the reasons
specified in the following commit is still true:

commit e12d0271774fea9fddf1e2a7952a0bffb2ee8e8b
Author: Steven Rostedt <rostedt@...dmis.org>
Date:   Fri May 10 17:12:28 2013 -0400

    nohz: Warn if the machine can not perform nohz_full

    If the user configures NO_HZ_FULL and defines nohz_full=XXX on the
    kernel command line, or enables NO_HZ_FULL_ALL, but nohz fails
    due to the machine having a unstable clock, warn about it.

    We do not want users thinking that they are getting the benefit
    of nohz when their machine can not support it.

    Signed-off-by: Steven Rostedt <rostedt@...dmis.org>
    Cc: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
    Cc: Ingo Molnar <mingo@...nel.org>
    Cc: Andrew Morton <akpm@...ux-foundation.org>
    Cc: Thomas Gleixner <tglx@...utronix.de>
    Cc: H. Peter Anvin <hpa@...or.com>
    Cc: Peter Zijlstra <peterz@...radead.org>
    Cc: Borislav Petkov <bp@...en8.de>
    Cc: Li Zhong <zhong@...ux.vnet.ibm.com>
    Signed-off-by: Frederic Weisbecker <fweisbec@...il.com>

This log was removed from the kernel in commit

commit 4f49b90abb4aca6fe677c95fc352fd0674d489bd
Author: Frederic Weisbecker <fweisbec@...il.com>
Date:   Wed Jul 22 17:03:52 2015 +0200

    sched-clock: Migrate to use new tick dependency mask model

    Instead of checking sched_clock_stable from the nohz subsystem to
verify
    its tick dependency, migrate it to the new mask in order to include it
    to the all-in-one check.


and it seems to me that the removal of the log defeats the purpose that
commit e12d0271774fea9fddf tried to serve.

Please enlighten me as to how to achieve this and I will cook up
something.

Ani

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ