Message-ID: <20210417235136.GD5006@paulmck-ThinkPad-P17-Gen-1>
Date:   Sat, 17 Apr 2021 16:51:36 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     linux-kernel@...r.kernel.org, john.stultz@...aro.org,
        sboyd@...nel.org, corbet@....net, Mark.Rutland@....com,
        maz@...nel.org, kernel-team@...com, neeraju@...eaurora.org,
        ak@...ux.intel.com, Chris Mason <clm@...com>
Subject: Re: [PATCH v8 clocksource 3/5] clocksource: Check per-CPU clock
 synchronization when marked unstable

On Sat, Apr 17, 2021 at 02:47:18PM +0200, Thomas Gleixner wrote:
> On Tue, Apr 13 2021 at 21:36, Paul E. McKenney wrote:
> 
> Bah, hit send too quick.
> 
> > +	cpumask_clear(&cpus_ahead);
> > +	cpumask_clear(&cpus_behind);
> > +	preempt_disable();
> 
> Daft. 

Would migrate_disable() be better?

Yes, I know, in virtual environments, the hypervisor can migrate anyway,
but this does limit the potential damage to one out of the two schedulers.

> > +	testcpu = smp_processor_id();
> > +	pr_warn("Checking clocksource %s synchronization from CPU %d.\n", cs->name, testcpu);
> > +	for_each_online_cpu(cpu) {
> > +		if (cpu == testcpu)
> > +			continue;
> > +		csnow_begin = cs->read(cs);
> > +		smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 1);
> > +		csnow_end = cs->read(cs);
> 
> As this must run with interrupts enabled, that's a pretty rough
> approximation like measuring wind speed with a wet thumb.
> 
> Wouldn't it be smarter to let the remote CPU do the watchdog dance and
> take that result? i.e. split out more of the watchdog code so that you
> can get the nanoseconds delta on that remote CPU to the watchdog.

First, an interrupt, NMI, SMI, vCPU preemption, or similar delay cannot
cause a false positive, because it only widens the window between the two
local reads.  A false negative, perhaps, but no false positives.  Second,
in normal operation, these are rare, so that
hitting the (eventual) default of eight CPUs is very likely to result in
tight bounds on the delay-based error for most of those CPUs.  Third,
we really need to compare the TSC on one CPU to the TSC on the other
in order to have a very clear indication of a problem, should a real
TSC-synchronization issue arise.  In contrast, comparisons against the
watchdog timer will be more complicated and any errors detected will be
quite hard to prove to be due to TSC issues.

Or am I once again missing something?
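
To illustrate the first point, here is a minimal user-space sketch of the
begin/mid/end comparison.  It is not the kernel code: the struct and both
function names are hypothetical, the counter is assumed to be a full 64 bits
wide (so no cs->mask handling), and the remote read is modeled as the true
time plus a fixed offset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not a kernel structure. */
struct skew_result {
	bool behind;     /* remote read is earlier than the first local read */
	bool ahead;      /* remote read is later than the second local read */
	uint64_t window; /* end - begin: bounds the skew that can go unnoticed */
};

/* Classify one begin/mid/end triple, assuming a full 64-bit counter. */
struct skew_result classify_skew(uint64_t begin, uint64_t mid, uint64_t end)
{
	struct skew_result r;

	r.behind = (int64_t)(mid - begin) < 0;
	r.ahead = (int64_t)(end - mid) < 0;
	r.window = end - begin;
	return r;
}

/*
 * Model one measurement.  The local CPU reads at true time t0, the remote
 * CPU reads delay_before cycles later with its counter off by
 * remote_offset, and the local CPU reads again delay_after cycles after
 * that.  Interrupts, SMIs and the like only make the delays larger.
 */
struct skew_result simulate_check(uint64_t t0, int64_t remote_offset,
				  uint64_t delay_before, uint64_t delay_after)
{
	uint64_t begin = t0;
	uint64_t mid = t0 + delay_before + (uint64_t)remote_offset;
	uint64_t end = t0 + delay_before + delay_after;

	return classify_skew(begin, mid, end);
}
```

With a zero offset, no choice of delays sets either flag; larger delays only
grow the window.  With a nonzero offset, a large enough delay can hide the
skew, which is the false-negative case.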

> > +		delta = (s64)((csnow_mid - csnow_begin) & cs->mask);
> > +		if (delta < 0)
> > +			cpumask_set_cpu(cpu, &cpus_behind);
> > +		delta = (csnow_end - csnow_mid) & cs->mask;
> > +		if (delta < 0)
> > +			cpumask_set_cpu(cpu, &cpus_ahead);
> > +		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
> > +		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
> 
> > +		if (firsttime || cs_nsec > cs_nsec_max)
> > +			cs_nsec_max = cs_nsec;
> > +		if (firsttime || cs_nsec < cs_nsec_min)
> > +			cs_nsec_min = cs_nsec;
> > +		firsttime = 0;
> 
>   int64_t cs_nsec_max = 0, cs_nsec_min = LLONG_MAX;
> 
> and then the firsttime muck is not needed at all.

Good point, will fix!
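
For reference, a minimal user-space sketch of the suggested initialization,
tracking the minimum and maximum without a first-iteration flag.  The struct
and function names are hypothetical, INT64_MAX stands in for LLONG_MAX, and
the samples are assumed to be non-negative nanosecond deltas.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for this sketch; not a kernel structure. */
struct nsec_range {
	int64_t min;
	int64_t max;
};

/*
 * Seed max with 0 and min with the largest representable value, so that
 * the first sample replaces both without any special-casing.
 */
struct nsec_range nsec_range_of(const int64_t *samples, size_t n)
{
	struct nsec_range r = { .min = INT64_MAX, .max = 0 };

	for (size_t i = 0; i < n; i++) {
		if (samples[i] > r.max)
			r.max = samples[i];
		if (samples[i] < r.min)
			r.min = samples[i];
	}
	return r;
}
```

One consequence of this choice is that an empty input leaves the seed values
in place (min at INT64_MAX, max at 0), so a caller that might see no samples
has to check for that before printing the range.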

And again, thank you for looking all of this over.

							Thanx, Paul
