Message-ID: <20211014104358.GA406368@lothringen>
Date:   Thu, 14 Oct 2021 12:43:58 +0200
From:   Frederic Weisbecker <frederic@...nel.org>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Valentin Schneider <Valentin.Schneider@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        Uladzislau Rezki <urezki@...il.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Boqun Feng <boqun.feng@...il.com>,
        Neeraj Upadhyay <neeraju@...eaurora.org>,
        Josh Triplett <josh@...htriplett.org>,
        Joel Fernandes <joel@...lfernandes.org>, rcu@...r.kernel.org
Subject: Re: [PATCH 00/11] rcu: Make rcu_core() safe in PREEMPT_RT with NOCB
 + a few other fixes v2

On Wed, Oct 13, 2021 at 09:27:33AM -0700, Paul E. McKenney wrote:
> On Wed, Oct 13, 2021 at 01:43:35PM +0200, Frederic Weisbecker wrote:
> > On Tue, Oct 12, 2021 at 08:28:32PM -0700, Paul E. McKenney wrote:
> > > On Tue, Oct 12, 2021 at 05:32:15PM -0700, Paul E. McKenney wrote:
> > > > On Mon, Oct 11, 2021 at 04:51:29PM +0200, Frederic Weisbecker wrote:
> > > > > Hi,
> > > > > 
> > > > > No code change in this v2, only changelogs:
> > > > > 
> > > > > * Add tags from Valentin and Sebastian
> > > > > 
> > > > > * Remove last reference to SEGCBLIST_SOFTIRQ_ONLY (thanks Valentin)
> > > > > 
> > > > > * Rewrite changelog for "rcu/nocb: Check a stable offloaded state to manipulate qlen_last_fqs_check"
> > > > >   after off-list debates with Paul.
> > > > > 
> > > > > * Remove the scenario with softirq interrupting rcuc on
> > > > >   "rcu/nocb: Limit number of softirq callbacks only on softirq" as it's
> > > > >   probably not possible (thanks Valentin).
> > > > > 
> > > > > * Remove the scenario with task spent scheduling out accounted on tlimit
> > > > >   as it's not possible (thanks Valentin)
> > > > >   (see "rcu: Apply callbacks processing time limit only on softirq")
> > > > > 
> > > > > * Fixed changelog of
> > > > >   "rcu/nocb: Don't invoke local rcu core on callback overload from nocb kthread"
> > > > >   (thanks Sebastian).
> > > > > 
> > > > > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > > > > 	rcu/rt-v2
> > > > > 
> > > > > HEAD: 2c9349986d5f70a555195139665841cd98e9aba4
> > > > > 
> > > > > Thanks,
> > > > > 	Frederic
> > > > 
> > > > Nice!
> > > > 
> > > > I queued these for further review and testing.  I reworked the commit log
> > > > of 6/11 to give my idea of the reason, though I freely admit that this
> > > > reason is not as compelling as it no doubt seemed when I wrote that code.
> > > 
> > > But in initial tests TREE04.5, TREE04.6, and TREE04.9 all hit the
> > > WARN_ON(1) in rcu_torture_barrier(), which indicates rcu_barrier()
> > > breakage.  My best (but not so good) guess is a five-hour MTBF on a
> > > dual-socket system.
> > > 
> > > I started an automated "git bisect" with each step running 100 hours
> > > of TREE04, but I would be surprised if anything useful comes of it.
> > > Pleased, mind you, but surprised.
> > 
> > Ok I can reproduce.
> > 
> > I'm launching a bisect from my side as well.
> 
> Mine converged on 2a4200944750 ("rcu/nocb: Prepare state machine for
> a new step").  The surprise is that I was running "git bisect run"
> on a script wrapping kvm-remote.sh, which means that it managed to
> repeatedly request 10 systems, download to them, run the test, collect
> the results, and finally return the systems.
> 
> Huh.  I should probably refactor my local script to avoid the pointless
> repeated request/return work.
> 
> But which commit did your bisect find?  ;-)

My bisection got confused by two different issues: one an OOM and
the other rcu_barrier() being unhappy.

I'm re-running it but I'll investigate both.
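
(One way to keep two unrelated failures apart in an automated bisect,
as a minimal sketch and not what either bisect in this thread actually
ran: "git bisect run" treats exit code 0 as good, 125 as "skip this
commit", and any other code from 1 to 127 as bad. The log markers and
the file names in the usage comment are assumptions for illustration.)

```python
import sys

# Exit codes understood by "git bisect run":
# 0 = good, 125 = cannot judge this commit (skip), other 1..127 = bad.
GOOD, BAD, SKIP = 0, 1, 125

# Assumed log markers; adjust to what the console logs really contain.
BARRIER_MARKER = "rcu_torture_barrier"
OOM_MARKERS = ("Out of memory", "oom-killer")


def classify(log_text):
    """Map one test run's console log to a git-bisect exit code."""
    barrier_warn = any(
        "WARNING" in line and BARRIER_MARKER in line
        for line in log_text.splitlines()
    )
    if barrier_warn:
        # The failure being hunted: mark the commit bad.
        return BAD
    if any(marker in log_text for marker in OOM_MARKERS):
        # Unrelated failure: tell bisect this commit cannot be judged.
        return SKIP
    return GOOD


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage (script and log names are placeholders):
    #   git bisect run sh -c './run-test.sh >console.log 2>&1; \
    #       python3 classify.py console.log'
    with open(sys.argv[1], errors="replace") as f:
        sys.exit(classify(f.read()))
```

(A run that shows both symptoms is counted as bad here, since the
rcu_barrier() warning is the one being bisected. Skipped commits can
leave bisect with a range of candidates rather than a single commit.)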

> 
> Anyway, I am keeping the first commit 4b246eab4750 ("rcu/nocb: Make
> local rcu_nocb_lock_irqsave() safe against concurrent deoffloading"),
> but dropping the others for the time being.

Fair enough!

Thanks.



> 							Thanx, Paul
