Message-ID: <df999cba-a090-4461-8db6-7ddd788ddf85@paulmck-laptop>
Date:   Tue, 22 Aug 2023 10:46:23 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Joel Fernandes <joel@...lfernandes.org>
Cc:     linux-kernel@...r.kernel.org
Subject: Re: Question on __torture_rt_boost() else clause

On Tue, Aug 22, 2023 at 04:18:50PM +0000, Joel Fernandes wrote:
> Hi Paul,
> 
> On Mon, Aug 21, 2023 at 08:12:50PM -0700, Paul E. McKenney wrote:
> > Hello, Joel!
> > 
> > A quick question for you...
> > 
> > I am doing catch-up additions of locktorture module parameters
> > to kernel-parameters.txt, and came across rt_boost_factor.  The
> > multiplication by cxt.nrealwriters_stress in the !rt_task(current)
> > then-clause makes sense:  No matter how many writers you have, the
> > number of boost operations per unit time remains roughly constant.
> 
> > But I am having some difficulty rationalizing a similar multiplication
> > in the else-clause.  That would seem to leave boosting in effect for
> > longer times the more writers there were.
> 
> But the number of de-boost operations per unit time should also remain
> roughly constant?  I think you (or the original authors) wanted it to
> boost every ~50k ops and deboost every ~500k ops originally.

The else-clause controls the boost duration.  So if I am understanding
the code correctly, the more writers there are, the longer each writer
stays boosted.  Which might be a good thing, but seemed strange.
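
To put rough numbers on it (taking the hard-coded factor of 50000 from the
old code quoted below, purely as an illustration):  a boosted writer
deboosts with probability 1/(nrealwriters * 50000 * 2) per lock operation,
so it expects to stay boosted for about nrealwriters * 100k of its own
operations.  With one writer that is ~100k operations; with 64 writers it
is ~6.4M operations per writer, even though the number of boost operations
per unit time across the whole run stays roughly constant.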

> > Is that the intent?
> 
> The original change, before my patch to make boosting possible for
> non-rtmutex types, already had that multiplication; see below for the
> diff from my patch.  My patch just kept the same thing to make the
> logic consistent (i.e., deboost less often).

Ah, you are right, I should have told "git blame" to dig deeper.

But hey, you did touch the code at one point!  ;-)

> > Also, I am rationalizing the choice of 2 as default for rt_boost by
> > noting that "mutex" and "ww_mutex_lock" don't do boosting and that
> > preemption-disabling makes non-RT spinlocks immune from priority
> > inversion.  Is this what you had in mind, or am I off in the weeds here?
> 
> The 2 was just to make sure that we don't deboost as often as we boost, which
> is also what the old logic was trying to do.

This is a different "2".  The rt_boost=0 says never boost, rt_boost=1
says boost only if the lock in question supports priority boosting, and
rt_boost=2 (the default) says boost unconditionally, aside from lock
types that don't define cur_ops->task_boost.  Except that they all
define cur_ops->task_boost.
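
(So, if I have the semantics right, something along the following lines
would force boosting even on the non-rtmutex types -- parameter spellings
from memory and values purely illustrative, not a recommendation:

	locktorture.torture_type=mutex_lock locktorture.rt_boost=2
	locktorture.rt_boost_factor=50000 locktorture.nwriters_stress=16

with rt_boost=1 instead boosting only for rtmutex-style locks, and
rt_boost=0 disabling boosting entirely.)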

I am not seeing failures in my torture.sh testing, so maybe OK, but it
does seem a bit strange.

(And this probably predates your involvement as well, but so it goes!)

> What is the drawback of keeping the boost active for longer?  It will
> trigger PI-boosting (and, in the future, proxy exec) more often.

My concern is someone running this on a 1,000-CPU system.  Though locking
being what it is, there is a non-negligible possibility that something
else breaks first.

> Also, making the factor configurable allows controlling how often we
> boost and deboost.  IIRC, it was boosting much less often before I did that.

No argument with the frequency of boosting, just curiosity about the
duration increasing with increasing numbers of CPUs.  I can rationalize
it, but then again, I can rationalize pretty much anything.  ;-)

> > I am putting my best guess in the patch, and am including you on CC.
> 
> Ok, thanks,

On the other hand, it looks like I can now reproduce a qspinlock hang
that happens maybe five to ten times a week across the entire fleet
in a few tens of minutes.  On my laptop.  ;-)

Now to start adding debug.  Which will affect the reproduction times,
but life is like that sometimes...

							Thanx, Paul

>  - Joel
> 
> 
> -static void torture_rtmutex_boost(struct torture_random_state *trsp)
> -{
> -       const unsigned int factor = 50000; /* yes, quite arbitrary */
> -
> -       if (!rt_task(current)) {
> -               /*
> -                * Boost priority once every ~50k operations. When the
> -                * task tries to take the lock, the rtmutex it will account
> -                * for the new priority, and do any corresponding pi-dance.
> -                */
> -               if (trsp && !(torture_random(trsp) %
> -                             (cxt.nrealwriters_stress * factor))) {
> -                       sched_set_fifo(current);
> -               } else /* common case, do nothing */
> -                       return;
> -       } else {
> -               /*
> -                * The task will remain boosted for another ~500k operations,
> -                * then restored back to its original prio, and so forth.
> -                *
> -                * When @trsp is nil, we want to force-reset the task for
> -                * stopping the kthread.
> -                */
> -               if (!trsp || !(torture_random(trsp) %
> -                              (cxt.nrealwriters_stress * factor * 2))) {
> -                       sched_set_normal(current, 0);
> -               } else /* common case, do nothing */
> -                       return;
> -       }
> -}
> -
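
For anyone reading along, my mental model of the current
__torture_rt_boost() is the shape below -- a paraphrase of the logic as I
read it, with the old hard-coded 50000 replaced by the rt_boost_factor
module parameter, not a verbatim copy of the upstream source:

static void __torture_rt_boost(struct torture_random_state *trsp)
{
	const unsigned int factor = rt_boost_factor;

	if (!rt_task(current)) {
		/*
		 * Boost roughly once every factor operations per writer,
		 * so the aggregate boost rate is independent of the
		 * number of writers.
		 */
		if (trsp && !(torture_random(trsp) %
			      (cxt.nrealwriters_stress * factor))) {
			sched_set_fifo(current);
		} else /* common case, do nothing */
			return;
	} else {
		/*
		 * Deboost half as often, again scaled by the number of
		 * writers -- which is the multiplication in question.
		 * A nil @trsp force-resets the task when the kthread
		 * is being stopped.
		 */
		if (!trsp || !(torture_random(trsp) %
			       (cxt.nrealwriters_stress * factor * 2))) {
			sched_set_normal(current, 0);
		} else /* common case, do nothing */
			return;
	}
}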
