linux-kernel - Re: [PATCH 1/3] sched: better handling for busy polling loops

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CABk29Ns8mhLUwSs+ZbREnx3yaX+xKwVeHf61OTe=5zNWxpmyag@mail.gmail.com>
Date:   Tue, 27 Oct 2020 16:58:08 -0700
From:   Josh Don <joshdon@...gle.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     g@...ez.programming.kicks-ass.net, Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Eric Dumazet <edumazet@...gle.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        netdev@...r.kernel.org, kvm@...r.kernel.org,
        Xi Wang <xii@...gle.com>
Subject: Re: [PATCH 1/3] sched: better handling for busy polling loops

On Fri, Oct 23, 2020 at 12:19 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Thu, Oct 22, 2020 at 08:29:42PM -0700, Josh Don wrote:
> > Busy polling loops in the kernel such as network socket poll and kvm
> > halt polling have performance problems related to process scheduler load
> > accounting.
>
> AFAICT you're not actually fixing the load accounting issue at all.

Halt polling is an ephemeral state, and the goal is not to adjust the
accounting, but rather to allow wake up balance to identify polling
cpus.

> > This change also disables preemption for the duration of the busy
> > polling loop. This is important, as it ensures that if a polling thread
> > decides to end its poll to relinquish cpu to another thread, the polling
> > thread will actually exit the busy loop and potentially block. When it
> > later becomes runnable, it will have the opportunity to find an idle cpu
> > via wakeup cpu selection.
>
> At the cost of inducing a sleep+wake cycle; which is mucho expensive. So
> this could go either way. No data presented.

I can take a look at getting some data on that. What I do have is data
that reflects an overall improvement in machine efficiency. On heavily
loaded hosts, we were able to recoup ~2% of cycles.

> > +void prepare_to_busy_poll(void)
> > +{
> > +     struct rq __maybe_unused *rq = this_rq();
> > +     unsigned long __maybe_unused flags;
> > +
> > +     /* Preemption will be reenabled by end_busy_poll() */
> > +     preempt_disable();
> > +
> > +#ifdef CONFIG_SMP
> > +     raw_spin_lock_irqsave(&rq->lock, flags);
> > +     /* preemption disabled; only one thread can poll at a time */
> > +     WARN_ON_ONCE(rq->busy_polling);
> > +     rq->busy_polling++;
> > +     raw_spin_unlock_irqrestore(&rq->lock, flags);
> > +#endif
>
> Explain to me the purpose of that rq->lock usage.

This was required in a previous iteration, but failed to drop after a refactor.

> > +int continue_busy_poll(void)
> > +{
> > +     if (!single_task_running())
> > +             return 0;
>
> Why? If there's more, we'll end up in the below condition anyway.
>

Migrating a task over won't necessarily set need_resched. Though it
does make sense to allow check_preempt_curr to set it directly in this
case.

> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index 28709f6b0975..45de468d0ffb 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -1003,6 +1003,8 @@ struct rq {
> >
> >       /* This is used to determine avg_idle's max value */
> >       u64                     max_idle_balance_cost;
> > +
> > +     unsigned int            busy_polling;
>
> This is a good location, cache-wise, because?
>
> >  #endif /* CONFIG_SMP */
> >
> >  #ifdef CONFIG_IRQ_TIME_ACCOUNTING

Good call out, I didn't consider that.