[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrXutDjE5Z6WR+47MJvp3xt4n=EGiF_oEYm88vGvCqUgHA@mail.gmail.com>
Date: Tue, 25 Aug 2020 16:41:36 -0700
From: Andy Lutomirski <luto@...nel.org>
To: Alexander Graf <graf@...zon.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>,
Andy Lutomirski <luto@...nel.org>,
Andrew Cooper <andrew.cooper3@...rix.com>,
X86 ML <x86@...nel.org>,
"Paul E. McKenney" <paulmck@...nel.org>,
Alexandre Chartre <alexandre.chartre@...cle.com>,
Frederic Weisbecker <frederic@...nel.org>,
Paolo Bonzini <pbonzini@...hat.com>,
Sean Christopherson <sean.j.christopherson@...el.com>,
Masami Hiramatsu <mhiramat@...nel.org>,
Petr Mladek <pmladek@...e.com>,
Steven Rostedt <rostedt@...dmis.org>,
Joel Fernandes <joel@...lfernandes.org>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
Juergen Gross <jgross@...e.com>,
Brian Gerst <brgerst@...il.com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Will Deacon <will@...nel.org>,
Tom Lendacky <thomas.lendacky@....com>,
Wei Liu <wei.liu@...nel.org>,
Michael Kelley <mikelley@...rosoft.com>,
Jason Chen CJ <jason.cj.chen@...el.com>,
Zhao Yakui <yakui.zhao@...el.com>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>,
Avi Kivity <avi@...lladb.com>,
"Herrenschmidt, Benjamin" <benh@...zon.com>, robketr@...zon.de,
amos@...lladb.com
Subject: Re: [patch V9 21/39] x86/irq: Convey vector as argument and not in ptregs
On Tue, Aug 25, 2020 at 4:18 PM Alexander Graf <graf@...zon.com> wrote:
>
> Hi Thomas,
>
> On 25.08.20 12:28, Thomas Gleixner wrote:
> >void irq_complete_move(struct irq_cfg *cfg)
{
__irq_complete_move(cfg, ~get_irq_regs()->orig_ax);
}
> > Alex,
> >
> > On Mon, Aug 24 2020 at 19:29, Alexander Graf wrote:
> >> I'm currently trying to understand a performance regression with
> >> ScyllaDB on i3en.3xlarge (KVM based VM on Skylake) which we reliably
> >> bisected down to this commit:
> >>
> >> https://github.com/scylladb/scylla/issues/7036
> >>
> >> What we're seeing is that syscalls such as membarrier() take forever
> >> (0-10 µs would be normal):
> > ...
> >> That again seems to stem from a vastly slowed down
> >> smp_call_function_many_cond():
> >>
> >> Samples: 218K of event 'cpu-clock', 4000 Hz
> >> Overhead Shared Object Symbol
> >> 94.51% [kernel] [k] smp_call_function_many_cond
> >> 0.76% [kernel] [k] __do_softirq
> >> 0.32% [kernel] [k] native_queued_spin_lock_slowpath
> >> [...]
> >>
> >> which is stuck in
> >>
> >> │ csd_lock_wait():
> >> │ smp_cond_load_acquire(&csd->flags, !(VAL &
> >> 0.00 │ mov 0x8(%rcx),%edx
> >> 0.00 │ and $0x1,%edx
> >> │ ↓ je 2b9
> >> │ rep_nop():
> >> 0.70 │2af: pause
> >> │ csd_lock_wait():
> >> 92.82 │ mov 0x8(%rcx),%edx
> >> 6.48 │ and $0x1,%edx
> >> 0.00 │ ↑ jne 2af
> >> 0.00 │2b9: ↑ jmp 282
> >>
> >>
> >> Given the patch at hand I was expecting lost IPIs, but I can't quite see
> >> anything getting lost.
> >
> > I have no idea how that patch should be related to IPI and smp function
> > calls. It's changing the way how regular device interrupts and their
> > spurious counterpart are handled and not the way how IPIs are
> > handled. They are handled by direct vectors and do not go through
> > do_IRQ() at all.
> >
> > Aside of that the commit just changes the way how the interrupt vector
> > of a regular device interrupt is stored and conveyed. The extra read and
> > write on the cache hot stack is hardly related to anything IPI.
>
> I am as puzzled as you are, but the bisect is very clear: 79b9c183021e
> works fast and 633260fa1 (as well as mainline) shows the weird behavior
> above.
>
> It gets even better. This small (demonstrative only, mangled) patch on
> top of 633260fa1 also resolves the performance issue:
>
> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> index c766936..7e91e9a 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
> @@ -239,6 +239,7 @@ __visible void __irq_entry do_IRQ(struct pt_regs
> *regs, unsigned long vector)
> * lower 8 bits.
> */
> vector &= 0xFF;
> + regs->orig_ax = ~vector;
>
> /* entering_irq() tells RCU that we're not quiescent. Check it. */
> RCU_LOCKDEP_WARN(!rcu_is_watching(), "IRQ failed to wake up RCU");
>
>
> To me that sounds like some irq exit code somewhere must still be
> looking at orig_ax to decide on something - and that something is wrong
> now that we removed the negation of the vector. It also seems to have an
> impact on remote function calls.
>
> I'll have a deeper look tomorrow again if I can find any such place, but
> I wouldn't mind if anyone could point me into the right direction
> earlier :).
How about this:
void irq_complete_move(struct irq_cfg *cfg)
{
__irq_complete_move(cfg, ~get_irq_regs()->orig_ax);
}
in arch/x86/kernel/apic/vector.c.
Powered by blists - more mailing lists