[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZiLO9RUdMsNlCtI_@x1>
Date: Fri, 19 Apr 2024 17:07:17 -0300
From: Arnaldo Carvalho de Melo <acme@...nel.org>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Jacob Pan <jacob.jun.pan@...ux.intel.com>,
LKML <linux-kernel@...r.kernel.org>, X86 Kernel <x86@...nel.org>,
Peter Zijlstra <peterz@...radead.org>, iommu@...ts.linux.dev,
Lu Baolu <baolu.lu@...ux.intel.com>, kvm@...r.kernel.org,
Dave Hansen <dave.hansen@...el.com>, Joerg Roedel <joro@...tes.org>,
"H. Peter Anvin" <hpa@...or.com>, Borislav Petkov <bp@...en8.de>,
Ingo Molnar <mingo@...hat.com>, Paul Luse <paul.e.luse@...el.com>,
Dan Williams <dan.j.williams@...el.com>,
Jens Axboe <axboe@...nel.dk>, Raj Ashok <ashok.raj@...el.com>,
"Tian, Kevin" <kevin.tian@...el.com>, maz@...nel.org,
seanjc@...gle.com, Robin Murphy <robin.murphy@....com>,
jim.harris@...sung.com, a.manzanares@...sung.com,
Bjorn Helgaas <helgaas@...nel.org>, guang.zeng@...el.com,
robert.hoo.linux@...il.com, kan.liang@...el.com,
"Kleen, Andi" <andi.kleen@...el.com>
Subject: Re: [PATCH v2 05/13] x86/irq: Reserve a per CPU IDT vector for
posted MSIs
On Fri, Apr 19, 2024 at 06:00:24AM +0200, Thomas Gleixner wrote:
> On Mon, Apr 15 2024 at 13:43, Jacob Pan wrote:
> > On Mon, 15 Apr 2024 11:53:58 -0700, Jacob Pan <jacob.jun.pan@...ux.intel.com> wrote:
> >> On Thu, 11 Apr 2024 18:51:14 +0200, Thomas Gleixner <tglx@...utronix.de> wrote:
> >> > If we really care then we do it proper for _all_ of them. Something like
> >> > the uncompiled below. There is certainly a smarter way to do the build
> >> > thing, but my kbuild foo is rusty.
> >> I too had the concern of the wasting system vectors, but did not know how
> >> to fix it. But now your code below works well. Tested without KVM in
> >> .config to show the gaps:
> >>
> >> In VECTOR IRQ domain.
> >>
> >> BEFORE:
> >> System: 46: 0-31,50,235-236,244,246-255
> >>
> >> AFTER:
> >> System: 46: 0-31,50,241-242,245-255
> >>
> >> The only gap is MANAGED_IRQ_SHUTDOWN_VECTOR(243), which is expected on a
> >> running system.
> >>
> >> Verified in irqvectors.s: .ascii "->MANAGED_IRQ_SHUTDOWN_VECTOR $243
> >>
> >> POSTED MSI/first system vector moved up from 235 to 241 for this case.
> >>
> >> Will try to let tools/arch/x86/include/asm/irq_vectors.h also use it
> >> instead of manually copy over each time. Any suggestions greatly
> >> appreciated.
> >>
> > On a second thought, if we make system IRQ vector determined at compile
> > time based on different CONFIG options, will it break userspace tools such
> > as perf? More importantly the rule of not breaking userspace.
The rule for tools/perf is "don't impose _any requirement_ on the kernel
developers, they don't have to test if any change they do outside of
tools/ will break something inside tools/."
> tools/arch/x86/include/asm/irq_vectors.h is only used to generate the
> list of system vectors for pretty output. And your change already broke
> that.
Yeah, I even moved that from tools/arch/x86/include/asm/irq_vectors.h
to tools/perf/trace/beauty/arch/x86/include/asm/irq_vectors.h (for next
merge window).
Having it in tools/arch/x86/include/asm/irq_vectors.h was a bad decision
as it, as you mentinoned, is only used to generate string tables:
⬢[acme@...lbox perf-tools-next]$ tools/perf/trace/beauty/tracepoints/x86_irq_vectors.sh
static const char *x86_irq_vectors[] = {
[0x02] = "NMI",
[0x80] = "IA32_SYSCALL",
[0xec] = "LOCAL_TIMER",
[0xed] = "HYPERV_STIMER0",
[0xee] = "HYPERV_REENLIGHTENMENT",
[0xef] = "MANAGED_IRQ_SHUTDOWN",
[0xf0] = "POSTED_INTR_NESTED",
[0xf1] = "POSTED_INTR_WAKEUP",
[0xf2] = "POSTED_INTR",
[0xf3] = "HYPERVISOR_CALLBACK",
[0xf4] = "DEFERRED_ERROR",
[0xf6] = "IRQ_WORK",
[0xf7] = "X86_PLATFORM_IPI",
[0xf8] = "REBOOT",
[0xf9] = "THRESHOLD_APIC",
[0xfa] = "THERMAL_APIC",
[0xfb] = "CALL_FUNCTION_SINGLE",
[0xfc] = "CALL_FUNCTION",
[0xfd] = "RESCHEDULE",
[0xfe] = "ERROR_APIC",
[0xff] = "SPURIOUS_APIC",
};
⬢[acme@...lbox perf-tools-next]$
Used in:
root@...ber:~# perf trace -a -e irq_vectors:irq_work_entry/max-stack=32/ --max-events=1
0.000 kworker/u57:0-/9912 irq_vectors:irq_work_entry(vector: IRQ_WORK)
__sysvec_irq_work ([kernel.kallsyms])
__sysvec_irq_work ([kernel.kallsyms])
sysvec_irq_work ([kernel.kallsyms])
asm_sysvec_irq_work ([kernel.kallsyms])
_raw_spin_unlock_irqrestore ([kernel.kallsyms])
dma_fence_wait_timeout ([kernel.kallsyms])
intel_atomic_commit_tail ([kernel.kallsyms])
process_one_work ([kernel.kallsyms])
worker_thread ([kernel.kallsyms])
kthread ([kernel.kallsyms])
ret_from_fork ([kernel.kallsyms])
ret_from_fork_asm ([kernel.kallsyms])
root@...ber:~#
But as the original cset introducing this explains, these irq_vectors:
tracepoins operate on just one of the vectors, so irq_work_entry(vector:
IRQ_WORK), irq_vectors:reschedule_exit(vector: RESCHEDULE), etc.
> The obvious solution to that is to expose that list in sysfs for
> consumption by perf.
nah, the best thing these days is stop using 'int' for vector and use
'enum irq_vector', then since we have BTF we can use that to do the enum
-> string translation, like with (using /sys/kernel/btf/vmlinux, that is
pretty much available everywhere these days):
root@...ber:~# pahole clocksource_ids
enum clocksource_ids {
CSID_GENERIC = 0,
CSID_ARM_ARCH_COUNTER = 1,
CSID_MAX = 2,
};
root@...ber:~# pahole skb_drop_reason | head
enum skb_drop_reason {
SKB_NOT_DROPPED_YET = 0,
SKB_CONSUMED = 1,
SKB_DROP_REASON_NOT_SPECIFIED = 2,
SKB_DROP_REASON_NO_SOCKET = 3,
SKB_DROP_REASON_PKT_TOO_SMALL = 4,
SKB_DROP_REASON_TCP_CSUM = 5,
SKB_DROP_REASON_SOCKET_FILTER = 6,
SKB_DROP_REASON_UDP_CSUM = 7,
SKB_DROP_REASON_NETFILTER_DROP = 8,
root@...ber:~#
Then its easy to go from 0 to CSID_GENERIC, etc.
⬢[acme@...lbox pahole]$ perf stat -e cycles pahole skb_drop_reason > /dev/null
Performance counter stats for 'pahole skb_drop_reason':
6,095,427 cpu_atom/cycles:u/ (2.82%)
103,694,633 cpu_core/cycles:u/ (97.18%)
0.039031759 seconds time elapsed
0.016028000 seconds user
0.023007000 seconds sys
⬢[acme@...lbox pahole]$
- Arnaldo
> But we don't have to do any of that right away. It's an orthogonal
> issue. Just waste the extra system vector to start with and then we can
> add the compile time dependend change on top if we really care about
> gaining back the vectors.
>
> Thanks,
>
> tglx
Powered by blists - more mailing lists