Message-ID: <alpine.DEB.2.20.1709070731110.2433@nanos>
Date: Thu, 7 Sep 2017 07:54:09 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: Yu Chen <yu.c.chen@...el.com>
cc: x86@...nel.org, Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, Rui Zhang <rui.zhang@...el.com>,
LKML <linux-kernel@...r.kernel.org>,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
Len Brown <lenb@...nel.org>,
Dan Williams <dan.j.williams@...el.com>,
Christoph Hellwig <hch@....de>,
Peter Zijlstra <peterz@...radead.org>,
Jeff Kirsher <jeffrey.t.kirsher@...el.com>
Subject: Re: [PATCH 4/4][RFC v2] x86/apic: Spread the vectors by choosing
the idlest CPU
On Thu, 7 Sep 2017, Yu Chen wrote:
> On Wed, Sep 06, 2017 at 10:03:58AM +0200, Thomas Gleixner wrote:
> > Can you please apply the debug patch below, boot the machine and right
> > after login provide the output of
> >
> > # cat /sys/kernel/debug/tracing/trace
> >
> kworker/0:2-303 [000] .... 9.135467: msi_domain_alloc_irqs: dev: 0000:bb:00.0 nvec 1 virq 34
> kworker/0:2-303 [000] .... 9.135476: msi_domain_alloc_irqs: dev: 0000:bb:00.0 nvec 1 virq 35
> kworker/0:2-303 [000] .... 9.135484: msi_domain_alloc_irqs: dev: 0000:bb:00.0 nvec 1 virq 36
<SNIP>
> kworker/0:2-303 [000] .... 9.762268: msi_domain_alloc_irqs: dev: 0000:bb:00.3 nvec 1 virq 331
> kworker/0:2-303 [000] .... 9.762278: msi_domain_alloc_irqs: dev: 0000:bb:00.3 nvec 1 virq 332
> kworker/0:2-303 [000] .... 9.762288: msi_domain_alloc_irqs: dev: 0000:bb:00.3 nvec 1 virq 333
That's 300 vectors.
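As a rough cross-check of that number, the count can be derived from the first and last virq in the trace. This is only a sketch: it assumes the snipped part of the trace is a contiguous run with one virq per line (nvec 1), as the quoted lines suggest, and the regex and helper names are illustrative, matched against the debug patch's output format as quoted above.

```python
import re

# Three of the trace lines quoted above; everything in between is elided
# (<SNIP>) in the mail, so contiguity of the virq range is an assumption.
SAMPLE = """\
> kworker/0:2-303 [000] .... 9.135467: msi_domain_alloc_irqs: dev: 0000:bb:00.0 nvec 1 virq 34
> kworker/0:2-303 [000] .... 9.135476: msi_domain_alloc_irqs: dev: 0000:bb:00.0 nvec 1 virq 35
> kworker/0:2-303 [000] .... 9.762288: msi_domain_alloc_irqs: dev: 0000:bb:00.3 nvec 1 virq 333
"""

# Hypothetical pattern for the debug trace lines shown in this thread.
LINE_RE = re.compile(r"msi_domain_alloc_irqs: dev: (\S+) nvec (\d+) virq (\d+)")


def parse_trace(text):
    """Return (device, nvec, virq) tuples for every matching trace line."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            entries.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return entries


def virq_span(entries):
    """Size of the inclusive virq range, assuming no gaps in between."""
    virqs = [virq for _, _, virq in entries]
    return max(virqs) - min(virqs) + 1


entries = parse_trace(SAMPLE)
print(virq_span(entries))  # 333 - 34 + 1
```

Fed the complete trace instead of the three sample lines, len(parse_trace(...)) would give the exact count without relying on the contiguity assumption.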
> bb:00.[0-3] Ethernet controller: Intel Corporation Device 37d0 (rev 03)
>
> -+-[0000:b2]-+-00.0-[b3-bc]----00.0-[b4-bc]--+-00.0-[b5-b6]----00.0
> | | +-01.0-[b7-b8]----00.0
> | | +-02.0-[b9-ba]----00.0
> | | \-03.0-[bb-bc]--+-00.0
> | | +-00.1
> | | +-00.2
> | | \-00.3
>
> and they are using i40e driver, the vectors should be reserved by:
> i40e_probe() ->
> i40e_init_interrupt_scheme() ->
> i40e_init_msix() ->
> i40e_reserve_msix_vectors() ->
> pci_enable_msix_range()
>
> # ls /sys/kernel/debug/irq/irqs
> 0 10 11 13 142 184 217 259 292 31 33
> 337 339 340 342 344 346 348 350 352 354 356
> 358 360 362 364 366 368 370 372 374 376 378
> 380 382 384 386 388 390 392 394 4 6 7 9
> 1 109 12 14 15 2 24 26 3 32 335
> 338 34 341 343 345 347 349 351 353 355 357
> 359 361 363 365 367 369 371 373 375 377 379
> 381 383 385 387 389 391 393 395 5 67 8
Out of these 300 interrupts, exactly 8 randomly selected ones are actively
used. And the other 292 interrupts are just there because the driver might
need them in the future, when the 32 CPU machine gets magically upgraded to
4096 cores at runtime?
Can the i40e people @intel please fix this waste of resources and sanitize
their interrupt allocation scheme?
Please switch it over to managed interrupts so the affinity spreading
happens in a sane way and the interrupts are properly managed on CPU
hotplug.
Thanks,
tglx