Message-ID: <20170906043454.GD23250@localhost.localdomain>
Date: Wed, 6 Sep 2017 12:34:54 +0800
From: Yu Chen <yu.c.chen@...el.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: x86@...nel.org, Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, Rui Zhang <rui.zhang@...el.com>,
LKML <linux-kernel@...r.kernel.org>,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
Len Brown <lenb@...nel.org>,
Dan Williams <dan.j.williams@...el.com>,
Christoph Hellwig <hch@....de>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH 4/4][RFC v2] x86/apic: Spread the vectors by choosing the
idlest CPU
On Wed, Sep 06, 2017 at 12:57:41AM +0200, Thomas Gleixner wrote:
> On Sun, 3 Sep 2017, Thomas Gleixner wrote:
>
> > On Fri, 1 Sep 2017, Chen Yu wrote:
> >
> > > This is the major logic to spread the vectors on different CPUs.
> > > The main idea is to choose the 'idlest' CPU which has assigned
> > > the least number of vectors as the candidate/hint for the vector
> > > allocation domain, in the hope that the vector allocation domain
> > > could leverage this hint to generate corresponding cpumask.
> > >
> > > One of the requirements to do this vector spreading work comes from the
> > > following hibernation problem found on a 16 cores server:
> > >
> > > CPU 31 disable failed: CPU has 62 vectors assigned and there
> > > are only 0 available.
>
> Thinking more about this, this makes no sense whatsoever.
>
> The total number of interrupts on a system is the same whether they are
> all on CPU 0 or evenly spread over all CPUs.
>
> As this machine is using physical destination mode, the number of vectors
> used is the same as the number of interrupts, except for the case where a
> move of an interrupt is in progress and the interrupt which cleans up the
> old vector has not yet arrived. Let's ignore that for now.
>
> The available vector space is 204 per CPU on such a system.
>
> 256 - SYSTEM[0-31, 32, 128, 239-255] - LEGACY[50] = 204
>
> > > CPU 31 disable failed: CPU has 62 vectors assigned and there
> > > are only 0 available.
>
> CPU31 is the last AP going offline (CPU0 is still online).
>
> It wants to move 62 vectors to CPU0, but it can't because CPU0 has 0
> available vectors. That means CPU0 has 204 vectors used. I doubt that, but
> what I doubt even more is that this interrupt spreading helps in any way.
>
> Assumed that we have a total of 204 + 62 = 266 device interrupt vectors in
> use and they are evenly spread over 32 CPUs, so each CPU has either 8 or
> nine vectors. Fine.
>
> Now if you unplug all CPUs except CPU0 starting from CPU1 up to CPU31 then
> at the point where CPU31 is about to be unplugged, CPU0 holds 133 vectors
> and CPU31 holds 133 vectors as well - assumed that the spread is exactly
> even.
>
> I have a hard time to figure out how the 133 vectors on CPU31 are now
> magically fitting in the empty space on CPU0, which is 204 - 133 = 71. In
> my limited understanding of math 133 is greater than 71, but your patch
> might make that magically be wrong.
>
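For reference, the arithmetic in the quoted paragraphs can be replayed in a few lines. This is only a sketch of the numbers above: the names are illustrative, LEGACY[50] is read as a single vector (an assumption, chosen because it makes the sum come out to 204), and the spread is assumed to be exactly even, as in the quoted text.

```python
# Replays the arithmetic from the quoted text; all names are illustrative.
SYSTEM_VECTORS = 32 + 1 + 1 + 17   # vectors 0-31, 32, 128, 239-255
LEGACY_VECTORS = 1                 # assumption: LEGACY[50] is one vector
PER_CPU_CAPACITY = 256 - SYSTEM_VECTORS - LEGACY_VECTORS

TOTAL_DEVICE_VECTORS = 204 + 62    # CPU0 assumed full, plus CPU31's 62


def vectors_per_cpu(total, online_cpus):
    """Smallest and largest per-CPU load under an exactly even spread."""
    low, remainder = divmod(total, online_cpus)
    return low, low + (1 if remainder else 0)


def fits_on_last_cpu(total, capacity):
    """With two CPUs left, can one of them absorb the other's half?"""
    half = total // 2
    return half <= capacity - half


if __name__ == "__main__":
    print(PER_CPU_CAPACITY)
    print(vectors_per_cpu(TOTAL_DEVICE_VECTORS, 32))
    print(fits_on_last_cpu(TOTAL_DEVICE_VECTORS, PER_CPU_CAPACITY))
```

Under these assumptions the last check is false, which is the objection raised in the quoted text: 133 vectors do not fit into 71 free slots.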
The problem reproduces when the network cable is not plugged in,
because the driver works roughly as follows:
step 1. Reserve enough irq vectors and the corresponding IRQs.
step 2. If the network is activated, invoke request_irq() to
register the handler.
step 3. Invoke set_affinity() to spread the IRQs onto different
CPUs, and thus spread the vectors too.
Here's my understanding of why spreading vectors might help in this
special case:
As step 2 is never invoked, the IRQs of this driver have
not been enabled, so migrate_one_irq() does not consider
them because of its irqd_is_started(d) check. With the
vectors spread, there should therefore be only 8 vectors
allocated by this driver on CPU0 and 8 vectors left on
CPU31, and the 8 vectors on CPU31 will not be migrated
to CPU0 either, so there is room for other 'valid' vectors
to be migrated to CPU0.
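The reasoning above can be put into a toy model. This is a sketch, not kernel code: the function names and the example numbers in the usage part (160 reserved vectors, 40 started IRQs on CPU0) are hypothetical and are not measurements from this machine. Only the capacity of 204, the 62 vectors on CPU31 and the 8 vectors per CPU come from this thread.

```python
# Toy model of the hibernation case described above; not kernel code.
CAPACITY = 204  # per-CPU vector space quoted earlier in the thread


def free_vectors(capacity, started, unstarted):
    """Started and not-yet-started IRQs both occupy a vector on their CPU."""
    return capacity - started - unstarted


def can_offline(outgoing_started, target_started, target_unstarted,
                capacity=CAPACITY):
    """Only started IRQs are migrated, mirroring the irqd_is_started() check.

    Unstarted IRQs on the outgoing CPU are skipped, so they need no room
    on the target CPU.
    """
    room = free_vectors(capacity, target_started, target_unstarted)
    return outgoing_started <= room


if __name__ == "__main__":
    # Hypothetical: the driver reserved 160 vectors and never started them.
    # Unspread, all 160 sit on CPU0 next to 40 started IRQs.
    print(can_offline(62, target_started=40, target_unstarted=160))
    # Spread, only 8 of the unstarted vectors sit on CPU0.
    print(can_offline(62, target_started=40, target_unstarted=8))
```

In this model the unspread case fails and the spread case succeeds, because the reserved but unused vectors no longer fill up CPU0.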
> Can you please provide detailed information about how many device
> interrupts are actually in use/allocated on that system?
>
> Please enable CONFIG_GENERIC_IRQ_DEBUGFS and provide the output of
>
> # cat /sys/kernel/debug/irq/domains/*
>
> and
>
> # ls /sys/kernel/debug/irq/irqs
>
Ok, here's the information after system bootup on top
of 4.13:
# cat /sys/kernel/debug/irq/domains/*
name: VECTOR
size: 0
mapped: 388
flags: 0x00000041
name: IO-APIC-0
size: 24
mapped: 16
flags: 0x00000041
parent: VECTOR
name: VECTOR
size: 0
mapped: 388
flags: 0x00000041
name: IO-APIC-1
size: 8
mapped: 2
flags: 0x00000041
parent: VECTOR
name: VECTOR
size: 0
mapped: 388
flags: 0x00000041
name: IO-APIC-2
size: 8
mapped: 0
flags: 0x00000041
parent: VECTOR
name: VECTOR
size: 0
mapped: 388
flags: 0x00000041
name: IO-APIC-3
size: 8
mapped: 0
flags: 0x00000041
parent: VECTOR
name: VECTOR
size: 0
mapped: 388
flags: 0x00000041
name: IO-APIC-4
size: 8
mapped: 5
flags: 0x00000041
parent: VECTOR
name: VECTOR
size: 0
mapped: 388
flags: 0x00000041
name: PCI-HT
size: 0
mapped: 0
flags: 0x00000041
parent: VECTOR
name: VECTOR
size: 0
mapped: 388
flags: 0x00000041
name: PCI-MSI-2
size: 0
mapped: 365
flags: 0x00000051
parent: VECTOR
name: VECTOR
size: 0
mapped: 388
flags: 0x00000041
name: VECTOR
size: 0
mapped: 388
flags: 0x00000041
# ls /sys/kernel/debug/irq/irqs
0 10 11 13 142 184 217 259 292 31 33 337 339
340 342 344 346 348 350 352 354 356 358 360 362
364 366 368 370 372 374 376 378 380 382 384 386
388 390 392 394 4 6 7 9 1 109 12 14 15 2
24 26 3 32 335 338 34 341 343 345 347 349
351 353 355 357 359 361 363 365 367 369 371 373
375 377 379 381 383 385 387 389 391 393 395 5
67 8
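As a quick consistency check on the domains output above, the "mapped" counts of the child domains can be summed and compared with the VECTOR domain. The following is a minimal sketch: the parser and its name are illustrative, and the sample text is an abbreviated copy of the output above that keeps only the name and mapped lines.

```python
import re

# Abbreviated copy of the debugfs output above (name/mapped lines only).
SAMPLE = """\
name: IO-APIC-0
mapped: 16
name: IO-APIC-1
mapped: 2
name: IO-APIC-2
mapped: 0
name: IO-APIC-3
mapped: 0
name: IO-APIC-4
mapped: 5
name: PCI-HT
mapped: 0
name: PCI-MSI-2
mapped: 365
name: VECTOR
mapped: 388
"""


def mapped_by_domain(text):
    """Return {domain name: mapped count} from name:/mapped: line pairs."""
    counts = {}
    name = None
    for line in text.splitlines():
        match = re.match(r"\s*(name|mapped):\s*(\S+)", line)
        if match is None:
            continue  # ignore size:, flags:, parent: and anything else
        key, value = match.groups()
        if key == "name":
            name = value
        elif name is not None:
            counts[name] = int(value)
    return counts


if __name__ == "__main__":
    counts = mapped_by_domain(SAMPLE)
    children = sum(v for k, v in counts.items() if k != "VECTOR")
    print(children, counts["VECTOR"])
```

The child domains add up to the count reported for VECTOR, so every mapped interrupt in the listing is accounted for by the IO-APIC and PCI-MSI domains.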
BTW, is there a sysfs interface that shows how many vectors are in use on each CPU?
Thanks,
Yu
> Thanks,
>
> tglx