linux-kernel - Re: [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <be564fccc341efa730b8cdfe18ef4d7e709ebf50.camel@infradead.org>
Date:   Sat, 24 Oct 2020 09:26:13 +0100
From:   David Woodhouse <dwmw2@...radead.org>
To:     Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org
Cc:     kvm <kvm@...r.kernel.org>, Paolo Bonzini <pbonzini@...hat.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        linux-hyperv@...r.kernel.org, Dexuan Cui <decui@...rosoft.com>
Subject: Re: [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent
 irqchip's MSI message

On Fri, 2020-10-23 at 23:28 +0200, Thomas Gleixner wrote:
> On Fri, Oct 23 2020 at 11:10, David Woodhouse wrote:
> > On 22 October 2020 22:43:52 BST, Thomas Gleixner <tglx@...utronix.de> wrote:
> > It makes the callers slightly more readable, not having to cast to uint32_t* from the struct.
> > 
> > I did ponder defining a new struct with bitfields named along the
> > lines of 'msi_addr_bits_19_to_4', but that seemed like overkill.
> 
> I did something like this in the meantime, because all of this just
> sucks.
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/apic
> 
> Hot of the press and completely untested.

Hm, your struct IO_APIC_route_entry isn't actually a union; you've
defined a 128-bit structure with the IR fields *following* the non-IR
fields. But there *is* a union in io_apic.c, of that 128-bit structure
and two uint32_ts. Suspect that wasn't quite what you intended. I'll
prod at it this morning and turn it into a single union of the three,
and give it some testing.

Also, my "Move MSI support into hpet.c" patch¹ got updated to
s/CONFIG_PCI_MSI/CONFIG_GENERIC_MSI_IRQ/ at about line 53 for the MSI-
related variable declarations, which was going to be in the next
version I posted.

I was also hoping Paolo was going to take the patch which just defines
the KVM_FEATURE_MSI_EXT_DEST_ID bit² ASAP, so that we end up with a
second patch³ that *just* wires it up to x86_init.msi_ext_dest_id() for
KVM.

¹ https://git.infradead.org/users/dwmw2/linux.git/commitdiff/734719c1f4
² https://git.infradead.org/users/dwmw2/linux.git/commitdiff/3f371d6749
³ https://git.infradead.org/users/dwmw2/linux.git/commitdiff/8399e14eb5

> Yes, we can't avoid the bit swizzling at all. But it can be made more
> readable.

Hm, I was about to concede that your version is a bit more readable.

But then I got to your new __ipi_msi_compose_msg() and realised that it
isn't working because it's setting the 0xFEE base address in the _low_
bits, somehow...

	msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW;
	printk("1 Compose MSI message %x/%x\n", msg->address_lo, msg->data);
	msg->arch_addr_lo.dest_mode_logical = apic->dest_mode_logical;
	printk("2 Compose MSI message %x/%x\n", msg->address_lo, msg->data);
	msg->arch_addr_lo.destid_0_7 = cfg->dest_apicid & 0xFF;
	printk("3 Compose MSI message %x/%x\n", msg->address_lo, msg->data);

[    1.793874] 1 Compose MSI message fee/0
[    1.794310] 2 Compose MSI message fee/0
[    1.794768] 3 Compose MSI message f02/0

And now I wish it was just a simple shift instead of an unholy maze of
overlapping unions of bitfields. But I'll make more coffee and stare at
it harder...

> Yes, that code is horrid, but adding a comment to that effect when
> changing it is not asked too much, right?

Sure. I just actually hadn't noticed that setting the dest/vector bits
right there was entirely redundant in the first place.

> I'm still wrapping my head around getting rid of this thing completely
> because now it's just a subset of your KVM case with the only
> restriction that I/O-APIC cannot be affined to any CPU with a APIC id
> greater than 255.

It was only ever that restriction anyway, wasn't it? Hyper-V PCI has
its own MSI handling, and there's no HPET so it was only ever the
I/OAPIC which was problematic there.

There are Hyper-V VM sizes with 416 vCPUs which depend on this today,
and which don't have the 15-bit MSI extension. Removing hyperv-iommu
would prevent us from using all the vCPUs on those. You *could* make
hyperv-iommu decline to initialise if x86_init.msi_ext_dest_id()
returns true though⁴.

⁴ https://git.infradead.org/users/dwmw2/linux.git/commitdiff/633ccf0d42

Download attachment "smime.p7s" of type "application/x-pkcs7-signature" (5174 bytes)