[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <m1tzexaz7u.fsf@frodo.ebiederm.org>
Date: Thu, 10 Jul 2008 20:15:01 -0700
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Suresh Siddha <suresh.b.siddha@...el.com>
Cc: "mingo@...e.hu" <mingo@...e.hu>, "hpa@...or.com" <hpa@...or.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"arjan@...ux.intel.com" <arjan@...ux.intel.com>,
"andi@...stfloor.org" <andi@...stfloor.org>,
"jbarnes@...tuousgeek.org" <jbarnes@...tuousgeek.org>,
"steiner@....com" <steiner@....com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
Suresh Siddha <suresh.b.siddha@...el.com> writes:
>> That sounds nice in principle. I saw cpu cache flushes, I saw writes.
>> I did not see any reads which is necessary to get that behavior with
>> the standard pci transaction rules.
>
> qi_flush_iec() will submit an invalidation descriptor and will wait
> till it finishes the invalidation of the interrupt entry cache.
> qi_submit_sync() will do the job. Descriptor completion ensures that
> the inflight interrupts are flushed.
I will have to look a second time. It seems I did not see the wait.
>> Having seen enough little races and misbehaving hardware I'm very paranoid
>> about irq migration. The current implementation is belt and suspenders
>> and I still think there are races that I have missed.
>
> Eric, This process irq migration is done on the cutting edge hardware
> which was designed with all the feedback and experiences in the mind ;)
>
> And also, I don't think we are deviating much from what we are currently doing.
> We are still using cleanup vector etc, to clean up the previous vector
> allocation.
Yes. In general I think the design appears sound. I'm just not certain
of the implementation details. Having spent way to many hours
debugging some very subtle hardware issues, I am paranoid.
>> Sounds good. Ultimately we are looking at handler_data or chip_data.
>> There are very specific rules that meant I could not use them for
>> the msi data but otherwise I don't remember exactly what the are for.
>> IOMMU are covered though.
>
> IOMMU is covered as part of pci_dev (pci_sysdata). But in the case of
> interrupt-remapping, there are some interrupt resources like ioapics and
> hpet, which don't have the corresponding pci dev. Will take a look at this.
>
>> At least for msi the code you are working on was essentially unified
>> when it was written, it just happened to have two copies. I don't
>> think I'm asking for heaving lifting. Mostly just putting code that
>> is touched into something other then the growing monstrosity that is
>> ioapic.c
>
> We can create msi.c which handles MSI specific handling. I will
> look into this. But I def welcome somone beating me in posting those
> patches :) I made a note of this however.
Thanks. It is all of these little things. My hope is that we can
whittle down the unshared core instead of increasing the amount of
non-shared code.
>> Further can we please see some better abstractions. In particular can
>> we generate a token for the irq destination. And have the msi and
>> ioapic setup read that token and program it into the hardware. The
>> rules for which bits go where is exactly the same both with and
>> without irq_remapping so having an if statement there seems to obscure
>> what is really happening. Especially if as it appears that we may be used
>> the new token format with x2apics without remapping.
>
> unfortunately x2apic can't be enabled with out enabling interrupt-remapping.
> Interrupts don't work in majority of the configurations (as I mentioned
> earlier). Programming IOAPIC RTE's and MSI address/data registers are
> completely different based on the presence of interrupt-remapping.
Regardless of the x2apic mode issues there is a different issue here (address better in
another response where I gave an example).
There are a set of architecturally defined bits that can be
programmed. These same bits exist in the ioapic routing entry and in
the msi message.
Therefore we should have a generic mapping function that says give the architecturally
defined bits.
Then both the ioapic setup and the msi setup can call x86_irq_map(irq) get those
architectural bits and program them in their architecturally defined location.
For MSI it looks like you be able to take advantage of a few more bits, but the
same principle applies.
Getting these intermediate abstractions relatively clean is important so we can do
things with the hardware.
Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists