Message-ID: <m1wrpzfctu.fsf@fess.ebiederm.org>
Date: Sun, 03 Oct 2010 09:41:49 -0700
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Thomas Gleixner <tglx@...utronix.de>
Cc: LKML <linux-kernel@...r.kernel.org>, linux-arch@...r.kernel.org,
Linus Torvalds <torvalds@...l.org>,
Andrew Morton <akpm@...ux-foundation.org>, x86@...nel.org,
Peter Zijlstra <peterz@...radead.org>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Paul Mundt <lethal@...ux-sh.org>,
Russell King <linux@....linux.org.uk>,
David Woodhouse <dwmw2@...radead.org>,
Jesse Barnes <jbarnes@...tuousgeek.org>,
Yinghai Lu <yinghai@...nel.org>,
Grant Likely <grant.likely@...retlab.ca>
Subject: Re: [patch 00/47] Sparse irq rework
Thomas Gleixner <tglx@...utronix.de> writes:
> The following patch series cleans up and mostly reimplements the core
> sparse irq implementation and sanitizes the most complex (ab)user:
> arch/x86
Overall this patchset looks pretty sane, but I don't see a clear picture
of what everything is going to look like when the dust settles.
> The series is based on the previous rework of irq chip functions which
> is available at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git irq/core
>
> A combined throwaway git repository with all the following patches on top of
> tip/irq/core is available at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-sparse-irq.git
>
> The overall changes are (full changelog below):
> 56 files changed, 1229 insertions(+), 1682 deletions(-)
>
> The series consists of 3 parts:
>
> - cleanup of kernel/irq code and implementation of new allocator
> - conversion of x86 to new irq_chip functions and new allocator
> - trivial cleanup of the remaining users and removal of the old stuff
>
> It's fully bisectable and survived a night of testing in my testfarm.
>
> There are two bugfix patches (1/47, 2/47), which resulted from staring
> at that maze for way too long. They are targeted for mainline urgent,
> but I left them in the queue to avoid further churn.
>
>
> Rationale:
> ----------
>
> The current sparse_irq allocator has several shortcomings due to
> failures in the design or the lack of it:
>
> - Requires iteration over the number of active irqs to find a free slot
> Some architectures have grown their own workarounds for this.
>
> - Freeing of irq descriptors is not possible
>
> - Racy between create_irq_nr and destroy_irq plugged by horrible
> callbacks
>
> - Migration of active irq descriptors is not possible
I believe you have distorted the design when aiming for migration
of active irq descriptors (which you have not even implemented yet).
How do you plan to remove the radix tree lookup from the irq
handling path?
On x86 the obvious implementation is to store a pointer to the irq_desc
in our 256 entry per cpu tables. Please implement this and see how
it affects the design. The code is pretty trivial.
From what I can see of your migration plan it seems incompatible with
removing the radix tree lookup in the path to generic_handle_irq().
> - No bulk allocation of irq ranges
Where is that a shortcoming?
> Aside of that the sparse irq design failure caused that we sprinkled
> irq_desc references all over the place outside of kernel/irq/. That
> makes it extremely hard to do the core changes which are necessary to
> do further cleanups and improvements like the migration of active irq
> descriptors. The arch code needs only to know about the irq chip and
> the data associated with the irq. The irq descriptor itself is solely
> a core code data structure.
If by core you mean the arch irq handling code, then certainly, and
msi fits that bill.
> The reason is that with the non sparse code access to the irq data was
> just array pointer math and most code (aside of the old __do_IRQ()
> users) used the provided accessor functions.
>
> With sparse it requires a radix tree lookup, which caused performance
> problems. Instead of tackling the problem at the chip function level
> and handing down a pointer to the associated data instead of an irq
> number, the low level code acquired a reference to irq_desc and
> populated that all over the place. Yeah, it's easier than doing a full
> cleanup and a sensible migration path, but the resulting mess is just
> disgusting.
>
> The previous chip functions series on which this series is based is
> addressing this issue on the chip level side by handing down the
> associated interrupt data instead of the interrupt number. The x86
> cleanup is making use of it.
And always handing down the data structure so you can do the same
thing with sparse irq enabled or not is a much needed code cleanup.
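
To make the point concrete, here is a small userspace sketch of chip
callbacks that receive a pointer to the per-irq data instead of an irq
number. The demo_* types and the one-register controller are
hypothetical stand-ins, not the kernel's struct irq_data or struct
irq_chip; the point is only that the callback never has to translate a
number back into data, so it is identical with or without sparse irq.

```c
/* Hypothetical controller: one mask register, one bit per line. */
struct demo_ctrl {
    unsigned int mask_reg;
};

/* Chip-private data for one interrupt line on that controller. */
struct demo_pin {
    struct demo_ctrl *ctrl;
    unsigned int bit;
};

/* Stand-in for the per-irq data handed to chip callbacks. */
struct demo_irq_data {
    unsigned int irq;
    void *chip_data;
};

/* Stand-in for the chip operations; callbacks take data, not a number. */
struct demo_irq_chip {
    const char *name;
    void (*irq_mask)(struct demo_irq_data *data);
    void (*irq_unmask)(struct demo_irq_data *data);
};

static void demo_mask(struct demo_irq_data *data)
{
    struct demo_pin *pin = data->chip_data;

    pin->ctrl->mask_reg |= 1u << pin->bit;     /* no irq -> desc lookup */
}

static void demo_unmask(struct demo_irq_data *data)
{
    struct demo_pin *pin = data->chip_data;

    pin->ctrl->mask_reg &= ~(1u << pin->bit);
}

static const struct demo_irq_chip demo_chip = {
    .name       = "demo",
    .irq_mask   = demo_mask,
    .irq_unmask = demo_unmask,
};
```
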
> New implementation:
> -------------------
>
> I've implemented a sane allocator which fixes the above shortcomings
> (though migration of active descriptors still needs a full tree wide
> cleanup of the direct and mostly unlocked access to irq_desc).
>
> The new allocator still uses a radix_tree, but uses a bitmap for
> keeping track of allocated irq numbers. That results in:
I don't know that I have a problem with this but I do have a problem
with using a bitmap. A lot of the kernel's irq usage has been distorted
because we use a compact array that we cannot grow over time. Using a
bitmap here essentially removes 90% of the point of sparse irq: the
ability to remove a hard-coded NR_IRQS from the kernel.
> - Fast lookup of a free slot
>
> - The removal of disposed descriptors (destroy_irq())
>
> - Prevents the create/destroy race
>
> - Bulk (de)allocation of consecutive irq ranges
>
> - Migration of live descriptors after further cleanups
You should be able to do all of that by walking your radix tree in the
sparse irq case.
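
For reference, the kind of bitmap allocation being discussed can be
modelled in userspace as below. This is a sketch with hypothetical
names (DEMO_NR_IRQS, alloc_irq_range, free_irq_range), not the code
from the patch series. Note that the bitmap is sized by a compile-time
constant, which is exactly the fixed upper bound objected to above.

```c
#include <limits.h>
#include <stddef.h>

#define DEMO_NR_IRQS  64  /* hypothetical fixed limit; the bitmap cannot grow */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS  ((DEMO_NR_IRQS + BITS_PER_WORD - 1) / BITS_PER_WORD)

static unsigned long allocated[BITMAP_WORDS];

static int irq_in_use(unsigned int n)
{
    return (allocated[n / BITS_PER_WORD] >> (n % BITS_PER_WORD)) & 1UL;
}

/*
 * Find and claim cnt consecutive free irq numbers at or above from.
 * Returns the first number of the range, or -1 if none is available.
 */
static int alloc_irq_range(unsigned int from, unsigned int cnt)
{
    unsigned int start, i;

    if (cnt == 0 || cnt > DEMO_NR_IRQS || from >= DEMO_NR_IRQS)
        return -1;

    for (start = from; start + cnt <= DEMO_NR_IRQS; start++) {
        for (i = 0; i < cnt; i++)
            if (irq_in_use(start + i))
                break;
        if (i == cnt) {
            for (i = 0; i < cnt; i++)
                allocated[(start + i) / BITS_PER_WORD] |=
                    1UL << ((start + i) % BITS_PER_WORD);
            return (int)start;
        }
        start += i;  /* skip to the busy bit; the loop's ++ steps past it */
    }
    return -1;
}

/* Release cnt consecutive irq numbers starting at start. */
static void free_irq_range(unsigned int start, unsigned int cnt)
{
    unsigned int i;

    for (i = 0; i < cnt && start + i < DEMO_NR_IRQS; i++)
        allocated[(start + i) / BITS_PER_WORD] &=
            ~(1UL << ((start + i) % BITS_PER_WORD));
}
```
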
> Full conversion and clean up of x86:
> ------------------------------------
>
> I spent quite some time to come up with a sane and splittable concept,
> which does not reach out into drivers/pci/[msi|ht|dmar] and whatever.
>
> But that's simply impossible because everything is twisted together
> mainly by optimization hacks done over time. (i.e. handing down
> irq_desc to low level msi functions instead of irq_desc.msi_desc would
> have kept the mess confined to x86).
Those files provide the genirq irq chip implementation, especially
drivers/pci/msi.c. Of course they will do what every other irq_chip
implementation does to get access to data. There is an unpleasant
difference between the generic irq data field that htirq.c uses and
the one that msi.c uses, which may be worth cleaning up. But otherwise
I don't see any fundamental problems.
The big difference is that those are the irq controllers whose code
is not necessarily architecture specific.
> So I went there and started to convert stuff piece by piece in x86 and
> added the drivers/pci/* fixes as separate patches along the way. Not
> nice, but it turned out to be the only way which avoided even more
> churn.
You should be able to convert msi.c and company directly to using
irq_data immediately following your previous patchset, shouldn't you?
Perhaps with two flavors of helper functions during the transition
to passing irq_data everywhere.
I don't see any code in the msi code that is arch specific or sparse
irq specific.
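
A sketch of what "two flavors of helper functions" could look like
during such a transition, as a userspace model. The names
(msi_demo_data, msi_demo_get_data, msi_demo_mask, msi_demo_mask_irq)
and the flat lookup table are hypothetical; the real msi.c helpers and
the real irq number to data lookup are not reproduced here.

```c
#include <stddef.h>

#define MSI_DEMO_NR 8  /* hypothetical table size for the sketch */

/* Stand-in for the per-irq data passed to converted callers. */
struct msi_demo_data {
    unsigned int irq;
    int masked;
};

static struct msi_demo_data msi_demo_table[MSI_DEMO_NR];

/* Stand-in for the irq number -> data lookup. */
static struct msi_demo_data *msi_demo_get_data(unsigned int irq)
{
    return irq < MSI_DEMO_NR ? &msi_demo_table[irq] : NULL;
}

/* New flavor: converted callers pass the data pointer directly. */
static void msi_demo_mask(struct msi_demo_data *data)
{
    data->masked = 1;
}

/*
 * Old flavor: a thin wrapper for callers that still pass an irq number.
 * It can be deleted once every caller has been converted.
 */
static int msi_demo_mask_irq(unsigned int irq)
{
    struct msi_demo_data *data = msi_demo_get_data(irq);

    if (!data)
        return -1;
    msi_demo_mask(data);
    return 0;
}
```
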
> Further work:
> -------------
>
> - Cleanup the irq_desc references all over the tree, which should become
> easier after the remaining __do_IRQ() users are gone.
>
> - Implement migration of active irq descriptors
>
> - Implement node bound late allocation of low level irq vectors which
> solves an existing (SGI) problem on large machines.
>
>
> How to merge:
> -------------
>
> It needs:
>
> - ack to the new allocator design
>
> - ack to merge the whole arch/!x86 and driver related cleanups
> along with the core changes and the x86 cleanup
Eric