[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1170065487.19887.91.camel@concordia.ozlabs.ibm.com>
Date: Mon, 29 Jan 2007 21:11:27 +1100
From: Michael Ellerman <michael@...erman.id.au>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Benjamin Herrenschmidt <benh@...nel.crashing.org>,
David Miller <davem@...emloft.net>, jeff@...zik.org,
greg@...ah.com, tony.luck@...el.com, grundler@...isc-linux.org,
mingo@...e.hu, linux-kernel@...r.kernel.org, kyle@...isc-linux.org,
linuxppc-dev@...abs.org, brice@...i.com, shaohua.li@...el.com,
linux-pci@...ey.karlin.mff.cuni.cz
Subject: Re: [PATCH 0/6] MSI portability cleanups
On Mon, 2007-01-29 at 02:03 -0700, Eric W. Biederman wrote:
> Benjamin Herrenschmidt <benh@...nel.crashing.org> writes:
>
> > On Sun, 2007-01-28 at 21:25 -0800, David Miller wrote:
> >> From: ebiederm@...ssion.com (Eric W. Biederman)
> >> Date: Sun, 28 Jan 2007 22:18:59 -0700
> >>
> >> > Regardless of my opinion on the sanity of the hypervisor architects.
> >> > I have not seen anything that indicates it will be hard to support
> >> > the hypervisor doing everything or most of everything for us, so
> >> > I see no valid technical objection to it. Nor have I ever.
> >> >
> >> > So I have no problem with additional patches in that direction.
> >>
> >> Ok, that's great to hear.
> >>
> >> I know your bi-directional approach isn't exactly what Ben
> >> wants but he can support his machines with it. Maybe after
> >> some time we can agree to move from that more towards the
> >> totally abstracted scheme.
> >
> > It can support my machines without HV with trivial changes I reckon: I
> > need an ops struct to indirect eric's 2 remaining arch hooks
> > (setup/teardown) but that can be done inline within asm-powerpc. I need
> > to double check of course and probably actually port the MPIC backend
> > and possibly go write the Cell Axon one while at it to verify everything
> > is allright, but the base design seems sound enough.
> >
> > For the ones with HV (RTAS stuff), we still need to agree on how to
> > approach it. We can either:
> >
> > Option 1
> > --------
> >
> > Do a hook -above- Eric stuff, by having the toplevel APIs themselves be
> > arch hooks that can either go toward the RTAS implementation or toward
> > Eric's code. That is, eric code would define those (pick better names if
> > you are good at it):
> >
> > pci_generic_enable_msi
> > pci_generic_disable_msi
> > pci_generic_enable_msix
> > pci_generic_disable_msix
> > pci_generic_save_msi_state
> > pci_generic_restore_msi_state
> >
> > Then we can have asm-i386/msi.h & friends do something like
> >
> > #define pci_enable_msi pci_generic_enable_msi
> > #define pci_disable_msi pci_generic_disable_msi
> > etc...
> >
> > And we can have asm-powerpc/msi.h hook then via ppc_md:
> >
> > static inline int pci_enable_msi(xxx...)
> > {
> > return ppc_md.pci_enable_msi(xxx...);
> > }
> > etc...
> >
> > (ppc_md is our per-platform global hook structure filled at boot when we
> > discover on what machine type we are running on) so that pSeries can use
> > it's own RTAS callbacks, and others can just re-hook those to Eric's
> > code.
>
> This is the most straight forward and handles machines with really
> weird msi setups, so I lean in this direction.
>
> The question is there anything at all we can do generically?
>
> I can't see a case where ppc_md would not wind up with the hooks
> that decide if it is a hypervisor or not. Even if we came up
> with a better set of functions you need to hook.
>
> > Option 2
> > --------
> >
> > That is to make Eric's code itself cope with the HV case. I'm a bit at
> > loss right now as how precisely to do it. I need to spend more time
> > staring at the code after Eric latest patches rather than the patches
> > themselves I suppose :-) (Eric, they don't apply out of the box on
> > current git, they are against -mm ?).
> >
> > Some of the main issues here, more/less following the order in which
> > Eric code calls things:
> >
> > - The number of vectors for MSI-X is obtained from config space (at
> > least for sanity checking the requested argument). On RTAS, it should
> > come from an OF property (we are really not supposed to go read the
> > config space even if we can). I -suppose- we can survive for now with
> > just reading it, but we might well run into trouble with some "special"
> > devices shared accross partitions or if the IBM magic bridges themselves
> > ever start sending MSI-X on their own (unlikely but who knows...).
> > Michael's code handled that by having a callback ->check() do the sanity
> > checking of the nvec, and then just use the nvec passed in as an
> > argument once it's sane.
>
> Ok. I think I get the point of check. I believe I need to look at your
> code a little more and see what you are doing to see if there is anything
> generic worth doing, that we can always do outside of architecture code
> no matter how much of the job the Hypervisor wants to do for us.
>
> I'd hate to hit a different Hypervisor that did something close but
> not quite the same and have the code fail then. So definitely
> avoiding touching pci config space at all in the calls seems to make a
> lot of sense. This includes avoiding pci_find_capability right?
You can read config space, but it's not clear to me if the HV is allowed
to filter it and hide things. It's also possible that the device
supports MSI, but for some reason the HV doesn't allow it on that device
etc. so you really have to ask the HV if it's enabled. So pci_find_cap()
shouldn't crash or anything, but it may lie to you.
> Off the top of my head the only things we can do generically are
> some data structure things and flags like dev->msi_enabled or
> dev->msix_enabled.
It would be good to have a common data structure if possible. My
thinking was that most of the information is per pci_dev, so that's
where I put it. I realise the Intel code stores some info that's
per-irq, but most of it is per-device. I hadn't got anywhere near coding
it, but my vague idea was to add a arch_data (or whatever) pointer to my
msi_info struct, which would allow backends to stash stuff.
I think the pci_intx() calls can be in the core.
Munging dev->irq could be in the core, assuming it's left in some known
location by the code. On the other hand we might want to decide it's a
bad idea altogether.
One thing I did like about my code, is that pci_enable_msi() and
pci_enable_msix() are just small wrappers around generic_enable_msi() -
which does all the work, and is the same regardless of whether it's an
MSI or MSI-X. Although that's facilitated by the type arg which you
don't like.
cheers
--
Michael Ellerman
OzLabs, IBM Australia Development Lab
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)
We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
Download attachment "signature.asc" of type "application/pgp-signature" (190 bytes)
Powered by blists - more mailing lists