[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1157594179.20092.451.camel@ymzhang-perf.sh.intel.com>
Date: Thu, 07 Sep 2006 09:56:19 +0800
From: "Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>
To: Linas Vepstas <linas@...tin.ibm.com>
Cc: Benjamin Herrenschmidt <benh@...nel.crashing.org>,
linuxppc-dev@...abs.org,
linux-pci maillist <linux-pci@...ey.karlin.mff.cuni.cz>,
Yanmin Zhang <yanmin.zhang@...el.com>,
LKML <linux-kernel@...r.kernel.org>,
Rajesh Shah <rajesh.shah@...el.com>
Subject: Re: pci error recovery procedure
On Thu, 2006-09-07 at 04:01, Linas Vepstas wrote:
> On Wed, Sep 06, 2006 at 09:26:56AM +0800, Zhang, Yanmin wrote:
> > > > The
> > > > error_detected of the drivers in the latest kernel who support err handlers
> > > > always returns PCI_ERS_RESULT_NEED_RESET. They are typical examples.
> > >
> > > Just because the current drivers do it this way does not mean that this is
> > > the best way to do things.
> >
> > If it's not the best way, why did you choose to reset slot for e1000/e100/ipr
> > error handlers? They are typical widely-used devices. To make it easier to
> > add error handlers?
>
> I did it that way just to get going, get something working. I do not
> have hardware specs for any of these devices, and do not have much of
> an idea of what they are capable of;
Yes, it's difficult to add fine-grained error handlers for guys who are not
the driver developers.
> the recovery code I wrote is of
> "brute force, hit it with a hammer"-nature. Driver writers who
> know thier hardware well, and are interested in a more refined
> approach are encouraged to actualy use a more refined approach.
I guess almost no driver developer is happy to spend lots of time to
add refined steps. They would like to focus on normal process (for achievement
feeling? :) ).
In addition, if they use fine-grained steps in error handlers, all these
steps might be rewritten when the device specs is upgraded. Fine-grained steps in
error handlers are more difficut to debug.
It's impossible for you to develop error handlers for all device drivers.
The error handlers look a little like suspend/resume. Of course, it's more
complicated. If we could keep it as simple as suspend/resume, it's more welcomed.
pci error shouldn't happen frequently. And when it happens, I think mostly it's
an endpoint device instead of bridge. When it happens, if we choose always
reset slot, performance could be degraded, but not too much. I just deduce, and
didn't test it on a machine with hundreds of devices.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists