[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.64N.0710171221510.28993@blysk.ds.pg.gda.pl>
Date: Thu, 18 Oct 2007 12:30:35 +0100 (BST)
From: "Maciej W. Rozycki" <macro@...ux-mips.org>
To: Jarek Poplawski <jarkao2@...pl>
cc: Andy Fleming <afleming@...escale.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Jeff Garzik <jgarzik@...ox.com>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes
On Wed, 17 Oct 2007, Jarek Poplawski wrote:
> I'm not sure free_irq() should maintain the depth count - rather warn
> on not zero. But, IMHO, any activity on freed irq seems suspicious to
> me (and doesn't look like very common), even if it's safe with current
> implementation.
No way to avoid it with DEBUG_SHIRQ.
> Yes, these DEBUG_SHIRQ checks are suspicious to me too, but they seem
> to be reasonable only in the case of possible resent irqs (so not for
> all irqs). On the other hand, it seems, proper irq handler with proper
> hardware shouldn't have any problems with such a check.
What do you mean by "proper irq handler with proper hardware"? Using
softirqs (they used to be called bottom-halves) is actually a natural way
of handling any interrupt which requires extensive processing.
> 1) phy_change() checks PHY_HALTED flag without lock; I think it's
> racy: eg. if it's done during phy_stop() it can check just before
> the flag is set and reenable interrupts just after phy_stop() ends.
I remember having a look into it, but it was long ago and I cannot
immediately recall the conclusion. Which means it is either broken or
deserves a comment as non-obvious. I will have a look into it again, but
I am resource-starved a little at the moment, sorry.
> 2) phy_change() doesn't reenable irq line after it sees returns
> with errors; IMHO it should at least write some warning, but maybe
> try some safety plan, so enable_irq() and try to disable interrupts
> and free_irq() on the next call (if it happens). (But, I can be very
> wrong with this - maybe it's OK and official way.)
No way to do this safely -- at this point the device probably still has
its interrupt output asserted and the register to clear it is
inaccessible, so enabling the line will enter an infinite loop. At this
point the system is no longer stable, so it is better to keep at least
some functionality, so that it may be attempted to be shut down cleanly,
rather than make it completely irresponsive. The alternative is panic().
> 3) phy_interrupt() checks PHY_HALTED flag without lock too, but I'm
> not sure now if it could be dangerous after fixing #1; on the other
> hand even if we know it's not our regular interrupt, with current
> DEBUG_SHIRQ it could be easier to call schedule_work() anyway since
> we are sure it's before/in free_irq() yet.
See #1 above.
> 4) phy_interrupt() should check return value from schedule_work() and
> enable irq on 0.
No -- the work already pending will do that.
> 5) phy_stop_interrupts(): maybe I miss something, but it seems
> phy_stop() is required before this, so maybe there should be a
> comment on this?
The API is documented in: Documentation/networking/phy.txt -- you are
welcome to improve. If you do not want to get into the gory details, just
use the cooked interface and phy_disconnect() will do the dirty work for
you.
> 6) phy_stop_interrupts(): if I'm not wrong with #3 calling
> phy_disable_interrupts() looks like we are not sure this phy_stop()
> really works; than maybe a WARN_ON?
WARN_ON what?
> 7) phy_stop_interrupts(): after above mentioned changes in
> phy_interrupt(), and phy_changes() (always enable_irq()) I can't see
> any reason why there should be more than one skipped enable_irq(),
> so checking return from cancel_work_sync() shouldn't be enough
> instead of this atomic counter.
CONFIG_DEBUG_SHIRQ. Barring this option, cancel_work_sync() could have
been moved to the front of free_irq(). I think I have seen this in
reality, with the interrupt line left stuck disabled afterwards, but I
will double check when I have an opportunity. The approach implemented
with this patch does work, which is the important bit, and if
simplification is possible, then it may be applied later.
> 8) phy_stop_interrupts(): I'm not sure this additional call from
> DEBUG_SHIRQ should be so dangerous, eg.:
>
> /*
> * status == PHY_HALTED &&
> * interrupts are stopped after phy_stop()
> */
> if (cancel_work_sync(...))
> enable_irq();
>
> free_irq(...);
> /*
> * possible schedule_work() from DEBUG_SHIRQ only,
> * but proper check for PHY_HALTED is done;
> * so, let's flush after this too:
> */
> cancel_work_sync();
Well, if there is another handler registered on this line, you'll get
your interrupt line stuck disabled.
> Of course, I don't know phy.c enough, so most of this can be terribly
> wrong, then feel free to forget about this - I don't expect you should
> waste any time for explaining me these things - after all they are
> doubts only.
I am not sure I know phy.c well enough either, ;-) and your concerns are
appreciated as interesting conclusions may develop. If someone disagrees
with what I have written here, they are welcome to speak out too.
Maciej
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists