linux-kernel - Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.64N.0710171221510.28993@blysk.ds.pg.gda.pl>
Date:	Thu, 18 Oct 2007 12:30:35 +0100 (BST)
From:	"Maciej W. Rozycki" <macro@...ux-mips.org>
To:	Jarek Poplawski <jarkao2@...pl>
cc:	Andy Fleming <afleming@...escale.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jeff Garzik <jgarzik@...ox.com>, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

On Wed, 17 Oct 2007, Jarek Poplawski wrote:

> I'm not sure free_irq() should maintain the depth count - rather warn
> on not zero. But, IMHO, any activity on freed irq seems suspicious to
> me (and doesn't look like very common), even if it's safe with current
> implementation.

 No way to avoid it with DEBUG_SHIRQ.

> Yes, these DEBUG_SHIRQ checks are suspicious to me too, but they seem
> to be reasonable only in the case of possible resent irqs (so not for
> all irqs). On the other hand, it seems, proper irq handler with proper
> hardware shouldn't have any problems with such a check.

 What do you mean by "proper irq handler with proper hardware"?  Using
softirqs (they used to be called bottom-halves) is actually a natural way 
of handling any interrupt which requires extensive processing.

> 1) phy_change() checks PHY_HALTED flag without lock; I think it's
> racy: eg. if it's done during phy_stop() it can check just before
> the flag is set and reenable interrupts just after phy_stop() ends.

 I remember having a look into it, but it was long ago and I cannot 
immediately recall the conclusion.  Which means it is either broken or 
deserves a comment as non-obvious.  I will have a look into it again, but 
I am resource-starved a little at the moment, sorry.

> 2) phy_change() doesn't reenable irq line after it sees returns
> with errors; IMHO it should at least write some warning, but maybe
> try some safety plan, so enable_irq() and try to disable interrupts
> and free_irq() on the next call (if it happens). (But, I can be very
> wrong with this - maybe it's OK and official way.)

 No way to do this safely -- at this point the device probably still has 
its interrupt output asserted and the register to clear it is 
inaccessible, so enabling the line will enter an infinite loop.  At this 
point the system is no longer stable, so it is better to keep at least 
some functionality, so that it may be attempted to be shut down cleanly, 
rather than make it completely irresponsive.  The alternative is panic().

> 3) phy_interrupt() checks PHY_HALTED flag without lock too, but I'm
> not sure now if it could be dangerous after fixing #1; on the other
> hand even if we know it's not our regular interrupt, with current
> DEBUG_SHIRQ it could be easier to call schedule_work() anyway since
> we are sure it's before/in free_irq() yet.

  See #1 above.

> 4) phy_interrupt() should check return value from schedule_work() and
> enable irq on 0.

 No -- the work already pending will do that.

> 5) phy_stop_interrupts(): maybe I miss something, but it seems
> phy_stop() is required before this, so maybe there should be a
> comment on this?

 The API is documented in: Documentation/networking/phy.txt -- you are 
welcome to improve.  If you do not want to get into the gory details, just 
use the cooked interface and phy_disconnect() will do the dirty work for 
you.

> 6) phy_stop_interrupts(): if I'm not wrong with #3 calling
> phy_disable_interrupts() looks like we are not sure this phy_stop()
> really works; than maybe a WARN_ON?

 WARN_ON what?

> 7) phy_stop_interrupts(): after above mentioned changes in
> phy_interrupt(), and phy_changes() (always enable_irq()) I can't see
> any reason why there should be more than one skipped enable_irq(),
> so checking return from cancel_work_sync() shouldn't be enough
> instead of this atomic counter.

 CONFIG_DEBUG_SHIRQ.  Barring this option, cancel_work_sync() could have 
been moved to the front of free_irq().  I think I have seen this in 
reality, with the interrupt line left stuck disabled afterwards, but I 
will double check when I have an opportunity.  The approach implemented 
with this patch does work, which is the important bit, and if 
simplification is possible, then it may be applied later.

> 8) phy_stop_interrupts(): I'm not sure this additional call from
> DEBUG_SHIRQ should be so dangerous, eg.:
> 
> 	/*
> 	 * status == PHY_HALTED &&
> 	 * interrupts are stopped after phy_stop()
> 	 */
> 	if (cancel_work_sync(...))
> 		enable_irq();
> 
> 	free_irq(...);
> 	/*
> 	 * possible schedule_work() from DEBUG_SHIRQ only,
> 	 * but proper check for PHY_HALTED is done;
> 	 * so, let's flush after this too:
> 	 */
> 	cancel_work_sync();

 Well, if there is another handler registered on this line, you'll get 
your interrupt line stuck disabled.

> Of course, I don't know phy.c enough, so most of this can be terribly
> wrong, then feel free to forget about this - I don't expect you should
> waste any time for explaining me these things - after all they are
> doubts only.

 I am not sure I know phy.c well enough either, ;-) and your concerns are 
appreciated as interesting conclusions may develop.  If someone disagrees 
with what I have written here, they are welcome to speak out too.

  Maciej
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/