[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20071019023219.GB8453@gondor.apana.org.au>
Date: Fri, 19 Oct 2007 10:32:19 +0800
From: Herbert Xu <herbert@...dor.apana.org.au>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@...nel.crashing.org>,
akpm@...ux-foundation.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linuxppc-dev@...abs.org, Ingo Molnar <mingo@...e.hu>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH] synchronize_irq needs a barrier
On Thu, Oct 18, 2007 at 04:39:59PM -0700, Linus Torvalds wrote:
>
> Basic notion: the only thing that serializes the IRQ_INPROGRESS flag is
> the descriptor lock. And we don't have to (or even want to!) hold it while
> waiting for the thing, but we want to *have*held*it* in between whatever
> we're synchronizing with.
Thinking about this again and checking code I have come to the
conclusion that
1) There is a race (in some drivers) even with the full mb.
2) Linus's patch fixes it.
3) It's difficult to fix it elegantly in the driver.
In other words I think this patch is great :)
First of all let's agree on some basic assumptions:
* A pair of spin lock/unlock subsumes the effect of a full mb.
* A spin lock in general only equates to (SS/SL/LL).
* A spin unlock in general only equates to (SS/LS).
In particular, a load after a spin unlock may pass before the
spin unlock.
Here is the race (with tg3 as the only example that I know of).
The driver attempts to quiesce interrupts such that after the
call to synchronize_irq it is intended that no further IRQ
handler calls for that device will do any work besides acking
the IRQ.
Here is how it does it:
CPU0 CPU1
spin lock
load irq_sync
irq_sync = 1
smp_mb
synchronize_irq()
while (IRQ_INPROGRESS)
wait
return
set IRQ_INPROGRESS
spin unlock
tg3_msi
ack IRQ
if (irq_sync)
return
do work
The problem here is that load of irq_sync in the handler has
passed above the setting of IRQ_INPROGRESS.
Linus's patch fixes it because this becomes:
CPU0 CPU1
spin lock
load irq_sync
irq_sync = 1
smp_mb
synchronize_irq
set IRQ_INPROGRESS
spin unlock
spin lock
spin unlock
tg3_msi
ack IRQ
if (irq_sync)
return
do work
while (IRQ_INPROGRESS)
wait
spin lock
clear IRQ_INPROGRESS
spin unlock
return
Even though we still do the work on the right we will now notice
the INPROGRESS flag on the left and wait.
It's hard to fix this in the drivers because they'd either have
to access the desc lock or add a full mb to the fast path on the
right.
Once this goes in we can also remove the smp_mb from tg3.c. BTW,
a lot of drivers (including the fusion example Ben quoted) call
synchronize_irq before free_irq. This is unnecessary because
the latter already calls it anyway.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@...dor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists