linux-kernel - Re: [patch] revert: [NET]: Fix races in net_rx

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.LFD.0.999.0707161425361.20061@woody.linux-foundation.org>
Date:	Mon, 16 Jul 2007 14:40:38 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	David Miller <davem@...emloft.net>
cc:	mingo@...e.hu, linux-kernel@...r.kernel.org, olaf.kirch@...cle.com
Subject: Re: [patch] revert: [NET]: Fix races in net_rx_action vs netpoll

On Mon, 16 Jul 2007, David Miller wrote:
> 
> Well, let's figure out why before we revert because it
> is attempting to fix a legitimate bug.

I'm reverting it. I don't think there is any excuse for not reverting 
something that provably breaks somebody's machine. I don't want this to be 
on the regression list - if you figure out why it broke, you can always 
just resubmit a fixed patch.

Keeping a known broken patch around just because you want to figure out 
why it's broken seems pointless.

Just looking at the patch I see two options:

 - The change seems to always set the LIST_FROZEN bit when calling 
   ->poll(), and at least on e1000, the NAPI poll() routine ends up doing 
   that netif_rx_complete(), so we're *guaranteed* to always take the 
   early exit and not do anything.

   Looks fundamentally broken to me, and not worth saving. But maybe I'm 
   missing something (and it doesn't really seem to explain the write 
   timeout per se).

 - The early return from netif_rx_complete() ends up meaning that an 
   edge-triggered interrupt isn't handled properly, and will this never 
   happen again, since it never goes away.

   With MSI, edge-triggered interrupts are making a comeback in a big way, 
   and yeah, e1000 is one of the drivers that do MSI. Ingo might want to 
   confirm whether it's actually enabled for him, and whether turning it 
   off might hide the problem, but if that's it, then the whole patch is 
   fundamentally broken, and not worth saving.

but in either case (or, indeed, even if I didn't see any problem at all), 
I think reverting a patch that isn't needed is _always_ the right choice. 

If we don't know what caused a problem in the first place, or if the fix 
is known to be required for something else and reverting it would cause 
*another* regression, it would be another issue. But as it is, reverting 
it would seem to unquestionably get rid of a regression, and is thus a 
no-brainer.

No?

		Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/