netdev - Re: [RFC] virtio: orphan skbs if we're relying on timer to free them

Message-Id: <20090521.001503.90069315.davem@davemloft.net>
Date:	Thu, 21 May 2009 00:15:03 -0700 (PDT)
From:	David Miller <davem@...emloft.net>
To:	rusty@...tcorp.com.au
Cc:	netdev@...r.kernel.org, virtualization@...ts.linux-foundation.org
Subject: Re: [RFC] virtio: orphan skbs if we're relying on timer to free
 them

From: Rusty Russell <rusty@...tcorp.com.au>
Date: Thu, 21 May 2009 16:27:05 +0930

> On Tue, 19 May 2009 12:10:13 pm David Miller wrote:
>> What you're doing by orphan'ing is creating a situation where a single
>> UDP socket can loop doing sends and monopolize the TX queue of a
>> device.  The only control we have over a sender for fairness in
>> datagram protocols is that send buffer allocation.
> 
> Urgh, that hadn't even occurred to me.  Good point.

Now this all is predicated on this actually mattering. :-)

You could argue that the scheduler as well as the size of the
TX queue should be limiting and enforcing fairness.

Someone really needs to test this.  Just skb_orphan() every packet
at the beginning of dev_hard_start_xmit(), then run some test
program with two clients looping out UDP packets to see if one
can monopolize the device and get a significantly larger amount
of TX resources than the other.  Repeat for 3, 4, 5, etc. clients.
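
A minimal user-space sketch of such a test program follows. It is not
from the original thread: the function names, defaults, and the use of
threads are assumptions made for illustration. Separate processes would
model independent clients more closely, and the counters record only
what sendto() accepted, not what reached the wire, so a real
measurement should also count packets on the receiving side.

```python
import socket
import threading
import time


def udp_sender(dest, payload_size, stop, counts, idx):
    """Loop sending UDP datagrams to `dest` until `stop` is set.

    Stores the number of successful sendto() calls in counts[idx].
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"x" * payload_size
    sent = 0
    try:
        while not stop.is_set():
            try:
                sock.sendto(payload, dest)
                sent += 1
            except OSError:
                # For example ENOBUFS under load; keep looping.
                pass
    finally:
        sock.close()
    counts[idx] = sent


def run_fairness_test(dest, n_clients=2, duration=1.0, payload_size=1024):
    """Run n_clients concurrent senders for `duration` seconds.

    Returns a list with one send count per client.
    """
    stop = threading.Event()
    counts = [0] * n_clients
    threads = [
        threading.Thread(
            target=udp_sender, args=(dest, payload_size, stop, counts, i)
        )
        for i in range(n_clients)
    ]
    for t in threads:
        t.start()
    time.sleep(duration)
    stop.set()
    for t in threads:
        t.join()
    return counts


def share_ratio(counts):
    """Largest count divided by smallest; 1.0 means a perfectly even split."""
    lo = min(counts)
    return float("inf") if lo == 0 else max(counts) / lo
```

Calling run_fairness_test(("192.0.2.1", 9000), n_clients=2) against a
host behind the device under test (the address here is a placeholder),
then again with 3, 4, 5 clients, and comparing share_ratio() of the
results with and without the skb_orphan() change would give a first
indication of whether one sender can crowd out the others.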

> I haven't thought this through properly, but how about a hack where
> we don't orphan packets if the ring is over half full?

That would also work.  And for the NIU case this would be great
because I DO have a marker bit for triggering interrupts in the TX
descriptors.  There's just no "all empty" interrupt on TX (who
designs these things? :( ).
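
As a toy model of the half-full idea (not driver code; the TxRing
class, its method names, and the accounting dictionary are all
hypothetical), the policy could be sketched like this. A packet queued
while the ring is at most half full is orphaned, so its bytes are
returned to the socket at once; a packet queued beyond that point stays
charged to the socket until its TX completion.

```python
class TxRing:
    """Toy TX ring that orphans packets only while at most half full."""

    def __init__(self, size):
        self.size = size
        self.slots = []  # entries are (sock_id, nbytes, orphaned)

    def over_half_full(self):
        return 2 * len(self.slots) > self.size

    def xmit(self, sock_id, nbytes, charged):
        """Queue one packet; return False if the ring is full.

        `charged` maps sock_id to bytes still counted against that
        socket's send buffer.
        """
        if len(self.slots) >= self.size:
            return False
        # Sending charges the socket for the packet.
        charged[sock_id] = charged.get(sock_id, 0) + nbytes
        orphaned = not self.over_half_full()
        if orphaned:
            # Orphaning gives the send-buffer space back immediately.
            charged[sock_id] -= nbytes
        self.slots.append((sock_id, nbytes, orphaned))
        return True

    def complete(self, charged):
        """TX completion for the oldest packet in the ring."""
        sock_id, nbytes, orphaned = self.slots.pop(0)
        if not orphaned:
            # Packets that were not orphaned are uncharged only now.
            charged[sock_id] -= nbytes
```

With a ring of size 4, the first three packets from one socket are
orphaned and the fourth stays charged until it completes, so a sender
that fills the ring starts to feel send-buffer back-pressure again.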

> Then I guess we could overload the watchdog as a more general
> timer-after-no-xmit?

Yes, but it means that teardown of a socket can be delayed up to
the amount of that timer.  Factor in all of this crazy
round_jiffies() stuff people do these days and it could cause
pauses for real use cases and drive users batty.

Probably the most profitable avenue is to see if this is a real issue
after all (see above).  If we can get away with having the socket
buffer represent socket --> device space only, that's the ideal
solution.  It will probably also improve performance a lot across the
board, especially on NUMA/SMP boxes as our TX complete events tend to
be in different places than the SKB producer.