netdev - Re: Generic rx-recycling and emergency skb pool

Message-ID: <1278138205.2474.27.camel@edumazet-laptop>
Date:	Sat, 03 Jul 2010 08:23:25 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Sebastian Andrzej Siewior <sebastian@...akpoint.cc>
Cc:	netdev@...r.kernel.org, tglx@...utronix.de
Subject: Re: Generic rx-recycling and emergency skb pool

On Friday, 02 July 2010 at 21:20 +0200, Sebastian Andrzej Siewior
wrote:
> This is version two of generic rx-recycling followed by version one of
> emergency skb pools which are built on top of rx-recycling.
> The change from v1 of generic rx-recycling is that the list access is
> unlocked instead of locked.
> Patch six which introduces the emergency pools adds the locking back.
> This is required since we now have two not serialized users. In order
> not to punish everyone patch eight removes this locking again. That
> patch converts only two drivers so you have an idea what I think is
> required to get the locking removed.
> 
> The idea behind emergency pools is to have pre-allocated skbs for TX and
> RX. Using the memory allocator for it leads to latencies during memory
> pressure. The pre-allocated skb are "tagged" and should get back to the
> pool once they are through the stack so the pool should never get
> exhausted.
> 
> While it was easy to convert the drivers which share the same concept of
> rx-recycling to use the emergency pools it was difficult to hook up the
> more complex drivers like e1000e. The e1000e can use split skbs / a frag
> list which is different from the allocation currently used. So instead of
> forcing all drivers to use the same way of doing things I've been thinking
> about providing a dedicated callback for skb allocation and checking if
> this skb is "good enough". This is not yet implemented.
> 
> I would be glad to receive some feedback on this patch series before I go
> any further. Unfortunately I'm on vacation for the next two weeks so I
> can't respond earlier. tglx is on Cc and should be able to respond earlier :)
> 
> Sebastian

Sebastian

I read all the patches, and my initial feeling is that all this is very
complex and has many shortcomings.

Most modern NICs are multiqueue, so that each CPU can use a queue of its
own without slowing down other CPUs.

Yet rx recycling has one queue per device, defeating part of the
multiqueue goal.
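
To make the contrast concrete, here is a minimal userspace sketch of
recycling with one list per RX queue instead of one per device. It is
not kernel code and not taken from the patch series; fake_dev,
recycle_pool, recycle_put(), recycle_get(), NUM_QUEUES and POOL_DEPTH
are all hypothetical names chosen for illustration. The assumption is
that each queue is serviced by a single CPU, so a queue-local list
needs no lock and shares no cache line with other queues.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_QUEUES 4   /* hypothetical number of RX queues */
#define POOL_DEPTH 8   /* hypothetical per-queue recycle limit */

/* Stand-in for a recyclable receive buffer. */
struct buf {
    struct buf *next;
};

/* One recycle list per queue, touched only by that queue's CPU. */
struct recycle_pool {
    struct buf *head;
    unsigned int len;
};

/* Stand-in for a device that owns one pool per RX queue. */
struct fake_dev {
    struct recycle_pool pool[NUM_QUEUES];
};

/*
 * Return a buffer to the pool of the given queue.
 * Returns 0 on success, -1 if the pool is full (the caller would then
 * hand the buffer back to the allocator).
 */
static int recycle_put(struct fake_dev *dev, unsigned int queue,
                       struct buf *b)
{
    struct recycle_pool *p;

    assert(queue < NUM_QUEUES);
    p = &dev->pool[queue];
    if (p->len >= POOL_DEPTH)
        return -1;
    b->next = p->head;
    p->head = b;
    p->len++;
    return 0;
}

/* Take a buffer from the queue-local pool, or NULL if it is empty. */
static struct buf *recycle_get(struct fake_dev *dev, unsigned int queue)
{
    struct recycle_pool *p;
    struct buf *b;

    assert(queue < NUM_QUEUES);
    p = &dev->pool[queue];
    b = p->head;
    if (b == NULL)
        return NULL;
    p->head = b->next;
    b->next = NULL;
    p->len--;
    return b;
}
```

With this layout a buffer put back on queue 1 is only ever handed out
again on queue 1, so queues do not contend on a shared list.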

Patch 6/8 even touches dev->refcnt on each emergency packet.
Patch 6/8 also adds 8 bytes (emerg_dev) to every skb. Oh well...

Adding cache layers, especially dumb ones like this one, is probably a
sign that something more fundamental is broken somewhere.

I do believe, for example, that netdev_alloc_skb() should not try to use
the node affinity of the device, but should use the current CPU's node
for the sk_buff at least, and possibly for the data part too.

One other problem with skbs is the two memory blocks involved, and the
fact that the first one (the skb itself) is already very big and fat,
and is filled/dirtied many cycles before its use in the RX path.

Maybe it's time to provide a new API, so that a driver can build an skb
at the time the RX interrupt is handled, not at the time the RX ring
buffer is renewed. The RX ring should only provide the data part to the
NIC, and the skb should be built when the NIC delivers the frame, so
that we provide a really hot skb to the IP stack.
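
A rough userspace sketch of that idea follows. It is not a proposed
kernel API; rx_ring, rx_slot, pkt_meta, ring_refill(), ring_receive()
and pkt_free() are hypothetical names, and malloc()/calloc() stand in
for the kernel allocators. The ring holds only data buffers, and the
metadata object (the stand-in for the sk_buff) is allocated and filled
when a frame arrives, right before it would be handed up the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 4     /* hypothetical ring length */
#define BUF_SIZE  2048  /* hypothetical data buffer size */

/* A ring slot owns only the data area given to the "NIC". */
struct rx_slot {
    unsigned char *data;
};

/* Stand-in for sk_buff: created at receive time, not at refill time. */
struct pkt_meta {
    unsigned char *data;
    size_t len;
};

struct rx_ring {
    struct rx_slot slot[RING_SIZE];
    unsigned int next;  /* next slot the "NIC" will complete */
};

/* Refill empty slots with data buffers only; returns slots filled. */
static int ring_refill(struct rx_ring *r)
{
    unsigned int i;
    int filled = 0;

    for (i = 0; i < RING_SIZE; i++) {
        if (r->slot[i].data != NULL)
            continue;
        r->slot[i].data = malloc(BUF_SIZE);
        if (r->slot[i].data == NULL)
            break;
        filled++;
    }
    return filled;
}

/*
 * Called when a frame of frame_len bytes has landed in the next slot.
 * The metadata is built here, so it is cache-hot when passed on.
 * Ownership of the data buffer moves from the slot to the packet.
 */
static struct pkt_meta *ring_receive(struct rx_ring *r, size_t frame_len)
{
    struct rx_slot *s = &r->slot[r->next];
    struct pkt_meta *m;

    if (s->data == NULL || frame_len > BUF_SIZE)
        return NULL;
    m = calloc(1, sizeof(*m));
    if (m == NULL)
        return NULL;
    m->data = s->data;
    m->len = frame_len;
    s->data = NULL;
    r->next = (r->next + 1) % RING_SIZE;
    return m;
}

/* Release a packet and the data buffer it owns. */
static void pkt_free(struct pkt_meta *m)
{
    if (m != NULL) {
        free(m->data);
        free(m);
    }
}
```

In this sketch the refill path never touches the metadata structure, so
nothing is dirtied long before the frame actually arrives.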
