Message-ID: <47C92F49.4070100@cosmosbay.com>
Date: Sat, 01 Mar 2008 11:26:17 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: David Miller <davem@...emloft.net>
CC: netdev@...r.kernel.org
Subject: Re: [RFC,PATCH] loopback: calls netif_receive_skb() instead of netif_rx()
David Miller wrote:
> From: Eric Dumazet <dada1@...mosbay.com>
> Date: Thu, 21 Feb 2008 19:51:52 +0100
>
>> The following patch directly calls netif_receive_skb() and avoids a lot of
>> atomic operations
>> (atomic_inc(&dev->refcnt), test_and_set_bit(NAPI_STATE_SCHED, &n->state), ...
>> atomic_dec(&dev->refcnt)...), cache line ping-pongs on device refcnt,
>> but also softirq overhead.
>>
>> This gives a nice boost on tbench for example (5 % on my machine)
>
> My only concern is stack usage.
>
> Note that packet reception can elicit a response and go all the way
> back into this driver and all the way down into netif_receive_skb()
> again. And so on and so forth.
>
> If there is some bug in the stack (ACK'ing ACKs, stuff like that) we
> could get into a loop and overrun the kernel stack in no time at all.
>
> So, if anything, this change could make inconvenient errors become
> catastrophic and hard to diagnose.
You are absolutely right. We should guard against recursion, using a new field
in "pcpu_lstats" (cheap to access, in a hot cache line, since we have to update
the stats there anyway).
Thank you
[PATCH] loopback: calls netif_receive_skb() instead of netif_rx()
The loopback transmit function, loopback_xmit(), currently calls netif_rx() to
queue the skb on the softnet queue and arm a softirq so that the skb can be
handled later.
This has a cost on SMP, because we need to hold a reference on the device and
release it when the softirq dequeues the packet.
The following patch directly calls netif_receive_skb() and avoids a lot of
atomic operations
(atomic_inc(&dev->refcnt), test_and_set_bit(NAPI_STATE_SCHED, &n->state), ...
atomic_dec(&dev->refcnt)...), cache line ping-pongs on the device refcnt, but
also softirq overhead.
This gives a nice boost on tbench, for example (5% on my machine).
We want to limit recursion, in case the network stack re-enters
loopback_xmit(). We use a per-CPU depth field, so that we avoid stack
overflow by queueing the packet instead of handling it directly.
Signed-off-by: Eric Dumazet <dada1@...mosbay.com>
View attachment "loopback.patch" of type "text/plain" (761 bytes)