Message-ID: <4797A729.4030006@cosmosbay.com>
Date: Wed, 23 Jan 2008 21:44:25 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: joonwpark81@...il.com
CC: David Miller <davem@...emloft.net>, netdev@...r.kernel.org
Subject: Re: [PATCH] [IPV4] route: fix locking in rt_run_flush()
joonwpark81@...il.com wrote:
> On Mon, Jan 21, 2008 at 02:40:43AM -0800, David Miller wrote:
>> From: Joonwoo Park <joonwpark81@...il.com>
>> Date: Tue, 22 Jan 2008 00:08:57 +0900
>>
>>> rt_run_flush() can get stuck if it is called while the netdev is under
>>> high load.
>>> This can happen when rtables are pushed into rt_hash faster than they
>>> are pulled from it.
>>>
>>> Signed-off-by: Joonwoo Park <joonwpark81@...il.com>
>> I agree with the analysis of the problem, however not the solution.
>>
>> This will absolutely kill software interrupt latency.
>>
>> In fact, we have moved much of the flush work into a workqueue in
>> net-2.6.25 because of how important that is.
>>
>> We need to find some other way to solve this.
>>
>
> Dave, Eric,
> Thanks so much for the comments.
>
> I did stress tests and found that the real problem was not a consumer &
> supplier issue.
> The problem was the innumerable enabling and disabling of softirqs.
> But I still think the 'faster caching than flush' issue needs to be
> considered. :)
>
> ifconfig up on a heavily loaded interface.
> Before patching:
> time ifconfig eth1 up
> BUG: soft lockup - CPU#0 stuck for 11s! [events/0:9]
> ...
>
> After patching:
> time ifconfig eth1 up
> real 0m0.007s
> user 0m0.000s
> sys 0m0.004s
>
> Thanks!
> Joonwoo
>
>
> From 87c29506de967e811ad5b57cd2e1a002134e878f Mon Sep 17 00:00:00 2001
> From: Joonwoo Park <joonwpark81@...il.com>
> Date: Wed, 23 Jan 2008 15:16:54 +0900
> Subject: [PATCH] [IPV4] route: reduce locking/unlocking in rt_run_flush
>
> rt_run_flush() calls spin_lock_bh/spin_unlock_bh rt_hash_mask + 1
> times.
> rt_hash_mask ranges from 32767 to 65535, so this is a big overhead.
> In addition, doing disable_bh/enable_bh that many times in rt_run_flush()
> can cause it to get stuck on a machine with many pending softirqs.
>
> This patch reduces locking/unlocking by jumping through the table one
> lock slot at a time.
>
> ifconfig up on a heavily loaded interface.
> Before:
> time ifconfig eth1 up
> BUG: soft lockup - CPU#0 stuck for 11s! [events/0:9]
> ...
>
> After:
> time ifconfig eth1 up
> real 0m0.007s
> user 0m0.000s
> sys 0m0.004s
>
Unfortunately, your patch doesn't work with CONFIG_SMP=n (softirqs will be
disabled for the whole scan of the table).
Also, some machines out there have 2^22 slots in the hash table and NR_CPUS=4,
so softirqs will be disabled for too long.
Please try net-2.6.25 and submit patches on top of it if necessary, since
rt_run_flush() has pending changes that are not in net-2.6.
Note: the 'soft lockup' can be avoided by other means.