netdev - Re: [PATCH] [IPV4] route: fix locking in rt_run


Date:	Wed, 23 Jan 2008 21:44:25 +0100
From:	Eric Dumazet <dada1@...mosbay.com>
To:	joonwpark81@...il.com
CC:	David Miller <davem@...emloft.net>, netdev@...r.kernel.org
Subject: Re: [PATCH] [IPV4] route: fix locking in rt_run_flush()

joonwpark81@...il.com wrote:
> On Mon, Jan 21, 2008 at 02:40:43AM -0800, David Miller wrote:
>> From: Joonwoo Park <joonwpark81@...il.com>
>> Date: Tue, 22 Jan 2008 00:08:57 +0900
>>
>>> rt_run_flush() can get stuck if it is called while the netdev is under
>>> high load.
>>> This can happen when entries are pushed into rt_hash faster than they are
>>> pulled from it.
>>>
>>> Signed-off-by: Joonwoo Park <joonwpark81@...il.com>
>> I agree with the analysis of the problem, however not the solution.
>>
>> This will absolutely kill software interrupt latency.
>>
>> In fact, we have moved much of the flush work into a workqueue in
>> net-2.6.25 because of how important that is.
>>
>> We need to find some other way to solve this.
>>
> 
> Dave, Eric,
> Thanks so much for the comments.
> 
> I ran stress tests and found that the real problem was not a consumer/supplier
> issue.
> For me, the problem was the sheer number of times softirqs were enabled and disabled.
> But I still think the 'caching faster than flushing' issue needs to be considered. :)
> 
> ifconfig up on a heavily loaded interface.
> Before patching:
>  time ifconfig eth1 up
>  BUG: soft lockup - CPU#0 stuck for 11s! [events/0:9]
>  ...
> 
> After patching:
>  time ifconfig eth1 up
> real	0m0.007s
> user	0m0.000s
> sys	0m0.004s
> 
> Thanks!
> Joonwoo
> 
> 
> From 87c29506de967e811ad5b57cd2e1a002134e878f Mon Sep 17 00:00:00 2001
> From: Joonwoo Park <joonwpark81@...il.com>
> Date: Wed, 23 Jan 2008 15:16:54 +0900
> Subject: [PATCH] [IPV4] route: reduce locking/unlocking in rt_run_flush
> 
> rt_run_flush() calls spin_lock_bh()/spin_unlock_bh() rt_hash_mask + 1
> times.
> rt_hash_mask ranges from 32767 to 65535, so this is a big overhead.
> In addition, calling disable_bh/enable_bh that many times in rt_run_flush()
> can cause it to get stuck on a machine with many pending softirqs.
> 
> This patch reduces the locking/unlocking by jumping across the lock
> slots.
> 
> ifconfig up on a heavily loaded interface.
> Before:
>  time ifconfig eth1 up
>  BUG: soft lockup - CPU#0 stuck for 11s! [events/0:9]
>  ...
> 
> After:
>  time ifconfig eth1 up
> real	0m0.007s
> user	0m0.000s
> sys	0m0.004s
> 

Unfortunately, your patch doesn't work with CONFIG_SMP=n (softirqs will be
disabled for the whole scan of the table).

Also, some machines out there have 2^22 slots in the hash table and NR_CPUS=4,
so softirqs will be disabled for too long.
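
To put rough numbers on these two objections, here is a small userspace model
in C. It is only a sketch, not kernel code: the names flush_cost,
flush_per_bucket and flush_per_lock are hypothetical, and it assumes that
bucket i is protected by lock (i % nlocks). The figure of 512 locks for
NR_CPUS=4 is an assumption, and nlocks = 1 stands in for CONFIG_SMP=n, where
one softirq-disabled section would cover the whole table.

```c
#include <stddef.h>

/* Hypothetical cost model of a hash table flush (not kernel code). */
struct flush_cost {
    size_t lock_ops;         /* number of lock/unlock pairs taken */
    size_t max_held_buckets; /* most buckets walked in one critical section */
};

/* Per-bucket locking: one lock/unlock pair for every bucket. */
static struct flush_cost flush_per_bucket(size_t nslots)
{
    struct flush_cost c = { 0, 0 };

    for (size_t i = 0; i < nslots; i++) {
        c.lock_ops++;            /* lock, detach chain, unlock */
        c.max_held_buckets = 1;  /* only one bucket per critical section */
    }
    return c;
}

/*
 * Per-lock locking ("jumping the lock slots"): take each lock once and
 * visit every bucket it covers before releasing it.
 * Assumption: bucket i is covered by lock (i % nlocks).
 */
static struct flush_cost flush_per_lock(size_t nslots, size_t nlocks)
{
    struct flush_cost c = { 0, 0 };

    if (nlocks == 0)
        return c;                /* invalid input, nothing to model */

    for (size_t l = 0; l < nlocks && l < nslots; l++) {
        size_t held = 0;

        c.lock_ops++;            /* one lock/unlock pair per lock */
        for (size_t i = l; i < nslots; i += nlocks)
            held++;              /* bucket i flushed while lock l is held */
        if (held > c.max_held_buckets)
            c.max_held_buckets = held;
    }
    return c;
}
```

Under these assumptions, flush_per_lock(1 << 22, 512) takes only 512
lock/unlock pairs but walks 8192 buckets per critical section, and
flush_per_lock(nslots, 1) walks the entire table in a single one. The
per-bucket variant never holds more than one bucket at a time, at the price
of one lock/unlock pair per slot.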

Please try net-2.6.25 and submit patches on top of it if necessary, since
rt_run_flush() has pending changes that are not in net-2.6.

Note: The 'soft lockup' can be avoided by other means.
