[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <47EBEDC9.6080100@cosmosbay.com>
Date: Thu, 27 Mar 2008 19:56:09 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: Denys Fedoryshchenko <denys@...p.net.lb>
Cc: David Miller <davem@...emloft.net>, kaber@...sh.net,
netdev@...r.kernel.org, netfilter-devel@...r.kernel.org
Subject: Re: kernel 2.6.25-rc7 highly unstable on high load
Denys Fedoryshchenko a écrit :
> Here is output from dmesg with patch you supplied.
>
> Kup /config # rtstat -i60 -c60
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|
> entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
> out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
> out_hlis|
> | | tot| mc| ute| | an_dst|
> an_src| | _tot| _mc| | ed| miss| verflow|
> _search|t_search|
> 103266| 69986| 11574| 0| 54| 0| 0| 0|
> 1982| 634| 0| 10984| 10980| 0| 0| 191808| 5254|
> 124787| 45125| 6019| 0| 28| 0| 0| 0|
> 807| 230| 0| 6277| 6274| 0| 0| 128922| 2518|
> 120270| 45588| 6288| 0| 30| 0| 0| 0|
> 883| 214| 0| 6532| 6529| 0| 0| 125651| 2743|
> 122253| 46522| 6582| 0| 27| 0| 0| 0|
> 897| 213| 0| 6822| 6819| 0| 0| 124927| 2761|
>
> [ 102.534363] dst_total: 120397 delayed: 12 work_perf: 0 expires: 27999
> elapsed: 1 us
> [ 130.530240] dst_total: 124277 delayed: 12 work_perf: 0 expires: 32998
> elapsed: 2 us
> [ 163.523240] dst_total: 110006 delayed: 12 work_perf: 0 expires: 39000
> elapsed: 1 us
> [ 202.519402] dst_total: 130453 delayed: 12 work_perf: 0 expires: 45998
> elapsed: 1 us
> [ 248.511220] dst_total: 110637 delayed: 12 work_perf: 0 expires: 52600
> elapsed: 2 us
> [ 301.102445] dst_total: 129366 delayed: 12 work_perf: 0 expires: 60696
> elapsed: 6 us
>
> After while
>
> Kup /config # rtstat -i300 -c60
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|
> entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
> out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
> out_hlis|
> | | tot| mc| ute| | an_dst|
> an_src| | _tot| _mc| | ed| miss| verflow|
> _search|t_search|
> 129138| 64166| 9864| 0| 46| 0| 0| 0|
> 1334| 358| 0| 10012| 10008| 0| 0| 173006| 3985|
> 130125| 43971| 10829| 0| 70| 0| 0| 0|
> 907| 195| 0| 11093| 11090| 0| 0| 100601| 2848|
> 144857| 32041| 22233| 0| 234| 0| 0| 0|
> 738| 261| 0| 22727| 22724| 2| 0| 42769| 811|
>
> [ 687.740779] dst_total: 129365 delayed: 12 work_perf: 0 expires: 104999
> elapsed: 1 us
> [ 792.736043] dst_total: 127247 delayed: 101589 work_perf: 0 expires: 600
> elapsed: 11274 us
> [ 793.347127] dst_total: 122440 delayed: 101589 work_perf: 0 expires: 1600
> elapsed: 11150 us
> [ 794.957419] dst_total: 123780 delayed: 101589 work_perf: 0 expires: 3100
> elapsed: 10896 us
> [ 798.068614] dst_total: 121992 delayed: 101589 work_perf: 0 expires: 5655
> elapsed: 11561 us
> [ 803.734450] dst_total: 120552 delayed: 101589 work_perf: 0 expires: 7988
> elapsed: 12254 us
> [ 811.733227] dst_total: 121105 delayed: 101589 work_perf: 0 expires: 10988
> elapsed: 12209 us
> [ 822.733196] dst_total: 120142 delayed: 101589 work_perf: 0 expires: 13987
> elapsed: 12039 us
> [ 836.730637] dst_total: 120823 delayed: 101589 work_perf: 0 expires: 17987
> elapsed: 12448 us
> [ 854.726913] dst_total: 127478 delayed: 101589 work_perf: 0 expires: 22988
> elapsed: 11435 us
> [ 877.724827] dst_total: 127497 delayed: 101589 work_perf: 0 expires: 27987
> elapsed: 12833 us
> [ 905.719818] dst_total: 126693 delayed: 101589 work_perf: 0 expires: 32987
> elapsed: 12068 us
> [ 938.714595] dst_total: 130740 delayed: 101589 work_perf: 0 expires: 38988
> elapsed: 11970 us
> [ 977.707568] dst_total: 135449 delayed: 101589 work_perf: 0 expires: 45989
> elapsed: 11747 us
> [ 1023.701554] dst_total: 141181 delayed: 101589 work_perf: 0 expires: 52988
> elapsed: 12653 us
> [ 1076.694012] dst_total: 147895 delayed: 101588 work_perf: 1 expires: 59987
> elapsed: 12217 us
> [ 1136.684805] dst_total: 154798 delayed: 101588 work_perf: 0 expires: 67987
> elapsed: 12026 us
> [ 1204.673424] dst_total: 163376 delayed: 101588 work_perf: 0 expires: 76988
> elapsed: 11151 us
>
> and after 30 minutes more
>
> Kup /config # rtstat -i300 -c60
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|
> entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
> out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
> out_hlis|
> | | tot| mc| ute| | an_dst|
> an_src| | _tot| _mc| | ed| miss| verflow|
> _search|t_search|
> 194841| 176962| 77964| 0| 684| 1| 0| 0|
> 3939| 1143| 0| 79531| 79516| 7| 0| 401258| 9455|
> 223224| 28262| 26499| 0| 236| 0| 0| 0|
> 661| 216| 0| 26951| 26947| 3| 0| 93059| 1946|
> 248425| 28257| 26784| 0| 226| 0| 0| 0|
> 678| 202| 0| 27211| 27208| 3| 0| 115593| 2412|
> 272823| 28943| 26728| 0| 246| 0| 0| 0|
> 695| 200| 0| 27174| 27171| 3| 0| 135256| 2942|
> 295087| 28835| 26701| 0| 252| 0| 0| 0|
> 687| 194| 0| 27146| 27143| 3| 0| 154269| 3312|
>
>
> [ 2043.547879] dst_total: 242922 delayed: 101588 work_perf: 0 expires: 119987
> elapsed: 12519 us
> [ 2163.529997] dst_total: 252951 delayed: 101588 work_perf: 0 expires: 119987
> elapsed: 13930 us
> [ 2283.512215] dst_total: 262678 delayed: 101588 work_perf: 0 expires: 119986
> elapsed: 14330 us
> [ 2403.508447] dst_total: 271930 delayed: 101588 work_perf: 0 expires: 119972
> elapsed: 15230 us
> [ 2523.476844] dst_total: 288046 delayed: 101588 work_perf: 0 expires: 119985
> elapsed: 15018 us
> [ 2643.456996] dst_total: 296475 delayed: 101588 work_perf: 0 expires: 119987
> elapsed: 13654 us
> [ 2763.438267] dst_total: 305033 delayed: 101588 work_perf: 0 expires: 119987
> elapsed: 13137 us
>
>
Very interesting... you can see 101588 dst are *delayed* in dst_garbage,
but apparently never freed.
Something is wrong, but the count seems stable. Must be some kind of
event, admin driven or something...
might be the "ip route flush cache" that is schedlued around 600 seconds
of machine alive, then secret_interval seconds later...
Typically, when a "ip route flush cache" is done (manually or triggered
by secret_interval timer), refcounted>0 entries are put into dst_garbage.
Then when some trafic occurs on the flows involved, IP stack should
decrement refcount so that next dst_garbage round can free the deleted
entries. Normal TCP connections are doing this correctly. On your
machine, nothing.
Possibly idle tcp sessions ?
Maybe some netfilter problem ?
Please tell us more about your machine ;)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists