Message-Id: <DCDBA54C-C35B-497D-BB39-224C88B94660@joelfernandes.org>
Date: Thu, 17 Nov 2022 16:58:26 -0500
From: Joel Fernandes <joel@...lfernandes.org>
To: Eric Dumazet <edumazet@...gle.com>
Cc: linux-kernel@...r.kernel.org, Cong Wang <xiyou.wangcong@...il.com>,
David Ahern <dsahern@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
Jakub Kicinski <kuba@...nel.org>,
Jamal Hadi Salim <jhs@...atatu.com>,
Jiri Pirko <jiri@...nulli.us>, netdev@...r.kernel.org,
Paolo Abeni <pabeni@...hat.com>, rcu@...r.kernel.org,
rostedt@...dmis.org, paulmck@...nel.org, fweisbec@...il.com
Subject: Re: [PATCH rcu/dev 1/3] net: Use call_rcu_flush() for qdisc_free_cb
> On Nov 17, 2022, at 4:44 PM, Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
> <joel@...lfernandes.org> wrote:
>>
>> In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
>> causes a networking test to fail in the teardown phase.
>>
>> The failure happens during: ip netns del <name>
>>
>> Using ftrace, I found the callbacks it was queuing which this series fixes. Use
>> call_rcu_flush() to revert to the old behavior. With that, the test passes.
>>
>> Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
>> ---
>> net/sched/sch_generic.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>> index a9aadc4e6858..63fbf640d3b2 100644
>> --- a/net/sched/sch_generic.c
>> +++ b/net/sched/sch_generic.c
>> @@ -1067,7 +1067,7 @@ static void qdisc_destroy(struct Qdisc *qdisc)
>>
>> trace_qdisc_destroy(qdisc);
>>
>> - call_rcu(&qdisc->rcu, qdisc_free_cb);
>> + call_rcu_flush(&qdisc->rcu, qdisc_free_cb);
>> }
>
> I took a look at this one.
>
> qdisc_free_cb() is essentially freeing : Some per-cpu memory, and the
> 'struct Qdisc'
>
> I do not see why we need to force a flush for this (small ?) piece of memory.
I’ll drop that patch, rerun the test, and get back to you. It could be that this flush() is compensating for a different callback. I am pretty sure that at one point, dropping this patch made the test fail most of the time. Now it passes 100% of the time.
I’ll also try to collect a complete trace; maybe I’ll learn some networking code in the process.
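To make the "compensating for a different callback" idea concrete, here is a toy user-space sketch. It is not kernel code: every toy_* name is made up for illustration, the grace period is left out entirely, and it simply assumes that a flushing enqueue also pushes out whatever was lazily queued before it. Under that assumption, a flush added at one call site can hide a delay that really belongs to an earlier, unrelated callback.

```c
#include <assert.h>
#include <stddef.h>

/* Toy user-space model; none of these names exist in the kernel. */
struct toy_cb {
	struct toy_cb *next;
	void (*func)(struct toy_cb *cb);
};

struct toy_queue {
	struct toy_cb *head;   /* oldest pending callback */
	struct toy_cb **tail;  /* where the next callback gets linked */
};

static int toy_ran;        /* number of callbacks invoked so far */

/* Example callback: just counts invocations. */
static void toy_count_cb(struct toy_cb *cb)
{
	(void)cb;
	toy_ran++;
}

static void toy_queue_init(struct toy_queue *q)
{
	q->head = NULL;
	q->tail = &q->head;
}

/* Lazy variant: only queue the callback; something else flushes later. */
static void toy_call_lazy(struct toy_queue *q, struct toy_cb *cb,
			  void (*func)(struct toy_cb *cb))
{
	assert(q && cb && func);
	cb->func = func;
	cb->next = NULL;
	*q->tail = cb;
	q->tail = &cb->next;
}

/* Invoke everything pending, oldest first (grace period elided). */
static void toy_flush(struct toy_queue *q)
{
	struct toy_cb *cb = q->head;

	toy_queue_init(q);
	while (cb) {
		struct toy_cb *next = cb->next; /* func may free cb */

		cb->func(cb);
		cb = next;
	}
}

/* Flush variant: queue, then push out the whole backlog right away. */
static void toy_call_flush(struct toy_queue *q, struct toy_cb *cb,
			   void (*func)(struct toy_cb *cb))
{
	toy_call_lazy(q, cb, func);
	toy_flush(q);
}
```

In this model, a callback queued with toy_call_lazy() stays pending until some later toy_call_flush() on the same queue runs it together with the new one, which is the kind of side effect a complete trace should be able to confirm or rule out.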
Thanks!