Message-ID: <6b4db767-3fbd-66df-79c4-f0d78b27b9ee@gmail.com>
Date: Fri, 22 Jul 2022 04:01:13 +0900
From: Taehee Yoo <ap420073@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: David Miller <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
David Ahern <dsahern@...nel.org>,
netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH net] net: mld: do not use system_wq in the mld
On 7/22/22 03:34, Eric Dumazet wrote:
> On Thu, Jul 21, 2022 at 7:53 PM Taehee Yoo <ap420073@...il.com> wrote:
>>
>> Hi Eric,
>> Thank you so much for your review!
>>
>
> ...
>
>> I think your assumption is right.
>> I tested the scenario below, which triggers the real issue.
>> THREAD0                            THREAD1
>> mld_report_work()
>>                                    spin_lock_bh()
>>                                    if (!mod_delayed_work()) <-- queued
>>                                            in6_dev_hold();
>>                                    spin_unlock_bh()
>> spin_lock_bh()
>> schedule_delayed_work() <-- return false, already queued by THREAD1
>> spin_unlock_bh()
>> return;
>> //no in6_dev_put() regardless of the return value of schedule_delayed_work().
>>
>> In order to check, I added a printk like below.
>>         if (++cnt >= MLD_MAX_QUEUE) {
>>                 rework = true;
>>                 if (!schedule_delayed_work(&idev->mc_report_work, 0))
>>                         printk("[TEST]%s %u \n", __func__, __LINE__);
>>                 break;
>>         }
>>
>> If the TEST log message is printed, the work was already queued by other
>> logic, so a reference count has been leaked.
>> The result is that I see the log message only when the reference count
>> leak occurs.
>> So, although I tested it for only 1 hour, I'm sure that this bug comes
>> from not checking the return value of schedule_delayed_work().
>>
>> As you said, this changelog is not correct.
>> system_wq and mld_wq are not related to this issue.
>>
>> I would like to send a v2 patch after some more tests.
>> The v2 patch will change the commit message.
>
> Can you describe what kind of tests you are running ?
> Was it a syzbot report ?
I found this bug while testing another syzbot report.
(https://syzkaller.appspot.com/bug?id=ed41eaa4367b421d37aab5dee25e3f4c91ceae93)
I couldn't find the same case in the syzbot reports list.
I just run the commands below with many kernel debug options enabled,
such as kmemleak, KASAN, lockdep, and others.
a=ns-$RANDOM
b=ns-$RANDOM
ip netns add $a
ip netns add $b
echo 'file net/ipv6/addrconf.c +p' > /sys/kernel/debug/dynamic_debug/control
echo 'file net/ipv6/addrconf_core.c +p' > /sys/kernel/debug/dynamic_debug/control
echo 'file net/ipv6/mcast.c +p' > /sys/kernel/debug/dynamic_debug/control
ip netns exec $a sysctl net.ipv6.mld_max_msf=8024 -w
ip netns exec $b sysctl net.ipv6.mld_max_msf=8024 -w
ip netns exec $a ip link add br0 type bridge mcast_querier 1 mcast_query_interval 100 mcast_mld_version 1
ip netns exec $b ip link add br1 type bridge mcast_querier 1 mcast_query_interval 100 mcast_mld_version 1
ip netns exec $a ip link set br0 up
ip netns exec $b ip link set br1 up
for i in {0..5}
do
	ip netns exec $a ip link add vetha$i type veth peer name vethb$i
	ip netns exec $a ip link set vethb$i netns $b
	ip netns exec $a ip link set vetha$i master br0
	ip netns exec $a ip link set vetha$i up
	ip netns exec $b ip link set vethb$i master br1
	ip netns exec $b ip link set vethb$i up
done
sleep 10
ip netns del $a
ip netns del $b
This script is not a real use case.
It just generates *looping* packets, many of which are MLD packets
because of the bridge query options.
I think a packet generator would be more useful to reproduce this bug.
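
For illustration, the interleaving from the quoted scenario can be replayed
in a small userspace C model. This is only a sketch under assumptions: all
model_* names are hypothetical, the pending flag and the counter merely
stand in for the delayed work state and the inet6_dev reference count, and
the check_return branch shows one possible way to balance the count, not
necessarily what the v2 patch will do.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_dev {
	int refcnt;        /* stands in for the inet6_dev reference count */
	bool work_pending; /* stands in for the delayed work's pending state */
};

static void model_hold(struct model_dev *d) { d->refcnt++; }
static void model_put(struct model_dev *d)  { d->refcnt--; }

/* Like schedule_delayed_work(): returns false if the work was already pending. */
static bool model_schedule(struct model_dev *d)
{
	if (d->work_pending)
		return false;
	d->work_pending = true;
	return true;
}

/* THREAD1: take a reference only when the work is newly queued. */
static void model_queue_from_other_path(struct model_dev *d)
{
	if (model_schedule(d))
		model_hold(d);
}

/*
 * THREAD0: the running handler hits the rework limit and tries to requeue.
 * If requeueing succeeds, the handler's reference is handed to the next run.
 * If it fails, the next run already has its own reference, so the handler's
 * reference is only dropped when check_return is true.
 */
static void model_work_rework(struct model_dev *d, bool check_return)
{
	bool requeued = model_schedule(d);

	if (!requeued && check_return)
		model_put(d);
}

/* The queued work eventually runs to completion and drops its reference. */
static void model_work_finish(struct model_dev *d)
{
	d->work_pending = false;
	model_put(d);
}

/* Replays the interleaving and returns the number of leaked references. */
static int model_replay(bool check_return)
{
	/* refcnt 1: the reference owned by the handler running in THREAD0. */
	struct model_dev d = { .refcnt = 1, .work_pending = false };

	model_queue_from_other_path(&d);     /* THREAD1 queues and holds */
	model_work_rework(&d, check_return); /* THREAD0 fails to requeue */
	model_work_finish(&d);               /* run queued by THREAD1 */
	return d.refcnt;
}
```

In this model, model_replay(false) ends with one reference still held,
which corresponds to the leak, while model_replay(true) ends at zero.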