netdev - Re: [PATCH net-next v3 7/7] selftests: net: fdb_notify: Add a test for FDB notifications


Message-ID: <871pzfjgc2.fsf@nvidia.com>
Date: Wed, 13 Nov 2024 16:11:03 +0100
From: Petr Machata <petrm@...dia.com>
To: Petr Machata <petrm@...dia.com>
CC: Jakub Kicinski <kuba@...nel.org>, <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next v3 7/7] selftests: net: fdb_notify: Add a test
 for FDB notifications


Petr Machata <petrm@...dia.com> writes:

> Jakub Kicinski <kuba@...nel.org> writes:
>
>> On Mon, 11 Nov 2024 18:09:01 +0100 Petr Machata wrote:
>>> Check that only one notification is produced for various FDB edit
>>> operations.
>>> 
>>> Regarding the ip_link_add() and ip_link_master() helpers. This pattern of
>>> action plus corresponding defer is bound to come up often, and a dedicated
>>> vocabulary to capture it will be handy. tunnel_create() and vlan_create()
>>> from forwarding/lib.sh are somewhat opaque and perhaps too kitchen-sinky,
>>> so I tried to go in the opposite direction with these ones, and wrapped
>>> only the bare minimum to schedule a corresponding cleanup.
>>
>> Looks like it fails about half of the time :(
>>
>> https://netdev.bots.linux.dev/flakes.html?min-flip=0&tn-needle=fdb-notify&br-cnt=200
>
> OK, I can't reproduce this. Trying in a VM, on actual HW, debug, no
> debug, no luck. But I see basically two failures:
>
> - A "0 seen, 1 expected", which... I don't know, maybe it could just be
>   a misplaced sleep. I don't see how, but it's a deterministic
>   scenario, there shouldn't be anything racy here, either it emits or it
>   doesn't, so some buffering issue is the only thing I can think of.

I think this really could be just a "bridge monitor" taking a bit more
time to start every now and then. Can I have you test with this extra
chunk, or should I just resend with that change and hope for the best?

diff --git a/tools/testing/selftests/net/fdb_notify.sh b/tools/testing/selftests/net/fdb_notify.sh
index a98047361988..a8e04f08831c 100755
--- a/tools/testing/selftests/net/fdb_notify.sh
+++ b/tools/testing/selftests/net/fdb_notify.sh
@@ -26,6 +26,7 @@ do_test_dup()
 		bridge monitor fdb &> "$tmpf" &
 		defer kill_process $!
 
+		sleep 0.5
 		bridge fdb "$op" 00:11:22:33:44:55 vlan 1 "$@"
 		sleep 0.2
 	defer_scope_pop
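
For reference, the fixed "sleep 0.2" after the FDB operation could in
principle be replaced by polling the capture file. What follows is only a
minimal sketch and not part of the patch: wait_for_matches() is a
hypothetical helper, and the pattern, count and timeout arguments are
assumptions. It does not help with the monitor start-up delay that the
added "sleep 0.5" is meant to cover, because a notification emitted before
the monitor is listening never reaches the file at all.

```shell
#!/bin/bash
# Hypothetical helper, not part of fdb_notify.sh or lib.sh.
# Poll FILE until it holds at least WANT lines matching PATTERN,
# or give up after TIMEOUT_MS milliseconds.
# Returns 0 once the count is reached, 1 on timeout.
wait_for_matches()
{
	local file=$1; shift
	local pattern=$1; shift
	local want=$1; shift
	local timeout_ms=$1; shift

	# GNU date: %s%N is nanoseconds since the epoch.
	local start_ms=$(($(date +%s%N) / 1000000))
	local seen now_ms

	while true; do
		# grep -c prints the number of matching lines; the output is
		# empty if the file does not exist yet, hence the :-0 default.
		seen=$(grep -c -- "$pattern" "$file" 2>/dev/null)
		((${seen:-0} >= want)) && return 0

		now_ms=$(($(date +%s%N) / 1000000))
		((now_ms - start_ms >= timeout_ms)) && return 1
		sleep 0.05
	done
}
```

Something like wait_for_matches "$tmpf" 00:11:22:33:44:55 1 1000 would
then return as soon as one matching line shows up, and fail after a second
otherwise. Since the test is about catching duplicate notifications, a
short settle delay would probably still be needed afterwards, so this
shortens the common case rather than removing the sleep entirely.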

> - Deadlocks. E.g. this, which looks like it deadlocked and timed out

Eh, these are ancient. Never mind.