[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6be07cb6-7dab-4125-b9e5-0bd4c42235fe@redhat.com>
Date: Tue, 30 Sep 2025 09:21:30 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: David Wilder <wilder@...ibm.com>, netdev@...r.kernel.org
Cc: jv@...sburgh.net, pradeeps@...ux.vnet.ibm.com, pradeep@...ibm.com,
i.maximets@....org, amorenoz@...hat.com, haliu@...hat.com,
stephen@...workplumber.org, horms@...nel.org, kuba@...nel.org,
andrew+netdev@...n.ch, edumazet@...gle.com
Subject: Re: [PATCH net-next v11 0/7] bonding: Extend arp_ip_target format to
allow for a list of vlan tags.
On 9/30/25 3:27 AM, David Wilder wrote:
> The current implementation of the arp monitor builds a list of vlan-tags by
> following the chain of net_devices above the bond. See bond_verify_device_path().
> Unfortunately, with some configurations, this is not possible. One example is
> when an ovs switch is configured above the bond.
>
> This change extends the "arp_ip_target" parameter format to allow for a list of
> vlan tags to be included for each arp target. This new list of tags is optional
> and may be omitted to preserve the current format and process of discovering
> vlans.
>
> The new format for arp_ip_target is:
> arp_ip_target ipv4-address[vlan-tag\...],...
>
> For example:
> arp_ip_target 10.0.0.1[10/20]
> arp_ip_target 10.0.0.1[] (used to disable vlan discovery)
>
> Changes since V10
> Thanks Paolo:
> - 1/7 Changed the layout of struct bond_arp_target to reduce size of the struct.
> - 3/7 Fixed format 'size-num' -> 'size - num'
> - 7/7 Updated selftest (bond-arp-ip-target.sh). Removed sleep 10 in check_failure_count().
> Added call to tc to verify arp probes are reaching the target interface. Then I verify that
> the Link Failure counts are not increasing over "time". Arp probes are sent every 100ms,
> two missed probes will trigger a Link failure. A one second wait between checking counts
> should be be more than sufficient. This speeds up the execution of the test.
>
> Thanks Nikolay:
> - 4/7 In bond_option_arp_ip_targets_clear() I changed the definition of empty_target to empty_target = {}.
> - bond_validate_tags() now verifies input is a multiple of sizeof(struct bond_vlan_tag).
> Updated VID validity check to use: !tags->vlan_id || tags->vlan_id >= VLAN_VID_MASK) as suggested.
> - In bond_option_arp_ip_targets_set() removed the redundant length check of target.target_ip.
> - Added kfree(target.tags) when bond_option_arp_ip_target_add() results in an error.
> - Removed the caching of struct bond_vlan_tag returned by bond_verify_device_path(), Nikolay
> pointed out that caching tags prevented the detection of VLAN configuration changes.
> Added a kfree(tags) for tags allocated in bond_verify_device_path().
>
> Jay, Nikolay and I had a discussion regarding locking when adding, deleting or changing vlan tags.
> Jay pointed out that user supplied tags that are stashed in the bond configuration and can only be
> changed via user space this can be done safely in an RCU manner as netlink always operates with RTNL
> held. If user space provided tags and then replumbs things, it'll be on user space to update the tags
> in a safe manor.
>
> I was concerned about changing options on a configured bond, I found that attempting to change
> a bonds configuration (using "ip set") will abort the attempt to make a change if the bond's state is
> "UP" or has slaves configured. Therefor the configuration and operational side of a bond is separated.
> I agree with Jay that the existing locking scheme is sufficient.
>
> Change since V9
> Fix kdoc build error.
>
> Changes since V8:
> Moved the #define BOND_MAX_VLAN_TAGS from patch 6 to patch 3.
> Thanks Simon for catching the bisection break.
>
> Changes since V7:
> These changes should eliminate the CI failures I have been seeing.
> 1) patch 2, changed type of bond_opt_value.extra_len to size_t.
> 2) Patch 4, added bond_validate_tags() to validate the array of bond_vlan_tag provided by
> the user.
>
> Changes since V6:
> 1) I made a number of changes to fix the failure seen in the
> kernel CI. I am still unable to reproduce the this failure, hopefully I
> have fixed it. These change are in patch #4 to functions:
> bond_option_arp_ip_targets_clear() and
> bond_option_arp_ip_targets_set()
>
> Changes since V5: Only the last 2 patches have changed since V5.
> 1) Fixed sparse warning in bond_fill_info().
> 2) Also in bond_fill_info() I resolved data.addr uninitialized when if condition is not met.
> Thank you Simon for catching this. Note: The change is different that what I shared earlier.
> 3) Fixed shellcheck warnings in test script: Blocked source warning, Ignored specific unassigned
> references and exported ALL_TESTS to resolve a reference warning.
>
> Changes since V4:
> 1)Dropped changes to proc and sysfs APIs to bonding. These APIs
> do not need to be updated to support new functionality. Netlink
> and iproute2 have been updated to do the right thing, but the
> other APIs are more or less frozen in the past.
>
> 2)Jakub reported a warning triggered in bond_info_seq_show() during
> testing. I was unable to reproduce this warning or identify
> it with code inspection. However, all my changes to bond_info_seq_show()
> have been dropped as unnecessary (see above).
> Hopefully this will resolve the issue.
>
> 3)Selftest script has been updated based on the results of shellcheck.
> Two unresolved references that are not possible to resolve are all
> that remain.
>
> 4)A patch was added updating bond_info_fill()
> to support "ip -d show <bond-device>" command.
>
> The inclusion of a list of vlan tags is optional. The new logic
> preserves both forward and backward compatibility with the kernel
> and iproute2 versions.
>
> Changes since V3:
> 1) Moved the parsing of the extended arp_ip_target out of the kernel and into
> userspace (ip command). A separate patch to iproute2 to follow shortly.
> 2) Split up the patch set to make review easier.
>
> Please see iproute changes in a separate posting.
>
> Thank you for your time and reviews.
>
> Signed-off-by: David Wilder <wilder@...ibm.com>
Not a formal reviews, but this is apparently causing self tests failures:
[ 274.739699][ T2985] Oops: general protection fault, probably for
non-canonical address 0xe2b11fc1bf7efa00: 0000 [#1] SMP KASAN
[ 274.740206][ T2985] KASAN: maybe wild-memory-access in range
[0x15891e0dfbf7d000-0x15891e0dfbf7d007]
[ 274.740513][ T2985] CPU: 0 UID: 0 PID: 2985 Comm: ip Not tainted
6.17.0-rc7-virtme #1 PREEMPT(full)
[ 274.740820][ T2985] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 274.741034][ T2985] RIP: 0010:bond_fill_info+0x4e9/0x17c0
[ 274.741271][ T2985] Code: 5d 08 41 b9 08 00 00 00 48 8d ac 24 ee 00
00 00 45 31 f6 41 29 d9 4d 63 c6 49 83 f8 05 0f 87 98 0f 00 00 48 89 d8
48 c1 e8 03 <42> 0f b6 14 20 48 89 d8 83 e0 07 83 c0 01 38 d0 7c 08 84
d2 0f 85
[ 274.741933][ T2985] RSP: 0018:ffffc900056f6db8 EFLAGS: 00010206
[ 274.742160][ T2985] RAX: 02b123c1bf7efa00 RBX: 15891e0dfbf7d000 RCX:
0000000000000000
[ 274.742475][ T2985] RDX: 0000000000000004 RSI: 0000000000000000 RDI:
ffff88800aaa2ef0
[ 274.742752][ T2985] RBP: ffffc900056f6ea6 R08: 0000000000000000 R09:
0000000004083008
[ 274.743027][ T2985] R10: 0000000000000000 R11: ffffffffb6d6ff00 R12:
dffffc0000000000
[ 274.743302][ T2985] R13: ffff88800aaa2ee8 R14: 0000000000000000 R15:
0000000000000000
[ 274.743572][ T2985] FS: 00007fdb2f61a800(0000)
GS:ffff8880783ac000(0000) knlGS:0000000000000000
[ 274.743871][ T2985] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 274.744092][ T2985] CR2: 00000000004e5a38 CR3: 0000000008c5c006 CR4:
0000000000772ef0
[ 274.744381][ T2985] PKRU: 55555554
[ 274.744515][ T2985] Call Trace:
[ 274.744645][ T2985] <TASK>
[ 274.745942][ T2985] rtnl_link_fill+0x22e/0x890[ 274.739699][ T2985]
full oops:
https://netdev-3.bots.linux.dev/vmksft-bonding-dbg/results/320021/1-bond-options-sh/stderr
more:
https://netdev-3.bots.linux.dev/vmksft-bonding-dbg/results/320021/8-bond-arp-interval-causes-panic-sh/stderr
https://netdev-3.bots.linux.dev/vmksft-bonding-dbg/results/320021/15-bond-arp-ip-target-sh/stderr
Also:
## Form letter - net-next-closed
The merge window for v6.18 has begun and therefore net-next is closed
for new drivers, features, code refactoring and optimizations. We are
currently accepting bug fixes only.
Please repost when net-next reopens after June 8th.
RFC patches sent for review only are obviously welcome at any time.
Powered by blists - more mailing lists