[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <YX/FlQKJNea3c4/B@Laptop-X1>
Date: Mon, 1 Nov 2021 18:46:45 +0800
From: Hangbin Liu <liuhangbin@...il.com>
To: Jay Vosburgh <jay.vosburgh@...onical.com>
Cc: netdev@...r.kernel.org, Veaceslav Falico <vfalico@...il.com>,
Andy Gospodarek <andy@...yhouse.net>,
Jiri Pirko <jiri@...nulli.us>,
Jonathan Toppins <jtoppins@...hat.com>
Subject: Re: [Draft PATCH net-next] Bonding: add missed_max option
Hi Jay, Denis,
Thanks for the comments. Please see my replies below.
On Fri, Oct 29, 2021 at 03:11:18PM -0700, Jay Vosburgh wrote:
> >+missed_max
> >+
> >+ Maximum number of arp_interval for missed ARP replies.
> >+ If this number is exceeded, link is reported as down.
> >+
> >+ Note a small value means a strict time. e.g. missed_max is 1 means
> >+ the correct arp reply must be recived during the interval.
> >+
> >+ default 3
>
> I'd suggest "arp" in the option name to make the scope more
> obvious.
I didn't add arp prefix in purpose because I'd like to re-use this parameter
after adding bonding IPv6 NS/NA support. I will add this reason in the commit
description.
Or if you like to make a difference for ARP and IPv6 NS, I can add the arp_
prefix.
> >@@ -3224,7 +3224,7 @@ static int bond_ab_arp_inspect(struct bonding *bond)
> >
> > /* Backup slave is down if:
> > * - No current_arp_slave AND
> >- * - more than 3*delta since last receive AND
> >+ * - more than missed_max*delta since last receive AND
> > * - the bond has an IP address
> > *
> > * Note: a non-null current_arp_slave indicates
> >@@ -3236,20 +3236,20 @@ static int bond_ab_arp_inspect(struct bonding *bond)
> > */
> > if (!bond_is_active_slave(slave) &&
> > !rcu_access_pointer(bond->current_arp_slave) &&
> >- !bond_time_in_interval(bond, last_rx, 3)) {
> >+ !bond_time_in_interval(bond, last_rx, bond->params.missed_max)) {
> > bond_propose_link_state(slave, BOND_LINK_DOWN);
> > commit++;
> > }
> >
> > /* Active slave is down if:
> >- * - more than 2*delta since transmitting OR
> >- * - (more than 2*delta since receive AND
> >+ * - more than missed_max*delta since transmitting OR
> >+ * - (more than missed_max*delta since receive AND
> > * the bond has an IP address)
> > */
> > trans_start = dev_trans_start(slave->dev);
> > if (bond_is_active_slave(slave) &&
> >- (!bond_time_in_interval(bond, trans_start, 2) ||
> >- !bond_time_in_interval(bond, last_rx, 2))) {
> >+ (!bond_time_in_interval(bond, trans_start, bond->params.missed_max) ||
> >+ !bond_time_in_interval(bond, last_rx, bond->params.missed_max))) {
> > bond_propose_link_state(slave, BOND_LINK_DOWN);
> > commit++;
> > }
>
> The above two changes make the backup and active logic both
> switch to using the missed_max value (i.e., both set to the same value),
> when previously these two cases used differing values (2 for active, 3
> for backup).
>
> Historically, these intervals were staggered deliberately; an
> old comment removed by b2220cad583c9b states:
>
> if ((slave != bond->curr_active_slave) &&
> (!bond->current_arp_slave) &&
> (time_after_eq(jiffies, slave_last_rx(bond, slave) + 3*delta_in_ticks))) {
> /* a backup slave has gone down; three times
> * the delta allows the current slave to be
> * taken out before the backup slave.
>
> I think it would be prudent to insure that having the active and
> backup timeouts set in lockstep does not result in an undesirable change
> of behavior.
Yes, I'm also a little concern about this. What about make the backup slave
timeout 1 plus delta then active slave. i.e. for backup slave
bond_time_in_interval(bond, last_rx, bond->params.missed_max + 1)
> >@@ -270,6 +272,15 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = {
> > .values = bond_intmax_tbl,
> > .set = bond_option_arp_interval_set
> > },
> >+ [BOND_OPT_MISSED_MAX] = {
> >+ .id = BOND_OPT_MISSED_MAX,
> >+ .name = "missed_max",
> >+ .desc = "Maximum number of missed ARP interval",
> >+ .unsuppmodes = BIT(BOND_MODE_8023AD) | BIT(BOND_MODE_TLB) |
> >+ BIT(BOND_MODE_ALB),
> >+ .values = bond_intmax_tbl,
>
> This allows missed_max to be set to 0; is that intended to be a
> valid setting?
In bond_time_in_interval() there is an arp_interval/2 extra time. Do you think
if this is enough for fast network when we set missed_max to 0?
Of course in the very fast network the value should be at lease 1 in case
the ARP send/recv time frame is same.
Thanks
Hangbin
Powered by blists - more mailing lists