lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sat, 17 Feb 2024 12:39:44 +0000
From: Praveen Kannoju <praveen.kannoju@...cle.com>
To: Hangbin Liu <liuhangbin@...il.com>
CC: "j.vosburgh@...il.com" <j.vosburgh@...il.com>,
        "andy@...yhouse.net"
	<andy@...yhouse.net>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "edumazet@...gle.com" <edumazet@...gle.com>,
        "kuba@...nel.org"
	<kuba@...nel.org>,
        "pabeni@...hat.com" <pabeni@...hat.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Rajesh
 Sivaramasubramaniom <rajesh.sivaramasubramaniom@...cle.com>,
        Rama
 Nichanamatlu <rama.nichanamatlu@...cle.com>,
        Manjunath Patil
	<manjunath.b.patil@...cle.com>
Subject: RE: [PATCH RFC] bonding: rate-limit bonding driver inspect messages

> -----Original Message-----
> From: Hangbin Liu <liuhangbin@...il.com>
> Sent: 16 February 2024 02:33 PM
> To: Praveen Kannoju <praveen.kannoju@...cle.com>
> Cc: j.vosburgh@...il.com; andy@...yhouse.net; davem@...emloft.net; edumazet@...gle.com; kuba@...nel.org;
> pabeni@...hat.com; netdev@...r.kernel.org; linux-kernel@...r.kernel.org; Rajesh Sivaramasubramaniom
> <rajesh.sivaramasubramaniom@...cle.com>; Rama Nichanamatlu <rama.nichanamatlu@...cle.com>; Manjunath Patil
> <manjunath.b.patil@...cle.com>
> Subject: Re: [PATCH RFC] bonding: rate-limit bonding driver inspect messages
> 
> On Thu, Feb 15, 2024 at 10:55:54PM +0530, Praveen Kumar Kannoju wrote:
> > Rate limit bond driver log messages, to prevent a log flood in a
> > run-away situation, e.g couldn't get rtnl lock. Message flood leads to
> > instability of system and loss of other crucial messages.
> 
> Hi Praveen,
> 
> The patch looks good to me. But would you please help explain why these
> slave_info() are chosen under net_ratelimit?
> 
> Thanks
> Hangbin

Thank you, Hangbin.

The routine bond_mii_monitor() periodically inspects the slave carrier state in order to detect for state changes, on a state change internally records it and does the state change action.

Parked-to-Parked state changes goes through transient state. As an example for Up to Down, BOND_LINK_UP to BOND_LINK_DOWN, is thru BOND_LINK_FAIL.  In order to attain next parked state or transient state bond needs rtnl mutex If in a situation it cannot get it, a state change wouldn't happen.  In order to achieve a state change as quickly as possible  bond_mii_monitor() reschedules itself to come around after 1 msec. And every single come around reinspects the link and sees a state change compared to its internally recorded, which in reality internal state could be not changed earlier as failed to get rtnl lock, and throws again log indicating it sees a state change If attaining rtnl mutex take long say hypothetical 5 secs, then bond logs 5000 state change message. 1 message at every 1 msec. 

And in production environments we have seen bond taking long to achieve a state as someone else holding rtnl. Many processes do get rtnl lock.  As an example we can see eth drivers. They hold rtnl mutex for the entire duration while performing a fault recovery. There are many such scenarios.

This patch doesn't change -how- bond functions. It only simply limits this kind of log flood. 

-
Praveen.
> >
> > v2: Use exising net_ratelimit() instead of introducing new rate-limit
> > parameter.
> >
> > Signed-off-by: Praveen Kumar Kannoju <praveen.kannoju@...cle.com>
> > ---
> >  drivers/net/bonding/bond_main.c | 36
> > ++++++++++++++++++++----------------
> >  1 file changed, 20 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/net/bonding/bond_main.c
> > b/drivers/net/bonding/bond_main.c index 4e0600c..e92eba1 100644
> > --- a/drivers/net/bonding/bond_main.c
> > +++ b/drivers/net/bonding/bond_main.c
> > @@ -2610,12 +2610,13 @@ static int bond_miimon_inspect(struct bonding *bond)
> >  			commit++;
> >  			slave->delay = bond->params.downdelay;
> >  			if (slave->delay) {
> > -				slave_info(bond->dev, slave->dev, "link status down for %sinterface, disabling it in %d ms\n",
> > -					   (BOND_MODE(bond) ==
> > -					    BOND_MODE_ACTIVEBACKUP) ?
> > -					    (bond_is_active_slave(slave) ?
> > -					     "active " : "backup ") : "",
> > -					   bond->params.downdelay * bond->params.miimon);
> > +				if (net_ratelimit())
> > +					slave_info(bond->dev, slave->dev, "link status down for %sinterface, disabling it in %d ms\n",
> > +						   (BOND_MODE(bond) ==
> > +						   BOND_MODE_ACTIVEBACKUP) ?
> > +						   (bond_is_active_slave(slave) ?
> > +						   "active " : "backup ") : "",
> > +						   bond->params.downdelay * bond->params.miimon);
> >  			}
> >  			fallthrough;
> >  		case BOND_LINK_FAIL:
> > @@ -2623,9 +2624,10 @@ static int bond_miimon_inspect(struct bonding *bond)
> >  				/* recovered before downdelay expired */
> >  				bond_propose_link_state(slave, BOND_LINK_UP);
> >  				slave->last_link_up = jiffies;
> > -				slave_info(bond->dev, slave->dev, "link status up again after %d ms\n",
> > -					   (bond->params.downdelay - slave->delay) *
> > -					   bond->params.miimon);
> > +				if (net_ratelimit())
> > +					slave_info(bond->dev, slave->dev, "link status up again after %d ms\n",
> > +						   (bond->params.downdelay - slave->delay) *
> > +						   bond->params.miimon);
> >  				commit++;
> >  				continue;
> >  			}
> > @@ -2648,18 +2650,20 @@ static int bond_miimon_inspect(struct bonding *bond)
> >  			slave->delay = bond->params.updelay;
> >
> >  			if (slave->delay) {
> > -				slave_info(bond->dev, slave->dev, "link status up, enabling it in %d ms\n",
> > -					   ignore_updelay ? 0 :
> > -					   bond->params.updelay *
> > -					   bond->params.miimon);
> > +				if (net_ratelimit())
> > +					slave_info(bond->dev, slave->dev, "link status up, enabling it in %d ms\n",
> > +						   ignore_updelay ? 0 :
> > +						   bond->params.updelay *
> > +						   bond->params.miimon);
> >  			}
> >  			fallthrough;
> >  		case BOND_LINK_BACK:
> >  			if (!link_state) {
> >  				bond_propose_link_state(slave, BOND_LINK_DOWN);
> > -				slave_info(bond->dev, slave->dev, "link status down again after %d ms\n",
> > -					   (bond->params.updelay - slave->delay) *
> > -					   bond->params.miimon);
> > +				if (net_ratelimit())
> > +					slave_info(bond->dev, slave->dev, "link status down again after %d ms\n",
> > +						   (bond->params.updelay - slave->delay) *
> > +						   bond->params.miimon);
> >  				commit++;
> >  				continue;
> >  			}
> > --
> > 1.8.3.1
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ