Date:   Thu, 26 Jan 2023 14:22:54 +0100
From:   Andrew Lunn <andrew@...n.ch>
To:     David Laight <David.Laight@...lab.com>
Cc:     'Breno Leitao' <leitao@...ian.org>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "leit@...com" <leit@...com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "edumazet@...gle.com" <edumazet@...gle.com>,
        "pabeni@...hat.com" <pabeni@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Michael van der Westhuizen <rmikey@...a.com>
Subject: Re: [PATCH v3] netpoll: Remove 4s sleep during carrier detection

On Thu, Jan 26, 2023 at 09:04:42AM +0000, David Laight wrote:
> From: Breno Leitao
> > Sent: 25 January 2023 18:53
> > This patch removes the msleep(4s) during netpoll_setup() if the carrier
> > appears instantly.
> > 
> > Here are some scenarios where this workaround is counter-productive in
> > modern ages:
> > 
> > Servers which have BMC communicating over NC-SI via the same NIC as gets
> > used for netconsole. BMC will keep the PHY up, hence the carrier
> > appearing instantly.
> > 
> > The link is fibre, SERDES getting sync could happen within 0.1Hz, and
> > the carrier also appears instantly.
> > 
> > Other than that, if a driver is reporting instant carrier and then
> > losing it, this is probably a driver bug.
> 
> I can't help feeling that this will break something.
> The 4 second delay does look counter productive though.
> Obvious alternatives are 'wait a bit before the first check'
> and 'require carrier to be present for a few checks'.

I'm guessing, but I think the issue is that the MAC reports the
carrier is up, even though autoneg has not completed, and so packets
are getting dropped. Autoneg takes around 1.5 seconds, so you need to
wait this long before starting to send to prevent packets landing in
the bit bucket. And I guess polling as you suggest does not help,
since it never returns the true status.
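
To make the 'require carrier to be present for a few checks' idea
concrete, here is a minimal user-space sketch in C. It is not netpoll
code: first_stable_poll() and the sample arrays are hypothetical, and
each sample just stands for one poll of the carrier state at a fixed
interval.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not a kernel API. Given carrier samples taken at
 * a fixed polling interval, return the index of the poll at which the
 * carrier has been up for `required` consecutive polls, or -1 if that
 * never happens within the `n` samples.
 */
int first_stable_poll(const bool *samples, size_t n, unsigned int required)
{
	unsigned int run = 0;

	assert(required > 0);
	for (size_t i = 0; i < n; i++) {
		run = samples[i] ? run + 1 : 0;	/* any drop resets the count */
		if (run >= required)
			return (int)i;
	}
	return -1;
}
```

A flapping link such as {up, down, up, up, up} with required = 3 is
only accepted at poll index 4. A MAC that reports carrier up from the
very first poll is accepted at index required - 1 no matter what
autoneg is doing, which is why polling alone would not catch the case
guessed at above.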

But this is pure guesswork. Maybe some mailing list archaeology can
help explain this code.

I guess the likely breaking scenario is simply that the first 1.5
seconds of the kernel log goes to the bit bucket for broken
MACs. Which is not fatal, just annoying for somebody trying to debug a
crash in the first few seconds. I suppose DHCP might also take longer
for broken MACs, since its first requests also get lost, and it might
get into exponential back off.
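
A rough way to put a number on the DHCP case, as a sketch only: RFC 2131
suggests a first retransmission delay of 4 seconds, doubling up to 64
seconds (plus randomization, which is ignored here). Whether a given
client, or the kernel's own autoconfiguration, uses those values is an
assumption, and first_delivered_attempt() is a hypothetical helper.

```c
#include <assert.h>

/*
 * Hypothetical helper. A sender transmits at t = 0, then retries after
 * first_delay_s, doubling the delay each time up to max_delay_s.
 * Return the time of the first transmission made at or after
 * link_ready_s, i.e. the first one that is not dropped by the link.
 */
double first_delivered_attempt(double link_ready_s, double first_delay_s,
			       double max_delay_s)
{
	double t = 0.0;
	double delay = first_delay_s;

	assert(first_delay_s > 0.0 && max_delay_s >= first_delay_s);
	while (t < link_ready_s) {
		t += delay;		/* this attempt was lost, wait and retry */
		delay = delay * 2.0 > max_delay_s ? max_delay_s : delay * 2.0;
	}
	return t;
}
```

With link_ready_s = 1.5, first_delay_s = 4 and max_delay_s = 64, only
the initial request is lost and the retry at 4 seconds gets through, so
under these assumptions the cost is one backoff step rather than a long
stall.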

I guess the risks are small here. But I use the word guess a lot...

  Andrew
