Date:   Fri, 20 May 2022 22:08:34 -0700
From:   Saeed Mahameed <saeedm@...dia.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org,
        linux@...linux.org.uk, olteanv@...il.com, hkallweit1@...il.com,
        f.fainelli@...il.com, michael.chan@...adcom.com
Subject: Re: [RFC net-next] net: track locally triggered link loss

On 20 May 16:03, Jakub Kicinski wrote:
>On Fri, 20 May 2022 15:08:32 -0700 Saeed Mahameed wrote:
>> >I'm not sure this is a good example. If the PHY is doing an autoneg,
>> >the link really is down for around a second. The link peer will also
>> >see the link go down and come back up. So this seems like a legitimate
>> >time to set the carrier off and then back on again.
>> >
>> +1
>>
>> also looks very racy, what happens if a real phy link event happens in
>> between carrier_change_start() and carrier_change_end() ?
>
>But physical world is racy. What if the link flaps twice while the IRQs
>are masked? There will always be cases where we miss or miscount events.
>

That's why we have EQs in modern hw.. but I get your point: in the driver
we only care about the final link state, not how many events (or which ones)
it took to get there.

>> I think you shouldn't treat configuration flows where the driver actually
>> toggles the phy link as a special case, they should be counted as a real
>> link flap.. because they are.
>
>That's not the direction of the patch at all - I'm counting locally
>generated events, I don't care as much if the link went down or not.
>
>I believe that creating a system which would at scale try to correlate
>events between peers is impractical.
>
>> It's impossible from the driver level to know if a FW link event is
>> due to configuration causes or external forces !
>
>You mean because FW or another entity (other than local host) asked for
>the link to be reset? How is that different from switch taking it down?
>Either way the host has lost link due to a non-local event. (3a) or (3b)
>

I was talking about (1) vs (2): when the IRQ/FW event arrives, how do you
know what caused it? Maybe I just don't understand how you plan to use the
new API when a re-config brings the link down.

for example:
driver_reconfig() {
    maybe_close_rings();   /* quiesce the data path if the change needs it */
    exec_fw_command();     /* link will flap, events are triggered asynchronously */
    maybe_open_rings();    /* restore the data path */
}

How do you wrap this with netif_carrier_change_start/end() when the link
events are async?
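
To make the question concrete, here is a rough sketch of the only obvious
way I can see to do the wrapping. Illustration only: it assumes the RFC's
netif_carrier_change_start()/end() just take the netdev and bracket a
"locally triggered" window, and the maybe_*/exec_fw_command() helpers are
the same placeholders as above:

driver_reconfig() {
    netif_carrier_change_start(netdev);   /* open the "this is local" window */

    maybe_close_rings();
    exec_fw_command();        /* FW completes the command, but the link
                               * down/up events may arrive after we return */
    maybe_open_rings();

    netif_carrier_change_end(netdev);     /* window closes here; async events
                                           * landing later look non-local */
}

If exec_fw_command() only waits for command completion and not for the
resulting link events, the down/up can land outside the window and get
mis-classified - that's the race I mean.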

>> the 3 new APIs are going to be a heavy burden on drivers to maintain. If
>> you agree with the above and treat all phy link events as one, then we end
>> up with one new API drivers have to maintain, "net_if_carrier_admin_off()",
>> which is manageable.
>
>I really don't think it's that hard...
>
>> But what about SW netdevices, should all of them change to use the "admin"
>> version ?
>
>The new statistic is an opt-in (via netdev->has_carrier_down_local)
>I think the same rules as to real devices should apply to SW devices
>but I don't intend to implement the changes for any.
>

It's an opt-in, but with an obligation to implement it right.

>> We should keep current carrier logic as is and add new state/counter
>> to count real phy link state.
>>
>> netif_phy_link_down(netdev) {
>>     set_bit(__LINK_STATE_NOPHYLINK, &dev->state);
>>     atomic_inc(netdev->phy_link_down);
>>     netif_carrier_off(netdev);
>> }
>>
>> netif_phy_link_up(netdev) {...}
>>
>> such API should be maintained by real HW device drivers.
>
>"phy_link_down" has a ring of "API v2 this time we'll get it right".
>

C'mon.. the same goes for netif_carrier_local_changes_start/end and
netif_carrier_admin_off().

>Does this differentiate between locally vs non-locally generated events?
>

no

>PTAL at the categorization in the commit message. There are three
>classes of events, we need three counters. local vs non-local and
>link went down vs flow was paused by SW are independent and overlapping.
>Doesn't matter what the counters are called, translating between them
>is basic math.

Ok, if you want to go with this I am fine; just let's not pepper drivers with
more confusing single-purpose API calls only to get some counters right.
Let's implement one multi-purpose infrastructure and use it where it's needed.
It appears that all you need is for the driver to notify the stack when it's
going down for a re-config and when it comes back up, and then put all the
interesting heavy-lifting logic in the stack: link event classification,
freezing/flushing TX queues, flushing offloaded resources (VXLAN, TLS, IPsec),
etc. Something along the lines of the sketch below.
Currently all of the above is duplicated (and will keep being duplicated) in
every driver, when all you need is a generic hint from the driver.
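
Roughly what I have in mind, with made-up names (netdev_reconfig_begin/end
and the local_link_changes counter don't exist, this is only to show where
the logic would live):

/* stack side - implemented once, shared by all drivers */
void netdev_reconfig_begin(struct net_device *dev)
{
    atomic_inc(&dev->local_link_changes);   /* made-up counter: locally triggered */
    netif_tx_disable(dev);                  /* freeze/flush TX queues */
    /* flush offloaded resources here: VXLAN, TLS, IPsec, ... */
    netif_carrier_off(dev);
}

void netdev_reconfig_end(struct net_device *dev)
{
    /* carrier comes back when the (async) link-up event arrives; the
     * stack knows a re-config was in flight, so it can classify it */
    netif_tx_wake_all_queues(dev);
}

/* driver side shrinks to a generic hint around its private flow */
driver_reconfig() {
    netdev_reconfig_begin(netdev);
    exec_fw_command();                      /* link flap classified by the stack */
    netdev_reconfig_end(netdev);
}

The driver only says "I'm about to disturb the link on purpose"; counting,
classification and the data path handling stay in one place.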
  
    


