[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <3CA45501-9FD3-4A33-97A8-AB0BD77AC61F@pjd.dev>
Date: Thu, 15 Dec 2022 17:31:17 -0800
From: Peter Delevoryas <peter@....dev>
To: Alexander H Duyck <alexander.duyck@...il.com>
Cc: Peter Delevoryas <peter@....dev>, sam@...dozajonas.com,
davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] net/ncsi: Always use unicast source MAC address
> On Dec 15, 2022, at 5:07 PM, Peter Delevoryas <peter@....dev> wrote:
>
>
>
>> On Dec 13, 2022, at 8:41 AM, Alexander H Duyck <alexander.duyck@...il.com> wrote:
>>
>> On Mon, 2022-12-12 at 16:47 -0800, Peter Delevoryas wrote:
>>> I use QEMU for development, and I noticed that NC-SI packets get dropped by
>>> the Linux software bridge[1] because we use a broadcast source MAC address
>>> for the first few NC-SI packets.
>>
>> Normally NC-SI packets should never be seen by a bridge.
>
> True, and it’s good to keep this in context. I’m trying to make this change
> to support simulation environments, but any change in NC-SI could easily
> result in the out-of-band network connection to BMC’s in real data centers
> failing to come up, which can be really bad and usually impossible to
> recover remotely.
>
>> Isn't NC-SI
>> really supposed to just be between the BMC and the NIC firmware?
>
> Yep
>
>> Depending on your setup it might make more sense to use something like
>> macvtap or a socket connection to just bypass the need for the bridge
>> entirely.
>
> For unicast, yes, but I want to test multiple NIC’s sharing an RMII
> link and verifying the broadcast behavior, and the failover behavior
> when an RX or TX channel goes down.
>
> The multicast UDP socket backend _does_ work, but I was getting some
> recirculation problems or some kind of buffering thing. I managed
> to get tap0 + tap1 + br0 working faster.
>
>>
>>> The spec requires that the destination MAC address is FF:FF:FF:FF:FF:FF,
>>> but it doesn't require anything about the source MAC address as far as I
>>> know. From testing on a few different NC-SI NIC's (Broadcom 57502, Nvidia
>>> CX4, CX6) I don't think it matters to the network card. I mean, Meta has
>>> been using this in mass production with millions of BMC's [2].
>>>
>>> In general, I think it's probably just a good idea to use a unicast MAC.
>>
>> I'm not sure I agree there. What is the initial value of the address?
>
> Ok so, to be honest, I thought that the BMC’s FTGMAC100 peripherals
> came with addresses provisioned from the factory, and that we were just
> discarding that value and using an address provisioned through the NIC,
> because I hadn’t really dug into the FTGMAC100 datasheet fully. I see now
> that the MAC address register I thought was a read-only manufacturing
> value is actually 8 different MAC address r/w registers for filtering.
> *facepalm*
>
> It suddenly makes a lot more sense why all these OEM Get MAC Address
> commands exist: the BMC chip doesn’t come with any MAC addresses from
> manufacturing. It’s a necessity, not some convenience artifact/etc.
>
> So, tracing some example systems to see what shows up:
>
> One example:
> INIT: Entering runlevel: 5
> Configuring network interfaces... [ 25.893118] 8021q: adding VLAN 0 to HW filter on device eth0
> [ 25.904809] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 25.917307] ftgmac100 1e660000.ethernet eth0: NCSI: Handler for packet type 0x82 returned -19
> [ 25.958096] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 25.978124] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 25.990559] ftgmac100 1e660000.ethernet eth0: NCSI: 'bad' packet ignored for type 0xd0
> [ 26.018180] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.030631] ftgmac100 1e660000.ethernet eth0: NCSI: 'bad' packet ignored for type 0xd0
> [ 26.046594] ftgmac100 1e660000.ethernet eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe
> [ 26.168109] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.198101] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.238237] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.272011] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.308155] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.320504] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> done.
> [ 26.408094] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.438100] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.450537] ftgmac100 1e660000.ethernet eth0: NCSI: bcm_gmac16 MAC RE:DA:CT:ED:HE:HE
> Starting random number generator[ 26.472388] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> daemon[ 26.518241] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.559504] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> .
> [ 26.608229] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> Setup dhclient for IPv6... done.
> [ 26.681879] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.730523] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.808191] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
> [ 26.855689] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>
> Oddly, due to that code you mentioned, all NC-SI packets are using
> a broadcast source MAC address, even after the Get MAC Address sequence
> gets the MAC provisioned for the BMC from the Broadcom NIC.
>
> root@...-oob:~# ifconfig
> eth0 Link encap:Ethernet HWaddr RE:DA:CT:ED:HE:HE
> inet addr:XXXXXXX Bcast:XXXXXXXX Mask:XXXXXXXX
> inet6 addr: XXXXXXXX Scope:Global
> inet6 addr: XXXXXXXX Scope:Link
> inet6 addr: XXXXXXXX Scope:Global
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:2965 errors:0 dropped:0 overruns:0 frame:0
> TX packets:637 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:872759 (852.3 KiB) TX bytes:59936 (58.5 KiB)
> Interrupt:19
>
> But, that’s a system using the 5.0 kernel with lots of old hacks
> on top. A system using a 5.15 kernel with this change included:
>
> INIT: Entering runlevel: 5
> Configuring network interfaces... [ 6.596537] 8021q: adding VLAN 0 to HW filter on device eth0
> [ 6.609264] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.622913] ftgmac100 1e690000.ftgmac eth0: NCSI: Handler for packet type 0x82 returned -19
> [ 6.641447] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.662543] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.680454] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.694114] ftgmac100 1e690000.ftgmac eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe
> [ 6.715722] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> done.
> [ 6.741372] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.741451] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.768714] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> Starting random [ 6.782599] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> number generator[ 6.799321] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> daemon[ 6.815680] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.831388] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> .
> [ 6.846921] ftgmac100 1e690000.ftgmac eth0: NCSI: Network controller supports NC-SI 1.1, querying MAC address through OEM(0x8119) command
> Setup dhclient for IPv6... done.
> [ 6.908921] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> reloading rsyslo[ 6.933085] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>
> So, this BMC already had the provisioned MAC address somehow,
> even before the Nvidia Get MAC Address command towards the bottom.
>
> Adding tracing to ftgmac100:
>
> [ 2.018672] ftgmac100_initial_mac
> [ 2.026090] Read MAC address from FTGMAC100 register: RE:DA:CT:ED:AD:DR
> [ 2.040771] ftgmac100 1e690000.ftgmac: Read MAC address RE:DA:CT:ED:AD:DR from chip
> [ 2.057774] ftgmac100 1e690000.ftgmac: Using NCSI interface
> [ 2.070957] ftgmac100 1e690000.ftgmac eth0: irq 33, mapped at (ptrval)
>
> Now, after rewriting the FTGMAC100 register to fa:ce:b0:0c:20:22 and rebooting:
>
> root@...p6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e690008 32 0x0000face
> root@...p6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e690008
> 0x0000FACE
> root@...p6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e69000c 32 0xb00c2022
> root@...p6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e69000c
> 0xB00C2022
>
> [ 2.001304] ftgmac100_initial_mac
> [ 2.008727] Read MAC address from FTGMAC100 register: fa:ce:b0:0c:20:22
> [ 2.023373] ftgmac100 1e690000.ftgmac: Read MAC address fa:ce:b0:0c:20:22 from chip
> [ 2.040367] ftgmac100 1e690000.ftgmac: Using NCSI interface
>
> [ 6.581239] ftgmac100_reset_mac
> [ 6.589193] ftgmac100_reset_mac
> [ 6.596727] 8021q: adding VLAN 0 to HW filter on device eth0
> [ 6.609462] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff
> [ 6.623117] ftgmac100 1e690000.ftgmac eth0: NCSI: Handler for packet type 0x82 returned -19
> [ 6.641647] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff
> [ 6.662398] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff
> [ 6.680380] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.694000] ftgmac100 1e690000.ftgmac eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe
> [ 6.715700] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
> [ 6.729528] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>
> So, it looks like whatever is initialized in ftgmac100_initial_mac becomes
> the address we use for the NCSI queries initially.
>
> The Aspeed datasheet says the FTGMAC100 MAC address registers are initialized to zero,
> and in that case the ftgmac100 driver initializes it to something random
> with eth_hw_addr_random().
>
> So, I mean correct me if I’m wrong, but I think it all seems fine?
>
> On a hard power cycle (instead of just resetting the ARM cores, which doesn’t seem to
> have reset the peripherals), maybe it would actually be zero, and get initialized
> to the random value. I’ll test that, need to do some more debug image building to do it
> remotely.
Oh, that didn’t take as long as I expected. Here’s the results from a real
power cycle:
[ 5.248154] ftgmac100_initial_mac
[ 5.255470] Read MAC Address from FTGMAC100 register: 00:00:00:00:00:00
[ 5.255482] Generated random MAC Address: 4e:c7:78:ec:cd:4a
[ 5.282434] ftgmac100 1e690000.ftgmac: Generated random MAC address 4e:c7:78:ec:cd:4a
So yeah, in full power-cycles, it’ll be some random address.
>
>> If I am not mistaken the gma_flag is used to indicate that the MAC
>> address has been acquired isn't it?
>
> That’s correct.
>
>> If using the broadcast is an issue
>> the maybe an all 0's MAC address might be more appropriate.
>
> Possibly yeah, although that would also be dropped by the Linux bridge lol,
> so it wouldn’t solve my simulation problem.
>
>> My main
>> concern would be that the dev_addr is not initialized for those first
>> few messages so you may be leaking information.
>>
>>> This might have the effect of causing the NIC to learn 2 MAC addresses from
>>> an NC-SI link if the BMC uses OEM Get MAC Address commands to change its
>>> initial MAC address, but it shouldn't really matter. Who knows if NIC's
>>> even have MAC learning enabled from the out-of-band BMC link, lol.
>>>
>>> [1]: https://tinyurl.com/4933mhaj
>>> [2]: https://tinyurl.com/mr3tyadb
>>
>> The thing is the OpenBMC approach initializes the value themselves to
>> broadcast[3]. As a result the two code bases are essentially doing the
>> same thing since mac_addr is defaulted to the broadcast address when
>> the ncsi interface is registered.
>
> That’s a very good point, thanks for pointing that out, I hadn’t
> even noticed that!
>
> Anyways, let me know what you think of the traces I added above.
> Sorry for the delay, I’ve just been busy with some other stuff,
> but I do really actually care about upstreaming this (and several
> other NC-SI changes I’ll submit after this one, which are unrelated
> but more useful).
>
> Thanks,
> Peter
>
>>
>> [3]: tinyurl.com/mr3cxf3b
>>
>>>
>>> Signed-off-by: Peter Delevoryas <peter@....dev>
>>> ---
>>> net/ncsi/ncsi-cmd.c | 10 +---------
>>> 1 file changed, 1 insertion(+), 9 deletions(-)
>>>
>>> diff --git a/net/ncsi/ncsi-cmd.c b/net/ncsi/ncsi-cmd.c
>>> index dda8b76b7798..fd090156cf0d 100644
>>> --- a/net/ncsi/ncsi-cmd.c
>>> +++ b/net/ncsi/ncsi-cmd.c
>>> @@ -377,15 +377,7 @@ int ncsi_xmit_cmd(struct ncsi_cmd_arg *nca)
>>> eh = skb_push(nr->cmd, sizeof(*eh));
>>> eh->h_proto = htons(ETH_P_NCSI);
>>> eth_broadcast_addr(eh->h_dest);
>>> -
>>> - /* If mac address received from device then use it for
>>> - * source address as unicast address else use broadcast
>>> - * address as source address
>>> - */
>>> - if (nca->ndp->gma_flag == 1)
>>> - memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);
>>> - else
>>> - eth_broadcast_addr(eh->h_source);
>>> + memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);
>>>
>>> /* Start the timer for the request that might not have
>>> * corresponding response. Given NCSI is an internal
Powered by blists - more mailing lists