lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 16 Dec 2022 20:20:22 -0800
From:   Peter Delevoryas <>
To:     Alexander Duyck <>
Cc:     Peter Delevoryas <>,,,,,,,
Subject: Re: [PATCH] net/ncsi: Always use unicast source MAC address

> On Dec 16, 2022, at 10:29 AM, Alexander Duyck <> wrote:
> On Thu, Dec 15, 2022 at 5:08 PM Peter Delevoryas <> wrote:
>>> On Dec 13, 2022, at 8:41 AM, Alexander H Duyck <> wrote:
>>> On Mon, 2022-12-12 at 16:47 -0800, Peter Delevoryas wrote:
>>>> I use QEMU for development, and I noticed that NC-SI packets get dropped by
>>>> the Linux software bridge[1] because we use a broadcast source MAC address
>>>> for the first few NC-SI packets.
>>> Normally NC-SI packets should never be seen by a bridge.
>> True, and it’s good to keep this in context. I’m trying to make this change
>> to support simulation environments, but any change in NC-SI could easily
>> result in the out-of-band network connection to BMC’s in real data centers
>> failing to come up, which can be really bad and usually impossible to
>> recover remotely.
>>> Isn't NC-SI
>>> really supposed to just be between the BMC and the NIC firmware?
>> Yep
>>> Depending on your setup it might make more sense to use something like
>>> macvtap or a socket connection to just bypass the need for the bridge
>>> entirely.
>> For unicast, yes, but I want to test multiple NIC’s sharing an RMII
>> link and verifying the broadcast behavior, and the failover behavior
>> when an RX or TX channel goes down.
>> The multicast UDP socket backend _does_ work, but I was getting some
>> recirculation problems or some kind of buffering thing. I managed
>> to get tap0 + tap1 + br0 working faster.
> Right, but I think part of the issue is that things are being extended
> in a way that may actually hurt the maintainability of it.

Hmmmm yeah you might be right.

> Specifically it seems like the general idea is that the NCSI interface
> should be using either broadcast or the assigned unicast address as
> its source MAC address.

Yeah, most likely.

> My main concern with just using the raw MAC address from the device is
> that it may be something that would be more problematic than just
> being broadcast. My suggestion at a minimum would be to verify it is
> valid before we just use it or to do something like in the code I
> referenced where if the device doesn't have a valid MAC address we
> just overwrite it with broadcast.


>>>> The spec requires that the destination MAC address is FF:FF:FF:FF:FF:FF,
>>>> but it doesn't require anything about the source MAC address as far as I
>>>> know. From testing on a few different NC-SI NIC's (Broadcom 57502, Nvidia
>>>> CX4, CX6) I don't think it matters to the network card. I mean, Meta has
>>>> been using this in mass production with millions of BMC's [2].
>>>> In general, I think it's probably just a good idea to use a unicast MAC.
>>> I'm not sure I agree there. What is the initial value of the address?
>> Ok so, to be honest, I thought that the BMC’s FTGMAC100 peripherals
>> came with addresses provisioned from the factory, and that we were just
>> discarding that value and using an address provisioned through the NIC,
>> because I hadn’t really dug into the FTGMAC100 datasheet fully. I see now
>> that the MAC address register I thought was a read-only manufacturing
>> value is actually 8 different MAC address r/w registers for filtering.
>> *facepalm*
>> It suddenly makes a lot more sense why all these OEM Get MAC Address
>> commands exist: the BMC chip doesn’t come with any MAC addresses from
>> manufacturing. It’s a necessity, not some convenience artifact/etc.
>> So, tracing some example systems to see what shows up:
>> One example:
>> INIT: Entering runlevel: 5
>> Configuring network interfaces... [   25.893118] 8021q: adding VLAN 0 to HW filter on device eth0
>> [   25.904809] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [   25.917307] ftgmac100 1e660000.ethernet eth0: NCSI: Handler for packet type 0x82 returned -19
>> [   25.958096] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [   25.978124] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [   25.990559] ftgmac100 1e660000.ethernet eth0: NCSI: 'bad' packet ignored for type 0xd0
>> [   26.018180] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [   26.030631] ftgmac100 1e660000.ethernet eth0: NCSI: 'bad' packet ignored for type 0xd0
>> [   26.046594] ftgmac100 1e660000.ethernet eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe
>> [   26.168109] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [   26.198101] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [   26.238237] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [   26.272011] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [   26.308155] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [   26.320504] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> done.
>> [   26.408094] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [   26.438100] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [   26.450537] ftgmac100 1e660000.ethernet eth0: NCSI: bcm_gmac16 MAC RE:DA:CT:ED:HE:HE
>> Starting random number generator[   26.472388] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> daemon[   26.518241] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [   26.559504] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> .
>> [   26.608229] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> Setup dhclient for IPv6... done.
>> [   26.681879] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [   26.730523] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [   26.808191] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> [   26.855689] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
>> Oddly, due to that code you mentioned, all NC-SI packets are using
>> a broadcast source MAC address, even after the Get MAC Address sequence
>> gets the MAC provisioned for the BMC from the Broadcom NIC.
>> root@...-oob:~# ifconfig
>> eth0      Link encap:Ethernet  HWaddr RE:DA:CT:ED:HE:HE
>>          inet addr:XXXXXXX  Bcast:XXXXXXXX  Mask:XXXXXXXX
>>          inet6 addr: XXXXXXXX Scope:Global
>>          inet6 addr: XXXXXXXX Scope:Link
>>          inet6 addr: XXXXXXXX Scope:Global
>>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>          RX packets:2965 errors:0 dropped:0 overruns:0 frame:0
>>          TX packets:637 errors:0 dropped:0 overruns:0 carrier:0
>>          collisions:0 txqueuelen:1000
>>          RX bytes:872759 (852.3 KiB)  TX bytes:59936 (58.5 KiB)
>>          Interrupt:19
>> But, that’s a system using the 5.0 kernel with lots of old hacks
>> on top. A system using a 5.15 kernel with this change included:
>> INIT: Entering runlevel: 5
>> Configuring network interfaces... [    6.596537] 8021q: adding VLAN 0 to HW filter on device eth0
>> [    6.609264] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> [    6.622913] ftgmac100 1e690000.ftgmac eth0: NCSI: Handler for packet type 0x82 returned -19
>> [    6.641447] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> [    6.662543] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> [    6.680454] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> [    6.694114] ftgmac100 1e690000.ftgmac eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe
>> [    6.715722] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> done.
>> [    6.741372] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> [    6.741451] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> [    6.768714] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> Starting random [    6.782599] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> number generator[    6.799321] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> daemon[    6.815680] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> [    6.831388] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> .
>> [    6.846921] ftgmac100 1e690000.ftgmac eth0: NCSI: Network controller supports NC-SI 1.1, querying MAC address through OEM(0x8119) command
>> Setup dhclient for IPv6... done.
>> [    6.908921] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> reloading rsyslo[    6.933085] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> So, this BMC already had the provisioned MAC address somehow,
>> even before the Nvidia Get MAC Address command towards the bottom.
> It is probably a hold over from the last boot. I suspect you would
> need a clean power-cycle to reset it back to non-modified values.


>> Adding tracing to ftgmac100:
>> [    2.018672] ftgmac100_initial_mac
>> [    2.026090] Read MAC address from FTGMAC100 register: RE:DA:CT:ED:AD:DR
>> [    2.040771] ftgmac100 1e690000.ftgmac: Read MAC address RE:DA:CT:ED:AD:DR from chip
>> [    2.057774] ftgmac100 1e690000.ftgmac: Using NCSI interface
>> [    2.070957] ftgmac100 1e690000.ftgmac eth0: irq 33, mapped at (ptrval)
>> Now, after rewriting the FTGMAC100 register to fa:ce:b0:0c:20:22 and rebooting:
>> root@...p6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e690008 32 0x0000face
>> root@...p6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e690008
>> 0x0000FACE
>> root@...p6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e69000c 32 0xb00c2022
>> root@...p6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e69000c
>> 0xB00C2022
>> [    2.001304] ftgmac100_initial_mac
>> [    2.008727] Read MAC address from FTGMAC100 register: fa:ce:b0:0c:20:22
>> [    2.023373] ftgmac100 1e690000.ftgmac: Read MAC address fa:ce:b0:0c:20:22 from chip
>> [    2.040367] ftgmac100 1e690000.ftgmac: Using NCSI interface
>> [    6.581239] ftgmac100_reset_mac
>> [    6.589193] ftgmac100_reset_mac
>> [    6.596727] 8021q: adding VLAN 0 to HW filter on device eth0
>> [    6.609462] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff
>> [    6.623117] ftgmac100 1e690000.ftgmac eth0: NCSI: Handler for packet type 0x82 returned -19
>> [    6.641647] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff
>> [    6.662398] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff
>> [    6.680380] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> [    6.694000] ftgmac100 1e690000.ftgmac eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe
>> [    6.715700] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> [    6.729528] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
>> So, it looks like whatever is initialized in ftgmac100_initial_mac becomes
>> the address we use for the NCSI queries initially.
>> The Aspeed datasheet says the FTGMAC100 MAC address registers are initialized to zero,
>> and in that case the ftgmac100 driver initializes it to something random
>> with eth_hw_addr_random().
>> So, I mean correct me if I’m wrong, but I think it all seems fine?
>> On a hard power cycle (instead of just resetting the ARM cores, which doesn’t seem to
>> have reset the peripherals), maybe it would actually be zero, and get initialized
>> to the random value. I’ll test that, need to do some more debug image building to do it
>> remotely.
> So we know now that it is using a random MAC address on reboot.


>>> My main
>>> concern would be that the dev_addr is not initialized for those first
>>> few messages so you may be leaking information.
>>>> This might have the effect of causing the NIC to learn 2 MAC addresses from
>>>> an NC-SI link if the BMC uses OEM Get MAC Address commands to change its
>>>> initial MAC address, but it shouldn't really matter. Who knows if NIC's
>>>> even have MAC learning enabled from the out-of-band BMC link, lol.
>>>> [1]:
>>>> [2]:
>>> The thing is the OpenBMC approach initializes the value themselves to
>>> broadcast[3]. As a result the two code bases are essentially doing the
>>> same thing since mac_addr is defaulted to the broadcast address when
>>> the ncsi interface is registered.
>> That’s a very good point, thanks for pointing that out, I hadn’t
>> even noticed that!
>> Anyways, let me know what you think of the traces I added above.
>> Sorry for the delay, I’ve just been busy with some other stuff,
>> but I do really actually care about upstreaming this (and several
>> other NC-SI changes I’ll submit after this one, which are unrelated
>> but more useful).
>> Thanks,
>> Peter
> So the NC-SI spec says any value can be used for the source MAC and
> that broadcast "may" be used. I would say there are some debugging
> advantages to using broadcast that will be obvious in a packet trace.

Ehhhhh yeah I guess, but the ethertype is what I filter for. But sure,
a broadcast source MAC is pretty unique too.

> I wonder if we couldn't look at doing something like requiring
> broadcast or LAA if the gma_flag isn't set.

What is LAA? I’m out of the loop

But also: aren’t we already using broadcast if the gma_flag isn’t set?

-       if (nca->ndp->gma_flag == 1)
-               memcpy(eh->h_source, nca->ndp->>dev_addr, ETH_ALEN);
-       else
-               eth_broadcast_addr(eh->h_source);
+       memcpy(eh->h_source, nca->ndp->>dev_addr, ETH_ALEN);

> With that we could at
> least advertise that we don't expect this packet to be going out in a
> real network as we cannot guarantee the MAC is unique.

Yeah, but it probably wouldn’t help my simulation scenario.

I guess it sounds like this patch is not a good idea, which to be fair,
is totally reasonable.

I can just add some iptables rules to tunnel these packets with a different
source MAC, or fix the multicast socket issue I was having. It’s really
not a big deal, and like you’re saying, we probably don’t want to make
it harder to maintain _forever_.

I would just suggest praying for the next guy that tries to test NC-SI
stuff with QEMU and finds out NC-SI traffic gets dropped by bridges.
I had to resort to reading the source code and printing stuff with
BPF to identify this. Maybe it’s more obvious to other people this wouldn’t
work though.

Powered by blists - more mailing lists