lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Fri, 16 Dec 2022 20:20:22 -0800 From: Peter Delevoryas <peter@....dev> To: Alexander Duyck <alexander.duyck@...il.com> Cc: Peter Delevoryas <peter@....dev>, sam@...dozajonas.com, davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com, netdev@...r.kernel.org, linux-kernel@...r.kernel.org Subject: Re: [PATCH] net/ncsi: Always use unicast source MAC address > On Dec 16, 2022, at 10:29 AM, Alexander Duyck <alexander.duyck@...il.com> wrote: > > On Thu, Dec 15, 2022 at 5:08 PM Peter Delevoryas <peter@....dev> wrote: >> >> >> >>> On Dec 13, 2022, at 8:41 AM, Alexander H Duyck <alexander.duyck@...il.com> wrote: >>> >>> On Mon, 2022-12-12 at 16:47 -0800, Peter Delevoryas wrote: >>>> I use QEMU for development, and I noticed that NC-SI packets get dropped by >>>> the Linux software bridge[1] because we use a broadcast source MAC address >>>> for the first few NC-SI packets. >>> >>> Normally NC-SI packets should never be seen by a bridge. >> >> True, and it’s good to keep this in context. I’m trying to make this change >> to support simulation environments, but any change in NC-SI could easily >> result in the out-of-band network connection to BMC’s in real data centers >> failing to come up, which can be really bad and usually impossible to >> recover remotely. >> >>> Isn't NC-SI >>> really supposed to just be between the BMC and the NIC firmware? >> >> Yep >> >>> Depending on your setup it might make more sense to use something like >>> macvtap or a socket connection to just bypass the need for the bridge >>> entirely. >> >> For unicast, yes, but I want to test multiple NIC’s sharing an RMII >> link and verifying the broadcast behavior, and the failover behavior >> when an RX or TX channel goes down. >> >> The multicast UDP socket backend _does_ work, but I was getting some >> recirculation problems or some kind of buffering thing. I managed >> to get tap0 + tap1 + br0 working faster. > > Right, but I think part of the issue is that things are being extended > in a way that may actually hurt the maintainability of it. Hmmmm yeah you might be right. > Specifically it seems like the general idea is that the NCSI interface > should be using either broadcast or the assigned unicast address as > its source MAC address. Yeah, most likely. > > My main concern with just using the raw MAC address from the device is > that it may be something that would be more problematic than just > being broadcast. My suggestion at a minimum would be to verify it is > valid before we just use it or to do something like in the code I > referenced where if the device doesn't have a valid MAC address we > just overwrite it with broadcast. +1 > >>> >>>> The spec requires that the destination MAC address is FF:FF:FF:FF:FF:FF, >>>> but it doesn't require anything about the source MAC address as far as I >>>> know. From testing on a few different NC-SI NIC's (Broadcom 57502, Nvidia >>>> CX4, CX6) I don't think it matters to the network card. I mean, Meta has >>>> been using this in mass production with millions of BMC's [2]. >>>> >>>> In general, I think it's probably just a good idea to use a unicast MAC. >>> >>> I'm not sure I agree there. What is the initial value of the address? >> >> Ok so, to be honest, I thought that the BMC’s FTGMAC100 peripherals >> came with addresses provisioned from the factory, and that we were just >> discarding that value and using an address provisioned through the NIC, >> because I hadn’t really dug into the FTGMAC100 datasheet fully. I see now >> that the MAC address register I thought was a read-only manufacturing >> value is actually 8 different MAC address r/w registers for filtering. >> *facepalm* >> >> It suddenly makes a lot more sense why all these OEM Get MAC Address >> commands exist: the BMC chip doesn’t come with any MAC addresses from >> manufacturing. It’s a necessity, not some convenience artifact/etc. >> >> So, tracing some example systems to see what shows up: >> >> One example: >> INIT: Entering runlevel: 5 >> Configuring network interfaces... [ 25.893118] 8021q: adding VLAN 0 to HW filter on device eth0 >> [ 25.904809] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff >> [ 25.917307] ftgmac100 1e660000.ethernet eth0: NCSI: Handler for packet type 0x82 returned -19 >> [ 25.958096] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff >> [ 25.978124] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff >> [ 25.990559] ftgmac100 1e660000.ethernet eth0: NCSI: 'bad' packet ignored for type 0xd0 >> [ 26.018180] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff >> [ 26.030631] ftgmac100 1e660000.ethernet eth0: NCSI: 'bad' packet ignored for type 0xd0 >> [ 26.046594] ftgmac100 1e660000.ethernet eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe >> [ 26.168109] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff >> [ 26.198101] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff >> [ 26.238237] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff >> [ 26.272011] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff >> [ 26.308155] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff >> [ 26.320504] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff >> done. >> [ 26.408094] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff >> [ 26.438100] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff >> [ 26.450537] ftgmac100 1e660000.ethernet eth0: NCSI: bcm_gmac16 MAC RE:DA:CT:ED:HE:HE >> Starting random number generator[ 26.472388] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff >> daemon[ 26.518241] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff >> [ 26.559504] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff >> . >> [ 26.608229] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff >> Setup dhclient for IPv6... done. >> [ 26.681879] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff >> [ 26.730523] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff >> [ 26.808191] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff >> [ 26.855689] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff >> >> Oddly, due to that code you mentioned, all NC-SI packets are using >> a broadcast source MAC address, even after the Get MAC Address sequence >> gets the MAC provisioned for the BMC from the Broadcom NIC. >> >> root@...-oob:~# ifconfig >> eth0 Link encap:Ethernet HWaddr RE:DA:CT:ED:HE:HE >> inet addr:XXXXXXX Bcast:XXXXXXXX Mask:XXXXXXXX >> inet6 addr: XXXXXXXX Scope:Global >> inet6 addr: XXXXXXXX Scope:Link >> inet6 addr: XXXXXXXX Scope:Global >> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 >> RX packets:2965 errors:0 dropped:0 overruns:0 frame:0 >> TX packets:637 errors:0 dropped:0 overruns:0 carrier:0 >> collisions:0 txqueuelen:1000 >> RX bytes:872759 (852.3 KiB) TX bytes:59936 (58.5 KiB) >> Interrupt:19 >> >> But, that’s a system using the 5.0 kernel with lots of old hacks >> on top. A system using a 5.15 kernel with this change included: >> >> INIT: Entering runlevel: 5 >> Configuring network interfaces... [ 6.596537] 8021q: adding VLAN 0 to HW filter on device eth0 >> [ 6.609264] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff >> [ 6.622913] ftgmac100 1e690000.ftgmac eth0: NCSI: Handler for packet type 0x82 returned -19 >> [ 6.641447] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff >> [ 6.662543] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff >> [ 6.680454] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff >> [ 6.694114] ftgmac100 1e690000.ftgmac eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe >> [ 6.715722] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff >> done. >> [ 6.741372] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff >> [ 6.741451] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff >> [ 6.768714] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff >> Starting random [ 6.782599] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff >> number generator[ 6.799321] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff >> daemon[ 6.815680] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff >> [ 6.831388] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff >> . >> [ 6.846921] ftgmac100 1e690000.ftgmac eth0: NCSI: Network controller supports NC-SI 1.1, querying MAC address through OEM(0x8119) command >> Setup dhclient for IPv6... done. >> [ 6.908921] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff >> reloading rsyslo[ 6.933085] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff >> >> So, this BMC already had the provisioned MAC address somehow, >> even before the Nvidia Get MAC Address command towards the bottom. > > It is probably a hold over from the last boot. I suspect you would > need a clean power-cycle to reset it back to non-modified values. +1 > >> Adding tracing to ftgmac100: >> >> [ 2.018672] ftgmac100_initial_mac >> [ 2.026090] Read MAC address from FTGMAC100 register: RE:DA:CT:ED:AD:DR >> [ 2.040771] ftgmac100 1e690000.ftgmac: Read MAC address RE:DA:CT:ED:AD:DR from chip >> [ 2.057774] ftgmac100 1e690000.ftgmac: Using NCSI interface >> [ 2.070957] ftgmac100 1e690000.ftgmac eth0: irq 33, mapped at (ptrval) >> >> Now, after rewriting the FTGMAC100 register to fa:ce:b0:0c:20:22 and rebooting: >> >> root@...p6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e690008 32 0x0000face >> root@...p6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e690008 >> 0x0000FACE >> root@...p6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e69000c 32 0xb00c2022 >> root@...p6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e69000c >> 0xB00C2022 >> >> [ 2.001304] ftgmac100_initial_mac >> [ 2.008727] Read MAC address from FTGMAC100 register: fa:ce:b0:0c:20:22 >> [ 2.023373] ftgmac100 1e690000.ftgmac: Read MAC address fa:ce:b0:0c:20:22 from chip >> [ 2.040367] ftgmac100 1e690000.ftgmac: Using NCSI interface >> >> [ 6.581239] ftgmac100_reset_mac >> [ 6.589193] ftgmac100_reset_mac >> [ 6.596727] 8021q: adding VLAN 0 to HW filter on device eth0 >> [ 6.609462] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff >> [ 6.623117] ftgmac100 1e690000.ftgmac eth0: NCSI: Handler for packet type 0x82 returned -19 >> [ 6.641647] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff >> [ 6.662398] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff >> [ 6.680380] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff >> [ 6.694000] ftgmac100 1e690000.ftgmac eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe >> [ 6.715700] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff >> [ 6.729528] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff >> >> So, it looks like whatever is initialized in ftgmac100_initial_mac becomes >> the address we use for the NCSI queries initially. >> >> The Aspeed datasheet says the FTGMAC100 MAC address registers are initialized to zero, >> and in that case the ftgmac100 driver initializes it to something random >> with eth_hw_addr_random(). >> >> So, I mean correct me if I’m wrong, but I think it all seems fine? >> >> On a hard power cycle (instead of just resetting the ARM cores, which doesn’t seem to >> have reset the peripherals), maybe it would actually be zero, and get initialized >> to the random value. I’ll test that, need to do some more debug image building to do it >> remotely. > > So we know now that it is using a random MAC address on reboot. +1 > >>> My main >>> concern would be that the dev_addr is not initialized for those first >>> few messages so you may be leaking information. >>> >>>> This might have the effect of causing the NIC to learn 2 MAC addresses from >>>> an NC-SI link if the BMC uses OEM Get MAC Address commands to change its >>>> initial MAC address, but it shouldn't really matter. Who knows if NIC's >>>> even have MAC learning enabled from the out-of-band BMC link, lol. >>>> >>>> [1]: https://tinyurl.com/4933mhaj >>>> [2]: https://tinyurl.com/mr3tyadb >>> >>> The thing is the OpenBMC approach initializes the value themselves to >>> broadcast[3]. As a result the two code bases are essentially doing the >>> same thing since mac_addr is defaulted to the broadcast address when >>> the ncsi interface is registered. >> >> That’s a very good point, thanks for pointing that out, I hadn’t >> even noticed that! >> >> Anyways, let me know what you think of the traces I added above. >> Sorry for the delay, I’ve just been busy with some other stuff, >> but I do really actually care about upstreaming this (and several >> other NC-SI changes I’ll submit after this one, which are unrelated >> but more useful). >> >> Thanks, >> Peter > > So the NC-SI spec says any value can be used for the source MAC and > that broadcast "may" be used. I would say there are some debugging > advantages to using broadcast that will be obvious in a packet trace. Ehhhhh yeah I guess, but the ethertype is what I filter for. But sure, a broadcast source MAC is pretty unique too. > I wonder if we couldn't look at doing something like requiring > broadcast or LAA if the gma_flag isn't set. What is LAA? I’m out of the loop But also: aren’t we already using broadcast if the gma_flag isn’t set? - if (nca->ndp->gma_flag == 1) - memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN); - else - eth_broadcast_addr(eh->h_source); + memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN); > With that we could at > least advertise that we don't expect this packet to be going out in a > real network as we cannot guarantee the MAC is unique. Yeah, but it probably wouldn’t help my simulation scenario. I guess it sounds like this patch is not a good idea, which to be fair, is totally reasonable. I can just add some iptables rules to tunnel these packets with a different source MAC, or fix the multicast socket issue I was having. It’s really not a big deal, and like you’re saying, we probably don’t want to make it harder to maintain _forever_. I would just suggest praying for the next guy that tries to test NC-SI stuff with QEMU and finds out NC-SI traffic gets dropped by bridges. I had to resort to reading the source code and printing stuff with BPF to identify this. Maybe it’s more obvious to other people this wouldn’t work though.
Powered by blists - more mailing lists