linux-kernel - Re: [PATCH] ipmi: kcs: Update OBF poll timeout to reduce latency


Message-Id: <527F52AB-0070-43EA-BE82-945280CA2AEE@gmail.com>
Date: Wed, 21 Feb 2024 10:57:38 -0600
From: Andrew Geissler <geissonator@...il.com>
To: Andrew Jeffery <andrew@...econstruct.com.au>
Cc: minyard@....org,
 Paul Menzel <pmenzel@...gen.mpg.de>,
 Joel Stanley <joel@....id.au>,
 openipmi-developer@...ts.sourceforge.net,
 Linux ARM <linux-arm-kernel@...ts.infradead.org>,
 linux-aspeed <linux-aspeed@...ts.ozlabs.org>,
 Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
 openbmc@...ts.ozlabs.org
Subject: Re: [PATCH] ipmi: kcs: Update OBF poll timeout to reduce latency



> On Feb 20, 2024, at 4:36 PM, Andrew Jeffery <andrew@...econstruct.com.au> wrote:
> 
> On Tue, 2024-02-20 at 13:33 -0600, Corey Minyard wrote:
>> On Tue, Feb 20, 2024 at 04:51:21PM +0100, Paul Menzel wrote:
>>> Dear Andrew,
>> 
>> It's because increasing that number causes it to poll longer for the
>> event, the host takes longer than 100us to generate the event, and if
>> the event is missed the time when it is checked again is very long.
>> 
>> Polling for 100us is already pretty extreme. 200us is really too long.
>> 
>> The real problem is that there is no interrupt for this.  I'd also guess
>> there is no interrupt on the host side, because that would solve this
>> problem, too, as it would certainly get around to handling the interrupt
>> in 100us.  I'm assuming the host driver is not the Linux driver, as it
>> should also handle this in a timely manner, even when polling.
> 
> I expect the issues Andrew G is observing are with the Power10 boot
> firmware. The boot firmware only polls. The runtime firmware enables
> interrupts.

Yep, this is with the low level host boot firmware.
Also, further testing overnight showed that 200us wasn’t enough for
our larger Everest P10 machines; I needed to go to 300us. As we
were struggling to allow 200us, I assume 300us is going to be a no-go.
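
To make the trade-off concrete, here is a minimal user-space sketch of a
bounded OBF busy-poll. It is not the driver code: the simulated host, the
fake clock, the 10us poll step and the helper names (sim_host,
poll_obf_clear, obf_clears_within) are all assumptions made for
illustration. Only the OBF bit position (bit 0 of the KCS status register)
comes from the KCS interface definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KCS_STATUS_OBF 0x01u /* bit 0 of the KCS status register */

/* Simulated host (hypothetical): it reads ODR, and so clears OBF,
 * once latency_us of fake time has passed. */
struct sim_host {
	uint32_t now_us;     /* fake clock, advanced by the poll loop */
	uint32_t latency_us; /* made-up host response time */
};

static uint8_t sim_read_status(const struct sim_host *h)
{
	return h->now_us >= h->latency_us ? 0 : KCS_STATUS_OBF;
}

/* Busy-poll until OBF clears or timeout_us has elapsed.
 * Returns true if OBF cleared within the window. */
static bool poll_obf_clear(struct sim_host *h, uint32_t timeout_us,
			   uint32_t step_us)
{
	assert(step_us > 0);
	for (uint32_t waited = 0; ; waited += step_us) {
		if (!(sim_read_status(h) & KCS_STATUS_OBF))
			return true;
		if (waited >= timeout_us)
			return false; /* event missed; caller must retry later */
		h->now_us += step_us; /* stands in for a short delay */
	}
}

/* Convenience wrapper: fresh simulated host, 10us poll step. */
static bool obf_clears_within(uint32_t latency_us, uint32_t timeout_us)
{
	struct sim_host h = { .now_us = 0, .latency_us = latency_us };

	return poll_obf_clear(&h, timeout_us, 10);
}
```

With an invented host latency of 250us, obf_clears_within(250, 200)
returns false and obf_clears_within(250, 300) returns true. The 250us
figure is not a measurement; it is chosen only so the sketch mirrors the
pattern described above, where 200us was too short and 300us was enough.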

>>
>> The right way to fix this is probably to do the same thing the host side
>> Linux driver does.  It has a kernel thread that is kicked off to do
>> this.  Unfortunately, that's more complicated to implement, but it
>> avoids polling in this location (which causes latency issues on the BMC
>> side) and lets you poll longer without causing issues.
> 
> In Andrew G's case he's talking MCTP over KCS using a vendor-defined
> transport binding (that also leverages LPC FWH cycles for bulk data
> transfers)[1]. I think it could have taken more inspiration from the
> IPMI KCS protocol: It might be worth an experiment to write the dummy
> command value to IDR from the host side after each ODR read to signal
> the host's clearing of OBF (no interrupt for the BMC) with an IBF
> (which does interrupt the BMC). And doing the obverse for the BMC. Some
> brief thought suggests that if the dummy value is read there's no need
> to send a dummy value in reply (as it's an indicator to read the status
> register). With that the need for the spin here (or on the host side)
> is reduced at the cost of some constant protocol overhead.
> 

Thanks for the quick reviews and ideas.
I’ll see if I can find someone on the team to help out with Andrew J’s
thoughts and if that doesn’t work, look into the kernel thread idea.
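
For whoever picks this up, Andrew J's dummy-command idea could be modelled
roughly as below. This is a user-space simulation under stated
assumptions, not the vendor binding or the kernel driver: the kcs_sim
struct, the function names and the dummy value 0xFF are hypothetical, and
the BMC interrupt is modelled as a direct function call. The status bit
positions (OBF bit 0, IBF bit 1) are the standard KCS ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KCS_STATUS_OBF 0x01u /* host has not yet read ODR */
#define KCS_STATUS_IBF 0x02u /* BMC has not yet read IDR */
#define KCS_DUMMY_CMD  0xFFu /* hypothetical "I read ODR" marker */

struct kcs_sim {
	uint8_t idr, odr, status;
	bool odr_free;     /* BMC's view: host has consumed ODR */
	unsigned bmc_irqs; /* IBF interrupts taken by the BMC */
};

/* BMC side: IBF interrupt handler. Reading IDR clears IBF. */
static void bmc_ibf_irq(struct kcs_sim *k)
{
	uint8_t v = k->idr;

	k->status &= (uint8_t)~KCS_STATUS_IBF;
	k->bmc_irqs++;
	if (v == KCS_DUMMY_CMD)
		k->odr_free = true; /* OBF is clear; no dummy reply is sent */
}

/* BMC side: write one byte without spinning on OBF.
 * Returns false if the host has not yet signalled that ODR is free. */
static bool bmc_write_odr(struct kcs_sim *k, uint8_t byte)
{
	if (!k->odr_free)
		return false;
	k->odr = byte;
	k->status |= KCS_STATUS_OBF;
	k->odr_free = false;
	return true;
}

/* Host side: read ODR (clears OBF), then write the dummy value to IDR
 * so that the resulting IBF interrupts the BMC. */
static uint8_t host_read_odr(struct kcs_sim *k)
{
	uint8_t v;

	assert(k->status & KCS_STATUS_OBF); /* only read when data is pending */
	v = k->odr;
	k->status &= (uint8_t)~KCS_STATUS_OBF;
	k->idr = KCS_DUMMY_CMD;
	k->status |= KCS_STATUS_IBF;
	bmc_ibf_irq(k); /* models the hardware raising the IBF interrupt */
	return v;
}

/* Send n bytes BMC -> host; returns how many arrived intact. */
static size_t demo_transfer(struct kcs_sim *k, size_t n)
{
	size_t ok = 0;

	for (size_t i = 0; i < n; i++) {
		uint8_t byte = (uint8_t)i;

		if (!bmc_write_odr(k, byte))
			break;
		if (host_read_odr(k) == byte)
			ok++;
	}
	return ok;
}
```

In this model the BMC never polls for OBF; it takes one IBF interrupt per
byte the host consumes, which is the constant protocol overhead Andrew J
mentions. The sketch covers only the BMC-to-host direction and ignores
real data arriving on IDR.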

> 
> 
> Andrew J