linux-kernel - Re: [PATCH] ipmi_si_intf: use usleep

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190805181850.GI136335@devbig004.ftw2.facebook.com>
Date:   Mon, 5 Aug 2019 11:18:50 -0700
From:   Tejun Heo <tj@...nel.org>
To:     Corey Minyard <minyard@....org>
Cc:     openipmi-developer@...ts.sourceforge.net,
        linux-kernel@...r.kernel.org, kernel-team@...com
Subject: Re: [PATCH] ipmi_si_intf: use usleep_range() instead of busy looping

Hello, Corey.

On Thu, Aug 01, 2019 at 12:40:02PM -0500, Corey Minyard wrote:
> I spent some time looking at this.  Without the patch, I
> measured around 3.5ms to send/receive a get device ID message
> and uses 100% of the CPU on a core.
> 
> I measured your patch, it slowed it down to around 10.5ms
> per message, which is not good.  Though it did just use a
> few percent of the core.
> 
> I wrote some code to auto-adjust the timer.  It settled on
> a delay around 35us, which gave 4.7ms per message, which is
> probably acceptable, and used around 40% of the CPU.  If
> I use any timeout (even a 0-10us range) it won't go below
> 4ms per message.

Out of curiosity, what makes 4.7ms okay and 10.5ms not?  At least for
the use case we have in the fleet (sensor reading mostly), the less
disruptive the better as long as things don't timeout and fail.

> The process is running at nice 19 priority, so it won't
> have a significant effect on other processes from a priority
> point of view.  It may still be hitting the scheduler locks
> pretty hard, though.  But I played with things quite a bit

And power.  Imagine multi six digit number of machines burning a full
core just because of this busy loop to read temperature sensors some
msecs faster.

> and the behavior or the management controller is too
> erratic to set a really good timeout.  Maybe other ones
> are better, don't know.
> 
> One other option we have is that the driver has something
> called "maintenance mode".  If it detect that you have
> reset the management controller or are issuing firmware
> commands, it will modify timeout behavior.  It can also
> be activated manually.  I could also make it switch to
> just calling schedule instead of delaying when in that
> mode.

Yeah, whatever which makes the common-case behavior avoid busy looping
would work.

> The right thing it do is complain bitterly to vendors who
> build hardware that has to be polled.  But besides that,

For sure, but there already are a lot of machines with this thing and
it'll take multiple years for them to retire so...

> I'm thinking the maintenance mode is the thing to do.
> It will also change behavior if you reset the management
> controller, but only for 30 seconds or so.  Does that
> work?

Yeah, sounds good to me.

Thnaks.

-- 
tejun