Message-ID: <CABk29Nv7iJEcDg3rgSvfTkXEM69ZeLByJAsZYuA5qpdj645nZw@mail.gmail.com>
Date: Tue, 2 Mar 2021 12:55:07 -0800
From: Josh Don <joshdon@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
Clement Courbet <courbet@...gle.com>,
Oleg Rombakh <olegrom@...gle.com>
Subject: Re: [PATCH] sched: Optimize __calc_delta.
On Fri, Feb 26, 2021 at 1:03 PM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Fri, Feb 26, 2021 at 11:52:39AM -0800, Josh Don wrote:
> > From: Clement Courbet <courbet@...gle.com>
> >
> > A significant portion of __calc_delta time is spent in the loop
> > shifting a u64 by 32 bits. Use a __builtin_clz instead of iterating.
> >
> > This is ~7x faster on benchmarks.
>
> Have you tried on hardware without such fancy instructions?
Unfortunately I wasn't able to find any such hardware on hand. Clement did
rework the patch to use fls() instead, and has benchmarks for both the
generic and asm variants; all of them are faster than the loop. I'll
include the updated patch inline in my next reply.