

Message-ID: <X8encVJSgbXVLGvT@google.com>
Date:   Wed, 2 Dec 2020 23:40:49 +0900
From:   Namhyung Kim <namhyung@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     kan.liang@...ux.intel.com, mingo@...nel.org,
        linux-kernel@...r.kernel.org, eranian@...gle.com,
        irogers@...gle.com, gmx@...gle.com, acme@...nel.org,
        jolsa@...hat.com, ak@...ux.intel.com, benh@...nel.crashing.org,
        paulus@...ba.org, mpe@...erman.id.au
Subject: Re: [PATCH V2 3/3] perf: Optimize sched_task() in a context switch

Hi Peter and Kan,

On Tue, Dec 01, 2020 at 06:29:03PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 30, 2020 at 11:38:42AM -0800, kan.liang@...ux.intel.com wrote:
> > From: Kan Liang <kan.liang@...ux.intel.com>
> > 
> > Some calls to sched_task() in a context switch can be avoided. For
> > example, large PEBS only requires flushing the buffer in context switch
> > out. The current code still invokes the sched_task() for large PEBS in
> > context switch in.
> 
> I still hate this one, how's something like this then?
> Which I still don't really like.. but at least its simpler.
> 
> (completely untested, may contain spurious edits, might ICE the
> compiler and set your pets on fire if it doesn't)

I've tested Kan's v2 patches and they worked well.  I will test your
version (with the fix in the other email) too.


> 
> And given this is an optimization, can we actually measure it to improve
> matters?

I just checked the perf bench sched pipe result.  Without perf record
running, it usually takes less than 7 seconds.  Note that this (and
the results below) is the median of 10 runs.

  # perf bench sched pipe
  # Running 'sched/pipe' benchmark:
  # Executed 1000000 pipe operations between two processes

     Total time: 6.875 [sec]

       6.875700 usecs/op
         145439 ops/sec
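
For anyone who wants to repeat the median-of-10 measurement, here is a
minimal sketch in Python.  It is not the script used for the numbers in
this mail; the helper names (parse_total_time, median_total_time,
run_bench) are hypothetical, and it assumes the "Total time: N [sec]"
line appears in the benchmark output exactly as shown above.

```python
import re
import statistics
import subprocess

# Matches the "Total time: 6.875 [sec]" line printed by the benchmark.
TOTAL_TIME_RE = re.compile(r"Total time:\s*([0-9.]+)\s*\[sec\]")


def parse_total_time(output):
    """Extract the total time in seconds from one benchmark output."""
    match = TOTAL_TIME_RE.search(output)
    if match is None:
        raise ValueError("no 'Total time' line found in output")
    return float(match.group(1))


def median_total_time(outputs):
    """Return the median total time over several benchmark outputs."""
    return statistics.median([parse_total_time(o) for o in outputs])


def run_bench(cmd=("perf", "bench", "sched", "pipe"), runs=10):
    """Run the command `runs` times and return the median total time.

    Assumption: the result is printed to stdout or stderr; both are
    captured and searched.
    """
    outputs = []
    for _ in range(runs):
        res = subprocess.run(cmd, capture_output=True, text=True, check=True)
        outputs.append(res.stdout + res.stderr)
    return median_total_time(outputs)
```

Passing a different cmd tuple (for example the perf record invocation
used later in this mail) would measure the profiled case the same way.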


Then I ran it again under perf record, as shown below.  This is the
result with only patches 1 and 2 applied.

  # perf record -aB -c 100001 -e cycles:pp perf bench sched pipe
  # Running 'sched/pipe' benchmark:
  # Executed 1000000 pipe operations between two processes

     Total time: 8.198 [sec]

       8.198952 usecs/op
         121966 ops/sec
  [ perf record: Woken up 10 times to write data ]
  [ perf record: Captured and wrote 4.972 MB perf.data ]


With patch 3 applied as well, the total time went down a little, from
8.198 to 7.785 seconds (about 5%).

  # perf record -aB -c 100001 -e cycles:pp perf bench sched pipe
  # Running 'sched/pipe' benchmark:
  # Executed 1000000 pipe operations between two processes

     Total time: 7.785 [sec]

       7.785119 usecs/op
         128450 ops/sec
  [ perf record: Woken up 12 times to write data ]
  [ perf record: Captured and wrote 4.622 MB perf.data ]
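
To put the three totals side by side, the arithmetic can be sketched in
Python as below.  This is only an illustration built from the median
totals quoted above; the variable and function names are made up for
the example.

```python
# Median total times in seconds, copied from the results above.
baseline = 6.875       # perf bench sched pipe, no perf record
patches_1_2 = 8.198    # perf record running, patches 1 and 2 only
patches_1_2_3 = 7.785  # perf record running, patch 3 applied as well


def overhead_pct(measured, base):
    """Slowdown of `measured` relative to `base`, in percent."""
    return (measured - base) / base * 100.0


# Profiling overhead relative to the unprofiled run.
overhead_before = overhead_pct(patches_1_2, baseline)
overhead_after = overhead_pct(patches_1_2_3, baseline)

# Reduction in total time attributable to patch 3.
speedup = (patches_1_2 - patches_1_2_3) / patches_1_2 * 100.0

print(f"overhead with patches 1-2: {overhead_before:.1f}%")  # 19.2%
print(f"overhead with patches 1-3: {overhead_after:.1f}%")   # 13.2%
print(f"total time reduction:      {speedup:.1f}%")          # 5.0%
```

In other words, with these numbers the perf record overhead on this
benchmark drops from roughly 19% to roughly 13% of the baseline.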


Thanks,
Namhyung