Message-ID: <X8encVJSgbXVLGvT@google.com>
Date:   Wed, 2 Dec 2020 23:40:49 +0900
From:   Namhyung Kim <namhyung@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     kan.liang@...ux.intel.com, mingo@...nel.org,
        linux-kernel@...r.kernel.org, eranian@...gle.com,
        irogers@...gle.com, gmx@...gle.com, acme@...nel.org,
        jolsa@...hat.com, ak@...ux.intel.com, benh@...nel.crashing.org,
        paulus@...ba.org, mpe@...erman.id.au
Subject: Re: [PATCH V2 3/3] perf: Optimize sched_task() in a context switch

Hi Peter and Kan,

On Tue, Dec 01, 2020 at 06:29:03PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 30, 2020 at 11:38:42AM -0800, kan.liang@...ux.intel.com wrote:
> > From: Kan Liang <kan.liang@...ux.intel.com>
> > 
> > Some calls to sched_task() in a context switch can be avoided. For
> > example, large PEBS only requires flushing the buffer on context
> > switch out, but the current code still invokes sched_task() for
> > large PEBS on context switch in as well.
> 
> I still hate this one; how's something like this then?
> Which I still don't really like... but at least it's simpler.
> 
> (completely untested, may contain spurious edits, might ICE the
> compiler and set your pets on fire if it doesn't)

I've tested Kan's v2 patches and they worked well.  I'll test your
version (with the fix in the other email) too.
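
For anyone reading along, the idea being optimized is roughly the
following.  This is just a standalone sketch to show the gating
logic -- all the names (SCHED_CB_SW_IN/OUT, cpu_ctx, etc.) are made
up for illustration and are not the actual kernel code:

  /* sketch.c - gate sched_task() per switch direction (made-up names) */
  #include <stdbool.h>
  #include <stdio.h>

  #define SCHED_CB_SW_IN   0x1   /* PMU wants a callback on sched-in  */
  #define SCHED_CB_SW_OUT  0x2   /* PMU wants a callback on sched-out */

  struct cpu_ctx {
          unsigned int sched_cb_state;  /* directions this PMU cares about */
  };

  static void sched_task(struct cpu_ctx *ctx, bool sched_in)
  {
          unsigned int need = sched_in ? SCHED_CB_SW_IN : SCHED_CB_SW_OUT;

          if (!(ctx->sched_cb_state & need))
                  return;   /* skip the callback in this direction */

          printf("pmu->sched_task(sched_in=%d)\n", sched_in);
  }

  int main(void)
  {
          /* Large PEBS only needs to flush its buffer on sched-out. */
          struct cpu_ctx ctx = { .sched_cb_state = SCHED_CB_SW_OUT };

          sched_task(&ctx, true);    /* sched-in:  skipped  */
          sched_task(&ctx, false);   /* sched-out: invoked  */
          return 0;
  }

The real code of course has to maintain those state bits from the PMU
drivers; the point is only that a PMU which just needs to flush on
sched-out never gets called on sched-in.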


> 
> And given this is an optimization, can we actually measure it to improve
> matters?

I just checked the perf bench sched pipe results.  Without perf record
running, it usually takes less than 7 seconds.  Note that this (and
each result below) is the median of 10 runs.

  # perf bench sched pipe
  # Running 'sched/pipe' benchmark:
  # Executed 1000000 pipe operations between two processes

     Total time: 6.875 [sec]

       6.875700 usecs/op
         145439 ops/sec
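
(The per-op number is just the total time spread over the 1,000,000
pipe operations: 6.875 sec / 1,000,000 = 6.875 usecs/op, i.e. about
145k ops/sec.)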


And I ran it again with perf record as below.  This is the result
with only patches 1 and 2 applied.

  # perf record -aB -c 100001 -e cycles:pp perf bench sched pipe
  # Running 'sched/pipe' benchmark:
  # Executed 1000000 pipe operations between two processes

     Total time: 8.198 [sec]

       8.198952 usecs/op
         121966 ops/sec
  [ perf record: Woken up 10 times to write data ]
  [ perf record: Captured and wrote 4.972 MB perf.data ]
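
(For reference: -a enables system-wide mode, -B skips build-id
collection, -c 100001 sets a fixed sample period, and cycles:pp asks
for a precise event -- which is what should exercise the PEBS
sched_task() path this series touches.)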


With patch 3 applied, the total time went down a little bit.

  # perf record -aB -c 100001 -e cycles:pp perf bench sched pipe
  # Running 'sched/pipe' benchmark:
  # Executed 1000000 pipe operations between two processes

     Total time: 7.785 [sec]

       7.785119 usecs/op
         128450 ops/sec
  [ perf record: Woken up 12 times to write data ]
  [ perf record: Captured and wrote 4.622 MB perf.data ]
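
So relative to the ~6.875 sec baseline, the perf record overhead drops
from about 19% (8.198 / 6.875) with patches 1 and 2 to about 13%
(7.785 / 6.875) with patch 3, i.e. roughly 5% off the total time.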


Thanks,
Namhyung
