Date:   Thu, 14 Feb 2019 12:29:20 +0100 (CET)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Aubrey Li <aubrey.li@...ux.intel.com>
cc:     mingo@...hat.com, peterz@...radead.org, hpa@...or.com,
        ak@...ux.intel.com, tim.c.chen@...ux.intel.com,
        dave.hansen@...el.com, arjan@...ux.intel.com, aubrey.li@...el.com,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v11 2/3] x86,/proc/pid/status: Add AVX-512 usage elapsed time

On Wed, 13 Feb 2019, Aubrey Li wrote:

> AVX-512 components use could cause core turbo frequency drop. So
> it's useful to expose AVX-512 usage elapsed time as a heuristic hint
> for the user space job scheduler to cluster the AVX-512 using tasks
> together.
> 
> Example:
> $ cat /proc/pid/status | grep AVX512_elapsed_ms
> AVX512_elapsed_ms:      1020
> 
> The number '1020' denotes 1020 millisecond elapsed since last time
> context switch the off-CPU task using AVX-512 components, thus the

I know what you are trying to say, but this sentence does not parse. So
what you want to say is:

  This means that 1020 milliseconds have elapsed since the AVX512 usage of
  the task was detected when the task was scheduled out.

Aside from that, 1020ms is hardly evidence for real AVX512 usage, so you want
to come up with a better example than that.

But that makes me think about the usefulness of this hint in general.

An AVX512-using task which runs alone on a CPU is going to have either no
AVX512 usage recorded at all or the time elapsed since the last recording
is absurdly long. IOW, this needs crystal ball magic to decode because
there is no correlation between that elapsed time and the time when the
last context switch happened simply because that time is not available in
/proc/$PID/status. Sure you can oracle it out from /proc/$PID/stat with
even more crystal ball magic, but there is no explanation at all.
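
To make the ambiguity concrete, here is a minimal sketch in Python. It
assumes the field name AVX512_elapsed_ms from the example above, and it
assumes a simplified model in which the timestamp is only refreshed when the
task is scheduled out while AVX512 usage is detected. The helper names
parse_avx512_elapsed_ms() and reported_elapsed_ms() are hypothetical and only
serve the illustration; this is not the kernel implementation.

```python
from typing import List, Optional, Tuple


def parse_avx512_elapsed_ms(status_text: str) -> Optional[int]:
    """Return the AVX512_elapsed_ms value from status-style text, or None."""
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "AVX512_elapsed_ms":
            return int(value.strip())
    return None  # field not present in the given text


def reported_elapsed_ms(now_ms: int,
                        sched_out: List[Tuple[int, bool]]) -> Optional[int]:
    """Assumed model: sched_out holds (time_ms, avx512_detected) per
    schedule-out. Only schedule-outs with detected usage refresh the stamp."""
    stamps = [t for t, detected in sched_out if detected and t <= now_ms]
    if not stamps:
        return None  # no usage was ever recorded
    return now_ms - max(stamps)


NOW = 60_000

# Task A: uses AVX512 continuously, alone on its CPU, last scheduled out at t=0.
task_a = reported_elapsed_ms(NOW, [(0, True)])

# Task B: used AVX512 once before t=0, then never again, but is scheduled
# out regularly without any AVX512 usage.
task_b = reported_elapsed_ms(NOW, [(0, True), (20_000, False), (40_000, False)])

# Task C: uses AVX512 continuously but has never been scheduled out.
task_c = reported_elapsed_ms(NOW, [])

print(task_a, task_b, task_c)
print(parse_avx512_elapsed_ms("AVX512_elapsed_ms:      1020\n"))
```

Under this model, task A and task B report the same elapsed time although
only task A is still using AVX512, and task C reports nothing at all. The
value cannot be interpreted without knowing when the last context switch
happened.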

There may be use case scenarios where this crystal ball prediction is
actually useful, but the inaccuracy of that information and the possible
pitfalls for any user space application which uses it need to be documented
in detail. Without that, this is going to cause more trouble and confusion
than benefit.

Thanks,

	tglx
