Message-ID: <alpine.DEB.2.21.1902231849400.1666@nanos.tec.linutronix.de>
Date: Sat, 23 Feb 2019 19:16:17 +0100 (CET)
From: Thomas Gleixner <tglx@...utronix.de>
To: Aubrey Li <aubrey.li@...ux.intel.com>
cc: mingo@...hat.com, peterz@...radead.org, hpa@...or.com,
ak@...ux.intel.com, tim.c.chen@...ux.intel.com,
dave.hansen@...el.com, arjan@...ux.intel.com, aubrey.li@...el.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v12 3/3] Documentation/filesystems/proc.txt: add
AVX512_elapsed_ms
On Thu, 21 Feb 2019, Aubrey Li wrote:
> @@ -45,6 +45,7 @@ Table of Contents
> 3.9 /proc/<pid>/map_files - Information about memory mapped files
> 3.10 /proc/<pid>/timerslack_ns - Task timerslack value
> 3.11 /proc/<pid>/patch_state - Livepatch patch operation state
> + 3.12 /proc/<pid>/AVX512_elapsed_ms - time elapsed since last AVX512 use
So is this a separate file now?
> +3.12 /proc/<pid>/AVX512_elapsed_ms - time elapsed since last AVX512 use
> +--------------------------------------------------------------------------
> +If AVX512 is supported on the machine, this file displays time elapsed since
This is not a file and this documentation wants to be where the status file
is described.
> +last AVX512 usage of the task in millisecond.
"Since last usage" is misleading. What you want to say is:
  The entry shows the milliseconds elapsed since the last time AVX512 usage
  was recorded.
> +The per-task AVX512 usage tracking mechanism is added during context switch.
> +When the task is scheduled out, the AVX512 timestamp of the task is tagged
> +by jiffies if AVX512 usage is detected.
> +
> +When this interface is queried, AVX512_elapsed_ms is calculated as follows:
> +
> + delta = (long)(jiffies_now - AVX512_timestamp);
> + AVX512_elpased_ms = jiffies_to_msecs(delta);
This information is not really helpful for someone who wants to use that
field.
> +
> +Because this tracking mechanism depends on context switch, the number of
> +AVX512_elapsed_ms could be inaccurate if the AVX512 using task runs alone on
> +a CPU and not scheduled out for a long time. An extreme experiment shows a
> +task is spinning on the AVX512 ops on an isolated CPU, but the longest elapsed
> +time is close to 4 seconds(HZ = 250).
> +
> +So 5s or even longer is an appropriate threshold for the job scheduler to poll
> +and decide if the task should be classifed as an AVX512 task and migrated
> +away from the core on which a Non-AVX512 task is running.
5 seconds or longer is appropriate? No. It really depends on the workload and
the scheduling scenarios. What the documentation has to provide is the
information that this value is a crystal ball estimate and what the reasons
are why it's inaccurate.
Something like this instead of this conglomerate of useful, irrelevant and
misleading information:
  The AVX512_elapsed_ms entry shows the milliseconds elapsed since the last
  time AVX512 usage was recorded. The recording happens on a best effort
  basis when a task is scheduled out. This means that the value depends on
  two factors:

    1) The time which the task spent on the CPU without being scheduled
       out. With CPU isolation and a single runnable task this can take
       several seconds.

    2) The time since the task was scheduled out last. Depending on the
       reason for being scheduled out (time slice exhausted, syscall ...)
       this can be an arbitrarily long time.

  As a consequence the value cannot be considered precise and authoritative
  information. The application which uses this information has to be aware
  of the overall scenario on the system in order to determine whether a
  task is a real AVX512 user or not.
See? No jiffies, no code snippets, no absolute numbers and no magic
recommendation which might be correct for your test scenario, but
completely bogus for some other scenario.
Instead it contains the things which an application programmer who wants to
use that value needs to know. He then has to map it to his scenario and
build the crystal ball logic which makes it perhaps useful.
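For illustration only, a minimal userspace sketch of what such a consumer
might look like. It assumes the entry ends up as a "name: value" line in
/proc/<pid>/status, which is where this review asks for it to be documented.
The helper names and the threshold parameter are hypothetical and not part
of the patch. The threshold is deliberately left to the caller, because no
single number fits every workload.

```python
from typing import Optional

# Entry name taken from the patch under review.
FIELD = "AVX512_elapsed_ms"


def parse_avx512_elapsed_ms(status_text: str) -> Optional[int]:
    """Return the AVX512_elapsed_ms value found in status-style text.

    Returns None when the entry is absent or not an integer.
    """
    for line in status_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == FIELD:
            try:
                return int(value.strip())
            except ValueError:
                return None
    return None


def looks_like_avx512_user(elapsed_ms: Optional[int], threshold_ms: int) -> bool:
    """Best-effort guess, never an authoritative answer.

    threshold_ms is a hypothetical, caller-chosen value which has to be
    derived from the workload and the scheduling scenario on the system.
    Missing or negative values are treated as "no usage recorded"; that
    treatment is an assumption of this sketch.
    """
    return elapsed_ms is not None and 0 <= elapsed_ms <= threshold_ms


def read_status(pid: int) -> str:
    """Read /proc/<pid>/status (assumed location of the entry)."""
    with open(f"/proc/{pid}/status") as f:
        return f.read()
```

A job scheduler would call read_status() periodically and feed the result
through the two helpers, keeping in mind that a large value can simply mean
the task has not been scheduled out for a while.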
Thanks,
tglx