[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <86ca3bcd-de60-4784-8a32-8df360a4ceda@intel.com>
Date: Thu, 5 Sep 2024 11:08:04 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
CC: <fenghua.yu@...el.com>, <shuah@...nel.org>, <tony.luck@...el.com>,
<peternewman@...gle.com>, <babu.moger@....com>,
Maciej Wieczór-Retman <maciej.wieczor-retman@...el.com>,
<linux-kselftest@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/6] selftests/resctrl: Ensure measurements skip
initialization of default benchmark
Hi Ilpo,
On 9/5/24 5:10 AM, Ilpo Järvinen wrote:
> On Wed, 4 Sep 2024, Reinette Chatre wrote:
>> On 9/4/24 4:57 AM, Ilpo Järvinen wrote:
>>> On Fri, 30 Aug 2024, Reinette Chatre wrote:
>>>> On 8/30/24 3:56 AM, Ilpo Järvinen wrote:
>>>>> On Thu, 29 Aug 2024, Reinette Chatre wrote:
>
>>>>>> @@ -699,111 +639,80 @@ int resctrl_val(const struct resctrl_test
>>>>>> *test,
>>>>>> return ret;
>>>>>> }
>>>>>> - /*
>>>>>> - * If benchmark wasn't successfully started by child, then
>>>>>> child
>>>>>> should
>>>>>> - * kill parent, so save parent's pid
>>>>>> - */
>>>>>> ppid = getpid();
>>>>>> - if (pipe(pipefd)) {
>>>>>> - ksft_perror("Unable to create pipe");
>>>>>> + /* Taskset test to specified CPU. */
>>>>>> + ret = taskset_benchmark(ppid, uparams->cpu, &old_affinity);
>>>>>
>>>>> Previously only CPU affinity for bm_pid was set but now it's set before
>>>>> fork(). Quickly checking the Internet, it seems that CPU affinity gets
>>>>> inherited on fork() so now both processes will have the same affinity
>>>>> which might make the other process to interfere with the measurement.
>>>>
>>>> Setting the affinity is intended to ensure that the buffer preparation
>>>> occurs in the same topology as where the runtime portion will run.
>>>> This preparation is done before the work to be measured starts.
>>>>
>>>> This does tie in with the association with the resctrl group and I
>>>> will elaborate more below ...
>>>
>>> Okay, that's useful to retain but thinking this further, now we're also
>>> going do non-trivial amount of work in between the setup and the test by
>>
>> Could you please elaborate how the amount of work during setup can be an
>> issue? I have been focused on the measurements that are done afterwards
>> that do have clear boundaries from what I can tell.
>
> Well, you used it as a justification: "Setting the affinity is intended
> to ensure that the buffer preparation occurs in the same topology as where
> the runtime portion will run." So I assumed you had some expectations about
> "preparations" done outside of those "clear boundaries" but now you seem
> to take entirely opposite stance?
I do not follow you here. With the "clear boundaries" I meant the
measurements and associated variables that have a clear start/init and
stop/read while the other task sleeps. Yes, preparations are done outside
of this since that should not be measured. You stated "now we're also going
do non-trivial amount of work in between the setup and the test" ...
could you clarify what the problem is with this? Before this work
the "non-trivial amount of work" (for "fill_buf") was done as part of the
test with (wrong) guesses about how long it takes. This work aims to improve
this.
>
> fork() quite heavy operation as it has to copy various things including
> the address space which you just made to contain a huge mem blob. :-)
As highlighted in a comment found in the patch, the kernel uses copy-on-write
and all tests only read from the buffer after fork().
>
> BTW, perhaps we could use some lighter weighted fork variant in the
> resctrl selftests, the processes don't really need to be that separated
> to justify using full fork() (and custom benchmarks will do execvp()).
Are you thinking about pthreads? It is not clear to me that this is
necessary. It is also not clear to me what problem you are describing that
needs this solution. When I understand that better it will be easier to
discuss solutions.
>
>>> forking. I guess that doesn't matter for memflush = true case but might be
>>> meaningful for the memflush = false case that seems to be there to allow
>>> keeping caches hot (I personally haven't thought how to use "caches hot"
>>> test but we do have that capability by the fact that memflush paremeter
>>> exists).
>>
>> I believe that memflush = true will always be needed/used by the tests
>> relying on memory bandwidth measurement since that reduces cache hits during
>> measurement phase and avoids the additional guessing on how long the workload
>> should be run before reliable/consistent measurements can start.
>>
>> Thinking about the memflush = false case I now think that we should use that
>> for the CMT test. The buffer is allocated and initialized while the task
>> is configured with appropriate allocation limits so there should not be a
>> reason to flush the buffer from the cache. In fact, flushing the cache
>> introduces
>> the requirement to guess the workload's "settle" time (time to allocate the
>> buffer
>> into the cache again) before its occupancy can be measured. As a quick test I
>> set memflush = false on one system and it brought down the average diff
>> between
>> the cache portion size and the occupancy counts. I'll try it out on a few more
>> systems to confirm.
>
> Oh great!
>
> I've not really figured out the logic used in the old CMT test because
> there was the rewrite for it in the series I started to upstream some of
> these improvements from. But I was unable to rebase successfully that
> rewrite either because somebody had used a non-publically available tree
> as a basis for it so I never did even have time to understand what even
> the rewritten test did thanks to the very complex diff.
>
>>>>> Neither behavior, however, seems to result in the intended behavior as
>>>>> we
>>>>> either get interfering processes (if inherited) or no desired resctrl
>>>>> setup for the benchmark process.
>>>>
>>>> There are two processes to consider in the resource group, the parent
>>>> (that
>>>> sets up the buffer and does the measurements) and the child (that runs the
>>>> workload to be measured). Thanks to your commit da50de0a92f3
>>>> ("selftests/resctrl:
>>>> Calculate resctrl FS derived mem bw over sleep(1) only") the parent
>>>> will be sleeping while the child runs its workload and there is no
>>>> other interference I am aware of. The only additional measurements
>>>> that I can see would be the work needed to actually start and stop the
>>>> measurements and from what I can tell this falls into the noise.
>>>>
>>>> Please do keep in mind that the performance counters used, iMC, cannot
>>>> actually
>>>> be bound to a single CPU since it is a per-socket PMU. The measurements
>>>> have
>>>> thus never been as fine grained as the code pretends it to be.
>>>
>>> I was thinking if I should note the amount of work is small. Maybe it's
>>> fine to leave that noise there and I'm just overly cautious :-), when I
>>> used to do networking research in the past life, I wanted to eliminate as
>>> much noise sources so I guess it comes from that background.
>>
>> The goal of these tests are to verify *resctrl*, these are not intended to be
>> hardware validation tests. I think it would be better for resctrl if more time
>> is spent on functional tests of resctrl than these performance tests.
>
> This sounds so easy... (no offence) :-) If only there wouldn't be the
> black boxes and we'd have good and fine-grained ways to instrument it,
> it would be so much easier to realize non-statistical means to do
> functional tests.
>
To me functional tests of resctrl indeed sounds easier. Tests can interact with the
resctrl interface to ensure it works as expected ... test the boundaries
of user input to the various files, test task movement between groups, test moving of
monitor groups, test assigning CPUs to resource groups, ensure the "mode" of a
resource group can be changed and behaves as expected (for example, shared vs exclusive),
ensure changes to one file are reflected in others, like changing schemata is reflected
in "size" and "bit_usage", etc. etc. These are all tests of *resctrl* that supports
development and can be used to verify all is well as major changes (that we are seeing
more and more of) are made to the kernel subsystem. None of this is "black box" and
is all deterministic with obvious pass/fail. This can be made even more reliable with
not just checking of resctrl files but see if changes via resctrl is reflected in MSRs
as expected.
Reinette
Powered by blists - more mailing lists