[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <79d79110-9fbc-0e96-d17e-68a1f8f2c224@collabora.com>
Date: Thu, 12 May 2022 15:28:16 +0200
From: Tomeu Vizoso <tomeu.vizoso@...labora.com>
To: Rob Clark <robdclark@...il.com>, Daniel Vetter <daniel@...ll.ch>
Cc: Jessica Zhang <quic_jesszhan@...cinc.com>,
"open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
linux-arm-msm <linux-arm-msm@...r.kernel.org>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>,
Thomas Zimmermann <tzimmermann@...e.de>,
David Airlie <airlied@...ux.ie>,
Jonathan Corbet <corbet@....net>, Sean Paul <sean@...rly.run>,
Abhinav Kumar <quic_abhinavk@...cinc.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
dri-devel <dri-devel@...ts.freedesktop.org>,
freedreno <freedreno@...ts.freedesktop.org>
Subject: Re: [Freedreno] [RFC v2] drm/msm: Add initial ci/ subdirectory
On 5/11/22 7:46 PM, Rob Clark wrote:
> On Wed, May 11, 2022 at 10:12 AM Daniel Vetter <daniel@...ll.ch> wrote:
>>
>> On Tue, 10 May 2022 at 22:26, Rob Clark <robdclark@...il.com> wrote:
>>>
>>> On Tue, May 10, 2022 at 12:39 PM Jessica Zhang
>>> <quic_jesszhan@...cinc.com> wrote:
>>>>
>>>>
>>>>
>>>> On 5/10/2022 7:13 AM, Tomeu Vizoso wrote:
>>>>> And use it to store expectations about what the drm/msm driver is
>>>>> supposed to pass in the IGT test suite.
>>>>>
>>>>> Also include a configuration file that points to the out-of-tree CI
>>>>> scripts.
>>>>>
>>>>> By storing the test expectations along the code we can make sure both
>>>>> stay in sync with each other, and so we can know when a code change
>>>>> breaks those expectations.
>>>>>
>>>>> This will allow all contributors to drm/msm to reuse the infrastructure
>>>>> already in gitlab.freedesktop.org to test the driver on several
>>>>> generations of the hardware.
>>>>>
>>>>> v2:
>>>>> - Fix names of result expectation files to match SoC
>>>>> - Don't execute tests that are going to skip on all boards
>>>>>
>>>>> Signed-off-by: Tomeu Vizoso <tomeu.vizoso@...labora.com>
>>>>> ---
>>>>> Documentation/gpu/msm_automated_testing.rst | 70 +++++++++
>>>>> drivers/gpu/drm/msm/ci/gitlab-ci.yml | 11 ++
>>>>> drivers/gpu/drm/msm/ci/msm.testlist | 148 ++++++++++++++++++
>>>>> .../gpu/drm/msm/ci/msm_apq8016_results.txt | 140 +++++++++++++++++
>>>>> .../gpu/drm/msm/ci/msm_apq8096_results.txt | 140 +++++++++++++++++
>>>>> drivers/gpu/drm/msm/ci/msm_sc7180_results.txt | 141 +++++++++++++++++
>>>>> drivers/gpu/drm/msm/ci/msm_sdm845_results.txt | 141 +++++++++++++++++
>>>>> 7 files changed, 791 insertions(+)
>>>>> create mode 100644 Documentation/gpu/msm_automated_testing.rst
>>>>> create mode 100644 drivers/gpu/drm/msm/ci/gitlab-ci.yml
>>>>> create mode 100644 drivers/gpu/drm/msm/ci/msm.testlist
>>>>> create mode 100644 drivers/gpu/drm/msm/ci/msm_apq8016_results.txt
>>>>> create mode 100644 drivers/gpu/drm/msm/ci/msm_apq8096_results.txt
>>>>> create mode 100644 drivers/gpu/drm/msm/ci/msm_sc7180_results.txt
>>>>> create mode 100644 drivers/gpu/drm/msm/ci/msm_sdm845_results.txt
>>>>>
>
> [snip]
>
>>>>> diff --git a/drivers/gpu/drm/msm/ci/msm_sc7180_results.txt b/drivers/gpu/drm/msm/ci/msm_sc7180_results.txt
>>>>> new file mode 100644
>>>>> index 000000000000..01f7b4b399b5
>>>>> --- /dev/null
>>>>> +++ b/drivers/gpu/drm/msm/ci/msm_sc7180_results.txt
>>>>> @@ -0,0 +1,141 @@
>>>>> +igt@...e_auth@...client-simple,dmesg-warn
>>>>> +igt@...e_auth@...client-master-drop,pass
>>>>> +igt@...e_auth@...ic-auth,pass
>>>>> +igt@...e_auth@...y-magics,pass
>>>>> +igt@...e_getclient,pass
>>>>> +igt@...e_getstats,pass
>>>>> +igt@...e_getversion,pass
>>>>> +igt@...e_setmaster_vs_auth,pass
>>>>> +igt@..._read@...alid-buffer,pass
>>>>> +igt@..._read@...lt-buffer,pass
>>>>> +igt@..._read@...ty-block,pass
>>>>> +igt@..._read@...ty-nonblock,pass
>>>>> +igt@..._read@...rt-buffer-block,pass
>>>>> +igt@..._read@...rt-buffer-nonblock,pass
>>>>> +igt@..._read@...rt-buffer-wakeup,pass
>>>>> +igt@..._addfb_basic@...sed-handle,pass
>>>>> +igt@..._addfb_basic@...sed-pitches,pass
>>>>> +igt@..._addfb_basic@...sed-offsets,pass
>>>>> +igt@..._addfb_basic@...sed-modifier,pass
>>>>> +igt@..._addfb_basic@...acy-format,dmesg-warn
>>>>> +igt@..._addfb_basic@...handle,pass
>>>>> +igt@..._addfb_basic@...ic,pass
>>>>> +igt@..._addfb_basic@...-pitch-0,pass
>>>>> +igt@..._addfb_basic@...-pitch-32,pass
>>>>> +igt@..._addfb_basic@...-pitch-63,pass
>>>>> +igt@..._addfb_basic@...-pitch-128,pass
>>>>> +igt@..._addfb_basic@...-pitch-256,pass
>>>>> +igt@..._addfb_basic@...-pitch-1024,pass
>>>>> +igt@..._addfb_basic@...-pitch-999,pass
>>>>> +igt@..._addfb_basic@...-pitch-65536,pass
>>>>> +igt@..._addfb_basic@...e-max,pass
>>>>> +igt@..._addfb_basic@...-wide,pass
>>>>> +igt@..._addfb_basic@...-high,dmesg-warn
>>>>
>>>> For test results on Trogdor, is is possible to have them be
>>>> success/fail/skip only?
>>>>
>>>> Results such as dmesg-warn/dmesg-fail are igt_runner specific and
>>>> because there isn't support for igt_runner on ChromeOS, they will be
>>>> difficult to replicate and debug.
>>>
>>> Actually, I wonder if it would be better to just treat
>>> dmesg-warn/dmesg-fail as pass/fail? I'd noticed some flakes on
>>> rockchip which looked just like unrelated dmesg msg which just
>>> happened to show up while the test was running.
>>
>> This is kinda the reason behind standardizing on drm dmesg logging, so
>> that we have some chances at filtering stuff out. Not sure that's a
>> good idea, since when your entire box splats and lockdep is dead, then
>> continuing to run drm tests is still fairly pointless.
>
> I'm not sure if we are using it yet for drm-ci, but for mesa-ci we
> monitor dmesg (over serial port, from the controller) for splats, so
> we already have the tech for restarting or aborting the CI run. We
> don't need igt-runner to tell us.
Yep, these scripts are currently being used as-is from Mesa, so we got
that functionality for free.
>> I think this is another reason why trying at least to standardize this
>> stuff over drivers would be pretty good idea.
>>
>>> Additionally, some of the tests, like msm_recovery, are *expected* to
>>> generate some dmesg spam since they are intentionally triggering GPU
>>> hangs to test the recovery mechanism.
>>
>> Uh I don't like that. It just allows userspace to spam dmesg, which
>> doesn't seem like a great idea. That's at least why i915 dumps these
>> at a lower level, and in the past had a special "I'm going to whack
>> the gpu real hard expect hangs" knob in debugfs.
>>
>> Having tests which intentionally spam dmesg above info level isn't
>> really good since then you need endless amounts of test-specific
>> encoding of what is considered a success and what not. Like when a
>> backmerge breaks a testcases which is already at dmesg-fail, is that
>> bad or not? Probably bad, but was the situation before that really
>> good or already kinda on fire?
>
> I guess I could add some debugfs knobs to squelch the dmesg msgs on
> gpu hangs. In the normal case, I'd prefer that gpu hangs are not
> silent.. since that is something we get in feedback reports if a user
> (or dogfooder) reports a bug.
>
> The rockchip case I mentioned was some unrelated dmesg about
> linktraining failing.. presumably because there was no display
> attached? IDK, I didn't look too closely. But my point is we could
> be getting unrelated and asynchronous dmesg spam, even from other
> kernel subsystems. Letting that be part of the test results just
> sounds like asking for flakes.
I think some drivers are currently a bit too buggy to behave reliably
under CI unless one reduces coverage (rockchip on rk3399, for example).
And some other drivers (in other subsystems as well) could do with a
review of what they print to the console. I guess these are things we
could and probably should fix?
Cheers,
Tomeu
> BR,
> -R
>
>> -Daniel
>>
>>> BR,
>>> -R
>>>
Powered by blists - more mailing lists