[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADnq5_OBsCtmzzB7yC85OjHfALCjUUPTDgw7C9UsghNfx7hPnw@mail.gmail.com>
Date: Thu, 14 Mar 2024 09:27:35 -0400
From: Alex Deucher <alexdeucher@...il.com>
To: "Sharma, Shashank" <shashank.sharma@....com>
Cc: "Khatri, Sunil" <sukhatri@....com>, Sunil Khatri <sunil.khatri@....com>,
Alex Deucher <alexander.deucher@....com>, Christian König <christian.koenig@....com>,
amd-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] drm:amdgpu: add firmware information of all IP's
On Thu, Mar 14, 2024 at 2:10 AM Sharma, Shashank
<shashank.sharma@....com> wrote:
>
>
> On 14/03/2024 06:58, Khatri, Sunil wrote:
> >
> > On 3/14/2024 2:06 AM, Alex Deucher wrote:
> >> On Tue, Mar 12, 2024 at 8:42 AM Sunil Khatri <sunil.khatri@...com>
> >> wrote:
> >>> Add firmware version information of each
> >>> IP and each instance where applicable.
> >>>
> >> Is there a way we can share some common code with devcoredump,
> >> debugfs, and the info IOCTL? All three places need to query this
> >> information and the same logic is repeated in each case.
> >
> > Hello Alex,
> >
> > Yes you re absolutely right the same information is being retrieved
> > again as done in debugfs. I can reorganize the code so same function
> > could be used by debugfs and devcoredump but this is exactly what i
> > tried to avoid here. I did try to use minimum functionality in
> > devcoredump without shuffling a lot of code here and there.
> >
> > Also our devcoredump is implemented in amdgpu_reset.c and not all the
> > information is available here and there we might have to include lot
> > of header and cross functions in amdgpu_reset until we want a
> > dedicated file for devcoredump.
>
>
> I think Alex is suggesting to have one common backend to generate all
> the core debug info, and then different wrapper functions which can pack
> this raw info into the packets aligned with respective infra like
> devcore/debugfs/info IOCTL, which seems like a good idea to me.
Right, something like this. I'm trying to avoid having to touch
several places every time we add a new firmware type or TA, etc.
Maybe something like an array of valid firmwares for each device and
then we can just walk the array and call a helper function to fetch
the firmware versions and name strings for the requested type. Then
each use case can just call the helpers to get what it needs.
Alex
>
> If you think you need a new file for this backend it should be fine.
>
>
> something like:
>
> amdgpu_debug_core.c::
>
> struct amdgpu_core_debug_info {
>
> /* Superset of all the info you are collecting from HW */
>
> };
>
> - amdgpu_debug_generate_core_info
>
> {
>
> /* This function collects the core debug info from HW and saves in
> amdgpu_core_debug_info,
>
> we can update this periodically regardless of a request */
>
> }
>
> and then:
>
> devcore_info *amdgpu_debug_pack_for_devcore(core_debug_info)
>
> {
>
> /* convert core debug info into devcore aligned format/data */
>
> }
>
> ioctl_info *amdgpu_debug_pack_for_info_ioctl(core_debug_info)
>
> {
>
> /* convert core debug info into info IOCTL aligned format/data */
>
> }
>
> debugfs_info *amdgpu_debug_pack_for_debugfs(core_debug_info)
>
> {
>
> /* convert core debug info into debugfs aligned format/data */
>
> }
>
> - Shashank
>
> >
> >
> >
> > Info IOCTL does have a lot of information which also is in pipeline to
> > be dumped but this if we want to reuse the functionality of IOCTL we
> > need to reorganize a lot of code.
> >
> > If that is the need of the hour i could work on that. Please let me know.
> >
> > This is my suggestion if it makes sense:
> >
> > 1. If we want to reuse a lot of functionality then we need to
> > modularize some of the functions further so they could be consumed
> > directly by devcoredump.
> > 2. We should also have a dedicated file for devcoredump.c/.h so its
> > easy to include headers of needed functionality cleanly and easy to
> > expand devcoredump.
> > 3. based on the priority and importance of this task we can add
> > information else some repetition is a real possibility.
> >
> >>
> >> Alex
> >>
> >>
> >>> Signed-off-by: Sunil Khatri <sunil.khatri@....com>
> >>> ---
> >>> drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 122
> >>> ++++++++++++++++++++++
> >>> 1 file changed, 122 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
> >>> index 611fdb90a1fc..78ddc58aef67 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
> >>> @@ -168,6 +168,123 @@ void amdgpu_coredump(struct amdgpu_device
> >>> *adev, bool vram_lost,
> >>> {
> >>> }
> >>> #else
> >>> +static void amdgpu_devcoredump_fw_info(struct amdgpu_device *adev,
> >>> struct drm_printer *p)
> >>> +{
> >>> + uint32_t version;
> >>> + uint32_t feature;
> >>> + uint8_t smu_program, smu_major, smu_minor, smu_debug;
> >>> +
> >>> + drm_printf(p, "VCE feature version: %u, fw version: 0x%08x\n",
> >>> + adev->vce.fb_version, adev->vce.fw_version);
> >>> + drm_printf(p, "UVD feature version: %u, fw version: 0x%08x\n",
> >>> + 0, adev->uvd.fw_version);
> >>> + drm_printf(p, "GMC feature version: %u, fw version: 0x%08x\n",
> >>> + 0, adev->gmc.fw_version);
> >>> + drm_printf(p, "ME feature version: %u, fw version: 0x%08x\n",
> >>> + adev->gfx.me_feature_version,
> >>> adev->gfx.me_fw_version);
> >>> + drm_printf(p, "PFP feature version: %u, fw version: 0x%08x\n",
> >>> + adev->gfx.pfp_feature_version,
> >>> adev->gfx.pfp_fw_version);
> >>> + drm_printf(p, "CE feature version: %u, fw version: 0x%08x\n",
> >>> + adev->gfx.ce_feature_version,
> >>> adev->gfx.ce_fw_version);
> >>> + drm_printf(p, "RLC feature version: %u, fw version: 0x%08x\n",
> >>> + adev->gfx.rlc_feature_version,
> >>> adev->gfx.rlc_fw_version);
> >>> +
> >>> + drm_printf(p, "RLC SRLC feature version: %u, fw version:
> >>> 0x%08x\n",
> >>> + adev->gfx.rlc_srlc_feature_version,
> >>> + adev->gfx.rlc_srlc_fw_version);
> >>> + drm_printf(p, "RLC SRLG feature version: %u, fw version:
> >>> 0x%08x\n",
> >>> + adev->gfx.rlc_srlg_feature_version,
> >>> + adev->gfx.rlc_srlg_fw_version);
> >>> + drm_printf(p, "RLC SRLS feature version: %u, fw version:
> >>> 0x%08x\n",
> >>> + adev->gfx.rlc_srls_feature_version,
> >>> + adev->gfx.rlc_srls_fw_version);
> >>> + drm_printf(p, "RLCP feature version: %u, fw version: 0x%08x\n",
> >>> + adev->gfx.rlcp_ucode_feature_version,
> >>> + adev->gfx.rlcp_ucode_version);
> >>> + drm_printf(p, "RLCV feature version: %u, fw version: 0x%08x\n",
> >>> + adev->gfx.rlcv_ucode_feature_version,
> >>> + adev->gfx.rlcv_ucode_version);
> >>> + drm_printf(p, "MEC feature version: %u, fw version: 0x%08x\n",
> >>> + adev->gfx.mec_feature_version,
> >>> + adev->gfx.mec_fw_version);
> >>> +
> >>> + if (adev->gfx.mec2_fw)
> >>> + drm_printf(p,
> >>> + "MEC2 feature version: %u, fw version:
> >>> 0x%08x\n",
> >>> + adev->gfx.mec2_feature_version,
> >>> + adev->gfx.mec2_fw_version);
> >>> +
> >>> + drm_printf(p, "IMU feature version: %u, fw version: 0x%08x\n",
> >>> + 0, adev->gfx.imu_fw_version);
> >>> + drm_printf(p, "PSP SOS feature version: %u, fw version:
> >>> 0x%08x\n",
> >>> + adev->psp.sos.feature_version,
> >>> + adev->psp.sos.fw_version);
> >>> + drm_printf(p, "PSP ASD feature version: %u, fw version:
> >>> 0x%08x\n",
> >>> + adev->psp.asd_context.bin_desc.feature_version,
> >>> + adev->psp.asd_context.bin_desc.fw_version);
> >>> +
> >>> + drm_printf(p, "TA XGMI feature version: 0x%08x, fw version:
> >>> 0x%08x\n",
> >>> + adev->psp.xgmi_context.context.bin_desc.feature_version,
> >>> + adev->psp.xgmi_context.context.bin_desc.fw_version);
> >>> + drm_printf(p, "TA RAS feature version: 0x%08x, fw version:
> >>> 0x%08x\n",
> >>> + adev->psp.ras_context.context.bin_desc.feature_version,
> >>> + adev->psp.ras_context.context.bin_desc.fw_version);
> >>> + drm_printf(p, "TA HDCP feature version: 0x%08x, fw version:
> >>> 0x%08x\n",
> >>> + adev->psp.hdcp_context.context.bin_desc.feature_version,
> >>> + adev->psp.hdcp_context.context.bin_desc.fw_version);
> >>> + drm_printf(p, "TA DTM feature version: 0x%08x, fw version:
> >>> 0x%08x\n",
> >>> + adev->psp.dtm_context.context.bin_desc.feature_version,
> >>> + adev->psp.dtm_context.context.bin_desc.fw_version);
> >>> + drm_printf(p, "TA RAP feature version: 0x%08x, fw version:
> >>> 0x%08x\n",
> >>> + adev->psp.rap_context.context.bin_desc.feature_version,
> >>> + adev->psp.rap_context.context.bin_desc.fw_version);
> >>> + drm_printf(p, "TA SECURE DISPLAY feature version: 0x%08x, fw
> >>> version: 0x%08x\n",
> >>> + adev->psp.securedisplay_context.context.bin_desc.feature_version,
> >>> + adev->psp.securedisplay_context.context.bin_desc.fw_version);
> >>> +
> >>> + /* SMC firmware */
> >>> + version = adev->pm.fw_version;
> >>> +
> >>> + smu_program = (version >> 24) & 0xff;
> >>> + smu_major = (version >> 16) & 0xff;
> >>> + smu_minor = (version >> 8) & 0xff;
> >>> + smu_debug = (version >> 0) & 0xff;
> >>> + drm_printf(p, "SMC feature version: %u, program: %d, fw
> >>> version: 0x%08x (%d.%d.%d)\n",
> >>> + 0, smu_program, version, smu_major, smu_minor,
> >>> smu_debug);
> >>> +
> >>> + /* SDMA firmware */
> >>> + for (int i = 0; i < adev->sdma.num_instances; i++) {
> >>> + drm_printf(p, "SDMA%d feature version: %u, firmware
> >>> version: 0x%08x\n",
> >>> + i, adev->sdma.instance[i].feature_version,
> >>> + adev->sdma.instance[i].fw_version);
> >>> + }
> >>> +
> >>> + drm_printf(p, "VCN feature version: %u, fw version: 0x%08x\n",
> >>> + 0, adev->vcn.fw_version);
> >>> + drm_printf(p, "DMCU feature version: %u, fw version: 0x%08x\n",
> >>> + 0, adev->dm.dmcu_fw_version);
> >>> + drm_printf(p, "DMCUB feature version: %u, fw version:
> >>> 0x%08x\n",
> >>> + 0, adev->dm.dmcub_fw_version);
> >>> + drm_printf(p, "PSP TOC feature version: %u, fw version:
> >>> 0x%08x\n",
> >>> + adev->psp.toc.feature_version,
> >>> adev->psp.toc.fw_version);
> >>> +
> >>> + version = adev->mes.kiq_version & AMDGPU_MES_VERSION_MASK;
> >>> + feature = (adev->mes.kiq_version &
> >>> AMDGPU_MES_FEAT_VERSION_MASK)
> >>> + >>
> >>> AMDGPU_MES_FEAT_VERSION_SHIFT;
> >>> + drm_printf(p, "MES_KIQ feature version: %u, fw version:
> >>> 0x%08x\n",
> >>> + feature, version);
> >>> +
> >>> + version = adev->mes.sched_version & AMDGPU_MES_VERSION_MASK;
> >>> + feature = (adev->mes.sched_version &
> >>> AMDGPU_MES_FEAT_VERSION_MASK)
> >>> + >>
> >>> AMDGPU_MES_FEAT_VERSION_SHIFT;
> >>> + drm_printf(p, "MES feature version: %u, fw version: 0x%08x\n",
> >>> + feature, version);
> >>> +
> >>> + drm_printf(p, "VPE feature version: %u, fw version: 0x%08x\n",
> >>> + adev->vpe.feature_version, adev->vpe.fw_version);
> >>> +
> >>> +}
> >>> +
> >>> static ssize_t
> >>> amdgpu_devcoredump_read(char *buffer, loff_t offset, size_t count,
> >>> void *data, size_t datalen)
> >>> @@ -215,6 +332,11 @@ amdgpu_devcoredump_read(char *buffer, loff_t
> >>> offset, size_t count,
> >>> }
> >>> }
> >>>
> >>> + if (coredump->adev) {
> >>> + drm_printf(&p, "IP Firmwares\n");
> >>> + amdgpu_devcoredump_fw_info(coredump->adev, &p);
> >>> + }
> >>> +
> >>> if (coredump->ring) {
> >>> drm_printf(&p, "\nRing timed out details\n");
> >>> drm_printf(&p, "IP Type: %d Ring Name: %s\n",
> >>> --
> >>> 2.34.1
> >>>
Powered by blists - more mailing lists