Date: Fri, 8 Mar 2024 11:11:57 +0100
From: Christian König <ckoenig.leichtzumerken@...il.com>
To: "Khatri, Sunil" <sukhatri@....com>,
 Christian König <christian.koenig@....com>,
 Sunil Khatri <sunil.khatri@....com>, Alex Deucher
 <alexander.deucher@....com>, Shashank Sharma <shashank.sharma@....com>
Cc: amd-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
 linux-kernel@...r.kernel.org, Mukul Joshi <mukul.joshi@....com>,
 Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@....com>
Subject: Re: [PATCH v2 2/2] drm/amdgpu: add vm fault information to
 devcoredump

Am 08.03.24 um 10:16 schrieb Khatri, Sunil:
>
> On 3/8/2024 2:39 PM, Christian König wrote:
>> Am 07.03.24 um 21:50 schrieb Sunil Khatri:
>>> Add page fault information to the devcoredump.
>>>
>>> Output of devcoredump:
>>> **** AMDGPU Device Coredump ****
>>> version: 1
>>> kernel: 6.7.0-amd-staging-drm-next
>>> module: amdgpu
>>> time: 29.725011811
>>> process_name: soft_recovery_p PID: 1720
>>>
>>> Ring timed out details
>>> IP Type: 0 Ring Name: gfx_0.0.0
>>>
>>> [gfxhub] Page fault observed
>>> Faulty page starting at address: 0x0000000000000000
>>> Protection fault status register: 0x301031
>>>
>>> VRAM is lost due to GPU reset!
>>>
>>> Signed-off-by: Sunil Khatri <sunil.khatri@....com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 14 +++++++++++++-
>>>   1 file changed, 13 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
>>> index 147100c27c2d..8794a3c21176 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
>>> @@ -203,8 +203,20 @@ amdgpu_devcoredump_read(char *buffer, loff_t offset, size_t count,
>>>                  coredump->ring->name);
>>>       }
>>>   +    if (coredump->adev) {
>>> +        struct amdgpu_vm_fault_info *fault_info =
>>> +            &coredump->adev->vm_manager.fault_info;
>>> +
>>> +        drm_printf(&p, "\n[%s] Page fault observed\n",
>>> +               fault_info->vmhub ? "mmhub" : "gfxhub");
>>> +        drm_printf(&p, "Faulty page starting at address: 0x%016llx\n",
>>> +               fault_info->addr);
>>> +        drm_printf(&p, "Protection fault status register: 0x%x\n",
>>> +               fault_info->status);
>>> +    }
>>> +
>>>       if (coredump->reset_vram_lost)
>>> -        drm_printf(&p, "VRAM is lost due to GPU reset!\n");
>>> +        drm_printf(&p, "\nVRAM is lost due to GPU reset!\n");
>>
>> Why this additional new line?
> The intent is for the devcoredump to have its sections clearly 
> demarcated with a new line; otherwise "VRAM is lost due to GPU reset!" 
> appears to be part of the page fault information.
> [gfxhub] Page fault observed
> Faulty page starting at address: 0x0000000000000000
> Protection fault status register: 0x301031
>
> VRAM is lost due to GPU reset!

In that case I would print the newline regardless of whether VRAM is 
lost. Otherwise you get:

Protection fault status register:...

VRAM is lost due to GPU reset!
AMDGPU register dumps:

in one case, and:

Protection fault status register:...
AMDGPU register dumps:

in the other case, which breaks the sectioning quite a bit.
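
To make the suggestion concrete, here is a minimal userspace sketch of the ordering, not the actual driver change. It assumes snprintf() as a stand-in for drm_printf(); struct out, out_puts() and dump_tail() are hypothetical names invented for this illustration, and the status value is simply the one from the sample output quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical output cursor; the real code uses a struct drm_printer. */
struct out {
	char *buf;
	size_t size;	/* total buffer size, must be at least 1 */
	size_t len;	/* bytes written so far, always < size */
};

/* Append a string, truncating instead of overflowing the buffer. */
static void out_puts(struct out *o, const char *s)
{
	size_t room = o->size - o->len;
	int n = snprintf(o->buf + o->len, room, "%s", s);

	if (n < 0)
		return;
	o->len += ((size_t)n < room) ? (size_t)n : room - 1;
}

/*
 * Emit the end of the page fault section and what follows it.
 * The separator is printed unconditionally, so the blank line is there
 * whether or not the VRAM line is printed.
 */
static size_t dump_tail(char *buf, size_t size, bool vram_lost, bool have_regs)
{
	struct out o = { buf, size, 0 };

	/* Sample value taken from the coredump output quoted in this thread. */
	out_puts(&o, "Protection fault status register: 0x301031\n");

	/* Section separator, independent of the VRAM state. */
	out_puts(&o, "\n");

	if (vram_lost)
		out_puts(&o, "VRAM is lost due to GPU reset!\n");
	if (have_regs)
		out_puts(&o, "AMDGPU register dumps:\n");

	return o.len;
}
```

With this ordering both cases get exactly one blank line after the page fault section. Whether another separator should follow the VRAM line, before the register dump header, is a separate question that this sketch leaves open.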

Regards,
Christian.

>
> Regards
> Sunil
>
>>
>> Apart from that looks really good to me.
>>
>> Regards,
>> Christian.
>>
>>>       if (coredump->adev->reset_info.num_regs) {
>>>           drm_printf(&p, "AMDGPU register dumps:\nOffset:     Value:\n");
>>

