lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e1795fa8-f34e-6805-c937-1937490f09a5@amd.com>
Date:   Thu, 17 Aug 2023 17:26:49 +0200
From:   Shashank Sharma <shashank.sharma@....com>
To:     André Almeida <andrealmeid@...lia.com>
Cc:     pierre-eric.pelloux-prayer@....com, amd-gfx@...ts.freedesktop.org,
        'Marek Olšák' <maraeo@...il.com>,
        Timur Kristóf <timur.kristof@...il.com>,
        Samuel Pitoiset <samuel.pitoiset@...il.com>,
        kernel-dev@...lia.com, alexander.deucher@....com,
        christian.koenig@....com, linux-kernel@...r.kernel.org,
        dri-devel@...ts.freedesktop.org
Subject: Re: [PATCH v4 2/4] drm/amdgpu: Rework coredump to use memory
 dynamically


On 17/08/2023 17:17, André Almeida wrote:
>
>
> Em 17/08/2023 12:04, Shashank Sharma escreveu:
>>
>> On 17/08/2023 15:45, André Almeida wrote:
>>> Hi Shashank,
>>>
>>> Em 17/08/2023 03:41, Shashank Sharma escreveu:
>>>> Hello Andre,
>>>>
>>>> On 15/08/2023 21:50, André Almeida wrote:
>>>>> Instead of storing coredump information inside amdgpu_device struct,
>>>>> move if to a proper separated struct and allocate it dynamically. 
>>>>> This
>>>>> will make it easier to further expand the logged information.
>>>>>
>>>>> Signed-off-by: André Almeida <andrealmeid@...lia.com>
>>>>> ---
>>>>> v4: change kmalloc to kzalloc
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        | 14 +++--
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 63 
>>>>> ++++++++++++++--------
>>>>>   2 files changed, 49 insertions(+), 28 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>> index 9c6a332261ab..0d560b713948 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>> @@ -1088,11 +1088,6 @@ struct amdgpu_device {
>>>>>       uint32_t *reset_dump_reg_list;
>>>>>       uint32_t            *reset_dump_reg_value;
>>>>>       int                             num_regs;
>>>>> -#ifdef CONFIG_DEV_COREDUMP
>>>>> -    struct amdgpu_task_info         reset_task_info;
>>>>> -    bool                            reset_vram_lost;
>>>>> -    struct timespec64               reset_time;
>>>>> -#endif
>>>>>       bool                            scpm_enabled;
>>>>>       uint32_t                        scpm_status;
>>>>> @@ -1105,6 +1100,15 @@ struct amdgpu_device {
>>>>>       uint32_t            aid_mask;
>>>>>   };
>>>>> +#ifdef CONFIG_DEV_COREDUMP
>>>>> +struct amdgpu_coredump_info {
>>>>> +    struct amdgpu_device        *adev;
>>>>> +    struct amdgpu_task_info         reset_task_info;
>>>>> +    struct timespec64               reset_time;
>>>>> +    bool                            reset_vram_lost;
>>>>> +};
>>>>
>>>> The patch looks good to me in general, but I would recommend 
>>>> slightly different arrangement and segregation of GPU reset 
>>>> information.
>>>>
>>>> Please consider a higher level structure adev->gpu_reset_info, and 
>>>> move everything related to reset dump info into that, including 
>>>> this new coredump_info structure, something like this:
>>>>
>>>> struct amdgpu_reset_info {
>>>>
>>>>      uint32_t *reset_dump_reg_list;
>>>>
>>>>      uint32_t *reset_dump_reg_value;
>>>>
>>>>      int num_regs;
>>>>
>>>
>>> Right, I can encapsulate there reset_dump members,
>>>
>>>> #ifdef CONFIG_DEV_COREDUMP
>>>>
>>>>     struct amdgpu_coredump_info *coredump_info;/* keep this dynamic 
>>>> allocation */
>>>
>>> but we don't need a pointer for amdgpu_coredump_info inside 
>>> amdgpu_device or inside of amdgpu_device->gpu_reset_info, right?
>>
>> I think it would be better if we keep all of the GPU reset related 
>> data in the same structure, so adev->gpu_reset_info->coredump_info 
>> sounds about right to me.
>>
>
> But after patch 2/4, we don't need to store a coredump_info pointer 
> inside adev, this is what I meant. What would be the purpose of having 
> this pointer? It's freed by amdgpu_devcoredump_free(), so we don't 
> need to keep track of it.

Well, actually we are pulling in some 0parallel efforts on enhancing the 
GPU reset information, and we were planning to use the coredump info for 
some additional things. So if I have the coredump_info available (like 
reset_task_info and vram_lost) across a few functions in the driver with 
adev, it would make my job easy there :).

- Shashank

>
>> - Shashank
>>
>>>
>>>>
>>>> #endif
>>>>
>>>> }
>>>>
>>>>
>>>> This will make sure that all the relevant information is at the 
>>>> same place.
>>>>
>>>> - Shashank
>>>>
>>>        amdgpu_inc_vram_lost(tmp_adev);

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ