Message-ID: <76af0bc2-fecf-4f5b-8c52-924f49ac9b7a@amd.com>
Date: Thu, 21 Nov 2024 21:28:29 +0530
From: Dhananjay Ugwekar <Dhananjay.Ugwekar@....com>
To: Peter Jung <ptr1337@...hyos.org>, peterz@...radead.org, mingo@...hat.com,
rui.zhang@...el.com, irogers@...gle.com, kan.liang@...ux.intel.com,
tglx@...utronix.de, bp@...en8.dei, gautham.shenoy@....com
Cc: kprateek.nayak@....com, ravi.bangoria@....com, x86@...nel.org,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v7 10/10] perf/x86/rapl: Add core energy counter support
for AMD CPUs
On 11/20/2024 8:00 PM, Peter Jung wrote:
> Hi Dhananjay,
>
> On 20.11.24 14:58, Dhananjay Ugwekar wrote:
>> Hello Peter Jung,
>>
>> Thanks for trying out the patchset,
>>
>> On 11/20/2024 1:28 PM, Peter Jung wrote:
>>> Hi together,
>>>
>>> This patch seems to crash the kernel and results in an unbootable system.
>>>
>>>
>>> The patch has been applied on top of 6.12-rc7; I have not tested it on linux-next yet.
>>>
>>> I was also able to reproduce this issue on v6; the only "good" version was v4.
>>> This has been reproduced on several zen3+ machines and also on my 9950X.
>>>
>>> Bisect log:
>>> ```
>>> git bisect start
>>> # status: waiting for both good and bad commits
>>> # good: [2d5404caa8c7bb5c4e0435f94b28834ae5456623] Linux 6.12-rc7
>>> git bisect good 2d5404caa8c7bb5c4e0435f94b28834ae5456623
>>> # status: waiting for bad commit, 1 good commit known
>>> # bad: [372e95a40e04ae6ebe69300b76566af6455ba84e] perf/x86/rapl: Add core energy counter support for AMD CPUs
>>> git bisect bad 372e95a40e04ae6ebe69300b76566af6455ba84e
>>> # good: [fd3c84b2fc8a50030e8c7d91983f50539035ec3a] perf/x86/rapl: Rename rapl_pmu variables
>>> git bisect good fd3c84b2fc8a50030e8c7d91983f50539035ec3a
>>> # good: [96673b2c940e71fde50a54311ecdce00ff7a8e0b] perf/x86/rapl: Modify the generic variable names to *_pkg*
>>> git bisect good 96673b2c940e71fde50a54311ecdce00ff7a8e0b
>>> # good: [68b214c92635f0b24a3f3074873b77f4f1a82b80] perf/x86/rapl: Move the cntr_mask to rapl_pmus struct
>>> git bisect good 68b214c92635f0b24a3f3074873b77f4f1a82b80
>>> # first bad commit: [372e95a40e04ae6ebe69300b76566af6455ba84e] perf/x86/rapl: Add core energy counter support for AMD CPUs
>>> ```
>>>
>>> ```
>>> Nov 17 12:17:37 varvalian kernel: RIP: 0010:internal_create_group+0x9a/0x4e0
>>> Nov 17 12:17:37 varvalian kernel: Code: 7b 20 00 0f 84 cb 00 00 00 48 8d 74 24 1c 48 8d 54 24 18 4c 89 ff e8 15 8a 99 00 48 83 3b 00 74 59 48 8b 43 18 48 85 c0 74 11 <48> 8b 30 48 85 f6 74 09 4c 8b 5b 08 4d 85 db 75 1a 48 8b 43 20 48
>>> Nov 17 12:17:37 varvalian kernel: RSP: 0018:ffffaa5281fe7868 EFLAGS: 00010202
>>> Nov 17 12:17:37 varvalian kernel: RAX: 796772656e650073 RBX: ffffffffc2a642aa RCX: f781ec27a963db00
>>> Nov 17 12:17:37 varvalian kernel: RDX: ffffaa5281fe7880 RSI: ffffaa5281fe7884 RDI: ffff90c611dc8400
>>> Nov 17 12:17:37 varvalian kernel: RBP: 000000000000000f R08: 0000000000000000 R09: 0000000000000001
>>> Nov 17 12:17:37 varvalian kernel: R10: 0000000002000001 R11: ffffffff8e86ee00 R12: 0000000000000000
>>> Nov 17 12:17:37 varvalian kernel: R13: ffff90c6038469c0 R14: ffff90c611dc8400 R15: ffff90c611dc8400
>>> Nov 17 12:17:37 varvalian kernel: FS: 00007163efc54880(0000) GS:ffff90c8efe00000(0000) knlGS:0000000000000000
>>> Nov 17 12:17:37 varvalian kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> Nov 17 12:17:37 varvalian kernel: CR2: 00005c1834b98298 CR3: 0000000121298000 CR4: 0000000000f50ef0
>>> Nov 17 12:17:37 varvalian kernel: PKRU: 55555554
>>> Nov 17 12:17:47 varvalian kernel: ------------[ cut here ]------------
>>> ```
>>>
>>> I'll do some additional tests over the weekend, based on the latest linux-next snapshot, and provide some more logs.
>> Can you please try with the below diff once,
>>
>> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
>> index e9be1f31163d..d3bb3865c1b1 100644
>> --- a/arch/x86/events/rapl.c
>> +++ b/arch/x86/events/rapl.c
>> @@ -699,6 +699,7 @@ static const struct attribute_group *rapl_attr_update[] = {
>>
>>  static const struct attribute_group *rapl_core_attr_update[] = {
>>  	&rapl_events_core_group,
>> +	NULL,
>>  };
>>
>> static int __init init_rapl_pmu(struct rapl_pmus *rapl_pmus)
>>
>> Regards,
>> Dhananjay
>>
>
>
> Thanks! This patch appears to fix the issue when the kernel is built with clang. Thanks for providing such a fast fix! :)
Great! Thanks for the confirmation.
Regards,
Dhananjay
>
> Peter
>
>
>>> Regards,
>>>
>>> Peter
>
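For readers following the thread: the diff quoted above adds a NULL terminator to rapl_core_attr_update[]. Below is a minimal sketch of why that entry matters, assuming the consumer walks the array until it reaches NULL. The thread does not spell this out, but the fault in internal_create_group is consistent with it: RAX in the oops (796772656e650073) decodes, in little-endian byte order, to 's', NUL, "energy", which looks like string data rather than a pointer. All names in the sketch (demo_group, demo_update, demo_groups, count_groups) are hypothetical stand-ins, not kernel APIs.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct attribute_group. */
struct demo_group {
	const char *name;
};

static const struct demo_group demo_events_group = { .name = "events" };
static const struct demo_group demo_format_group = { .name = "format" };

/*
 * The array carries no length. The trailing NULL is the only thing
 * that tells a walker where to stop.
 */
static const struct demo_group *demo_update[] = {
	&demo_events_group,
	NULL,
};

static const struct demo_group *demo_groups[] = {
	&demo_events_group,
	&demo_format_group,
	NULL,
};

/* Count entries by walking the array until the NULL sentinel. */
static size_t count_groups(const struct demo_group **groups)
{
	size_t n = 0;

	while (groups[n] != NULL)
		n++;
	return n;
}
```

Without the sentinel, a loop like count_groups() would keep reading whatever the linker placed after the array and treat it as a pointer. Whether that happens to be zero can vary with compiler and layout, which may be why the problem showed up with some builds and not others.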