Message-ID: <f90ee4f3-704d-4776-99e7-04f30969d93e@cachyos.org>
Date: Wed, 20 Nov 2024 15:30:20 +0100
From: Peter Jung <ptr1337@...hyos.org>
To: Dhananjay Ugwekar <Dhananjay.Ugwekar@....com>, peterz@...radead.org,
mingo@...hat.com, rui.zhang@...el.com, irogers@...gle.com,
kan.liang@...ux.intel.com, tglx@...utronix.de, bp@...en8.dei,
gautham.shenoy@....com
Cc: kprateek.nayak@....com, ravi.bangoria@....com, x86@...nel.org,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v7 10/10] perf/x86/rapl: Add core energy counter support
for AMD CPUs
Hi Dhananjay,
On 20.11.24 14:58, Dhananjay Ugwekar wrote:
> Hello Peter Jung,
>
> Thanks for trying out the patchset,
>
> On 11/20/2024 1:28 PM, Peter Jung wrote:
>> Hi all,
>>
>> This patch seems to crash the kernel and results in an unbootable system.
>>
>>
>> The patch was applied on top of 6.12-rc7; I have not tested it on linux-next yet.
>>
>> I was also able to reproduce this issue on v6; the only "good" version was v4.
>> This has been reproduced on several Zen 3+ machines and also on my 9950X.
>>
>> Bisect log:
>> ```
>> git bisect start
>> # status: waiting for both good and bad commits
>> # good: [2d5404caa8c7bb5c4e0435f94b28834ae5456623] Linux 6.12-rc7
>> git bisect good 2d5404caa8c7bb5c4e0435f94b28834ae5456623
>> # status: waiting for bad commit, 1 good commit known
>> # bad: [372e95a40e04ae6ebe69300b76566af6455ba84e] perf/x86/rapl: Add core energy counter support for AMD CPUs
>> git bisect bad 372e95a40e04ae6ebe69300b76566af6455ba84e
>> # good: [fd3c84b2fc8a50030e8c7d91983f50539035ec3a] perf/x86/rapl: Rename rapl_pmu variables
>> git bisect good fd3c84b2fc8a50030e8c7d91983f50539035ec3a
>> # good: [96673b2c940e71fde50a54311ecdce00ff7a8e0b] perf/x86/rapl: Modify the generic variable names to *_pkg*
>> git bisect good 96673b2c940e71fde50a54311ecdce00ff7a8e0b
>> # good: [68b214c92635f0b24a3f3074873b77f4f1a82b80] perf/x86/rapl: Move the cntr_mask to rapl_pmus struct
>> git bisect good 68b214c92635f0b24a3f3074873b77f4f1a82b80
>> # first bad commit: [372e95a40e04ae6ebe69300b76566af6455ba84e] perf/x86/rapl: Add core energy counter support for AMD CPUs
>> ```
>>
>> ```
>> Nov 17 12:17:37 varvalian kernel: RIP: 0010:internal_create_group+0x9a/0x4e0
>> Nov 17 12:17:37 varvalian kernel: Code: 7b 20 00 0f 84 cb 00 00 00 48 8d 74 24 1c 48 8d 54 24 18 4c 89 ff e8 15 8a 99 00 48 83 3b 00 74 59 48 8b 43 18 48 85 c0 74 11 <48> 8b 30 48 85 f6 74 09 4c 8b 5b 08 4d 85 db 75 1a 48 8b 43 20 48
>> Nov 17 12:17:37 varvalian kernel: RSP: 0018:ffffaa5281fe7868 EFLAGS: 00010202
>> Nov 17 12:17:37 varvalian kernel: RAX: 796772656e650073 RBX: ffffffffc2a642aa RCX: f781ec27a963db00
>> Nov 17 12:17:37 varvalian kernel: RDX: ffffaa5281fe7880 RSI: ffffaa5281fe7884 RDI: ffff90c611dc8400
>> Nov 17 12:17:37 varvalian kernel: RBP: 000000000000000f R08: 0000000000000000 R09: 0000000000000001
>> Nov 17 12:17:37 varvalian kernel: R10: 0000000002000001 R11: ffffffff8e86ee00 R12: 0000000000000000
>> Nov 17 12:17:37 varvalian kernel: R13: ffff90c6038469c0 R14: ffff90c611dc8400 R15: ffff90c611dc8400
>> Nov 17 12:17:37 varvalian kernel: FS: 00007163efc54880(0000) GS:ffff90c8efe00000(0000) knlGS:0000000000000000
>> Nov 17 12:17:37 varvalian kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Nov 17 12:17:37 varvalian kernel: CR2: 00005c1834b98298 CR3: 0000000121298000 CR4: 0000000000f50ef0
>> Nov 17 12:17:37 varvalian kernel: PKRU: 55555554
>> Nov 17 12:17:47 varvalian kernel: ------------[ cut here ]------------
>> ```
>>
>> I'll run some additional tests this weekend on the latest linux-next snapshot and provide more logs.
> Can you please try the diff below?
>
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index e9be1f31163d..d3bb3865c1b1 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -699,6 +699,7 @@ static const struct attribute_group *rapl_attr_update[] = {
>
>  static const struct attribute_group *rapl_core_attr_update[] = {
>  	&rapl_events_core_group,
> +	NULL,
>  };
>
> static int __init init_rapl_pmu(struct rapl_pmus *rapl_pmus)
>
> Regards,
> Dhananjay
>
Thanks! This patch appears to fix the issue when the kernel is built
with clang. Thanks for providing such a fast fix! :)
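
For context, a minimal user-space sketch of why the one-line change matters,
assuming the sysfs code walks the group array until it reaches a NULL entry.
Without the terminator the walk reads whatever happens to follow the array in
memory, which would be consistent with the RAX value in the trace above
(0x796772656e650073 decodes as the ASCII bytes "s\0energy", i.e. string data
being treated as a pointer). The struct and all demo_* names below are made up
for illustration; they are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct attribute_group. */
struct demo_group {
	const char *name;
};

static const struct demo_group demo_events_group = { .name = "events" };

/* Sentinel-terminated array, mirroring the fixed rapl_core_attr_update[]. */
static const struct demo_group *demo_attr_update[] = {
	&demo_events_group,
	NULL,	/* without this entry the loop below has no defined end */
};

/* Count the entries by walking the array until the NULL sentinel. */
static size_t demo_count_groups(const struct demo_group **groups)
{
	size_t n = 0;

	assert(groups != NULL);
	while (groups[n] != NULL)
		n++;
	return n;
}
```

Calling demo_count_groups(demo_attr_update) returns 1 here. Dropping the NULL
entry makes the same call read past the end of the array, which is undefined
behaviour and may or may not crash depending on how the compiler lays out the
surrounding data.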
Peter
>> Regards,
>>
>> Peter