Message-ID: <SN6PR12MB263989CCDCC0F74138B6B747F8A40@SN6PR12MB2639.namprd12.prod.outlook.com>
Date: Fri, 23 Aug 2019 15:28:59 +0000
From: "Ghannam, Yazen" <Yazen.Ghannam@....com>
To: "Ghannam, Yazen" <Yazen.Ghannam@....com>,
Adam Borowski <kilobyte@...band.pl>
CC: "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"bp@...en8.de" <bp@...en8.de>
Subject: RE: [PATCH v3 0/8] AMD64 EDAC fixes
> -----Original Message-----
> From: linux-edac-owner@...r.kernel.org <linux-edac-owner@...r.kernel.org> On Behalf Of Ghannam, Yazen
> Sent: Thursday, August 22, 2019 1:54 PM
> To: Adam Borowski <kilobyte@...band.pl>
> Cc: linux-edac@...r.kernel.org; linux-kernel@...r.kernel.org; bp@...en8.de
> Subject: RE: [PATCH v3 0/8] AMD64 EDAC fixes
>
...
> I wonder if the module is being loaded multiple times. I'll try to reproduce this and track it down.
>
I was able to reproduce a similar failure, and I do see the module being loaded multiple times when it fails.
Here's the call trace from one of the dump_stack() outputs:
[ +0.004964] CPU: 132 PID: 2680 Comm: systemd-udevd Not tainted 4.20.0-edac-debug+ #36
[ +0.009802] Call Trace:
[ +0.002727] dump_stack+0x63/0x85
[ +0.003696] amd64_edac_init+0x2163/0x3000 [amd64_edac_mod]
[ +0.006216] ? __wake_up+0x13/0x20
[ +0.003790] ? 0xffffffffc120d000
[ +0.003694] do_one_initcall+0x4a/0x1c9
[ +0.004277] ? _cond_resched+0x19/0x40
[ +0.004178] ? kmem_cache_alloc_trace+0x15c/0x1d0
[ +0.005244] do_init_module+0x5f/0x216
[ +0.004180] load_module+0x21d5/0x2ac0
[ +0.004179] ? wait_woken+0x80/0x80
[ +0.003889] __do_sys_finit_module+0xfc/0x120
[ +0.004858] ? __do_sys_finit_module+0xfc/0x120
[ +0.005052] __x64_sys_finit_module+0x1a/0x20
[ +0.004857] do_syscall_64+0x5a/0x120
[ +0.004081] entry_SYSCALL_64_after_hwframe+0x44/0xa9
So it seems that userspace (systemd-udevd) keeps trying to load the module. I'm not sure how to prevent this from within the module.
Boris,
Do you think it'd be appropriate to change the return values for some cases?
For example, ECC being disabled is a hardware configuration. It doesn't mean that any operation in the module failed.
In other words, the module checks for a feature. If the feature is not present, the module could return without failure (and maybe print a message).
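To make the question concrete, here is a minimal userspace sketch of the two return-value policies. It is not the amd64_edac code: struct hw_state, init_strict() and init_lenient() are hypothetical names, and the use of -ENODEV for the "ECC disabled" case is an assumption made for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for what an init routine would probe. */
struct hw_state {
	bool ecc_enabled;	/* is the feature present/enabled? */
	bool alloc_fails;	/* simulates a genuine setup failure */
};

/* Policy A (assumed current behaviour): a missing feature is an error. */
int init_strict(const struct hw_state *hw)
{
	if (!hw->ecc_enabled)
		return -ENODEV;	/* caller sees a failed load */
	if (hw->alloc_fails)
		return -ENOMEM;
	return 0;
}

/* Policy B (the idea raised above): a missing feature is not a failure. */
int init_lenient(const struct hw_state *hw)
{
	if (!hw->ecc_enabled) {
		printf("ECC disabled, nothing to do\n");
		return 0;	/* load "succeeds" without doing any work */
	}
	if (hw->alloc_fails)
		return -ENOMEM;	/* real failures are still reported */
	return 0;
}
```

With policy B only genuine failures (here the simulated allocation failure) produce a non-zero return. Whether that would actually stop the repeated load attempts from userspace is not established here.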
Thanks,
Yazen