Message-ID: <9f7e9465-897a-445b-acd6-a968a683d14b@gmail.com>
Date: Fri, 11 Aug 2023 12:49:18 +0200
From: Rafał Miłecki <zajec5@...il.com>
To: Florian Fainelli <florian.fainelli@...adcom.com>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Will Deacon <will@...nel.org>, Waiman Long <longman@...hat.com>,
Boqun Feng <boqun.feng@...il.com>, Russell King <linux@...linux.org.uk>,
Daniel Lezcano <daniel.lezcano@...aro.org>,
Thomas Gleixner <tglx@...utronix.de>, Florian Fainelli
<f.fainelli@...il.com>, linux-clk@...r.kernel.org,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
Network Development <netdev@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Cc: OpenWrt Development List <openwrt-devel@...ts.openwrt.org>,
bcm-kernel-feedback-list <bcm-kernel-feedback-list@...adcom.com>
Subject: Re: ARM board lockups/hangs triggered by locks and mutexes
On 7.08.2023 20:34, Florian Fainelli wrote:
> On 8/7/23 04:10, Rafał Miłecki wrote:
>> On 4.08.2023 13:07, Rafał Miłecki wrote:
>>> I triple checked that. Dropping a single unused function breaks kernel /
>>> device stability on BCM53573!
>>>
>>> AFAIK the only thing the diff below actually affects is the location of
>>> symbols (I verified that by comparing System.map before and after:
>>> over 22,000 relocated symbols).
>>>
>>> Can some unfortunate location of symbols cause those hangs/lockups?
>>
>> I performed another experiment. First I dropped mtd_check_of_node() to
>> bring the kernel back to a stable state.
>>
>> Then I started adding useless code to mtdchar_unlocked_ioctl(). I
>> ended up adding just enough to make sure all post-mtd symbols in
>> System.map got the same offsets as in the case of backporting
>> mtd_check_of_node().
>>
>> I started experiencing lockups/hangs again.
>>
>> I repeated the same test by adding dumb code to brcm_nvram_probe()
>> and verifying the offsets of the symbols that follow brcm_nvram_probe().
>>
>> I believe this confirms that this problem is about the offset or alignment
>> of some specific symbol(s). The remaining question is which symbols and
>> how to fix or work around that.
>
> In the config.gz file you attached in your first email, both CONFIG_MTD_* and CONFIG_NVMEM_* are built in, so it is not like we are reaching into module space for code and/or data and need veneers or anything; it is all part of the kernel image, so we can assert the maximum distance between instructions etc.
>
> Now, is it just that specific mutex that is an issue, or do other mutexes throughout the system cause problems as well?
If you mean the mtd mutex, I'm quite sure it's not the one to blame. It just
happened that the modified function was using a mutex. It could be any other.
> Do we suspect the toolchain to be possibly problematic?
Maybe. I really don't know much about such low-level stuff.
>>
>> The following dumb change brings back lockups/hangs:
>>
>> diff --git a/drivers/mtd/mtdchar.c b/drivers/mtd/mtdchar.c
>> index ee437af41..0a24dec55 100644
>> --- a/drivers/mtd/mtdchar.c
>> +++ b/drivers/mtd/mtdchar.c
>> @@ -1028,6 +1028,22 @@ static long mtdchar_unlocked_ioctl(struct file *file, u_int cmd, u_long arg)
>>  {
>>  	int ret;
>>
>> +	if (!file)
>> +		pr_info("Missing\n");
>> +	WARN_ON(!file);
>> +	WARN_ON(cmd == 1234);
>> +	WARN_ON(cmd == 5678);
>> +	WARN_ON(cmd == 1234);
>> +	WARN_ON(cmd == 5678);
>> +	WARN_ON(cmd == 1234);
>> +	WARN_ON(cmd == 5678);
>> +	WARN_ON(cmd == 1234);
>> +	WARN_ON(cmd == 5678);
>> +	WARN_ON(cmd == 1234);
>> +	WARN_ON(cmd == 5678);
>> +	WARN_ON(cmd == 1234);
>> +	WARN_ON(cmd == 5678);
>> +
>>  	mutex_lock(&mtd_mutex);
>>  	ret = mtdchar_ioctl(file, cmd, arg);
>>  	mutex_unlock(&mtd_mutex);
>>
>