Message-ID: <88a3af9d-7aa8-5c60-2625-6b529bd8d93c@cogentembedded.com>
Date: Thu, 24 May 2018 20:24:08 +0300
From: Sergei Shtylyov <sergei.shtylyov@...entembedded.com>
To: Vladimir Zapolskiy <vladimir_zapolskiy@...tor.com>,
"David S. Miller" <davem@...emloft.net>
Cc: netdev@...r.kernel.org, linux-renesas-soc@...r.kernel.org
Subject: Re: [PATCH 0/6] ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers

On 05/24/2018 07:40 PM, Sergei Shtylyov wrote:
>> For ages trivial changes to RAVB and SuperH ethernet links by means of
>> standard 'ethtool' trigger a 'sleeping function called from invalid
>> context' bug, to visualize it on r8a7795 ULCB:
>>
>> % ethtool -r eth0
>> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
>> in_atomic(): 1, irqs_disabled(): 128, pid: 554, name: ethtool
>> INFO: lockdep is turned off.
>> irq event stamp: 0
>> hardirqs last enabled at (0): [<0000000000000000>] (null)
>> hardirqs last disabled at (0): [<ffff0000080e1d3c>] copy_process.isra.7.part.8+0x2cc/0x1918
>> softirqs last enabled at (0): [<ffff0000080e1d3c>] copy_process.isra.7.part.8+0x2cc/0x1918
>> softirqs last disabled at (0): [<0000000000000000>] (null)
>> CPU: 5 PID: 554 Comm: ethtool Not tainted 4.17.0-rc4-arm64-renesas+ #33
>> Hardware name: Renesas H3ULCB board based on r8a7795 ES2.0+ (DT)
>> Call trace:
>> dump_backtrace+0x0/0x198
>> show_stack+0x24/0x30
>> dump_stack+0xb8/0xf4
>> ___might_sleep+0x1c8/0x1f8
>> __might_sleep+0x58/0x90
>> __mutex_lock+0x50/0x890
>> mutex_lock_nested+0x3c/0x50
>> phy_start_aneg_priv+0x38/0x180
>> phy_start_aneg+0x24/0x30
>> ravb_nway_reset+0x3c/0x68
>> dev_ethtool+0x3dc/0x2338
>> dev_ioctl+0x19c/0x490
>> sock_do_ioctl+0xe0/0x238
>> sock_ioctl+0x254/0x460
>> do_vfs_ioctl+0xb0/0x918
>> ksys_ioctl+0x50/0x80
>> sys_ioctl+0x34/0x48
>> __sys_trace_return+0x0/0x4
>>
>> The root cause is that an attempt to modify ECMR and GECMR registers
>> only when RX/TX function is disabled was too overcomplicated in its
>> original implementation, also processing of an optional Link Change
>> interrupt added even more complexity, as a result the implementation
>> was error prone.
>>
>> The new locking scheme is confirmed to be correct by dumping driver
>> specific and generic PHY framework function calls with aid of ftrace
>> while running more or less advanced tests.
>>
>> Please note that sh_eth patches from the series were built-tested only.
>>
>> On purpose I do not add Fixes tags, the reused PHY handlers were added
>> way later than the fixed problems were firstly found in the drivers.
>
> I think you went one step too far with these fixes. On the first glance,
> the real fixes are to remove grabbing/releasing the spinlock for the duration
> of the phylib calls. Am I right? If so, making use of the new phylib APIs
> would be a further enhancement, it's not needed for fixing the splats per se...

Note that I hadn't looked at patches #3/#6 at the time of writing this;
those seem to be more complicated than the rest.

MBR, Sergei