Message-ID: <c2d4a0d7-cd32-174d-5f93-04c618eadbf3@gmail.com>
Date: Sat, 27 Jan 2018 12:09:38 +0800
From: Jia-Ju Bai <baijiaju1990@...il.com>
To: Al Viro <viro@...IV.linux.org.uk>,
David Miller <davem@...emloft.net>
Cc: 3chas3@...il.com, linux-atm-general@...ts.sourceforge.net,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] atm: firestream: Replace GFP_ATOMIC with GFP_KERNEL in
fs_send
On 2018/1/27 1:08, Al Viro wrote:
> On Fri, Jan 26, 2018 at 11:07:39AM -0500, David Miller wrote:
>>>> This was found by a static analysis tool named DCNS, which I wrote myself.
>>> The trouble is, places like
>>> net/atm/raw.c:65: vcc->send = atm_send_aal0;
>>> net/atm/raw.c:74: vcc->send = vcc->dev->ops->send;
>>> net/atm/raw.c:83: vcc->send = vcc->dev->ops->send;
>>> mean extra call chains. It's *not* just vcc_sendmsg(), and e.g.
>>> ret = ATM_SKB(skb)->vcc->send(ATM_SKB(skb)->vcc, skb)
>>> ? DROP_PACKET : 1;
>>> bh_unlock_sock(sk_atm(vcc));
>>> in pppoatm_send() definitely is called under a spinlock.
>>>
>>> Looking through the driver (in advanced bitrot, as usual for drivers/atm),
>>> I'd say that submit_queue() is fucked in head in the "queue full" case.
>>> And judging by the history, had been thus since the original merge...
>> Jia-Ju, I'm probably not going to apply any of your GFP_KERNEL
>> conversions.
>>
>> Al's analysis above is similar to how things looked for other patches
>> you submitted of this nature.
>>
>> So because of the lack of care and serious auditing you put into these
>> conversions, I really have no choice but to drop them from my queue
>> because on the whole they are adding bugs rather than improving the
>> code.
> FWIW, the tool *does* promise to be useful
Thanks, I am happy to hear that :)
> - as far as I understand it
> looks for places where GFP_ATOMIC allocation goes with blocking operations
> near every callchain leading there. And that indicates something fishy
> going on - either pointless GFP_ATOMIC (in benign case), or something
> much nastier: a callchain that would require GFP_ATOMIC. In that case
> whatever blocking operation found along the way is a bug.
In fact, my tool first collects all call sites of GFP_ATOMIC and mdelay in
the whole kernel code.
Then it starts its analysis from the entry of each interrupt handler and
each spin-lock critical section in the kernel, and marks the GFP_ATOMIC and
mdelay call sites that are reached in atomic context.
The remaining unmarked call sites are reported as candidates to be replaced
with GFP_KERNEL and msleep (or usleep_range), respectively.
My tool handles some common cases of function pointers, but it does not
handle the function-pointer assignment in this code
(vcc->send = vcc->dev->ops->send) well, so the tool needs to be improved.
I am sorry for my incorrect report...
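To make this concrete, here is a minimal sketch of the two patterns my tool
has to tell apart; it is an illustration only, and none of the foo_*
functions below exist in the kernel tree:

#include <linux/interrupt.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(foo_lock);

/* Reached only from foo_irq_handler() below, i.e. in atomic context:
 * GFP_ATOMIC is required here, and any blocking call (schedule, msleep,
 * ...) on this path would be a sleep-in-atomic-context bug. */
static void *foo_alloc_atomic(void)
{
	return kmalloc(64, GFP_ATOMIC);
}

static irqreturn_t foo_irq_handler(int irq, void *dev_id)
{
	void *buf;

	spin_lock(&foo_lock);
	buf = foo_alloc_atomic();
	/* ... use buf ... */
	kfree(buf);
	spin_unlock(&foo_lock);
	return IRQ_HANDLED;
}

/* All of its (not shown) callers run in process context, so this
 * GFP_ATOMIC is pointless and my tool reports it as a GFP_KERNEL
 * candidate. */
static void *foo_alloc_process(void)
{
	return kmalloc(64, GFP_ATOMIC);
}

The problem in firestream is that the indirect call through vcc->send does
not show up in the static call graph, so the caller that holds a lock
(pppoatm_send) is not seen when classifying fs_send().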
> This time it has, AFAICS, caught a genuine bug in drivers/atm/firestream.c -
> static void submit_qentry (struct fs_dev *dev, struct queue *q, struct FS_QENTRY *qe)
> {
> u32 wp;
> struct FS_QENTRY *cqe;
>
> /* XXX Sanity check: the write pointer can be checked to be
> still the same as the value passed as qe... -- REW */
> /* udelay (5); */
> while ((wp = read_fs (dev, Q_WP (q->offset))) & Q_FULL) {
> fs_dprintk (FS_DEBUG_TXQ, "Found queue at %x full. Waiting.\n",
> q->offset);
> schedule ();
> }
> ...
> static void submit_queue (struct fs_dev *dev, struct queue *q,
> u32 cmd, u32 p1, u32 p2, u32 p3)
> {
> struct FS_QENTRY *qe;
>
> qe = get_qentry (dev, q);
> qe->cmd = cmd;
> qe->p0 = p1;
> qe->p1 = p2;
> qe->p2 = p3;
> submit_qentry (dev, q, qe);
> ...
> static int fs_send (struct atm_vcc *atm_vcc, struct sk_buff *skb)
> {
> ...
> td = kmalloc (sizeof (struct FS_BPENTRY), GFP_ATOMIC);
> ...
> submit_queue (dev, &dev->hp_txq,
> QE_TRANSMIT_DE | vcc->channo,
> virt_to_bus (td), 0,
> virt_to_bus (td));
> ...
>
> Either all callchains leading to fs_send() are in non-atomic contexts
> (in which case that GFP_ATOMIC would be pointless) or there's one
> where we cannot block. Any such would be a potential deadlock.
>
> And the latter appears to be the case - fs_send() is atmdev_ops ->send(),
> which can end up in atm_vcc ->send(), which can be called from under
> ->sk_lock.slock.
Yes, I think this schedule() call can cause a sleep-in-atomic-context bug.
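Condensed into one place, the chain Al describes looks roughly like this
(a sketch only; pppoatm_send_sketch is a made-up name, and the real code is
in net/atm/pppoatm.c and drivers/atm/firestream.c):

#include <linux/atmdev.h>
#include <net/sock.h>

static int pppoatm_send_sketch(struct atm_vcc *vcc, struct sk_buff *skb)
{
	int ret;

	bh_lock_sock(sk_atm(vcc));	/* bottom halves disabled: atomic context */
	ret = vcc->send(vcc, skb);	/* -> fs_send() -> submit_queue()
					 * -> submit_qentry(), which calls
					 * schedule() while the queue is full */
	bh_unlock_sock(sk_atm(vcc));
	return ret;
}

So the GFP_ATOMIC in fs_send() has to stay, and the schedule() in
submit_qentry() on this path is the real bug.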
> What would be really useful:
> * list of "here's a list of locations where we do something
> blocking; each callchain to this GFP_ATOMIC allocation passes either
> upstream of one of those without leaving atomic context in between
> or downstream without entering one".
> * after that - backtracking these callchains further, watching
> for ones in atomic contexts. Can be done manually, but if that tool
> can assist in doing so, all the better. If we do find one, we have
> found a deadlock - just take the blocking operation reported for
> that callchain and that's it. If it's not an obvious false positive
> (e.g.
> if (!foo)
> spin_lock(&splat);
> .....
> if (foo)
> schedule();
> ), it's worth reporting to maintainers of the code in question.
> * if all callchains reach obviously non-atomic contexts
> (syscall entry, for example, or a kernel thread payload, or
> a function documented to require a mutex held by caller, etc.) -
> then a switch to GFP_KERNEL might be appropriate. With analysis
> of callchains posted as you are posting that.
> * either way, having the tool print the callchains out
> would be a good idea - for examining them, for writing reports,
> etc.
Thanks for your very helpful advice :)
I will follow it in my patches.
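For the backtracking step, here is a rough userspace sketch of what I have
in mind; the call graph and all names are made up for illustration, and a
real implementation would work on the kernel's call graph and would also
need cycle handling:

#include <stdbool.h>
#include <stdio.h>

#define MAX_CALLERS 4

struct func {
	const char *name;
	bool atomic_entry;	/* entry of an IRQ handler / spinlock section */
	bool nonatomic_entry;	/* syscall entry, kthread payload, ... */
	bool blocks;		/* itself contains a blocking call */
	struct func *callers[MAX_CALLERS];
};

/* Walk every callchain from f up to its entry points.  Return true if
 * some chain reaches an atomic entry without passing through a caller
 * that already blocks: then GFP_ATOMIC must stay, and any blocking call
 * below f on that chain is a potential bug. */
static bool reached_from_atomic(const struct func *f)
{
	int i;

	if (f->atomic_entry)
		return true;
	if (f->nonatomic_entry)
		return false;
	for (i = 0; i < MAX_CALLERS && f->callers[i]; i++) {
		const struct func *c = f->callers[i];

		if (!c->blocks && reached_from_atomic(c))
			return true;
	}
	return false;
}

int main(void)
{
	struct func pppoatm_send = { .name = "pppoatm_send", .atomic_entry = true };
	struct func vcc_sendmsg  = { .name = "vcc_sendmsg", .nonatomic_entry = true };
	struct func fs_send      = { .name = "fs_send" };

	fs_send.callers[0] = &pppoatm_send;
	fs_send.callers[1] = &vcc_sendmsg;

	printf("%s: %s\n", fs_send.name,
	       reached_from_atomic(&fs_send) ?
	       "keep GFP_ATOMIC; blocking calls below it are suspect" :
	       "all callers can sleep; GFP_KERNEL candidate");
	return 0;
}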
Thanks,
Jia-Ju Bai