lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Sat, 27 Jan 2018 12:09:38 +0800 From: Jia-Ju Bai <baijiaju1990@...il.com> To: Al Viro <viro@...IV.linux.org.uk>, David Miller <davem@...emloft.net> Cc: 3chas3@...il.com, linux-atm-general@...ts.sourceforge.net, netdev@...r.kernel.org, linux-kernel@...r.kernel.org Subject: Re: [PATCH] atm: firestream: Replace GFP_ATOMIC with GFP_KERNEL in fs_send On 2018/1/27 1:08, Al Viro wrote: > On Fri, Jan 26, 2018 at 11:07:39AM -0500, David Miller wrote: >>>> This is found by a static analysis tool named DCNS written by myself. >>> The trouble is, places like >>> net/atm/raw.c:65: vcc->send = atm_send_aal0; >>> net/atm/raw.c:74: vcc->send = vcc->dev->ops->send; >>> net/atm/raw.c:83: vcc->send = vcc->dev->ops->send; >>> mean extra call chains. It's *not* just vcc_sendmsg(), and e.g. >>> ret = ATM_SKB(skb)->vcc->send(ATM_SKB(skb)->vcc, skb) >>> ? DROP_PACKET : 1; >>> bh_unlock_sock(sk_atm(vcc)); >>> in pppoatm_send() definitely is called under a spinlock. >>> >>> Looking through the driver (in advanced bitrot, as usual for drivers/atm), >>> I'd say that submit_queue() is fucked in head in the "queue full" case. >>> And judging by the history, had been thus since the original merge... >> Jia-Ju, I'm probably not going to apply any of your GFP_KERNEL >> conversions. >> >> Al's analysis above is similar to how things looked for other patches >> you submited of this nature. >> >> So because of the lack of care and serious auditing you put into these >> conversions, I really have no choice but to drop them from my queue >> because on the whole they are adding bugs rather than improving the >> code. > FWIW, the tool *does* promise to be useful Thanks, I am happy to hear that :) > - as far as I understand it > looks for places where GFP_ATOMIC allocation goes with blocking operations > near every callchain leading there. And that indicates something fishy > going on - either pointless GFP_ATOMIC (in benign case), or something > much nastier: a callchain that would require GFP_ATOMIC. In that case > whatever blocking operation found along the way is a bug. In fact, my tool first collects all places of GFP_ATOMIC and mdelay in the whole kernel code. Then it starts analysis from the entry of each interrupt handler and spin-lock function call in the whole kernel code, to mark the places of GFP_ATOMIC and mdelay that are called in atomic context. The remaining unmarked places of GFP_ATOMIC and mdelay are reported and can be replaced with GFP_KERNEL and mdelay (or usleep_range). Though my tool has handled some common situations of function pointers, But it does not well handle function pointer passing in this code (vcc->send = vcc->dev->ops->send), so the tool needs to be improved. I am sorry for my incorrect report... > This time it has, AFAICS, caught a genuine bug in drivers/atm/firestream.c - > static void submit_qentry (struct fs_dev *dev, struct queue *q, struct FS_QENTRY *qe) > { > u32 wp; > struct FS_QENTRY *cqe; > > /* XXX Sanity check: the write pointer can be checked to be > still the same as the value passed as qe... -- REW */ > /* udelay (5); */ > while ((wp = read_fs (dev, Q_WP (q->offset))) & Q_FULL) { > fs_dprintk (FS_DEBUG_TXQ, "Found queue at %x full. Waiting.\n", > q->offset); > schedule (); > } > ... > static void submit_queue (struct fs_dev *dev, struct queue *q, > u32 cmd, u32 p1, u32 p2, u32 p3) > { > struct FS_QENTRY *qe; > > qe = get_qentry (dev, q); > qe->cmd = cmd; > qe->p0 = p1; > qe->p1 = p2; > qe->p2 = p3; > submit_qentry (dev, q, qe); > ... > static int fs_send (struct atm_vcc *atm_vcc, struct sk_buff *skb) > { > ... > td = kmalloc (sizeof (struct FS_BPENTRY), GFP_ATOMIC); > ... > submit_queue (dev, &dev->hp_txq, > QE_TRANSMIT_DE | vcc->channo, > virt_to_bus (td), 0, > virt_to_bus (td)); > ... > > Either all callchains leading to fs_send() are in non-atomic contexts > (in which case that GFP_ATOMIC would be pointless) or there's one > where we cannot block. Any such would be a potential deadlock. > > And the latter appears to be the case - fs_send() is atmdev_ops ->send(), > which can end up in atm_vcc ->send(), which can be called from under > ->sk_lock.slock. Yes, I think schedule() can cause a sleep-in-atomic-context bug. > What would be really useful: > * list of "here's a list of locations where we do something > blocking; each callchain to this GFP_ATOMIC allocation passes either > upstream of one of those without leaving atomic context in between > or downstream without entering one". > * after that - backtracking these callchains further, watching > for ones in atomic contexts. Can be done manually, but if that tool > can assist in doing so, all the better. If we do find one, we have > found a deadlock - just take the blocking operation reported for > that callchain and that's it. If it's not an obvious false positive > (e.g. > if (!foo) > spin_lock(&splat); > ..... > if (foo) > schedule(); > ), it's worth reporting to maintainers of the code in question. > * if all callchains reach obviously non-atomic contexts > (syscall entry, for example, or a kernel thread payload, or > a function documented to require a mutex held by caller, etc.) - > then a switch to GFP_KERNEL might be appropriate. With analysis > of callchains posted as you are posting that. > * either way, having the tool print the callchains out > would be a good idea - for examining them, for writing reports, > etc. Thanks for your very helpful advice :) I will follow it in my patches. Thanks, Jia-Ju Bai
Powered by blists - more mailing lists