Message-ID: <671c20540904021228l2f719dd7y1dd9724d68e5c29b@mail.gmail.com>
Date:	Thu, 2 Apr 2009 15:28:06 -0400
From:	Tom Burns <tom.i.burns@...il.com>
To:	linux-kernel@...r.kernel.org
Subject: BUG: sleeping while atomic from ipmr_destroy_unres

Hi All,

  I apologize for raising a kernel bug against 2.6.20, but it is in a
production environment and I don't have the resources to replace the
kernel and reproduce the issue at this time. From my code comparison,
the issue appears to still exist in 2.6.29: might_sleep() still results
in a schedule() call if CONFIG_PREEMPT_VOLUNTARY=y, __do_softirq() still
calls __local_bh_disable(), which still puts us in atomic context for
the duration of __do_softirq() (which makes sense), and the ipmr code is
still the same.

  I'm debugging a kernel BUG reported in a production environment
running 2.6.20 SMP (from Redhat RPM, full version
2.6.20-1.2320.fc5smp) and have run into 2 related BUGs.  When we only
see the first it's non-fatal, but the second BUG (below) is always
paired with a panic and resultant crash.

Here is the first BUG that makes it to the log:

 BUG: sleeping function called from invalid context at kernel/mutex.c:86
 in_atomic():1, irqs_disabled():0
  [phys_startup_32+-1084201671/-1073741824] ipmr_expire_process+0x0/0x8e
  [phys_startup_32+-1084105458/-1073741824] mutex_lock+0x15/0x29
  [phys_startup_32+-1084464563/-1073741824] rtnetlink_rcv+0x17/0x3d
  [phys_startup_32+-1084407949/-1073741824] netlink_data_ready+0x12/0x52
  [phys_startup_32+-1084412015/-1073741824] netlink_sendskb+0x1c/0x33
  [phys_startup_32+-1084463665/-1073741824] rtnl_unicast+0x18/0x23
  [phys_startup_32+-1084201716/-1073741824] ipmr_destroy_unres+0xc7/0xf4
  [phys_startup_32+-1086087209/-1073741824] hrtimer_run_queues+0x127/0x141
  [phys_startup_32+-1085998565/-1073741824] enable_irq+0x92/0xaf
  [phys_startup_32+-1084161925/-1073741824] xfrm_timer_handler+0x18d/0x21a
  [phys_startup_32+-1084201576/-1073741824] ipmr_expire_process+0x5f/0x8e
  [phys_startup_32+-1086134877/-1073741824] run_timer_softirq+0x101/0x164
  [phys_startup_32+-1086148568/-1073741824] __do_softirq+0x5d/0xba
  [phys_startup_32+-1086299719/-1073741824] do_softirq+0x59/0xb1
  [phys_startup_32+-1086179650/-1073741824] scheduler_tick+0x7c/0xdc
  [phys_startup_32+-1085997014/-1073741824] handle_edge_irq+0x0/0x10a
  [phys_startup_32+-1086299433/-1073741824] do_IRQ+0xc6/0xdb
  [phys_startup_32+-1086306041/-1073741824] common_interrupt+0x23/0x28
  [phys_startup_32+-1084161997/-1073741824] xfrm_timer_handler+0x145/0x21a

It appears that the top ipmr_expire_process entry is some sort of
reporting mistake in dump_stack(); I don't see how it could be executed
by mutex_lock().

I've determined that the problem is caused by a combination of config
flags and code path.

Our kernel config has:
CONFIG_PREEMPT_VOLUNTARY=y
CONFIG_PREEMPT_BKL=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
CONFIG_PREEMPT and CONFIG_PREEMPT_NONE are not set.

With these flags, in_atomic() returns 1 for the entire time a softirq
is being processed.
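
As a rough illustration of why in_atomic() stays at 1 throughout softirq
processing, here is a user-space sketch of the counter involved. It
assumes the usual preempt_count layout in which softirq nesting is
counted in units of 0x100, which would also explain the 0x00000100
printed in the second BUG report further down. Every sim_* name is
hypothetical, and the in_atomic() check is simplified compared to the
real macro.

```c
#include <assert.h>
#include <stdio.h>

/* Softirq nesting is counted in bits 8-15 of preempt_count, so one level
 * of softirq processing adds 0x100 (SOFTIRQ_OFFSET in the kernel). */
#define SIM_SOFTIRQ_OFFSET 0x100u

/* Hypothetical stand-in for the per-task preempt_count. */
unsigned int sim_preempt_count;

/* Simplified in_atomic(): any non-zero count means atomic context.
 * The real macro has further details (such as PREEMPT_ACTIVE) that
 * are ignored here. */
int sim_in_atomic(void)
{
    return sim_preempt_count != 0;
}

/* Models the shape of __do_softirq(): raise the count, run the handler,
 * lower the count again. Returns what the handler saw. */
int sim_do_softirq(int (*handler)(void))
{
    int seen;

    sim_preempt_count += SIM_SOFTIRQ_OFFSET;   /* __local_bh_disable() */
    seen = handler();
    assert(sim_preempt_count >= SIM_SOFTIRQ_OFFSET);
    sim_preempt_count -= SIM_SOFTIRQ_OFFSET;   /* leaving the softirq */
    return seen;
}

/* Hypothetical timer callback standing in for ipmr_expire_process(). */
int sim_timer_handler(void)
{
    printf("in_atomic():%d preempt_count:0x%08x\n",
           sim_in_atomic(), sim_preempt_count);
    return sim_in_atomic();
}
```

Calling sim_do_softirq(sim_timer_handler) from a small main() shows the
handler observing a non-zero count, while the count is back to zero
once the simulated softirq has finished.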

The code path which causes the BUG is in net/ipv4/ipmr.c in function
ipmr_destroy_unres:

        if (skb->nh.iph->version == 0) {
                struct nlmsghdr *nlh = (struct nlmsghdr *)skb_pull(skb, sizeof(struct iphdr));
                nlh->nlmsg_type = NLMSG_ERROR;
                nlh->nlmsg_len = NLMSG_LENGTH(sizeof(struct nlmsgerr));
                skb_trim(skb, nlh->nlmsg_len);
                e = NLMSG_DATA(nlh);
                e->error = -ETIMEDOUT;
                memset(&e->msg, 0, sizeof(e->msg));

                rtnl_unicast(skb, NETLINK_CB(skb).pid);
        } else
                kfree_skb(skb);

So the BUG only occurs if we try to unicast a netlink error packet in
response to a failure to resolve a netlink multicast packet
(iph->version == 0). When this happens we eventually call mutex_lock(),
which calls might_sleep(). With our kernel config, that reports the BUG
in __might_sleep() and then attempts to schedule() (with
CONFIG_PREEMPT_VOLUNTARY=y, might_sleep() resolves to
"__might_sleep(); schedule();").
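
The two-step behaviour described above can be sketched in user space
roughly as follows. This is a simplified model, not the kernel's code:
struct sim_cpu, sim_might_sleep() and the SIM_* result flags are
hypothetical names, and the reschedule step is modelled as conditional
on a pending-task flag, which is an assumption chosen to match the
observation that the crash only happens when a process is waiting to be
scheduled.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-CPU state for the simulation. */
struct sim_cpu {
    unsigned int preempt_count; /* non-zero means atomic context */
    int need_resched;           /* non-zero means a task is waiting to run */
};

/* Result flags, so a caller can tell which reports fired. */
enum {
    SIM_OK             = 0,
    SIM_BUG_SLEEPING   = 1, /* "sleeping function called from invalid context" */
    SIM_BUG_SCHEDULING = 2  /* "scheduling while atomic" */
};

/* Models might_sleep() with the debug check plus a voluntary
 * preemption point, as described in the post. */
int sim_might_sleep(const struct sim_cpu *cpu)
{
    int result = SIM_OK;

    assert(cpu != NULL);

    /* Step 1: the debug check, which only reports. */
    if (cpu->preempt_count != 0) {
        printf("BUG: sleeping function called from invalid context\n");
        result |= SIM_BUG_SLEEPING;
    }

    /* Step 2: the voluntary preemption point. Entering the scheduler
     * while the count is non-zero triggers the second report. */
    if (cpu->need_resched && cpu->preempt_count != 0) {
        printf("BUG: scheduling while atomic: 0x%08x\n", cpu->preempt_count);
        result |= SIM_BUG_SCHEDULING;
    }

    return result;
}
```

With preempt_count at 0x100 and nothing waiting, only the first report
fires; with a task waiting, both do, which mirrors the non-fatal and
fatal cases in this report.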

I'm unsure whether we're supposed to be in_atomic() at this point,
because hrtimer_run_queues(), before running the timer function
(ipmr_expire_process), seems to try to re-enable kernel preemption:
__run_hrtimer() calls spin_unlock(), which calls _spin_unlock(), which
calls preempt_enable(), which resolves to nothing since
CONFIG_PREEMPT=n.
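
One way to reason about this is that a lock/unlock pair can only undo
its own contribution to the count, so the softirq part stays in place
whether or not preempt_enable() does anything. The sketch below models
that under stated assumptions: SIM_CONFIG_PREEMPT is a hypothetical
compile-time switch standing in for CONFIG_PREEMPT, and the sim_*
functions only mimic the shape of the call chain described above.

```c
#include <assert.h>

/* One level of softirq processing adds 0x100 to the count. */
#define SIM_SOFTIRQ_OFFSET 0x100u

/* Hypothetical switch: 1 models CONFIG_PREEMPT=y, 0 models the
 * configuration described in this report. */
#ifndef SIM_CONFIG_PREEMPT
#define SIM_CONFIG_PREEMPT 0
#endif

/* Hypothetical stand-in for preempt_count. */
unsigned int sim_count;

/* spin_lock()/spin_unlock() touch the count only on preemptible kernels. */
void sim_spin_lock(void)
{
    if (SIM_CONFIG_PREEMPT)
        sim_count += 1;     /* preempt_disable() */
}

void sim_spin_unlock(void)
{
    if (SIM_CONFIG_PREEMPT)
        sim_count -= 1;     /* preempt_enable() */
}

/* Softirq entry, a lock/unlock pair, then the timer callback.
 * Returns the count the callback would observe. */
unsigned int sim_run_timer_in_softirq(void)
{
    unsigned int seen;

    sim_count += SIM_SOFTIRQ_OFFSET;    /* entering the softirq */
    sim_spin_lock();
    sim_spin_unlock();                  /* lock dropped before the callback */
    seen = sim_count;                   /* what the callback sees */
    assert(sim_count >= SIM_SOFTIRQ_OFFSET);
    sim_count -= SIM_SOFTIRQ_OFFSET;    /* leaving the softirq */
    return seen;
}
```

In this model the callback sees 0x100 for either value of
SIM_CONFIG_PREEMPT, which suggests the atomic state comes from the
softirq itself rather than from the spinlock.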

Unfortunately this otherwise harmless BUG seems to become fatal to the
system if there is a process waiting to be scheduled when mutex_lock
calls might_sleep().  When that's the case, we see the following BUG
immediately after the first one:

 BUG: scheduling while atomic: Parser/0x00000100/2465
  [phys_startup_32+-1084110266/-1073741824] __sched_text_start+0x56/0xa21
  [phys_startup_32+-1086166488/-1073741824] release_console_sem+0x17f/0x1be
  [<e0510524>] __nf_conntrack_confirm+0x2a6/0x2d9 [nf_conntrack]
  [phys_startup_32+-1086299433/-1073741824] do_IRQ+0xc6/0xdb
  [phys_startup_32+-1084105164/-1073741824] __mutex_lock_slowpath+0x45/0x77
  [phys_startup_32+-1084105441/-1073741824] mutex_lock+0x26/0x29
  [phys_startup_32+-1084464563/-1073741824] rtnetlink_rcv+0x17/0x3d
  [phys_startup_32+-1084407949/-1073741824] netlink_data_ready+0x12/0x52
  [phys_startup_32+-1084412015/-1073741824] netlink_sendskb+0x1c/0x33
  [phys_startup_32+-1084409373/-1073741824] netlink_ack+0x178/0x193
  [phys_startup_32+-1084409070/-1073741824] netlink_run_queue+0x7b/0xca
  [phys_startup_32+-1084464481/-1073741824] rtnetlink_rcv_msg+0x0/0x1e4
  [phys_startup_32+-1086302708/-1073741824] dump_stack+0x12/0x14
  [phys_startup_32+-1084201671/-1073741824] ipmr_expire_process+0x0/0x8e
  [phys_startup_32+-1084464549/-1073741824] rtnetlink_rcv+0x25/0x3d
  [phys_startup_32+-1084407949/-1073741824] netlink_data_ready+0x12/0x52
  [phys_startup_32+-1084412015/-1073741824] netlink_sendskb+0x1c/0x33
  [phys_startup_32+-1084463665/-1073741824] rtnl_unicast+0x18/0x23
  [phys_startup_32+-1084201716/-1073741824] ipmr_destroy_unres+0xc7/0xf4
  [phys_startup_32+-1086087209/-1073741824] hrtimer_run_queues+0x127/0x141
  [phys_startup_32+-1085998565/-1073741824] enable_irq+0x92/0xaf
  [phys_startup_32+-1084161925/-1073741824] xfrm_timer_handler+0x18d/0x21a
  [phys_startup_32+-1084201576/-1073741824] ipmr_expire_process+0x5f/0x8e
  [phys_startup_32+-1086134877/-1073741824] run_timer_softirq+0x101/0x164
  [phys_startup_32+-1086148568/-1073741824] __do_softirq+0x5d/0xba
  [phys_startup_32+-1086299719/-1073741824] do_softirq+0x59/0xb1
  [phys_startup_32+-1086179650/-1073741824] scheduler_tick+0x7c/0xdc
  [phys_startup_32+-1085997014/-1073741824] handle_edge_irq+0x0/0x10a
  [phys_startup_32+-1086299433/-1073741824] do_IRQ+0xc6/0xdb
  [phys_startup_32+-1086306041/-1073741824] common_interrupt+0x23/0x28
  [phys_startup_32+-1084161997/-1073741824] xfrm_timer_handler+0x145/0x21a
  =======================

At this point the computer has effectively crashed.

I'm a complete newcomer to netlink packets. I'm going to try sending
some unresolvable netlink packets in the hope of making this BUG easily
reproducible, but if anyone has any other ideas I'd love to hear them.

Thank you,
Tom Burns