[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <51C9A6A9.3000202@mentor.com>
Date: Tue, 25 Jun 2013 15:18:17 +0100
From: Dean Jenkins <Dean_Jenkins@...tor.com>
To: Alexander Holler <holler@...oftware.de>
CC: Peter Hurley <peter@...leysoftware.com>,
linux-kernel@...r.kernel.org, Jiri Slaby <jslaby@...e.cz>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Marcel Holtmann <marcel@...tmann.org>,
Gustavo Padovan <gustavo@...ovan.org>,
Johan Hedberg <johan.hedberg@...il.com>,
linux-bluetooth@...r.kernel.org
Subject: Re: BUG: tty: memory corruption through tty_release/tty_ldisc_release
On 17/05/13 05:43, Alexander Holler wrote:
> Am 16.05.2013 23:53, schrieb Peter Hurley:
>
>> And the tty layer can't really _prevent_ the tty driver from mishandling
>> the port kref.
>>
>>> Especially since it seemed to have been worked before tty_ports got
>>> introduced.
>> Well, at the time tty_port was introduced to RFCOMM, there was nothing
>> to tear-down in tty_port. Now that tty_port owns the flip buffers and
>> must do proper tear-down, the problem has surfaced.
>>
>>> But I can't add much more to this discussion, as I'm rather a novice
>>> in regard to the tty subsystem. I even don't know much about the task
>>> sharing between tty, tty_port and tty_ldisc, except the stuff I found
>>> out because I got hit by that bug and therefor have read some of the
>>> sources.
>> Ok. Could you paste the BUG() and steps to reproduce?
>> I have a plan to fix it but I'd like to review what you have
>> first.
> As described before, it ends up with memory corruption because freed
> memory is used, so if a BUG() happens, it doesn't help much. E.g. with
> kernel 3.9.2 I never have seen a bug, just a rebooting machine
> (sometimes minutes after the real bug happened).
>
> To reproduce it, call rfcomm connect /dev/rfcommN and after the
> connection to the remote device happened, power down the remote device
> and wait 20s (the timeout until a connection drop will be discovered).
I expect this behaviour depends on the remote device. Does the device
close the RFCOMM session cleanly ? Perhaps an out of range test would be
better so that the connection drops.
> Furthermore I would suggest to use commit ecbbfd4, because of the above
> mentioned problem. With that you might have luck and see a BUG like this:
>
> May 16 00:06:18 laptopahvpn kernel: [ 51.238969] ------------[ cut
> here ]------------
> May 16 00:06:18 laptopahvpn kernel: [ 51.241754] kernel BUG at
> kernel/workqueue.c:609!
> May 16 00:06:18 laptopahvpn kernel: [ 5.603591] error attempted to
> write to tty [0x (null)] = NULL
> May 16 00:06:18 laptopahvpn kernel: [ 51.244131] invalid opcode: 0000
> [#1] SMP
> May 16 00:06:18 laptopahvpn kernel: [ 51.249491] Modules linked in:
> sch_sfq cdc_acm msr nfs lockd sunrpc rfcomm bnep iptable_nat nf_na
> t_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_recent xt_conntrack
> nf_conntrack iptable_filter xt_LOG xt_limit ip6table_filter ip6_ta
> bles ipv6 btusb bluetooth snd_hda_codec_hdmi coretemp kvm_intel
> snd_hda_codec_realtek arc4 kvm crc32c_intel iwldvm ghash_clmulni_intel
> mac80211 aesni_intel aes_x86_64 ablk_helper cryptd samsung_laptop xts
> lrw gf128mul iwlwifi microcode cfg80211 xhci_hcd rfkill snd_hda_intel
> snd_hda_codec snd_hwdep snd_pcm ehci_hcd snd_page_alloc snd_timer snd
> usbcore soundcore lpc_ich usb_common mfd_core joydev
> May 16 00:06:18 laptopahvpn kernel: [ 51.261073] CPU 1
> May 16 00:06:18 laptopahvpn kernel: [ 51.261106] Pid: 2449, comm:
> rfcomm Not tainted 3.7.0-rc2-00023-gecbbfd4-dirty #208 SAMSUNG
> ELECTRONICS CO., LTD. 900X3C/900X3D/900X4C/900X4D/SAMSUNG_NP1234567890
> May 16 00:06:18 laptopahvpn kernel: [ 51.266958] RIP:
> 0010:[<ffffffff810492fe>] [<ffffffff810492fe>] get_work_gcwq+0x5e/0x60
> May 16 00:06:18 laptopahvpn kernel: [ 51.270064] RSP:
> 0018:ffff88020f253da0 EFLAGS: 00010016
> May 16 00:06:18 laptopahvpn kernel: [ 51.273155] RAX: ffffffff81931380
> RBX: ffff880214fee400 RCX: 0000000000000024
> May 16 00:06:18 laptopahvpn kernel: [ 51.276270] RDX: 007fffc4010a7f73
> RSI: 0000000000000000 RDI: ffff880214fee400
> May 16 00:06:18 laptopahvpn kernel: [ 51.279333] RBP: 0000000000000000
> R08: 000000000000000a R09: 000000000000181c
> May 16 00:06:18 laptopahvpn kernel: [ 51.282319] R10: 0000000000000000
> R11: 000000000000181b R12: 0000000000000000
> May 16 00:06:18 laptopahvpn kernel: [ 51.285286] R13: 0000000000000004
> R14: ffff880210863000 R15: 0000000000000000
> May 16 00:06:18 laptopahvpn kernel: [ 51.288265] FS:
> 00007f8bd6e94700(0000) GS:ffff88021f280000(0000) knlGS:0000000000000000
> May 16 00:06:18 laptopahvpn kernel: [ 51.291283] CS: 0010 DS: 0000
> ES: 0000 CR0: 0000000080050033
> May 16 00:06:18 laptopahvpn kernel: [ 51.294328] CR2: 00007fc249111e60
> CR3: 000000020f1d3000 CR4: 00000000001407e0
> May 16 00:06:18 laptopahvpn kernel: [ 51.297415] DR0: 0000000000000000
> DR1: 0000000000000000 DR2: 0000000000000000
> May 16 00:06:18 laptopahvpn kernel: [ 51.300506] DR3: 0000000000000000
> DR6: 00000000ffff0ff0 DR7: 0000000000000400
> May 16 00:06:18 laptopahvpn kernel: [ 51.303555] Process rfcomm (pid:
> 2449, threadinfo ffff88020f252000, task ffff880210bbee80)
> May 16 00:06:18 laptopahvpn kernel: [ 51.306638] Stack:
> May 16 00:06:18 laptopahvpn kernel: [ 51.309704] ffffffff8104a471
> 0000000000014040 0000000000000296 ffff880210863000
> May 16 00:06:18 laptopahvpn kernel: [ 51.312850] 0000000000000000
> 0000000000000001 ffffffff81258188 0000000000000000
> May 16 00:06:18 laptopahvpn kernel: [ 51.315998] ffffffff812591b4
> 0000000000013fc0 ffff880215278700 0000000000000000
> May 16 00:06:18 laptopahvpn kernel: [ 51.319139] Call Trace:
> May 16 00:06:18 laptopahvpn kernel: [ 51.322236] [<ffffffff8104a471>]
> ? __cancel_work_timer+0x31/0xa0
> May 16 00:06:18 laptopahvpn kernel: [ 51.325398] [<ffffffff81258188>]
> ? tty_ldisc_halt+0x18/0x20
> May 16 00:06:18 laptopahvpn kernel: [ 51.328551] [<ffffffff812591b4>]
> ? tty_ldisc_release+0x34/0x110
> May 16 00:06:18 laptopahvpn kernel: [ 51.331719] [<ffffffff81251dbc>]
> ? tty_release+0x4ac/0x520
> May 16 00:06:18 laptopahvpn kernel: [ 51.334873] [<ffffffff810f2161>]
> ? __fput+0xe1/0x230
> May 16 00:06:18 laptopahvpn kernel: [ 51.338030] [<ffffffff8104d75f>]
> ? task_work_run+0x8f/0xd0
> May 16 00:06:18 laptopahvpn kernel: [ 51.341208] [<ffffffff81002919>]
> ? do_notify_resume+0x69/0xc0
> May 16 00:06:18 laptopahvpn kernel: [ 51.344383] [<ffffffff8104d649>]
> ? task_work_add+0x49/0x60
> May 16 00:06:18 laptopahvpn kernel: [ 51.347578] [<ffffffff81422e1a>]
> ? int_signal+0x12/0x17
> May 16 00:06:18 laptopahvpn kernel: [ 51.350777] Code: d5 a0 d3 85 81
> c3 0f 1f 80 00 00 00 00 31 c0 66 0f 1f 44 00 00 f3 c3 66 0f 1f 44 00 00
> 30 c0 48 8b 00 48 8b 00 c3 83 fa 04 74 ea <0f> 0b e8 9b ff ff ff ba 05
> 00 00 00 48 85 c0 74 03 8b 50 04 89
> May 16 00:06:18 laptopahvpn kernel: [ 51.358380] RIP
> [<ffffffff810492fe>] get_work_gcwq+0x5e/0x60
> May 16 00:06:18 laptopahvpn kernel: [ 51.362070] RSP <ffff88020f253da0>
> May 16 00:06:18 laptopahvpn kernel: [ 51.365766] ---[ end trace
> f2ccc5bea5182396 ]---
>
>
> But only fixing the problem with rewriting rfcomm/tty.c but without any
> explanations about the expected lifetime of tty_port doesn't help much.
> As proved the switch to tty_port has some pitfalls and even people with
> a deeper insight into the new tty layer entered them.
>
> E.g. the fact that tty_port is self-destructing suggests the conclusion
> that the problem isn't in rfcomm, but in tty_release() (that's why I
> placed the wrong workaround there).
>
> So without at least some small clarification about the expected lifetime
> of tty_port, it's likely someone else will enter the same pit (which
> unfortunately isn't seen that easy and a BUG() doesn't have to happen).
> In include/linux/tty.h is just
>
> "The tty port has a different lifetime to the tty so must be kept apart."
>
> As it isn't specified that tty_port has to live as long as tty, I would
> (again) conclude it could have a shorter livetime than tty. Maybe
> someone can clarify that statement there.
>
> I assume I would be able to fix the problem in rfcomm myself, if someone
> would offer me an explanation about the expected lifetime of tty_port
> and some confirmation, that the call of tty_ldisc_release() in
> tty_release() isn't the real problem.
>
> E.g. why isn't that call to tty_ldisc_release() in tty_port_destructor()
> or in tty_port_destroy()? If it would be there the problem (and one
> pitfall) would be gone too. struct tty_port seems to have a pointer to
> tty (even two, tty and itty), so calling tty_ldisc_release() in
> tty_port_destroy() looks possible.
>
I have experienced this crash on Mageia 3 kernel 3.8.13 on my laptop. I
was using SLIP over a RFCOMM connection between my laptop and an ARM
board. I was using ssh sessions on my laptop over Ethernet to control
Bluetooth on the ARM board. If my KDE session dies, then the ssh
sessions die which kills any slattach and rfcomm programs on my ARM target.
I had a failure that killed my KDE session on my external monitor, I did
"service dm restart" to bring up the KDE login on my monitor but I
closed the lid on my laptop which put everything to sleep. When I opened
the lid on the laptop, the kernel panicked. /var/log/syslog showed a
similar crash as reported in your E-mail. See below at end of this E-mail.
I suspect the dying KDE session disconnected SLIP and RFCOMM on both my
laptop and my ARM target board.
Is there a bugzilla bug report that is tracking this issue ? If I can
reproduce this issue then I would like to add more information to a bug
report.
Note that I added some RFCOMM session disconnection fixes to Linux 3.10
commits (8 off) 24fd642ccb24c8b5732d7d7b5e98277507860b2a to
fea7b02fbf73adb2e746f00ed279a782de7e74e4. This may or may not help this
crash.
For completeness my crash bracktrace looks like the following. I think
this kernel contains commit ecbbfd4.
Jun 25 12:52:32 localhost kernel: PM: Finishing wakeup.
Jun 25 12:52:32 localhost kernel: ------------[ cut here ]------------
Jun 25 12:52:32 localhost kernel: kernel BUG at kernel/workqueue.c:610!
Jun 25 12:52:32 localhost kernel: Restarting tasks ...
Jun 25 12:52:32 localhost kernel: invalid opcode: 0000 [#1] SMP
Jun 25 12:52:32 localhost kernel: Modules linked in: slip slhc lp tun
ipt_IFWLOG ipt_psd cls_basic cls_flow cls_fw cls_u32 sch_tbf sch_prio
sch_htb sch_hfsc sch_ingress sch_sfq bridge stp llc xt_CHECKSUM
ipt_rpfilter ip6t_rpfilter xt_statistic xt_CT xt_realm xt_LOG
xt_connlimit xt_addrtype ip_set_hash_ip xt_comment xt_recent xt_nat
ipt_ULOG ipt_REJECT ipt_MASQUERADE ipt_ECN ipt_CLUSTERIP ipt_ah xt_set
ip_set nf_nat_tftp xt_time nf_nat_snmp_basic xt_TCPMSS nf_conntrack_snmp
nf_nat_sip xt_sctp xt_policy nf_nat_pptp nf_nat_proto_gre ip6t_REJECT
nf_conntrack_ipv6 nf_nat_irc ip6table_raw nf_nat_h323 ip6table_mangle
nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane
nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_udplite
nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre
nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast
nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY
nf_defrag_ipv6 nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner
xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_mark xt_mac xt_limit
xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp
xt_conntrack xt_connmark xt_CLASSIFY xt_AUDIT xt_tcpudp xt_state
iptable_raw iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink ip6table_filter
ip6_tables lockd iptable_filter ip_tables x_tables vmnet(O) fuse
vsock(O) vmci(O) vmmon(O) af_packet rndis_host cdc_ether usbnet mii
rfcomm bnep ipv6 uvcvideo videobuf2_vmalloc videobuf2_memops
videobuf2_core videodev btusb rt2800usb rt2x00usb rt2800lib rt2x00lib
bluetooth crc_ccitt media pl2303 usbserial iTCO_wdt coretemp kvm_intel
iTCO_vendor_support snd_hda_codec_idt arc4 iwldvm snd_hda_intel mac80211
snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd kvm ppdev
parport_pc mei lpc_ich i2c_i801 soundcore parport dell_laptop thermal
battery dcdbas ghash_clmulni_intel ac yenta_socket dell_wmi
sparse_keymap joydev iwlwifi e1000e pcmcia_rsrc pcmcia_core cfg80211
rfkill cpufreq_ondemand cpufreq_cons
Jun 25 12:52:52 localhost kernel: ervative cpufreq_powersave
acpi_cpufreq mperf processor evdev nvram i8k sunrpc autofs4 hid_generic
usbhid hid dm_crypt mmc_block sdhci_pci xhci_hcd ehci_pci ehci_hcd
crc32c_intel usbcore aesni_intel xts sdhci aes_x86_64 lrw gf128mul
ablk_helper cryptd mmc_core sr_mod usb_common nouveau i915 mxm_wmi wmi
ttm video button i2c_algo_bit drm_kms_helper drm i2c_core dm_mod [last
unloaded: microcode]
Jun 25 12:52:32 localhost kernel: CPU 0 <4>[15002.946347] Pid: 1946,
comm: rfcomm Tainted: P O 3.8.13-desktop-1.mga3 #1 Dell Inc.
Latitude E6430/0H3MT5
Jun 25 12:52:32 localhost kernel: RIP: 0010:[<ffffffff81071ee1>]
[<ffffffff81071ee1>] get_work_gcwq+0x61/0x70
Jun 25 12:52:32 localhost kernel: RSP: 0018:ffff8801a4385d58 EFLAGS:
00010006
Jun 25 12:52:32 localhost kernel: RAX: ffffffff81c02a80 RBX:
ffff8801dba86600 RCX: ffffffffffffffc8
Jun 25 12:52:32 localhost kernel: RDX: 007fffc400edd422 RSI:
0000000000000000 RDI: ffff8801dba86600
Jun 25 12:52:32 localhost kernel: RBP: ffff8801a4385d58 R08:
0000000000000000 R09: ffff88022e1fafa0
Jun 25 12:52:32 localhost kernel: R10: 0000000000000040 R11:
0000000000000000 R12: 0000000000000000
Jun 25 12:52:32 localhost kernel: R13: 0000000000000000 R14:
0000000000000000 R15: ffff8801ad1a1000
Jun 25 12:52:32 localhost kernel: FS: 00007fa709d21700(0000)
GS:ffff88022e200000(0000) knlGS:0000000000000000
Jun 25 12:52:32 localhost kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Jun 25 12:52:32 localhost kernel: CR2: 00007fa08e462cb0 CR3:
00000001acbca000 CR4: 00000000001407f0
Jun 25 12:52:32 localhost kernel: DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Jun 25 12:52:32 localhost kernel: DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
Jun 25 12:52:32 localhost kernel: done.
Jun 25 12:52:32 localhost kernel: Process rfcomm (pid: 1946, threadinfo
ffff8801a4384000, task ffff880223082da0)
Jun 25 12:52:32 localhost kernel: Stack:
Jun 25 12:52:32 localhost kernel: video LNXVIDEO:00: Restoring backlight
state
Jun 25 12:52:32 localhost kernel: ffff8801a4385d98
ffffffff81074a7b<6>[15002.946424] video LNXVIDEO:01: Restoring backlight
state
Jun 25 12:52:32 localhost kernel: ffff8801dba86724 0000000000000282
Jun 25 12:52:32 localhost kernel: ffff8801a41c1b00 ffff8801ad1a1000
0000000000000000 0000000000000000
Jun 25 12:52:32 localhost kernel: ffff8801a4385da8 ffffffff81074b20
ffff8801a4385db8 ffffffff81403cbd
Jun 25 12:52:32 localhost kernel: Call Trace:
Jun 25 12:52:32 localhost kernel: [<ffffffff81074a7b>]
__cancel_work_timer+0x3b/0xb0
Jun 25 12:52:32 localhost kernel: [<ffffffff81074b20>]
cancel_work_sync+0x10/0x20
Jun 25 12:52:32 localhost kernel: [<ffffffff81403cbd>]
tty_ldisc_halt+0x1d/0x30
Jun 25 12:52:32 localhost kernel: [<ffffffff81404b17>]
tty_ldisc_release+0x17/0x90
Jun 25 12:52:32 localhost kernel: [<ffffffff813fd270>]
tty_release+0x470/0x580
Jun 25 12:52:32 localhost kernel: [<ffffffff8101440e>] ?
do_signal+0xbe/0x8e0
Jun 25 12:52:32 localhost kernel: [<ffffffff8117938c>] __fput+0xec/0x240
Jun 25 12:52:32 localhost kernel: [<ffffffff811794ee>] ____fput+0xe/0x10
Jun 25 12:52:32 localhost kernel: [<ffffffff81076e07>]
task_work_run+0xa7/0xe0
Jun 25 12:52:32 localhost kernel: [<ffffffff81014ca1>]
do_notify_resume+0x71/0xb0
Jun 25 12:52:32 localhost kernel: [<ffffffff815e12aa>] int_signal+0x12/0x17
Jun 25 12:52:32 localhost kernel: Code: e2 00 00 48 03 04 d5 00 bc ab 81
5d c3 0f 1f 44 00 00 30 c0 48 8b 00 5d 48 8b 00 c3 66 0f 1f 44 00 00 31
c0 5d c3 83 fa 40 74 e0 <0f> 0b 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
0f 1f 44 00 00 55
Jun 25 12:52:32 localhost kernel: RIP [<ffffffff81071ee1>]
get_work_gcwq+0x61/0x70
Jun 25 12:52:32 localhost kernel: RSP <ffff8801a4385d58>
Jun 25 12:52:32 localhost kernel: ---[ end trace f219a9b765a4acc9 ]---
Jun 25 12:52:52 localhost vmnetBridge: Removing interface eth0 index:2
Jun 25 12:52:52 localhost vmnetBridge: RTM_NEWLINK: name:eth0 index:2
flags:0x00011043
Jun 25 12:52:52 localhost vmnetBridge: Adding interface eth0 index:2
Thanks,
Regards,
Dean Jenkins
Mentor Graphics
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists