Date:	Thu, 16 Jun 2016 13:47:10 +0900
From:	Minchan Kim <minchan@...nel.org>
To:	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
CC:	Andrew Morton <akpm@...ux-foundation.org>, <linux-mm@...ck.org>,
	<linux-kernel@...r.kernel.org>, Vlastimil Babka <vbabka@...e.cz>,
	<dri-devel@...ts.freedesktop.org>, Hugh Dickins <hughd@...gle.com>,
	John Einar Reitan <john.reitan@...s.arm.com>,
	Jonathan Corbet <corbet@....net>,
	Joonsoo Kim <iamjoonsoo.kim@....com>,
	Konstantin Khlebnikov <koct9i@...il.com>,
	Mel Gorman <mgorman@...e.de>,
	Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
	Rafael Aquini <aquini@...hat.com>,
	Rik van Riel <riel@...hat.com>,
	Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
	<virtualization@...ts.linux-foundation.org>,
	Gioh Kim <gi-oh.kim@...fitbricks.com>,
	Chan Gyun Jeong <chan.jeong@....com>,
	Sangseok Lee <sangseok.lee@....com>,
	Kyeongdon Kim <kyeongdon.kim@....com>,
	Chulmin Kim <cmlaika.kim@...sung.com>
Subject: Re: [PATCH v7 00/12] Support non-lru page migration

On Thu, Jun 16, 2016 at 01:23:43PM +0900, Sergey Senozhatsky wrote:
> On (06/16/16 11:58), Minchan Kim wrote:
> [..]
> > RAX: 2065676162726166 so rax is totally garbage, I think.
> > It means obj_to_head returns garbage because get_first_obj_offset is
> > utter crab because (page_idx / class->pages_per_zspage) was totally
> > wrong.
> > 
> > > 					^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >     6408:       f0 0f ba 28 00          lock btsl $0x0,(%rax)
> >  
> > <snip>
> > 
> > > > Could you test with [zsmalloc: keep first object offset in struct page]
> > > > in mmotm?
> > > 
> > > sure, I can.  will it help, tho? we have a race condition here I think.
> > 
> > I guess root cause is caused by get_first_obj_offset.
> 
> sounds reasonable.
> 
> > Please test with it.
> 
> 
> this is what I'm getting with the [zsmalloc: keep first object offset in struct page]
> applied:  "count:0 mapcount:-127". which may be not related to zsmalloc at this point.
> 
> kernel: BUG: Bad page state in process khugepaged  pfn:101db8
> kernel: page:ffffea0004076e00 count:0 mapcount:-127 mapping:          (null) index:0x1

Hm, it looks like a double free.

Does it not happen if you disable zram? IOW, does it seem to be related to
zsmalloc migration?

How easily can you reproduce it? Could you bisect it?
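
For reference, a rough userspace sketch of why "mapcount:-127" points at a
double free. It assumes the 4.7-era definitions: the page dump prints
page_mapcount(), which is the raw _mapcount plus one, and the buddy allocator
marks free pages by storing PAGE_BUDDY_MAPCOUNT_VALUE (-128) in _mapcount.
The helper names below are hypothetical and exist only for illustration; they
are not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: mirrors PAGE_BUDDY_MAPCOUNT_VALUE in the 4.7-era
 * include/linux/page-flags.h. */
#define PAGE_BUDDY_MAPCOUNT_VALUE (-128)

/* Hypothetical helper: undo the "+ 1" that page_mapcount() applies to the
 * raw _mapcount field before the value is printed in the page dump. */
static int raw_mapcount_from_dump(int printed_mapcount)
{
	return printed_mapcount - 1;
}

/* Hypothetical helper: true if the printed mapcount corresponds to a page
 * that the buddy allocator already considers free. */
static bool dump_looks_like_buddy_page(int printed_mapcount)
{
	return raw_mapcount_from_dump(printed_mapcount) ==
	       PAGE_BUDDY_MAPCOUNT_VALUE;
}
```

With the value from the report, dump_looks_like_buddy_page(-127) is true: the
page being handed to __free_pages() already carries the buddy marker, which is
what freeing an already-free page would look like.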

> kernel: flags: 0x8000000000000000()
> kernel: page dumped because: nonzero mapcount
> kernel: Modules linked in: lzo zram zsmalloc mousedev coretemp hwmon crc32c_intel snd_hda_codec_realtek i2c_i801 snd_hda_codec_generic r8169 mii snd_hda_intel snd_hda_codec snd_hda_core acpi_cpufreq snd_pcm snd_timer snd soundcore lpc_ich processor mfd_core sch_fq_codel sd_mod hid_generic usb
> kernel: CPU: 3 PID: 38 Comm: khugepaged Not tainted 4.7.0-rc3-next-20160615-dbg-00005-gfd11984-dirty #491
> kernel:  0000000000000000 ffff8801124c73f8 ffffffff814d69b0 ffffea0004076e00
> kernel:  ffffffff81e658a0 ffff8801124c7420 ffffffff811e9b63 0000000000000000
> kernel:  ffffea0004076e00 ffffffff81e658a0 ffff8801124c7440 ffffffff811e9ca9
> kernel: Call Trace:
> kernel:  [<ffffffff814d69b0>] dump_stack+0x68/0x92
> kernel:  [<ffffffff811e9b63>] bad_page+0x158/0x1a2
> kernel:  [<ffffffff811e9ca9>] free_pages_check_bad+0xfc/0x101
> kernel:  [<ffffffff811ee516>] free_hot_cold_page+0x135/0x5de
> kernel:  [<ffffffff811eea26>] __free_pages+0x67/0x72
> kernel:  [<ffffffff81227c63>] release_freepages+0x13a/0x191
> kernel:  [<ffffffff8122b3c2>] compact_zone+0x845/0x1155
> kernel:  [<ffffffff8122ab7d>] ? compaction_suitable+0x76/0x76
> kernel:  [<ffffffff8122bdb2>] compact_zone_order+0xe0/0x167
> kernel:  [<ffffffff8122bcd2>] ? compact_zone+0x1155/0x1155
> kernel:  [<ffffffff8122ce88>] try_to_compact_pages+0x2f1/0x648
> kernel:  [<ffffffff8122ce88>] ? try_to_compact_pages+0x2f1/0x648
> kernel:  [<ffffffff8122cb97>] ? compaction_zonelist_suitable+0x3a6/0x3a6
> kernel:  [<ffffffff811ef1ea>] ? get_page_from_freelist+0x2c0/0x133c
> kernel:  [<ffffffff811f0350>] __alloc_pages_direct_compact+0xea/0x30d
> kernel:  [<ffffffff811f0266>] ? get_page_from_freelist+0x133c/0x133c
> kernel:  [<ffffffff811ee3b2>] ? drain_all_pages+0x1d6/0x205
> kernel:  [<ffffffff811f21a8>] __alloc_pages_nodemask+0x143d/0x16b6
> kernel:  [<ffffffff8111f405>] ? debug_show_all_locks+0x226/0x226
> kernel:  [<ffffffff811f0d6b>] ? warn_alloc_failed+0x24c/0x24c
> kernel:  [<ffffffff81110ffc>] ? finish_wait+0x1a4/0x1b0
> kernel:  [<ffffffff81122faf>] ? lock_acquire+0xec/0x147
> kernel:  [<ffffffff81d32ed0>] ? _raw_spin_unlock_irqrestore+0x3b/0x5c
> kernel:  [<ffffffff81d32edc>] ? _raw_spin_unlock_irqrestore+0x47/0x5c
> kernel:  [<ffffffff81110ffc>] ? finish_wait+0x1a4/0x1b0
> kernel:  [<ffffffff8128f73a>] khugepaged+0x1d4/0x484f
> kernel:  [<ffffffff8128f566>] ? hugepage_vma_revalidate+0xef/0xef
> kernel:  [<ffffffff810d5bcc>] ? finish_task_switch+0x3de/0x484
> kernel:  [<ffffffff81d32f18>] ? _raw_spin_unlock_irq+0x27/0x45
> kernel:  [<ffffffff8111d13f>] ? trace_hardirqs_on_caller+0x3d2/0x492
> kernel:  [<ffffffff81111487>] ? prepare_to_wait_event+0x3f7/0x3f7
> kernel:  [<ffffffff81d28bf5>] ? __schedule+0xa4d/0xd16
> kernel:  [<ffffffff810cd0de>] kthread+0x252/0x261
> kernel:  [<ffffffff8128f566>] ? hugepage_vma_revalidate+0xef/0xef
> kernel:  [<ffffffff810cce8c>] ? kthread_create_on_node+0x377/0x377
> kernel:  [<ffffffff81d3387f>] ret_from_fork+0x1f/0x40
> kernel:  [<ffffffff810cce8c>] ? kthread_create_on_node+0x377/0x377
> -- Reboot --
> 
> 	-ss
