linux-kernel - Re: [Linux-nvdimm] [GIT PULL] PMEM driver for v4.1

Message-ID: <55657BED.80503@plexistor.com>
Date:	Wed, 27 May 2015 11:10:21 +0300
From:	Boaz Harrosh <boaz@...xistor.com>
To:	Matthew Wilcox <willy@...ux.intel.com>
CC:	Ingo Molnar <mingo@...nel.org>, Christoph Hellwig <hch@....de>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, linux-nvdimm@...ts.01.org
Subject: Re: [Linux-nvdimm] [GIT PULL] PMEM driver for v4.1

On 05/26/2015 10:31 PM, Matthew Wilcox wrote:
> On Tue, May 26, 2015 at 11:41:41AM +0300, Boaz Harrosh wrote:
>> I would please like to help. What is the breakage you
>> see with DAX.
>>
>> I'm routinely testing with DAX so it is a surprise,
>> Though I'm testing with my version with pages and
>> __copy_from_user_nocache, and so on.
>> Or I might have missed it. What test are you failing?
> 
> generic/019 fails in several fun ways.
> 
> The first way, which I fixed yesterday, is that the test was using
> the wrong way to find the 'make-it-fail' switch for the block device.
> That's now in xfstests.  The messages from xfstests were unnecessarily
> worrying; they were complaining about an inconsistent filesystem, which
> might be expected as the test had failed to abort cleanly and left a
> couple of tasks actively writing to the filesystem.
> 

OK, apparently I never ran generic/019; see below.

> (I hadn't seen the problem before because I was using two devices pmem0
> and pmem1; with the new pmem driver, I got one device and partitioned
> it instead.  The problem only occurs when using partitions, not when
> using entire devices).
> 

My version of pmem+pages brings back the option of slicing pmem into
arbitrarily sized pieces, because each piece can be loaded with one of
the EPMEM_UNCACHED, EPMEM_WRITE_TROUGH, EPMEM_WRITE_COMBINED, EPMEM_CACHED,
or EPMEM_PAGES memory mapping modes.
(I have almost all the proper code for EPMEM_CACHED / EPMEM_PAGES
 support in the surrounding code, including m/fsync, mmap, memcpy_nt,
 and so on; the only thing left out is sync/freeze with regard to mmap.)

But yes, I will tell the guys to add partition testing as well; it
is important that it works.

So would you expect us to hit the same problem with
the original driver?

> The second way is that we hit two BUG/WARN messages.  The first (which
> we hit simultaneously on three CPUs in this run!) is:
> WARNING: CPU: 7 PID: 2922 at fs/buffer.c:1143 mark_buffer_dirty+0x19e/0x270()
> 

Hmm, so that must be from some directory handling code, no? Regular DAX IO
does not do write_begin. But why would it make any difference whether it's a
partition or not? That's weird, since this is a real page, not a pmem buffer
at all.

> The stack trace probably isn't useful, and anyway it's horribly corrupted
> due to triggering the stack trace simultaneously on three CPUs.
> 
> The second one we hit was this one:
> 
>  ------------[ cut here ]------------
>  WARNING: CPU: 0 PID: 2930 at fs/block_dev.c:56 __blkdev_put+0xc5/0x210()
>  Modules linked in: ext4 crc16 jbd2 pmem binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc snd_hda_codec_hdmi iTCO_wdt iTCO_vendor_support evdev x86_pkg_temp_thermal coretemp kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse serio_raw pcspkr i2c_i801 snd_hda_codec_realtek snd_hda_codec_generic lpc_ich mfd_core mei_me mei i915 snd_hda_intel i2c_algo_bit snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_hda_core loop video drm_kms_helper fuse snd_timer snd drm soundcore button processor parport_pc ppdev lp parport sg sd_mod ehci_pci ehci_hcd ahci libahci crc32c_intel libata fan scsi_mod xhci_pci nvme xhci_hcd e1000e ptp pps_core usbcore usb_common thermal thermal_sys
>  CPU: 0 PID: 2930 Comm: umount Tainted: G        W       4.1.0-rc4+ #10
>  Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Q87M-D2H, BIOS F6 08/03/2013
>   ffffffff81a04063 ffff8800a58e3d98 ffffffff81653644 0000000000000000
>   0000000000000000 ffff8800a58e3dd8 ffffffff81081fea 0000000000000000
>   ffff880236580880 ffff880236580ae8 ffff880236580a60 ffff880236580898
>  Call Trace:
>   [<ffffffff81653644>] dump_stack+0x4c/0x65
>   [<ffffffff81081fea>] warn_slowpath_common+0x8a/0xc0
>   [<ffffffff810820da>] warn_slowpath_null+0x1a/0x20
>   [<ffffffff81260475>] __blkdev_put+0xc5/0x210
>   [<ffffffff81260f72>] blkdev_put+0x52/0x180
>   [<ffffffff8121e631>] kill_block_super+0x41/0x80
>   [<ffffffff8121ea94>] deactivate_locked_super+0x44/0x80
>   [<ffffffff8121ef0c>] deactivate_super+0x6c/0x80
>   [<ffffffff81242133>] cleanup_mnt+0x43/0xa0
>   [<ffffffff812421e2>] __cleanup_mnt+0x12/0x20
>   [<ffffffff810a7104>] task_work_run+0xc4/0xf0
>   [<ffffffff8101bdd9>] do_notify_resume+0x59/0x80
>   [<ffffffff8165cd66>] int_signal+0x12/0x17
>  ---[ end trace 73da47765ccceacf ]---
> 
> I suspect these are generic ext4 problems that will occur without DAX.
> DAX just makes them more likely to occur since only metadata I/O now
> goes through the 'likely to fail' path.
> 

But why would WARN_ON_ONCE(write_inode_now(inode, true)) fail?
None of the pmem.c IO paths ever return any error. It must be failing in
block core even before reaching pmem. It looks like we are stuck with
dirty mappings after the devices are already being torn down, or something ...

> Are you skipping generic/019 or just not seeing these failures?
> 

Huh, funny. I just looked, and with ./check auto I get:
generic/018 1s ... [not run] defragmentation not supported for fstype "m1fs"
generic/020 0s ... 0s

019 is not even printing a skip. But if I run it directly I get:
generic/019      [not run] /sys/kernel/debug/fail_make_request  not found. \
	Seems that CONFIG_FAIL_MAKE_REQUEST kernel config option not enabled

So my bad, I will try to properly configure and recreate this failure here
as well.
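
For anyone else who hits the same "[not run]", here is a minimal sketch of
a pre-flight check, assuming the kernel config is available as plain text.
The helper names (option_enabled, can_run_generic_019) and the
/boot/config-<release> location are assumptions for illustration and are
not part of xfstests; only CONFIG_FAIL_MAKE_REQUEST and the debugfs path
come from the message above.

```python
import os
import re

# debugfs directory named in the xfstests "[not run]" message above
DEBUGFS_KNOB = "/sys/kernel/debug/fail_make_request"


def option_enabled(config_text, option):
    """Return True if `option` is set to 'y' in kernel .config text.

    A line such as '# CONFIG_FOO is not set' does not count as enabled.
    """
    pattern = re.compile(r"^%s=y\s*$" % re.escape(option), re.MULTILINE)
    return pattern.search(config_text) is not None


def can_run_generic_019(config_text, knob_path=DEBUGFS_KNOB):
    """Hypothetical helper: both the config option and the debugfs
    directory must be present (debugfs has to be mounted for the latter)."""
    return (option_enabled(config_text, "CONFIG_FAIL_MAKE_REQUEST")
            and os.path.isdir(knob_path))


if __name__ == "__main__":
    # Assumption: many distros install the config as /boot/config-<release>.
    path = "/boot/config-" + os.uname().release
    if os.path.exists(path):
        with open(path) as f:
            print("generic/019 runnable:", can_run_generic_019(f.read()))
    else:
        print("no config found at", path)
```

If the option is missing, the kernel needs to be rebuilt with fault
injection enabled before generic/019 can do anything useful.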

Thanks
Boaz
