Date:	Thu, 23 Jul 2015 17:32:03 -0600
From:	Toshi Kani <toshi.kani@...com>
To:	Spencer Baugh <sbaugh@...ern.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Fengguang Wu <fengguang.wu@...el.com>,
	Joern Engel <joern@...fs.org>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	Mel Gorman <mgorman@...e.de>,
	Johannes Weiner <hannes@...xchg.org>,
	Michal Hocko <mhocko@...e.cz>,
	Shachar Raindel <raindel@...lanox.com>,
	Boaz Harrosh <boaz@...xistor.com>,
	Andy Lutomirski <luto@...capital.net>,
	Joonsoo Kim <iamjoonsoo.kim@....com>,
	Andrey Ryabinin <a.ryabinin@...sung.com>,
	Roman Pen <r.peniaev@...il.com>,
	Andrey Konovalov <adech.fo@...il.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Dmitry Vyukov <dvyukov@...gle.com>,
	Rob Jones <rob.jones@...ethink.co.uk>,
	WANG Chao <chaowang@...hat.com>,
	open list <linux-kernel@...r.kernel.org>,
	"open list:MEMORY MANAGEMENT" <linux-mm@...ck.org>
Cc:	Joern Engel <joern@...estorage.com>,
	Spencer Baugh <Spencer.baugh@...estorage.com>
Subject: Re: [PATCH] mm: add resched points to
 remap_pmd_range/ioremap_pmd_range

On Thu, 2015-07-23 at 14:54 -0700, Spencer Baugh wrote:
> From: Joern Engel <joern@...fs.org>
> 
> Mapping large memory spaces can be slow and prevent high-priority
> realtime threads from preempting lower-priority threads for a long time.

Yes, and one of the goals of large-page ioremap support is to address
exactly this problem.

> In my case it was a 256GB mapping causing at least 950ms scheduler
> delay.  Problem detection is ratelimited and depends on interrupts
> happening at the right time, so actual delay is likely worse.

ioremap supports 1GB and 2MB mappings now.  If you create 1GB mappings, you
need to initialize only 256 pud entries, which should not take a long time.

Is the 256GB range aligned to 1GB (or 2MB)?  From the log below, it appears
that you ended up with 4KB mappings, which is the problem.

> ------------[ cut here ]------------
> WARNING: at arch/x86/kernel/irq.c:182 do_IRQ+0x126/0x140()
> Thread not rescheduled for 36 jiffies
> CPU: 14 PID: 6684 Comm: foo Tainted: G           O 3.10.59+
>  0000000000000009 ffff883f7fbc3ee0 ffffffff8163a12c ffff883f7fbc3f18
>  ffffffff8103f131 ffff887f48275ac0 0000000000000012 000000000000007c
>  0000000000000000 ffff887f5bc11fd8 ffff883f7fbc3f78 ffffffff8103f19c
> Call Trace:
>  <IRQ>  [<ffffffff8163a12c>] dump_stack+0x19/0x1b
>  [<ffffffff8103f131>] warn_slowpath_common+0x61/0x80
>  [<ffffffff8103f19c>] warn_slowpath_fmt+0x4c/0x50
>  [<ffffffff810bd917>] ? rcu_irq_exit+0x77/0xc0
>  [<ffffffff8164a556>] do_IRQ+0x126/0x140
>  [<ffffffff816407ef>] common_interrupt+0x6f/0x6f
>  <EOI>  [<ffffffff810fde68>] ? set_pageblock_migratetype+0x28/0x30
>  [<ffffffff8126da37>] ? clear_page_c_e+0x7/0x10
>  [<ffffffff811004b3>] ? get_page_from_freelist+0x5b3/0x880
>  [<ffffffff81100863>] __alloc_pages_nodemask+0xe3/0x810
>  [<ffffffff8126f48b>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>  [<ffffffff81138206>] alloc_pages_current+0x86/0x120
>  [<ffffffff810fc02e>] __get_free_pages+0xe/0x50
>  [<ffffffff81034e85>] pte_alloc_one_kernel+0x15/0x20
>  [<ffffffff8111b6cd>] __pte_alloc_kernel+0x1d/0xf0

This shows that you created 4KB (pte) mappings.

>  [<ffffffff8126531c>] ioremap_page_range+0x2cc/0x320
>  [<ffffffff81031619>] __ioremap_caller+0x1e9/0x2b0
>  [<ffffffff810316f7>] ioremap_nocache+0x17/0x20
>  [<ffffffff81275b45>] pci_iomap+0x55/0xb0
>  [<ffffffffa007f29a>] vfio_pci_mmap+0x1ea/0x210 [vfio_pci]
>  [<ffffffffa0025173>] vfio_device_fops_mmap+0x23/0x30 [vfio]
>  [<ffffffff81124ed8>] mmap_region+0x3d8/0x5e0
>  [<ffffffff811253e5>] do_mmap_pgoff+0x305/0x3c0
>  [<ffffffff8126f3f3>] ? call_rwsem_down_write_failed+0x13/0x20
>  [<ffffffff81111677>] vm_mmap_pgoff+0x67/0xa0
>  [<ffffffff811237e2>] SyS_mmap_pgoff+0x272/0x2e0
>  [<ffffffff810067e2>] SyS_mmap+0x22/0x30
>  [<ffffffff81648c59>] system_call_fastpath+0x16/0x1b
> ---[ end trace 6b0a8d2341444bdd ]---
> ------------[ cut here ]------------
> WARNING: at arch/x86/kernel/irq.c:182 do_IRQ+0x126/0x140()
> Thread not rescheduled for 95 jiffies
> CPU: 14 PID: 6684 Comm: foo Tainted: G        W  O 3.10.59+
>  0000000000000009 ffff883f7fbc3ee0 ffffffff8163a12c ffff883f7fbc3f18
>  ffffffff8103f131 ffff887f48275ac0 000000000000002f 000000000000007c
>  0000000000000000 00007fadd1e00000 ffff883f7fbc3f78 ffffffff8103f19c
> Call Trace:
>  <IRQ>  [<ffffffff8163a12c>] dump_stack+0x19/0x1b
>  [<ffffffff8103f131>] warn_slowpath_common+0x61/0x80
>  [<ffffffff8103f19c>] warn_slowpath_fmt+0x4c/0x50
>  [<ffffffff810bd917>] ? rcu_irq_exit+0x77/0xc0
>  [<ffffffff8164a556>] do_IRQ+0x126/0x140
>  [<ffffffff816407ef>] common_interrupt+0x6f/0x6f
>  <EOI>  [<ffffffff81640483>] ? _raw_spin_lock+0x13/0x30
>  [<ffffffff8111b621>] __pte_alloc+0x31/0xc0
>  [<ffffffff8111feac>] remap_pfn_range+0x45c/0x470

remap_pfn_range() does not have large-page mapping support yet.  So, yes,
this can still take a long time at this point.  We can extend large-page
support to this interface if necessary.

>  [<ffffffffa007f1f8>] vfio_pci_mmap+0x148/0x210 [vfio_pci]
>  [<ffffffffa0025173>] vfio_device_fops_mmap+0x23/0x30 [vfio]
>  [<ffffffff81124ed8>] mmap_region+0x3d8/0x5e0
>  [<ffffffff811253e5>] do_mmap_pgoff+0x305/0x3c0
>  [<ffffffff8126f3f3>] ? call_rwsem_down_write_failed+0x13/0x20
>  [<ffffffff81111677>] vm_mmap_pgoff+0x67/0xa0
>  [<ffffffff811237e2>] SyS_mmap_pgoff+0x272/0x2e0
>  [<ffffffff810067e2>] SyS_mmap+0x22/0x30
>  [<ffffffff81648c59>] system_call_fastpath+0x16/0x1b
> ---[ end trace 6b0a8d2341444bde ]---
> ------------[ cut here ]------------
> WARNING: at arch/x86/kernel/irq.c:182 do_IRQ+0x126/0x140()
> Thread not rescheduled for 45 jiffies
> CPU: 18 PID: 21726 Comm: foo Tainted: G           O 3.10.59+
>  0000000000000009 ffff88203f203ee0 ffffffff8163a13c ffff88203f203f18
>  ffffffff8103f131 ffff881ec5f1ad60 0000000000000016 000000000000006e
>  0000000000000000 ffffc939a6dd8000 ffff88203f203f78 ffffffff8103f19c
> Call Trace:
>  <IRQ>  [<ffffffff8163a13c>] dump_stack+0x19/0x1b
>  [<ffffffff8103f131>] warn_slowpath_common+0x61/0x80
>  [<ffffffff8103f19c>] warn_slowpath_fmt+0x4c/0x50
>  [<ffffffff810bd917>] ? rcu_irq_exit+0x77/0xc0
>  [<ffffffff8164a556>] do_IRQ+0x126/0x140
>  [<ffffffff816407ef>] common_interrupt+0x6f/0x6f
>  <EOI>  [<ffffffff81640861>] ? retint_restore_args+0x13/0x13
>  [<ffffffff810346c7>] ? free_memtype+0x87/0x150
>  [<ffffffff8112bb46>] ? vunmap_page_range+0x1e6/0x2a0
>  [<ffffffff8112c5e1>] remove_vm_area+0x51/0x70
>  [<ffffffff810318a7>] iounmap+0x67/0xa0

iounmap() should be fast if you created 1GB mappings.

Thanks,
-Toshi

>  [<ffffffff812757e5>] pci_iounmap+0x35/0x40
>  [<ffffffffa00973da>] vfio_pci_release+0x9a/0x150 [vfio_pci]
>  [<ffffffffa0065cbc>] vfio_device_fops_release+0x1c/0x40 [vfio]
>  [<ffffffff8114d82b>] __fput+0xdb/0x220
>  [<ffffffff8114d97e>] ____fput+0xe/0x10
>  [<ffffffff810614ac>] task_work_run+0xbc/0xe0
>  [<ffffffff81043d0e>] do_exit+0x3ce/0xe50
>  [<ffffffff8104557f>] do_group_exit+0x3f/0xa0
>  [<ffffffff81054769>] get_signal_to_deliver+0x1a9/0x5b0
>  [<ffffffff810023f8>] do_signal+0x48/0x5e0
>  [<ffffffff81056778>] ? k_getrusage+0x368/0x3d0
>  [<ffffffff810736e2>] ? default_wake_function+0x12/0x20
>  [<ffffffff816471c0>] ? kprobe_flush_task+0xc0/0x150
>  [<ffffffff81070684>] ? finish_task_switch+0xc4/0xe0
>  [<ffffffff810029f5>] do_notify_resume+0x65/0x80
>  [<ffffffff8164098e>] retint_signal+0x4d/0x9f
> ---[ end trace 3506c05e4a0af3e5 ]---

