Message-ID: <CAMZfGtVOOBWeQ0yfZikYdW_Bt1Dd+1ZR13UZK6VDnaaDAOzhkw@mail.gmail.com>
Date:   Thu, 29 Apr 2021 12:05:49 +0800
From:   Muchun Song <songmuchun@...edance.com>
To:     Mike Kravetz <mike.kravetz@...cle.com>
Cc:     Jonathan Corbet <corbet@....net>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, bp@...en8.de,
        X86 ML <x86@...nel.org>, hpa@...or.com,
        dave.hansen@...ux.intel.com, luto@...nel.org,
        Peter Zijlstra <peterz@...radead.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Andrew Morton <akpm@...ux-foundation.org>, paulmck@...nel.org,
        pawan.kumar.gupta@...ux.intel.com,
        Randy Dunlap <rdunlap@...radead.org>, oneukum@...e.com,
        anshuman.khandual@....com, jroedel@...e.de,
        Mina Almasry <almasrymina@...gle.com>,
        David Rientjes <rientjes@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        Oscar Salvador <osalvador@...e.de>,
        Michal Hocko <mhocko@...e.com>,
        "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>,
        David Hildenbrand <david@...hat.com>,
        HORIGUCHI NAOYA(堀口 直也) 
        <naoya.horiguchi@....com>,
        Joao Martins <joao.m.martins@...cle.com>,
        Xiongchun duan <duanxiongchun@...edance.com>,
        fam.zheng@...edance.com, zhengqi.arch@...edance.com,
        linux-doc@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [External] Re: [PATCH v21 6/9] mm: hugetlb: alloc the vmemmap
 pages associated with each HugeTLB page

On Thu, Apr 29, 2021 at 10:43 AM Mike Kravetz <mike.kravetz@...cle.com> wrote:
>
> On 4/25/21 12:07 AM, Muchun Song wrote:
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index d523a345dc86..d3abaaec2a22 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -525,6 +525,7 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> >   *   code knows it has only reference.  All other examinations and
> >   *   modifications require hugetlb_lock.
> >   * HPG_freed - Set when page is on the free lists.
> > + * HPG_vmemmap_optimized - Set when the vmemmap pages of the page are freed.
> >   *   Synchronization: hugetlb_lock held for examination and modification.
> >   */
> >  enum hugetlb_page_flags {
> > @@ -532,6 +533,7 @@ enum hugetlb_page_flags {
> >       HPG_migratable,
> >       HPG_temporary,
> >       HPG_freed,
> > +     HPG_vmemmap_optimized,
> >       __NR_HPAGEFLAGS,
> >  };
> >
> > @@ -577,6 +579,7 @@ HPAGEFLAG(RestoreReserve, restore_reserve)
> >  HPAGEFLAG(Migratable, migratable)
> >  HPAGEFLAG(Temporary, temporary)
> >  HPAGEFLAG(Freed, freed)
> > +HPAGEFLAG(VmemmapOptimized, vmemmap_optimized)
> >
> >  #ifdef CONFIG_HUGETLB_PAGE
> >
>
> During migration, the page->private field of the original page may be
> cleared.  This will clear all hugetlb specific flags.  Prior to this
> new flag that was OK, as the only flag which could be set during migration
> was the Temporary flag and that is transferred to the target page.

I didn't realize this when I introduced the VmemmapOptimized flag.
Thanks for pointing this out. I will fix it.
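
To make the problem concrete, here is a minimal userspace sketch of what
happens to the flag, not kernel code. Everything prefixed with toy_ or TOY_
is hypothetical: struct toy_page stands in for struct page, the is_huge
field stands in for PageHuge(), and the bit positions are arbitrary. It only
assumes, as described above, that the hugetlb flags live in page->private.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only what the sketch needs. */
struct toy_page {
	unsigned long private;	/* hugetlb keeps its HPG_* bits here */
	bool is_huge;		/* stand-in for PageHuge(page) */
};

/* Illustrative bit numbers, not the kernel's enum values. */
enum toy_hpage_flags {
	TOY_HPG_TEMPORARY,
	TOY_HPG_VMEMMAP_OPTIMIZED,
};

static void toy_set_flag(struct toy_page *p, enum toy_hpage_flags f)
{
	p->private |= 1UL << f;
}

static bool toy_test_flag(const struct toy_page *p, enum toy_hpage_flags f)
{
	return (p->private & (1UL << f)) != 0;
}

/* Models the old behaviour: private is cleared unconditionally. */
static void toy_migrate_states_old(struct toy_page *page)
{
	page->private = 0;
}

/* Models the suggested change: keep private for hugetlb pages. */
static void toy_migrate_states_new(struct toy_page *page)
{
	if (!page->is_huge)
		page->private = 0;
}

/* Walks through both cases; call it from main() to run the checks. */
static void toy_demo(void)
{
	struct toy_page old_page = { .private = 0, .is_huge = true };
	struct toy_page new_page = { .private = 0, .is_huge = true };

	toy_set_flag(&old_page, TOY_HPG_VMEMMAP_OPTIMIZED);
	toy_set_flag(&new_page, TOY_HPG_VMEMMAP_OPTIMIZED);

	toy_migrate_states_old(&old_page);
	toy_migrate_states_new(&new_page);

	/* The flag is lost with the old code and kept with the guard. */
	assert(!toy_test_flag(&old_page, TOY_HPG_VMEMMAP_OPTIMIZED));
	assert(toy_test_flag(&new_page, TOY_HPG_VMEMMAP_OPTIMIZED));
}
```

With the unconditional clear, the free path can no longer tell that the
vmemmap pages of the source page were freed. The guarded version keeps the
flag for hugetlb pages and still clears private for everything else.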

>
> If the VmemmapOptimized flag is cleared in the original page, we
> will get an addressing exception as shown below.

Thanks for testing this.

>
> We should preserve page->private with something like this:
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index b234c3f3acb7..128e3e4126a2 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -625,7 +625,9 @@ void migrate_page_states(struct page *newpage, struct page *page)
>         if (PageSwapCache(page))
>                 ClearPageSwapCache(page);
>         ClearPagePrivate(page);
> -       set_page_private(page, 0);
> +       /* page->private contains hugetlb specific flags */
> +       if (!PageHuge(page))
> +               set_page_private(page, 0);
>
>         /*
>          * If any waiters have accumulated on the new page then
>
> --
> Mike Kravetz
>
>
> [  209.568110] BUG: unable to handle page fault for address: ffffea0004a5a000
> [  209.569417] #PF: supervisor write access in kernel mode
> [  209.570932] #PF: error_code(0x0003) - permissions violation
> [  209.572059] PGD 23fff8067 P4D 23fff8067 PUD 23fff7067 PMD 23ffd9067 PTE 800000021c98e061
> [  209.573679] Oops: 0003 [#1] SMP PTI
> [  209.574410] CPU: 1 PID: 1011 Comm: bash Not tainted 5.12.0-rc8-mm1+ #3
> [  209.575730] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
> [  209.577530] RIP: 0010:__update_and_free_page+0x58/0x2c0
> [  209.578618] Code: a3 01 00 00 49 b8 00 00 00 00 00 16 00 00 4c 89 e0 bf 01 00 00 00 49 b9 00 00 00 00 00 ea ff ff 4d 01 e0 49 c1 f8 06 83 c2 01 <48> 81 20 d4 5e ff ff 48 83 c0 40 f7 c2 ff 03 00 00 0f 84 f3 00 00
> [  209.582603] RSP: 0018:ffffc90001fdfa60 EFLAGS: 00010206
> [  209.583629] RAX: ffffea0004a5a000 RBX: 0000000000000000 RCX: 0000000000000009
> [  209.585148] RDX: 0000000000000081 RSI: 0000000000000200 RDI: 0000000000000001
> [  209.586649] RBP: ffffffff839ada30 R08: 0000000000129600 R09: ffffea0000000000
> [  209.588096] R10: 0000000000000001 R11: 0000000000000001 R12: ffffea0004a58000
> [  209.589643] R13: 0000000000000200 R14: ffffea0005ff8000 R15: ffffc90001fdfba0
> [  209.591194] FS:  00007f1e50065740(0000) GS:ffff888237d00000(0000) knlGS:0000000000000000
> [  209.592989] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  209.594222] CR2: ffffea0004a5a000 CR3: 000000018cd46004 CR4: 0000000000370ee0
> [  209.595762] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  209.597302] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  209.598925] Call Trace:
> [  209.599496]  migrate_pages+0xd8f/0x1030
> [  209.600372]  ? trace_event_raw_event_mm_migrate_pages_start+0xa0/0xa0
> [  209.601745]  ? alloc_migration_target+0x1c0/0x1c0
> [  209.602787]  alloc_contig_range+0x1e3/0x3d0
> [  209.603718]  cma_alloc+0x1ae/0x5f0
> [  209.604486]  alloc_fresh_huge_page+0x67/0x190
> [  209.605481]  alloc_pool_huge_page+0x72/0xf0
> [  209.606423]  set_max_huge_pages+0x128/0x2c0
> [  209.607369]  __nr_hugepages_store_common+0x3d/0xb0
> [  209.608442]  ? _kstrtoull+0x35/0xd0
> [  209.609225]  nr_hugepages_store+0x73/0x80
> [  209.610140]  kernfs_fop_write_iter+0x127/0x1c0
> [  209.611162]  new_sync_write+0x11f/0x1b0
> [  209.612069]  vfs_write+0x26f/0x380
> [  209.612880]  ksys_write+0x68/0xe0
> [  209.613628]  do_syscall_64+0x40/0x80
> [  209.614456]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  209.615589] RIP: 0033:0x7f1e50155ff8
> [  209.616474] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 77 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
> [  209.620629] RSP: 002b:00007ffd7e3f97c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [  209.622319] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1e50155ff8
> [  209.623966] RDX: 0000000000000002 RSI: 00005585ef557960 RDI: 0000000000000001
> [  209.625568] RBP: 00005585ef557960 R08: 000000000000000a R09: 00007f1e501e7e80
> [  209.627262] R10: 000000000000000a R11: 0000000000000246 R12: 00007f1e50229780
> [  209.628916] R13: 0000000000000002 R14: 00007f1e50224740 R15: 0000000000000002
> [  209.630457] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack rfkill nf_defrag_ipv6 nf_defrag_ipv4 ebtable_filter ebtables 9p ip6table_filter ip6_tables sunrpc snd_hda_codec_generic crct10dif_pclmul crc32_pclmul snd_hda_intel snd_intel_dspcfg ghash_clmulni_intel snd_hda_codec snd_hwdep joydev snd_hda_core snd_seq snd_seq_device snd_pcm virtio_balloon snd_timer snd soundcore 9pnet_virtio i2c_piix4 9pnet virtio_blk virtio_console virtio_net net_failover failover 8139too qxl drm_ttm_helper ttm drm_kms_helper crc32c_intel serio_raw drm 8139cp mii ata_generic virtio_pci virtio_pci_modern_dev virtio_ring pata_acpi virtio
> [  209.647105] CR2: ffffea0004a5a000
> [  209.647913] ---[ end trace 48e9b007521233a7 ]---
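
As a reading aid for the trace above, the following sketch decodes the two
values that matter most: the page fault error code (0x0003) and the PTE
(800000021c98e061). The bit positions are the architectural x86 ones. The
helper names (pf_info, decode_pf, pte_present, pte_writable) are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 page fault error code bits. */
#define PF_PROT		(1UL << 0)	/* 1: protection violation, 0: page not present */
#define PF_WRITE	(1UL << 1)	/* 1: write access, 0: read access */
#define PF_USER		(1UL << 2)	/* 1: user mode, 0: supervisor mode */

/* x86 page table entry bits used here. */
#define PTE_PRESENT	(1ULL << 0)
#define PTE_RW		(1ULL << 1)

struct pf_info {
	bool protection;
	bool write;
	bool user;
};

static struct pf_info decode_pf(unsigned long error_code)
{
	struct pf_info info = {
		.protection = (error_code & PF_PROT) != 0,
		.write = (error_code & PF_WRITE) != 0,
		.user = (error_code & PF_USER) != 0,
	};
	return info;
}

static bool pte_present(unsigned long long pte)
{
	return (pte & PTE_PRESENT) != 0;
}

static bool pte_writable(unsigned long long pte)
{
	return (pte & PTE_RW) != 0;
}

/* Checks the values copied from the oops; call it from main() to run. */
static void decode_demo(void)
{
	struct pf_info info = decode_pf(0x0003UL);

	/* Supervisor write that hit a protection violation. */
	assert(info.protection && info.write && !info.user);

	/* The PTE is present but its RW bit is clear. */
	assert(pte_present(0x800000021c98e061ULL));
	assert(!pte_writable(0x800000021c98e061ULL));
}
```

So the faulting address was mapped, but read-only, and the kernel tried to
write to it. That is consistent with writing to a struct page in a vmemmap
range that was never restored, although the trace alone does not prove how
the mapping ended up read-only.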
