Date:   Thu, 25 Apr 2019 16:15:10 +0800
From:   Ming Lei <tom.leiming@...il.com>
To:     Qian Cai <cai@....pw>
Cc:     Jens Axboe <axboe@...nel.dk>, Christoph Hellwig <hch@....de>,
        linux-block <linux-block@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        Dan Williams <dan.j.williams@...el.com>
Subject: Re: bio_iov_iter_get_pages() + page_alloc.shuffle=1 migrating failures

On Thu, Apr 25, 2019 at 4:13 PM Qian Cai <cai@....pw> wrote:
>
> Memory offline [1] starts to fail on linux-next on ppc64le with
> page_alloc.shuffle=1 where the "echo offline" command hangs with lots of
> migrating failures below. It seems in migrate_page_move_mapping()
>
>         if (!mapping) {
>                 /* Anonymous page without mapping */
>                 if (page_count(page) != expected_count)
>                         return -EAGAIN;
>
> It expected count=1 but actual count=2.
>
> There are two ways to make the problem go away. One is to remove this line in
> __shuffle_free_memory(),
>
>         shuffle_zone(z);
>
> The other is reverting some bio commits. Bisecting so far indicates the culprit
> is in one of those (the 3rd commit looks more suspicious than the others).
>
> block: only allow contiguous page structs in a bio_vec
> block: don't allow multiple bio_iov_iter_get_pages calls per bio
> block: change how we get page references in bio_iov_iter_get_pages
>
> [  446.578064] migrating pfn 2003d5eaa failed ret:22
> [  446.578066] page:c00a00800f57aa80 count:2 mapcount:0 mapping:c000001db4c827e9 index:0x13c08a
> [  446.578220] anon
> [  446.578222] flags: 0x83fffc00008002e(referenced|uptodate|dirty|active|swapbacked)
> [  446.578347] raw: 083fffc00008002e c00a00800f57f808 c00a00800f579f88 c000001db4c827e9
> [  446.944807] raw: 000000000013c08a 0000000000000000 00000002ffffffff c00020141a738008
> [  446.944883] page dumped because: migration failure
> [  446.944948] page->mem_cgroup:c00020141a738008
> [  446.945024] page allocated via order 0, migratetype Movable, gfp_mask 0x100cca(GFP_HIGHUSER_MOVABLE)
> [  446.945148]  prep_new_page+0x390/0x3a0
> [  446.945228]  get_page_from_freelist+0xd9c/0x1bf0
> [  446.945292]  __alloc_pages_nodemask+0x1cc/0x1780
> [  446.945335]  alloc_pages_vma+0xc0/0x360
> [  446.945401]  do_anonymous_page+0x244/0xb20
> [  446.945472]  __handle_mm_fault+0xcf8/0xfb0
> [  446.945532]  handle_mm_fault+0x1c0/0x2b0
> [  446.945615]  __get_user_pages+0x3ec/0x690
> [  446.945652]  get_user_pages_unlocked+0x104/0x2f0
> [  446.945693]  get_user_pages_fast+0xb0/0x200
> [  446.945762]  iov_iter_get_pages+0xf4/0x6a0
> [  446.945802]  bio_iov_iter_get_pages+0xc0/0x450
> [  446.945876]  blkdev_direct_IO+0x2e0/0x630
> [  446.945941]  generic_file_read_iter+0xbc/0x230
> [  446.945990]  blkdev_read_iter+0x50/0x80
> [  446.946031]  aio_read+0x128/0x1d0
> [  446.946082] migrating pfn 2003d5fe0 failed ret:22
> [  446.946084] page:c00a00800f57f800 count:2 mapcount:0 mapping:c000001db4c827e9 index:0x13c19e
> [  446.946239] anon
> [  446.946241] flags: 0x83fffc00008002e(referenced|uptodate|dirty|active|swapbacked)
> [  446.946384] raw: 083fffc00008002e c000200deb3dfa28 c00a00800f57aa88 c000001db4c827e9
> [  446.946497] raw: 000000000013c19e 0000000000000000 00000002ffffffff c00020141a738008
> [  446.946605] page dumped because: migration failure
> [  446.946662] page->mem_cgroup:c00020141a738008
> [  446.946724] page allocated via order 0, migratetype Movable, gfp_mask 0x100cca(GFP_HIGHUSER_MOVABLE)
> [  446.946846]  prep_new_page+0x390/0x3a0
> [  446.946899]  get_page_from_freelist+0xd9c/0x1bf0
> [  446.946959]  __alloc_pages_nodemask+0x1cc/0x1780
> [  446.947047]  alloc_pages_vma+0xc0/0x360
> [  446.947101]  do_anonymous_page+0x244/0xb20
> [  446.947143]  __handle_mm_fault+0xcf8/0xfb0
> [  446.947200]  handle_mm_fault+0x1c0/0x2b0
> [  446.947256]  __get_user_pages+0x3ec/0x690
> [  446.947306]  get_user_pages_unlocked+0x104/0x2f0
> [  446.947366]  get_user_pages_fast+0xb0/0x200
> [  446.947458]  iov_iter_get_pages+0xf4/0x6a0
> [  446.947515]  bio_iov_iter_get_pages+0xc0/0x450
> [  446.947588]  blkdev_direct_IO+0x2e0/0x630
> [  446.947636]  generic_file_read_iter+0xbc/0x230
> [  446.947703]  blkdev_read_iter+0x50/0x80
> [  446.947758]  aio_read+0x128/0x1d0
>
> [1]
> i=0
> found=0
> for mem in $(ls -d /sys/devices/system/memory/memory*); do
>         i=$((i + 1))
>         echo "iteration: $i"
>         echo offline > $mem/state
>         if [ $? -eq 0 ] && [ $found -eq 0 ]; then
>                 found=1
>                 continue
>         fi
>         echo online > $mem/state
> done

Please try the following patch:

https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=for-5.2/block&id=0257c0ed5ea3de3e32cb322852c4c40bc09d1b97
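
To compare a run with the patch against one without it, a small helper along these lines could summarize the migration failures in the kernel log. This is only a sketch: the function names are made up for illustration, and it assumes the messages keep the "migrating pfn <hex> failed" format seen in the quoted log.

```shell
#!/bin/sh
# Hypothetical helpers (not part of any kernel tooling). Both read kernel
# log text on stdin and assume the "migrating pfn <hex> failed" message
# format from the quoted report.

# Print the number of migration-failure lines.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# helper usable under "set -e"; the "0" is still printed.
count_migration_failures() {
    grep -c 'migrating pfn [0-9a-f]* failed' || true
}

# Print each distinct pfn that failed to migrate, one per line, sorted.
failed_pfns() {
    sed -n 's/.*migrating pfn \([0-9a-f]*\) failed.*/\1/p' | sort -u
}
```

For example, "dmesg | count_migration_failures" after the offline loop gives a single number to compare between the two kernels, and "dmesg | failed_pfns" shows whether the same pages keep failing.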

Thanks,
Ming Lei
