Message-ID: <20180416155832.GB12015@roeck-us.net>
Date:   Mon, 16 Apr 2018 08:58:32 -0700
From:   Guenter Roeck <linux@...ck-us.net>
To:     Vitaly Wool <vitalywool@...il.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        mawilcox@...rosoft.com, asavery@...omium.org, gwendal@...omium.org
Subject: Re: Crashes/hung tasks with z3pool under memory pressure

On Mon, Apr 16, 2018 at 02:43:01PM +0200, Vitaly Wool wrote:
> Hey Guenter,
> 
> On 04/13/2018 07:56 PM, Guenter Roeck wrote:
> 
> >On Fri, Apr 13, 2018 at 05:40:18PM +0000, Vitaly Wool wrote:
> >>On Fri, Apr 13, 2018, 7:35 PM Guenter Roeck <linux@...ck-us.net> wrote:
> >>
> >>>On Fri, Apr 13, 2018 at 05:21:02AM +0000, Vitaly Wool wrote:
> >>>>Hi Guenter,
> >>>>
> >>>>
> >>>>On Fri, Apr 13, 2018 at 00:01, Guenter Roeck <linux@...ck-us.net> wrote:
> >>>>
> >>>>>Hi all,
> >>>>>we are observing crashes with z3fold under memory pressure. The kernel version
> >>>>>used to reproduce the problem is v4.16-11827-g5d1365940a68, but the problem was
> >>>>>also seen with v4.14-based kernels.
> >>>>
> >>>>just before I dig into this, could you please try reproducing the errors
> >>>>you see with https://patchwork.kernel.org/patch/10210459/ applied?
> >>>>
> >>>As mentioned above, I tested with v4.16-11827-g5d1365940a68, which already
> >>>includes this patch.
> >>>
> >>Bah. Sorry. Expect an update after the weekend.
> >>
> >NP; easy to miss. Thanks a lot for looking into it.
> >
> I wonder if the following patch would make a difference:
> 
> diff --git a/mm/z3fold.c b/mm/z3fold.c
> index c0bca6153b95..5e547c2d5832 100644
> --- a/mm/z3fold.c
> +++ b/mm/z3fold.c
> @@ -887,19 +887,21 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
>  				goto next;
>  		}
>  next:
> -		spin_lock(&pool->lock);
>  		if (test_bit(PAGE_HEADLESS, &page->private)) {
>  			if (ret == 0) {
> -				spin_unlock(&pool->lock);
>  				free_z3fold_page(page);
>  				return 0;
>  			}
> -		} else if (kref_put(&zhdr->refcount, release_z3fold_page)) {
> -			atomic64_dec(&pool->pages_nr);
> -			spin_unlock(&pool->lock);
> -			return 0;
> +		} else {
> +			spin_lock(&zhdr->page_lock);
> +			if (kref_put(&zhdr->refcount, release_z3fold_page_locked)) {
> +				atomic64_dec(&pool->pages_nr);
> +				return 0;
> +			}
> +			spin_unlock(&zhdr->page_lock);
>  		}
> +		spin_lock(&pool->lock);
>  		/*
>  		 * Add to the beginning of LRU.
>  		 * Pool lock has to be kept here to ensure the page has
> 
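The hunk above moves the kref_put() on a non-headless page from under pool->lock to under the per-page zhdr->page_lock. On the kref_put() success path it returns without calling spin_unlock(&zhdr->page_lock), which only balances if release_z3fold_page_locked() drops that lock itself. Its body is not part of the quoted diff, so that is an assumption here. A minimal user-space sketch of the pattern, with a pthread mutex standing in for the spinlock and a C11 atomic counter standing in for struct kref (all names hypothetical):

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct z3fold_header: refcount plus per-page lock. */
struct page_hdr {
	atomic_int refcount;
	pthread_mutex_t page_lock;
};

static struct page_hdr *page_hdr_new(int refs)
{
	struct page_hdr *h = malloc(sizeof(*h));

	if (!h)
		return NULL;
	atomic_init(&h->refcount, refs);
	pthread_mutex_init(&h->page_lock, NULL);
	return h;
}

/* Release callback, entered with page_lock held: it drops the lock, then frees. */
static void release_page_locked(struct page_hdr *h)
{
	assert(atomic_load(&h->refcount) == 0);
	pthread_mutex_unlock(&h->page_lock);
	pthread_mutex_destroy(&h->page_lock);
	free(h);
}

/* Mimics kref_put(): returns true if this call dropped the last reference. */
static bool ref_put(struct page_hdr *h, void (*release)(struct page_hdr *))
{
	if (atomic_fetch_sub(&h->refcount, 1) == 1) {
		release(h);
		return true;
	}
	return false;
}

/* Caller pattern from the hunk: lock, put, and unlock only if the page survived. */
static bool put_page_hdr(struct page_hdr *h)
{
	pthread_mutex_lock(&h->page_lock);
	if (ref_put(h, release_page_locked))
		return true;	/* the callback already unlocked and freed h */
	pthread_mutex_unlock(&h->page_lock);
	return false;
}
```

Calling put_page_hdr() twice on a header created with two references returns false and then true. On the second call the mutex is released inside the callback, which is the invariant the early `return 0` in the hunk depends on.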
No, it doesn't. Same crash.

BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
depth: 48  max: 48!
48 locks held by kswapd0/51:
 #0: 000000004d7a35a9 (&(&pool->lock)->rlock#3){+.+.}, at: z3fold_zpool_shrink+0x47/0x3e0
 #1: 000000007739f49e (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
 #2: 00000000ff6cd4c8 (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
 #3: 000000004cffc6cb (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
...
CPU: 0 PID: 51 Comm: kswapd0 Not tainted 4.17.0-rc1-yocto-standard+ #11
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
Call Trace:
 dump_stack+0x67/0x9b
 __lock_acquire+0x429/0x18f0
 ? __lock_acquire+0x2af/0x18f0
 ? __lock_acquire+0x2af/0x18f0
 ? lock_acquire+0x93/0x230
 lock_acquire+0x93/0x230
 ? z3fold_zpool_shrink+0xb7/0x3e0
 _raw_spin_trylock+0x65/0x80
 ? z3fold_zpool_shrink+0xb7/0x3e0
 ? z3fold_zpool_shrink+0x47/0x3e0
 z3fold_zpool_shrink+0xb7/0x3e0
 zswap_frontswap_store+0x180/0x7c0
...
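The lockdep splat reports 48 held locks: #0 is pool->lock, and the visible entries after it are all zhdr->page_lock, each taken via _raw_spin_trylock at z3fold_zpool_shrink+0xb7. One plausible reading is that a page lock taken in the shrink loop is not released on some path, so each iteration adds one entry until the depth reaches lockdep's limit of 48. A hypothetical model of that accumulation (not lockdep's actual code; MAX_LOCK_DEPTH mirrors the "max: 48" in the report, and the loop is an illustration, not z3fold's):

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LOCK_DEPTH 48	/* mirrors "max: 48" in the report */

/* Hypothetical per-task held-lock bookkeeping, loosely modeled on lockdep. */
struct held_locks {
	int depth;
	bool overflowed;	/* set once the validator would turn itself off */
};

static void model_acquire(struct held_locks *hl)
{
	if (hl->overflowed)
		return;		/* validator already disabled, stop tracking */
	hl->depth++;
	if (hl->depth >= MAX_LOCK_DEPTH)
		hl->overflowed = true;	/* "BUG: MAX_LOCK_DEPTH too low!" */
}

static void model_release(struct held_locks *hl)
{
	if (hl->overflowed)
		return;
	assert(hl->depth > 0);	/* releasing a lock that is not held is a bug */
	hl->depth--;
}

/*
 * Hypothetical shrink loop: take the pool lock once, then one page lock per
 * iteration. With leak == true the page lock is never released.
 */
static struct held_locks run_shrink_loop(int iterations, bool leak)
{
	struct held_locks hl = { 0, false };

	model_acquire(&hl);		/* pool->lock, entry #0 */
	for (int i = 0; i < iterations; i++) {
		model_acquire(&hl);	/* zhdr->page_lock via trylock */
		if (!leak)
			model_release(&hl);
	}
	model_release(&hl);		/* pool->lock */
	return hl;
}
```

With leak set, run_shrink_loop(100, true) stops at depth 48 with overflowed set, after 47 page-lock acquisitions on top of the pool lock, matching the "48 locks held" count; the balanced variant ends at depth 0.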
BUG: sleeping function called from invalid context at mm/page_alloc.c:4320
in_atomic(): 1, irqs_disabled(): 0, pid: 51, name: kswapd0
INFO: lockdep is turned off.
Preemption disabled at:
[<0000000000000000>]           (null)
CPU: 0 PID: 51 Comm: kswapd0 Not tainted 4.17.0-rc1-yocto-standard+ #11
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
Call Trace:
 dump_stack+0x67/0x9b
 ___might_sleep+0x16c/0x250
 __alloc_pages_nodemask+0x1e7/0x1490
 ? lock_acquire+0x93/0x230
 ? lock_acquire+0x93/0x230
 __read_swap_cache_async+0x14d/0x260
 zswap_writeback_entry+0xdb/0x340
 z3fold_zpool_shrink+0x2b1/0x3e0
 zswap_frontswap_store+0x180/0x7c0
 ? page_vma_mapped_walk+0x22/0x230
 __frontswap_store+0x6e/0xf0
 swap_writepage+0x49/0x70
...
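The second splat shows in_atomic(): 1 when zswap_writeback_entry(), called from z3fold_zpool_shrink(), reaches __alloc_pages_nodemask() through __read_swap_cache_async(). The allocation, which may sleep, therefore runs while the task is still in atomic context, for example with a spinlock held (spin_lock() disables preemption). A hypothetical model of that check, with a plain counter in place of the kernel's preempt count and made-up helper names:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the preempt count; non-zero means atomic context. */
static int preempt_depth;
static int sleep_in_atomic_reports;

static void model_spin_lock(void)
{
	preempt_depth++;	/* spin_lock() disables preemption */
}

static void model_spin_unlock(void)
{
	assert(preempt_depth > 0);
	preempt_depth--;
}

/* Rough analogue of might_sleep(): count calls made from atomic context. */
static void model_might_sleep(void)
{
	if (preempt_depth > 0)
		sleep_in_atomic_reports++;	/* "sleeping function called from invalid context" */
}

/* Hypothetical page allocation that is allowed to block. */
static void model_alloc_page(void)
{
	model_might_sleep();
}

/* Hypothetical writeback step, entered with or without the page lock dropped. */
static void model_writeback(bool lock_still_held)
{
	model_spin_lock();
	if (!lock_still_held)
		model_spin_unlock();
	model_alloc_page();	/* cf. zswap_writeback_entry() -> __alloc_pages_nodemask() */
	if (lock_still_held)
		model_spin_unlock();
}
```

Only the call made with the lock still held is counted, mirroring the in_atomic(): 1, irqs_disabled(): 0 state in the report.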

This is with your patch applied on top of v4.17-rc1.

Guenter
