[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20191028.161817.126838643568293118.davem@davemloft.net>
Date: Mon, 28 Oct 2019 16:18:17 -0700 (PDT)
From: David Miller <davem@...emloft.net>
To: tj@...nel.org
Cc: netdev@...r.kernel.org, kernel-team@...com,
linux-kernel@...r.kernel.org, josef@...icpanda.com,
eric.dumazet@...il.com, jakub.kicinski@...ronome.com,
hannes@...xchg.org, linux-mm@...ck.org, mgorman@...e.de,
akpm@...ux-foundation.org
Subject: Re: [PATCH v2] net: fix sk_page_frag() recursion from memory
reclaim
From: Tejun Heo <tj@...nel.org>
Date: Thu, 24 Oct 2019 13:50:27 -0700
> sk_page_frag() optimizes skb_frag allocations by using per-task
> skb_frag cache when it knows it's the only user. The condition is
> determined by seeing whether the socket allocation mask allows
> blocking - if the allocation may block, it obviously owns the task's
> context and ergo exclusively owns current->task_frag.
>
> Unfortunately, this misses recursion through memory reclaim path.
> Please take a look at the following backtrace.
...
> In [0], tcp_send_msg_locked() was using current->page_frag when it
> called sk_wmem_schedule(). It already calculated how many bytes can
> be fit into current->page_frag. Due to memory pressure,
> sk_wmem_schedule() called into memory reclaim path which called into
> xfs and then IO issue path. Because the filesystem in question is
> backed by nbd, the control goes back into the tcp layer - back into
> tcp_sendmsg_locked().
>
> nbd sets sk_allocation to (GFP_NOIO | __GFP_MEMALLOC) which makes
> sense - it's in the process of freeing memory and wants to be able to,
> e.g., drop clean pages to make forward progress. However, this
> confused sk_page_frag() called from [2]. Because it only tests
> whether the allocation allows blocking which it does, it now thinks
> current->page_frag can be used again although it already was being
> used in [0].
>
> After [2] used current->page_frag, the offset would be increased by
> the used amount. When the control returns to [0],
> current->page_frag's offset is increased and the previously calculated
> number of bytes now may overrun the end of allocated memory leading to
> silent memory corruptions.
>
> Fix it by adding gfpflags_normal_context() which tests sleepable &&
> !reclaim and use it to determine whether to use current->task_frag.
>
> v2: Eric didn't like gfp flags being tested twice. Introduce a new
> helper gfpflags_normal_context() and combine the two tests.
>
> Signed-off-by: Tejun Heo <tj@...nel.org>
Applied and queued up for -stable, thanks Tejun.
Powered by blists - more mailing lists