linux-kernel - Re: [PATCH v2] net: fix sk_page_frag() recursion from memory reclaim

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20191028.161817.126838643568293118.davem@davemloft.net>
Date:   Mon, 28 Oct 2019 16:18:17 -0700 (PDT)
From:   David Miller <davem@...emloft.net>
To:     tj@...nel.org
Cc:     netdev@...r.kernel.org, kernel-team@...com,
        linux-kernel@...r.kernel.org, josef@...icpanda.com,
        eric.dumazet@...il.com, jakub.kicinski@...ronome.com,
        hannes@...xchg.org, linux-mm@...ck.org, mgorman@...e.de,
        akpm@...ux-foundation.org
Subject: Re: [PATCH v2] net: fix sk_page_frag() recursion from memory
 reclaim

From: Tejun Heo <tj@...nel.org>
Date: Thu, 24 Oct 2019 13:50:27 -0700

> sk_page_frag() optimizes skb_frag allocations by using per-task
> skb_frag cache when it knows it's the only user.  The condition is
> determined by seeing whether the socket allocation mask allows
> blocking - if the allocation may block, it obviously owns the task's
> context and ergo exclusively owns current->task_frag.
> 
> Unfortunately, this misses recursion through memory reclaim path.
> Please take a look at the following backtrace.
 ...
> In [0], tcp_send_msg_locked() was using current->page_frag when it
> called sk_wmem_schedule().  It already calculated how many bytes can
> be fit into current->page_frag.  Due to memory pressure,
> sk_wmem_schedule() called into memory reclaim path which called into
> xfs and then IO issue path.  Because the filesystem in question is
> backed by nbd, the control goes back into the tcp layer - back into
> tcp_sendmsg_locked().
> 
> nbd sets sk_allocation to (GFP_NOIO | __GFP_MEMALLOC) which makes
> sense - it's in the process of freeing memory and wants to be able to,
> e.g., drop clean pages to make forward progress.  However, this
> confused sk_page_frag() called from [2].  Because it only tests
> whether the allocation allows blocking which it does, it now thinks
> current->page_frag can be used again although it already was being
> used in [0].
> 
> After [2] used current->page_frag, the offset would be increased by
> the used amount.  When the control returns to [0],
> current->page_frag's offset is increased and the previously calculated
> number of bytes now may overrun the end of allocated memory leading to
> silent memory corruptions.
> 
> Fix it by adding gfpflags_normal_context() which tests sleepable &&
> !reclaim and use it to determine whether to use current->task_frag.
> 
> v2: Eric didn't like gfp flags being tested twice.  Introduce a new
>     helper gfpflags_normal_context() and combine the two tests.
> 
> Signed-off-by: Tejun Heo <tj@...nel.org>

Applied and queued up for -stable, thanks Tejun.