netdev - Re: [PATCH] SUNRPC: Fix svc_flush

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <EFDE7D43-5E0D-4484-9B8C-CBED30EA4E04@oracle.com>
Date:   Tue, 22 Sep 2020 10:14:03 -0400
From:   Chuck Lever <chuck.lever@...cle.com>
To:     He Zhe <zhe.he@...driver.com>
Cc:     Linux NFS Mailing List <linux-nfs@...r.kernel.org>,
        "open list:NETWORKING DRIVERS" <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Bruce Fields <bfields@...ldses.org>,
        Trond Myklebust <trond.myklebust@...merspace.com>,
        Anna Schumaker <anna.schumaker@...app.com>,
        "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>
Subject: Re: [PATCH] SUNRPC: Fix svc_flush_dcache()



> On Sep 22, 2020, at 3:13 AM, He Zhe <zhe.he@...driver.com> wrote:
> 
> 
> 
> On 9/21/20 3:51 AM, Chuck Lever wrote:
>> On platforms that implement flush_dcache_page(), a large NFS WRITE
>> triggers the WARN_ONCE in bvec_iter_advance():
>> 
>> Sep 20 14:01:05 klimt.1015granger.net kernel: Attempted to advance past end of bvec iter
>> Sep 20 14:01:05 klimt.1015granger.net kernel: WARNING: CPU: 0 PID: 1032 at include/linux/bvec.h:101 bvec_iter_advance.isra.0+0xa7/0x158 [sunrpc]
>> 
>> Sep 20 14:01:05 klimt.1015granger.net kernel: Call Trace:
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  svc_tcp_recvfrom+0x60c/0x12c7 [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? bvec_iter_advance.isra.0+0x158/0x158 [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? del_timer_sync+0x4b/0x55
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? test_bit+0x1d/0x27 [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  svc_recv+0x1193/0x15e4 [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? try_to_freeze.isra.0+0x6f/0x6f [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? refcount_sub_and_test.constprop.0+0x13/0x40 [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? svc_xprt_put+0x1e/0x29f [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? svc_send+0x39f/0x3c1 [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  nfsd+0x282/0x345 [nfsd]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? __kthread_parkme+0x74/0xba
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  kthread+0x2ad/0x2bc
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? nfsd_destroy+0x124/0x124 [nfsd]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? test_bit+0x1d/0x27
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? kthread_mod_delayed_work+0x115/0x115
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ret_from_fork+0x22/0x30
>> 
>> Reported-by: He Zhe <zhe.he@...driver.com>
>> Fixes: ca07eda33e01 ("SUNRPC: Refactor svc_recvfrom()")
>> Signed-off-by: Chuck Lever <chuck.lever@...cle.com>
>> ---
>> net/sunrpc/svcsock.c |    2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> Hi Zhe-
>> 
>> If you confirm this fixes your issue and there are no other
>> objections or regressions, I can submit this for v5.9-rc.
> 
> I don't quite get why we add "seek" to "size". It seems this action does not
> reflect the actual scenario and forcedly neutralizes the WARN_ONCE check in
> bvec_iter_advance, so that it may "advance past end of bvec iter" and thus
> introduces overflow.

> Why don't we avoid this problem at the very begginning like my v1? That is, call
> svc_flush_bvec only when we have received more than we want to seek.
> 
>         len = sock_recvmsg(svsk->sk_sock, &msg, MSG_DONTWAIT);
> -       if (len > 0)
> +       if (len > 0 && (size_t)len > (seek & PAGE_MASK))
>                 svc_flush_bvec(bvec, len, seek);

Because this doesn't fix the underlying bug that triggered the
WARN_ONCE.

svc_tcp_recvfrom() attempts to assemble a possibly large RPC Call
from a sequence of sock_recvmsg's.

@seek is the running number of bytes that has been received so
far for the RPC Call we are assembling. @size is the number of
bytes that was just received in the most recent sock_recvmsg.

We want svc_flush_bvec to flush just the area of @bvec that
hasn't been flushed yet.

Thus: the current size of the partial Call message in @bvec is
@seek + @size. The starting location of the flush is
@seek & PAGE_MASK. This aligns the flush so it starts on a page
boundary.

This:

 230         struct bvec_iter bi = {
 231                 .bi_size        = size + seek,
 232         };

 235         bvec_iter_advance(bvec, &bi, seek & PAGE_MASK);

advances the bvec_iter to the part of @bvec that hasn't been
flushed yet.

This loop:

 236         for_each_bvec(bv, bvec, bi, bi)
 237                 flush_dcache_page(bv.bv_page);

flushes each page starting at that point to the end of the bytes
that have been received so far

In other words, ca07eda33e01 was wrong because it always flushed
the first section of @bvec, never the later parts of it.


> Regards,
> Zhe
> 
>> 
>> 
>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>> index d5805fa1d066..c2752e2b9ce3 100644
>> --- a/net/sunrpc/svcsock.c
>> +++ b/net/sunrpc/svcsock.c
>> @@ -228,7 +228,7 @@ static int svc_one_sock_name(struct svc_sock *svsk, char *buf, int remaining)
>> static void svc_flush_bvec(const struct bio_vec *bvec, size_t size, size_t seek)
>> {
>> 	struct bvec_iter bi = {
>> -		.bi_size	= size,
>> +		.bi_size	= size + seek,
>> 	};
>> 	struct bio_vec bv;

--
Chuck Lever