linux-kernel - Re: [PATCH] scsi: target: iscsi: cxgbit: fix sleep-in-atomic-context bug in cxgbit_abort

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20221003144602.GA10901@yadro.com>
Date:   Mon, 3 Oct 2022 17:46:02 +0300
From:   Dmitry Bogdanov <d.bogdanov@...ro.com>
To:     Duoming Zhou <duoming@....edu.cn>
CC:     <linux-kernel@...r.kernel.org>, <martin.petersen@...cle.com>,
        <kuba@...nel.org>, <davem@...emloft.net>, <andrii@...nel.org>,
        <gregkh@...uxfoundation.org>, <linux-scsi@...r.kernel.org>,
        <target-devel@...r.kernel.org>
Subject: Re: [PATCH] scsi: target: iscsi: cxgbit: fix sleep-in-atomic-context
 bug in cxgbit_abort_conn

On Sun, Oct 02, 2022 at 09:40:47AM +0800, Duoming Zhou wrote:
> 
> The function iscsit_handle_time2retain_timeout() is a timer handler that
> runs in an atomic context, but it calls "alloc_skb(0, GFP_KERNEL | ...)"
> that may sleep. As a result, the sleep-in-atomic-context bug will happen.
> The process is shown below:
> 
> iscsit_handle_time2retain_timeout()
>  iscsit_close_session()
>   iscsit_free_connection_recovery_entries()
>    iscsit_free_cmd()
>     __iscsit_free_cmd()
>      cxgbit_unmap_cmd()
>       cxgbit_abort_conn()
>        alloc_skb(0, GFP_KERNEL | ...) //may sleep
> 
> This patch changes the gfp_t parameter of alloc_skb() from GFP_KERNEL to
> GFP_ATOMIC in order to mitigate the bug.
> 
> Fixes: 1ae01724ae92 ("cxgbit: Abort the TCP connection in case of data out timeout")
> Signed-off-by: Duoming Zhou <duoming@....edu.cn>
> ---
>  drivers/target/iscsi/cxgbit/cxgbit_cm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/target/iscsi/cxgbit/cxgbit_cm.c b/drivers/target/iscsi/cxgbit/cxgbit_cm.c
> index 3336d2b78bf..eb3da6d2c62 100644
> --- a/drivers/target/iscsi/cxgbit/cxgbit_cm.c
> +++ b/drivers/target/iscsi/cxgbit/cxgbit_cm.c
> @@ -697,7 +697,7 @@ __cxgbit_abort_conn(struct cxgbit_sock *csk, struct sk_buff *skb)
> 
>  void cxgbit_abort_conn(struct cxgbit_sock *csk)
>  {
> -       struct sk_buff *skb = alloc_skb(0, GFP_KERNEL | __GFP_NOFAIL);
> +       struct sk_buff *skb = alloc_skb(0, GFP_ATOMIC | __GFP_NOFAIL);
> 
>         cxgbit_get_csk(csk);
>         cxgbit_init_wr_wait(&csk->com.wr_wait);
> --
> 2.17.1
>

The last line in cxgbit_abort_conn is cxgbit_wait_for_reply() which
also should not be called in interrupt context.

Anyway this issue is not due to cxgbit, it is common for iSCSI itself:
iscsit_close_session()
  iscsit_free_connection_recovery_entries()
    iscsit_free_cmd()
      transport_generic_free_cmd()
        target_wait_free_cmd()
          wait_for_completion_timeout()

IMHO, there is no reason to call iscsit_close_session in an atomic context.
I have two patches relaited Time2Retain timer. I will share them today.

BR,
 Dmitry