linux-kernel - Re: [PATCH v2 2/2] scsi: sd: Rework asynchronous resume support

Message-ID: <54e20a27-a10b-b77a-e950-1d3398e2e907@acm.org>
Date:   Thu, 21 Jul 2022 11:14:55 -0700
From:   Bart Van Assche <bvanassche@....org>
To:     Geert Uytterhoeven <geert@...ux-m68k.org>
Cc:     "Martin K . Petersen" <martin.petersen@...cle.com>,
        Jaegeuk Kim <jaegeuk@...nel.org>,
        scsi <linux-scsi@...r.kernel.org>,
        Ming Lei <ming.lei@...hat.com>, Hannes Reinecke <hare@...e.de>,
        John Garry <john.garry@...wei.com>, ericspero@...oud.com,
        jason600.groome@...il.com,
        Linux-Renesas <linux-renesas-soc@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 2/2] scsi: sd: Rework asynchronous resume support

On 7/21/22 01:07, Geert Uytterhoeven wrote:
> On Wed, Jul 20, 2022 at 8:04 PM Bart Van Assche <bvanassche@....org> wrote:
>> That's surprising. Is there anything unusual about the test setup that I
>> should know, e.g. very small number of CPU cores or a very small queue
>> depth of the SATA device? How about adding pr_info() statements at the
>> start and end of the following functions and also before the return
>> statements in these functions to determine where execution of the START
>> command hangs?
>> * sd_start_done().
>> * sd_start_done_work().
> 
> None of these functions seem to be called at all?
That's weird. This means that either sd_submit_start() hangs or that the
execution of the START command never finishes. The latter is unlikely
since the SCSI error handler is supposed to abort commands that hang. It
would also be weird if sd_submit_start() hung before the START command
is submitted, since the code flow for submitting the START command is
very similar to the code flow used without patch "scsi: sd: Rework
asynchronous resume support" (calling scsi_execute()).
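
For reference, the pr_info() instrumentation suggested in the quoted text
could look roughly like the sketch below. This is a userspace stand-in,
not the actual sd.c code: demo_start_done() and trace_count are
hypothetical names, and pr_info() is redefined on top of fprintf() so the
pattern can be compiled outside the kernel (in the kernel, pr_info() comes
from <linux/printk.h>).

```c
#include <stdio.h>

/* Hypothetical counter: number of trace lines emitted so far. */
int trace_count;

/* Userspace stand-in for the kernel's pr_info(); assumption for this sketch. */
#define pr_info(fmt, ...)					\
	do {							\
		fprintf(stderr, fmt, ##__VA_ARGS__);		\
		trace_count++;					\
	} while (0)

/*
 * Hypothetical function under investigation. One message at entry, one
 * before every return statement, so the log shows which path was taken.
 */
int demo_start_done(int error)
{
	pr_info("%s: enter, error=%d\n", __func__, error);

	if (error) {
		pr_info("%s: early return on error\n", __func__);
		return -1;
	}

	pr_info("%s: exit\n", __func__);
	return 0;
}
```

If neither the "enter" nor any of the return messages shows up in the log,
the function was never called, which is what was reported above.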

What is also weird is that there are at least two SATA setups on which
this code works fine, including my QEMU setup.

Although it is possible to enable tracing at boot time, adding the 
following parameters to the kernel command line would generate too much 
logging data:

tp_printk 
trace_event=block_rq_complete,block_rq_error,block_rq_insert,block_rq_issue,block_rq_merge,block_rq_remap,block_rq_requeue,scsi_dispatch_cmd_done,scsi_dispatch_cmd_start,scsi_eh_wakeup,scsi_dispatch_cmd_error,scsi_dispatch_cmd_timeout 
scsi_mod.scsi_logging_level=32256
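
If the volume is the main concern, an untested variant would be to keep
only the SCSI tracepoints from the list above and to leave out both the
block layer events and the scsi_logging_level setting. Whether this is
still enough to see where the START command gets stuck is an assumption
on my side:

```
tp_printk
trace_event=scsi_dispatch_cmd_start,scsi_dispatch_cmd_done,scsi_dispatch_cmd_error,scsi_dispatch_cmd_timeout,scsi_eh_wakeup
```

As with the full list, these go on a single kernel command line.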

I'm not sure what the best way to proceed is, since I cannot reproduce
this issue.

Bart.