linux-kernel - Re: [PATCH RESEND] scsi: megaraid_sas: make module parameter scmd


Message-ID: <CAKcXpBxZWxVY4dmBQWOv9_oACY1u4NZLCrmpxeorQZeCtM2imw@mail.gmail.com>
Date: Tue, 9 Apr 2024 15:12:03 +0800
From: Lei Chen <lei.chen@...rtx.com>
To: John Garry <john.g.garry@...cle.com>
Cc: Kashyap Desai <kashyap.desai@...adcom.com>, Sumit Saxena <sumit.saxena@...adcom.com>, 
	Shivasharan S <shivasharan.srikanteshwara@...adcom.com>, 
	Chandrakanth patil <chandrakanth.patil@...adcom.com>, 
	"James E.J. Bottomley" <jejb@...ux.ibm.com>, "Martin K. Petersen" <martin.petersen@...cle.com>, 
	megaraidlinux.pdl@...adcom.com, linux-scsi@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH RESEND] scsi: megaraid_sas: make module parameter
 scmd_timeout writable

Sorry for the non-plain-text format.

On Mon, Apr 8, 2024 at 8:30 PM John Garry <john.g.garry@...cle.com> wrote:
>
> On 08/04/2024 11:05, Lei Chen wrote:
> > When an scmd times out, block layer calls megasas_reset_timer to
> > make further decisions. scmd_timeout indicates when an scmd is really
> > timed-out.
>
> What does really timed-out mean?

scsi_times_out calls the eh_timed_out callback, which in the megaraid
driver is megasas_reset_timer. megasas_reset_timer decides whether an
scmd has really timed out. If it has not, the function returns
BLK_EH_RESET_TIMER, which tells the block layer to reset the timer and
take no further action.
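
To make the decision concrete, here is a rough user-space model of it.
This is a sketch, not the driver code: the names model_reset_timer,
host_model and eh_action are made up for illustration, and AGE_FACTOR
(the multiple of scmd_timeout after which a command counts as really
timed out) is my assumption about the check, so please verify it
against megasas_reset_timer in your tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the block-layer return codes; the values are arbitrary. */
enum eh_action { EH_NOT_HANDLED, EH_RESET_TIMER };

/* Assumption: a command is "really" timed out once its age exceeds
 * AGE_FACTOR * scmd_timeout seconds. */
#define AGE_FACTOR 2UL

/* Hypothetical per-host state; only the fields this sketch needs. */
struct host_model {
	bool fw_busy;        /* already throttled? */
	int can_queue;       /* queue depth shared by all disks on the card */
	int throttle_depth;  /* reduced depth used while throttled */
};

/* Model of the timeout callback: times are plain seconds, not jiffies. */
enum eh_action model_reset_timer(struct host_model *h, unsigned long now_s,
				 unsigned long alloc_s,
				 unsigned int scmd_timeout_s)
{
	assert(h != NULL);

	/* Old enough: hand the command over to error handling. */
	if (now_s - alloc_s > AGE_FACTOR * scmd_timeout_s)
		return EH_NOT_HANDLED;

	/* Otherwise throttle the shared queue once and re-arm the timer. */
	if (!h->fw_busy) {
		h->can_queue = h->throttle_depth;
		h->fw_busy = true;
	}
	return EH_RESET_TIMER;
}
```

In this model, lowering scmd_timeout_s moves the EH_NOT_HANDLED branch
earlier, which is the effect the patch is after.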
>
>
> > If we want to make this process more fast, we can decrease
> > this value. This patch allows users to change this value in run-time.
> >
> > Signed-off-by: Lei Chen <lei.chen@...rtx.com>
> > ---
> >   drivers/scsi/megaraid/megaraid_sas_base.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
> > index 3d4f13da1ae8..2a165e5dc7a3 100644
> > --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> > +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> > @@ -91,7 +91,7 @@ module_param(dual_qdepth_disable, int, 0444);
> >   MODULE_PARM_DESC(dual_qdepth_disable, "Disable dual queue depth feature. Default: 0");
> >
> >   static unsigned int scmd_timeout = MEGASAS_DEFAULT_CMD_TIMEOUT;
> > -module_param(scmd_timeout, int, 0444);
> > +module_param(scmd_timeout, int, 0644);
> >   MODULE_PARM_DESC(scmd_timeout, "scsi command timeout (10-90s), default 90s. See megasas_reset_timer.");
> >
> >   int perf_mode = -1;
>
> I don't know why megaraid_sas has special handling here (and bypasses
> SCSI midlayer).
>
> If the host is overloaded and you get a time-out as a command simply
> could not be handled in time, can you alternatively try reducing the
> scsi device queue depth?


Yes, the scsi layer and the drivers already have some ways to control
the queue depth. The megaraid driver throttles the queue depth in
megasas_reset_timer. But since all scsi disks on the same megaraid card
share that queue depth, throttling it also impacts the healthy disks.
In most cases a scsi disk is more likely to be malfunctioning than the
RAID card, and a bad disk causes scmds to fail and be retried.
We want to adjust scmd_timeout without reloading the driver, so that
scmds sent to abnormal scsi disks complete faster.
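
As an illustration of the intended use: with the 0644 permission from
the patch, root could update the value through the usual sysfs location
for module parameters, which for this module should be
/sys/module/megaraid_sas/parameters/scmd_timeout. Below is a minimal
sketch of a helper doing that. The function name set_scmd_timeout is
hypothetical, and the 10-90 range check is taken from the parameter
description only; I am assuming nothing about what the kernel itself
enforces on write.

```c
#include <assert.h>
#include <stdio.h>

/* Range from the parameter description: "scsi command timeout (10-90s)". */
#define SCMD_TIMEOUT_MIN 10
#define SCMD_TIMEOUT_MAX 90

/*
 * Write a timeout in seconds to a module-parameter file.
 * Returns 0 on success, -1 if the value is out of range or the write fails.
 */
int set_scmd_timeout(const char *path, int seconds)
{
	FILE *f;
	int ok;

	assert(path != NULL);

	if (seconds < SCMD_TIMEOUT_MIN || seconds > SCMD_TIMEOUT_MAX)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", seconds) > 0;
	/* fclose flushes the buffer, so a failed write can surface here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A caller would pass the sysfs path above and, for example, 30 to shorten
the timeout at run time; the helper takes the path as an argument so it
can also be tried against an ordinary file.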