[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <159a881fd0393cc390a4597a1d0c39b1903fe906.camel@HansenPartnership.com>
Date: Tue, 06 Jan 2026 22:35:20 -0500
From: James Bottomley <James.Bottomley@...senPartnership.com>
To: duoming@....edu.cn
Cc: linux-scsi@...r.kernel.org, linux-kernel@...r.kernel.org,
martin.petersen@...cle.com, stable@...nel.org
Subject: Re: [PATCH RESEND] scsi: ppa: Fix use-after-free caused by
unfinished delayed work
On Tue, 2026-01-06 at 13:35 +0800, duoming@....edu.cn wrote:
> On Sun, 04 Jan 2026 18:30:48 -0500 James Bottomley wrote:
[...]
> > I know how timeouts work ... I don't need the AI summary. if there
> > is an outstanding command, the timeout will fire long after you've
> > disabled the queue and run through ppa_detach and the next thing
> > that will happen is the error handler will try to abort the
> > command, eventually causing ppa_abort to be called, which is going
> > to dereference the ppa_struct that ppa_detach freed.
>
> What do you think of removing the ppa_abort() callback function?
> This callback function is not necessary. The eh_abort_handler
> function pointer is checked in scsi_abort_command() function,
> and if the function pointer does not exist, it directly returns
> FAILED. The subsequent process will be handled by SCSI error
> handler thread - scsi_error_handler(). Therefore the issue you
> mentioned could be avoided.
Well, no, because that would lose us runtime error handling which is
pretty essential for a flakey device like parport. Also, if you remove
that callback, it will escalate to the reset handler, which does
exactly the same thing with ppa_struct.
I was originally looking at this as a reference counting problem, but
there is another way of solving it and that's to stop requests and
drain the queue before freeing the resources. It turns out that's
exactly what scsi_remove_host() does (via __scsi_remove_device which
calls blk_mq_destroy_queue) so, since there can't be any outstanding
commands, it's actually impossible the delayed work queue is active
after scsi_remove_host() is called and thus the original race you
identified can't actually happen.
Regards,
James
Powered by blists - more mailing lists