linux-kernel - Re: 答复: Re: [PATCH v2] scsi: Introduce sdev_printk

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20180406084901.3353tnfbup6gblfx@pathway.suse.cz>
Date:   Fri, 6 Apr 2018 10:49:01 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     wen.yang99@....com.cn
Cc:     sergey.senozhatsky.work@...il.com, jejb@...ux.vnet.ibm.com,
        martin.petersen@...cle.com, linux-scsi@...r.kernel.org,
        linux-kernel@...r.kernel.org, Bart.VanAssche@....com,
        tj@...nel.org, jiang.biao2@....com.cn, zhong.weidong@....com.cn,
        tan.hu@....com.cn
Subject: Re: 答复: Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to
 throttlefrequent printk

On Fri 2018-04-06 10:30:16, Petr Mladek wrote:
> On Tue 2018-04-03 14:19:43, wen.yang99@....com.cn wrote:
> > On the other hand，queue_lock is big, looping doing something under spinlock 
> > 
> > may locked many things and taking a long time, may cause some problems.
> > 
> > So This code needs to be optimized later:
> > 
> > scsi_request_fn()
> > {
> > 	for (;;) {
> > 		int rtn;
> > 		/*
> > 		 * get next queueable request.  We do this early to make sure
> > 		 * that the request is fully prepared even if we cannot
> > 		 * accept it.
> > 		 */
> > 
> > 		req = blk_peek_request(q);
> > 
> > 		if (!req)
> > 			break;
> > 
> > 		if (unlikely(!scsi_device_online(sdev))) {
> > 			sdev_printk(KERN_ERR, sdev,
> > 				    "rejected I/O to offline device\n");
> > 			scsi_kill_request(req, q);
> > 			continue;
> > 
> > 			^^^^^^^^^ still under spinlock
> > 		}
> 
> I wonder if the following might be the best solution after all:
> 
> 		if (unlikely(!scsi_device_online(sdev))) {
> 			scsi_kill_request(req, q);
> 
> 			/*
> 			 * printk() might take a while on slow consoles.
> 			 * Prevent solftlockups by releasing the lock.
> 			 */
> 			spin_unlock_irq(q->queue_lock);
> 			sdev_printk(KERN_ERR, sdev,
> 				    "rejecting I/O to offline device\n");
> 			spin_lock_irq(q->queue_lock);
> 			continue;
> 		}
> 
> I see that the lock is released also in several other situations.
> Therefore it looks safe. Also handling too many requests without
> releasing the lock seems to be a bad idea in general. I think
> that this solution was already suggested earlier.

Just to be sure. Is it safe to kill first few requests and proceed
the others?

I wonder if the device could actually get online without releasing
the queue lock. If not, we normally killed all requests.

I wonder if a local flag might actually help to reduce the number
of messages but keep the existing behavior. I mean something like

static void scsi_request_fn(struct request_queue *q)
{
	struct scsi_device *sdev = q->queuedata;
			   ^^^^^
			   The device is the same for each request
			   in this queue.


	struct request *req;
+	bool offline_reported = false;

	/*
	 * To start with, we keep looping until the queue is empty, or until
	 * the host is no longer able to accept any more requests.
	 */
	shost = sdev->host;
	for (;;) {
		int rtn;
		req = blk_peek_request(q);
		if (!req)
			break;

		if (unlikely(!scsi_device_online(sdev))) {
+			if (!offline_reported) {
				sdev_printk(KERN_ERR, sdev,
					"rejecting I/O to offline device\n");
+				offline_reported = true;
+			}
			scsi_kill_request(req, q);
			continue;
		}


Please, note that I am not familiar with the scsi code. I am involved
because this is printk related. Unfortunately, we could not make
printk() faster. The main principle is to get messages on the console
ASAP. Nobody knows when the system might die and any message might
be important.

Best Regards,
Petr