linux-kernel - Re: [PATCH rc8-mm1] hotfix libata-scsi corruption


Message-ID: <Pine.LNX.4.64.0801221813540.8693@blonde.site>
Date:	Tue, 22 Jan 2008 18:36:18 +0000 (GMT)
From:	Hugh Dickins <hugh@...itas.com>
To:	James Bottomley <James.Bottomley@...senPartnership.com>
cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Jeff Garzik <jeff@...zik.org>,
	Alan Stern <stern@...land.harvard.edu>,
	linux-kernel@...r.kernel.org, linux-ide@...r.kernel.org,
	linux-scsi <linux-scsi@...r.kernel.org>
Subject: Re: [PATCH rc8-mm1] hotfix libata-scsi corruption

On Tue, 22 Jan 2008, James Bottomley wrote:
> > --- 2.6.24-rc8-mm1/drivers/ata/libata-scsi.c	2008-01-17 16:49:47.000000000 +0000
> > +++ linux/drivers/ata/libata-scsi.c	2008-01-22 15:45:40.000000000 +0000
> > @@ -826,7 +826,7 @@ static void ata_scsi_sdev_config(struct 
> >  	sdev->max_device_blocked = 1;
> >  
> >  	/* set the min alignment */
> > -	blk_queue_update_dma_alignment(sdev->request_queue, ATA_DMA_PAD_SZ - 1);
> > +	blk_queue_update_dma_alignment(sdev->request_queue, ATA_SECT_SIZE - 1);
> >  }
> >  
> >  static void ata_scsi_dev_config(struct scsi_device *sdev,
> 
> Unfortunately, that's likely not the entire hot fix ... the implication
> is that we have some mapping error in the way we do direct SG_IO.

Quite possibly, I'm not sure.

> What the fix you propose does is make it far more likely that block will
> copy, perform I/O then uncopy (almost certain, since most smartd data
> transfers are well under ATA_SECT_SIZE, which is 512).  However,
> implicating a generic path like this implies that we would get the same
> problem for SCSI commands as well, so the correct hot fix is below.
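
The copy-versus-direct-map decision described in the quoted paragraph can be pictured with a small model. This is only a sketch under assumptions: the helper name is hypothetical, the pad size of 4 is assumed for illustration, and the real block-layer check may differ in detail. It shows why raising the alignment mask from ATA_DMA_PAD_SZ - 1 to ATA_SECT_SIZE - 1 pushes almost every small SG_IO buffer onto the copy path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; the thread only states that
 * ATA_SECT_SIZE is 512. The pad size of 4 is an assumption. */
#define SKETCH_ATA_DMA_PAD_SZ 4u
#define SKETCH_ATA_SECT_SIZE  512u

/*
 * Hypothetical helper, not a kernel function: returns true when a user
 * buffer fails the alignment test, i.e. when the block layer would be
 * expected to copy, perform I/O, then uncopy instead of mapping directly.
 * 'mask' is the queue's DMA alignment mask (alignment - 1).
 */
static bool sketch_needs_bounce(uintptr_t uaddr, size_t len, unsigned int mask)
{
	/* mask + 1 must be a power of two for the bit test to make sense */
	assert(((mask + 1u) & mask) == 0);
	return ((uaddr | len) & mask) != 0;
}
```

With the old mask of 3, a 36-byte buffer at a 4-byte-aligned address passes and is mapped directly; with the new mask of 511 the same buffer fails the test and would be copied.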

I've not noticed any problems from the normal activity of the system,
only from smartd's sg_ioctl.  My impression was that it's a libata
issue, because it's going through ata_pio_sector, which does 

	ap->ops->data_xfer(qc->dev, buf + offset, qc->sect_size, do_write);

referring to sect_size, without considering the possibility of any smaller
I/O size.  (Me, I don't even know why it's going PIO rather than DMA:
I'm assuming smartd does things that way, but there's no limit to my
ignorance here.)
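
To make the concern concrete, here is a minimal sketch of what "considering a smaller I/O size" could look like. It is not the libata code and not a proposed patch: the function names, the nbytes/curbytes parameters, and the memcpy stand-in for data_xfer are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_SECT_SIZE 512u	/* ATA_SECT_SIZE per the thread */

/*
 * Hypothetical helper: number of bytes to move in this PIO step,
 * never more than what is left of the request.
 */
static size_t sketch_pio_xfer_len(size_t sect_size, size_t nbytes,
				  size_t curbytes)
{
	size_t remaining;

	assert(curbytes <= nbytes);
	remaining = nbytes - curbytes;
	return remaining < sect_size ? remaining : sect_size;
}

/*
 * Toy stand-in for a read-side data_xfer: copies only the clamped
 * length from the sector presented by the device into the caller's
 * buffer, so bytes beyond the request are left untouched.
 */
static void sketch_read_sector(unsigned char *buf, size_t offset,
			       const unsigned char *sector,
			       size_t nbytes, size_t curbytes)
{
	size_t len = sketch_pio_xfer_len(SKETCH_SECT_SIZE, nbytes, curbytes);

	memcpy(buf + offset, sector, len);
}
```

A real driver would presumably still have to deal with the remainder of the sector that the device presents; that part is outside this sketch.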

> However, I'd like to see if we can track the problem through the SG_IO
> direct path ... how many adjacent page bytes are corrupt?  Just a few or
> a large number (I'm wondering if it's an off by one or off by alignment
> type bug)?

I've assumed it's just the one next page: because ata_pio_sector is
doing a data_xfer of sect_size (ATA_SECT_SIZE, 512 bytes) to an offset
above 0xe00 in the smartd stack page.  The time I actually saw corruption
rather than an oops at startup, it was in a tmpfs swap vector page
running 64-bit kernel, and I didn't examine any further pages (just
checked the page before and matched it up to smartd's stack, already
suspecting that).
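
The arithmetic behind "just the one next page" can be checked with a few lines. This sketch assumes a 4096-byte page, which the message does not state, and the helper name is made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/*
 * Hypothetical helper: how many bytes of a transfer of 'len' bytes,
 * starting at 'offset' within a page, land beyond the end of that page.
 */
static size_t sketch_bytes_past_page(size_t offset, size_t len)
{
	size_t end;

	assert(offset < SKETCH_PAGE_SIZE);
	end = offset + len;
	return end > SKETCH_PAGE_SIZE ? end - SKETCH_PAGE_SIZE : 0;
}
```

A 512-byte transfer starting exactly at 0xe00 ends on the page boundary (0xe00 + 0x200 = 0x1000), so any offset above 0xe00 spills into the following page, and since the spill is always less than 512 bytes it cannot reach beyond that one page.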

I don't believe it's an off-by-one at your SCSI end.

Hugh