lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 05 Mar 2009 16:54:19 +0000
From:	Andrew Patterson <andrew.patterson@...com>
To:	scameron@...rdog.cca.cpqcorp.net
Cc:	FUJITA Tomonori <fujita.tomonori@....ntt.co.jp>,
	linux-kernel@...r.kernel.org, mike.miller@...com,
	jens.axboe@...cle.com, akpm@...ux-foundation.org,
	linux-scsi@...r.kernel.org, coldwell@...hat.com, hare@...ell.com,
	iss_storagedev@...com
Subject: Re: [PATCH] hpsa: SCSI driver for HP Smart Array controllers

On Thu, 2009-03-05 at 08:21 -0600, scameron@...rdog.cca.cpqcorp.net
wrote:
> On Thu, Mar 05, 2009 at 02:48:09PM +0900, FUJITA Tomonori wrote:
> > On Tue, 3 Mar 2009 10:28:21 -0600
> > scameron@...rdog.cca.cpqcorp.net wrote:
> > 
> > > On Tue, Mar 03, 2009 at 03:35:26PM +0900, FUJITA Tomonori wrote:
> > > > On Mon, 2 Mar 2009 08:56:50 -0600
> > > > scameron@...rdog.cca.cpqcorp.net wrote:
> > > > 
> > > > > [...]
> > > > > > > +     .this_id                = -1,
> > > > > > > +     .sg_tablesize           = MAXSGENTRIES,
> > > > > > 
> > > > > > MAXSGENTRIES (32) is the limitation of hardware? If not, it might be
> > > > > > better to enlarge this for better performance?
> > > > > 
> > > > > Yes, definitely, though this value varies from controller to controller,
> > > > > so this is just a default value that needs to be overridden, probably
> > > > > in hpsa_scsi_detect().
> > > > 
> > > > I see. If we override this in hpsa_scsi_detect(), we need a trick for
> > > > SG in CommandList_struct, I guess.
> > > 
> > > Yes.  There are some limits to what can be put into CommandList_struct
> > > directly, but there is also scatter gather chaining, in which we use
> > > the last element in the CommandList_struct to point to another buffer
> > > of SG entries.
> > > 
> > > If you have a system with a lot of controllers, having a large number of 
> > > scatter gathers can be a bit of a memory hog, and since this memory is all
> > > via pci_alloc_consistent, that can be a concern.  It would be nice if
> > > there was a way for the user to specify differing amounts of scatter
> > > gathers for different controller instances so for instance the controller
> > > which he's running his big oracle database, or webserver or whatever on
> > > gets lots, while the controller he's booted from that's mostly idle
> > > gets not so many.  I don't know what a good way for a user to identify
> > > what controller he's talking about in a module parameter would be 
> > > though.  Maybe by pci domain/bus/device/function?  Maybe something along
> > > the lines of:
> > > 
> > > 	modprobe hpsa dev1=0:0e:00.0 sg1=1000 dev2=0:0b:00.0 sg2=31
> > > 
> > > to say that one controller gets 1000 scatter gather elements, but
> > > another gets only 31.  But PCI busses can change if hardware 
> > > configuration changes, and this isn't exactly obvious, so seems less
> > > than ideal.  Any bright ideas on that front?
> > 
> > We have /sys/class/scsi_host/host*/sg_tablesize:
> > 
> > How about modifying this value on the fly?
> > 
> > fujita@...ver:/sys/class/scsi_host/host3$ echo 1000 > sg_tablesize
> > 
> 
> We pci_alloc_consistent that space, so... I think that would mean
> we'd have to do things considerably differently.  I think we'd have
> to quit allocating commands in big chunks, and instead of indexing
> into that chunk we'd probably have to have an array of pointers or
> something.  If we wanted sg_tablesize adjustable down to single
> command counts, we'd probably have to allocate each command separately
> and have an array of pointers to those...
> 
> e.g. if you did 
> 
> 	echo 1000 > sg_tablesize
> 	echo 999 > sg_tablesize
> 
> you probably wouldn't want to keep the 1000 commands around,
> and then allocate 999 additional, then let all the outstanding 
> commands using the first 1000 block complete, then finally free
> the first block of 1000, leaving just the 999.  You'd probably want
> instead to free one of the 1000 to get to 999.
> 
> Likewise with this:
> 
> 	echo 999 > sg_tablesize
> 	echo 1000 > sg_tablesize
> 
> These are somewhat pathological cases, granted.
> 
> I'm not sure dynamically modifying the number of SGs a controller
> can do is something that comes up enough to be worth implementing
> something so complicated.
> 
> If it's settable at init time, that would probably be enough for
> the vast majority of uses (and more flexible than what we have now)
> and a lot easier to implement.
> 
> > 
> > Well, this needs more changes (to both the block layer and the scsi
> > mid layer) but is it nice to change this value dynamically?
> > 
> > Anyway, I think that it's better to address this fancy feature later
> > on (after the mainline inclusion). Let's put hpsa driver into mainline
> > first.
> 
> Agreed, we can think about all that stuff later.
> 
> Another fancy feature to think about later which would be nice:
> 
> On Smart arrays you can expand logical drives on the fly by
> adding physical disks, or portions of physical disks into them.
> Would be nice if there was a non-i/o-interrupting way to notify
> the scsi layer of this new space (maybe there already is?) so
> that if there's, say, a filesystem which can also dynamically
> grow on the fly on that embiggened logical drive, it can take
> advantage of that extra space.
> 
> Right now, the driver will do scsi_remove_device() and then
> scsi_add_device() if a logical drive changes size, which isn't
> very nice.
> 

The following might help:

There are several ways to "kick off" a device size change:

1. For SCSI devices do:

  # echo 1 > /sys/class/scsi_device/<device>/device/rescan

  or

  # blockdev --rereadpt <device file>

2. Other devices (not device mapper)

  # blockdev --rereadpt <device file>

See http://marc.info/?l=linux-kernel&m=122056065131792&w=2

Andrew

> > 
> > 
> > > > > Hmm, this doesn't seem all that complicated to me, and this code snippet
> > > > > has been pretty stable for about 10 years. it's nearly identical to what's in
> > > > > cpqarray in the 2.2.13 kernel from 1999:
> > > > > 
> > > > >                 do {
> > > > >                         i = find_first_zero_bit(h->cmd_pool_bits, NR_CMDS);
> > > > >                         if (i == NR_CMDS)
> > > > >                                 return NULL;
> > > > >                 } while(test_and_set_bit(i%32, h->cmd_pool_bits+(i/32)) != 0)
> > > > > 
> > > > > It's fast, works well, and has needed very little maintenance over the 
> > > > > years.  Without knowing what you have in mind specifically, I don't see a
> > > > > big need to change this.
> > > > 
> > > > I see. Seems that some drivers want something similar. I might come
> > > > back later on with a patch to replace this with library
> > > > functions.
> > > 
> > > There was some other discussion about pushing this sort of thing to 
> > > upper layers, using a tag generated in the scsi layer as a means
> > > of allocating driver command buffers, since, presumably there's a
> > > one to one mapping.  (I didn't completely grok it all though.)
> > 
> > Oops, I meant that I might come back with a patch to convert hpsa to
> > use the the block layer tagging, which you and Mike Christie are
> > talking about (yeah, my first suggestion to use lists was wrong. using
> > the block layer tagging looks much better).
> > 
> > 
> > By the way, have you guys started to work on the review comments for
> 
> We haven't really done much.  It's obvious that there's a lot to do
> based on the comments, and it's also obvious how to do most of it,
> and not hard, (e.g. ripping out /proc stuff, etc.), there's just a
> lot of other non-kernel related work keeping us busy at the moment.
> 
> > the next submission? The driver has some minor style issues that have
> > not been mentioned yet. For example, the comment style in the driver
> > is not preferred:
> > 
> > /* If this device a non-zero lun of a multi-lun device */
> > /* byte 4 of the 8-byte LUN addr will contain the logical */
> > /* unit no, zero otherise. */
> > 
> > The preferred style is:
> > 
> > /*
> >  * If this device a non-zero lun of a multi-lun device
> >  * byte 4 of the 8-byte LUN addr will contain the logical
> >  * unit no, zero otherise.
> >  */
> 
> ok.
> 
> > 
> > Another example, I think that the SCSI-ml preferred style is (not
> > documented in CodingStyle though):
> > 
> > 'if (!ptr)' rather than 'if (ptr == NULL)'
> > 'if (!value)' rather than 'if (value == 0)'
> > 'if (ptr)' rather than 'if (ptr != NULL)'
> > 'if (value)' rather than 'if (value != 0)'
> 
> Ok.
> 
> > 
> > 
> > If you are already addressing the review comments, I just wait for the
> > next submission, then I'll send such minor patches. If you are not,
> > I'll send patches to address the review comments (including such minor
> > patches).
> 
> Ok, thanks.
> 
> -- steve
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ