lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:	Thu, 14 Mar 2013 16:30:16 -0000
From:	"David Laight" <David.Laight@...LAB.COM>
To:	"Vipul Pandya" <vipul@...lsio.com>, <netdev@...r.kernel.org>,
	<linux-rdma@...r.kernel.org>, <linux-scsi@...r.kernel.org>
Cc:	<davem@...emloft.net>, <roland@...estorage.com>,
	<JBottomley@...allels.com>, <dm@...lsio.com>,
	<swise@...ngridcomputing.com>, <leedom@...lsio.com>,
	<naresh@...lsio.com>, <divy@...lsio.com>, <santosh@...lsio.com>,
	<arvindb@...lsio.com>, <abhishek@...lsio.com>
Subject: RE: [PATCH v2 net-next 05/22] cxgb4: Add T5 write combining support

> This patch implements a low latency Write Combining (aka Write Coalescing) work
> request path. PCIE maps User Space Doorbell BAR2 region writes to the new
> interface to SGE. SGE pulls a new message from PCIE new interface and if its a
> coalesced write work request then pushes it for processing. This patch copies
> coalesced work request to memory mapped BAR2 space.
...
> +	} else {
> +		memset(data, 0, 2 * sizeof(u64));
> +		*data += 2;
> +	}

Using memset is overkill (or rather a big overhead if it isn't
detected by the compiler). Nothing wrong with:
		(*data)[0] = 0;
		(*data)[1] = 0;
		*data += 2;
Actually, typing that, I realise that you should probably have
read *data into a local variable, and then updated it when finished.
Otherwise some of the accesses might be ones which force the
compiler to reload the value.
		
>  static inline void ring_tx_db(struct adapter *adap, struct sge_txq *q, int n)
>  {
> +	unsigned int *wr, index;
> +
>  	wmb();            /* write descriptors before telling HW */
>  	spin_lock(&q->db_lock);
>  	if (!q->db_disabled) {
> -		t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL),
> -			     QID(q->cntxt_id) | PIDX(n));
> +		if (is_t4(adap->chip)) {
> +			t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL),
> +				     QID(q->cntxt_id) | PIDX(n));
> +		} else {
> +			if (n == 1) {
> +				index = q->pidx ? (q->pidx - 1) : (q->size - 1);
> +				wr = (unsigned int *)&q->desc[index];
> +				cxgb_pio_copy((u64 __iomem *)
> +					      (adap->bar2 + q->udb + 64),
> +					      (u64 *)wr);

Why all the casts on 'wr' ?

> +			} else
> +				writel(n,  adap->bar2 + q->udb + 8);
> +			wmb();

Since you actually need memory barriers here on x86 you definitely
need a comment saying so, and it would (IMHO) better to use a
different define in the source (even if it is currently converted
to wmb() in a local header file).

Thinking further, for portability you might want to find some way
of abstracting the multi-word writes somehow.
For example, some of the ppc have a dma engine associated with the
PCIe master interface that can be used to generate large TLP.
The code would still want to spin waiting for the dma to complete
(since the transfer would be far faster than any interrupt path).

	David



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ