linux-kernel - RE: [PATCH] rtw88: pci: Use general byte arrays as the elements of RX ring

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <F7CD281DE3E379468C6D07993EA72F84D1881C54@RTITMBSVM04.realtek.com.tw>
Date:   Tue, 30 Jul 2019 03:11:39 +0000
From:   Tony Chuang <yhchuang@...ltek.com>
To:     Jian-Hong Pan <jian-hong@...lessm.com>,
        David Laight <David.Laight@...lab.com>
CC:     Kalle Valo <kvalo@...eaurora.org>,
        "David S . Miller" <davem@...emloft.net>,
        "linux-wireless@...r.kernel.org" <linux-wireless@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux@...lessm.com" <linux@...lessm.com>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: RE: [PATCH] rtw88: pci: Use general byte arrays as the elements of RX ring

> > > > While allocating all 512 buffers in one block (just over 4MB)
> > > > is probably not a good idea, you may need to allocated (and dma map)
> > > > then in groups.
> > >
> > > Thanks for reviewing.  But got questions here to double confirm the
> idea.
> > > According to original code, it allocates 512 skbs for RX ring and dma
> > > mapping one by one.  So, the new code allocates memory buffer 512
> > > times to get 512 buffer arrays.  Will the 512 buffers arrays be in one
> > > block?  Do you mean aggregate the buffers as a scatterlist and use
> > > dma_map_sg?
> >
> > If you malloc a buffer of size (8192+32) the allocator will either
> > round it up to a whole number of (often 4k) pages or to a power of
> > 2 of pages - so either 12k of 16k.
> > I think the Linux allocator does the latter.
> > Some of the allocators also 'steal' a bit from the front of the buffer
> > for 'red tape'.
> >
> > OTOH malloc the space 15 buffers and the allocator will round the
> > 15*(8192 + 32) up to 32*4k - and you waste under 8k across all the
> > buffers.
> >
> > You then dma_map the large buffer and split into the actual rx buffers.
> > Repeat until you've filled the entire ring.
> > The only complication is remembering the base address (and size) for
> > the dma_unmap and free.
> > Although there is plenty of padding to extend the buffer structure
> > significantly without using more memory.
> > Allocate in 15's and you (probably) have 512 bytes per buffer.
> > Allocate in 31's and you have 256 bytes.
> >
> > The problem is that larger allocates are more likely to fail
> > (especially if the system has been running for some time).
> > So you almost certainly want to be able to fall back to smaller
> > allocates even though they use more memory.
> >
> > I also wonder if you actually need 512 8k rx buffers to cover
> > interrupt latency?
> > I've not done any measurements for 20 years!
> 
> Thanks for the explanation.
> I am not sure the combination of 512 8k RX buffers.  Maybe Realtek
> folks can give us some idea.
> Tony Chuang any comment?
> 
> Jian-Hong Pan
> 

512 RX buffers is not necessary I think. But I haven't had a chance to
test if reduce the number of RX SKBs could affect the latency.
I can run some throughput tests and then decide a minimum numbers
that RX ring requires. Or if you can try it.

Thanks.
Yan-Hsuan