Message-ID: <20221125145336.GB9892@thinkpad>
Date: Fri, 25 Nov 2022 20:23:36 +0530
From: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
To: Johan Hovold <johan@...nel.org>
Cc: Johan Hovold <johan+linaro@...nel.org>,
	Bjorn Andersson <andersson@...nel.org>,
	Andy Gross <agross@...nel.org>,
	Konrad Dybcio <konrad.dybcio@...ainline.org>,
	Rob Herring <robh+dt@...nel.org>,
	Krzysztof Kozlowski <krzysztof.kozlowski+dt@...aro.org>,
	Will Deacon <will@...nel.org>,
	Robin Murphy <robin.murphy@....com>,
	Christoph Hellwig <hch@....de>,
	Ard Biesheuvel <ardb@...nel.org>,
	Catalin Marinas <catalin.marinas@....com>,
	linux-arm-msm@...r.kernel.org,
	linux-arm-kernel@...ts.infradead.org,
	devicetree@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] arm64: dts: qcom: sc8280xp: fix PCIe DMA coherency

On Fri, Nov 25, 2022 at 03:43:59PM +0100, Johan Hovold wrote:
> On Fri, Nov 25, 2022 at 07:56:25PM +0530, Manivannan Sadhasivam wrote:
> > On Thu, Nov 24, 2022 at 03:25:01PM +0100, Johan Hovold wrote:
> > > The devices on the SC8280XP PCIe buses are cache coherent and must be
> > > marked as such to avoid data corruption.
> > >
> > > A coherent device can, for example, end up snooping stale data from the
> > > caches instead of using data written by the CPU through the
> > > non-cacheable mapping which is used for consistent DMA buffers for
> > > non-coherent devices.
> > >
> >
> > Also, the device may write into the L2 cache (or whatever cache that is
> > accessible) if there is an entry and the CPU may invalidate it before reading
> > from the DMA buffer. This will end up in a data loss.
>
> I mentioned the above as an example, but clearly it can affect also the
> other direction (e.g. as described below).
>
> > > Note that this is much more likely to happen since commit c44094eee32f
> > > ("arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()")
> > > that was added in 6.1 and which removed the cache invalidation when
> > > setting up the non-cacheable mapping.
> > >
> > > Marking the PCIe devices as coherent specifically fixes the intermittent
> > > NVMe probe failures observed on the Thinkpad X13s, which was due to
> > > corruption of the submission and completion queues. This was typically
> > > observed as corruption of the admin submission queue (with well-formed
> > > completion):
> > >
> > >     could not locate request for tag 0x0
> > >     nvme nvme0: invalid id 0 completed on queue 0
> > >
> > > or corruption of the admin or I/O completion queues (malformed
> > > completion):
> > >
> > >     could not locate request for tag 0x45f
> > >     nvme nvme0: invalid id 25695 completed on queue 25965
> > >
> > > presumably as these queues are small enough to not be allocated using
> > > CMA which in turn make them more likely to be cached (e.g. due to
> > > accesses to nearby pages through the cacheable linear map). Increasing
> > > the buffer sizes to two pages to force CMA allocation also appears to
> > > make the problem go away.
> > >
> >
> > I don't think the problem will go away if the allocation happens from CMA
> > region. It may just decrease the chances of cache hit but it could always
> > happen due to the existence of linear mapping with cacheable attribute.
>
> I never claimed it would fix the problem, I explicitly wrote that it
> made it less likely to occur (to the point where my reproducer no longer
> triggers).
>

> Increasing the buffer sizes to two pages to force CMA allocation also appears
> to make the problem go away.

The "go away" part sounded like a claim to me and hence I added the statement.
But no worries :)

Thanks,
Mani

> Johan

-- 
மணிவண்ணன் சதாசிவம்