From 1db9716d44875d31acf29255710e82338560c177 Mon Sep 17 00:00:00 2001 From: Rong Qianfeng Date: Mon, 2 Sep 2024 10:39:35 +0800 Subject: [PATCH 1/5] arm64/mm: Delete __init region from memblock.reserved If CONFIG_ARCH_KEEP_MEMBLOCK is enabled, the memory information in memblock will be retained. We release the __init memory here, and we should also delete the corresponding region in memblock.reserved, which allows debugfs/memblock/reserved to display correct memory information. Signed-off-by: Rong Qianfeng Link: https://lore.kernel.org/r/20240902023940.43227-1-rongqianfeng@vivo.com Signed-off-by: Will Deacon --- arch/arm64/mm/init.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 9b5ab6818f7f..aea834a9691a 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -414,6 +414,12 @@ void __init mem_init(void) void free_initmem(void) { + unsigned long aligned_begin = ALIGN_DOWN((u64)__init_begin, PAGE_SIZE); + unsigned long aligned_end = ALIGN((u64)__init_end, PAGE_SIZE); + + /* Delete __init region from memblock.reserved. */ + memblock_free((void *)aligned_begin, aligned_end - aligned_begin); + free_reserved_area(lm_alias(__init_begin), lm_alias(__init_end), POISON_FREE_INITMEM, "unused kernel"); From 7eced90b202d63cdc1b9b11b1353adb1389830f9 Mon Sep 17 00:00:00 2001 From: Fares Mehanna Date: Mon, 2 Sep 2024 16:33:08 +0000 Subject: [PATCH 2/5] arm64: trans_pgd: mark PTEs entries as valid to avoid dead kexec() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The reasons for PTEs in the kernel direct map to be marked invalid are not limited to kfence / debug pagealloc machinery. In particular, memfd_secret() also steals pages with set_direct_map_invalid_noflush(). When building the transitional page tables for kexec from the current kernel's page tables, those pages need to become regular writable pages, otherwise, if the relocation places kexec segments over such pages, a fault will occur during kexec, leading to host going dark during kexec. This patch addresses the kexec issue by marking any PTE as valid if it is not none. While this fixes the kexec crash, it does not address the security concern that if processes owning secret memory are not terminated before kexec, the secret content will be mapped in the new kernel without being scrubbed. Suggested-by: Jan H. Schönherr Signed-off-by: Fares Mehanna Link: https://lore.kernel.org/r/20240902163309.97113-1-faresx@amazon.de Signed-off-by: Will Deacon --- arch/arm64/mm/trans_pgd.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c index 5139a28130c0..0f7b484cb2ff 100644 --- a/arch/arm64/mm/trans_pgd.c +++ b/arch/arm64/mm/trans_pgd.c @@ -42,14 +42,16 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr) * the temporary mappings we use during restore. */ __set_pte(dst_ptep, pte_mkwrite_novma(pte)); - } else if ((debug_pagealloc_enabled() || - is_kfence_address((void *)addr)) && !pte_none(pte)) { + } else if (!pte_none(pte)) { /* * debug_pagealloc will removed the PTE_VALID bit if * the page isn't in use by the resume kernel. It may have * been in use by the original kernel, in which case we need * to put it back in our copy to do the restore. * + * Other cases include kfence / vmalloc / memfd_secret which + * may call `set_direct_map_invalid_noflush()`. + * * Before marking this entry valid, check the pfn should * be mapped. */ From eeb8fdfcf0901578c26ecfb11e814f36bc9a92f5 Mon Sep 17 00:00:00 2001 From: D Scott Phillips Date: Tue, 3 Sep 2024 09:45:32 -0700 Subject: [PATCH 3/5] arm64: Expose the end of the linear map in PHYSMEM_END The memory hot-plug and resource management code needs to know the largest address which can fit in the linear map, so set PHYSMEM_END for that purpose. This fixes a crash at boot when amdgpu tries to create DEVICE_PRIVATE_MEMORY and is given a physical address by the resource management code which is outside the range which can have a `struct page` | Unable to handle kernel paging request at virtual address 000001ffa6000034 | user pgtable: 4k pages, 48-bit VAs, pgdp=000008000287c000 | [000001ffa6000034] pgd=0000000000000000, p4d=0000000000000000 | Call trace: | __init_zone_device_page.constprop.0+0x2c/0xa8 | memmap_init_zone_device+0xf0/0x210 | pagemap_range+0x1e0/0x410 | memremap_pages+0x18c/0x2e0 | devm_memremap_pages+0x30/0x90 | kgd2kfd_init_zone_device+0xf0/0x200 [amdgpu] | amdgpu_device_ip_init+0x674/0x888 [amdgpu] | amdgpu_device_init+0x7a4/0xea0 [amdgpu] | amdgpu_driver_load_kms+0x28/0x1c0 [amdgpu] | amdgpu_pci_probe+0x1a0/0x560 [amdgpu] Signed-off-by: D Scott Phillips Link: https://lore.kernel.org/r/20240903164532.3874988-1-scott@os.amperecomputing.com Signed-off-by: Will Deacon --- arch/arm64/include/asm/memory.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h index 54fb014eba05..0480c61dbb4f 100644 --- a/arch/arm64/include/asm/memory.h +++ b/arch/arm64/include/asm/memory.h @@ -110,6 +110,8 @@ #define PAGE_END (_PAGE_END(VA_BITS_MIN)) #endif /* CONFIG_KASAN */ +#define PHYSMEM_END __pa(PAGE_END - 1) + #define MIN_THREAD_SHIFT (14 + KASAN_THREAD_SHIFT) /* From 70565f2be8807e5ea24dfb421197b881a02af5e2 Mon Sep 17 00:00:00 2001 From: Barry Song Date: Thu, 5 Sep 2024 20:11:24 +1200 Subject: [PATCH 4/5] mm: arm64: document why pte is not advanced in contpte_ptep_set_access_flags() According to David and Ryan, there isn't a bug here, even though we don't advance the PTE entry, because __ptep_set_access_flags() only uses the access flags from the entry. However, we always check pte_same(pte, entry) using the first entry in __ptep_set_access_flags(). This means that the checks from 1 to nr - 1 are not comparing the same PTE indexes (thus, they always return false), which can be a bit confusing. To clarify the code, let's add some comments. Reviewed-by: Ryan Roberts Signed-off-by: Barry Song Cc: Ard Biesheuvel Cc: John Hubbard Cc: Mark Rutland Cc: Catalin Marinas Cc: David Hildenbrand Cc: Will Deacon Reviewed-by: David Hildenbrand Link: https://lore.kernel.org/r/20240905081124.9576-1-21cnbao@gmail.com Signed-off-by: Will Deacon --- arch/arm64/mm/contpte.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c index a3edced29ac1..55107d27d3f8 100644 --- a/arch/arm64/mm/contpte.c +++ b/arch/arm64/mm/contpte.c @@ -421,6 +421,12 @@ int contpte_ptep_set_access_flags(struct vm_area_struct *vma, ptep = contpte_align_down(ptep); start_addr = addr = ALIGN_DOWN(addr, CONT_PTE_SIZE); + /* + * We are not advancing entry because __ptep_set_access_flags() + * only consumes access flags from entry. And since we have checked + * for the whole contpte block and returned early, pte_same() + * within __ptep_set_access_flags() is likely false. + */ for (i = 0; i < CONT_PTES; i++, ptep++, addr += PAGE_SIZE) __ptep_set_access_flags(vma, addr, ptep, entry, 0); From c02e7c5c6da8c637fec60158b0d4b330841de5ce Mon Sep 17 00:00:00 2001 From: Joey Gouly Date: Thu, 5 Sep 2024 16:29:35 +0100 Subject: [PATCH 5/5] arm64/mm: use lm_alias() with addresses passed to memblock_free() The pointer argument to memblock_free() needs to be a linear map address, but in mem_init() we pass __init_begin/__init_end, which is a kernel image address. This results in warnings when building with CONFIG_DEBUG_VIRTUAL=y: virt_to_phys used for non-linear address: ffff800081270000 (set_reset_devices+0x0/0x10) WARNING: CPU: 0 PID: 1 at arch/arm64/mm/physaddr.c:12 __virt_to_phys+0x54/0x70 Modules linked in: CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-rc6-next-20240905 #5810 b1ebb0ad06653f35ce875413d5afad24668df3f3 Hardware name: FVP Base RevC (DT) pstate: 2161402005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) pc : __virt_to_phys+0x54/0x70 lr : __virt_to_phys+0x54/0x70 sp : ffff80008169be20 ... Call trace: __virt_to_phys+0x54/0x70 memblock_free+0x18/0x30 free_initmem+0x3c/0x9c kernel_init+0x30/0x1cc ret_from_fork+0x10/0x20 Fix this by having mem_init() convert the pointers via lm_alias(). Fixes: 1db9716d4487 ("arm64/mm: Delete __init region from memblock.reserved") Signed-off-by: Joey Gouly Suggested-by: Mark Rutland Cc: Will Deacon Cc: Catalin Marinas Cc: Rong Qianfeng Reviewed-by: Mark Rutland Link: https://lore.kernel.org/r/20240905152935.4156469-1-joey.gouly@arm.com Signed-off-by: Will Deacon --- arch/arm64/mm/init.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index aea834a9691a..a0400b9aa814 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -414,14 +414,16 @@ void __init mem_init(void) void free_initmem(void) { - unsigned long aligned_begin = ALIGN_DOWN((u64)__init_begin, PAGE_SIZE); - unsigned long aligned_end = ALIGN((u64)__init_end, PAGE_SIZE); + void *lm_init_begin = lm_alias(__init_begin); + void *lm_init_end = lm_alias(__init_end); + + WARN_ON(!IS_ALIGNED((unsigned long)lm_init_begin, PAGE_SIZE)); + WARN_ON(!IS_ALIGNED((unsigned long)lm_init_end, PAGE_SIZE)); /* Delete __init region from memblock.reserved. */ - memblock_free((void *)aligned_begin, aligned_end - aligned_begin); + memblock_free(lm_init_begin, lm_init_end - lm_init_begin); - free_reserved_area(lm_alias(__init_begin), - lm_alias(__init_end), + free_reserved_area(lm_init_begin, lm_init_end, POISON_FREE_INITMEM, "unused kernel"); /* * Unmap the __init region but leave the VM area in place. This