twx-linux/drivers
Oscar Salvador a08a2ae346 mm,memory_hotplug: allocate memmap from the added memory range
Physical memory hotadd has to allocate a memmap (struct page array) for
the newly added memory section.  Currently, alloc_pages_node() is used
for those allocations.

This has some disadvantages:
 a) existing memory is consumed for that purpose
    (e.g. ~2MB per 128MB memory section on x86_64).
    This can even lead to extreme cases where the system goes OOM because
    the physically hotplugged memory depletes the available memory before
    it is onlined.
 b) if the whole node is movable then we have off-node struct pages,
    which have performance drawbacks.
 c) there might be no PMD_ALIGNED chunks, so the memmap array gets
    populated with base pages.
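
The ~2MB figure in (a) follows from simple arithmetic.  A minimal sketch,
assuming 4 KiB base pages and a 64-byte struct page (typical x86_64 values,
not stated in this commit); the macro and helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for x86_64 with 4 KiB base pages; not taken from the commit. */
#define PAGE_SIZE_BYTES   4096ULL
#define STRUCT_PAGE_BYTES 64ULL   /* typical sizeof(struct page) on x86_64 */

/* Bytes of memmap needed to describe range_bytes of physical memory. */
static uint64_t memmap_bytes(uint64_t range_bytes)
{
    /* The range is expected to be a whole number of base pages. */
    assert(range_bytes % PAGE_SIZE_BYTES == 0);
    return (range_bytes / PAGE_SIZE_BYTES) * STRUCT_PAGE_BYTES;
}

/* Number of base pages consumed by that memmap, rounded up. */
static uint64_t memmap_pages(uint64_t range_bytes)
{
    return (memmap_bytes(range_bytes) + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}
```

With these assumptions, memmap_bytes(128ULL << 20) is 2 MiB, i.e. 512 base
pages per 128MB section.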

This can be improved when CONFIG_SPARSEMEM_VMEMMAP is enabled.

Vmemmap page tables can map arbitrary memory.  That means that we can
reserve a part of the physically hotadded memory to back vmemmap page
tables.  This implementation uses the beginning of the hotplugged memory
for that purpose.
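
To illustrate the layout, here is a rough userspace sketch of how a
hotplugged range could be split into a leading vmemmap part and the
remainder that is later handed to the page allocator.  The struct, the
function name and the constants (4 KiB pages, 64-byte struct page) are
assumptions for illustration and are not the kernel's data structures:

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT        12      /* assumed 4 KiB base pages */
#define MODEL_STRUCT_PAGE_BYTES 64ULL   /* assumed sizeof(struct page) */

/* Hypothetical description of one hotplugged range. */
struct block_layout {
    uint64_t vmemmap_start_pfn;  /* first pfn backing the memmap */
    uint64_t nr_vmemmap_pages;   /* pages reserved for the memmap */
    uint64_t usable_start_pfn;   /* first pfn left for normal use */
    uint64_t nr_usable_pages;    /* pages left for normal use */
};

/* Reserve the beginning of [start_pfn, start_pfn + nr_pages) for the memmap. */
static struct block_layout split_block(uint64_t start_pfn, uint64_t nr_pages)
{
    uint64_t page_size = 1ULL << MODEL_PAGE_SHIFT;
    uint64_t memmap_bytes = nr_pages * MODEL_STRUCT_PAGE_BYTES;
    struct block_layout l;

    l.vmemmap_start_pfn = start_pfn;
    l.nr_vmemmap_pages = (memmap_bytes + page_size - 1) >> MODEL_PAGE_SHIFT;
    l.usable_start_pfn = start_pfn + l.nr_vmemmap_pages;
    l.nr_usable_pages = nr_pages - l.nr_vmemmap_pages;
    return l;
}
```

For a 128MB range (32768 pages) this model reserves the first 512 pages
and leaves 32256 pages for regular use.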

There are some non-obvious things to consider though.

Vmemmap pages are allocated/freed during the memory hotplug events
(add_memory_resource(), try_remove_memory()) when the memory is
added/removed.  This means that the reserved physical range is not
online although it is used.  The most obvious side effect is that
pfn_to_online_page() returns NULL for those pfns.  The current design
expects that this should be OK as the hotplugged memory is considered
garbage until it is onlined.  For example, hibernation wouldn't save the
content of those vmemmaps into the image so it wouldn't be restored on
resume, but this should be OK as there is no real content to recover anyway
while metadata is reachable from other data structures (e.g.  vmemmap
page tables).

The reserved space is therefore (de)initialized during the {on,off}line
events (mhp_{de}init_memmap_on_memory).  That is done by extracting page
allocator independent initialization from the regular onlining path.
The primary reason to handle the reserved space outside of
{on,off}line_pages is to make each initialization specific to the
purpose rather than special case them in a single function.

As per above, the functions that are introduced are:

 - mhp_init_memmap_on_memory:
   Initializes vmemmap pages by calling move_pfn_range_to_zone(), calls
   kasan_add_zero_shadow(), and onlines as many sections as vmemmap pages
   fully span.

 - mhp_deinit_memmap_on_memory:
   Offlines as many sections as vmemmap pages fully span, removes the
   range from the zone by remove_pfn_range_from_zone(), and calls
   kasan_remove_zero_shadow() for the range.

The new function memory_block_online() calls mhp_init_memmap_on_memory()
before doing the actual online_pages().  Should online_pages() fail, we
clean up by calling mhp_deinit_memmap_on_memory().  The adjustment of
present_pages is done at the end, once we know that online_pages()
succeeded.

On offline, memory_block_offline() needs to unaccount vmemmap pages from
present_pages before calling offline_pages().  This is necessary because
offline_pages() tears down some structures based on whether the node or
the zone becomes empty.  If offline_pages() fails, we account back
vmemmap pages.  If it succeeds, we call mhp_deinit_memmap_on_memory().
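
The ordering and rollback described above can be modelled in userspace.
This is only a sketch of the control flow: every model_* name and the
struct are hypothetical stand-ins, the fail_* fields exist just to
exercise the error paths, and none of this is the kernel code itself:

```c
#include <stdbool.h>

/* Hypothetical model of one memory block; not a kernel structure. */
struct model_block {
    unsigned long nr_vmemmap_pages;
    unsigned long present_pages;   /* stand-in for zone/node accounting */
    bool vmemmap_initialized;
    bool online;
    bool fail_online;              /* knob: make onlining fail */
    bool fail_offline;             /* knob: make offlining fail */
};

/* Stand-ins for mhp_init_memmap_on_memory()/mhp_deinit_memmap_on_memory(). */
static int model_init_memmap(struct model_block *b)
{
    b->vmemmap_initialized = true;
    return 0;
}

static void model_deinit_memmap(struct model_block *b)
{
    b->vmemmap_initialized = false;
}

/* Stand-ins for online_pages()/offline_pages() on the non-vmemmap part. */
static int model_online_pages(struct model_block *b, unsigned long nr)
{
    if (b->fail_online)
        return -1;
    b->online = true;
    b->present_pages += nr;
    return 0;
}

static int model_offline_pages(struct model_block *b, unsigned long nr)
{
    if (b->fail_offline)
        return -1;
    b->online = false;
    b->present_pages -= nr;
    return 0;
}

static int model_block_online(struct model_block *b, unsigned long nr_pages)
{
    unsigned long nr_vmemmap = b->nr_vmemmap_pages;
    int ret;

    if (nr_vmemmap) {
        ret = model_init_memmap(b);
        if (ret)
            return ret;
    }
    ret = model_online_pages(b, nr_pages - nr_vmemmap);
    if (ret) {
        /* Onlining failed: undo the vmemmap initialization. */
        if (nr_vmemmap)
            model_deinit_memmap(b);
        return ret;
    }
    /* Account vmemmap pages only once onlining has succeeded. */
    b->present_pages += nr_vmemmap;
    return 0;
}

static int model_block_offline(struct model_block *b, unsigned long nr_pages)
{
    unsigned long nr_vmemmap = b->nr_vmemmap_pages;
    int ret;

    /* Unaccount first so the offline path sees the real emptiness. */
    b->present_pages -= nr_vmemmap;
    ret = model_offline_pages(b, nr_pages - nr_vmemmap);
    if (ret) {
        b->present_pages += nr_vmemmap;   /* account back on failure */
        return ret;
    }
    if (nr_vmemmap)
        model_deinit_memmap(b);
    return 0;
}
```

In this model, a failed online leaves the block with no initialized
vmemmap and unchanged accounting, and a failed offline restores
present_pages to its previous value.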

Hot-remove:

 We need to be careful when removing memory, as adding and
 removing memory needs to be done with the same granularity.
 To check that this assumption is not violated, we check the
 memory range we want to remove and if a) any memory block has
 vmemmap pages and b) the range spans more than a single memory
 block, we scream out loud and refuse to proceed.

 If all is good and the range was using memmap on memory (aka vmemmap pages),
 we construct an altmap structure so free_hugepage_table does the right
 thing and calls vmem_altmap_free instead of free_pagetable.
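
 The granularity check can be expressed as a small predicate.  This is a
 hedged userspace sketch: the function name and the array-of-counts
 representation are assumptions for illustration, not how the kernel
 walks memory blocks:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * vmemmap_pages[i] is the number of vmemmap pages recorded for the i-th
 * memory block in the range to remove (0 if the block has none).
 * Removal is refused when a) any block has vmemmap pages and
 * b) the range spans more than a single memory block.
 */
static bool model_can_remove(const unsigned long *vmemmap_pages,
                             size_t nr_blocks)
{
    bool any_vmemmap = false;
    size_t i;

    for (i = 0; i < nr_blocks; i++) {
        if (vmemmap_pages[i])
            any_vmemmap = true;
    }
    return !(any_vmemmap && nr_blocks > 1);
}
```

 A single block with vmemmap pages passes the check, as does a multi-block
 range in which no block uses memmap on memory; a multi-block range that
 contains any such block is rejected.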

Link: https://lkml.kernel.org/r/20210421102701.25051-5-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05 11:27:26 -07:00
..
accessibility TTY/Serial driver updates for 5.13-rc1 2021-04-26 11:20:10 -07:00
acpi CFI on arm64 series for v5.13-rc1 2021-04-27 10:16:46 -07:00
amba
android selinux/stable-5.13 PR 20210426 2021-04-27 13:42:11 -07:00
ata SCSI misc on 20210428 2021-04-28 17:22:10 -07:00
atm Networking changes for 5.13. 2021-04-29 11:57:23 -07:00
auxdisplay
base mm,memory_hotplug: allocate memmap from the added memory range 2021-05-05 11:27:26 -07:00
bcma bcma: remove unused function 2021-04-18 09:36:56 +03:00
block for-5.13/drivers-2021-04-27 2021-04-28 14:39:37 -07:00
bluetooth Networking changes for 5.13. 2021-04-29 11:57:23 -07:00
bus ARM: SoC drivers for v5.13 2021-04-26 12:11:52 -07:00
cdrom gdrom: fix compilation error 2021-04-11 19:32:06 -06:00
char A bunch of little cleanups 2021-04-28 15:54:57 -07:00
clk Here's a collection of largely clk driver updates for the merge window. The 2021-04-28 17:13:56 -07:00
clocksource ARM: platform support for Apple M1 2021-04-26 12:30:36 -07:00
comedi staging: comedi: move out of staging directory 2021-04-15 09:26:25 +02:00
connector
counter
cpufreq Power management updates for 5.13-rc1 2021-04-26 15:10:25 -07:00
cpuidle Merge back earlier cpuidle updates for v5.13. 2021-04-08 20:05:49 +02:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2021-04-26 08:51:23 -07:00
cxl cxl/mem: Fix memory device capacity probing 2021-04-16 18:21:56 -07:00
dax
dca
devfreq PM / devfreq: imx8m-ddrc: Remove unneeded of_match_ptr() 2021-04-08 13:14:51 +09:00
dio
dma ARM: SoC drivers for v5.13 2021-04-26 12:11:52 -07:00
dma-buf drm/syncobj: use newly allocated stub fences 2021-04-08 12:21:13 +02:00
edac
eisa
extcon - Core Frameworks 2021-04-28 15:59:13 -07:00
firewire The usual updates from the irq departement: 2021-04-26 09:43:16 -07:00
firmware - removed get_fs/set_fs 2021-04-29 11:28:08 -07:00
fpga ARM: SoC drivers for v5.13 2021-04-26 12:11:52 -07:00
fsi
gnss
gpio - Core Frameworks 2021-04-28 15:59:13 -07:00
gpu i915: fix remap_io_sg to verify the pgprot 2021-04-30 11:20:39 -07:00
greybus greybus: es2: fix kernel-doc warnings 2021-04-16 07:26:50 +02:00
hid
hsi HSI: core: fix resource leaks in hsi_add_client_from_dt() 2021-04-16 00:14:49 +02:00
hv printk changes for 5.13 2021-04-27 18:09:44 -07:00
hwmon ACPI updates for 5.13-rc1 2021-04-26 15:03:23 -07:00
hwspinlock
hwtracing coresight: etm-perf: Fix define build issue when built as module 2021-04-16 09:34:57 +02:00
i2c - Core Frameworks 2021-04-28 15:59:13 -07:00
i3c
ide
idle intel_idle: add Iclelake-D support 2021-04-08 19:18:07 +02:00
iio spi: Updates for v5.13 2021-04-26 16:32:11 -07:00
infiniband RDMA/umem: batch page unpin in __ib_umem_release() 2021-04-30 11:20:37 -07:00
input - Core Frameworks 2021-04-28 15:59:13 -07:00
interconnect CFI on arm64 series for v5.13-rc1 2021-04-27 10:16:46 -07:00
iommu
ipack
irqchip - removed get_fs/set_fs 2021-04-29 11:28:08 -07:00
isdn
leds treewide: change my e-mail address, fix my name 2021-04-09 14:54:23 -07:00
lightnvm lightnvm: deprecated OCSSD support and schedule it for removal in Linux 5.15 2021-04-13 09:16:12 -06:00
macintosh
mailbox - qcom: enable support for SM8350 and SC7280 2021-04-28 16:10:33 -07:00
mcb
md for-5.13/drivers-2021-04-27 2021-04-28 14:39:37 -07:00
media drm for 5.13-rc1 2021-04-28 10:01:40 -07:00
memory Power management updates for 5.13-rc1 2021-04-26 15:10:25 -07:00
memstick memstick: r592: ignore kfifo_out() return code again 2021-04-26 11:08:23 +02:00
message scsi: message: fusion: Remove unused local variable 'vtarget' 2021-04-13 01:39:12 -04:00
mfd - Core Frameworks 2021-04-28 15:59:13 -07:00
misc CFI on arm64 series for v5.13-rc1 2021-04-27 10:16:46 -07:00
mmc MMC core: 2021-04-28 15:56:51 -07:00
most Staging/IIO driver updates for 5.13-rc1 2021-04-26 11:14:21 -07:00
mtd printk changes for 5.13 2021-04-27 18:09:44 -07:00
mux mux: gpio: Simplify code by using dev_err_probe() 2021-04-02 16:28:53 +02:00
net Networking changes for 5.13. 2021-04-29 11:57:23 -07:00
nfc nfc: st-nci: remove unnecessary label 2021-04-13 14:50:57 -07:00
ntb
nubus
nvdimm libnvdimm/region: Fix nvdimm_has_flush() to handle ND_REGION_ASYNC 2021-04-09 21:56:01 -07:00
nvme for-5.13/drivers-2021-04-27 2021-04-28 14:39:37 -07:00
nvmem nvmem: qfprom: Add support for fuse blowing on sc7280 2021-04-02 16:28:10 +02:00
of Networking changes for 5.13. 2021-04-29 11:57:23 -07:00
opp
parisc
parport
pci mm/vmalloc: remove unmap_kernel_range 2021-04-30 11:20:40 -07:00
pcmcia
perf
phy Networking changes for 5.13. 2021-04-29 11:57:23 -07:00
pinctrl ARM: SoC drivers for v5.13 2021-04-26 12:11:52 -07:00
platform USB/Thunderbolt patches for 5.13-rc1 2021-04-26 11:32:23 -07:00
pnp
power power supply and reset changes for the v5.13 series 2021-04-28 15:43:58 -07:00
powercap
pps TTY/Serial driver updates for 5.13-rc1 2021-04-26 11:20:10 -07:00
ps3
ptp
pwm - Core Frameworks 2021-04-28 15:59:13 -07:00
rapidio
ras RAS/CEC: Correct ce_add_elem()'s returned values 2021-04-07 11:52:26 +02:00
regulator - Core Frameworks 2021-04-28 15:59:13 -07:00
remoteproc
reset ARM SCMI updates for v5.13 2021-04-08 17:38:20 +02:00
rpmsg
rtc - Core Frameworks 2021-04-28 15:59:13 -07:00
s390 Networking changes for 5.13. 2021-04-29 11:57:23 -07:00
sbus
scsi Networking changes for 5.13. 2021-04-29 11:57:23 -07:00
sh The usual updates from the irq departement: 2021-04-26 09:43:16 -07:00
siox
slimbus
soc ARM: SoC drivers for v5.13 2021-04-26 12:11:52 -07:00
soundwire soundwire: intel_init: test link->cdns 2021-04-06 10:26:44 +05:30
spi CFI on arm64 series for v5.13-rc1 2021-04-27 10:16:46 -07:00
spmi
ssb
staging Networking changes for 5.13. 2021-04-29 11:57:23 -07:00
target SCSI misc on 20210428 2021-04-28 17:22:10 -07:00
tc
tee ARM: SoC drivers for v5.13 2021-04-26 12:11:52 -07:00
thermal
thunderbolt thunderbolt: Changes for v5.13 merge window 2021-04-13 12:17:14 +02:00
tty Power management updates for 5.13-rc1 2021-04-26 15:10:25 -07:00
uio
usb SCSI misc on 20210428 2021-04-28 17:22:10 -07:00
vdpa vdpa/mlx5: Set err = -ENOMEM in case dma_map_sg_attrs fails 2021-04-22 18:15:31 -04:00
vfio VFIO updates for v5.13-rc1 2021-04-28 17:19:47 -07:00
vhost SCSI misc on 20210428 2021-04-28 17:22:10 -07:00
video - New Device Support 2021-04-28 16:02:58 -07:00
virt
virtio
visorbus
vlynq
vme
w1 w1: ds28e17: Use module_w1_family to simplify the code 2021-04-10 10:58:21 +02:00
watchdog - Core Frameworks 2021-04-28 15:59:13 -07:00
xen SCSI misc on 20210428 2021-04-28 17:22:10 -07:00
zorro
Kconfig staging: comedi: move out of staging directory 2021-04-15 09:26:25 +02:00
Makefile staging: comedi: move out of staging directory 2021-04-15 09:26:25 +02:00