twx-linux/include
Yu Zhao e98337d11b mm/contig_alloc: support __GFP_COMP
Patch series "mm/hugetlb: alloc/free gigantic folios", v2.

Using __GFP_COMP for gigantic folios greatly reduces not only the amount
of code but also the allocation and free times.

Approximate LOC change to mm/hugetlb.c: +60, -240

Allocate and free 500 1GB hugeTLB folios without HVO by:
  time echo 500 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
  time echo 0 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

       Before  After
Alloc  ~13s    ~10s
Free   ~15s    <1s

Improvements of this magnitude generally hold across multiple x86 and
arm64 CPU models.

Perf profile before:
  Alloc
    - 99.99% alloc_pool_huge_folio
       - __alloc_fresh_hugetlb_folio
          - 83.23% alloc_contig_pages_noprof
             - 47.46% alloc_contig_range_noprof
                - 20.96% isolate_freepages_range
                     16.10% split_page
                - 14.10% start_isolate_page_range
                - 12.02% undo_isolate_page_range

  Free
    - update_and_free_pages_bulk
       - 87.71% free_contig_range
          - 76.02% free_unref_page
             - 41.30% free_unref_page_commit
                - 32.58% free_pcppages_bulk
                   - 24.75% __free_one_page
               13.96% _raw_spin_trylock
         12.27% __update_and_free_hugetlb_folio

Perf profile after:
  Alloc
    - 99.99% alloc_pool_huge_folio
         alloc_gigantic_folio
       - alloc_contig_pages_noprof
          - 59.15% alloc_contig_range_noprof
             - 20.72% start_isolate_page_range
               20.64% prep_new_page
             - 17.13% undo_isolate_page_range

  Free
    - update_and_free_pages_bulk
       - __folio_put
       - __free_pages_ok
            7.46% free_tail_page_prepare
          - 1.97% free_one_page
               1.86% __free_one_page

This patch (of 3):

Support __GFP_COMP in alloc_contig_range().  When the flag is set, upon
success the function returns a large folio prepared by prep_new_page(),
rather than a range of order-0 pages prepared by split_free_pages() (which
is renamed from split_map_pages()).
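
The difference between the two outcomes can be illustrated with a small
userspace model. This is only a sketch, not kernel code: struct model_page
and both helper functions are hypothetical names invented for illustration.
It assumes the usual compound-page convention that the head page holds the
reference and each tail page points back to the head, whereas a split range
leaves every order-0 page independently refcounted.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel layout. */
struct model_page {
    int refcount;        /* 1 if the page can be freed on its own */
    int is_head;         /* 1 for the head page of a compound page */
    unsigned int order;  /* meaningful on the head page only */
    long head_index;     /* index of the head for tail pages, else -1 */
};

/* Models the __GFP_COMP outcome: one large folio of 2^order pages. */
static void prep_compound_model(struct model_page *pages, unsigned int order)
{
    size_t nr = (size_t)1 << order;

    pages[0] = (struct model_page){
        .refcount = 1, .is_head = 1, .order = order, .head_index = -1,
    };
    for (size_t i = 1; i < nr; i++)
        pages[i] = (struct model_page){
            .refcount = 0, .is_head = 0, .order = 0, .head_index = 0,
        };
}

/* Models the non-__GFP_COMP outcome: 2^order independent order-0 pages. */
static void prep_split_model(struct model_page *pages, unsigned int order)
{
    size_t nr = (size_t)1 << order;

    for (size_t i = 0; i < nr; i++)
        pages[i] = (struct model_page){
            .refcount = 1, .is_head = 0, .order = 0, .head_index = -1,
        };
}
```

In this model, freeing the compound result takes a single operation on the
head page, while the split result has to be freed one page at a time, which
is consistent with the free-time difference reported above.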

alloc_contig_range() can be used to allocate folios larger than
MAX_PAGE_ORDER, e.g., gigantic hugeTLB folios.  So on the free path,
free_one_page() needs to handle such folios by calling split_large_buddy().
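
The idea of splitting an oversized range on the free path can be sketched
in userspace C as follows. This is a model under stated assumptions, not
the kernel's split_large_buddy(): the function name split_large_range, the
callback type, and the max_chunk_order parameter are all hypothetical, and
the chunk size the kernel actually uses is not stated in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: receives the first PFN and the order of one chunk. */
typedef void (*free_chunk_fn)(unsigned long pfn, unsigned int order, void *ctx);

/*
 * Userspace model only: walk a naturally aligned range of 2^order pages and
 * hand it back in chunks of at most 2^max_chunk_order pages each.
 * Returns the number of chunks passed to free_chunk.
 */
static unsigned long split_large_range(unsigned long pfn, unsigned int order,
                                       unsigned int max_chunk_order,
                                       free_chunk_fn free_chunk, void *ctx)
{
    unsigned long end = pfn + (1UL << order);
    unsigned long chunks = 0;

    /* The range must be aligned to its own size. */
    assert((pfn & ((1UL << order) - 1)) == 0);

    /* Ranges above the limit are freed in limit-sized pieces. */
    if (order > max_chunk_order)
        order = max_chunk_order;

    while (pfn != end) {
        if (free_chunk)
            free_chunk(pfn, order, ctx);
        pfn += 1UL << order;
        chunks++;
    }
    return chunks;
}

/* Example callback: accumulates the number of base pages freed. */
static void count_pages(unsigned long pfn, unsigned int order, void *ctx)
{
    (void)pfn;
    *(unsigned long *)ctx += 1UL << order;
}
```

As a usage example, assuming 4KB base pages a 1GB folio spans 2^18 pages,
so calling split_large_range(0, 18, 10, count_pages, &pages) with a
hypothetical limit of order 10 hands back 256 chunks covering 262144 pages
in total.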

[akpm@linux-foundation.org: fix folio_alloc_gigantic_noprof() WARN expression, per Yu Liao]
Link: https://lkml.kernel.org/r/20240814035451.773331-1-yuzhao@google.com
Link: https://lkml.kernel.org/r/20240814035451.773331-2-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Frank van der Linden <fvdl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:36 -07:00
acpi ACPI: video: Add Dell UART backlight controller detection 2024-08-19 15:58:35 +02:00
asm-generic arch_numa: switch over to numa_memblks 2024-09-03 21:15:32 -07:00
clocksource
crypto
drm A revert for a previous TTM commit causing stuttering, 3 fixes for 2024-08-30 11:28:11 +10:00
dt-bindings
keys
kunit
kvm
linux mm/contig_alloc: support __GFP_COMP 2024-09-03 21:15:36 -07:00
math-emu
media
memory
misc
net netfilter pull request 24-08-28 2024-08-29 11:35:54 +02:00
pcmcia
ras
rdma
rv
scsi scsi: core: Fix the return value of scsi_logical_block_count() 2024-08-16 21:02:06 -04:00
soc net: mscc: ocelot: treat 802.1ad tagged traffic as 802.1Q-untagged 2024-08-16 09:59:32 +01:00
sound ASoC: Fixes for v6.11 2024-08-09 09:58:07 +02:00
target
trace filemap: add trace events for get_pages, map_pages, and fault 2024-09-01 20:26:10 -07:00
uapi mm: remove PG_error 2024-09-01 20:26:05 -07:00
ufs scsi: ufs: core: Add a quirk for handling broken LSDBS field in controller capabilities register 2024-08-16 21:09:17 -04:00
vdso
video
xen