mm/pagewalk: introduce folio_walk_start() + folio_walk_end()

We want to get rid of follow_page(), and have a more reasonable way to
just look up a folio mapped at a certain address, perform some checks while
still under PTL, and then only conditionally grab a folio reference if
really required.

Further, we might want to get rid of some walk_page_range*() users that
really only want to temporarily look up a single folio at a single address.

So let's add a new page table walker that does exactly that and that,
like GUP, is also able to walk hugetlb VMAs.

Add folio_walk_end() as a macro for now: the compiler is not easy to
please with the pte_unmap()->kunmap_local().

Note that one difference between follow_page() and get_user_pages(1) is
that follow_page() will not trigger faults to get something mapped.  So
folio_walk is at least currently not a replacement for get_user_pages(1),
but could likely be extended/reused to achieve something similar in the
future.

Link: https://lkml.kernel.org/r/20240802155524.517137-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand
2024-08-02 17:55:15 +02:00
committed by Andrew Morton
parent 3523a37e65
commit aa39ca6940
2 changed files with 260 additions and 0 deletions
@@ -130,4 +130,62 @@ int walk_page_mapping(struct address_space *mapping, pgoff_t first_index,
pgoff_t nr, const struct mm_walk_ops *ops,
void *private);

typedef int __bitwise folio_walk_flags_t;

/*
 * Walk migration entries as well. Careful: a large folio might get split
 * concurrently.
 */
#define FW_MIGRATION			((__force folio_walk_flags_t)BIT(0))

/* Walk shared zeropages (small + huge) as well. */
#define FW_ZEROPAGE			((__force folio_walk_flags_t)BIT(1))

enum folio_walk_level {
	FW_LEVEL_PTE,
	FW_LEVEL_PMD,
	FW_LEVEL_PUD,
};

/**
 * struct folio_walk - folio_walk_start() / folio_walk_end() data
 * @page:	exact folio page referenced (if applicable)
 * @level:	page table level identifying the entry type
 * @ptep:	pointer to the page table entry (FW_LEVEL_PTE).
 * @pmdp:	pointer to the page table entry (FW_LEVEL_PMD).
 * @pudp:	pointer to the page table entry (FW_LEVEL_PUD).
 * @pte:	value of the page table entry (FW_LEVEL_PTE).
 * @pmd:	value of the page table entry (FW_LEVEL_PMD).
 * @pud:	value of the page table entry (FW_LEVEL_PUD).
 * @ptl:	pointer to the page table lock.
 *
 * (see folio_walk_start() documentation for more details)
 */
struct folio_walk {
	/* public */
	struct page *page;
	enum folio_walk_level level;
	union {
		pte_t *ptep;
		pud_t *pudp;
		pmd_t *pmdp;
	};
	union {
		pte_t pte;
		pud_t pud;
		pmd_t pmd;
	};
	/* private */
	struct vm_area_struct *vma;
	spinlock_t *ptl;
};

struct folio *folio_walk_start(struct folio_walk *fw,
		struct vm_area_struct *vma, unsigned long addr,
		folio_walk_flags_t flags);

#define folio_walk_end(__fw, __vma) do { \
	spin_unlock((__fw)->ptl); \
	if (likely((__fw)->level == FW_LEVEL_PTE)) \
		pte_unmap((__fw)->ptep); \
	vma_pgtable_walk_end(__vma); \
} while (0)

#endif /* _LINUX_PAGEWALK_H */
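
For illustration only, the call pattern the commit message describes (look up
the folio, check it while the PTL is held, take a reference only if needed,
then end the walk) might look roughly like the sketch below. It is a userspace
mock-up, not kernel code: the struct layouts, the stubbed folio_walk_start(),
folio_walk_end() and folio_get(), the "wanted" check, and the caller
lookup_wanted_folio() are all hypothetical stand-ins so the example compiles
on its own. It also assumes that folio_walk_start() returns NULL without
holding any lock when nothing suitable is mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace stand-ins (assumptions, NOT the kernel definitions). Only the
 * start -> check -> conditional get -> end ordering is the point here.
 */
struct folio { int refcount; bool wanted; };
struct vm_area_struct { struct folio *mapped; unsigned long start, end; };
struct folio_walk { int locked; };
typedef int folio_walk_flags_t;

/* Stub: pretend to walk the page tables; "take the PTL" only on success. */
static struct folio *folio_walk_start(struct folio_walk *fw,
		struct vm_area_struct *vma, unsigned long addr,
		folio_walk_flags_t flags)
{
	(void)flags;
	fw->locked = 0;
	if (addr < vma->start || addr >= vma->end || !vma->mapped)
		return NULL;	/* assumed: nothing is held on failure */
	fw->locked = 1;
	return vma->mapped;
}

/* Stub: drop the pretend PTL (the real folio_walk_end() is a macro). */
static void folio_walk_end(struct folio_walk *fw, struct vm_area_struct *vma)
{
	(void)vma;
	assert(fw->locked);	/* must pair with a successful start */
	fw->locked = 0;
}

/* Stub: take a folio reference. */
static void folio_get(struct folio *folio)
{
	folio->refcount++;
}

/*
 * Hypothetical caller: return the folio mapped at @addr with an extra
 * reference if it passes the check, NULL otherwise.
 */
static struct folio *lookup_wanted_folio(struct vm_area_struct *vma,
		unsigned long addr)
{
	struct folio_walk fw;
	struct folio *folio;

	folio = folio_walk_start(&fw, vma, addr, 0);
	if (!folio)
		return NULL;

	/* Still "under PTL": inspect first, grab a reference only if needed. */
	if (folio->wanted)
		folio_get(folio);
	else
		folio = NULL;

	folio_walk_end(&fw, vma);
	return folio;
}
```

In this sketch the reference is taken before the walk ends, so the folio
cannot go away between the check and the point where the caller uses it.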