230671533d64631116be3ff9d407bd9ca5a58e1b
761522 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
230671533d |
mm: memory.low hierarchical behavior
This patch aims to address an issue in current memory.low semantics,
which makes it hard to use it in a hierarchy, where some leaf memory
cgroups are more valuable than others.
For example, there are memcgs A, A/B, A/C, A/D and A/E:
  A      A/memory.low = 2G, A/memory.current = 6G
 //\\
BC  DE   B/memory.low = 3G  B/memory.current = 2G
         C/memory.low = 1G  C/memory.current = 2G
         D/memory.low = 0   D/memory.current = 2G
         E/memory.low = 10G E/memory.current = 0
If we apply memory pressure, B, C and D are reclaimed at the same pace
while A's usage exceeds 2G. This is obviously wrong, as B's usage is
fully below B's memory.low, and C has 1G of protection as well. Also, A
is pushed down to a size below its own 2G memory.low, which is
also wrong.
A simple bash script (provided below) can be used to reproduce
the problem. Current results are:
A: 1430097920
A/B: 711929856
A/C: 717426688
A/D: 741376
A/E: 0
To address the issue, a concept of effective memory.low is introduced.
Effective memory.low is always less than or equal to the original memory.low.
When there is no memory.low overcommitment (and also for
top-level cgroups), these two values are equal.
Otherwise it's a part of the parent's effective memory.low, calculated as the
cgroup's memory.low usage divided by the sum of its siblings' memory.low usages
(by memory.low usage I mean the size of actually protected memory:
memory.current if memory.current < memory.low, 0 otherwise). It's
necessary to track the actual usage, because otherwise an empty cgroup
with memory.low set (A/E in my example) will affect actual memory
distribution, which makes no sense. To avoid traversing the cgroup tree
twice, page_counters code is reused.
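The rule described above can be illustrated with a small userspace sketch. This is not the kernel implementation: the function names `low_usage` and `effective_low` and the dictionary layout are assumptions made for illustration, and the overcommitment check (sum of children's memory.low compared with the parent's effective value) is one plausible reading of the text.

```python
G = 1 << 30  # one gibibyte, matching the "2G" style values in the example


def low_usage(low, current):
    """Size of actually protected memory, as defined in the message:
    memory.current if memory.current < memory.low, 0 otherwise."""
    return current if current < low else 0


def effective_low(parent_elow, children):
    """Compute a hypothetical effective memory.low for each child.

    children maps a cgroup name to a (memory.low, memory.current) pair.
    """
    total_low = sum(low for low, _ in children.values())
    if total_low <= parent_elow:
        # Assumed meaning of "no overcommitment": the configured values
        # fit inside the parent's protection and are used unchanged.
        return {name: low for name, (low, _) in children.items()}

    usages = {name: low_usage(low, cur) for name, (low, cur) in children.items()}
    siblings = sum(usages.values())
    result = {}
    for name, (low, _) in children.items():
        # Proportional share of the parent's effective protection.
        share = parent_elow * usages[name] // siblings if siblings else 0
        # Effective memory.low never exceeds the configured memory.low.
        result[name] = min(low, share)
    return result


if __name__ == "__main__":
    example = {"B": (3 * G, 2 * G), "C": (1 * G, 2 * G),
               "D": (0, 2 * G), "E": (10 * G, 0)}
    for name, value in effective_low(2 * G, example).items():
        print(name, value)
```

Note that the empty cgroup E contributes nothing to the sum of usages, so its 10G setting does not dilute the protection of its siblings, which is the point made in the paragraph above. The values are a snapshot: as reclaim changes memory.current, the shares change too.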
Calculating effective memory.low can be done in the reclaim path, as we
conveniently traverse the cgroup tree from top to bottom and check
memory.low on each level. So, it's a perfect place to calculate
effective memory.low and save it for use by children cgroups.
This also eliminates the need to traverse the cgroup tree from bottom to
top each time to check whether a parent's guarantee is exceeded.
Setting/resetting effective memory.low is intentionally racy, but it's
fine and shouldn't lead to any significant differences in actual memory
distribution.
With this patch applied, the results match expectations:
A: 2147930112
A/B: 1428721664
A/C: 718393344
A/D: 815104
A/E: 0
Test script:
#!/bin/bash
CGPATH="/sys/fs/cgroup"
truncate /file1 --size 2G
truncate /file2 --size 2G
truncate /file3 --size 2G
truncate /file4 --size 50G
mkdir "${CGPATH}/A"
echo "+memory" > "${CGPATH}/A/cgroup.subtree_control"
mkdir "${CGPATH}/A/B" "${CGPATH}/A/C" "${CGPATH}/A/D" "${CGPATH}/A/E"
echo 2G > "${CGPATH}/A/memory.low"
echo 3G > "${CGPATH}/A/B/memory.low"
echo 1G > "${CGPATH}/A/C/memory.low"
echo 0 > "${CGPATH}/A/D/memory.low"
echo 10G > "${CGPATH}/A/E/memory.low"
echo $$ > "${CGPATH}/A/B/cgroup.procs" && vmtouch -qt /file1
echo $$ > "${CGPATH}/A/C/cgroup.procs" && vmtouch -qt /file2
echo $$ > "${CGPATH}/A/D/cgroup.procs" && vmtouch -qt /file3
echo $$ > "${CGPATH}/cgroup.procs" && vmtouch -qt /file4
echo "A: " `cat "${CGPATH}/A/memory.current"`
echo "A/B: " `cat "${CGPATH}/A/B/memory.current"`
echo "A/C: " `cat "${CGPATH}/A/C/memory.current"`
echo "A/D: " `cat "${CGPATH}/A/D/memory.current"`
echo "A/E: " `cat "${CGPATH}/A/E/memory.current"`
rmdir "${CGPATH}/A/B" "${CGPATH}/A/C" "${CGPATH}/A/D" "${CGPATH}/A/E"
rmdir "${CGPATH}/A"
rm /file1 /file2 /file3 /file4
Link: http://lkml.kernel.org/r/20180405185921.4942-2-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
bbec2e1517 |
mm: rename page_counter's count/limit into usage/max
This patch renames struct page_counter fields:
  count -> usage
  limit -> max
and the corresponding functions:
  page_counter_limit() -> page_counter_set_max()
  mem_cgroup_get_limit() -> mem_cgroup_get_max()
  mem_cgroup_resize_limit() -> mem_cgroup_resize_max()
  memcg_update_kmem_limit() -> memcg_update_kmem_max()
  memcg_update_tcp_limit() -> memcg_update_tcp_max()
The idea behind this renaming is to have the direct matching between
memory cgroup knobs (low, high, max) and page_counters API.
This is pure renaming, this patch doesn't bring any functional change.
Link: http://lkml.kernel.org/r/20180405185921.4942-1-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
1c4bc43ddf |
mm/memblock: introduce PHYS_ADDR_MAX
So far code was using ULLONG_MAX and type casting to obtain a phys_addr_t with all bits set. The typecast is necessary to silence compiler warnings on 32-bit platforms. Use the simpler but still type safe approach "~(phys_addr_t)0" to create a preprocessor define for all bits set. Link: http://lkml.kernel.org/r/20180406213809.566-1-stefan@agner.ch Signed-off-by: Stefan Agner <stefan@agner.ch> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
00b3a331fd |
mm: remove odd HAVE_PTE_SPECIAL
Remove the additional define HAVE_PTE_SPECIAL and rely directly on CONFIG_ARCH_HAS_PTE_SPECIAL. There is no functional change introduced by this patch Link: http://lkml.kernel.org/r/1523533733-25437-1-git-send-email-ldufour@linux.vnet.ibm.com Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christophe LEROY <christophe.leroy@c-s.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
3010a5ea66 |
mm: introduce ARCH_HAS_PTE_SPECIAL
Currently the PTE special support is turned on in per-architecture header files. Most of the time, it is defined in arch/*/include/asm/pgtable.h, depending or not on some other per-architecture static definition. This patch introduces a new configuration variable to manage this directly in the Kconfig files. It would later replace __HAVE_ARCH_PTE_SPECIAL. Here are notes for some architectures where the definition of __HAVE_ARCH_PTE_SPECIAL is not obvious: arm: __HAVE_ARCH_PTE_SPECIAL is currently defined in arch/arm/include/asm/pgtable-3level.h, which is included by arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set. So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE. powerpc: __HAVE_ARCH_PTE_SPECIAL is defined in 2 files: - arch/powerpc/include/asm/book3s/64/pgtable.h - arch/powerpc/include/asm/pte-common.h The first one is included if (PPC_BOOK3S & PPC64) while the second is included in all the other cases. So select ARCH_HAS_PTE_SPECIAL all the time. sparc: __HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) && defined(__arch64__), which are defined through the compiler in sparc/Makefile if !SPARC32, which I assume to be if SPARC64. So select ARCH_HAS_PTE_SPECIAL if SPARC64. There is no functional change introduced by this patch. Link: http://lkml.kernel.org/r/1523433816-14460-2-git-send-email-ldufour@linux.vnet.ibm.com Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Suggested-by: Jerome Glisse <jglisse@redhat.com> Reviewed-by: Jerome Glisse <jglisse@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Albert Ou <albert@sifive.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Christophe LEROY <christophe.leroy@c-s.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
e69438596b |
mm/page_alloc: remove realsize in free_area_init_core()
Highmem's realsize always equals freesize, so it is not necessary to keep a separate variable to record it. Link: http://lkml.kernel.org/r/20180413083859.65888-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
5d752600a8 |
mm: restructure memfd code
With the addition of memfd hugetlbfs support, we now have the situation where memfd depends on TMPFS -or- HUGETLBFS. Previously, memfd was only supported on tmpfs, so it made sense that the code resided in shmem.c. In the current code, memfd is only functional if TMPFS is defined. If HUGETLBFS is defined and TMPFS is not defined, then memfd functionality will not be available for hugetlbfs. This does not cause BUGs, just a lack of potentially desired functionality. Code is restructured in the following way: - include/linux/memfd.h is a new file containing memfd specific definitions previously contained in shmem_fs.h. - mm/memfd.c is a new file containing memfd specific code previously contained in shmem.c. - memfd specific code is removed from shmem_fs.h and shmem.c. - A new config option MEMFD_CREATE is added that is defined if TMPFS or HUGETLBFS is defined. No functional changes are made to the code: restructuring only. Link: http://lkml.kernel.org/r/20180415182119.4517-4-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Marc-André Lureau <marcandre.lureau@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
c49fcfcda8 |
mm/shmem: update file sealing comments and file checking
In preparation for memfd code restructure, update comments, definitions and function names dealing with file sealing to indicate that tmpfs and hugetlbfs are the supported filesystems. Also, change file pointer checks in memfd_file_seals_ptr to use defined interfaces instead of directly referencing file_operation structs. Link: http://lkml.kernel.org/r/20180415182119.4517-3-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Marc-André Lureau <marcandre.lureau@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
5b9c98f308 |
mm/shmem: add __rcu annotations and properly deref radix entry
Patch series "restructure memfd code", v4. This patch (of 3): In preparation for memfd code restructure, clean up sparse warnings. Most changes required adding __rcu annotations. The routine find_swap_entry was modified to properly dereference radix tree entries. Link: http://lkml.kernel.org/r/20180415182119.4517-2-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Matthew Wilcox <willy@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Marc-André Lureau <marcandre.lureau@gmail.com> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Khalid Aziz <khalid.aziz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
c0265342bf |
zram: introduce zram memory tracking
zRAM as swap is useful for small-memory devices. However, swap means
those pages on zram are mostly cold pages due to the VM's LRU algorithm.
In particular, once an application's init data has been touched during launch,
it tends not to be accessed any more and is finally swapped out. zRAM can
store such cold pages in compressed form, but it's pointless to keep them in
memory. A better idea is for app developers to free them directly rather than
leaving them on the heap.
This patch tells us the last access time of each block of zram via "cat
/sys/kernel/debug/zram/zram0/block_state".
The output is as follows,
300 75.033841 .wh
301 63.806904 s..
302 63.806919 ..h
The first column is zram's block index and the third one represents the symbols (s:
same page, w: page written to backing store, h: huge page) of the block
state. The second column is the time the block was last accessed, in
seconds with microsecond resolution. So the above example means the 300th block was accessed at
75.033841 seconds and it was huge, so it was written to the backing store.
An admin can leverage this information to catch cold|incompressible pages
of a process with *pagemap* once part of the heap is swapped out.
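As a rough illustration of how a userspace tool might consume this output, the sketch below parses lines in the three-column format shown above. The names `BlockState`, `parse_block_state` and `cold_blocks` are hypothetical and not part of zram or any existing tool; the sample lines are copied from the message.

```python
from typing import NamedTuple


class BlockState(NamedTuple):
    index: int          # zram block index (first column)
    last_access: float  # last access time in seconds (second column)
    same: bool          # 's' flag: same page
    written: bool       # 'w' flag: written to backing store
    huge: bool          # 'h' flag: huge (incompressible) page


def parse_block_state(line: str) -> BlockState:
    """Parse one line such as '300 75.033841 .wh'."""
    index, seconds, flags = line.split()
    return BlockState(int(index), float(seconds),
                      "s" in flags, "w" in flags, "h" in flags)


def cold_blocks(lines, now, min_idle):
    """Indices of blocks not accessed for at least min_idle seconds."""
    return [b.index for b in map(parse_block_state, lines)
            if now - b.last_access >= min_idle]


SAMPLE = ["300 75.033841 .wh", "301 63.806904 s..", "302 63.806919 ..h"]

if __name__ == "__main__":
    for line in SAMPLE:
        print(parse_block_state(line))
```

In real use the lines would come from reading the block_state file; `now` would have to be taken from the same clock the kernel uses for the second column, which this sketch leaves to the caller.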
I used the feature a few years ago to find memory hogs in userspace
and to notify them of memory they had left untouched for a long
time. With it, they could reduce unnecessary memory usage. However, at
that time, I hacked up zram for the feature; now I need the feature
again, so I decided it would be better to upstream it rather than keep it
to myself. I hope to submit the userspace tool that uses the feature soon.
[akpm@linux-foundation.org: fix i386 printk warning]
[minchan@kernel.org: use ktime_get_boottime() instead of sched_clock()]
Link: http://lkml.kernel.org/r/20180420063525.GA253739@rodete-desktop-imager.corp.google.com
[akpm@linux-foundation.org: documentation tweak]
[akpm@linux-foundation.org: fix i386 printk warning]
[minchan@kernel.org: fix compile warning]
Link: http://lkml.kernel.org/r/20180508104849.GA8209@rodete-desktop-imager.corp.google.com
[rdunlap@infradead.org: fix printk formats]
Link: http://lkml.kernel.org/r/3652ccb1-96ef-0b0b-05d1-f661d7733dcc@infradead.org
Link: http://lkml.kernel.org/r/20180416090946.63057-5-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
d7eac6b6e1 |
zram: record accessed second
zRAM as swap is useful for small-memory devices. However, swap means those pages on zram are mostly cold pages due to the VM's LRU algorithm. In particular, once an application's init data has been touched during launch, it tends not to be accessed any more and is finally swapped out. zRAM can store such cold pages in compressed form, but it's pointless to keep them in memory. A better idea is for app developers to free them directly rather than leaving them on the heap. This patch records the last access time of each block of zram so that, with the upcoming zram memory tracking, it can help userspace developers reduce memory footprint. Link: http://lkml.kernel.org/r/20180416090946.63057-4-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
89e85bce4b |
zram: mark incompressible page as ZRAM_HUGE
Mark incompressible pages so that we can investigate who owns them once they are swapped out, using the upcoming zram memory tracker feature. With it, we could prevent such pages from being swapped out by using mlock. Otherwise we might remove them. This patch exposes a new stat for huge pages via mm_stat. Link: http://lkml.kernel.org/r/20180416090946.63057-3-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
c4d6c4cc7b |
zram: correct flag name of ZRAM_ACCESS
Patch series "zram memory tracking", v5.
zRAM as swap is useful for small-memory devices. However, swap means
those pages on zram are mostly cold pages due to the VM's LRU algorithm.
In particular, once an application's init data has been touched during launch,
it tends not to be accessed any more and is finally swapped out. zRAM can
store such cold pages in compressed form, but it's pointless to keep them in
memory. Likewise, it's pointless to store incompressible pages in zram,
so a better idea is for app developers to manage them directly, e.g. free or
mlock them, rather than leaving them on the heap.
This patch provides a debugfs file, /sys/kernel/debug/zram/zram0/block_state,
to represent each block's state so an admin can investigate which memory is
cold|incompressible|same page using pagemap once the pages are
swapped out.
The output is as follows:
300 75.033841 .wh
301 63.806904 s..
302 63.806919 ..h
The first column is zram's block index and the third one represents the symbols (s:
same page, w: page written to backing store, h: huge page) of the block
state. The second column is the time the block was last accessed, in
seconds with microsecond resolution. So the above example means the 300th block was accessed at
75.033841 seconds and it was huge, so it was written to the backing store.
This patch (of 4):
ZRAM_ACCESS is used for locking a slot of zram so correct the name. It
is also not a common flag to indicate status of the block so move the
declare position on top of the flag. Lastly, let's move the function to
the top of source code to be able to use it easily without forward
declaration.
Link: http://lkml.kernel.org/r/20180416090946.63057-2-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
f3a53a3a1e |
mm, memcontrol: implement memory.swap.events
Add swap max and fail events so that userland can monitor and respond to running out of swap. I'm not too sure about the fail event. Right now, it's a bit confusing which stats / events are recursive and which aren't and also which ones reflect events which originate from a given cgroup and which targets the cgroup. No idea what the right long term solution is and it could just be that growing them organically is actually the only right thing to do. Link: http://lkml.kernel.org/r/20180416231151.GI1911913@devbig577.frc2.facebook.com Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: Rik van Riel <riel@surriel.com> Cc: <linux-api@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
bb98f2c5ac |
mm, memcontrol: move swap charge handling into get_swap_page()
Patch series "mm, memcontrol: Implement memory.swap.events", v2. This patchset implements memory.swap.events which contains max and fail events so that userland can monitor and respond to swap running out. This patch (of 2): get_swap_page() is always followed by mem_cgroup_try_charge_swap(). This patch moves mem_cgroup_try_charge_swap() into get_swap_page() and makes get_swap_page() call the function even after swap allocation failure. This simplifies the callers and consolidates memcg related logic and will ease adding swap related memcg events. Link: http://lkml.kernel.org/r/20180416230934.GH1911913@devbig577.frc2.facebook.com Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
88aa7cc688 |
mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
mmap_sem is on the hot path of the kernel and is heavily contended, but it is
also abused. It is used to protect arg_start|end and env_start|end when
reading /proc/$PID/cmdline and /proc/$PID/environ, but that doesn't make
sense, since those proc files just expect to read 4 values atomically, which are
not related to the VM; they could be set to arbitrary values by C/R.
Also, the mmap_sem contention may cause unexpected issues like the one below:
INFO: task ps:14018 blocked for more than 120 seconds.
Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
ps D 0 14018 1 0x00000004
Call Trace:
schedule+0x36/0x80
rwsem_down_read_failed+0xf0/0x150
call_rwsem_down_read_failed+0x18/0x30
down_read+0x20/0x40
proc_pid_cmdline_read+0xd9/0x4e0
__vfs_read+0x37/0x150
vfs_read+0x96/0x130
SyS_read+0x55/0xc0
entry_SYSCALL_64_fastpath+0x1a/0xc5
Both Alexey Dobriyan and Michal Hocko suggested using a dedicated lock
for them to mitigate the abuse of mmap_sem.
So, introduce a new spinlock in mm_struct to protect concurrent
access to arg_start|end, env_start|end and others, and also take
mmap_sem for read instead of write to protect against the race between prctl and
sys_brk which might break check_data_rlimit(); this makes prctl more
friendly to other VM operations.
This patch just eliminates the abuse of mmap_sem, but it can't resolve
the above hung task warning completely, since the later
access_remote_vm() call needs to acquire mmap_sem. The mmap_sem
scalability issue will be solved in the future.
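The idea of guarding the four values with a small dedicated lock, rather than a heavily contended general-purpose one, can be sketched in userspace as follows. This is only an analogy using Python's `threading.Lock`; the class `MM` and the helpers `set_arg_env` and `snapshot_arg_env` are hypothetical names, not kernel interfaces.

```python
import threading


class MM:
    """Toy stand-in for mm_struct; field names follow the commit message."""

    def __init__(self):
        # Dedicated lock that only covers the four fields below.
        self.arg_lock = threading.Lock()
        self.arg_start = self.arg_end = 0
        self.env_start = self.env_end = 0


def set_arg_env(mm, arg_start, arg_end, env_start, env_end):
    """Writer side (what a prctl-like update would do): update all four together."""
    with mm.arg_lock:
        mm.arg_start, mm.arg_end = arg_start, arg_end
        mm.env_start, mm.env_end = env_start, env_end


def snapshot_arg_env(mm):
    """Reader side (what a /proc reader would do): copy the four values atomically."""
    with mm.arg_lock:
        return (mm.arg_start, mm.arg_end, mm.env_start, mm.env_end)
```

The reader holds the lock only long enough to copy four integers, so it never waits behind unrelated long-running work, which is the motivation given in the message.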
[yang.shi@linux.alibaba.com: add comment about mmap_sem and arg_lock]
Link: http://lkml.kernel.org/r/1524077799-80690-1-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1523730291-109696-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
05fec35ebb |
slab: clean up the code comment in slab kmem_cache struct
In commit
|
||
|
|
05088e5de0 |
mm/slub: remove obsolete comment
The obsolete comment removed in this patch was introduced by
|
||
|
|
a38965bf94 |
mm/slub.c: add __printf verification to slab_err()
__printf is useful to verify format and arguments. Remove the following warning (with W=1): mm/slub.c:721:2: warning: function might be possible candidate for `gnu_printf' format attribute [-Wsuggest-attribute=format] Link: http://lkml.kernel.org/r/20180505200706.19986-1-malat@debian.org Signed-off-by: Mathieu Malaterre <malat@debian.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
128227e7fe |
slab: __GFP_ZERO is incompatible with a constructor
__GFP_ZERO requests that the object be initialised to all-zeroes, while
the purpose of a constructor is to initialise an object to a particular
pattern. We cannot do both. Add a warning to catch any users who
mistakenly pass a __GFP_ZERO flag when allocating a slab with a
constructor.
Link: http://lkml.kernel.org/r/20180412191322.GA21205@bombadil.infradead.org
Fixes:
|
||
|
|
e56ee574bc |
net/9p/trans_xen.c: don't inclide rwlock.h directly
rwlock.h should not be included directly. Instead linux/spinlock.h should be included. Including rwlock.h directly breaks the RT build, among other things. Link: http://lkml.kernel.org/r/20180504100319.11880-1-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
478ae0ca08 |
fs/9p: detect invalid options as much as possible
Currently, when invalid options are detected during option parsing, some options (e.g. msize) just set errno and let parsing continue to validate the other options, so that as many invalid options as possible are detected and proper error messages are given together. This patch applies the same rule to options 'cache' and 'access' when detecting -EINVAL. Link: http://lkml.kernel.org/r/1525340676-34072-2-git-send-email-cgxu519@gmx.com Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
8d856c72b4 |
net/9p: detect invalid options as much as possible
Currently, when invalid options are detected during option parsing, some options (e.g. msize) just set errno and let parsing continue to validate the other options, so that as many invalid options as possible are detected and proper error messages are given together. This patch applies the same rule to options 'trans' and 'version' when detecting -EINVAL. Link: http://lkml.kernel.org/r/1525340676-34072-1-git-send-email-cgxu519@gmx.com Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
c6137fe36d |
fs: ocfs2: use new return type vm_fault_t
Use new return type vm_fault_t for fault handler. For now, this is just
documenting that the function returns a VM_FAULT value rather than an
errno. Once all instances are converted, vm_fault_t will become a
distinct type.
Ref-> commit
|
||
|
|
64202a21a4 |
ocfs2: drop a VLA in ocfs2_orphan_del()
Avoid a VLA by using a real constant expression instead of a variable. The compiler should be able to optimize the original code and avoid using an actual VLA. Anyway this change is useful because it will avoid a false positive with -Wvla, it might also help the compiler generating better code. Link: http://lkml.kernel.org/r/1520970710-19732-1-git-send-email-s.mesoraca16@gmail.com Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
f3797d8ae5 |
ocfs2: correct the comments position of struct ocfs2_dir_block_trailer
Correct the comments position of the structure ocfs2_dir_block_trailer. Link: http://lkml.kernel.org/r/71604351584F6A4EBAE558C676F37CA401071C5FDE@H3CMLB12-EX.srv.huawei-3com.com Signed-off-by: guozhonghua <guozhonghua@h3c.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
731a40fab1 |
ocfs2: eliminate a misreported warning
The warning is invalid because the parameter chunksize passed from
ocfs2_info_freefrag_scan_chain-->ocfs2_info_update_ffg is guaranteed to
be positive. So __ilog2_u32 cannot return -1.
fs/ocfs2/ioctl.c: In function 'ocfs2_info_update_ffg':
fs/ocfs2/ioctl.c:411:17: warning: array subscript is below array bounds [-Warray-bounds]
hist->fc_chunks[index]++;
^
fs/ocfs2/ioctl.c:411:17: warning: array subscript is below array bounds [-Warray-bounds]
Link: http://lkml.kernel.org/r/1524655799-12112-1-git-send-email-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
133b81f28e |
ocfs2: ocfs2_inode_lock_tracker does not distinguish lock level
ocfs2_inode_lock_tracker, as a variant of ocfs2_inode_lock, is used to
prevent deadlock due to recursive lock acquisition.
But this function does not distinguish whether the requested level is EX
or PR.
If a PR lock has been obtained, this function will immediately return
success afterwards even if an EX lock is requested.
But actually the return value does not mean that the process got an EX
lock, because ocfs2_inode_lock has not been called.
When taking lock levels into account, we face several different situations:
1. No lock is held
   In this case, just lock the inode and return 0
2. We are holding a lock
   For this situation, things diverge into several cases
   wanted     holding     what to do
   ex         ex          see 2.1 below
   ex         pr          see 2.2 below
   pr         ex          see 2.1 below
   pr         pr          see 2.1 below
   2.1 The lock level that is being held is compatible
       with the wanted level, so no lock action will be taken.
   2.2 Otherwise, an upgrade is needed, but it is forbidden.
The reason why upgrade within a process is forbidden is that a lock upgrade
may cause deadlock. The following illustrates how it happens.
     process 1                             process 2
ocfs2_inode_lock_tracker(ex=0)
                          <======   ocfs2_inode_lock_tracker(ex=1)
ocfs2_inode_lock_tracker(ex=1)
For the status quo of ocfs2, without this patch, neither a bug nor
end-user impact will be caused, because the wrong logic is avoided.
But I'm afraid this generic interface may be called by other developers
in the future and used in this situation:
     a process
ocfs2_inode_lock_tracker(ex=0)
ocfs2_inode_lock_tracker(ex=1)
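The case analysis in this message can be written down as a small decision function. The sketch below only mirrors the table in the text; `tracker_action` and its return strings are hypothetical, and the real ocfs2 code is C and handles more state than this.

```python
from typing import Optional

LEVELS = ("ex", "pr")


def tracker_action(wanted: str, holding: Optional[str]) -> str:
    """Decide what to do for a wanted lock level given the level already held.

    wanted:  'ex' or 'pr'
    holding: 'ex', 'pr', or None when the process holds no lock
    Returns 'lock' (case 1), 'none' (case 2.1) or 'forbidden' (case 2.2).
    """
    if wanted not in LEVELS or holding not in LEVELS + (None,):
        raise ValueError("unknown lock level")
    if holding is None:
        return "lock"        # 1. no lock is held: lock the inode
    if wanted == "ex" and holding == "pr":
        return "forbidden"   # 2.2 an upgrade would be needed
    return "none"            # 2.1 held level is compatible


if __name__ == "__main__":
    for wanted in LEVELS:
        for holding in (None, "ex", "pr"):
            print(wanted, holding, tracker_action(wanted, holding))
```

The last scenario in the message, a single process calling with ex=0 and then ex=1, corresponds to `tracker_action("ex", "pr")`, which falls into the forbidden upgrade case.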
Link: http://lkml.kernel.org/r/20180510053230.17217-1-lchen@suse.com
Signed-off-by: Larry Chen <lchen@suse.com>
Reviewed-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
5bc55d654b |
ocfs2: clean up redundant function declarations
ocfs2_extend_allocation() has been deleted, so clean up its declaration.
Also change the static function name from __ocfs2_extend_allocation() to
ocfs2_extend_allocation() to be consistent with the corresponding trace
events as well as comments for ocfs2_lock_allocators().
Link: http://lkml.kernel.org/r/09cf7125-6f12-e53e-20f5-e606b2c16b48@huawei.com
Fixes:
|
||
|
|
882ea1d64e |
scripts: use SPDX tag in get_maintainer and checkpatch
Add the appropriate SPDX tag to these scripts.
Miscellanea:
o Add my copyright to checkpatch
Link: http://lkml.kernel.org/r/d08e49e8f6562c58a63792aa64306d1851f81f4b.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
ab77dab462 |
fs/dax.c: use new return type vm_fault_t
Use new return type vm_fault_t for fault handler. For now, this is just
documenting that the function returns a VM_FAULT value rather than an
errno. Once all instances are converted, vm_fault_t will become a
distinct type.
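To illustrate why a dedicated return type helps, here is a minimal userspace sketch that keeps fault codes and errno values apart. Everything in it is hypothetical: `vm_fault_sketch_t`, the `FAULT_SKETCH_*` values, and `fault_from_errno()` are invented for the example and are not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a dedicated fault-result type. */
typedef unsigned int vm_fault_sketch_t;

/* Illustrative bit values only; not the kernel's VM_FAULT_* constants. */
#define FAULT_SKETCH_OOM    0x1u
#define FAULT_SKETCH_SIGBUS 0x2u
#define FAULT_SKETCH_NOPAGE 0x4u

/*
 * Translate an errno-style result (0 or a negative errno) into a fault
 * code, so a handler never returns a raw errno by accident.
 */
static vm_fault_sketch_t fault_from_errno(int err)
{
	assert(err <= 0);

	if (err == 0)
		return FAULT_SKETCH_NOPAGE; /* mapping installed, nothing more to do */
	if (err == -ENOMEM)
		return FAULT_SKETCH_OOM;
	return FAULT_SKETCH_SIGBUS;         /* any other failure */
}
```

Because the handler's signature names the fault type, a reviewer or a static checker can spot a path that returns `-ENOMEM` directly instead of converting it.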
commit
|
||
|
|
3036bc4536 |
Merge tag 'media/v4.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- removal of the atomisp driver from staging, as nobody would have time
to dedicate huge efforts to fix all the problems there. Also, we have a
feeling that the driver may not even run the way it is.
- move the Zoran driver to staging, in order to be either fixed to use VB2
and the proper media kAPIs or to be removed
- remove the videobuf-dvb driver, which has been unused for a while
- some V4L2 documentation fixes/improvements
- new sensor drivers: imx258 and ov7251
- a new driver was added to allow using I2C transparent drivers
- several improvements at the ddbridge driver
- several improvements at the ISDB pt1 driver, making it more coherent
with the DVB framework
- added a new platform driver for MIPI CSI-2 RX: cadence
- now, all media drivers can be compiled on x86 with COMPILE_TEST
- almost all media drivers now build on non-x86 architectures with
COMPILE_TEST
- lots of other random stuff: cleanups, support for new board models,
bug fixes, etc
* tag 'media/v4.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (464 commits)
media: omap2: fix compile-testing with FB_OMAP2=m
media: media/radio/Kconfig: add back RADIO_ISA
media: v4l2-ioctl.c: fix missing unlock in __video_do_ioctl()
media: pxa_camera: ignore -ENOIOCTLCMD from v4l2_subdev_call for s_power
media: arch: sh: migor: Fix TW9910 PDN gpio
media: staging: tegra-vde: Reset VDE regardless of memory client resetting failure
media: marvel-ccic: mmp: select VIDEOBUF2_VMALLOC/DMA_CONTIG
media: marvel-ccic: allow ccic and mmp drivers to coexist
media: uvcvideo: Prevent setting unavailable flags
media: ddbridge: conditionally enable fast TS for stv0910-equipped bridges
media: dvb-frontends/stv0910: make TS speed configurable
media: ddbridge/mci: add identifiers to function definition arguments
media: ddbridge/mci: protect against out-of-bounds array access in stop()
media: rc: ensure input/lirc device can be opened after register
media: rc: nuvoton: Keep device enabled during reg init
media: rc: nuvoton: Keep track of users on CIR enable/disable
media: rc: nuvoton: Tweak the interrupt enabling dance
media: uvcvideo: Support realtek's UVC 1.5 device
media: uvcvideo: Fix driver reference counting
media: gspca_zc3xx: Enable short exposure times for OV7648
... |
||
|
|
c90fca951e |
Merge tag 'powerpc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Notable changes:
- Support for split PMD page table lock on 64-bit Book3S (Power8/9).
- Add support for HAVE_RELIABLE_STACKTRACE, so we properly support
live patching again.
- Add support for patching barrier_nospec in copy_from_user() and
syscall entry.
- A couple of fixes for our data breakpoints on Book3S.
- A series from Nick optimising TLB/mm handling with the Radix MMU.
- Numerous small cleanups to squash sparse/gcc warnings from Mathieu
Malaterre.
- Several series optimising various parts of the 32-bit code from
Christophe Leroy.
- Removal of support for two old machines, "SBC834xE" and "C2K"
("GEFanuc,C2K"), which is why the diffstat has so many deletions.
And many other small improvements & fixes.
There's a few out-of-area changes. Some minor ftrace changes OK'ed by
Steve, and a fix to our powernv cpuidle driver. Then there's a series
touching mm, x86 and fs/proc/task_mmu.c, which cleans up some details
around pkey support. It was ack'ed/reviewed by Ingo & Dave and has
been in next for several weeks.
Thanks to: Akshay Adiga, Alastair D'Silva, Alexey Kardashevskiy, Al
Viro, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Arnd
Bergmann, Balbir Singh, Cédric Le Goater, Christophe Leroy, Christophe
Lombard, Colin Ian King, Dave Hansen, Fabio Estevam, Finn Thain,
Frederic Barrat, Gautham R. Shenoy, Haren Myneni, Hari Bathini, Ingo
Molnar, Jonathan Neuschäfer, Josh Poimboeuf, Kamalesh Babulal,
Madhavan Srinivasan, Mahesh Salgaonkar, Mark Greer, Mathieu Malaterre,
Matthew Wilcox, Michael Neuling, Michal Suchanek, Naveen N. Rao,
Nicholas Piggin, Nicolai Stange, Olof Johansson, Paul Gortmaker, Paul
Mackerras, Peter Rosin, Pridhiviraj Paidipeddi, Ram Pai, Rashmica
Gupta, Ravi Bangoria, Russell Currey, Sam Bobroff, Samuel
Mendoza-Jonas, Segher Boessenkool, Shilpasri G Bhat, Simon Guo,
Souptick Joarder, Stewart Smith, Thiago Jung Bauermann, Torsten Duwe,
Vaibhav Jain, Wei Yongjun, Wolfram Sang, Yisheng Xie, YueHaibing"
* tag 'powerpc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (251 commits)
powerpc/64s/radix: Fix missing ptesync in flush_cache_vmap
cpuidle: powernv: Fix promotion from snooze if next state disabled
powerpc: fix build failure by disabling attribute-alias warning in pci_32
ocxl: Fix missing unlock on error in afu_ioctl_enable_p9_wait()
powerpc-opal: fix spelling mistake "Uniterrupted" -> "Uninterrupted"
powerpc: fix spelling mistake: "Usupported" -> "Unsupported"
powerpc/pkeys: Detach execute_only key on !PROT_EXEC
powerpc/powernv: copy/paste - Mask SO bit in CR
powerpc: Remove core support for Marvell mv64x60 hostbridges
powerpc/boot: Remove core support for Marvell mv64x60 hostbridges
powerpc/boot: Remove support for Marvell mv64x60 i2c controller
powerpc/boot: Remove support for Marvell MPSC serial controller
powerpc/embedded6xx: Remove C2K board support
powerpc/lib: optimise PPC32 memcmp
powerpc/lib: optimise 32 bits __clear_user()
powerpc/time: inline arch_vtime_task_switch()
powerpc/Makefile: set -mcpu=860 flag for the 8xx
powerpc: Implement csum_ipv6_magic in assembly
powerpc/32: Optimise __csum_partial()
powerpc/lib: Adjust .balign inside string functions for PPC32
...
|
||
|
|
c0ab85267e |
Merge tag 'microblaze-v4.18-rc1' of git://git.monstr.eu/linux-2.6-microblaze
Pull microblaze updates from Michal Simek:
- Fix simpleImage format generation
- Remove earlyprintk support and replace it by earlycon
* tag 'microblaze-v4.18-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: dts: replace 'linux,stdout-path' with 'stdout-path'
microblaze: remove redundant early_printk support
microblaze: remove unnecessary prom.h includes
microblaze: Fix simpleImage format generation |
||
|
|
d987f62cce |
Merge tag 'udf_for_v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull udf updates from Jan Kara:
"UDF support for UTF-16 characters in file names"
* tag 'udf_for_v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Add support for decoding UTF-16 characters
udf: Add support for encoding UTF-16 characters
udf: Push sb argument to udf_name_[to|from]_CS0()
udf: Convert ident strings to proper charset
udf: Use UTF-32 <-> UTF-8 conversion functions from NLS
udf: Always require NLS support |
||
|
|
091a0f2785 |
Merge tag 'for-linus-4.18-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs updates from Mike Marshall:
"Fixes and cleanups:
- fix some sparse warnings
- cleanup some code formatting
- fix up some attribute/meta-data related code"
* tag 'for-linus-4.18-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: use sparse annotations for holding locks across function calls.
orangefs: make debug_help_fops static
orangefs: remove unused function orangefs_get_bufmap_init
orangefs: specify user pointers when using dev_map_desc and bufmap
orangefs: formatting cleanups
orangefs: set i_size on new symlink
orangefs: report attributes_mask and attributes for statx
orangefs: make struct orangefs_file_vm_ops static
orangefs: revamp block sizes |
||
|
|
70f2ae1f00 |
Merge tag 'ovl-fixes-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi: "This contains a fix for the vfs_mkdir() issue discovered by Al, as well as other fixes and cleanups" * tag 'ovl-fixes-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: use inode_insert5() to hash a newly created inode ovl: Pass argument to ovl_get_inode() in a structure vfs: factor out inode_insert5() ovl: clean up copy-up error paths ovl: return EIO on internal error ovl: make ovl_create_real() cope with vfs_mkdir() safely ovl: create helper ovl_create_temp() ovl: return dentry from ovl_create_real() ovl: struct cattr cleanups ovl: strip debug argument from ovl_do_ helpers ovl: remove WARN_ON() real inode attributes mismatch ovl: Kconfig documentation fixes ovl: update documentation for unionmount-testsuite |
||
|
|
da315f6e03 |
Merge tag 'fuse-update-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi: "The most interesting part of this update is user namespace support, mostly done by Eric Biederman. This enables safe unprivileged fuse mounts within a user namespace. There are also a couple of fixes for bugs found by syzbot and miscellaneous fixes and cleanups" * tag 'fuse-update-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: don't keep dead fuse_conn at fuse_fill_super(). fuse: fix control dir setup and teardown fuse: fix congested state leak on aborted connections fuse: Allow fully unprivileged mounts fuse: Ensure posix acls are translated outside of init_user_ns fuse: add writeback documentation fuse: honor AT_STATX_FORCE_SYNC fuse: honor AT_STATX_DONT_SYNC fuse: Restrict allow_other to the superblock's namespace or a descendant fuse: Support fuse filesystems outside of init_user_ns fuse: Fail all requests with invalid uids or gids fuse: Remove the buggy retranslation of pids in fuse_dev_do_read fuse: return -ECONNABORTED on /dev/fuse read after abort fuse: atomic_o_trunc should truncate pagecache |
||
|
|
1c8c5a9d38 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
1) Add Maglev hashing scheduler to IPVS, from Inju Song.
2) Lots of new TC subsystem tests from Roman Mashak.
3) Add TCP zero copy receive and fix delayed acks and autotuning with
SO_RCVLOWAT, from Eric Dumazet.
4) Add XDP_REDIRECT support to mlx5 driver, from Jesper Dangaard
Brouer.
5) Add ttl inherit support to vxlan, from Hangbin Liu.
6) Properly separate ipv6 routes into their logically independent
components. fib6_info for the routing table, and fib6_nh for sets of
nexthops, which thus can be shared. From David Ahern.
7) Add bpf_xdp_adjust_tail helper, which can be used to generate ICMP
messages from XDP programs. From Nikita V. Shirokov.
8) Lots of long overdue cleanups to the r8169 driver, from Heiner
Kallweit.
9) Add BTF ("BPF Type Format"), from Martin KaFai Lau.
10) Add traffic condition monitoring to iwlwifi, from Luca Coelho.
11) Plumb extack down into fib_rules, from Roopa Prabhu.
12) Add Flower classifier offload support to igb, from Vinicius Costa
Gomes.
13) Add UDP GSO support, from Willem de Bruijn.
14) Add documentation for eBPF helpers, from Quentin Monnet.
15) Add TLS tx offload to mlx5, from Ilya Lesokhin.
16) Allow applications to be given the number of bytes available to read
on a socket via a control message returned from recvmsg(), from
Soheil Hassas Yeganeh.
17) Add x86_32 eBPF JIT compiler, from Wang YanQing.
18) Add AF_XDP sockets, with zerocopy support infrastructure as well.
From Björn Töpel.
19) Remove indirect load support from all of the BPF JITs and handle
these operations in the verifier by translating them into native BPF
instead. From Daniel Borkmann.
20) Add GRO support to ipv6 gre tunnels, from Eran Ben Elisha.
21) Allow XDP programs to do lookups in the main kernel routing tables
for forwarding. From David Ahern.
22) Allow drivers to store hardware state into an ELF section of kernel
dump vmcore files, and use it in cxgb4. From Rahul Lakkireddy.
23) Various RACK and loss detection improvements in TCP, from Yuchung
Cheng.
24) Add TCP SACK compression, from Eric Dumazet.
25) Add User Mode Helper support and basic bpfilter infrastructure, from
Alexei Starovoitov.
26) Support ports and protocol values in RTM_GETROUTE, from Roopa
Prabhu.
27) Support bulking in ->ndo_xdp_xmit() API, from Jesper Dangaard
Brouer.
28) Add lots of forwarding selftests, from Petr Machata.
29) Add generic network device failover driver, from Sridhar Samudrala.
* ra.kernel.org:/pub/scm/linux/kernel/git/davem/net-next: (1959 commits)
strparser: Add __strp_unpause and use it in ktls.
rxrpc: Fix terminal retransmission connection ID to include the channel
net: hns3: Optimize PF CMDQ interrupt switching process
net: hns3: Fix for VF mailbox receiving unknown message
net: hns3: Fix for VF mailbox cannot receiving PF response
bnx2x: use the right constant
Revert "net: sched: cls: Fix offloading when ingress dev is vxlan"
net: dsa: b53: Fix for brcm tag issue in Cygnus SoC
enic: fix UDP rss bits
netdev-FAQ: clarify DaveM's position for stable backports
rtnetlink: validate attributes in do_setlink()
mlxsw: Add extack messages for port_{un, }split failures
netdevsim: Add extack error message for devlink reload
devlink: Add extack to reload and port_{un, }split operations
net: metrics: add proper netlink validation
ipmr: fix error path when ipmr_new_table fails
ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds
net: hns3: remove unused hclgevf_cfg_func_mta_filter
netfilter: provide udp*_lib_lookup for nf_tproxy
qed*: Utilize FW 8.37.2.0
...
|
||
|
|
2857676045 |
Merge tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull overflow updates from Kees Cook:
"This adds the new overflow checking helpers and adds them to the
2-factor argument allocators. And this adds the saturating size
helpers and does a treewide replacement for the struct_size() usage.
Additionally this adds the overflow testing modules to make sure
everything works.
I'm still working on the treewide replacements for allocators with
"simple" multiplied arguments:
*alloc(a * b, ...) -> *alloc_array(a, b, ...)
and
*zalloc(a * b, ...) -> *calloc(a, b, ...)
as well as the more complex cases, but that's separable from this
portion of the series. I expect to have the rest sent before -rc1
closes; there are a lot of messy cases to clean up.
Summary:
- Introduce arithmetic overflow test helper functions (Rasmus)
- Use overflow helpers in 2-factor allocators (Kees, Rasmus)
- Introduce overflow test module (Rasmus, Kees)
- Introduce saturating size helper functions (Matthew, Kees)
- Treewide use of struct_size() for allocators (Kees)"
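As a rough userspace illustration of the two ideas in this summary (checked arithmetic and saturating size helpers), the sketch below uses the GCC/Clang `__builtin_mul_overflow` and `__builtin_add_overflow` builtins. The names `mul_overflows()`, `array_size_sketch()` and `struct_size_sketch()` are hypothetical and are not the kernel helpers; saturating to `SIZE_MAX` is the assumption that makes an overflowed size request fail in the allocator instead of silently wrapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Checked multiply: stores a * b in *out and returns true on overflow. */
static bool mul_overflows(size_t a, size_t b, size_t *out)
{
	assert(out != NULL);
	return __builtin_mul_overflow(a, b, out);
}

/* Saturating n * size: returns SIZE_MAX if the product does not fit. */
static size_t array_size_sketch(size_t n, size_t size)
{
	size_t bytes;

	if (mul_overflows(n, size, &bytes))
		return SIZE_MAX;
	return bytes;
}

/*
 * Saturating header + n * elem, the shape of a structure that ends in
 * a flexible array of n elements.
 */
static size_t struct_size_sketch(size_t header, size_t n, size_t elem)
{
	size_t bytes;

	if (mul_overflows(n, elem, &bytes) ||
	    __builtin_add_overflow(bytes, header, &bytes))
		return SIZE_MAX;
	return bytes;
}
```

A caller would pass the result straight to an allocator: a request for `SIZE_MAX` bytes cannot succeed, so an overflowed computation turns into an allocation failure rather than an undersized buffer.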
* tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
treewide: Use struct_size() for devm_kmalloc() and friends
treewide: Use struct_size() for vmalloc()-family
treewide: Use struct_size() for kmalloc()-family
device: Use overflow helpers for devm_kmalloc()
mm: Use overflow helpers in kvmalloc()
mm: Use overflow helpers in kmalloc_array*()
test_overflow: Add memory allocation overflow tests
overflow.h: Add allocation size calculation helpers
test_overflow: Report test failures
test_overflow: macrofy some more, do more tests for free
lib: add runtime test of check_*_overflow functions
compiler.h: enable builtin overflow checkers and add fallback code
|
||
|
|
5eb6eed7e0 |
Merge tag 'trace-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"One new feature was added to ftrace, which is the trace_marker now
supports triggers. For example:
# cd /sys/kernel/debug/tracing
# echo 'snapshot' > events/ftrace/print/trigger
# echo 'cause snapshot' > trace_marker
The rest of the changes are various clean ups and also one stable fix
that was added late in the cycle"
* tag 'trace-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (21 commits)
tracing: Use match_string() instead of open coding it in trace_set_options()
branch-check: fix long->int truncation when profiling branches
ring-buffer: Fix typo in comment
ring-buffer: Fix a bunch of typos in comments
tracing/selftest: Add test to test simple snapshot trigger for trace_marker
tracing/selftest: Add test to test hist trigger between kernel event and trace_marker
tracing/selftest: Add selftests to test trace_marker histogram triggers
ftrace/selftest: Fix reset_trigger() to handle triggers with filters
ftrace/selftest: Have the reset_trigger code be a bit more careful
tracing: Document trace_marker triggers
tracing: Allow histogram triggers to access ftrace internal events
tracing: Prevent further users of zero size static arrays in trace events
tracing: Have zero size length in filter logic be full string
tracing: Add trigger file for trace_markers tracefs/ftrace/print
tracing: Do not show filter file for ftrace internal events
tracing: Add brackets in ftrace event dynamic arrays
tracing: Have event_trace_init() called by trace_init_tracefs()
tracing: Add __find_event_file() to find event files without restrictions
tracing: Do not reference event data in post call triggers
tracepoints: Fix the descriptions of tracepoint_probe_register{_prio}
...
|
||
|
|
8b5c6a3a49 |
Merge tag 'audit-pr-20180605' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore: "Another reasonable chunk of audit changes for v4.18, thirteen patches in total. The thirteen patches can mostly be broken down into one of four categories: general bug fixes, accessor functions for audit state stored in the task_struct, negative filter matches on executable names, and extending the (relatively) new seccomp logging knobs to the audit subsystem. The main driver for the accessor functions from Richard are the changes we're working on to associate audit events with containers, but I think they have some standalone value too so I figured it would be good to get them in now. The seccomp/audit patches from Tyler apply the seccomp logging improvements from a few releases ago to audit's seccomp logging; starting with this patchset the changes in /proc/sys/kernel/seccomp/actions_logged should apply to both the standard kernel logging and audit. As usual, everything passes the audit-testsuite and it happens to merge cleanly with your tree" [ Heh, except it had trivial merge conflicts with the SELinux tree that also came in from Paul - Linus ] * tag 'audit-pr-20180605' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: Fix wrong task in comparison of session ID audit: use existing session info function audit: normalize loginuid read access audit: use new audit_context access funciton for seccomp_actions_logged audit: use inline function to set audit context audit: use inline function to get audit context audit: convert sessionid unset to a macro seccomp: Don't special case audited processes when logging seccomp: Audit attempts to modify the actions_logged sysctl seccomp: Configurable separator for the actions_logged string seccomp: Separate read and write code for actions_logged sysctl audit: allow not equal op for audit by executable audit: add syscall information to FEATURE_CHANGE records |
||
|
|
8b70543e9a |
Merge tag 'selinux-pr-20180605' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull SELinux updates from Paul Moore: "SELinux is back with a quiet pull request for v4.18. Three patches, all small: two cleanups of the SELinux audit records, and one to migrate to a newly defined type (vm_fault_t). Everything passes our test suite, and as of about five minutes ago it merged cleanly with your tree" * tag 'selinux-pr-20180605' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: audit: normalize MAC_POLICY_LOAD record audit: normalize MAC_STATUS record security: selinux: Change return type to vm_fault_t |
||
|
|
10b1eb7d8c |
Merge branch 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security system updates from James Morris:
- incorporate new socketpair() hook into LSM and wire up the SELinux
and Smack modules. From David Herrmann:
"The idea is to allow SO_PEERSEC to be called on AF_UNIX sockets
created via socketpair(2), and return the same information as if
you emulated socketpair(2) via a temporary listener socket.
Right now SO_PEERSEC will return the unlabeled credentials for a
socketpair, rather than the actual credentials of the creating
process."
- remove the unused security_settime LSM hook (Sargun Dhillon).
- remove some stack allocated arrays from the keys code (Tycho
Andersen)
* 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
dh key: get rid of stack allocated array for zeroes
dh key: get rid of stack allocated array
big key: get rid of stack array allocation
smack: provide socketpair callback
selinux: provide socketpair callback
net: hook socketpair() into LSM
security: add hook for socketpair()
security: remove security_settime
|
||
|
|
d75ae5bdf2 |
Merge tag 'printk-for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk updates from Petr Mladek: - Help userspace log daemons to catch up with a flood of messages. They will get woken after each message even if the console is far behind and handled by another process. - Flush printk safe buffers safely even when panic() happens in the normal context. - Fix possible va_list reuse when race happened in printk_safe(). - Remove %pCr printf format to prevent sleeping in the atomic context. - Misc vsprintf code cleanup. * tag 'printk-for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: printk: drop in_nmi check from printk_safe_flush_on_panic() lib/vsprintf: Remove atomic-unsafe support for %pCr serial: sh-sci: Stop using printk format %pCr thermal: bcm2835: Stop using printk format %pCr clk: renesas: cpg-mssr: Stop using printk format %pCr printk: fix possible reuse of va_list variable printk: wake up klogd in vprintk_emit vsprintf: Tweak pF/pf comment lib/vsprintf: Mark expected switch fall-through lib/vsprintf: Replace space with '_' before crng is ready lib/vsprintf: Deduplicate pointer_string() lib/vsprintf: Move pointer_string() upper lib/vsprintf: Make flag_spec global lib/vsprintf: Make strspec global lib/vsprintf: Make dec_spec global lib/test_printf: Mark big constant with UL |
||
|
|
0eb0061381 |
Merge tag 'for-linus-4.18' of git://github.com/cminyard/linux-ipmi
Pull IPMI updates from Corey Minyard:
"It's been a busy release for the IPMI driver. Some notable changes:
- A user was running into timeout issues doing maintenance commands
over the IPMB network behind an IPMI controller.
Extend the maintenance mode concept to messages over IPMB and allow
the timeouts to be tuned.
- Lots of cleanup, style fixing, some bugfixes, and such.
- At least one user was having trouble with the way the IPMI driver
would lock the i2c driver module it used.
The IPMI driver was not designed for hotplug. However, hotplug is a
reality now, so the IPMI driver was modified to support hotplug.
- The proc interface code is now completely removed. Long live sysfs!"
* tag 'for-linus-4.18' of git://github.com/cminyard/linux-ipmi: (35 commits)
ipmi: Properly release srcu locks on error conditions
ipmi: NPCM7xx KCS BMC: enable interrupt to the host
ipmi:bt: Set the timeout before doing a capabilities check
ipmi: Remove the proc interface
ipmi_ssif: Fix uninitialized variable issue
ipmi: add an NPCM7xx KCS BMC driver
ipmi_si: Clean up shutdown a bit
ipmi_si: Rename intf_num to si_num
ipmi: Remove smi->intf checks
ipmi_ssif: Get rid of unused intf_num
ipmi: Get rid of ipmi_user_t and ipmi_smi_t in include files
ipmi: ipmi_unregister_smi() cannot fail, have it return void
ipmi_devintf: Add an error return on invalid ioctls
ipmi: Remove usecount function from interfaces
ipmi_ssif: Remove usecount handling
ipmi: Remove condition on interface shutdown
ipmi_ssif: Convert over to a shutdown handler
ipmi_si: Convert over to a shutdown handler
ipmi: Rework locking and shutdown for hot remove
ipmi: Fix some counter issues
...
|
||
|
|
8450493076 |
Merge tag 'edac_for_4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov: - Stratix10 SDRAM support to altera_edac (Thor Thayer) - the usual misc fixes all over the place [ Also, shared branch for socfpga_stratix10.dtsi file changes with the socfpga tree ] * tag 'edac_for_4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC, ghes: Make platform-based whitelisting x86-only EDAC, altera: Fix ARM64 build warning EDAC, skx: Fix skx_edac build error when ACPI_NFIT=m EDAC, ghes: Use BIT() macro EDAC, ghes: Add DDR4 and NVDIMM memory types EDAC, altera: Handle SDRAM Uncorrectable Errors on Stratix10 Documentation: dt: edac: Move Altera SOCFPGA EDAC file EDAC, altera: Add support for Stratix10 SDRAM EDAC Documentation: dt: socfpga: Add Stratix10 ECC Manager binding EDAC, ghes: Remove unused argument to ghes_edac_report_mem_error() arm64: dts: stratix10: add sdram ecc EDAC, i7core: Fix spelling mistake: "redundacy" -> "redundancy" EDAC, ghes: Add a null pointer check in ghes_edac_unregister() ghes, EDAC: Fix ghes_edac registration arm64: dts: stratix10: Change pad skew values for EMAC0 PHY driver ARM: dts: consistently use 'atmel' as at24 manufacturer in cyclone5 arm64: dts: stratix10: Add PL330 DMAC to Stratix10 dts arm64: dts: stratix10: enable i2c, add i2c periperals arm64: dts: stratix10: use clock bindings for the Stratix10 platform |
||
|
|
311da49758 |
Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King: - Initial round of Spectre variant 1 and variant 2 fixes for 32-bit ARM - Clang support improvements - nommu updates for v8 MPU - enable ARM_MODULE_PLTS by default to avoid problems loading modules with larger kernels - vmlinux.lds and dma-mapping cleanups * 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (31 commits) ARM: spectre-v1: fix syscall entry ARM: spectre-v1: add array_index_mask_nospec() implementation ARM: spectre-v1: add speculation barrier (csdb) macros ARM: KVM: report support for SMCCC_ARCH_WORKAROUND_1 ARM: KVM: Add SMCCC_ARCH_WORKAROUND_1 fast handling ARM: spectre-v2: KVM: invalidate icache on guest exit for Brahma B15 ARM: KVM: invalidate icache on guest exit for Cortex-A15 ARM: KVM: invalidate BTB on guest exit for Cortex-A12/A17 ARM: spectre-v2: warn about incorrect context switching functions ARM: spectre-v2: add firmware based hardening ARM: spectre-v2: harden user aborts in kernel space ARM: spectre-v2: add Cortex A8 and A15 validation of the IBE bit ARM: spectre-v2: harden branch predictor on context switches ARM: spectre: add Kconfig symbol for CPUs vulnerable to Spectre ARM: bugs: add support for per-processor bug checking ARM: bugs: hook processor bug checking into SMP and suspend paths ARM: bugs: prepare processor bug infrastructure ARM: add more CPU part numbers for Cortex and Brahma B15 CPUs ARM: 8774/1: remove no-op macro VMLINUX_SYMBOL() ARM: 8773/1: amba: Export amba_bustype ... |
||
|
|
ca95bf62fc |
Merge tag 'linux-kselftest-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest update from Shuah Khan: - Work to restructure timers test suite to move PIE out of rtctest from Alexandre Belloni. - Several minor spelling and bug fixes. - New cgroup tests from Roman Gushchin and Mike Rapoport. - Kselftest framework changes to handle and report skipped tests correctly. Prior to these changes, framework treated all non-zero return codes from tests as failures. When tests are skipped with non-zero return code, due to unmet dependencies and/or unsupported configuration, reporting them as failed lead to false negatives on the tests that couldn't be run. - Fixes to test Makefiles to remove unnecessary RUN_TESTS and EMIT_TESTS overrides and use common defines from lib.mk. - Fixes to several tests to return correct Kselftest skip code. - Changes to improve test output. * tag 'linux-kselftest-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (55 commits) selftests: lib: fix prime_numbers module search and skip logic selftests: intel_pstate: notification about privilege required to run intel_pstate testing script selftests: cgroup/memcontrol: add basic test for socket accounting selftest: intel_pstate: debug support message from aperf.c and return value kselftest/cgroup: fix variable dereferenced before check warning selftests/intel_pstate: Enhance table printing selftests/intel_pstate: Improve test, minor fixes selftests: cgroup/memcontrol: add basic test for swap controls selftests: cgroup: add memory controller self-tests selftests: memfd: split regular and hugetlbfs tests selftests: net: return Kselftest Skip code for skipped tests selftests: mqueue: return Kselftest Skip code for skipped tests selftests: memory-hotplug: return Kselftest Skip code for skipped tests selftests: memfd: return Kselftest Skip code for skipped tests selftests: membarrier: return Kselftest Skip code for skipped tests selftests: media_tests: return Kselftest Skip code for skipped tests selftests: locking: return Kselftest Skip 
code for skipped tests selftests: locking: add Makefile for locking test selftests: lib: return Kselftest Skip code for skipped tests selftests: lib: add prime_numbers.sh test to Makefile ... |
||
|
|
48a8bbc7ca |
media: omap2: fix compile-testing with FB_OMAP2=m
Compile-testing with FB_OMAP2=m results in a link error:
drivers/media/platform/omap/omap_vout.o: In function `vidioc_streamoff':
omap_vout.c:(.text+0x1028): undefined reference to `omap_dispc_unregister_isr'
drivers/media/platform/omap/omap_vout.o: In function `omap_vout_release':
omap_vout.c:(.text+0x1330): undefined reference to `omap_dispc_unregister_isr'
drivers/media/platform/omap/omap_vout.o: In function `vidioc_streamon':
omap_vout.c:(.text+0x2dd4): undefined reference to `omap_dispc_register_isr'
drivers/media/platform/omap/omap_vout.o: In function `omap_vout_remove':
In order to enable compile-testing but still keep the correct dependency,
this changes the Kconfig logic so we only allow CONFIG_COMPILE_TEST
building when FB_OMAP is completely disabled, and otherwise use the old
dependency on FB_OMAP to ensure VIDEO_OMAP2_VOUT is also a loadable
module when FB_OMAP2 is.
Fixes:
|