twx-linux

Author	SHA1	Message	Date
Sebastian Siewior	11e11d0a7d	printk: replace local_irq_save with local_lock for safe mode Safe mode disables interrupts in order to minimize the window where printk calls use deferred printing. Currently local_irq_save() is used for this; however, on PREEMPT_RT this can lead to large latencies because safe mode is enabled for the duration of printing a record. Use a local_lock instead of local_irq_save(). For !PREEMPT_RT it has the same effect of disabling interrupts for that CPU. For PREEMPT_RT it will disable preemption, which is enough to prevent interruption from the irq threads. Note that disabling preemption for PREEMPT_RT is also very bad since it still blocks RT tasks. The atomic/threaded (NOBKL) consoles were developed such that safe mode is not needed. So it is expected that a PREEMPT_RT machine does not run with any legacy consoles registered. Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:31 +02:00
John Ogness	07173eb6b3	printk: Add threaded printing support for BKL consoles. Add threaded printing support for BKL consoles on PREEMPT_RT. Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:31 +02:00
John Ogness	5cce771b3a	printk: only disable if actually unregistered Currently in unregister_console() a printk message is generated and the console is disabled, even if it was never registered. There are code paths (such as uart_remove_one_port()) that call unregister_console() even if the console is not registered. It is confusing to see messages about disabling consoles that were never registered. Move the printk and disabling later, when it is known that the console is actually registered. Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:31 +02:00
John Ogness	587e5b5882	printk: Perform atomic flush in console_flush_on_panic() Typically the panic() function will take care of atomically flushing the non-BKL consoles on panic. However, there are several users of console_flush_on_panic() outside of panic(). Also perform atomic flushing in console_flush_on_panic(). A new function cons_force_seq() is implemented to support the mode=CONSOLE_REPLAY_ALL feature. Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:31 +02:00
John Ogness	37552a17fa	rcu: Add atomic write enforcement for rcu stalls Invoke the atomic write enforcement functions for rcu stalls to ensure that the information gets out to the consoles. It is important to note that if there are any legacy consoles registered, they will be attempting to directly print from the printk-caller context, which may jeopardize the reliability of the atomic consoles. Optimally there should be no legacy consoles registered. Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:30 +02:00
Thomas Gleixner	b793716044	kernel/panic: Add atomic write enforcement to warn/panic Invoke the atomic write enforcement functions for warn/panic to ensure that the information gets out to the consoles. For the panic case, add explicit intermediate atomic flush calls to ensure immediate flushing at important points. Otherwise the atomic flushing only occurs when dropping out of the elevated priority, which for panic may never happen. It is important to note that if there are any legacy consoles registered, they will be attempting to directly print from the printk-caller context, which may jeopardize the reliability of the atomic consoles. Optimally there should be no legacy consoles registered. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:30 +02:00
John Ogness	078e0c5cf0	proc: consoles: Add support for non-BKL consoles Show 'W' if a non-BKL console implements write_thread() or write_atomic(). Add a new flag 'N' to show if it is a non-BKL console. Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:30 +02:00
John Ogness	52f01b5cb0	tty: tty_io: Show non-BKL consoles as active /sys/class/tty/console/active shows the consoles that are currently registered and enabled and are able to print (i.e. have implemented a write() callback). This is used by userspace programs such as systemd's systemd-getty-generator to determine where consoles are in order to automatically start a getty instance. The non-BKL consoles do not implement write() but also should be shown as an active console. Expand the conditions to also check if the callbacks write_thread() or write_atomic() are implemented for non-BKL consoles. Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:30 +02:00
John Ogness	cf3b42a10d	printk: nobkl: Stop threads on shutdown/reboot Register a syscore_ops shutdown function to stop all threaded printers on shutdown/reboot. This allows printk to transition back to atomic printing in order to provide a robust mechanism for outputting the final messages. Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:30 +02:00
Thomas Gleixner	ab8a7d8f04	printk: nobkl: Provide functions for atomic write enforcement Threaded printk is the preferred mechanism to tame the noisiness of printk, but WARN/OOPS/PANIC require printing out immediately since the printer threads might not be able to run. Add per CPU state to denote the priority/urgency of the output and provide functions to flush the printk backlog for priority elevated contexts and when the printing threads are not available (such as early boot). Note that when a CPU is in a priority elevated state, flushing only occurs when dropping back to a lower priority. This allows the full set of printk records (WARN/OOPS/PANIC output) to be stored in the ringbuffer before beginning to flush the backlog. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:30 +02:00
Thomas Gleixner	6f124e0c58	printk: nobkl: Add write context storage for atomic writes The number of consoles is unknown at compile time and allocating write contexts on stack in emergency/panic situations is not desired either. Allocate a write context array (one for each priority level) along with the per CPU output buffers, thus allowing atomic contexts on multiple CPUs and priority levels to execute simultaneously without clobbering each other's write context. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:29 +02:00
Thomas Gleixner	f99552d8e4	printk: nobkl: Add printer thread wakeups Add a function to wake up the printer threads. Use the new function when: - records are added to the printk ringbuffer - consoles are started - consoles are resumed The actual waking is performed via irq_work so that the wakeup can be triggered from any context. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:29 +02:00
Thomas Gleixner	3f58ef3573	printk: nobkl: Introduce printer threads Add the infrastructure to create a printer thread per console along with the required thread function, which is takeover/handover aware. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:29 +02:00
Thomas Gleixner	fdf2dc1e54	printk: nobkl: Add emit function and callback functions for atomic printing Implement an emit function for non-BKL consoles to output printk messages. It utilizes the lockless printk_get_next_message() and console_prepend_dropped() functions to retrieve/build the output message. The emit function includes the required safety points to check for handover/takeover and calls a new write_atomic callback of the console driver to output the message. It also includes proper handling for updating the non-BKL console sequence number. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:29 +02:00
Thomas Gleixner	d803bcd90a	printk: nobkl: Add print state functions Provide three functions which are related to the safe handover mechanism and allow console drivers to denote takeover unsafe sections: - console_can_proceed() Invoked by a console driver to check whether a handover request is pending or whether the console was taken over in a hostile fashion. - console_enter/exit_unsafe() Invoked by a console driver to denote that the driver output function is about to enter or to leave a critical region where a hostile takeover is unsafe. These functions are also cancellation points. The unsafe state is stored in the console state and allows a takeover attempt to make informed decisions whether to take over and/or output on such a console at all. The unsafe state is also available to the driver in the write context for the write_atomic() output function so the driver can make informed decisions about the required actions or take a special emergency path. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:29 +02:00
Thomas Gleixner	54388ba872	printk: nobkl: Add sequence handling On 64bit systems the sequence tracking is embedded into the atomic console state, on 32bit it has to be stored in a separate atomic member. The latter needs to handle the non-atomicity in hostile takeover cases, while 64bit can completely rely on the state atomicity. The ringbuffer sequence number is 64bit, but having a 32bit representation in the console is sufficient. If a console ever gets more than 2^31 records behind the ringbuffer then this is the least of the problems. On acquire() the atomic 32bit sequence number is expanded to 64 bit by folding the ringbuffer's sequence into it carefully. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:28 +02:00
Thomas Gleixner	377c6aa303	printk: nobkl: Add buffer management In case of hostile takeovers it must be ensured that the previous owner cannot scribble over the output buffer of the emergency/panic context. This is achieved by: - Adding a global output buffer instance for early boot (before per CPU data is available). - Allocating an output buffer per console for threaded printers once printer threads become available. - Allocating per CPU output buffers per console for printing from all contexts not covered by the other buffers. - Choosing the appropriate buffer is handled in the acquire/release functions. The output buffer is wrapped into a separate data structure so other context related fields can be added in later steps. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:28 +02:00
Thomas Gleixner	d387cf62ad	printk: nobkl: Add acquire/release logic Add per console acquire/release functionality. The console 'locked' state is a combination of several state fields: - The 'locked' bit - The 'cpu' field that denotes on which CPU the console is locked - The 'cur_prio' field that contains the severity of the printk context that owns the console. This field is used for decisions whether to attempt friendly handovers and also prevents takeovers from a less severe context, e.g. to protect the panic CPU. The acquire mechanism comes with several flavours: - Straightforward acquire when the console is not contended - Friendly handover mechanism based on a request/grant handshake The requesting context: 1) Puts the desired handover state (CPU nr, prio) into a separate handover state 2) Sets the 'req_prio' field in the real console state 3) Waits (with a timeout) for the owning context to hand over The owning context: 1) Observes the 'req_prio' field set 2) Hands the console over to the requesting context by switching the console state to the handover state that was provided by the requester - Hostile takeover The new owner takes the console over without a handshake. This is required when friendly handovers are not possible, i.e. the higher priority context interrupted the owning context on the same CPU or the owning context is not able to make progress on a remote CPU. The release is the counterpart which either releases the console directly or hands it gracefully over to a requester. All operations on console::atomic_state[CUR|REQ] are atomic cmpxchg based to handle concurrency. The acquire/release functions implement only minimal policies: - Preference for higher priority contexts - Protection of the panic CPU All other policy decisions have to be made at the call sites. The design allows implementing the well-known acquire() output_one_line() release() algorithm, but also allows avoiding the per line acquire/release for e.g. panic situations by doing the acquire once and then relying on the panic CPU protection for the rest. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:28 +02:00
Thomas Gleixner	fe2ba02c78	printk: Add non-BKL console basic infrastructure The current console/printk subsystem is protected by a Big Kernel Lock (aka console_lock), which has ill-defined semantics and is more or less stateless. This puts severe limitations on the console subsystem and makes forced takeover and output in emergency and panic situations a fragile endeavour which is based on try and pray. The goal of non-BKL consoles is to break out of the console lock jail and to provide a new infrastructure that avoids the pitfalls and allows console drivers to be gradually converted over. The proposed infrastructure aims for the following properties: - Per console locking instead of global locking - Per console state which allows making informed decisions - Stateful handover and takeover As a first step state is added to struct console. The per console state is an atomic_long_t with a 32bit bit field and on 64bit also a 32bit sequence for tracking the last printed ringbuffer sequence number. On 32bit the sequence is separate from state for obvious reasons which requires handling a few extra race conditions. Reserve state bits, which will be populated later in the series. Wire it up into the console register/unregister functionality and exclude such consoles from being handled in the console BKL mechanisms. Since the non-BKL consoles will not depend on the console lock/unlock dance for printing, only perform said dance if a BKL console is registered. The decision to use a bitfield was made as using a plain u32 with mask/shift operations turned out to result in incomprehensible code. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:28 +02:00
Sebastian Andrzej Siewior	b9ca69dd0f	time: Allow to preempt after a callback. The TIMER_SOFTIRQ handler invokes timer callbacks of the expired timers. Before each invocation the timer_base::lock is dropped. The only locks that are still held are the timer_base::expiry_lock and the per-CPU bh-lock as part of local_bh_disable(). The former is released as part of lock-up prevention if the timer is preempted by the caller which is waiting for its completion. Both locks are already released as part of timer_sync_wait_running(). This can be extended by also releasing the bh-lock. The timer core does not rely on any state that is serialized by the bh-lock. The timer callback expects the bh-state to be serialized by the lock but there is no need to keep state synchronized while invoking multiple callbacks. Preempt handling softirqs and release all locks after a timer invocation if the current task has inherited priority. Link: https://lore.kernel.org/r/20230804113039.419794-4-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:28 +02:00
Sebastian Andrzej Siewior	b2d0254952	softirq: Add function to preempt serving softirqs. Add functionality for the softirq handler to preempt its current work if needed. The softirq core has no particular state. It reads and resets the pending softirq bits and then processes one after the other. It can already be preempted while it invokes a certain softirq handler. By enabling the BH the softirq core releases the per-CPU bh lock which serializes all softirq handlers. It is safe to do as long as the code does not expect any serialisation in between. A typical scenario would be after the invocation of a callback where no state needs to be preserved before the next callback is invoked. Add functionality to preempt serving softirqs. Link: https://lore.kernel.org/r/20230804113039.419794-3-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:28 +02:00
Sebastian Andrzej Siewior	c6d3d3b3fe	sched/core: Provide a method to check if a task is PI-boosted. Provide a method to check if a task inherited the priority from another task. This happens if a task owns a lock which is requested by a task with higher priority. This can be used as a hint to add a preemption point to the critical section. Provide a function which reports true if the task is PI-boosted. Link: https://lore.kernel.org/r/20230804113039.419794-2-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:27 +02:00
Sebastian Andrzej Siewior	c15abad8f7	preempt: Put preempt_enable() within an instrumentation*() section. Callers of preempt_enable() can be within a noinstr section leading to: | vmlinux.o: warning: objtool: native_sched_clock+0x97: call to preempt_schedule_notrace_thunk() leaves .noinstr.text section | vmlinux.o: warning: objtool: kvm_clock_read+0x22: call to preempt_schedule_notrace_thunk() leaves .noinstr.text section | vmlinux.o: warning: objtool: local_clock+0xb4: call to preempt_schedule_notrace_thunk() leaves .noinstr.text section | vmlinux.o: warning: objtool: enter_from_user_mode+0xea: call to preempt_schedule_thunk() leaves .noinstr.text section | vmlinux.o: warning: objtool: syscall_enter_from_user_mode+0x140: call to preempt_schedule_thunk() leaves .noinstr.text section | vmlinux.o: warning: objtool: syscall_enter_from_user_mode_prepare+0xf2: call to preempt_schedule_thunk() leaves .noinstr.text section | vmlinux.o: warning: objtool: irqentry_enter_from_user_mode+0xea: call to preempt_schedule_thunk() leaves .noinstr.text section Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20230309072724.3F6zRkvw@linutronix.de	2023-09-13 09:10:27 +02:00
Mike Galbraith	047a6b7f0e	zram: Replace bit spinlocks with spinlock_t for PREEMPT_RT. The bit spinlock disables preemption. The spinlock_t lock becomes a sleeping lock on PREEMPT_RT and it cannot be acquired in this context. In this locked section, zs_free() acquires a zs_pool::lock, and there is access to zram::wb_limit_lock. Use a spinlock_t on PREEMPT_RT for locking and set/clear the ZRAM_LOCK bit after the lock has been acquired/dropped. Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lkml.kernel.org/r/YqIbMuHCPiQk+Ac2@linutronix.de Link: https://lore.kernel.org/20230323161830.jFbWCosd@linutronix.de	2023-09-13 09:10:27 +02:00
Junxiao Chang	7119d7c8c7	softirq: Wake ktimers thread also in softirq. If the hrtimer is raised while a softirq is processed then it does not wake the corresponding ktimers thread. This is due to the optimisation in the irq-exit path which is also used to wake the ktimers thread. For the other softirqs, this is okay because the additional softirq bits will be handled by the currently running softirq handler. The timer related softirq bits are added to a different variable and rely on the ktimers thread. As a consequence the wake-up of ktimersd is delayed until the next timer tick. Always wake the ktimers thread if a timer related softirq is pending. Reported-by: Peh, Hock Zhang <hock.zhang.peh@intel.com> Signed-off-by: Junxiao Chang <junxiao.chang@intel.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:27 +02:00
Frederic Weisbecker	2f6cac08f4	tick: Fix timer storm since introduction of timersd If timers are pending while the tick is reprogrammed in nohz_mode, the next expiry is not armed to fire now; it is delayed one jiffy forward instead so as not to raise an inextinguishable timer storm in a scenario such as: 1) IRQ triggers and queues a timer 2) ksoftirqd() is woken up 3) IRQ tail: timer is reprogrammed to fire now 4) IRQ exit 5) TIMER interrupt 6) goto 3) ...all that until we finally reach ksoftirqd. Unfortunately we are checking the wrong softirq vector bitmask since timersd kthread has split from ksoftirqd. Timers now have their own vector state field that must be checked separately. As a result, the old timer storm is back. This shows up early on boot with extremely long initcalls: [ 333.004807] initcall dquot_init+0x0/0x111 returned 0 after 323822879 usecs and the cause is uncovered with the right trace events showing just 10 microseconds between ticks (~100 000 Hz): |swapper/-1 1dn.h111 60818582us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415486608 |swapper/-1 1dn.h111 60818592us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415496082 |swapper/-1 1dn.h111 60818601us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415505550 Fix this by checking the right timer vector state from the nohz code. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lkml.kernel.org/r/20220405010752.1347437-2-frederic@kernel.org	2023-09-13 09:10:27 +02:00
Frederic Weisbecker	0b3cadf084	rcutorture: Also force sched priority to timersd on boosting test. ksoftirqd is statically boosted to the priority level right above the one of rcu_torture_boost() so that timers, which torture readers rely on, get a chance to run while rcu_torture_boost() is polling. However timers processing got split from ksoftirqd into their own kthread (timersd) that isn't boosted. It has the same SCHED_FIFO low prio as rcu_torture_boost() and therefore timers can't preempt it and may starve. The issue can be triggered in practice on v5.17.1-rt17 using: ./kvm.sh --allcpus --configs TREE04 --duration 10m --kconfig "CONFIG_EXPERT=y CONFIG_PREEMPT_RT=y" Fix this with statically boosting timersd just like is done with ksoftirqd in commit `ea6d962e80` ("rcutorture: Judge RCU priority boosting on grace periods, not callbacks") Suggested-by: Mel Gorman <mgorman@suse.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20220405010752.1347437-1-frederic@kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:27 +02:00
Sebastian Andrzej Siewior	7f49c1dd9a	softirq: Use a dedicated thread for timer wakeups. A timer/hrtimer softirq is raised in IRQ context. With threaded interrupts enabled or on PREEMPT_RT this leads to waking the ksoftirqd for the processing of the softirq. Once the ksoftirqd is marked as pending (or is running) it will collect all raised softirqs. This in turn means that a softirq which would have been processed at the end of the threaded interrupt, which runs at an elevated priority, is now moved to ksoftirqd which runs at SCHED_OTHER priority and competes with every regular task for CPU resources. This introduces long delays on heavily loaded systems and is not desired especially if the system is not overloaded by the softirqs. Split the TIMER_SOFTIRQ and HRTIMER_SOFTIRQ processing into a dedicated timers thread and let it run at the lowest SCHED_FIFO priority. RT tasks are woken up from hardirq context so only timer_list timers and hrtimers for "regular" tasks are processed here. The higher priority ensures that wakeups are performed before scheduling SCHED_OTHER tasks. Using a dedicated variable to store the pending softirq bits ensures that the timers are not accidentally picked up by ksoftirqd and other threaded interrupts. It shouldn't be picked up by ksoftirqd since it runs at lower priority. However if the timer bits are ORed while a threaded interrupt is running, then the timer softirq would be performed at higher priority. The new timer thread will block on the softirq lock before it starts softirq work. This "race window" isn't closed because while the timer thread is performing the softirq it can get PI-boosted via the softirq lock by a random force-threaded thread. The timer thread can pick up pending softirqs from ksoftirqd but only if the softirq load is high. It is not desired that the picked up softirqs are processed at SCHED_FIFO priority under high softirq load but this can already happen by a PI-boost by a force-threaded interrupt. Reported-by: kernel test robot <lkp@intel.com> [ static timer_threads ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:26 +02:00
Sebastian Andrzej Siewior	baf33250e0	sched/rt: Don't try push tasks if there are none. I have an RT task X at a high priority and cyclictest on each CPU with lower priority than X's. If X is active and each CPU wakes their own cyclictest thread then it ends in a longer rto_push storm. A random CPU determines via balance_rt() that the CPU on which X is running needs to push tasks. X has the highest priority, cyclictest is next in line so there is nothing that can be done since the task with the higher priority is not touched. tell_cpu_to_push() increments rto_loop_next and schedules rto_push_irq_work_func() on X's CPU. The other CPUs also increment the loop counter and do the same. Once rto_push_irq_work_func() is active it does nothing because it has _no_ pushable tasks on its runqueue. Then it checks rto_next_cpu() and decides to queue irq_work on the local CPU because another CPU requested a push by incrementing the counter. I have traces where ~30 CPUs request this ~3 times each before it finally ends. This greatly increases X's runtime while X isn't making much progress. Teach rto_next_cpu() to only return CPUs which also have tasks on their runqueue which can be pushed away. This does not reduce the tell_cpu_to_push() invocations (rto_loop_next counter increments) but reduces the number of issued rto_push_irq_work_func() invocations if nothing can be done. As a result the overloaded CPU is blocked less often. There are still cases where the "same job" is repeated several times (for instance the current CPU needs to resched but didn't yet because the irq-work is repeated a few times and so the old task remains on the CPU) but the majority of requests end in tell_cpu_to_push() before an IPI is issued. Reviewed-by: "Steven Rostedt (Google)" <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20230801152648._y603AS_@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:26 +02:00
Sebastian Andrzej Siewior	55a38d0ab4	x86: Enable RT also on 32bit Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2023-09-13 09:10:26 +02:00
Sebastian Andrzej Siewior	3ea3c069f2	x86: Allow to enable RT Allow to select RT. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2023-09-13 09:10:26 +02:00
Sebastian Andrzej Siewior	1f7514c9c7	net: Avoid the IPI to free the skb_attempt_defer_free() collects skbs, which were allocated on a remote CPU, on a per-CPU list. These skbs are either freed on that remote CPU once the CPU enters NET_RX or a remote IPI function is invoked to raise the NET_RX softirq if a threshold of pending skbs has been exceeded. This remote IPI can cause the wakeup of ksoftirqd on PREEMPT_RT if the remote CPU was idle. This is undesired because once the ksoftirqd is running it will acquire all pending softirqs and they will not be executed as part of the threaded interrupt until ksoftirqd goes idle again. To avoid all this, schedule the deferred clean up from a worker. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:26 +02:00
Sebastian Andrzej Siewior	ecc2d44f55	seqlock: Do the lockdep annotation before locking in do_write_seqcount_begin_nested() It was brought up by Tetsuo that the following sequence write_seqlock_irqsave() printk_deferred_enter() could lead to a deadlock if the lockdep annotation within write_seqlock_irqsave() triggers. The problem is that the sequence counter is incremented before the lockdep annotation is performed. The lockdep splat would then attempt to invoke printk() but the reader side, of the same seqcount, could have a tty_port::lock acquired waiting for the sequence number to become even again. The other lockdep annotations come before the actual locking because "we want to see the locking error before it happens". There is no reason why seqcount should be different here. Do the lockdep annotation first then perform the locking operation (the sequence increment). Fixes: `1ca7d67cf5` ("seqcount: Add lockdep functionality to seqcount/seqlock structures") Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/20230621130641.-5iueY1I@linutronix.de Link: https://lore.kernel.org/r/20230623171232.892937-2-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:25 +02:00
Sebastian Andrzej Siewior	3b1c90abea	signal: Don't disable preemption in ptrace_stop() on PREEMPT_RT. On PREEMPT_RT keeping preemption disabled during the invocation of cgroup_enter_frozen() is a problem because the function acquires css_set_lock which is a sleeping lock on PREEMPT_RT and must not be acquired with disabled preemption. The preempt-disabled section is only for performance optimisation reasons and can be avoided. Extend the comment and don't disable preemption before scheduling on PREEMPT_RT. Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20230803100932.325870-3-bigeasy@linutronix.de	2023-09-13 09:10:25 +02:00
Sebastian Andrzej Siewior	bf1069f8c0	signal: Add proper comment about the preempt-disable in ptrace_stop(). Commit `53da1d9456` ("fix ptrace slowness") added a preempt-disable section between read_unlock() and the following schedule() invocation without explaining why it is needed. Replace the comment with an explanation of why this is needed. Clarify that it is not needed for correctness but for performance reasons. Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20230803100932.325870-2-bigeasy@linutronix.de	2023-09-13 09:10:25 +02:00
Sebastian Andrzej Siewior	701e060705	locking/rtmutex: Acquire the hb lock via trylock after wait-proxylock. After rt_mutex_wait_proxy_lock() task_struct::pi_blocked_on is cleared if current owns the lock. If the operation has been interrupted by a signal or timeout then pi_blocked_on can be set. This means spin_lock() can overwrite pi_blocked_on on PREEMPT_RT. This has been noticed by the recently added lockdep-asserts… The rt_mutex_cleanup_proxy_lock() operation will clear pi_blocked_on (and update pending waiters as expected) but it must happen under the hb lock to ensure the same state in rtmutex and userland. Given all the possibilities it is probably the simplest option to try-lock the hb lock. In case the lock is occupied a quick nap is needed. A busy loop can lock up the system if performed by a task with high priority preventing the owner from running. The rt_mutex_post_schedule() needs to be put before the try-lock loop because otherwise the schedule() in schedule_hrtimeout() will trip over the !sched_rt_mutex assert. Introduce futex_trylock_hblock() to try-lock the hb lock and sleep until the try-lock operation succeeds. Use it after rt_mutex_wait_proxy_lock() to acquire the lock. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230831095314.fTliy0Bh@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:25 +02:00
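The "try-lock, and nap if occupied" loop from the commit above can be sketched in userspace with POSIX threads. This is an analogue under stated assumptions, not the kernel's `futex_trylock_hblock()`: `trylock_with_nap()` is a hypothetical name, a `pthread_mutex_t` stands in for the hb lock, `nanosleep()` stands in for the short timed sleep, and the 100 µs nap is arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/*
 * Hypothetical userspace analogue (not the kernel's futex_trylock_hblock()):
 * try the lock; if it is busy, nap briefly and retry instead of spinning.
 * Returns the number of attempts needed.
 */
static int trylock_with_nap(pthread_mutex_t *m)
{
	/* 100 us nap; an arbitrary value chosen for illustration. */
	const struct timespec nap = { .tv_sec = 0, .tv_nsec = 100 * 1000 };
	int attempts = 1;
	int ret;

	while ((ret = pthread_mutex_trylock(m)) == EBUSY) {
		/* Sleeping yields the CPU, so a lower-priority owner can run. */
		nanosleep(&nap, NULL);
		attempts++;
	}
	assert(ret == 0); /* any other error is a bug in this sketch */
	return attempts;
}
```

The design point is the sleep: a high-priority task that busy-loops on `pthread_mutex_trylock()` could starve the owner it is waiting for, while napping gives the owner a chance to run and release the lock.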
Thomas Gleixner	b945fd3a9d	locking/rtmutex: Add a lockdep assert to catch potential nested blocking There used to be a BUG_ON(current->pi_blocked_on) in the lock acquisition functions, but that vanished in one of the rtmutex overhauls. Bring it back in form of a lockdep assert to catch code paths which take rtmutex based locks with current::pi_blocked_on != NULL. Reported-by: Crystal Wood <swood@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230427111937.2745231-5-bigeasy@linutronix.de Link: https://lore.kernel.org/r/20230815111430.488430699@infradead.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:25 +02:00
Sebastian Andrzej Siewior	7529b7e0ae	locking/rtmutex: Use rt_mutex specific scheduler helpers Have rt_mutex use the rt_mutex specific scheduler helpers to avoid recursion vs rtlock on the PI state. [[ peterz: adapted to new names ]] Reported-by: Crystal Wood <swood@redhat.com> Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Link: https://lore.kernel.org/r/20230815111430.421408298@infradead.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:25 +02:00
Peter Zijlstra	ba05f9cd37	sched: Provide rt_mutex specific scheduler helpers With PREEMPT_RT there is a rt_mutex recursion problem where sched_submit_work() can use an rtlock (aka spinlock_t). More specifically what happens is: mutex_lock() /* really rt_mutex */ ... __rt_mutex_slowlock_locked() task_blocks_on_rt_mutex() // enqueue current task as waiter // do PI chain walk rt_mutex_slowlock_block() schedule() sched_submit_work() ... spin_lock() /* really rtlock */ ... __rt_mutex_slowlock_locked() task_blocks_on_rt_mutex() // enqueue current task as waiter *AGAIN* // *CONFUSION* Fix this by making rt_mutex do the sched_submit_work() early, before it enqueues itself as a waiter -- before it even knows if it will wait. [[ basically Thomas' patch but with different naming and a few asserts added ]] Originally-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Link: https://lore.kernel.org/r/20230815111430.355375399@infradead.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:24 +02:00
Thomas Gleixner	04e68a27e5	sched: Extract __schedule_loop() There are currently two implementations of this basic __schedule() loop, and there is soon to be a third. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230427111937.2745231-2-bigeasy@linutronix.de Link: https://lore.kernel.org/r/20230815111430.288063671@infradead.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:24 +02:00
Sebastian Andrzej Siewior	47deef3dbc	locking/rtmutex: Avoid unconditional slowpath for DEBUG_RT_MUTEXES With DEBUG_RT_MUTEXES enabled the fast-path rt_mutex_cmpxchg_acquire() always fails and all lock operations take the slow path. Provide a new helper inline rt_mutex_try_acquire() which maps to rt_mutex_cmpxchg_acquire() in the non-debug case. For the debug case it invokes rt_mutex_slowtrylock() which can acquire a non-contended rtmutex under full debug coverage. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230427111937.2745231-4-bigeasy@linutronix.de Link: https://lore.kernel.org/r/20230815111430.220899937@infradead.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:24 +02:00
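A toy sketch of the helper described above: one entry point that maps to a plain compare-and-swap in normal builds and to a slow trylock carrying debug checks in debug builds. All names (`struct toy_rtmutex`, `toy_try_acquire()`, `TOY_DEBUG_LOCKS`) are hypothetical. The point is the compile-time selection, not the real `rt_mutex_try_acquire()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct task {
	int id;
};

/* Toy lock: an atomic owner pointer, NULL when unlocked. */
struct toy_rtmutex {
	_Atomic(struct task *) owner;
};

/* Fast path: one compare-and-swap from "unowned" to "owned by cur". */
static inline bool toy_cmpxchg_acquire(struct toy_rtmutex *l, struct task *cur)
{
	struct task *expected = NULL;

	return atomic_compare_exchange_strong_explicit(&l->owner, &expected, cur,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* Slow trylock: room for full debug checks before taking the lock. */
static inline bool toy_slowtrylock(struct toy_rtmutex *l, struct task *cur)
{
	assert(cur != NULL);                   /* example debug check */
	assert(atomic_load(&l->owner) != cur); /* no recursive locking */
	return toy_cmpxchg_acquire(l, cur);
}

/* Pick the path at build time, analogous to the DEBUG_RT_MUTEXES split. */
static inline bool toy_try_acquire(struct toy_rtmutex *l, struct task *cur)
{
#ifdef TOY_DEBUG_LOCKS
	return toy_slowtrylock(l, cur);
#else
	return toy_cmpxchg_acquire(l, cur);
#endif
}
```

Either way an uncontended lock is acquired without falling through to the full blocking slow path, which is what the commit restores for debug builds.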
Peter Zijlstra	a70db40b7a	sched: Constrain locks in sched_submit_work() Even though sched_submit_work() is run from preemptible context, it is discouraged to have it use blocking locks due to the recursion potential. Enforce this. Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Link: https://lore.kernel.org/r/20230815111430.154558666@infradead.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2023-09-13 09:10:24 +02:00
Linus Torvalds	0bb80ecc33	Linux 6.6-rc1 v6.6-rc1	2023-09-10 16:28:41 -07:00
Linus Torvalds	1548b060d6	Merge tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drm Pull drm ci scripts from Dave Airlie: "This is a bunch of ci integration for the freedesktop gitlab instance where we currently do upstream userspace testing on diverse sets of GPU hardware. From my perspective I think it's an experiment worth going with and seeing how the benefits/noise play out keeping these files useful. Ideally I'd like to get this so we can do pre-merge testing on PRs eventually. Below is some info from danvet on why we've ended up making the decision and how we can roll it back if we decide it was a bad plan. Why in upstream? - like documentation, testcases, tools CI integration is one of these things where you can waste endless amounts of time if you accidentally have a version that doesn't match your source code - but also like the above, there's a balance, this is the initial cut of what we think makes sense to keep in sync vs out-of-tree, probably needs adjustment - gitlab supports out-of-repo gitlab integration and that's what's been used for the kernel in drm, but it results in per-driver fragmentation and lots of duplicated effort. the simple act of smashing an arbitrary winner into a topic branch already started surfacing patches on dri-devel and sparking good cross driver team discussions Why gitlab? - it's not any more shit than any of the other CI - drm userspace uses it extensively for everything in userspace, we have a lot of people and experience with this, including integration of hw testing labs - media userspace like gstreamer is also on gitlab.fd.o, and there's discussion to extend this to the media subsystem in some fashion Can this be shared? - there's definitely a pile of code that could move to scripts/ if other subsystems adopt ci integration in upstream kernel git. 
other bits are more drm/gpu specific like the igt-gpu-tests/tools integration - docker images can be run locally or in other CI runners Will we regret this? - it's all in one directory, intentionally, for easy deletion - probably 1-2 years in upstream to see whether this is worth it or a Big Mistake. that's roughly what it took to _really_ roll out solid CI in the bigger userspace projects we have on gitlab.fd.o like mesa3d" * tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drm: drm: ci: docs: fix build warning - add missing escape drm: Add initial ci/ subdirectory	2023-09-10 11:55:26 -07:00
Linus Torvalds	e56b2b6057	Merge tag 'x86-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Fix preemption delays in the SGX code, remove unnecessarily UAPI-exported code, fix a ld.lld linker (in)compatibility quirk and make the x86 SMP init code a bit more conservative to fix kexec() lockups" * tag 'x86-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sgx: Break up long non-preemptible delays in sgx_vepc_release() x86: Remove the arch_calc_vm_prot_bits() macro from the UAPI x86/build: Fix linker fill bytes quirk/incompatibility for ld.lld x86/smp: Don't send INIT to non-present and non-booted CPUs	2023-09-10 10:39:31 -07:00
Linus Torvalds	e79dbf03d8	Merge tag 'perf-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf event fix from Ingo Molnar: "Work around a firmware bug in the uncore PMU driver, affecting certain Intel systems" * tag 'perf-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/uncore: Correct the number of CHAs on EMR	2023-09-10 10:34:46 -07:00
Linus Torvalds	535a265d7f	Merge tag 'perf-tools-for-v6.6-1-2023-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Arnaldo Carvalho de Melo: "perf tools maintainership: - Add git information for perf-tools and perf-tools-next trees and branches to the MAINTAINERS file. That is where development now takes place and myself and Namhyung Kim have write access, more people to come as we emulate other maintainer groups. perf record: - Record kernel data maps when 'perf record --data' is used, so that global variables can be resolved and used in tools that do data profiling. perf trace: - Remove the old, experimental support for BPF events in which a .c file was passed as an event: "perf trace -e hello.c" to then get compiled and loaded. The only known usage for that, that shipped with the kernel as an example for such events, augmented the raw_syscalls tracepoints and was converted to a libbpf skeleton, reusing all the user space components and the BPF code connected to the syscalls. In the end just the way to glue the BPF part and the user space type beautifiers changed, now being performed by libbpf skeletons. The next step is to use BTF to do pretty printing of all syscall types, as discussed with Alan Maguire and others. Now, on a perf built with BUILD_BPF_SKEL=1 we get most if not all path/filenames/strings, some of the networking data structures, perf_event_attr, etc, i.e. 
systemwide tracing of nanosleep calls and perf_event_open syscalls while 'perf stat' runs 'sleep' for 5 seconds: # perf trace -a -e nanosleep,perf perf stat -e cycles,instructions sleep 5 0.000 ( 9.034 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3 9.039 ( 0.006 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x1 (PERF_COUNT_HW_INSTRUCTIONS), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf-exec), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4 ? ( ): gpm/991 ... [continued]: clock_nanosleep()) = 0 10.133 ( ): sleep/327642 clock_nanosleep(rqtp: { .tv_sec: 5, .tv_nsec: 0 }, rmtp: 0x7ffd36f83ed0) ... ? ( ): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0 30.276 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ... 223.215 (1000.430 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0 30.276 (2000.394 ms): gpm/991 ... [continued]: clock_nanosleep()) = 0 1230.814 ( ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ... 1230.814 (1000.404 ms): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0 2030.886 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ... 2237.709 (1000.153 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0 ? ( ): crond/1172 ... [continued]: clock_nanosleep()) = 0 3242.699 ( ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ... 
2030.886 (2000.385 ms): gpm/991 ... [continued]: clock_nanosleep()) = 0 3728.078 ( ): crond/1172 clock_nanosleep(rqtp: { .tv_sec: 60, .tv_nsec: 0 }, rmtp: 0x7ffe0971dcf0) ... 3242.699 (1000.158 ms): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0 4031.409 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ... 10.133 (5000.375 ms): sleep/327642 ... [continued]: clock_nanosleep()) = 0 Performance counter stats for 'sleep 5': 2,617,347 cycles 1,855,997 instructions # 0.71 insn per cycle 5.002282128 seconds time elapsed 0.000855000 seconds user 0.000852000 seconds sys perf annotate: - Building with binutils' libopcode now is opt-in (BUILD_NONDISTRO=1) for licensing reasons, and we missed a build test on tools/perf/tests makefile. Since we now default to NDEBUG=1, we ended up segfaulting when building with BUILD_NONDISTRO=1 because a needed initialization routine was being "error checked" via an assert. Fix it by explicitly checking the result and aborting instead if it fails. We better back propagate the error, but at least 'perf annotate' on samples collected for a BPF program is back working when perf is built with BUILD_NONDISTRO=1. perf report/top: - Add back TUI hierarchy mode header, that is seen when using 'perf report/top --hierarchy'. - Fix the number of entries for 'e' key in the TUI that was preventing navigation of lines when expanding an entry. perf report/script: - Support cross platform register handling, allowing a perf.data file collected on one architecture to have registers sampled correctly displayed when analysis tools such as 'perf report' and 'perf script' are used on a different architecture. - Fix handling of event attributes in pipe mode, i.e. when one uses: perf record -o - | perf report -i - When no perf.data files are used. 
- Handle files generated via pipe mode with a version of perf and then read also via pipe mode with a different version of perf, where the event attr record may have changed, use the record size field to properly support this version mismatch. perf probe: - Accessing global variables from uprobes isn't supported, make the error message state that instead of stating that some minimal kernel version is needed to have that feature. This seems just a tool limitation, the kernel probably has all that is needed. perf tests: - Fix a reference count related leak in the dlfilter v0 API where the result of a thread__find_symbol_fb() is not matched with an addr_location__exit() to drop the reference counts of the resolved components (machine, thread, map, symbol, etc). Add a dlfilter test to make sure that doesn't regress. - Lots of fixes for the 'perf test' written in shell script related to problems found with the shellcheck utility. - Fixes for 'perf test' shell scripts testing features enabled when perf is built with BUILD_BPF_SKEL=1, such as 'perf stat' bpf counters. - Add perf record sample filtering test, things like the following example, that gets implemented as a BPF filter attached to the event: # perf record -e task-clock -c 10000 --filter 'ip < 0xffffffff00000000' - Improve the way the task_analyzer test checks if libtraceevent is linked, using 'perf version --build-options' instead of the more expensive 'perf record -e "sched:sched_switch"'. - Add support for riscv in the mmap-basic test. (This went as well via the RiscV tree, same contents). libperf: - Implement riscv mmap support (This went as well via the RiscV tree, same contents). perf script: - New tool that converts perf.data files to the firefox profiler format so that one can use the visualizer at https://profiler.firefox.com/. Done by Anup Sharma as part of this year's Google Summer of Code. 
One can generate the output and upload it to the web interface but Anup also automated everything: perf script gecko -F 99 -a sleep 60 - Support syscall name parsing on arm64. - Print "cgroup" field on the same line as "comm". perf bench: - Add new 'uprobe' benchmark to measure the overhead of uprobes with/without BPF programs attached to it. - breakpoints are not available on power9, skip that test. perf stat: - Add #num_cpus_online literal to be used in 'perf stat' metrics, and add this extra 'perf test' check that exemplifies its purpose: TEST_ASSERT_VAL("#num_cpus_online", expr__parse(&num_cpus_online, ctx, "#num_cpus_online") == 0); TEST_ASSERT_VAL("#num_cpus", expr__parse(&num_cpus, ctx, "#num_cpus") == 0); TEST_ASSERT_VAL("#num_cpus >= #num_cpus_online", num_cpus >= num_cpus_online); Miscellaneous: - Improve tool startup time by lazily reading PMU, JSON, sysfs data. - Improve error reporting in the parsing of events, passing YYLTYPE to error routines, so that the output can show where the parsing error was found. - Add 'perf test' entries to check the parsing of events improvements. - Fix various leaks for things detected by -fsanitize=address, mostly things that would be freed at tool exit, including: - Free evsel->filter on the destructor. - Allow tools to register a thread->priv destructor and use it in 'perf trace'. - Free evsel->priv in 'perf trace'. - Free string returned by synthesize_perf_probe_point() when the caller fails to do all it needs. - Adjust various compiler options to not treat some warnings as errors when building with broken headers found in things like python, flex, bison, as we otherwise build with -Werror. Some for gcc, some for clang, some for some specific version of those, some for some specific version of flex or bison, or some specific combination of these components, bah. 
- Allow customization of clang options for BPF target, this helps building on gentoo where there are other oddities where BPF targets get passed some compiler options intended for the native build, so building with WERROR=0 helps while these oddities are fixed. - Don't pass ERR_PTR() values to perf_session__delete() in 'perf top' and 'perf lock', fixing some segfaults when handling some odd failures. - Add LTO build option. - Fix format of unordered lists in the perf docs (tools/perf/Documentation) - Overhaul the bison files, using constructs such as YYNOMEM. - Remove unused tokens from the bison .y files. - Add more comments to various structs. - A few LoongArch enablement patches. Vendor events (JSON): - Add JSON metrics for Yitian 710 DDR (aarch64). Things like: EventName, BriefDescription visible_window_limit_reached_rd, "At least one entry in read queue reaches the visible window limit.", visible_window_limit_reached_wr, "At least one entry in write queue reaches the visible window limit.", op_is_dqsosc_mpc , "A DQS Oscillator MPC command to DRAM.", op_is_dqsosc_mrr , "A DQS Oscillator MRR command to DRAM.", op_is_tcr_mrr , "A Temperature Compensated Refresh(TCR) MRR command to DRAM.", - Add AmpereOne metrics (aarch64). - Update N2 and V2 metrics (aarch64) and events using Arm telemetry repo. - Update scale units and descriptions of common topdown metrics on aarch64. Things like: - "MetricExpr": "stall_slot_frontend / (#slots * cpu_cycles)", - "BriefDescription": "Frontend bound L1 topdown metric", + "MetricExpr": "100 * (stall_slot_frontend / (#slots * cpu_cycles))", + "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the frontend of the processor.", - Update events for intel: meteorlake to 1.04, sapphirerapids to 1.15, Icelake+ metric constraints. 
- Update files for the power10 platform" * tag 'perf-tools-for-v6.6-1-2023-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (217 commits) perf parse-events: Fix driver config term perf parse-events: Fixes relating to no_value terms perf parse-events: Fix propagation of term's no_value when cloning perf parse-events: Name the two term enums perf list: Don't print Unit for "default_core" perf vendor events intel: Fix modifier in tma_info_system_mem_parallel_reads for skylake perf dlfilter: Avoid leak in v0 API test use of resolve_address() perf metric: Add #num_cpus_online literal perf pmu: Remove str from perf_pmu_alias perf parse-events: Make common term list to strbuf helper perf parse-events: Minor help message improvements perf pmu: Avoid uninitialized use of alias->str perf jevents: Use "default_core" for events with no Unit perf test stat_bpf_counters_cgrp: Enhance perf stat cgroup BPF counter test perf test shell stat_bpf_counters: Fix test on Intel perf test shell record_bpf_filter: Skip 6.2 kernel libperf: Get rid of attr.id field perf tools: Convert to perf_record_header_attr_id() libperf: Add perf_record_header_attr_id() perf tools: Handle old data in PERF_RECORD_ATTR ...	2023-09-09 20:06:17 -07:00
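The `#num_cpus >= #num_cpus_online` check quoted in the perf stat notes above expresses a simple invariant. Here is a small C sketch of it. Note the assumption: perf derives these values from its own CPU maps, while this sketch uses `sysconf(_SC_NPROCESSORS_CONF)` and `sysconf(_SC_NPROCESSORS_ONLN)`, which are widely available on Linux and other Unix-like systems but are not part of strict POSIX. The helper names are hypothetical.

```c
#include <assert.h>
#include <unistd.h>

/* CPUs configured in the system (sysconf returns -1 on error). */
static long cpus_configured(void)
{
	return sysconf(_SC_NPROCESSORS_CONF);
}

/* CPUs currently online. */
static long cpus_online(void)
{
	return sysconf(_SC_NPROCESSORS_ONLN);
}

/* The invariant the perf test asserts: #num_cpus >= #num_cpus_online. */
static void check_cpu_invariant(void)
{
	long all = cpus_configured();
	long online = cpus_online();

	assert(online >= 1);
	assert(all >= online);
}
```

A metric that divides by the online count (for example, per-CPU utilisation) would prefer `#num_cpus_online`, since offline CPUs contribute no samples.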
Linus Torvalds	fd3a5940e6	Merge tag '6.6-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: - six smb3 client fixes including ones to allow controlling smb3 directory caching timeout and limits, and one debugging improvement - one fix for nls Kconfig (don't need to expose NLS_UCS2_UTILS option) - one minor spnego registry update * tag '6.6-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: spnego: add missing OID to oid registry smb3: fix minor typo in SMB2_GLOBAL_CAP_LARGE_MTU cifs: update internal module version number for cifs.ko smb3: allow controlling maximum number of cached directories smb3: add trace point for queryfs (statfs) nls: Hide new NLS_UCS2_UTILS smb3: allow controlling length of time directory entries are cached with dir leases smb: propagate error code of extract_sharename()	2023-09-09 19:56:23 -07:00
David Howells	a3c57ab79a	iov_iter: Kunit tests for page extraction Add some kunit tests for page extraction for ITER_BVEC, ITER_KVEC and ITER_XARRAY type iterators. ITER_UBUF and ITER_IOVEC aren't dealt with as they require userspace VM interaction. ITER_DISCARD isn't dealt with either as that can't be extracted. Signed-off-by: David Howells <dhowells@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David Hildenbrand <david@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2023-09-09 15:11:49 -07:00
David Howells	2d71340ff1	iov_iter: Kunit tests for copying to/from an iterator Add some kunit tests for copying to/from ITER_BVEC, ITER_KVEC and ITER_XARRAY type iterators. ITER_UBUF and ITER_IOVEC aren't dealt with as they require userspace VM interaction. ITER_DISCARD isn't dealt with either as that does nothing. Signed-off-by: David Howells <dhowells@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David Hildenbrand <david@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2023-09-09 15:11:49 -07:00
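What the copy tests exercise can be illustrated with a userspace sketch: copying a flat buffer into a list of memory segments in order, and stopping when the segments run out. This is only a loose analogue of copying into an ITER_KVEC iterator. It uses POSIX `struct iovec` from `<sys/uio.h>`, and `copy_to_kvec()` is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy up to len bytes from src into the segments of iov, in order.
 * Returns the number of bytes copied, which is less than len if the
 * segments run out. Loosely modelled on copying into an ITER_KVEC.
 */
static size_t copy_to_kvec(const struct iovec *iov, int nr_segs,
			   const void *src, size_t len)
{
	const char *p = src;
	size_t done = 0;

	for (int i = 0; i < nr_segs && done < len; i++) {
		size_t chunk = iov[i].iov_len;

		if (chunk > len - done)
			chunk = len - done;
		memcpy(iov[i].iov_base, p + done, chunk);
		done += chunk;
	}
	assert(done <= len);
	return done;
}
```

A test in the style of the commit would fill segments of different sizes and then compare each one against the expected slice of the source, including the short-copy case when the source is longer than the iterator.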

1215159 Commits