Merge tag 'x86-apic-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 APIC updates from Thomas Gleixner:
- Fix the incorrect handling of atomic offset updates in
reserve_eilvt_offset()
The return value of atomic_cmpxchg() is compared against the new value instead of the old value, which makes the update take two rounds on success.
Convert it to atomic_try_cmpxchg(), which does the right thing; a sketch of both idioms follows.
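For illustration, a minimal sketch of the two update idioms on a plain atomic_t. The helper names are made up for this example; only the loop structure mirrors reserve_eilvt_offset():

	/* Buggy idiom: atomic_cmpxchg() returns the old value, so comparing the
	 * result against 'new' needs a second round through the loop even when
	 * the exchange already succeeded.
	 */
	static void reserve_slot_cmpxchg(atomic_t *slot, int new)
	{
		int rsvd = atomic_read(slot);

		do {
			rsvd = atomic_cmpxchg(slot, rsvd, new);
		} while (rsvd != new);
	}

	/* Fixed idiom: atomic_try_cmpxchg() returns true on success and updates
	 * 'rsvd' with the current value on failure, so success exits immediately.
	 */
	static void reserve_slot_try_cmpxchg(atomic_t *slot, int new)
	{
		int rsvd = atomic_read(slot);

		do {
			/* the real code re-validates the LVT entry here */
		} while (!atomic_try_cmpxchg(slot, &rsvd, new));
	}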
- Handle IO/APIC-less systems correctly
When the IO/APIC is not advertised by ACPI, the computation of the
lower bound for dynamically allocated interrupts like MSI goes wrong.
This lower bound is used to exclude the IO/APIC legacy GSI space, as
that must stay reserved for the legacy interrupts.
If the system, e.g. a VM, does not advertise an IO/APIC, the lower
bound stays at 0.
0 is an invalid interrupt number except for the legacy timer
interrupt on x86. The return value is unchecked in the core code, so
it ends up allocating interrupt number 0, which is subsequently
considered invalid by the caller, e.g. the MSI allocation code.
A similar problem was already cured for device tree based systems
years ago, but that missed - or did not envision - the zero IO/APIC
case.
Consolidate the zero check and return the provided "from" argument to
the core code call site, which is guaranteed to be greater than 0. The
resulting function is sketched below.
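Condensed from the new arch_dynirq_lower_bound() (the full hunk appears in the diff further down), with the comments shortened:

	unsigned int arch_dynirq_lower_bound(unsigned int from)
	{
		unsigned int ret;

		/*
		 * dmar_alloc_hwirq() may be called before setup_IO_APIC(), so
		 * fall back to gsi_top if ioapic_dynirq_base is not set up yet.
		 */
		ret = ioapic_dynirq_base ? : gsi_top;

		/*
		 * On DT machines ioapic_dynirq_base is always 0, and gsi_top can
		 * be 0 when no IO/APIC is registered. 0 is not a valid dynamic
		 * interrupt number, so return @from, which is always > 0.
		 */
		return ret ? : from;
	}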
- Simplify the X2APIC cluster CPU mask logic for CPU hotplug
Per-cluster CPU masks are required for X2APIC in cluster mode to
determine the correct cluster for a target CPU when calculating the
destination for IPIs.
These masks are established when CPUs are brought up. The first CPU
in a cluster must allocate a new cluster CPU mask. As this happens
during the early startup of a CPU, where memory allocations cannot be
done, the mask has to be allocated by the control CPU.
The current implementation allocates a cluster mask just in case, and
if the CPU being brought up is the first in its cluster, it takes
over this allocation from a global pointer.
This works nicely in the fully serialized CPU bringup scenario which
is used today, but would fail completely for parallel bringup of
CPUs.
The cluster association of a CPU can be computed from the APIC ID
which is enumerated by ACPI/MADT.
So the cluster CPU masks can be preallocated and associated upfront
and the upcoming CPUs just need to set their corresponding bit.
Aside from preparing for parallel bringup, this is a valuable
simplification on its own (see the sketch below).
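A sketch of the new scheme, simplified from the x2apic_cluster.c changes in the diff further down; the function name is shortened here and error handling is trimmed:

	/* The cluster is a fixed function of the APIC ID enumerated via ACPI/MADT */
	#define apic_cluster(apicid)	((apicid) >> 4)

	static int prepare_cpu_sketch(unsigned int cpu)
	{
		u32 phys_apicid = apic->cpu_present_to_apicid(cpu);
		u32 cluster = apic_cluster(phys_apicid);
		u32 logical_apicid = (cluster << 16) | (1 << (phys_apicid & 0xf));

		x86_cpu_to_logical_apicid[cpu] = logical_apicid;

		/*
		 * The mask is allocated and linked on the control CPU for all
		 * present siblings of the cluster; the AP later only sets its
		 * own bit in init_x2apic_ldr().
		 */
		return alloc_clustermask(cpu, cluster, cpu_to_node(cpu));
	}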
- Remove global variables which control the early startup of secondary
CPUs on 64-bit
The only information which is needed by a starting CPU is the Linux
CPU number. The CPU number allows it to retrieve the rest of the
required data from already existing per CPU storage.
So instead of initial_stack, early_gdt_descr and initial_gs, provide
a new variable smpboot_control which, for now, contains the Linux CPU
number. The starting CPU can retrieve and compute all required
information for startup from there.
Aside from being a cleanup, this also prepares for parallel CPU
bringup, where starting CPUs will look up their Linux CPU number via
the APIC ID when smpboot_control has the corresponding control bit
set. A C-level sketch of the new startup path follows below.
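The real code is assembly in head_64.S; in this sketch set_stack_pointer() and load_percpu_gdt() are hypothetical placeholders for those assembly steps:

	static void ap_startup_sketch(void)
	{
		unsigned int cpu = smpboot_control;	/* Linux CPU number, for now */
		struct task_struct *idle = per_cpu(pcpu_hot.current_task, cpu);

		/* Stack comes from the idle task in per-CPU data, not initial_stack */
		set_stack_pointer(idle->thread.sp);	/* placeholder */

		/* Per-CPU GDT replaces early_gdt_descr, GSBASE replaces initial_gs */
		load_percpu_gdt(cpu);			/* placeholder */
		wrmsrl(MSR_GS_BASE, __per_cpu_offset[cpu]);
	}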
- Make cc_vendor globally accessible
Subsequent parallel bringup changes require access to cc_vendor
because confidential computing platforms need special treatment in the
early startup phase for CPUID and APIC ID readouts.
The change makes cc_vendor global and provides stub accessors when
CONFIG_ARCH_HAS_CC_PLATFORM is not set (condensed below).
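Condensed from the <asm/coco.h> hunk in the diff further down:

	#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
	extern enum cc_vendor cc_vendor;

	static inline enum cc_vendor cc_get_vendor(void)
	{
		return cc_vendor;
	}
	#else
	static inline enum cc_vendor cc_get_vendor(void)
	{
		return CC_VENDOR_NONE;
	}
	#endif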
This was merged from the x86/cc branch in anticipation of further
parallel bringup commits which require access to cc_vendor. Due to
late discoveries of fundamental issues with those patches, these
commits never happened.
The merge commit is unfortunately in the middle of the APIC commits
so unraveling it would have required a rebase or revert. As the
parallel bringup seems to be well on its way for 6.5 this would be
just pointless churn. As the commit does not contain any functional
change it's not a risk to keep it.
* tag 'x86-apic-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioapic: Don't return 0 from arch_dynirq_lower_bound()
x86/apic: Fix atomic update of offset in reserve_eilvt_offset()
x86/coco: Export cc_vendor
x86/smpboot: Reference count on smpboot_setup_warm_reset_vector()
x86/smpboot: Remove initial_gs
x86/smpboot: Remove early_gdt_descr on 64-bit
x86/smpboot: Remove initial_stack on 64-bit
x86/apic/x2apic: Allow CPU cluster_mask to be populated in parallel
commit de10553fce
@@ -13,7 +13,7 @@
#include <asm/coco.h>
#include <asm/processor.h>

static enum cc_vendor vendor __ro_after_init;
enum cc_vendor cc_vendor __ro_after_init;
static u64 cc_mask __ro_after_init;

static bool intel_cc_platform_has(enum cc_attr attr)

@@ -99,7 +99,7 @@ static bool amd_cc_platform_has(enum cc_attr attr)

bool cc_platform_has(enum cc_attr attr)
{
switch (vendor) {
switch (cc_vendor) {
case CC_VENDOR_AMD:
return amd_cc_platform_has(attr);
case CC_VENDOR_INTEL:

@@ -119,7 +119,7 @@ u64 cc_mkenc(u64 val)
* - for AMD, bit *set* means the page is encrypted
* - for AMD with vTOM and for Intel, *clear* means encrypted
*/
switch (vendor) {
switch (cc_vendor) {
case CC_VENDOR_AMD:
if (sev_status & MSR_AMD64_SNP_VTOM)
return val & ~cc_mask;

@@ -135,7 +135,7 @@ u64 cc_mkenc(u64 val)
u64 cc_mkdec(u64 val)
{
/* See comment in cc_mkenc() */
switch (vendor) {
switch (cc_vendor) {
case CC_VENDOR_AMD:
if (sev_status & MSR_AMD64_SNP_VTOM)
return val | cc_mask;

@@ -149,11 +149,6 @@ u64 cc_mkdec(u64 val)
}
EXPORT_SYMBOL_GPL(cc_mkdec);

__init void cc_set_vendor(enum cc_vendor v)
{
vendor = v;
}

__init void cc_set_mask(u64 mask)
{
cc_mask = mask;

@@ -10,13 +10,30 @@ enum cc_vendor {
CC_VENDOR_INTEL,
};

void cc_set_vendor(enum cc_vendor v);
void cc_set_mask(u64 mask);

#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
extern enum cc_vendor cc_vendor;

static inline enum cc_vendor cc_get_vendor(void)
{
return cc_vendor;
}

static inline void cc_set_vendor(enum cc_vendor vendor)
{
cc_vendor = vendor;
}

void cc_set_mask(u64 mask);
u64 cc_mkenc(u64 val);
u64 cc_mkdec(u64 val);
#else
static inline enum cc_vendor cc_get_vendor(void)
{
return CC_VENDOR_NONE;
}

static inline void cc_set_vendor(enum cc_vendor vendor) { }

static inline u64 cc_mkenc(u64 val)
{
return val;

@@ -647,7 +647,11 @@ static inline void spin_lock_prefetch(const void *x)
#define KSTK_ESP(task) (task_pt_regs(task)->sp)

#else
#define INIT_THREAD { }
extern unsigned long __end_init_task[];

#define INIT_THREAD { \
.sp = (unsigned long)&__end_init_task - sizeof(struct pt_regs), \
}

extern unsigned long KSTK_ESP(struct task_struct *task);

@@ -59,7 +59,6 @@ extern struct real_mode_header *real_mode_header;
extern unsigned char real_mode_blob_end[];

extern unsigned long initial_code;
extern unsigned long initial_gs;
extern unsigned long initial_stack;
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern unsigned long initial_vc_handler;

@@ -199,5 +199,8 @@ extern void nmi_selftest(void);
#define nmi_selftest() do { } while (0)
#endif

#endif /* __ASSEMBLY__ */
extern unsigned int smpboot_control;

#endif /* !__ASSEMBLY__ */

#endif /* _ASM_X86_SMP_H */

@@ -111,13 +111,26 @@ int x86_acpi_suspend_lowlevel(void)
saved_magic = 0x12345678;
#else /* CONFIG_64BIT */
#ifdef CONFIG_SMP
initial_stack = (unsigned long)temp_stack + sizeof(temp_stack);
early_gdt_descr.address =
(unsigned long)get_cpu_gdt_rw(smp_processor_id());
initial_gs = per_cpu_offset(smp_processor_id());
/*
* As each CPU starts up, it will find its own stack pointer
* from its current_task->thread.sp. Typically that will be
* the idle thread for a newly-started AP, or even the boot
* CPU which will find it set to &init_task in the static
* per-cpu data.
*
* Make the resuming CPU use the temporary stack at startup
* by setting current->thread.sp to point to that. The true
* %rsp will be restored with the rest of the CPU context,
* by do_suspend_lowlevel(). And unwinders don't care about
* the abuse of ->thread.sp because it's a dead variable
* while the thread is running on the CPU anyway; the true
* value is in the actual %rsp register.
*/
current->thread.sp = (unsigned long)temp_stack + sizeof(temp_stack);
smpboot_control = smp_processor_id();
#endif
initial_code = (unsigned long)wakeup_long64;
saved_magic = 0x123456789abcdef0L;
saved_magic = 0x123456789abcdef0L;
#endif /* CONFIG_64BIT */

/*

@@ -422,10 +422,9 @@ static unsigned int reserve_eilvt_offset(int offset, unsigned int new)
if (vector && !eilvt_entry_is_changeable(vector, new))
/* may not change if vectors are different */
return rsvd;
rsvd = atomic_cmpxchg(&eilvt_offsets[offset], rsvd, new);
} while (rsvd != new);
} while (!atomic_try_cmpxchg(&eilvt_offsets[offset], &rsvd, new));

rsvd &= ~APIC_EILVT_MASKED;
rsvd = new & ~APIC_EILVT_MASKED;
if (rsvd && rsvd != vector)
pr_info("LVT offset %d assigned for vector 0x%02x\n",
offset, rsvd);

@@ -2478,17 +2478,21 @@ static int io_apic_get_redir_entries(int ioapic)

unsigned int arch_dynirq_lower_bound(unsigned int from)
{
unsigned int ret;

/*
* dmar_alloc_hwirq() may be called before setup_IO_APIC(), so use
* gsi_top if ioapic_dynirq_base hasn't been initialized yet.
*/
if (!ioapic_initialized)
return gsi_top;
ret = ioapic_dynirq_base ? : gsi_top;

/*
* For DT enabled machines ioapic_dynirq_base is irrelevant and not
* updated. So simply return @from if ioapic_dynirq_base == 0.
* For DT enabled machines ioapic_dynirq_base is irrelevant and
* always 0. gsi_top can be 0 if there is no IO/APIC registered.
* 0 is an invalid interrupt number for dynamic allocations. Return
* @from instead.
*/
return ioapic_dynirq_base ? : from;
return ret ? : from;
}

#ifdef CONFIG_X86_32

@@ -9,11 +9,7 @@

#include "local.h"

struct cluster_mask {
unsigned int clusterid;
int node;
struct cpumask mask;
};
#define apic_cluster(apicid) ((apicid) >> 4)

/*
* __x2apic_send_IPI_mask() possibly needs to read

@@ -23,8 +19,7 @@ struct cluster_mask {
static u32 *x86_cpu_to_logical_apicid __read_mostly;

static DEFINE_PER_CPU(cpumask_var_t, ipi_mask);
static DEFINE_PER_CPU_READ_MOSTLY(struct cluster_mask *, cluster_masks);
static struct cluster_mask *cluster_hotplug_mask;
static DEFINE_PER_CPU_READ_MOSTLY(struct cpumask *, cluster_masks);

static int x2apic_acpi_madt_oem_check(char *oem_id, char *oem_table_id)
{

@@ -60,10 +55,10 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)

/* Collapse cpus in a cluster so a single IPI per cluster is sent */
for_each_cpu(cpu, tmpmsk) {
struct cluster_mask *cmsk = per_cpu(cluster_masks, cpu);
struct cpumask *cmsk = per_cpu(cluster_masks, cpu);

dest = 0;
for_each_cpu_and(clustercpu, tmpmsk, &cmsk->mask)
for_each_cpu_and(clustercpu, tmpmsk, cmsk)
dest |= x86_cpu_to_logical_apicid[clustercpu];

if (!dest)

@@ -71,7 +66,7 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)

__x2apic_send_IPI_dest(dest, vector, APIC_DEST_LOGICAL);
/* Remove cluster CPUs from tmpmask */
cpumask_andnot(tmpmsk, tmpmsk, &cmsk->mask);
cpumask_andnot(tmpmsk, tmpmsk, cmsk);
}

local_irq_restore(flags);

@@ -105,55 +100,98 @@ static u32 x2apic_calc_apicid(unsigned int cpu)

static void init_x2apic_ldr(void)
{
struct cluster_mask *cmsk = this_cpu_read(cluster_masks);
u32 cluster, apicid = apic_read(APIC_LDR);
unsigned int cpu;
struct cpumask *cmsk = this_cpu_read(cluster_masks);

x86_cpu_to_logical_apicid[smp_processor_id()] = apicid;
BUG_ON(!cmsk);

if (cmsk)
goto update;

cluster = apicid >> 16;
for_each_online_cpu(cpu) {
cmsk = per_cpu(cluster_masks, cpu);
/* Matching cluster found. Link and update it. */
if (cmsk && cmsk->clusterid == cluster)
goto update;
}
cmsk = cluster_hotplug_mask;
cmsk->clusterid = cluster;
cluster_hotplug_mask = NULL;
update:
this_cpu_write(cluster_masks, cmsk);
cpumask_set_cpu(smp_processor_id(), &cmsk->mask);
cpumask_set_cpu(smp_processor_id(), cmsk);
}

static int alloc_clustermask(unsigned int cpu, int node)
/*
* As an optimisation during boot, set the cluster_mask for all present
* CPUs at once, to prevent each of them having to iterate over the others
* to find the existing cluster_mask.
*/
static void prefill_clustermask(struct cpumask *cmsk, unsigned int cpu, u32 cluster)
{
int cpu_i;

for_each_present_cpu(cpu_i) {
struct cpumask **cpu_cmsk = &per_cpu(cluster_masks, cpu_i);
u32 apicid = apic->cpu_present_to_apicid(cpu_i);

if (apicid == BAD_APICID || cpu_i == cpu || apic_cluster(apicid) != cluster)
continue;

if (WARN_ON_ONCE(*cpu_cmsk == cmsk))
continue;

BUG_ON(*cpu_cmsk);
*cpu_cmsk = cmsk;
}
}

static int alloc_clustermask(unsigned int cpu, u32 cluster, int node)
{
struct cpumask *cmsk = NULL;
unsigned int cpu_i;

/*
* At boot time, the CPU present mask is stable. The cluster mask is
* allocated for the first CPU in the cluster and propagated to all
* present siblings in the cluster. If the cluster mask is already set
* on entry to this function for a given CPU, there is nothing to do.
*/
if (per_cpu(cluster_masks, cpu))
return 0;
/*
* If a hotplug spare mask exists, check whether it's on the right
* node. If not, free it and allocate a new one.
*/
if (cluster_hotplug_mask) {
if (cluster_hotplug_mask->node == node)
return 0;
kfree(cluster_hotplug_mask);
}

cluster_hotplug_mask = kzalloc_node(sizeof(*cluster_hotplug_mask),
GFP_KERNEL, node);
if (!cluster_hotplug_mask)
if (system_state < SYSTEM_RUNNING)
goto alloc;

/*
* On post boot hotplug for a CPU which was not present at boot time,
* iterate over all possible CPUs (even those which are not present
* any more) to find any existing cluster mask.
*/
for_each_possible_cpu(cpu_i) {
u32 apicid = apic->cpu_present_to_apicid(cpu_i);

if (apicid != BAD_APICID && apic_cluster(apicid) == cluster) {
cmsk = per_cpu(cluster_masks, cpu_i);
/*
* If the cluster is already initialized, just store
* the mask and return. There's no need to propagate.
*/
if (cmsk) {
per_cpu(cluster_masks, cpu) = cmsk;
return 0;
}
}
}
/*
* No CPU in the cluster has ever been initialized, so fall through to
* the boot time code which will also populate the cluster mask for any
* other CPU in the cluster which is (now) present.
*/
alloc:
cmsk = kzalloc_node(sizeof(*cmsk), GFP_KERNEL, node);
if (!cmsk)
return -ENOMEM;
cluster_hotplug_mask->node = node;
per_cpu(cluster_masks, cpu) = cmsk;
prefill_clustermask(cmsk, cpu, cluster);

return 0;
}

static int x2apic_prepare_cpu(unsigned int cpu)
{
if (alloc_clustermask(cpu, cpu_to_node(cpu)) < 0)
u32 phys_apicid = apic->cpu_present_to_apicid(cpu);
u32 cluster = apic_cluster(phys_apicid);
u32 logical_apicid = (cluster << 16) | (1 << (phys_apicid & 0xf));

x86_cpu_to_logical_apicid[cpu] = logical_apicid;

if (alloc_clustermask(cpu, cluster, cpu_to_node(cpu)) < 0)
return -ENOMEM;
if (!zalloc_cpumask_var(&per_cpu(ipi_mask, cpu), GFP_KERNEL))
return -ENOMEM;

@@ -162,10 +200,10 @@ static int x2apic_prepare_cpu(unsigned int cpu)

static int x2apic_dead_cpu(unsigned int dead_cpu)
{
struct cluster_mask *cmsk = per_cpu(cluster_masks, dead_cpu);
struct cpumask *cmsk = per_cpu(cluster_masks, dead_cpu);

if (cmsk)
cpumask_clear_cpu(dead_cpu, &cmsk->mask);
cpumask_clear_cpu(dead_cpu, cmsk);
free_cpumask_var(per_cpu(ipi_mask, dead_cpu));
return 0;
}

@@ -115,6 +115,7 @@ static void __used common(void)
OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
OFFSET(TSS_sp2, tss_struct, x86_tss.sp2);
OFFSET(X86_top_of_stack, pcpu_hot, top_of_stack);
OFFSET(X86_current_task, pcpu_hot, current_task);
#ifdef CONFIG_CALL_DEPTH_TRACKING
OFFSET(X86_call_depth, pcpu_hot, call_depth);
#endif

@@ -61,23 +61,15 @@ SYM_CODE_START_NOALIGN(startup_64)
* tables and then reload them.
*/

/* Set up the stack for verify_cpu(), similar to initial_stack below */
leaq (__end_init_task - FRAME_SIZE)(%rip), %rsp
/* Set up the stack for verify_cpu() */
leaq (__end_init_task - PTREGS_SIZE)(%rip), %rsp

leaq _text(%rip), %rdi

/*
* initial_gs points to initial fixed_percpu_data struct with storage for
* the stack protector canary. Global pointer fixups are needed at this
* stage, so apply them as is done in fixup_pointer(), and initialize %gs
* such that the canary can be accessed at %gs:40 for subsequent C calls.
*/
/* Setup GSBASE to allow stack canary access for C code */
movl $MSR_GS_BASE, %ecx
movq initial_gs(%rip), %rax
movq $_text, %rdx
subq %rdx, %rax
addq %rdi, %rax
movq %rax, %rdx
leaq INIT_PER_CPU_VAR(fixed_percpu_data)(%rip), %rdx
movl %edx, %eax
shrq $32, %rdx
wrmsr

@@ -241,13 +233,36 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
UNWIND_HINT_EMPTY
ANNOTATE_NOENDBR // above

#ifdef CONFIG_SMP
movl smpboot_control(%rip), %ecx

/* Get the per cpu offset for the given CPU# which is in ECX */
movq __per_cpu_offset(,%rcx,8), %rdx
#else
xorl %edx, %edx /* zero-extended to clear all of RDX */
#endif /* CONFIG_SMP */

/*
* Setup a boot time stack - Any secondary CPU will have lost its stack
* by now because the cr3-switch above unmaps the real-mode stack.
*
* RDX contains the per-cpu offset
*/
movq pcpu_hot + X86_current_task(%rdx), %rax
movq TASK_threadsp(%rax), %rsp

/*
* We must switch to a new descriptor in kernel space for the GDT
* because soon the kernel won't have access anymore to the userspace
* addresses where we're currently running on. We have to do that here
* because in 32bit we couldn't load a 64bit linear address.
*/
lgdt early_gdt_descr(%rip)
subq $16, %rsp
movw $(GDT_SIZE-1), (%rsp)
leaq gdt_page(%rdx), %rax
movq %rax, 2(%rsp)
lgdt (%rsp)
addq $16, %rsp

/* set up data segments */
xorl %eax,%eax

@@ -271,16 +286,13 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
* the per cpu areas are set up.
*/
movl $MSR_GS_BASE,%ecx
movl initial_gs(%rip),%eax
movl initial_gs+4(%rip),%edx
#ifndef CONFIG_SMP
leaq INIT_PER_CPU_VAR(fixed_percpu_data)(%rip), %rdx
#endif
movl %edx, %eax
shrq $32, %rdx
wrmsr

/*
* Setup a boot time stack - Any secondary CPU will have lost its stack
* by now because the cr3-switch above unmaps the real-mode stack
*/
movq initial_stack(%rip), %rsp

/* Setup and Load IDT */
pushq %rsi
call early_setup_idt

@@ -372,7 +384,11 @@ SYM_CODE_END(secondary_startup_64)
SYM_CODE_START(start_cpu0)
ANNOTATE_NOENDBR
UNWIND_HINT_EMPTY
movq initial_stack(%rip), %rsp

/* Find the idle task stack */
movq PER_CPU_VAR(pcpu_hot) + X86_current_task, %rcx
movq TASK_threadsp(%rcx), %rsp

jmp .Ljump_to_C_code
SYM_CODE_END(start_cpu0)
#endif

@@ -416,16 +432,9 @@ SYM_CODE_END(vc_boot_ghcb)
__REFDATA
.balign 8
SYM_DATA(initial_code, .quad x86_64_start_kernel)
SYM_DATA(initial_gs, .quad INIT_PER_CPU_VAR(fixed_percpu_data))
#ifdef CONFIG_AMD_MEM_ENCRYPT
SYM_DATA(initial_vc_handler, .quad handle_vc_boot_ghcb)
#endif

/*
* The FRAME_SIZE gap is a convention which helps the in-kernel unwinder
* reliably detect the end of the stack.
*/
SYM_DATA(initial_stack, .quad init_thread_union + THREAD_SIZE - FRAME_SIZE)
__FINITDATA

__INIT

@@ -657,8 +666,7 @@ SYM_DATA_END(level1_fixmap_pgt)
.data
.align 16

SYM_DATA(early_gdt_descr, .word GDT_ENTRIES*8-1)
SYM_DATA_LOCAL(early_gdt_descr_base, .quad INIT_PER_CPU_VAR(gdt_page))
SYM_DATA(smpboot_control, .long 0)

.align 16
/* This must match the first entry in level2_kernel_pgt */

@@ -121,17 +121,20 @@ int arch_update_cpu_topology(void)
return retval;
}

static unsigned int smpboot_warm_reset_vector_count;

static inline void smpboot_setup_warm_reset_vector(unsigned long start_eip)
{
unsigned long flags;

spin_lock_irqsave(&rtc_lock, flags);
CMOS_WRITE(0xa, 0xf);
if (!smpboot_warm_reset_vector_count++) {
CMOS_WRITE(0xa, 0xf);
*((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_HIGH)) = start_eip >> 4;
*((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) = start_eip & 0xf;
}
spin_unlock_irqrestore(&rtc_lock, flags);
*((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_HIGH)) =
start_eip >> 4;
*((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) =
start_eip & 0xf;
}

static inline void smpboot_restore_warm_reset_vector(void)

@@ -143,10 +146,12 @@ static inline void smpboot_restore_warm_reset_vector(void)
* to default values.
*/
spin_lock_irqsave(&rtc_lock, flags);
CMOS_WRITE(0, 0xf);
if (!--smpboot_warm_reset_vector_count) {
CMOS_WRITE(0, 0xf);
*((volatile u32 *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) = 0;
}
spin_unlock_irqrestore(&rtc_lock, flags);

*((volatile u32 *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) = 0;
}

/*

@@ -1059,8 +1064,6 @@ int common_cpu_up(unsigned int cpu, struct task_struct *idle)
#ifdef CONFIG_X86_32
/* Stack for startup_32 can be just as for start_secondary onwards */
per_cpu(pcpu_hot.top_of_stack, cpu) = task_top_of_stack(idle);
#else
initial_gs = per_cpu_offset(cpu);
#endif
return 0;
}

@@ -1086,9 +1089,14 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
start_ip = real_mode_header->trampoline_start64;
#endif
idle->thread.sp = (unsigned long)task_pt_regs(idle);
early_gdt_descr.address = (unsigned long)get_cpu_gdt_rw(cpu);
initial_code = (unsigned long)start_secondary;
initial_stack = idle->thread.sp;

if (IS_ENABLED(CONFIG_X86_32)) {
early_gdt_descr.address = (unsigned long)get_cpu_gdt_rw(cpu);
initial_stack = idle->thread.sp;
} else {
smpboot_control = cpu;
}

/* Enable the espfix hack for this CPU */
init_espfix_ap(cpu);

@@ -49,7 +49,7 @@ SYM_CODE_START(startup_xen)
ANNOTATE_NOENDBR
cld

mov initial_stack(%rip), %rsp
leaq (__end_init_task - PTREGS_SIZE)(%rip), %rsp

/* Set up %gs.
*