Currntly, if we are not doing UFO on the packet, all UDP
packets will start with CHECKSUM_NONE and thus perform full
checksum computations in software even if device support
IPv6 checksum offloading.
Let's start start with CHECKSUM_PARTIAL if the device
supports it and we are sending only a single packet at
or below mtu size.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit adds the same functionaliy to IPv6 that
commit 903ab86d19
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue Mar 1 02:36:48 2011 +0000
udp: Add lockless transmit path
added to IPv4.
UDP transmit path can now run without a socket lock,
thus allowing multiple threads to send to a single socket
more efficiently.
This is only used when corking/MSG_MORE is not used.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we can individually construct IPv6 skbs to send, add a
udpv6_send_skb() function to populate the udp header and send the
skb. This allows udp_v6_push_pending_frames() to re-use this
function as well as enables us to add lockless sendmsg() support.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit is very similar to
commit 1c32c5ad6f
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue Mar 1 02:36:47 2011 +0000
inet: Add ip_make_skb and ip_finish_skb
It adds IPv6 version of the helpers ip6_make_skb and ip6_finish_skb.
The job of ip6_make_skb is to collect messages into an ipv6 packet
and poplulate ipv6 eader. The job of ip6_finish_skb is to transmit
the generated skb. Together they replicated the job of
ip6_push_pending_frames() while also provide the capability to be
called independently. This will be needed to add lockless UDP sendmsg
support.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the ability to append data to arbitrary queue. This
will be needed later to implement lockless UDP sends.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull IPv6 cork initialization into its own function that
can be re-used. IPv6 specific cork data did not have an
explicit data structure. This patch creats eone so that
just ipv6 cork data can be as arguemts. Also, since
IPv6 tries to save the flow label into inet_cork_full
tructure, pass the full cork.
Adjust ip6_cork_release() to take cork data structures.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o removed an unecessary INIT_LIST_HEAD after LIST_HEAD
o merge a declare & INIT_LIST_HEAD pair into one LIST_HEAD
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Below test will fail currently:
mkfs.ext4 -F /dev/sda
btrfs-convert /dev/sda
mount /dev/sda /mnt
btrfs device add -f /dev/sdb /mnt
btrfs balance start -v -dconvert=raid1 -mconvert=raid1 /mnt
The reason is there are some block groups with usage 0, but the whole
disk hasn't free space to allocate new chunk, so we even can't set such
block group readonly. This patch deletes the chunk allocation when
setting block group ro. For META, we already have reserve. But for
SYSTEM, we don't have, so the check_system_chunk is still required.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Verify that the sys_array has enough bytes to read the next item.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
There's a pointer to buffer, integer offset and offset passed as
pointer, try to find matching names for them.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Verify that possible minimum and maximum size is set, validity of
contents is checked in btrfs_read_sys_array.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
I received a few crafted images from Jiri, all got through the recently
added superblock checks. The lower bounds checks for num_devices and
sector/node -sizes were missing and caused a crash during mount.
Tools for symbolic code execution were used to prepare the images
contents.
Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
* Add support for IEEE ets & pfc api.
* Fix bug that resulted in incorrect bandwidth percentage being returned for
CEE peers
* Convert pfc enabled info from firmware format to what dcbnl expects before
returning
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ARM has 32-byte cache lines, which according to the comment in
the init registers function seems to work best with the default
value of 0x4800 that is also used on sparc and parisc.
This adds ARM to the same list, to use that default but no
longer warn about it.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hip04 ethernet driver causes a new compile-time warning
when built as a loadable module:
WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/hisilicon/hip04_eth.o
see include/linux/module.h for more information
This adds the license as "GPL", which matches the header of the file.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update netlink_mmap.txt wrt. commit 4682a03586
("netlink: Always copy on mmap TX.").
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
One deployment requirement of DCTCP is to be able to run
in a DC setting along with TCP traffic. As Glenn Judd's
NSDI'15 paper "Attaining the Promise and Avoiding the Pitfalls
of TCP in the Datacenter" [1] (tba) explains, one way to
solve this on switch side is to split DCTCP and TCP traffic
in two queues per switch port based on the DSCP: one queue
soley intended for DCTCP traffic and one for non-DCTCP traffic.
For the DCTCP queue, there's the marking threshold K as
explained in commit e3118e8359 ("net: tcp: add DCTCP congestion
control algorithm") for RED marking ECT(0) packets with CE.
For the non-DCTCP queue, there's f.e. a classic tail drop queue.
As already explained in e3118e8359, running DCTCP at scale
when not marking SYN/SYN-ACK packets with ECT(0) has severe
consequences as for non-ECT(0) packets, traversing the RED
marking DCTCP queue will result in a severe reduction of
connection probability.
This is due to the DCTCP queue being dominated by ECT(0) traffic
and switches handle non-ECT traffic in the RED marking queue
after passing K as drops, where K is usually a low watermark
in order to leave enough tailroom for bursts. Splitting DCTCP
traffic among several queues (ECN and non-ECN queue) is being
considered a terrible idea in the network community as it
splits single flows across multiple network paths.
Therefore, commit e3118e8359 implements this on Linux as
ECT(0) marked traffic, as we argue that marking all packets
of a DCTCP flow is the only viable solution and also doesn't
speak against the draft.
However, recently, a DCTCP implementation for FreeBSD hit also
their mainline kernel [2]. In order to let them play well
together with Linux' DCTCP, we would need to loosen the
requirement that ECT(0) has to be asserted during the 3WHS as
not implemented in FreeBSD. This simplifies the ECN test and
lets DCTCP work together with FreeBSD.
Joint work with Daniel Borkmann.
[1] https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/judd
[2] https://github.com/freebsd/freebsd/commit/8ad879445281027858a7fa706d13e458095b595f
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn says:
====================
net-timestamp: blinding
Changes
(v2 -> v3)
- rebase only: v2 did not make it to patchwork / netdev
(v1 -> v2)
- fix capability check in patch 2
this could be moved into net/core/sock.c as sk_capable_nouser()
(rfc -> v1)
- dropped patch 4: timestamp batching
due to complexity, as discussed
- dropped patch 5: default mode
because it does not really cover all use cases, as discussed
- added documentation
- minor fix, see patch 2
Two issues were raised during recent timestamping discussions:
1. looping full packets on the error queue exposes packet headers
2. TCP timestamping with retransmissions generates many timestamps
This RFC patchset is an attempt at addressing both without breaking
legacy behavior.
Patch 1 reintroduces the "no payload" timestamp option, which loops
timestamps onto an empty skb. This reduces the pressure on SO_RCVBUF
from looping many timestamps. It does not reduce the number of recv()
calls needed to process them. The timestamp cookie mechanism developed
in http://patchwork.ozlabs.org/patch/427213/ did, but this is
considerably simpler.
Patch 2 then gives administrators the power to block all timestamp
requests that contain data by unprivileged users. I proposed this
earlier as a backward compatible workaround in the discussion of
net-timestamp: pull headers for SOCK_STREAM
http://patchwork.ozlabs.org/patch/414810/
Patch 3 only updates the txtimestamp example to test this option.
Verified that with option '-n', length is zero in all cases and
option '-I' (PKTINFO) stops working.
====================
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Demonstrate how SOF_TIMESTAMPING_OPT_TSONLY can be used and
test the implementation.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tx timestamps are looped onto the error queue on top of an skb. This
mechanism leaks packet headers to processes unless the no-payload
options SOF_TIMESTAMPING_OPT_TSONLY is set.
Add a sysctl that optionally drops looped timestamp with data. This
only affects processes without CAP_NET_RAW.
The policy is checked when timestamps are generated in the stack.
It is possible for timestamps with data to be reported after the
sysctl is set, if these were queued internally earlier.
No vulnerability is immediately known that exploits knowledge
gleaned from packet headers, but it may still be preferable to allow
administrators to lock down this path at the cost of possible
breakage of legacy applications.
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
Changes
(v1 -> v2)
- test socket CAP_NET_RAW instead of capable(CAP_NET_RAW)
(rfc -> v1)
- document the sysctl in Documentation/sysctl/net.txt
- fix access control race: read .._OPT_TSONLY only once,
use same value for permission check and skb generation.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit
timestamps, this loops timestamps on top of empty packets.
Doing so reduces the pressure on SO_RCVBUF. Payload inspection and
cmsg reception (aside from timestamps) are no longer possible. This
works together with a follow on patch that allows administrators to
only allow tx timestamping if it does not loop payload or metadata.
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
Changes (rfc -> v1)
- add documentation
- remove unnecessary skb->len test (thanks to Richard Cochran)
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a new member to the 'struct btrfs_inode' structure to hold
the file creation time.
Signed-off-by: chandan <chandanrmail@gmail.com>
[refreshed, removed btrfs_inode_otime]
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
This patch fixes a bug where vnet_skb_shape() didn't set the already-selected
queue mapping when a packet copy was required. This results in using the
wrong queue index for stops/starts, hung tx queues and watchdog timeouts
under heavy load.
Signed-off-by: David L Stevens <david.stevens@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently qlge_update_hw_vlan_features() will always first put the
interface down, then update features and then bring it up again. But it
is possible to hit this code while the adapter is down and this causes a
non-paired call to napi_disable(), which will get stuck.
This patch fixes it by skipping these down/up actions if the interface
is already down.
Fixes: a45adbe8d3 ("qlge: Enhance nested VLAN (Q-in-Q) handling.")
Cc: Harish Patil <harish.patil@qlogic.com>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Three small fixes that came up during last week, nothing scary:
- Accidently incremented a counter instead of decrementing it (copy-paste error)
- Module parameter of max num of queues must be at least 1 and not 0
- Don't do BUG() as a result from wrong user input
* tag 'drm-amdkfd-fixes-2015-02-02' of git://people.freedesktop.org/~gabbayo/linux:
drm/amdkfd: Don't create BUG due to incorrect user parameter
drm/amdkfd: max num of queues can't be 0
drm/amdkfd: Fix bug in accounting of queues
One last round of fixes for radeon for 3.19:
- fix some fallout from the reservation object integration on the
test/benchmark options
- fix a crash in the gpu vm code if gfx init fails
- fix a pll issue that leads to a blank screen on older IGP parts
* 'drm-fixes-3.19' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: fix the crash in test functions
drm/radeon: fix the crash in benchmark functions
drm/radeon: properly set vm fragment size for TN/RL
drm/radeon: don't init gpuvm if accel is disabled (v3)
drm/radeon: fix PLLs on RS880 and older v2
Currently when a mode is rejected the reason is printed as a raw number.
Having to manually decode that to a enum drm_mode_status value is
tiresome. Have the code do the decoding instead and print the result
in a human readable format.
Just having an array of strings indexed with the mode status doesn't
work since the enum includes negative values. So we offset the status
by +3 which makes all the indexes non-negative. Also add a bit of
paranoia into the code to catch out of bounds accesses in case
someone adds more enum values but forgets to update the code.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
DRM_GEM_CMA_HELPER is depend on HAVE_DMA_ATTRS, or it will break the
building. The related error (with allmodconfig under xtensa):
CC [M] drivers/gpu/drm/drm_gem_cma_helper.o
drivers/gpu/drm/drm_gem_cma_helper.c: In function 'drm_gem_cma_create':
drivers/gpu/drm/drm_gem_cma_helper.c:110:19: error: implicit declaration of function 'dma_alloc_writecombine' [-Werror=implicit-function-declaration]
cma_obj->vaddr = dma_alloc_writecombine(drm->dev, size,
^
drivers/gpu/drm/drm_gem_cma_helper.c:110:17: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
cma_obj->vaddr = dma_alloc_writecombine(drm->dev, size,
^
drivers/gpu/drm/drm_gem_cma_helper.c: In function 'drm_gem_cma_free_object':
drivers/gpu/drm/drm_gem_cma_helper.c:193:3: error: implicit declaration of function 'dma_free_writecombine' [-Werror=implicit-function-declaration]
dma_free_writecombine(gem_obj->dev->dev, cma_obj->base.size,
^
drivers/gpu/drm/drm_gem_cma_helper.c: In function 'drm_gem_cma_mmap_obj':
drivers/gpu/drm/drm_gem_cma_helper.c:330:8: error: implicit declaration of function 'dma_mmap_writecombine' [-Werror=implicit-function-declaration]
ret = dma_mmap_writecombine(cma_obj->base.dev->dev, vma,
^
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In __cpufreq_remove_dev_finish(), per-cpu 'cpufreq_cpu_data' needs
to be cleared before calling kobject_put(&policy->kobj) and under
cpufreq_driver_lock. Otherwise, if someone else calls cpufreq_cpu_get()
in parallel with it, they can obtain a non-NULL policy from that after
kobject_put(&policy->kobj) was executed.
Consider this case:
Thread A Thread B
cpufreq_cpu_get()
acquire cpufreq_driver_lock
read-per-cpu cpufreq_cpu_data
kobject_put(&policy->kobj);
kobject_get(&policy->kobj);
...
per_cpu(&cpufreq_cpu_data, cpu) = NULL
And this will result in a warning like this one:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47
kobject_get+0x41/0x50()
Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl
lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
...
Call Trace:
[<ffffffff81661b14>] dump_stack+0x46/0x58
[<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
[<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
[<ffffffff812e16d1>] kobject_get+0x41/0x50
[<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
[<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
[<ffffffff810b8cb2>] ? up+0x32/0x50
[<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
[<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
[<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
[<ffffffff81360967>] ? acpi_has_method+0x25/0x40
[<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
[<ffffffff81089566>] ? move_linked_works+0x66/0x90
[<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
[<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
[<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
[<ffffffff8108c910>] process_one_work+0x160/0x410
[<ffffffff8108d05b>] worker_thread+0x11b/0x520
[<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
[<ffffffff81092421>] kthread+0xe1/0x100
[<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
[<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
[<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
---[ end trace 89e66eb9795efdf7 ]---
The actual code flow is as follows:
Thread A: Workqueue: kacpi_notify
acpi_processor_notify()
acpi_processor_ppc_has_changed()
cpufreq_update_policy()
cpufreq_cpu_get()
kobject_get()
Thread B: xenbus_thread()
xenbus_thread()
msg->u.watch.handle->callback()
handle_vcpu_hotplug_event()
vcpu_hotplug()
cpu_down()
__cpu_notify(CPU_POST_DEAD..)
cpufreq_cpu_callback()
__cpufreq_remove_dev_finish()
cpufreq_policy_put_kobj()
kobject_put()
cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data
under cpufreq_driver_lock, and once it gets a valid policy it expects it
to not be freed until cpufreq_cpu_put() is called.
But the race happens when another thread puts the kobject first and updates
cpufreq_cpu_data before or later. And so the first thread gets a valid policy
structure and before it does kobject_get() on it, the second one has already
done kobject_put().
Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that
too under locks.
Reported-by: Ethan Zhao <ethan.zhao@oracle.com>
Reported-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since acpi_idle_enter_bm() is only used if flags.bm_check is set for
the given acpi_processor object, it doesn't make sense to check that
flag in there.
For this reason, drop flags.bm_check tests (and some code depending
on them) from acpi_idle_enter_bm().
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
White space in the switch statement in acpi_processor_setup_cpuidle_states()
does not adhere to the kernel coding style and that makes the code
difficult to read. Clean that up.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The comment about bus master disable in acpi_idle_enter_simple() is
irrelevant, because the function doesn't disable bus mastering, so
drop it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
acpi_idle_enter_simple() and acpi_idle_enter_bm() both check
if C2/C3 type entry is supported on MP in the same way, so move
those checks to a separate function and call it from both
places (and it doesn't need to check if the state type is not
C1, because the functions in question won't be called otherwise).
While at it, use IS_ENABLED() for the CONFIG_HOTPLUG_CPU check.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
acpi_idle_enter_simple() and acpi_idle_enter_bm() don't need to
call sched_clock_idle_sleep/wakeup_event(), because that's taken
care of by the core already. Namely, sched_clock_idle_sleep_event()
is called by tick_nohz_start_idle() called by __tick_nohz_idle_enter()
which in turn is called by tick_nohz_idle_enter() and that is called
by cpu_idle_loop(). Analogously for sched_clock_idle_wakeup_event().
For this reason, drop those calls from the ACPI cpuidle driver's
->enter callback routines.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since the cpuidle core calls stop_critical_timings() and
start_critical_timings() around the execution of ->enter callbacks,
acpi_idle_do_entry() doesn't need to do that (and really shouldn't).
Also using "inline" on it is pointless and the kerneldoc entry does
not reflect the actual situation any more.
Fix all of the above.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull devfreq changes for v3.20 from MyungJoo Ham.
* tag 'pull_req_20150129' of git://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq:
PM / devfreq: event: Add documentation for exynos-ppmu devfreq-event driver
devfreq: Fix build break of devfreq-event class
PM / devfreq: event: Add devfreq_event class
PM / devfreq: tegra: add devfreq driver for Tegra Activity Monitor
Instead of overriding error codes, pass them on unmodified. This
way a EPROBE_DEFER is correctly passed to the driver core. This results
in the LED driver correctly requesting probe deferral in cases the GPIO
controller is not yet available.
Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Reported-and-tested-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Bryan Wu <cooloney@gmail.com>
Adds a way for clock consumers to set maximum and minimum rates. This
can be used for thermal drivers to set minimum rates, or by misc.
drivers to set maximum rates to assure a minimum performance level.
Changes the signature of the determine_rate callback by adding the
parameters min_rate and max_rate.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
[sboyd@codeaurora.org: set req_rate in __clk_init]
Signed-off-by: Michael Turquette <mturquette@linaro.org>
[mturquette@linaro.org: min/max rate for sun6i_ahb1_clk_determine_rate
migrated clk-private.h changes to clk.c]