The hwcontrol function enforced a step by step state machine
for any kind of hardware chip access. Let the hardware driver
know which control bits are set and inform it about a change
of the control lines. Let the hardware driver write out the
command and address bytes directly. This gives a peformance
advantage for address bus controlled chips and simplifies the
quirks in the hardware drivers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
These flags are needed by userspace - move them outside __KERNEL__
(Pointed out by dwmw2)
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
MTD clients are agnostic of FLASH which needs ECC suppport.
Remove the functions and fixup the callers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
First step of modularizing ECC support.
- Move ECC related functionality into a seperate embedded data structure
- Get rid of the hardware dependend constants to simplify new ECC models
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The NAND driver used a mix of unsigned char, u_char amd uint8_t
data types. Consolidate to uint8_t usage
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Replace the chip lock by a the controller lock. For simple drivers a
dummy controller structure is created by the scan code.
This simplifies the locking algorithm in nand_get/release_chip().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
At least two flashes exists that have the concept of a minimum write unit,
similar to NAND pages, but no other NAND characteristics. Therefore, rename
the minimum write unit to "writesize" for all flashes, including NAND.
Signed-off-by: Joern Engel <joern@wh.fh-wedel.de>
This patch enables agpgart on a Via "PT880 Ultra" based motherboard
(Asus P4V800D-X). The PCI ID of the PT880 Ultra is 0x0308 instead of
0x0258 of the PT880.
The patched via-agp passes testgart.
Signed-off-by: Magnus Kessler <Magnus.Kessler@gmx.net>
Signed-off-by: Dave Jones <davej@redhat.com>
Andy added code to buddy allocator which does not require the zone's
endpoints to be aligned to MAX_ORDER. An issue is that the buddy allocator
requires the node_mem_map's endpoints to be MAX_ORDER aligned. Otherwise
__page_find_buddy could compute a buddy not in node_mem_map for partial
MAX_ORDER regions at zone's endpoints. page_is_buddy will detect that
these pages at endpoints are not PG_buddy (they were zeroed out by bootmem
allocator and not part of zone). Of course the negative here is we could
waste a little memory but the positive is eliminating all the old checks
for zone boundary conditions.
SPARSEMEM won't encounter this issue because of MAX_ORDER size constraint
when SPARSEMEM is configured. ia64 VIRTUAL_MEM_MAP doesn't need the logic
either because the holes and endpoints are handled differently. This
leaves checking alloc_remap and other arches which privately allocate for
node_mem_map.
Signed-off-by: Bob Picco <bob.picco@hp.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- remove the following global function that is both unused and
unimplemented:
- register_firmware()
- make the following needlessly global function static:
- firmware_class_uevent()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This driver supports the SPI controller on the MPC83xx SoC devices from
Freescale. Note, this driver supports only the simple shift register SPI
controller and not the descriptor based CPM or QUICCEngine SPI controller.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix the HSM error_mask mapping.
Changes:
- Better mapping in ac_err_mask()
- In HSM_ST_FIRST ans HSM_ST state, check ATA_ERR|ATA_DF and map it to AC_ERR_DEV instead of AC_ERR_HSM.
- In HSM_ST_FIRST and HSM_ST state, map DRQ=1 ERR=1 to AC_ERR_HSM.
- For PIO data in and DRQ=1 ERR=1, add check after the junk data block is read.
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Patch from Pavel Pisa
There has been problems that for some paths that clock are not stopped
during new command programming and initiation. Result is issuing
of incorrect command to the card. Some other problems are cleaned too.
Noisy report of known ERRATUM #4 has been suppressed.
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Device node major/minor numbers are just stored in the payload of a single
data node. Just extend that to 4 bytes and use new_encode_dev() for it.
We only use the 4-byte format if we _need_ to, if !old_valid_dev(foo).
This preserves backwards compatibility with older code as much as
possible. If we do make devices with major or minor numbers above 255, and
then mount the file system with the old code, it'll just read the first
two bytes and get the numbers wrong. If it comes to garbage-collect it,
it'll then write back those wrong numbers. But that's about the best we
can expect.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
We have to pack at least the jint16_t structure, because otherwise it'll
be four bytes in size. Thankfully, we can do that and _not_ pack the
actual node structures, and the compiler still doesn't emit stupid code.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
We need to be able to have a "SPI bus 0" matching chip numbering; but
that number was wrongly used to flag dynamic allocation of a bus number.
This patch resolves that issue; now negative numbers trigger dynamic alloc.
It also updates the how-to-write-a-controller-driver overview to mention
this stuff.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add spi_device hook for LSB-first word encoding, and update all the
(in-tree) controller drivers to reject such devices. Eventually,
some controller drivers will be updated to support lsb-first encodings
on the wire; no current drivers need this.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Renamed bitbang_transfer_setup to follow convention of other exported symbols
from spi-bitbang. Exported spi_bitbang_setup_transfer to allow users of
spi-bitbang to use the function in their own setup_transfer.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This removes superfluous whitespace in the <linux/spi/spi.h> header.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Some protocols (like one for some bitmap displays) require different clock
speed or word size settings for each transfer in an SPI message. This adds
those parameters to struct spi_transfer. They are to be used when they are
nonzero; otherwise the defaults from spi_device are to be used.
The patch also adds a setup_transfer callback to spi_bitbang, uses it for
messages that use those overrides, and implements it so that the pure
bitbanging code can help resolve any questions about how it should work.
Signed-off-by: Imre Deak <imre.deak@nokia.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
can_share_swap_page() is used to check if the page has the last reference.
This avoids allocating a new page for COW if it's the last page.
However, if CONFIG_SWAP is not set, can_share_swap_page() is defined as 0,
thus always causes a copy for the last COW page. The below simple patch
fixes it.
Signed-off-by: Hua Zhong <hzhong@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
slab_is_available() indicates slab based allocators are available for use.
SPARSEMEM code needs to know this as it can be called at various times
during the boot process.
Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Even since a previous patch:
Fix race between CONFIG_DEBUG_SLABALLOC and modules
Sun, 27 Jun 2004 17:55:19 +0000 (17:55 +0000)
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commit;h=92b3db26d31cf21b70e3c1eadc56c179506d8fbe
The function symbol_put_addr() will deadlock the kernel.
symbol_put_addr() would acquire modlist_lock, then while holding the lock call
two functions kernel_text_address() and module_text_address() which also try
to acquire the same lock. This deadlocks the kernel of course.
This patch changes symbol_put_addr() to not acquire the modlist_lock, it
doesn't need it since it never looks at the module list directly. Also, it
now uses core_kernel_text() instead of kernel_text_address(). The latter has
an additional check for addr inside a module, but we don't need to do that
since we call module_text_address() (the same function kernel_text_address
uses) ourselves.
Signed-off-by: Trent Piepho <xyzzy@speakeasy.org>
Cc: Zwane Mwaikambo <zwane@fsmlabs.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Johannes Stezenbach <js@linuxtv.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With "Paul E. McKenney" <paulmck@us.ibm.com>
Introduce rcu_needs_cpu() interface. This can be used to tell if there
will be a new rcu batch on a cpu soon by looking at the curlist pointer.
This can be used to avoid to enter a tickless idle state where the cpu
would miss that a new batch is ready when rcu_start_batch would be called
on a different cpu.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that all NCQ related stuff are in place, implement NCQ device
configuration and bump ATA_MAX_QUEUE to 32 thus activating NCQ
support.
Original implementation is from Jens Axboe.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Add ap->qc_active and ap->sactive, mask of all active qcs and libata's
view of the SActive register, respectively. Also, implement
ata_qc_complete_multiple() which takes new qc_active mask and complete
multiple qcs according to the mask.
These will be used to track NCQ commands and complete them. The
distinction between ap->qc_active and ap->sactive is also useful for
later PM implementation.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Rename ap->qactive to ap->qc_allocated. This is to accomodate
addition of ap->qc_active, mask of active qcs.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Implement new EH. The exported interface is ata_do_eh() which is to
be called from ->error_handler and performs the following steps to
recover the failed port.
ata_eh_autopsy() : analyze SError/TF, determine the cause of failure
and required recovery actions and record it in
ap->eh_context
ata_eh_report() : report the failure to user
ata_eh_recover() : perform recovery actions described in ap->eh_context
ata_eh_finish() : finish failed qcs
LLDDs can customize error handling by modifying eh_context before
calling ata_do_eh() or, if necessary, doing so inbetween each major
steps by calling each step explicitly.
Signed-off-by: Tejun Heo <htejun@gmail.com>
struct ata_eh_info serves as the communication channel between
execution path and EH. Execution path describes detected error
condition in ap->eh_info and EH recovers the port using it. To avoid
missing error conditions detected during EH, EH makes its own copy of
eh_info and clears it on entry allowing error info to accumulate
during EH.
Most EH states including EH's copy of eh_info are stored in
ap->eh_context (struct ata_eh_context) which is owned by EH and thus
doesn't require any synchronization to access and alter. This
standardized context makes it easy to integrate various parts of EH
and extend EH to handle multiple links (for PM).
Signed-off-by: Tejun Heo <htejun@gmail.com>
This patch implements ata_ering and uses it to define dev->ering.
ata_ering is a ring buffer which records libata errors - whether a
command was for normar IO request, err_mask and timestamp. Errors are
recorded per-device in dev->ering. This will be used by EH to
determine recovery actions.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Update ata_scsi_error() for new EH. ata_scsi_error() is responsible
for claiming timed out qcs and invoking ->error_handler in safe and
synchronized manner. As the state of the controller is unknown if a
qc has timed out, the port is frozen in such cases.
Note that ata_scsi_timed_out() isn't used for new EH. This is because
a timed out qc cannot be claimed by EH without freezing the port and
freezing the port in ata_scsi_timed_out() results in unnecessary
abortion of other active qcs. ata_scsi_timed_out() can be removed
once all drivers are converted to new EH.
While at it, add 'TODO: kill' comments to old EH functions.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Freezing is performed atomic w.r.t. host_set->lock and once frozen
LLDD is not allowed to access the port or any qc on it. Also, libata
makes sure that no new qc gets issued to a frozen port.
A frozen port is thawed after a reset operation completes
successfully, so reset methods must do its job while the port is
frozen. During initialization all ports get frozen before requesting
IRQ, so reset methods are always invoked on a frozen port.
Optional ->freeze and ->thaw operations notify LLDD that the port is
being frozen and thawed, respectively. LLDD can disable/enable
hardware interrupt in these callbacks if the controller's IRQ mask can
be changed dynamically. If the controller doesn't allow such
operation, LLDD can check for frozen state in the interrupt handler
and ack/clear interrupts unconditionally while frozen.
Signed-off-by: Tejun Heo <htejun@gmail.com>
ata_port_schedule_eh() directly schedules EH for @ap without
associated qc. Once EH scheduled, no further qc is allowed and EH
kicks in as soon as all currently active qc's are drained.
ata_port_abort() schedules all currently active commands for EH by
qc_completing them with ATA_QCFLAG_FAILED set. If ata_port_abort()
doesn't find any qc to abort, it directly schedule EH using
ata_port_schedule_eh().
These two functions provide ways to invoke EH for conditions which
aren't directly related to any specfic qc.
Signed-off-by: Tejun Heo <htejun@gmail.com>
There are several ways a qc can get schedule for EH in new EH. This
patch implements one of them - completing a qc with ATA_QCFLAG_FAILED
set or with non-zero qc->err_mask. ALL such qc's are examined by EH.
New EH schedules a qc for EH from completion iff ->error_handler is
implemented, qc is marked as failed or qc->err_mask is non-zero and
the command is not an internal command (internal cmd is handled via
->post_internal_cmd). The EH scheduling itself is performed by asking
SCSI midlayer to schedule EH for the specified scmd.
For drivers implementing old-EH, nothing changes. As this change
makes ata_qc_complete() rather large, it's not inlined anymore and
__ata_qc_complete() is exported to other parts of libata for later
use.
Signed-off-by: Tejun Heo <htejun@gmail.com>
New EH framework has clear distinction about who owns a qc. Every qc
starts owned by normal execution path - PIO, interrupt or whatever.
When an exception condition occurs which affects the qc, the qc gets
scheduled for EH. Note that some events (say, link lost and regained,
command timeout) may schedule qc's which are not directly related but
could have been affected for EH too. Scheduling for EH is atomic
w.r.t. ap->host_set->lock and once schedule for EH, normal execution
path is not allowed to access the qc in whatever way. (PIO
synchronization acts a bit different and will be dealt with later)
This patch make ata_qc_from_tag() check whether a qc is active and
owned by normal path before returning it. If conditions don't match,
NULL is returned and thus access to the qc is denied.
__ata_qc_from_tag() is the original ata_qc_from_tag() and is used by
libata core/EH layers to access inactive/failed qc's.
This change is applied only if the associated LLDD implements new EH
as indicated by non-NULL ->error_handler
Signed-off-by: Tejun Heo <htejun@gmail.com>
New EH may issue internal commands to recover from error while failed
qc's are still hanging around. To allow such usage, reserve tag
ATA_MAX_QUEUE-1 for internal command. This also makes it easy to tell
whether a qc is for internal command or not. ata_tag_internal() test
implements this test.
To avoid breaking existing drivers, ata_exec_internal() uses
ATA_TAG_INTERNAL only for drivers which implement ->error_handler.
For drivers using old EH, tag 0 is used. Note that this makes
ata_tag_internal() test valid only when ->error_handler is
implemented. This is okay as drivers on old EH should not and does
not have any reason to use ata_tag_internal().
Signed-off-by: Tejun Heo <htejun@gmail.com>
Add ATA_FLAG_EH_{PENDING|FROZEN}, ATA_ATA_QCFLAG_{FAILED|SENSE_VALID}
and ops->freeze, thaw, error_handler, post_internal_cmd() for new EH.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Implement ata_{port|dev}_printk() which prefixes the message with
proper identification string. This change is necessary for later PM
support because devices and links should be identified differently
depending on how they are attached.
This also helps unifying device id strings. Currently, there are two
forms in use (P is the port number D device number) - 'ataP(D):', and
'ataP: dev D '. These macros also make it harder to forget proper ID
string (e.g. printing only port number when a device is in question).
Debug message handling can be integrated into these printk macros by
passing debug type and level via @lv.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Add dev->ap which points back to the port the device belongs to. This
makes it unnecessary to pass @ap for silly reasons (e.g. printks).
Also, this change is necessary to accomodate later PM support which
will introduce ATA link inbetween port and device.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Implement ata_scr_{valid|read|write|write_flush}() and
ata_port_{online|offline}(). These functions replace
scr_{read|write}() and sata_dev_present().
Major difference between between the new SCR functions and the old
ones is that the new ones have a way to signal error to the caller.
This makes handling SCR-available and SCR-unavailable cases in the
same path easier. Also, it eases later PM implementation where SCR
access can fail due to various reasons.
ata_port_{online|offline}() functions return 1 only when they are
affirmitive of the condition. e.g. if SCR is unaccessible or
presence cannot be determined for other reasons, these functions
return 0. So, ata_port_online() != !ata_port_offline(). This
distinction is useful in many exception handling cases.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Add qc->result_tf and ATA_QCFLAG_RESULT_TF. This moves the
responsibility of loading result TF from post-compltion path to qc
execution path. qc->result_tf is loaded if explicitly requested or
the qc failsa. This allows more efficient completion implementation
and correct handling of result TF for controllers which don't have
global TF representation such as sil3124/32.
Signed-off-by: Tejun Heo <htejun@gmail.com>
It's not a very good idea to allocate memory during EH. Use
statically allocated buffer for dev->id[] and add 512byte buffer
ap->sector_buf. This buffer is owned by EH (or probing) and to be
used as temporary buffer for various purposes (IDENTIFY, NCQ log page
10h, PM GSCR block).
Signed-off-by: Tejun Heo <htejun@gmail.com>
Rename ata_down_sata_spd_limit() and friends to sata_down_spd_limit()
and likewise for simplicity & consistency.
Signed-off-by: Tejun Heo <htejun@gmail.com>
If we use __attribute__((packed)), GCC will _also_ assume that the
structures aren't sensibly aligned, and it'll emit code to cope with
that instead of straight word load/save. This can be _very_ suboptimal
on architectures like ARM.
Ideally, we want an attribute which just tells GCC not to do any
padding, without the alignment side-effects. In the absense of that,
we'll just drop the 'packed' attribute and hope that everything stays as
it was (which to be fair is fairly much what we expect). And add some
paranoia checks in the initialisation code, which should be optimised
away completely in the normal case.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This attached patches provide xattr support including POSIX-ACL and
SELinux support on JFFS2 (version.5).
There are some significant differences from previous version posted
at last December.
The biggest change is addition of EBS(Erase Block Summary) support.
Currently, both kernel and usermode utility (sumtool) can recognize
xattr nodes which have JFFS2_NODETYPE_XATTR/_XREF nodetype.
In addition, some bugs are fixed.
- A potential race condition was fixed.
- Unexpected fail when updating a xattr by same name/value pair was fixed.
- A bug when removing xattr name/value pair was fixed.
The fundamental structures (such as using two new nodetypes and exclusion
mechanism by rwsem) are unchanged. But most of implementation were reviewed
and updated if necessary.
Espacially, we had to change several internal implementations related to
load_xattr_datum() to avoid a potential race condition.
[1/2] xattr_on_jffs2.kernel.version-5.patch
[2/2] xattr_on_jffs2.utils.version-5.patch
Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
V4L1 API is depreciated and should be removed soon from kernel. This patch
adds two new options, one to disable V4L1 drivers, and another to disable
V4L1 compat module. This way, it would be easy to check what still depends
on V4L1 stuff, allowing also to test if app works fine with V4L2 only support.
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
One Block of the NAND Flash Array memory is reserved as
a One-Time Programmable Block memory area.
Also, 1st Block of NAND Flash Array can be used as OTP.
The OTP block can be read, programmed and locked using the same
operations as any other NAND Flash Array memory block.
OTP block cannot be erased.
OTP block is fully-guaranteed to be a valid block.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
* master.kernel.org:/home/rmk/linux-2.6-serial:
[SERIAL] 8250: add locking to console write function
[SERIAL] Remove unconditional enable of TX irq for console
[SERIAL] 8250: set divisor register correctly for AMD Alchemy SoC uart
[SERIAL] AMD Alchemy UART: claim memory range
[SERIAL] Clean up serial locking when obtaining a reference to a port
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[NET_SCHED]: HFSC: fix thinko in hfsc_adjust_levels()
[IPV6]: skb leakage in inet6_csk_xmit
[BRIDGE]: Do sysfs registration inside rtnl.
[NET]: Do sysfs registration as part of register_netdevice.
[TG3]: Fix possible NULL deref in tg3_run_loopback().
[NET] linkwatch: Handle jiffies wrap-around
[IRDA]: Switching to a workqueue for the SIR work
[IRDA]: smsc-ircc: Minimal hotplug support.
[IRDA]: Removing unused EXPORT_SYMBOLs
[IRDA]: New maintainer.
[NET]: Make netdev_chain a raw notifier.
[IPV4]: ip_options_fragment() has no effect on fragmentation
[NET]: Add missing operstates documentation.
The last step of netdevice registration was being done by a delayed
call, but because it was delayed, it was impossible to return any error
code if the class_device registration failed.
Side effects:
* one state in registration process is unnecessary.
* register_netdevice can sleep inside class_device registration/hotplug
* code in netdev_run_todo only does unregistration so it is simpler.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This closes a couple holes in our attribute aliasing avoidance scheme:
- The current kernel fails mmaps of some /dev/mem MMIO regions because
they don't appear in the EFI memory map. This keeps X from working
on the Intel Tiger box.
- The current kernel allows UC mmap of the 0-1MB region of
/sys/.../legacy_mem even when the chipset doesn't support UC
access. This causes an MCA when starting X on HP rx7620 and rx8620
boxes in the default configuration.
There's more detail in the Documentation/ia64/aliasing.txt file this
adds, but the general idea is that if a region might be covered by
a granule-sized kernel identity mapping, any access via /dev/mem or
mmap must use the same attribute as the identity mapping.
Otherwise, we fall back to using an attribute that is supported
according to the EFI memory map, or to using UC if the EFI memory
map doesn't mention the region.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is a backout of earlier patch.
The whole rescheduling hack was a bad idea. It doesn't really solve
the problem and it makes the code more complicated for no good reason.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
After dwmw2 let me know it ought to be done, I rewrote the physmap map
driver to be a platform driver. I know zilch about the driver model,
so I probably botched it in some way, but I've done some tests on an
ixp23xx board which uses physmap, and it all seems to work.
In order to not break existing physmap users, I've added some compat
code that will instantiate a platform device iff CONFIG_MTD_PHYSMAP_LEN
is defined and != 0. Also, I've changed the default value for
CONFIG_MTD_PHYSMAP_LEN to zero, so that people who inadvertently
compile in physmap (or new, platform-style, users of physmap) don't get
burned.
This works pretty well -- the new physmap driver is a drop-in replacement
for the old one, and works on said ixp23xx board without any code changes
needed. (This should hold as long as users don't touch 'physmap_map'
directly.)
Once all physmap users have been converted to instantiate their own
platform devices, the compat code can go. (Or we decide that we can
change all the in-tree users at the same time, and never merge the
compat code.)
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Atomically create attributes when class device is added. This avoids
the race between registering class_device (which generates hotplug
event), and the creation of attribute groups.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the support of attribute groups in class_device's to allow
groups to be created as part of the registration process. This allows
network device's to avoid race between registration and creating
groups.
Note that unlike attributes that are a property of the class object,
the groups are a property of the class_device object. This is done
because there are different types of network devices (wireless for
example).
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[PATCH] powerpc: Use the ibm,pa-features property if available
powerpc: Fix incorrect might_sleep in __get_user/__put_user on kernel addresses
[PATCH] ppc32 CPM_UART: fixes and improvements
[PATCH] ppc32 CPM_UART: Fixed break send on SCC
[PATCH] powerpc/kprobes: fix singlestep out-of-line
[PATCH] powerpc/pseries: avoid crash in PCI code if mem system not up
Signed-off-by: Jing Min Zhao <zhaojingmin@users.sourceforge.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nick says that the current construct isn't safe. This goes back to the
original, but sets PIPE_BUF_FLAG_LRU on user pages as well as they all
seem to be on the LRU in the first place.
Signed-off-by: Jens Axboe <axboe@suse.de>
A number of small issues are fixed, and added the header file, missed from the
original series. With this, driver should be pretty stable as tested among
both platform-device-driven and "old way" boards. Also added missing GPL
statement , and updated year field on existing ones to reflect
code update.
Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The CSD contains a "read2write factor" which determines the multiplier to
be applied to the read timeout to obtain the write timeout. We were
ignoring this parameter, resulting in the possibility for writes being
timed out too early.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Currently we rely on the PIPE_BUF_FLAG_LRU flag being set correctly
to know whether we need to fiddle with page LRU state after stealing it,
however for some origins we just don't know if the page is on the LRU
list or not.
So remove PIPE_BUF_FLAG_LRU and do this check/add manually in pipe_to_file()
instead.
Signed-off-by: Jens Axboe <axboe@suse.de>
* 'audit.b10' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
[PATCH] Audit Filter Performance
[PATCH] Rework of IPC auditing
[PATCH] More user space subject labels
[PATCH] Reworked patch for labels on user space messages
[PATCH] change lspp ipc auditing
[PATCH] audit inode patch
[PATCH] support for context based audit filtering, part 2
[PATCH] support for context based audit filtering
[PATCH] no need to wank with task_lock() and pinning task down in audit_syscall_exit()
[PATCH] drop task argument of audit_syscall_{entry,exit}
[PATCH] drop gfp_mask in audit_log_exit()
[PATCH] move call of audit_free() into do_exit()
[PATCH] sockaddr patch
[PATCH] deal with deadlocks in audit_free()
When iptables userspace adds an ipt_standard_target, it calculates the size
of the entire entry as:
sizeof(struct ipt_entry) + XT_ALIGN(sizeof(struct ipt_standard_target))
ipt_standard_target looks like this:
struct xt_standard_target
{
struct xt_entry_target target;
int verdict;
};
xt_entry_target contains a pointer, so when compiled for 64 bit the
structure gets an extra 4 byte of padding at the end. On 32 bit
architectures where iptables aligns to 8 byte it will also have 4
byte padding at the end because it is only 36 bytes large.
The compat_ipt_standard_fn in the kernel adjusts the offsets by
sizeof(struct ipt_standard_target) - sizeof(struct compat_ipt_standard_target),
which will always result in 4, even if the structure from userspace
was already padded to a multiple of 8. On x86 this works out by
accident because userspace only aligns to 4, on all other
architectures this is broken and causes incorrect adjustments to
the size and following offsets.
Thanks to Linus for lots of debugging help and testing.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] vmsplice: allow user to pass in gift pages
[PATCH] pipe: enable atomic copying of pipe data to/from user space
[PATCH] splice: call handle_ra_miss() on failure to lookup page
[PATCH] Add ->splice_read/splice_write to def_blk_fops
[PATCH] pipe: introduce ->pin() buffer operation
[PATCH] splice: fix bugs in pipe_to_file()
[PATCH] splice: fix bugs with stealing regular pipe pages
If SPLICE_F_GIFT is set, the user is basically giving this pages away to
the kernel. That means we can steal them for eg page cache uses instead
of copying it.
The data must be properly page aligned and also a multiple of the page size
in length.
Signed-off-by: Jens Axboe <axboe@suse.de>
The pipe ->map() method uses kmap() to virtually map the pages, which
is both slow and has known scalability issues on SMP. This patch enables
atomic copying of pipe pages, by pre-faulting data and using kmap_atomic()
instead.
lmbench bw_pipe and lat_pipe measurements agree this is a Good Thing. Here
are results from that on a UP machine with highmem (1.5GiB of RAM), running
first a UP kernel, SMP kernel, and SMP kernel patched.
Vanilla-UP:
Pipe bandwidth: 1622.28 MB/sec
Pipe bandwidth: 1610.59 MB/sec
Pipe bandwidth: 1608.30 MB/sec
Pipe latency: 7.3275 microseconds
Pipe latency: 7.2995 microseconds
Pipe latency: 7.3097 microseconds
Vanilla-SMP:
Pipe bandwidth: 1382.19 MB/sec
Pipe bandwidth: 1317.27 MB/sec
Pipe bandwidth: 1355.61 MB/sec
Pipe latency: 9.6402 microseconds
Pipe latency: 9.6696 microseconds
Pipe latency: 9.6153 microseconds
Patched-SMP:
Pipe bandwidth: 1578.70 MB/sec
Pipe bandwidth: 1579.95 MB/sec
Pipe bandwidth: 1578.63 MB/sec
Pipe latency: 9.1654 microseconds
Pipe latency: 9.2266 microseconds
Pipe latency: 9.1527 microseconds
Signed-off-by: Jens Axboe <axboe@suse.de>
The ->map() function is really expensive on highmem machines right now,
since it has to use the slower kmap() instead of kmap_atomic(). Splice
rarely needs to access the virtual address of a page, so it's a waste
of time doing it.
Introduce ->pin() to take over the responsibility of making sure the
page data is valid. ->map() is then reduced to just kmap(). That way we
can also share a most of the pipe buffer ops between pipe.c and splice.c
Signed-off-by: Jens Axboe <axboe@suse.de>
Found by Oleg Nesterov <oleg@tv-sign.ru>, fixed by me.
- Only allow full pages to go to the page cache.
- Check page != buf->page instead of using PIPE_BUF_FLAG_STOLEN.
- Remember to clear 'stolen' if add_to_page_cache() fails.
And as a cleanup on that:
- Make the bottom fall-through logic a little less convoluted. Also make
the steal path hold an extra reference to the page, so we don't have
to differentiate between stolen and non-stolen at the end.
Signed-off-by: Jens Axboe <axboe@suse.de>
Reverting c7afb48eb5147be9eb9789b4161462d246451ac2 since a better (but
more intrusive) fix is now merged upstream.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[TG3]: Update version and reldate
[TG3]: Fix bug in nvram write
[TG3]: Add reset_phy parameter to chip reset functions
[TG3]: Reset chip when changing MAC address
[TG3]: Add phy workaround
[TG3]: Call netif_carrier_off() during phy reset
[IPV6]: Fix race in route selection.
[XFRM]: fix incorrect xfrm_policy_afinfo_lock use
[XFRM]: fix incorrect xfrm_state_afinfo_lock use
[TCP]: Fix unlikely usage in tcp_transmit_skb()
[XFRM]: fix softirq-unsafe xfrm typemap->lock use
[IPSEC]: Fix IP ID selection
[NET]: use hlist_unhashed()
[IPV4]: inet_init() -> fs_initcall
[NETLINK]: cleanup unused macro in net/netlink/af_netlink.c
[PKT_SCHED] netem: fix loss
[X25]: fix for spinlock recurse and spinlock lockup with timer handler
1) The audit_ipc_perms() function has been split into two different
functions:
- audit_ipc_obj()
- audit_ipc_set_perm()
There's a key shift here... The audit_ipc_obj() collects the uid, gid,
mode, and SElinux context label of the current ipc object. This
audit_ipc_obj() hook is now found in several places. Most notably, it
is hooked in ipcperms(), which is called in various places around the
ipc code permforming a MAC check. Additionally there are several places
where *checkid() is used to validate that an operation is being
performed on a valid object while not necessarily having a nearby
ipcperms() call. In these locations, audit_ipc_obj() is called to
ensure that the information is captured by the audit system.
The audit_set_new_perm() function is called any time the permissions on
the ipc object changes. In this case, the NEW permissions are recorded
(and note that an audit_ipc_obj() call exists just a few lines before
each instance).
2) Support for an AUDIT_IPC_SET_PERM audit message type. This allows
for separate auxiliary audit records for normal operations on an IPC
object and permissions changes. Note that the same struct
audit_aux_data_ipcctl is used and populated, however there are separate
audit_log_format statements based on the type of the message. Finally,
the AUDIT_IPC block of code in audit_free_aux() was extended to handle
aux messages of this new type. No more mem leaks I hope ;-)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Hi,
The patch below builds upon the patch sent earlier and adds subject label to
all audit events generated via the netlink interface. It also cleans up a few
other minor things.
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The below patch should be applied after the inode and ipc sid patches.
This patch is a reworking of Tim's patch that has been updated to match
the inode and ipc patches since its similar.
[updated:
> Stephen Smalley also wanted to change a variable from isec to tsec in the
> user sid patch. ]
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Hi,
The patch below converts IPC auditing to collect sid's and convert to context
string only if it needs to output an audit record. This patch depends on the
inode audit change patch already being applied.
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Previously, we were gathering the context instead of the sid. Now in this patch,
we gather just the sid and convert to context only if an audit event is being
output.
This patch brings the performance hit from 146% down to 23%
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The following patch provides selinux interfaces that will allow the audit
system to perform filtering based on the process context (user, role, type,
sensitivity, and clearance). These interfaces will allow the selinux
module to perform efficient matches based on lower level selinux constructs,
rather than relying on context retrievals and string comparisons within
the audit module. It also allows for dominance checks on the mls portion
of the contexts that are impossible with only string comparisons.
Signed-off-by: Darrel Goeddel <dgoeddel@trustedcs.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The locking for the uart_port is over complicated, and can be
simplified if we introduce a flag to indicate that a port is "dead"
and will be removed.
This also helps the validator because it removes a case of non-nested
unlock ordering.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Use hlist_unhashed() rather than accessing inside data structure.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
While writing to an event device allows to set repeat rate for an
individual input device there is no way to retrieve current settings
so we need to ressurect EVIOCGREP. Also ressurect EVIOCSREP so we
have a symmetrical interface.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
They shouldn't be using 'u32' et al in structures which are used for
communication with userspace. Switch to the proper types (__u32 etc).
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
It uses kernel_ulong_t but can't be wrapped in __KERNEL__ because it's
used from scripts/mod/file2alias.c -- but we _can_ hide it inside
header manually too (and it doesn't generally exist for userspace).
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
- Add new SA_PROBEIRQ which suppresses the new sharing-mismatch warning.
Some drivers like to use request_irq() to find an unused interrupt slot.
- Use it in i82365.c
- Kill unused SA_PROBE.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6:
[PATCH] Added URI of "linux kernel development process"
[PATCH] Kobject: possible cleanups
[PATCH] Fix OCFS2 warning when DEBUG_FS is not enabled
[PATCH] Kobject: fix build error
[PATCH] Frame buffer: remove cmap sysfs interface
This patch contains the following possible cleanups:
- #if 0 the following unused global function:
- subsys_remove_file()
- remove the following unused EXPORT_SYMBOL's:
- kset_find_obj
- subsystem_init
- remove the following unused EXPORT_SYMBOL_GPL:
- kobject_add_dir
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix the following warning which happens when OCFS2_FS is enabled but
DEBUG_FS isn't:
fs/ocfs2/dlmglue.c: In function `ocfs2_dlm_init_debug':
fs/ocfs2/dlmglue.c:2036: warning: passing arg 5 of `debugfs_create_file' discards qualifiers from pointer target type
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Joel Becker <Joel.Becker@oracle.com>
Acked-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This fixes a build error for various odd combinations of CONFIG_HOTPLUG
and CONFIG_NET.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Nigel Cunningham <ncunningham@cyclades.com>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
find_get_pages_contig() will break out if we hit a hole in the page cache.
From Andrew Morton, small modifications and documentation by me.
Signed-off-by: Jens Axboe <axboe@suse.de>
There was a whole load of crap exposed which should have been inside the
existing #ifdef __KERNEL__ part. Also hide struct sched_param for now,
since glibc has its own and doesn't like being given ours (yet).
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Don't include <linux/sched.h> outside __KERNEL__, and split the EM_xxx
definitions out of elf.h into elf-em.h so that audit.h can include just
that and not pollute the namespace any further than it needs to.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This is a workaround for the case edge-triggered irq's. Several users
seem to have broken configurations sharing edge-triggered irq's. To avoid
losing IRQ's, reshedule if more work arrives.
The changes to netdevice.h are to extract the part that puts device
back in list into separate inline.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
sys_splice() moves data to/from pipes with a file input/output. sys_vmsplice()
moves data to a pipe, with the input being a user address range instead.
This uses an approach suggested by Linus, where we can hold partial ranges
inside the pages[] map. Hopefully this will be useful for network
receive support as well.
Signed-off-by: Jens Axboe <axboe@suse.de>
Providing more accurate coordinates for thumb press requires additional
steps in the filtering logic:
- Ignore samples found invalid by the debouncing logic, or the ones that
have out of bound pressure value.
- Add a parameter to repeat debouncing, so that more then two consecutive
good readings are required for a valid sample.
Signed-off-by: Imre Deak <imre.deak@nokia.com>
Acked-by: Juha Yrjola <juha.yrjola@nokia.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Move some inclusion of private header files and the definition of
RPC_DEBUG inside the existing #ifdef __KERNEL__
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
For now, just make sure all inclusion of private header files is done
within #ifdef __KERNEL__. There'll be more to clean up later.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
It only really needs to define a few constants and include <asm/mman.h>
when it's used by userspace. Move the rest within #ifdef __KERNEL__
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
It was unconditionally including a whole bunch of headers which aren't
user-visible, and also exposing a lot of private internal stuff of its
own. Also fix some legacy character set to UTF-8 while we're at it.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
It doesn't need to include i2c.h, because a forward declaration of
struct i2c_adapter is perfectly sufficient. And it can be inside
#ifdef __KERNEL__ along with the kernel-internal structure definition.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Some (?) non-x86 architectures require 8byte alignment for u_int64_t
even when compiled for 32bit, using u_int32_t in compat_xt_counters
breaks on these architectures, use u_int64_t for everything but x86.
Reported by Andreas Schwab <schwab@suse.de>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also switch it to use the same method of using off-tree nodes as
everyone else now does -- set them to point to themselves.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Seems like a strange requirement, but allegedly it was necessary for
struct address_space on CRIS, because it otherwise ended up being only
byte-aligned. It's harmless enough, and easier to just do it than to
prove it isn't necessary... although I really ought to dig out my etrax
board and test it some time.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
We only used a single bit for colour information, so having a whole
machine word of space allocated for it was a bit wasteful. Instead,
store it in the lowest bit of the 'parent' pointer, since that was
always going to be aligned anyway.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This is in preparation for merging those fields into a single
'unsigned long', because using a whole machine-word for a single bit
of colour information is wasteful.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
We don't have to #if guard prototypes.
This also fixes a bug observed by Randy Dunlap due to a misspelled
option in the #if.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add some sanity checking. truesize should be at least sizeof(struct
sk_buff) plus the current packet length. If not, then truesize is
seriously mangled and deserves a kernel log message.
Currently we'll do the check for release of stream socket buffers.
But we can add checks to more spots over time.
Incorporating ideas from Herbert Xu.
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.linux-nfs.org/pub/linux/nfs-2.6:
SUNRPC: Dead code in net/sunrpc/auth_gss/auth_gss.c
NFS: remove needless check in nfs_opendir()
NFS: nfs_show_stats; for_each_possible_cpu(), not NR_CPUS
NFS: make 2 functions static
NFS,SUNRPC: Fix compiler warnings if CONFIG_PROC_FS & CONFIG_SYSCTL are unset
NFS: fix PROC_FS=n compile error
VFS: Fix another open intent Oops
RPCSEC_GSS: fix leak in krb5 code caused by superfluous kmalloc
fs/built-in.o: In function `nfs_show_stats':inode.c:(.text+0x15481a): undefined reference to `rpc_print_iostats'
net/built-in.o: In function `rpc_destroy_client': undefined reference to `rpc_free_iostats'
net/built-in.o: In function `rpc_clone_client': undefined reference to `rpc_alloc_iostats'
net/built-in.o: In function `rpc_new_client': undefined reference to `rpc_alloc_iostats'
net/built-in.o: In function `xprt_release': undefined reference to `rpc_count_iostats'
make: *** [.tmp_vmlinux1] Error 1
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Noted by Sergei Shtylylov <sshtylyov@ru.mvista.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for the IDE device on ATI SB600
Signed-off-by: Felix Kuehling <fkuehlin@ati.com>
Acked-by: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While we can currently walk through thread groups, process groups, and
sessions with just the rcu_read_lock, this opens the door to walking the
entire task list.
We already have all of the other RCU guarantees so there is no cost in
doing this, this should be enough so that proc can stop taking the
tasklist lock during readdir.
prev_task was killed because it has no users, and using it will miss new
tasks when doing an rcu traversal.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Somehow in the midst of dotting i's and crossing t's during
the merge up to rc1 we wound up keeping __put_task_struct_cb
when it should have been killed as it no longer has any users.
Sorry I probably should have caught this while it was
still in the -mm tree.
Having the old code there gets confusing when reading
through the code and trying to understand what is
happening.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: (679 commits)
commit 7676f83aeb774e7a3abe6af06ec92b29488b5b79
Author: James Bottomley <James.Bottomley@steeleye.com>
Date: Fri Apr 14 09:47:59 2006 -0500
[SCSI] scsi_transport_sas: don't scan a non-existent end device
Any end device that can't support any of the scanning protocols
shouldn't be scanned, so set its id to -1 to prevent
scsi_scan_target() being called for it.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
commit 3c0c25b97c7d020ef07f6366cf1d668a8e980c7c
Author: Moore, Eric <Eric.Moore@lsil.com>
Date: Thu Apr 13 16:08:17 2006 -0600
[SCSI] mptfusion - fix panic in mptsas_slave_configure
Driver panic when RAID logical volume was present when driver
loaded, or when a RAID logical volume was created on the fly.
...
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (169 commits)
commit 78a596b4490e17b9990d87b9d468ef5bb70daa10
Author: Adrian Bunk <bunk@stusta.de>
Date: Fri Mar 31 01:38:12 2006 -0800
[PATCH] remove kernel/power/pm.c:pm_unregister()
Since the last user is removed in -mm, we can now remove this long deprecated
function.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 21440d313358043b0ce5e43b00ff3c9b35a8616c
Author: David Brownell <david-b@pacbell.net>
Date: Sat Apr 1 10:21:52 2006 -0800
[PATCH] dma doc updates
...
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (158 commits)
commit 4f705ae3e94ffaafe8d35f71ff4d5c499bb06814
Author: Bjorn Helgaas <bjorn.helgaas@hp.com>
Date: Mon Apr 3 17:09:22 2006 -0700
[PATCH] DMI: move dmi_scan.c from arch/i386 to drivers/firmware/
dmi_scan.c is arch-independent and is used by i386, x86_64, and ia64.
Currently all three arches compile it from arch/i386, which means that ia64
and x86_64 depend on things in arch/i386 that they wouldn't otherwise care
about.
This is simply "mv arch/i386/kernel/dmi_scan.c drivers/firmware/" (removing
trailing whitespace) and the associated Makefile changes. All three
architectures already set CONFIG_DMI in their top-level Kconfig files.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andrey Panin <pazke@orbita1.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
...
Since the last user is removed in -mm, we can now remove this long deprecated
function.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Sparse warns about casting to a __bitwise type. However, it's correct
to do when defining the enum for pci_bus_flags_t, so add a __force to
quiet the warnings. This will fix getting
include/linux/pci.h💯26: warning: cast to restricted type
from sparse all over the build.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The naming of the constant defined for PCI ID 1022:7450 does not seem
to match the information at http://pciids.sourceforge.net/:
http://pci-ids.ucw.cz/iii/?i=1022
There 1022:7450 is listed as "AMD-8131 PCI-X Bridge" while 1022:7451
is listed as "AMD-8131 PCI-X IOAPIC". Yet, the current definition for
0x7450 is PCI_DEVICE_ID_AMD_8131_APIC. It seems to me like that name
should map to 0x7451, while a name like PCI_DEVICE_ID_AMD_8131_BRIDGE
should map to 0x7450.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Print more diagnostic info to help identify the source of power management
suspend failures.
Example:
usb_hcd_pci_suspend(): pci_set_power_state+0x0/0x1af() returns -22
pci_device_suspend(): usb_hcd_pci_suspend+0x0/0x11b() returns -22
suspend_device(): pci_device_suspend+0x0/0x34() returns -22
Work-in-progress. It needs lots more suspend_report_result() calls sprinkled
everywhere.
Cc: Patrick Mochel <mochel@digitalimplant.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@suspend2.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[BLOCK] delay all uevents until partition table is scanned
Here we delay the annoucement of all block device events until the
disk's partition table is scanned and all partition devices are already
created and sysfs is populated.
We have a bunch of old bugs for removable storage handling where we
probe successfully for a filesystem on the raw disk, but at the
same time the kernel recognizes a partition table and creates partition
devices.
Currently there is no sane way to tell if partitions will show up or not
at the time the disk device is announced to userspace. With the delayed
events we can simply skip any probe for a filesystem on the raw disk when
we find already present partitions.
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It works like this:
Open the file
Read all the contents.
Call poll requesting POLLERR or POLLPRI (so select/exceptfds works)
When poll returns,
close the file and go to top of loop.
or lseek to start of file and go back to the 'read'.
Events are signaled by an object manager calling
sysfs_notify(kobj, dir, attr);
If the dir is non-NULL, it is used to find a subdirectory which
contains the attribute (presumably created by sysfs_create_group).
This has a cost of one int per attribute, one wait_queuehead per kobject,
one int per open file.
The name "sysfs_notify" may be confused with the inotify
functionality. Maybe it would be nice to support inotify for sysfs
attributes as well?
This patch also uses sysfs_notify to allow /sys/block/md*/md/sync_action
to be pollable
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Move common definitions for NET2280 to <linux/usb/net2280.h>, so that I can
use them in prism54usb (it is not merged yet, but I plan to do it soon).
Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'tee' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] splice: add support for sys_tee()
[PATCH] splice: pass offset around for ->splice_read() and ->splice_write()
We currently have two implementations of this obsolete ioctl, one in
the block layer and one in the scsi code. Both of them have drawbacks.
This patch kills the scsi layer version after updating the block version
with the missing bits:
- argument checking
- use scatterlist I/O
- set number of retries based on the submitted command
This is the last user of non-S/G I/O except for the gdth driver, so
getting this in ASAP and through the scsi tree would be nie to kill
the non-S/G I/O path. Jens, what do you think about adding a check
for non-S/G I/O in the midlayer?
Thanks to Or Gerlitz for testing this patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The pen down IRQ will toggle during each X,Y,Z measurement cycle.
Even though the IRQ is disabled it will be latched and delivered
when after enable_irq. Thus in the IRQ handler we must avoid
starting a new measurement cycle when such an "unwanted" IRQ happens.
Add a get_pendown_state platform function, which will probably
determine this by reading the current GPIO level of the pen IRQ pin.
Move the IRQ reenabling from the SPI RX function to the timer. After
the last power down message the pen IRQ pin is still active for a
while and get_pendown_state would report incorrectly a pen down state.
When suspending we should check the ts->pending flag instead of
ts->pendown, since the timer can be pending regardless of ts->pendown.
Also if ts->pending is set we can be sure that the timer is running,
so no need to rearm it. Similarly if ts->pending is not set we can
be sure that the IRQ is enabled (and the timer is not).
Signed-off-by: Imre Deak <imre.deak@nokia.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Some touchscreens seem to oscillate heavily for a while after touching
the screen. Implement support for sampling the screen until we get two
consecutive values that are close enough.
Signed-off-by: Imre Deak <imre.deak@nokia.com>
Signed-off-by: Juha Yrjola <juha.yrjola@nokia.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
The in-kernel Sangoma drivers are both not compiling and marked as BROKEN
since at least kernel 2.6.0.
Sangoma offers out-of-tree drivers, and David Mandelstam told me Sangoma
does no longer maintain the in-kernel drivers and prefers to provide them
as a separate installation package.
This patch therefore removes these drivers.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As waiting for some register bits to change seems to be a common
operation shared by some controllers, implement helper function
ata_wait_register(). This function also takes care of register write
flushing.
Note that the condition is inverted, the wait is over when the masked
value does NOT match @val. As we're waiting for bits to change, this
test is more powerful and allows the function to be used in more
places.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
@verbose was added to ata_reset_fn_t because AHCI complained during
probing if no device was attached to the port. However, muting
failure message isn't the correct approach. Reset methods are
responsible for detecting no device condition and finishing
successfully. Now that AHCI softreset is fixed, kill @verbose.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Basically an in-kernel implementation of tee, which uses splice and the
pipe buffers as an intelligent way to pass data around by reference.
Where the user space tee consumes the input and produces a stdout and
file output, this syscall merely duplicates the data inside a pipe to
another pipe. No data is copied, the output just grabs a reference to the
input pipe data.
Signed-off-by: Jens Axboe <axboe@suse.de>
We need not use ->f_pos as the offset for the file input/output. If the
user passed an offset pointer in through sys_splice(), just use that and
leave ->f_pos alone.
Signed-off-by: Jens Axboe <axboe@suse.de>
In vsyscall function do_vgettimeofday(), some functions are declared as
inlined, which is a hint for gcc to compile the function inlined but it
not forced. Sometimes compiler does not compile the function as
inlined, so here inline is replaced by __always_inline prefix.
It does not happen in gcc compiler actually, but it possibly happens.
Signed-off-by: bibo mao <bibo.mao@intel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] vfs: add splice_write and splice_read to documentation
[PATCH] Remove sys_ prefix of new syscalls from __NR_sys_*
[PATCH] splice: warning fix
[PATCH] another round of fs/pipe.c cleanups
[PATCH] splice: comment styles
[PATCH] splice: add Ingo as addition copyright holder
[PATCH] splice: unlikely() optimizations
[PATCH] splice: speedups and optimizations
[PATCH] pipe.c/fifo.c code cleanups
[PATCH] get rid of the PIPE_*() macros
[PATCH] splice: speedup __generic_file_splice_read
[PATCH] splice: add direct fd <-> fd splicing support
[PATCH] splice: add optional input and output offsets
[PATCH] introduce a "kernel-internal pipe object" abstraction
[PATCH] splice: be smarter about calling do_page_cache_readahead()
[PATCH] splice: optimize the splice buffer mapping
[PATCH] splice: cleanup __generic_file_splice_read()
[PATCH] splice: only call wake_up_interruptible() when we really have to
[PATCH] splice: potential !page dereference
[PATCH] splice: mark the io page as accessed
Bugzilla Bug 6299:
A pixel size of 8 bits produces wrong logo colors in x86_64.
The driver has 2 methods for setting the color map, using the protected
mode interface provided by the video BIOS and directly writing to the VGA
registers. The former is not supported in x86_64 and the latter is enabled
only in i386.
Fix by enabling the latter method in x86_64 only if supported by the BIOS.
If both methods are unsupported, change the visual of vesafb to
STATIC_PSEUDOCOLOR.
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Every caller of svc_take_page ignores its return value and assumes it
succeeded. So just WARN() instead of returning an ignored error. This would
have saved some time debugging a recent nfsd4 problem.
If there are still failure cases here, then the result is probably that we
overwrite an earlier part of the reply while xdr-encoding.
While the corrupted reply is a nasty bug, it would be worse to panic here and
create the possibility of a remote DOS; hence WARN() instead of BUG().
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Ingo Oeser <ioe-lkml@rameria.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
An UML user reported (against 2.6.13.3/UML) he got kernel Oopses when
trying to rmmod (on a kernel with module unloading enabled) a module
compiled with module unloading disabled. As crashing is a very correct
thing to do in that case, a solution is altering the vermagic string to
include this too.
Possibly, however, the code should not crash in this case, even if the
module didn't support unloading - it should simply abort the module
removal. In this case, fixing that bug would be a better solution. I've
not investigated though.
(akpm: a bit marginal - root screwed up and shot himself in the foot).
Cc: Hayim Shaul <hayim@post.tau.ac.il>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These are the last conversions of pci_set_dma_mask(),
pci_set_consistent_dma_mask() and pci_dma_supported() to use DMA_xBIT_MASK
constants from linux/dma-mapping.h
Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A couple of /proc/vmcore data structures overflow with 32bit systems having
memory more than 4G. This patch fixes those.
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Before commit 47e65328a7b1cdfc4e3102e50d60faf94ebba7d3, next_thread() took
a const task_t. Reinstate the const qualifier, getting the next thread
never changes the current thread.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We changed the wrong symbol. It's tty_insert_flip_string_flags() which is
called from the previously-non-GPL'ed now-inlined tty_insert_flip_char().
Fix that up, and uninline tty_schedule_flip() while we're there.
Cc: Tobias Powalowski <t.powa@gmx.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Lay out the structure definitions in include/linux/leds.h to be aligned as
much as possible. Also minor updates to the comments to make them more
concise.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Acked-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement the scheduled unexport of panic_timeout.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ulrich suggested that the `flags' arg to sync_file_range() become unsigned.
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some string functions were safely overrideable in lib/string.c, but their
corresponding declarations in linux/string.h were not. Correct this, and
make strcspn overrideable.
Odds of someone wanting to do optimized assembly of these are small, but
for the sake of cleanliness, might as well bring them into line with the
rest of the file.
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Current implementations define NODES_SHIFT in include/asm-xxx/numnodes.h for
each arch. Its definition is sometimes configurable. Indeed, ia64 defines 5
NODES_SHIFT values in the current git tree. But it looks a bit messy.
SGI-SN2(ia64) system requires 1024 nodes, and the number of nodes already has
been changeable by config. Suitable node's number may be changed in the
future even if it is other architecture. So, I wrote configurable node's
number.
This patch set defines just default value for each arch which needs multi
nodes except ia64. But, it is easy to change to configurable if necessary.
On ia64 the number of nodes can be already configured in generic ia64 and SN2
config. But, NODES_SHIFT is defined for DIG64 and HP'S machine too. So, I
changed it so that all platforms can be configured via CONFIG_NODES_SHIFT. It
would be simpler.
See also: http://marc.theaimsgroup.com/?l=linux-kernel&m=114358010523896&w=2
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce GFP_NOWAIT, as an alias for GFP_ATOMIC & ~__GFP_HIGH.
This also changes XFS, which is the only in-tree user of this idiom that I
could find. The XFS piece is compile-tested only.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Acked-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add some documentation regarding the utilisation of the flags field in
struct page. This field is overloaded for per page bits and to hold node,
zone and SPARSEMEM information. Make it clear which areas are used for
what and how many bits are in each area.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These patches are an enhancement of OVERCOMMIT_GUESS algorithm in
__vm_enough_memory().
- why the kernel needed patching
When the kernel can't allocate anonymous pages in practice, currnet
OVERCOMMIT_GUESS could return success. This implementation might be
the cause of oom kill in memory pressure situation.
If the Linux runs with page reservation features like
/proc/sys/vm/lowmem_reserve_ratio and without swap region, I think
the oom kill occurs easily.
- the overall design approach in the patch
When the OVERCOMMET_GUESS algorithm calculates number of free pages,
the reserved free pages are regarded as non-free pages.
This change helps to avoid the pitfall that the number of free pages
become less than the number which the kernel tries to keep free.
- testing results
I tested the patches using my test kernel module.
If the patches aren't applied to the kernel, __vm_enough_memory()
returns success in the situation but autual page allocation is
failed.
On the other hand, if the patches are applied to the kernel, memory
allocation failure is avoided since __vm_enough_memory() returns
failure in the situation.
I checked that on i386 SMP 16GB memory machine. I haven't tested on
nommu environment currently.
This patch adds totalreserve_pages for __vm_enough_memory().
Calculate_totalreserve_pages() checks maximum lowmem_reserve pages and
pages_high in each zone. Finally, the function stores the sum of each
zone to totalreserve_pages.
The totalreserve_pages is calculated when the VM is initilized.
And the variable is updated when /proc/sys/vm/lowmem_reserve_raito
or /proc/sys/vm/min_free_kbytes are changed.
Signed-off-by: Hideo Aoki <haoki@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
reshape_position is a 64bit field that was not 64bit aligned. So swap with
new_level.
NOTE: this is a user-visible change. However:
- The bad code has not appeared in a released kernel
- This code is still marked 'experimental'
- This only affects version-1 superblock, which are not in wide use
- These field are only used (rather than simply reported) by user-space
tools in extemely rare circumstances : after a reshape crashes in the
first second of the reshape process.
So I believe that, at this stage, the change is safe. Especially if people
heed the 'help' message on use mdadm-2.4.1.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: Andrew Morton <akpm@osdl.org>
net/socket.c:148: warning: initialization from incompatible pointer type
extern declarations in .c files! Bad boy.
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jens Axboe <axboe@suse.de>
It's more efficient for sendfile() emulation. Basically we cache an
internal private pipe and just use that as the intermediate area for
pages. Direct splicing is not available from sys_splice(), it is only
meant to be used for sendfile() emulation.
Additional patch from Ingo Molnar to avoid the PIPE_BUFFERS loop at
exit for the normal fast path.
Signed-off-by: Jens Axboe <axboe@suse.de>
Oleg Nesterov spotted two interesting bugs with the current de_thread
code. The simplest is a long standing double decrement of
__get_cpu_var(process_counts) in __unhash_process. Caused by
two processes exiting when only one was created.
The other is that since we no longer detach from the thread_group list
it is possible for do_each_thread when run under the tasklist_lock to
see the same task_struct twice. Once on the task list as a
thread_group_leader, and once on the thread list of another
thread.
The double appearance in do_each_thread can cause a double increment
of mm_core_waiters in zap_threads resulting in problems later on in
coredump_wait.
To remedy those two problems this patch takes the simple approach
of changing the old thread group leader into a child thread.
The only routine in release_task that cares is __unhash_process,
and it can be trivially seen that we handle cleaning up a
thread group leader properly.
Since de_thread doesn't change the pid of the exiting leader process
and instead shares it with the new leader process. I change
thread_group_leader to recognize group leadership based on the
group_leader field and not based on pids. This should also be
slightly cheaper then the existing thread_group_leader macro.
I performed a quick audit and I couldn't see any user of
thread_group_leader that cared about the difference.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Overriding the whole EH code is a per-transport, not per-host thing.
Move ->eh_strategy_handler to the transport class, same as
->eh_timed_out.
Downside is that scsi_host_alloc can't check for the total lack of EH
anymore, but the transition period from old EH where we needed it is
long gone already.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Rohit found an obscure bug causing buddy list corruption.
page_is_buddy is using a non-atomic test (PagePrivate && page_count == 0)
to determine whether or not a free page's buddy is itself free and in the
buddy lists.
Each of the conjuncts may be true at different times due to unrelated
conditions, so the non-atomic page_is_buddy test may find each conjunct to
be true even if they were not both true at the same time (ie. the page was
not on the buddy lists).
Signed-off-by: Martin Bligh <mbligh@google.com>
Signed-off-by: Rohit Seth <rohitseth@google.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
add optional input and output offsets to sys_splice(), for seekable file
descriptors:
asmlinkage long sys_splice(int fd_in, loff_t __user *off_in,
int fd_out, loff_t __user *off_out,
size_t len, unsigned int flags);
semantics are straightforward: f_pos will be updated with the offset
provided by user-space, before the splice transfer is about to begin.
Providing a NULL offset pointer means the existing f_pos will be used
(and updated in situ). Providing an offset for a pipe results in
-ESPIPE. Providing an invalid offset pointer results in -EFAULT.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jens Axboe <axboe@suse.de>
separate out the 'internal pipe object' abstraction, and make it
usable to splice. This cleans up and fixes several aspects of the
internal splice APIs and the pipe code:
- pipes: the allocation and freeing of pipe_inode_info is now more symmetric
and more streamlined with existing kernel practices.
- splice: small micro-optimization: less pointer dereferencing in splice
methods
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Update XFS for the ->splice_read/->splice_write changes.
Signed-off-by: Jens Axboe <axboe@suse.de>
Add checksum operation which takes care of verifying the checksum and
dealing with HW checksum errors and avoids multiple checksum
operations by setting ip_summed to CHECKSUM_UNNECESSARY after
successful verification.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the queue rerouter intrastructure to a generic usable
infrastructure for address family specific operations as a base for
some cleanups.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move prototypes of NAT callbacks to ip_conntrack_h323.h. Because the
use of typedefs as arguments, some header files need to be moved as
well.
Signed-off-by: Jing Min Zhao <zhaojingmin@users.sourceforge.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the HPET timer is enabled, the clock can drift by ~3 seconds a day.
This is due to the HPET timer not being initialized with the correct
setting (still using PIT count).
If HZ changes, this drift can become even more pronounced.
HPET patch initializes tick_nsec with correct tick_nsec settings for
HPET timer.
Vojtech comments:
"It's not entirely correct (it assumes the HPET ticks totally
exactly), but it's significantly better than assuming the PIT error
there."
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The node setup code would try to allocate the node metadata in the node
itself, but that fails if there is no memory in there.
This can happen with memory hotplug when the hotplug area defines an so
far empty node.
Now use bootmem to try to allocate the mem_map in other nodes.
And if it fails don't panic, but just ignore the node.
To make this work I added a new __alloc_bootmem_nopanic function that
does what its name implies.
TBD should try to use nearby nodes here. Currently we just use any.
It's hard to do it better because bootmem doesn't have proper fallback
lists yet.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Memory hotadd doesn't need SPARSEMEM, but can be handled by just preallocating
mem_maps. This only needs some untangling of ifdefs to enable the necessary
code even without SPARSEMEM.
Originally from Keith Mannthey, hacked by AK.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert the ATAPI_ENABLE_DMADIR compile time option needed
by some SATA-PATA bridge to runtime module parameter.
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Originally from Nick Piggin, just adapted to the newer branch.
You can't check PageLRU without holding zone->lru_lock. The page
release code can get away with it only because the page refcount is 0 at
that point. Also, you can't reliably remove pages from the LRU unless
the refcount is 0. Ever.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Jens Axboe <axboe@suse.de>
By cleaning up the writeback logic (killing write_one_page() and the manual
set_page_dirty()), we can get rid of ->stolen inside the pipe_buffer and
just keep it local in pipe_to_file().
This also adds dirty page balancing logic and O_SYNC handling.
Signed-off-by: Jens Axboe <axboe@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (48 commits)
Documentation: fix minor kernel-doc warnings
BUG_ON() Conversion in drivers/net/
BUG_ON() Conversion in drivers/s390/net/lcs.c
BUG_ON() Conversion in mm/slab.c
BUG_ON() Conversion in mm/highmem.c
BUG_ON() Conversion in kernel/signal.c
BUG_ON() Conversion in kernel/signal.c
BUG_ON() Conversion in kernel/ptrace.c
BUG_ON() Conversion in ipc/shm.c
BUG_ON() Conversion in fs/freevxfs/
BUG_ON() Conversion in fs/udf/
BUG_ON() Conversion in fs/sysv/
BUG_ON() Conversion in fs/inode.c
BUG_ON() Conversion in fs/fcntl.c
BUG_ON() Conversion in fs/dquot.c
BUG_ON() Conversion in md/raid10.c
BUG_ON() Conversion in md/raid6main.c
BUG_ON() Conversion in md/raid5.c
Fix minor documentation typo
BFP->BPF in Documentation/networking/tuntap.txt
...
It doesn't make the splice itself necessarily nonblocking (because the
actual file descriptors that are spliced from/to may block unless they
have the O_NONBLOCK flag set), but it makes the splice pipe operations
nonblocking.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A lot of EH codes are about to be added to libata. Separate out
libata-eh.c. ata_scsi_timed_out(), ata_scsi_error(),
ata_qc_timeout(), ata_eng_timeout(), ata_eh_qc_complete() and
ata_eh_qc_retry() are moved. No code is changed by this patch.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add a new qc flag ATA_QCFLAG_IO. This flag gets set for normal IO
commands originating from SCSI midlayer. This information will be
used by EH to determine transfer speed reconfiguration.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
ata_dev_configure() should not clear dynamic device flags determined
elsewhere. Lower eight bits are reserved for feature flags, define
ATA_DFLAG_CFG_MASK and clear only those bits before configuring
device. Without this patch, ATA_DFLAG_PIO gets turned off during
revalidation making PIO mode unuseable.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Rename ATA_FLAG_PORT_DISABLED to ATA_FLAG_DISABLED for consistency.
(ATA_FLAG_* are always about ports).
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
* Reorder ATA_DFLAG_* such that feature flags determined by
ata_dev_configure() are on lower bits. Reserve lower eight bits
for this purpose and allocate dynamic flags from bit 8.
* Reorder ATA_FLAG_* such that feature flags determined during driver
initiailization are on bits 0:15, dynamic flags on 16:23 and LLDD
specific flags on 24:31.
* Kill trailing white space and lower-case an one line comment for
consistency.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Improve ata_bus_probe() such that configuration failures are handled
better. Each device is given ATA_PROBE_MAX_TRIES chances, but any
non-transient error (revalidation failure with -ENODEV, configuration
failure with -EINVAL...) disables the device directly. Any IO error
results in SATA PHY speed down and ata_set_mode() failure lowers
transfer mode. The last try always puts a device into PIO-0.
After each failure, the whole port is reset to make sure that the
controller and all the devices are in a known and stable state. The
reset also applies SATA SPD configuration if necessary.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
ap->sata_spd_limit contrains SATA PHY speed of the port. It is
initialized to the configured value prior to probing thus preserving
BIOS configured value. hardreset is responsible for applying SPD
limit and sata_std_hardreset() is updated to do that. SATA SPD limit
will be used to enhance failure handling during probing and later by
EH.
This patch also normalizes some comments around affected code.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
For the time being we cannot use ata_dev_present() as it was renamed
to ata_dev_enabled() but we still need presence test. Implement
negation of the test. Conveniently, the negated result is needed in
more places. This is suggested by Jeff Garzik.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch updates the comments to match the actual code.
Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
The sliced VBI defines added in videodev2.h are removed since requires
more discussion.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
- Add KEY_BRL_* input keys and K_BRL_* keycodes;
- Add emulation of how braille keyboards usually combine braille dots
to the console keyboard driver;
- Add handling of unicode U+28xy diacritics.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Goramo finally got PCI subsystem ID for their PCI200SYN card. The
attached patch adds support for it - cards with old EEPROM data
will emit a warning with URL for update tool.
Signed-off-by: Krzysztof Halasa <khc@pm.waw.pl>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch renames ata_dev_present() to ata_dev_enabled() and adds
ata_dev_disabled(). This is to discern the state where a device is
present but disabled from not-present state. This disctinction is
necessary when configuring transfer mode because device selection
timing must not be violated even if a device fails to configure.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch extends current iptables compatibility layer in order to get
32bit iptables to work on 64bit kernel. Current layer is insufficient due
to alignment checks both in kernel and user space tools.
Patch is for current net-2.6.17 with addition of move of ipt_entry_{match|
target} definitions to xt_entry_{match|target}.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This unifies ipt_multiport and ip6t_multiport to xt_multiport.
As a result, this addes support for inversion and port range match
to IPv6 packets.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This unifies ipt_esp and ip6t_esp to xt_esp. Please note that now
a user program needs to specify IPPROTO_ESP as protocol to use esp match
with IPv6. This means that ip6tables requires '-p esp' like iptables.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I was grepping through the code and some `grep ganularity -R .` didn't
catch what I thought. Then looking closer I saw the term "granuality"
used in only four places (in comments) and granularity in many more
places describing the same idea. Some other facts:
dictionary.com does not know such a word
define:granuality on google is not found (and pages for granuality are
mostly related to patches to the kernel)
it has not been discussed as a term on LKML, AFAICS (=Can Search)
To be consistent, I think granularity should be used everywhere.
Signed-off-by: Kalin KOZHUHAROV <kalin@thinrope.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
As announced, lookup_hash() can now become static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The monochrome->color expansion routine that handles bitmaps which have
(widths % 8) != 0 (slow_imageblit) produces corrupt characters in big-endian.
This is caused by a bogus bit test in slow_imageblit().
Fix.
This patch may deserve to go to the stable tree. The code has already been
well tested in little-endian machines. It's only in big-endian where there is
uncertainty and Herbert confirmed that this is the correct way to go.
It should not introduce regressions.
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Acked-by: Herbert Poetzl <herbert@13thfloor.at>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Backlight class attributes are currently easy to implement incorrectly.
Moving certain handling into the backlight core prevents this whilst at the
same time makes the drivers simpler and consistent. The following changes are
included:
The brightness attribute only sets and reads the brightness variable in the
backlight_properties structure.
The power attribute only sets and reads the power variable in the
backlight_properties structure.
Any framebuffer blanking events change a variable fb_blank in the
backlight_properties structure.
The backlight driver has only two functions to implement. One function is
called when any of the above properties change (to update the backlight
brightness), the second is called to return the current backlight brightness
value. A new attribute "actual_brightness" is added to return this brightness
as determined by the driver having combined all the above factors (and any
driver/device specific factors).
Additionally, the backlight core takes care of checking the maximum brightness
is not exceeded and of turning off the backlight before device removal.
The corgi backlight driver is updated to reflect these changes.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It is very common to hash a dentry and then to call lookup. If we take fs
specific hash functions into account the full hash logic can get ugly.
Further full_name_hash as an inline function is almost 100 bytes on x86 so
having a non-inline choice in some cases can measurably decrease code size.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Simplifies the code, reduces the need for 4 pid hash tables, and makes the
code more capable.
In the discussions I had with Oleg it was felt that to a large extent the
cleanup itself justified the work. With struct pid being dynamically
allocated meant we could create the hash table entry when the pid was
allocated and free the hash table entry when the pid was freed. Instead of
playing with the hash lists when ever a process would attach or detach to a
process.
For myself the fact that it gave what my previous task_ref patch gave for free
with simpler code was a big win. The problem is that if you hold a reference
to struct task_struct you lock in 10K of low memory. If you do that in a user
controllable way like /proc does, with an unprivileged but hostile user space
application with typical resource limits of 1000 fds and 100 processes I can
trigger the OOM killer by consuming all of low memory with task structs, on a
machine wight 1GB of low memory.
If I instead hold a reference to struct pid which holds a pointer to my
task_struct, I don't suffer from that problem because struct pid is 2 orders
of magnitude smaller. In fact struct pid is small enough that most other
kernel data structures dwarf it, so simply limiting the number of referring
data structures is enough to prevent exhaustion of low memory.
This splits the current struct pid into two structures, struct pid and struct
pid_link, and reduces our number of hash tables from PIDTYPE_MAX to just one.
struct pid_link is the per process linkage into the hash tables and lives in
struct task_struct. struct pid is given an indepedent lifetime, and holds
pointers to each of the pid types.
The independent life of struct pid simplifies attach_pid, and detach_pid,
because we are always manipulating the list of pids and not the hash table.
In addition in giving struct pid an indpendent life it makes the concept much
more powerful.
Kernel data structures can now embed a struct pid * instead of a pid_t and
not suffer from pid wrap around problems or from keeping unnecessarily
large amounts of memory allocated.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A big problem with rcu protected data structures that are also reference
counted is that you must jump through several hoops to increase the reference
count. I think someone finally implemented atomic_inc_not_zero(&count) to
automate the common case. Unfortunately this means you must special case the
rcu access case.
When data structures are only visible via rcu in a manner that is not
determined by the reference count on the object (i.e. tasks are visible until
their zombies are reaped) there is a much simpler technique we can employ.
Simply delaying the decrement of the reference count until the rcu interval is
over.
What that means is that the proc code that looks up a task and later
wants to sleep can now do:
rcu_read_lock();
task = find_task_by_pid(some_pid);
if (task) {
get_task_struct(task);
}
rcu_read_unlock();
The effect on the rest of the kernel is that put_task_struct becomes cheaper
and immediate, and in the case where the task has been reaped it frees the
task immediate instead of unnecessarily waiting an until the rcu interval is
over.
Cleanup of task_struct does not happen when its reference count drops to
zero, instead cleanup happens when release_task is called. Tasks can only
be looked up via rcu before release_task is called. All rcu protected
members of task_struct are freed by release_task.
Therefore we can move call_rcu from put_task_struct into release_task. And
we can modify release_task to not immediately release the reference count
but instead have it call put_task_struct from the function it gives to
call_rcu.
The end result:
- get_task_struct is safe in an rcu context where we have just looked
up the task.
- put_task_struct() simplifies into its old pre rcu self.
This reorganization also makes put_task_struct uncallable from modules as
it is not exported but it does not appear to be called from any modules so
this should not be an issue, and is trivially fixed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This just got nuked in mainline. Bring it back because Eric's patches use it.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
To increase the strength of SCHED_BATCH as a scheduling hint we can
activate batch tasks on the expired array since by definition they are
latency insensitive tasks.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The activated flag in task_struct is used to track different sleep types and
its usage is somewhat obfuscated. Convert the variable to an enum with more
descriptive names without altering the function.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently, count_active_tasks() calls both nr_running() &
nr_interruptible(). Each of these functions does a "for_each_cpu" & reads
values from the runqueue of each cpu. Although this is not a lot of
instructions, each runqueue may be located on different node. Depending on
the architecture, a unique TLB entry may be required to access each
runqueue.
Since there may be more runqueues than cpu TLB entries, a scan of all
runqueues can trash the TLB. Each memory reference incurs a TLB miss &
refill.
In addition, the runqueue cacheline that contains nr_running &
nr_uninterruptible may be evicted from the cache between the two passes.
This causes unnecessary cache misses.
Combining nr_running() & nr_interruptible() into a single function
substantially reduces the TLB & cache misses on large systems. This should
have no measureable effect on smaller systems.
On a 128p IA64 system running a memory stress workload, the new function
reduced the overhead of calc_load() from 605 usec/call to 324 usec/call.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The removal of the data field in the hrtimer structure enforces the
embedding of the timer into another data structure. nanosleep now uses a
private implementation of the most common used timer callback function
(simple task wakeup).
In order to avoid the reimplentation of such functionality all over the
place a generic hrtimer_sleeper functionality is created.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add an LED trigger for IDE disk activity to the ide-disk driver.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Acked-by: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for LED triggers to the LED subsystem. "Triggers" are events
which change the state of an LED. Two kinds of trigger are available, simple
ones which can be added to exising code with minimum disruption and complex
ones for implementing new or more complex functionality.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the foundations of a new LEDs subsystem. This patch adds a class which
presents LED devices within sysfs and allows their brightness to be
controlled.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add TIOCL_GETKMSGREDIRECT needed by the userland suspend tool to get the
current value of kmsg_redirect from the kernel so that it can save it and
restore it after resume.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@suse.cz>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the recently-added LINUX_FADV_ASYNC_WRITE and LINUX_FADV_WRITE_WAIT
fadvise() additions, do it in a new sys_sync_file_range() syscall instead.
Reasons:
- It's more flexible. Things which would require two or three syscalls with
fadvise() can be done in a single syscall.
- Using fadvise() in this manner is something not covered by POSIX.
The patch wires up the syscall for x86.
The sycall is implemented in the new fs/sync.c. The intention is that we can
move sys_fsync(), sys_fdatasync() and perhaps sys_sync() into there later.
Documentation for the syscall is in fs/sync.c.
A test app (sync_file_range.c) is in
http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.
The available-to-GPL-modules do_sync_file_range() is for knfsd: "A COMMIT can
say NFS_DATA_SYNC or NFS_FILE_SYNC. I can skip the ->fsync call for
NFS_DATA_SYNC which is hopefully the more common."
Note: the `async' writeout mode SYNC_FILE_RANGE_WRITE will turn synchronous if
the queue is congested. This is trivial to fix: add a new flag bit, set
wbc->nonblocking. But I'm not sure that we want to expose implementation
details down to that level.
Note: it's notable that we can sync an fd which wasn't opened for writing.
Same with fsync() and fdatasync()).
Note: the code takes some care to handle attempts to sync file contents
outside the 16TB offset on 32-bit machines. It makes such attempts appear to
succeed, for best 32-bit/64-bit compatibility. Perhaps it should make such
requests fail...
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Matt Domsch noticed a startup race with the IPMI kernel thread, it was
possible (though extraordinarly unlikely) that a message could come in
before the upper layer was ready to handle it. This patch splits the
startup processing of an IPMI interface into two parts, one to get ready
and one to actually start the processes to receive messages from the
interface.
[akpm@osdl.org: cleanups]
Signed-off-by: Corey Minyard <minyard@acm.org>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make baby-simple the code for /proc/devices. Based on the proven design
for /proc/interrupts.
This also fixes the early-termination regression 2.6.16 introduced, as
demonstrated by:
# dd if=/proc/devices bs=1
Character devices:
1 mem
27+0 records in
27+0 records out
This should also work (but is untested) when /proc/devices >4096 bytes,
which I believe is what the original 2.6.16 rewrite fixed.
[akpm@osdl.org: cleanups, simplifications]
Signed-off-by: Joe Korty <joe.korty@ccur.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Commit a4a6198b80cf82eb8160603c98da218d1bd5e104:
[PATCH] tvec_bases too large for per-cpu data
introduced "struct tvec_t_base_s boot_tvec_bases" which is visible at
compile time. This means we can kill __init_timer_base and move
timer_base_s's content into tvec_t_base_s.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
find_trylock_page() is an odd interface in that it doesn't take a reference
like the others. Now that XFS no longer uses it, and its last remaining
caller actually wants an elevated refcount, opencode that callsite and
schedule find_trylock_page() for removal.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- chips/sharp.c: make two needlessly global functions static
- move some declarations to a header file where they belong to
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Previously we added NET_IP_ALIGN so an architecture can override the
padding done to align headers. The next step is to allow the skb
headroom to be overridden.
We currently always reserve 16 bytes to grow into, meaning all DMAs
start 16 bytes into a cacheline. On ppc64 we really want DMA writes to
start on a cacheline boundary, so we increase that headroom to one
cacheline.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev:
[PATCH] sata_mv: three bug fixes
[PATCH] libata: ata_dev_init_params() fixes
[PATCH] libata: Fix interesting use of "extern" and also some bracketing
[PATCH] libata: Simplex and other mode filtering logic
[PATCH] libata - ATA is both ATA and CFA
[PATCH] libata: Add ->set_mode hook for odd drivers
[PATCH] libata: BMDMA handling updates
[PATCH] libata: kill trailing whitespace
[PATCH] libata: add FIXME above ata_dev_xfermask()
[PATCH] libata: cosmetic changes in ata_bus_softreset()
[PATCH] libata: kill E.D.D.
This enables the caller to migrate pages from one address space page
cache to another. In buzz word marketing, you can do zero-copy file
copies!
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds support for the sys_splice system call. Using a pipe as a
transport, it can connect to files or sockets (latter as output only).
From the splice.c comments:
"splice": joining two ropes together by interweaving their strands.
This is the "extended pipe" functionality, where a pipe is used as
an arbitrary in-memory buffer. Think of a pipe as a small kernel
buffer that you can use to transfer data from one end to the other.
The traditional unix read/write is extended with a "splice()" operation
that transfers data buffers to or from a pipe buffer.
Named by Larry McVoy, original implementation from Linus, extended by
Jens to support splicing to files and fixing the initial implementation
bugs.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a field to the host_set called 'flags' (was host_set_flags changed
to suit Jeff)
Add a simplex_claimed field so we can remember who owns the DMA channel
Add a ->mode_filter() hook to allow drivers to filter modes
Add docs for mode_filter and set_mode
Filter according to simplex state
Filter cable in core
This provides the needed framework to support all the mode rules found
in the PATA world. The simplex filter deals with 'to spec' simplex DMA
systems found in older chips. The cable filter avoids duplicating the
same rules in each chip driver with PATA. Finally the mode filter is
neccessary because drive/chip combinations have errata that forbid
certain modes with some drives or types of ATA object.
Drive speed setup remains per channel for now and the filters now use
the framework Tejun put into place which cleans them up a lot from the
older libata-pata patches.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Some hardware doesn't want the usual mode setup logic running. This
allows the hardware driver to replace it for special cases in the least
invasive way possible.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This is the minimal patch set to enable the current code to be used with
a controller following SFF (ie any PATA and early SATA controllers)
safely without crashes if there is no BMDMA area or if BMDMA is not
assigned by the BIOS for some reason.
Simplex status is recorded but not acted upon in this change, this isn't
a problem with the current drivers as none of them are for simplex
hardware. A following diff will deal with that.
The flags in the probe structure remain ->host_set_flags although Jeff
asked me to rename them, simply because the rename would break the usual
Linux rules that old code should break when there are changes. not
compile and run and then blow up/eat your computer/etc. Renaming this
later is a trivial exercise once a better name is chosen.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
On a allyesconfig'ured kernel:
Size Uses Wasted Name and definition
===== ==== ====== ================================================
95 162 12075 netif_wake_queue include/linux/netdevice.h
129 86 9265 dev_kfree_skb_any include/linux/netdevice.h
127 56 5885 netif_device_attach include/linux/netdevice.h
73 86 4505 dev_kfree_skb_irq include/linux/netdevice.h
46 60 1534 netif_device_detach include/linux/netdevice.h
119 16 1485 __netif_rx_schedule include/linux/netdevice.h
143 5 492 netif_rx_schedule include/linux/netdevice.h
81 7 366 netif_schedule include/linux/netdevice.h
netif_wake_queue is big because __netif_schedule is a big inline:
static inline void __netif_schedule(struct net_device *dev)
{
if (!test_and_set_bit(__LINK_STATE_SCHED, &dev->state)) {
unsigned long flags;
struct softnet_data *sd;
local_irq_save(flags);
sd = &__get_cpu_var(softnet_data);
dev->next_sched = sd->output_queue;
sd->output_queue = dev;
raise_softirq_irqoff(NET_TX_SOFTIRQ);
local_irq_restore(flags);
}
}
static inline void netif_wake_queue(struct net_device *dev)
{
#ifdef CONFIG_NETPOLL_TRAP
if (netpoll_trap())
return;
#endif
if (test_and_clear_bit(__LINK_STATE_XOFF, &dev->state))
__netif_schedule(dev);
}
By de-inlining __netif_schedule we are saving a lot of text
at each callsite of netif_wake_queue and netif_schedule.
__netif_rx_schedule is also big, and it makes more sense to keep
both of them out of line.
Patch also deinlines dev_kfree_skb_any. We can deinline dev_kfree_skb_irq
instead... oh well.
netif_device_attach/detach are not hot paths, we can deinline them too.
Signed-off-by: Denis Vlasenko <vda@ilport.com.ua>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cleanup the following unused functions:
- ata_pio_poll()
- ata_pio_complete()
- ata_pio_first_block()
- ata_pio_block()
- ata_pio_error()
ap->pio_task_timeout and other enums.
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (67 commits)
[PATCH] powerpc: Remove oprofile spinlock backtrace code
[PATCH] powerpc: Add oprofile calltrace support to all powerpc cpus
[PATCH] powerpc: Add oprofile calltrace support
[PATCH] for_each_possible_cpu: ppc
[PATCH] for_each_possible_cpu: powerpc
[PATCH] lock PTE before updating it in 440/BookE page fault handler
[PATCH] powerpc: Kill _machine and hard-coded platform numbers
ppc: Fix compile error in arch/ppc/lib/strcase.c
[PATCH] git-powerpc: WARN was a dumb idea
[PATCH] powerpc: a couple of trivial compile warning fixes
powerpc: remove OCP references
powerpc: Make uImage default build output for MPC8540 ADS
powerpc: move math-emu over to arch/powerpc
powerpc: use memparse() for mem= command line parsing
ppc: fix strncasecmp prototype
[PATCH] powerpc: make ISA floppies work again
[PATCH] powerpc: Fix some initcall return values
[PATCH] powerpc: Workaround for pSeries RTAS bug
[PATCH] spufs: fix __init/__exit annotations
[PATCH] powerpc: add hvc backend for rtas
...
Move 'tsk->sighand = NULL' from cleanup_sighand() to __exit_signal(). This
makes the exit path more understandable and allows us to do
cleanup_sighand() outside of ->siglock protected section.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch kills PIDTYPE_TGID pid_type thus saving one hash table in
kernel/pid.c and speeding up subthreads create/destroy a bit. It is also a
preparation for the further tref/pids rework.
This patch adds 'struct list_head thread_group' to 'struct task_struct'
instead.
We don't detach group leader from PIDTYPE_PID namespace until another
thread inherits it's ->pid == ->tgid, so we are safe wrt premature
free_pidmap(->tgid) call.
Currently there are no users of find_task_by_pid_type(PIDTYPE_TGID).
Should the need arise, we can use find_task_by_pid()->group_leader.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-By: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
__exit_signal() is private to release_task() now. I think it is better to
make it static in kernel/exit.c and export flush_sigqueue() instead - this
function is much more simple and straightforward.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Cosmetic, rename __exit_sighand to cleanup_sighand and move it close to
copy_sighand().
This matches copy_signal/cleanup_signal naming, and I think it is easier to
follow.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
__exit_signal() does important cleanups atomically under ->siglock. It is
also called from copy_process's error path. This is not good, for example we
can't move __unhash_process() under ->siglock for that reason.
We should not mix these 2 paths, just look at ugly 'if (p->sighand)' under
'bad_fork_cleanup_sighand:' label. For copy_process() case it is sufficient
to just backout copy_signal(), nothing more.
Again, nobody can see this task yet. For CLONE_THREAD case we just decrement
signal->count, otherwise nobody can see this ->signal and we can free it
lockless.
This patch assumes it is safe to do exit_thread_group_keys() without
tasklist_lock.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The only caller of exit_sighand(tsk) is copy_process's error path. We can
call __exit_sighand() directly and kill exit_sighand().
This 'tsk' was not yet registered in pid_hash[] or init_task.tasks, it has no
external references, nobody can see it, and
IF (clone_flags & CLONE_SIGHAND)
At least 'current' has a reference to ->sighand, this
means atomic_dec_and_test(sighand->count) can't be true.
ELSE
Nobody can see this ->sighand, this means we can free it
without any locking.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add lock_task_sighand() helper and converts group_send_sig_info() to use
it. Hopefully we will have more users soon.
This patch also removes '!sighand->count' and '!p->usage' checks, I think
they both are bogus, racy and unneeded (but probably it makes sense to
restore them as BUG_ON()s).
->sighand is cleared and it's ->count is decremented in release_task() with
sighand->siglock held, so it is a bug to have '!p->usage || !->count' after
we already locked and verified it is the same. On the other hand, an
already dead task without ->sighand can have a non-zero ->usage due to
ptrace, for example.
If we read the stale value of ->sighand we must see the change after
spin_lock(), because that change was done while holding that same old
->sighand.siglock.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch borrows a clever Hugh's 'struct anon_vma' trick.
Without tasklist_lock held we can't trust task->sighand until we locked it
and re-checked that it is still the same.
But this means we don't need to defer 'kmem_cache_free(sighand)'. We can
return the memory to slab immediately, all we need is to be sure that
sighand->siglock can't dissapear inside rcu protected section.
To do so we need to initialize ->siglock inside ctor function,
SLAB_DESTROY_BY_RCU does the rest.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
daemonize() calls set_special_pids(1,1), while init and kernel threads spawned
from init/main.c:init() run with 0,0 special pids. This patch changes
INIT_SIGNALS() so that that they run with ->pgrp == ->session == 1 also. This
patch relies on fact that swapper's pid == 1.
Now we have no hashed zero pids in pid_hash[].
User-space visibible change is that now /sbin/init runs with (1,1) special
pids and becomes a session leader.
Quoting Eric W. Biederman:
>
> daemonize consuming pids (1,1) then consumes pgrp 1. So that when
> /sbin/init calls setsid() it thinks /sbin/init is a process group
> leader and setsid() fails. So /sbin/init wants pgrp 1 session 1
> but doesn't get it. I am pretty certain daemonize did not exist so
> /sbin/init got pgrp 1 session 1 in 2.4.
>
> That is the bug that is being fixed.
>
> This patch takes things one step farther and essentially calls
> setsid() for pid == 1 before init is execed. That is new behavior
> but it cleans up the kernel as we now do not need to support the
> case of a process without a process group or a session.
>
> The only process that could have possibly cared was /sbin/init
> and it already calls setsid() because it doesn't want that.
>
> If this was going to break anything noticeable the change in behavior
> from 2.4 to 2.6 would have already done that.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fork_idle() does unhash_process() just after copy_process(). Contrary,
boot_cpu's idle thread explicitely registers itself for each pid_type with nr
= 0.
copy_process() already checks p->pid != 0 before process_counts++, I think we
can just skip attach_pid() calls and job control inits for idle threads and
kill unhash_process(). We don't need to cleanup ->proc_dentry in fork_idle()
because with this patch idle threads are never hashed in
kernel/pid.c:pid_hash[].
We don't need to hash pid == 0 in pidmap_init(). free_pidmap() is never
called with pid == 0 arg, so it will never be reused. So it is still possible
to use pid == 0 in any PIDTYPE_xxx namespace from kernel/pid.c's POV.
However with this patch we don't hash pid == 0 for PIDTYPE_PID case. We still
have have PIDTYPE_PGID/PIDTYPE_SID entries with pid == 0: /sbin/init and
kernel threads which don't call daemonize().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Both SET_LINKS() and SET_LINKS/REMOVE_LINKS() have exactly one caller, and
these callers already check thread_group_leader().
This patch kills theese macros, they mix two different things: setting
process's parent and registering it in init_task.tasks list. Callers are
updated to do these actions by hand.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
add_parent(p, parent) is always called with parent == p->parent, and it makes
no sense to do it differently. This patch removes this argument.
No changes in affected .o files.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
switch_exec_pids is only called from de_thread by way of exec, and it is
only called when we are exec'ing from a non thread group leader.
Currently switch_exec_pids gives the leader the pid of the thread and
unhashes and rehashes all of the process groups. The leader is already in
the EXIT_DEAD state so no one cares about it's pids. The only concern for
the leader is that __unhash_process called from release_task will function
correctly. If we don't touch the leader at all we know that
__unhash_process will work fine so there is no need to touch the leader.
For the task becomming the thread group leader, we just need to give it the
pid of the old thread group leader, add it to the task list, and attach it
to the session and the process group of the thread group.
Currently de_thread is also adding the task to the task list which is just
silly.
Currently the only leader of __detach_pid besides detach_pid is
switch_exec_pids because of the ugly extra work that was being
performed.
So this patch removes switch_exec_pids because it is doing too much, it is
creating an unnecessary special case in pid.c, duing work duplicated in
de_thread, and generally obscuring what it is going on.
The necessary work is added to de_thread, and it seems to be a little
clearer there what is going on.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The kill_sl function doesn't exist in the kernel so a prototype is completely
unnecessary.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/home/rmk/linux-2.6-serial:
[SERIAL] Provide Cirrus EP93xx AMBA PL010 serial support.
[SERIAL] amba-pl010: allow platforms to specify modem control method
[SERIAL] Remove obsoleted au1x00_uart driver
[SERIAL] Small time UART configuration fix for AU1100 processor
* 'cfq-merge' of git://brick.kernel.dk/data/git/linux-2.6-block:
[BLOCK] cfq-iosched: seek and async performance fixes
[PATCH] ll_rw_blk: fix 80-col offender in put_io_context()
[PATCH] cfq-iosched: small cfq_choose_req() optimization
[PATCH] [BLOCK] cfq-iosched: change cfq io context linking from list to tree
Fix a lot of typos. Eyeballed by jmc@ in OpenBSD.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace all occurences of 0xff.. in calls to function pci_set_dma_mask()
and pci_set_consistant_dma_mask() with the corresponding DMA_xBIT_MASK from
linux/dma-mapping.h.
Signed-off-by: Matthias Gehre <M.Gehre@gmx.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a conversion to make the various file_operations structs in fs/
const. Basically a regexp job, with a few manual fixups
The goal is both to increase correctness (harder to accidentally write to
shared datastructures) and reducing the false sharing of cachelines with
things that get dirty in .data (while .rodata is nicely read only and thus
cache clean)
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mark the f_ops members of inodes as const, as well as fix the
ripple-through this causes by places that copy this f_ops and then "do
stuff" with it.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
for_each_cpu() is a for-loop over cpu_possible_map. for_each_online_cpu is
for-loop cpu over cpu_online_map. .....for_each_cpu() is not sufficiently
explicit and can lead to mistakes.
This patch adds for_each_possible_cpu() in preparation for the removal of
for_each_cpu().
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Optimize select and poll by a using stack space for small fd sets
This brings back an old optimization from Linux 2.0. Using the stack is
faster than kmalloc. On a Intel P4 system it speeds up a select of a
single pty fd by about 13% (~4000 cycles -> ~3500)
It also saves memory because a daemon hanging in select or poll will
usually save one or two less pages. This can add up - e.g. if you have 10
daemons blocking in poll/select you save 40KB of memory.
I did a patch for this long ago, but it was never applied. This version is
a reimplementation of the old patch that tries to be less intrusive. I
only did the minimal changes needed for the stack allocation.
The cut off point before external memory is allocated is currently at
832bytes. The system calls always allocate this much memory on the stack.
These 832 bytes are divided into 256 bytes frontend data (for the select
bitmaps of the pollfds) and the rest of the space for the wait queues used
by the low level drivers. There are some extreme cases where this won't
work out for select and it falls back to allocating memory too early -
especially with very sparse large select bitmaps - but the majority of
processes who only have a small number of file descriptors should be ok.
[TBD: 832/256 might not be the best split for select or poll]
I suspect more optimizations might be possible, but they would be more
complicated. One way would be to cache the select/poll context over
multiple system calls because typically the input values should be similar.
Problem is when to flush the file descriptors out though.
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some quick backport bits from the libata PATA work to fix things found in
the sis driver. The piix driver needs some fixes too but those are way to
large and need someone working on old IDE with time to do them.
This patch fixes the case where random bits get loaded into SIS timing
registers according to the description of the correct behaviour from
Vojtech Pavlik. It also adds the SiS5517 ATA16 chipset which is not
currently supported by the driver. Thanks to Conrad Harriss for loaning me
the machine with the 5517 chipset.
Signed-off-by: Alan Cox <alan@redhat.com>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add proper prototypes for fat_cache_init() and fat_cache_destroy() in
msdos_fs.h.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On ppc64 we look at a profiling register to work out the sample address and
if it was in userspace or kernel.
The backtrace interface oprofile_add_sample does not allow this. Create
oprofile_add_ext_sample and make oprofile_add_sample use it too.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Cc: John Levon <levon@movementarian.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add driver support for general purpose I/O feature of the Synclink GT
adapters.
Signed-off-by: Paul Fulghum <paulkf@micrgate.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Detect whether a given process is seeky and if so disable (mostly) the
idle window if it is. We still allow just a little idle time, just enough
to allow that process to submit a new request. That is needed to maintain
fairness across priority groups.
In some cases, we could setup several async queues. This is not optimal
from a performance POV, since we want all async io in one queue to perform
good sorting on it. It also impacted sync queues, as async io got too much
slice time.
Signed-off-by: Jens Axboe <axboe@suse.de>
On setups with many disks, we spend a considerable amount of time
looking up the process-disk mapping on each queue of io. Testing with
a NULL based block driver, this costs 40-50% reduction in throughput
for 1000 disks.
Signed-off-by: Jens Axboe <axboe@suse.de>
* 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] Don't make debugfs depend on DEBUG_KERNEL
[PATCH] Fix blktrace compile with sysfs not defined
[PATCH] unused label in drivers/block/cciss.
[BLOCK] increase size of disk stat counters
[PATCH] blk_execute_rq_nowait-speedup
[PATCH] ide-cd: quiet down GPCMD_READ_CDVD_CAPACITY failure
[BLOCK] ll_rw_blk: kmalloc -> kzalloc conversion
[PATCH] kzalloc() conversion in drivers/block
[PATCH] update max_sectors documentation
... being careful that mutex_trylock is inverted wrt down_trylock
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This allows user-space to access data safely. This is needed for raid5
reshape as user-space needs to take a backup of the first few stripes before
allowing reshape to commence.
It will also be useful in cluster-aware raid1 configurations so that all
cluster members can leave a section of the array untouched while a
resync/recovery happens.
A 'start' and 'end' of the suspended range are written to 2 sysfs attributes.
Note that only one range can be suspended at a time.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
check_reshape checks validity and does things that can be done instantly -
like adding devices to raid1. start_reshape initiates a restriping process to
convert the whole array.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Instead of checkpointing at each stripe, only checkpoint when a new write
would overwrite uncheckpointed data. Block any write to the uncheckpointed
area. Arbitrarily checkpoint at least every 3Meg.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We allow the superblock to record an 'old' and a 'new' geometry, and a
position where any conversion is up to. The geometry allows for changing
chunksize, layout and level as well as number of devices.
When using verion-0.90 superblock, we convert the version to 0.91 while the
conversion is happening so that an old kernel will refuse the assemble the
array. For version-1, we use a feature bit for the same effect.
When starting an array we check for an incomplete reshape and restart the
reshape process if needed. If the reshape stopped at an awkward time (like
when updating the first stripe) we refuse to assemble the array, and let
user-space worry about it.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds raid5_reshape and end_reshape which will start and finish the
reshape processes.
raid5_reshape is only enabled in CONFIG_MD_RAID5_RESHAPE is set, to discourage
accidental use.
Read the 'help' for the CONFIG_MD_RAID5_RESHAPE entry.
and Make sure that you have backups, just in case.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch provides the core of the resize/expand process.
sync_request notices if a 'reshape' is happening and acts accordingly.
It allocated new stripe_heads for the next chunk-wide-stripe in the target
geometry, marking them STRIPE_EXPANDING.
Then it finds which stripe heads in the old geometry can provide data needed
by these and marks them STRIPE_EXPAND_SOURCE. This causes stripe_handle to
read all blocks on those stripes.
Once all blocks on a STRIPE_EXPAND_SOURCE stripe_head are read, any that are
needed are copied into the corresponding STRIPE_EXPANDING stripe_head. Once a
STRIPE_EXPANDING stripe_head is full, it is marks STRIPE_EXPAND_READY and then
is written out and released.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We need to allow that different stripes are of different effective sizes, and
use the appropriate size. Also, when a stripe is being expanded, we must
block any IO attempts until the stripe is stable again.
Key elements in this change are:
- each stripe_head gets a 'disk' field which is part of the key,
thus there can sometimes be two stripe heads of the same area of
the array, but covering different numbers of devices. One of these
will be marked STRIPE_EXPANDING and so won't accept new requests.
- conf->expand_progress tracks how the expansion is progressing and
is used to determine whether the target part of the array has been
expanded yet or not.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Before a RAID-5 can be expanded, we need to be able to expand the stripe-cache
data structure.
This requires allocating new stripes in a new kmem_cache. If this succeeds,
we copy cache pages over and release the old stripes and kmem_cache.
We then allocate new pages. If that fails, we leave the stripe cache at it's
new size. It isn't worth the effort to shrink it back again.
Unfortuanately this means we need two kmem_cache names as we, for a short
period of time, we have two kmem_caches. So they are raid5/%s and
raid5/%s-alt
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The remainder of this batch implements raid5 reshaping. Currently the only
shape change that is supported is added a device, but it is envisioned that
changing the chunksize and layout will also be supported, as well as changing
the level (e.g. 1->5, 5->6).
The reshape process naturally has to move all of the data in the array, and so
should be used with caution. It is believed to work, and some testing does
support this, but wider testing would be great for increasing my confidence.
You will need a version of mdadm newer than 2.3.1 to make use of raid5 growth.
This is because mdadm need to take a copy of a 'critical section' at the
start of the array incase there is a crash at an awkward moment. On restart,
mdadm will restore the critical section and allow reshape to continue.
I hope to release a 2.4-pre by early next week - it still needs a little more
polishing.
This patch:
Previously the array of disk information was included in the raid5 'conf'
structure which was allocated to an appropriate size. This makes it awkward
to change the size of that array. So we split it off into a separate
kmalloced array which will require a little extra indexing, but is much easier
to grow.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adding bd_claim_by_kobject() function which takes kobject as additional
signature of holder device and creates sysfs symlinks between holder device
and claimed device. bd_release_from_kobject() is a counterpart of
bd_claim_by_kobject.
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove all the CONFIG_SYSFS stuff. That's supposed to all be implemented up
in header files.
Yes, the CONFIG_SYSFS=n data structures will be a little larger than
necessary, but that's a tradeoff we can decide to make.
Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Creating "slaves" and "holders" directories in /sys/block/<disk> and
creating "holders" directory under /sys/block/<disk>/<partition>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allow drive geometry to be stored with a new DM_DEV_SET_GEOMETRY ioctl.
Device-mapper will now respond to HDIO_GETGEO. If the geometry information is
not available, zero will be returned for all of the parameters.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This flag should be set for a virtual device iff it is set for all
underlying devices.
Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Quadro NVS280 is a dual-head PCIe card with PCI ID 10de:00fd and subsystem ID
10de:0215.
Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a driver for the ST M48T86 / Dallas DS12887 RTC.
This is a platform driver. The platform device must provide I/O routines to
access the RTC.
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds the I2C driver ids to i2c-id.h in preparation of the I2C
direct probing method.
This is kept separate so that it can be integrated to
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch, completely optional, removes from drivers/i2c/chips all the
drivers that are implemented in the new RTC subsystem.
It should be noted that none of the current driver is actually integrated,
i.e. usable without further patches.
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the basic RTC subsystem infrastructure to the kernel.
rtc/class.c - registration facilities for RTC drivers
rtc/interface.c - kernel/rtc interface functions
rtc/hctosys.c - snippet of code that copies hw clock to sw clock
at bootup, if configured to do so.
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The kernel's implementation of notifier chains is unsafe. There is no
protection against entries being added to or removed from a chain while the
chain is in use. The issues were discussed in this thread:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2
We noticed that notifier chains in the kernel fall into two basic usage
classes:
"Blocking" chains are always called from a process context
and the callout routines are allowed to sleep;
"Atomic" chains can be called from an atomic context and
the callout routines are not allowed to sleep.
We decided to codify this distinction and make it part of the API. Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name). New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain. The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.
With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed. For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections. (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)
There are some limitations, which should not be too hard to live with. For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem. Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain. (This did happen in a couple of places and the code
had to be changed to avoid it.)
Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization. Instead we use RCU. The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent that calling a chain.
Here is the list of chains that we adjusted and their classifications. None
of them use the raw API, so for the moment it is only a placeholder.
ATOMIC CHAINS
-------------
arch/i386/kernel/traps.c: i386die_chain
arch/ia64/kernel/traps.c: ia64die_chain
arch/powerpc/kernel/traps.c: powerpc_die_chain
arch/sparc64/kernel/traps.c: sparc64die_chain
arch/x86_64/kernel/traps.c: die_chain
drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list
kernel/panic.c: panic_notifier_list
kernel/profile.c: task_free_notifier
net/bluetooth/hci_core.c: hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain
net/ipv6/addrconf.c: inet6addr_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain
net/netlink/af_netlink.c: netlink_chain
BLOCKING CHAINS
---------------
arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain
arch/s390/kernel/process.c: idle_chain
arch/x86_64/kernel/process.c idle_notifier
drivers/base/memory.c: memory_chain
drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list
drivers/macintosh/adb.c: adb_client_list
drivers/macintosh/via-pmu.c sleep_notifier_list
drivers/macintosh/via-pmu68k.c sleep_notifier_list
drivers/macintosh/windfarm_core.c wf_client_list
drivers/usb/core/notify.c usb_notifier_list
drivers/video/fbmem.c fb_notifier_list
kernel/cpu.c cpu_chain
kernel/module.c module_notify_list
kernel/profile.c munmap_notifier
kernel/profile.c task_exit_notifier
kernel/sys.c reboot_notifier_list
net/core/dev.c netdev_chain
net/decnet/dn_dev.c: dnaddr_chain
net/ipv4/devinet.c: inetaddr_chain
It's possible that some of these classifications are wrong. If they are,
please let us know or submit a patch to fix them. Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)
The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.
[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
futex.h updates:
- get rid of FUTEX_OWNER_PENDING - it's not used
- reduce ROBUST_LIST_LIMIT to a saner value
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- fix: initialize the robust list(s) to NULL in copy_process.
- doc update
- cleanup: rename _inuser to _inatomic
- __user cleanups and other small cleanups
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
32-bit syscall compatibility support. (This patch also moves all futex
related compat functionality into kernel/futex_compat.c.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the core infrastructure for robust futexes: structure definitions, the new
syscalls and the do_exit() based cleanup mechanism.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Just about every architecture defines some macros to do operations on pfns.
They're all virtually identical. This patch consolidates all of them.
One minor glitch is that at least i386 uses them in a very skeletal header
file. To keep away from #include dependency hell, I stuck the new
definitions in a new, isolated header.
Of all of the implementations, sh64 is the only one that varied by a bit.
It used some masks to ensure that any sign-extension got ripped away before
the arithmetic is done. This has been posted to that sh64 maintainers and
the development list.
Compiles on x86, x86_64, ia64 and ppc64.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Helper functions for for_each_online_pgdat/for_each_zone look too big to be
inlined. Speed of these helper macro itself is not very important. (inner
loops are tend to do more work than this)
This patch make helper function to be out-of-lined.
inline out-of-line
.text 005c0680 005bf6a0
005c0680 - 005bf6a0 = FE0 = 4Kbytes.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
By using for_each_online_pgdat(), pgdat_list is not necessary now. This patch
removes it.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a list_head to bootmem_data_t and make bootmems use it. bootmem list is
sorted by node_boot_start.
Only nodes against which init_bootmem() is called are linked to the list.
(i386 allocates bootmem only from one node(0) not from all online nodes.)
A summary:
1. for_each_online_pgdat() traverses all *online* nodes.
2. alloc_bootmem() allocates memory only from initialized-for-bootmem nodes.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch defines for_each_online_pgdat() as a replacement of
for_each_pgdat()
Now, online nodes are managed by node_online_map. But for_each_pgdat()
uses pgdat_link to iterate over all nodes(pgdat). This means management
structure for online pgdat is duplicated.
I think using node_online_map for for_each_pgdat() is simple and sane
rather ather than pgdat_link. New macro is named as
for_each_online_pgdat(). Following patch will fix callers of
for_each_pgdat().
The bootmem allocater uses for_each_pgdat() before pgdat initialization. I
don't think it's sane. Following patch will fix it.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes zone_mem_map.
pfn_to_page uses pgdat, page_to_pfn uses zone. page_to_pfn can use pgdat
instead of zone, which is only one user of zone_mem_map. By modifing it,
we can remove zone_mem_map.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are 3 memory models, FLATMEM, DISCONTIGMEM, SPARSEMEM.
Each arch has its own page_to_pfn(), pfn_to_page() for each models.
But most of them can use the same arithmetic.
This patch adds asm-generic/memory_model.h, which includes generic
page_to_pfn(), pfn_to_page() definitions for each memory model.
When CONFIG_OUT_OF_LINE_PFN_TO_PAGE=y, out-of-line functions are
used instead of macro. This is enabled by some archs and reduces
text size.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a new sched domain for representing multi-core with shared caches
between cores. Consider a dual package system, each package containing two
cores and with last level cache shared between cores with in a package. If
there are two runnable processes, with this appended patch those two
processes will be scheduled on different packages.
On such systems, with this patch we have observed 8% perf improvement with
specJBB(2 warehouse) benchmark and 35% improvement with CFP2000 rate(with 2
users).
This new domain will come into play only on multi-core systems with shared
caches. On other systems, this sched domain will be removed by domain
degeneration code. This new domain can be also used for implementing power
savings policy (see OLS 2005 CMP kernel scheduler paper for more details..
I will post another patch for power savings policy soon)
Most of the arch/* file changes are for cpu_coregroup_map() implementation.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We can now make some code static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
.. it makes some of the code nicer.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Cache_fresh is now only used in cache.c, so unexport it.
Part of cache_fresh (setting CACHE_VALID) should really be done under the
lock, while part (calling cache_revisit_request etc) must be done outside the
lock. So we split it up appropriately.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This has been replaced by more traditional code.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The C++-like 'template' approach proves to be too ugly and hard to work with.
The old 'template' won't go away until all users are updated.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These were an unnecessary wart. Also only have one 'DefineSimpleCache..'
instead of two.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Current svc_expkey holds a pointer to the svc_export structure, so updates to
that structure have to be in-place, which is a wart on the whole cache
infrastruct. So we break that linkage and just do a second lookup.
If this became a performance issue, it would be possible to put a direct link
back in which was only used conditionally. i.e. when an object is replaced
in the cache, we set a flag in the old object. When dereferencing the link
from svc_expkey, if the flag is set, we drop the reference and do a fresh
lookup.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The 'auth_domain's are simply handles on internal data structures. They do
not cache information from user-space, and forcing them into the mold of a
'cache' misrepresents their true nature and causes confusion.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch define a new autofs packet for autofs v5 and updates the waitq.c
functions to handle the additional packet type.
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Update autofs4 version.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The kernel's representation of the disk statistics uses the type unsigned
which is 32b on both 32b and 64b platforms. Unfortunately, most system
tools that work with these numbers that are exported in /proc/diskstats
including iostat read these numbers into unsigned longs. This works fine
on 32b platforms and when the number of IO transactions are small on 64b
platforms. However, when the numbers wrap on 64b platforms & you read the
numbers into unsigned longs, and compare the numbers to previous readings,
then you get an unsigned representation of a negative number. This looks
like a very large 64b number & gives you bizarre readouts in iostat:
ilc4: Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
ilc4: sda 5.50 0.00 143.96 0.00 307496983987862656.00 0.00 153748491993931328.00 0.00 2136028725038430.00 7.94 55.12 5.59 80.42
Though fixing iostat in user space is possible, and a quick survey
indicates that several other similar tools also use unsigned longs when
processing /proc/diskstats. Therefore, it seems like a better approach
would be to extend the length of the disk_stats structure on 64b
architectures to 64b. The following patch does that. It should not affect
the operation on 32b platforms.
Signed-off-by: Ben Woodard <woodard@redhat.com>
Cc: Rick Lindsley <ricklind@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jens Axboe <axboe@suse.de>
ppc32: Adds support for the PCI hostbridge in MPC5200B
Signed-off-by: John Rigby <jrigby@freescale.com>
Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The amba-pl010 hardware does not provide RTS and DTR control lines; it
is expected that these will be implemented using GPIO. Allow platforms
to supply a function to implement manipulation of modem control lines.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
generic_{ffs,fls,fls64,hweight{64,32,16,8}}() were moved into
include/asm-generic/bitops.h. So all architectures don't use them.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
And: Tilman Schmidt <tilman@imap.cc>
This patch adds the tty interface to the gigaset module. The tty interface
provides direct access to the AT command set of the Gigaset devices.
Signed-off-by: Hansjoerg Lipp <hjlipp@web.de>
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Cc: Karsten Keil <kkeil@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The nanosleep cleanup allows to remove the data field of hrtimer. The
callback function can use container_of() to get it's own data. Since the
hrtimer structure is anyway embedded in other structures, this adds no
overhead.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
nsec_t predates ktime_t and has mostly been superseded by it. In the few
places that are left it's better to make it explicit that we're dealing with
64 bit values here.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that it_real_value is gone, the last user of DEFINE_KTIME and
ktime_to_clock_t are also gone, so remove it before someone starts using it
again.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the state field and encode this information in the rb_node similiar to
normal timer.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
nanosleep is the only user of the expired state, so let it manage this itself,
which makes the hrtimer code a bit simpler. The remaining time is also only
calculated if requested.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pass current time to hrtimer_forward(). This allows to use the softirq time
in the timer base when the forward function is called from the timer callback.
Other places pass current time with a call to timer->base->get_time().
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The hrtimer softirq is called from the timer softirq every tick. Retrieve the
current time from xtime and wall_to_monotonic instead of calling
base->get_time() for each timer base. Store the time in the base structure
and provide a hook once clock source abstractions are in place and to keep the
code open for new base clocks.
Based on a patch from: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that get_block() can handle mapping multiple disk blocks, no need to have
->get_blocks(). This patch removes fs specific ->get_blocks() added for DIO
and makes it users use get_block() instead.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pass amount of disk needs to be mapped to get_block(). This way one can
modify the fs ->get_block() functions to map multiple blocks at the same time.
[akpm@osdl.org: performance tweak]
[akpm@osdl.org: remove unneeded assignments]
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Increase the size of the buffer_head b_size field (only) for 64 bit platforms.
Update some old and moldy comments in and around the structure as well.
The b_size increase allows us to perform larger mappings and allocations for
large I/O requests from userspace, which tie in with other changes allowing
the get_block_t() interface to map multiple blocks at once.
Signed-off-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change ext3_try_to_allocate() (called via ext3_new_blocks()) to try to
allocate the requested number of blocks on a best effort basis: After
allocated the first block, it will always attempt to allocate the next few(up
to the requested size and not beyond the reservation window) adjacent blocks
at the same time.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently ext3_get_block() only maps or allocates one block at a time. This
is quite inefficient for sequential IO workload.
I have posted a early implements a simply multiple block map and allocation
with current ext3. The basic idea is allocating the 1st block in the existing
way, and attempting to allocate the next adjacent blocks on a best effort
basis. More description about the implementation could be found here:
http://marc.theaimsgroup.com/?l=ext2-devel&m=112162230003522&w=2
The following the latest version of the patch: break the original patch into 5
patches, re-worked some logicals, and fixed some bugs. The break ups are:
[patch 1] Adding map multiple blocks at a time in ext3_get_blocks()
[patch 2] Extend ext3_get_blocks() to support multiple block allocation
[patch 3] Implement multiple block allocation in ext3-try-to-allocate
(called via ext3_new_block()).
[patch 4] Proper accounting updates in ext3_new_blocks()
[patch 5] Adjust reservation window size properly (by the given number
of blocks to allocate) before block allocation to increase the
possibility of allocating multiple blocks in a single call.
Tests done so far includes fsx,tiobench and dbench. The following numbers
collected from Direct IO tests (1G file creation/read) shows the system time
have been greatly reduced (more than 50% on my 8 cpu system) with the patches.
1G file DIO write:
2.6.15 2.6.15+patches
real 0m31.275s 0m31.161s
user 0m0.000s 0m0.000s
sys 0m3.384s 0m0.564s
1G file DIO read:
2.6.15 2.6.15+patches
real 0m30.733s 0m30.624s
user 0m0.000s 0m0.004s
sys 0m0.748s 0m0.380s
Some previous test we did on buffered IO with using multiple blocks allocation
and delayed allocation shows noticeable improvement on throughput and system
time.
This patch:
Add support of mapping multiple blocks in one call.
This is useful for DIO reads and re-writes (where blocks are already
allocated), also is in line with Christoph's proposal of using getblocks() in
mpage_readpage() or mpage_readpages().
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fix was proposed by Trond Myklebust. He says: The type "sector_t" is
heavily tied in to the block layer interface as an offset/handle to a block,
and is subject to a supposedly block-specific configuration option:
CONFIG_LBD. Despite this, it is used in struct kstatfs to save a couple of
bytes on the stack whenever we call the filesystems' ->statfs().
So kstatfs's entries related to blocks are invalid on statfs64 for a network
filesystem which has more than 2^32-1 blocks when CONFIG_LBD is disabled.
- struct kstatfs
Change the type of following entries from sector_t to u64.
f_blocks
f_bfree
f_bavail
f_files
f_ffree
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add blkcnt_t as the type of inode.i_blocks. This enables you to make the size
of blkcnt_t either 4 bytes or 8 bytes on 32 bits architecture with CONFIG_LSF.
- CONFIG_LSF
Add new configuration parameter.
- blkcnt_t
On h8300, i386, mips, powerpc, s390 and sh that define sector_t,
blkcnt_t is defined as u64 if CONFIG_LSF is enabled; otherwise it is
defined as unsigned long.
On other architectures, it is defined as unsigned long.
- inode.i_blocks
Change the type from sector_t to blkcnt_t.
Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch series fixes the following problems on 32 bits architecture.
o stat64 returns the lower 32 bits of blocks, although userland st_blocks
has 64 bits, because i_blocks has only 32 bits. The ioctl with FIOQSIZE has
the same problem.
o As Dave Kleikamp said, making >2TB file on JFS results in writing an
invalid block number to disk inode. The cause is the same as above too.
o In generic quota code dquot_transfer(), the file usage is calculated from
i_blocks via inode_get_bytes(). If the file is over 2TB, the change of
usage is less than expected. The cause is the same as above too.
o As Trond Myklebust said, statfs64's entries related to blocks are invalid
on statfs64 for a network filesystem which has more than 2^32-1 blocks with
CONFIG_LBD disabled. [PATCH 3/3]
We made patches to fix problems that occur when handling a large filesystem
and a large file. It was discussed on the mails titled "stat64 for over 2TB
file returned invalid st_blocks".
Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Modify well over a dozen mempool users to call mempool_create_slab_pool()
rather than calling mempool_create() with extra arguments, saving about 30
lines of code and increasing readability.
Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Create a simple wrapper function for the common case of creating a slab-based
mempool.
Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add another allocator to the common mempool code: a kzalloc/kfree allocator
This will be used by the next patch in the series to replace a mempool-backed
kzalloc allocator. It is also very likely that there will be more users in
the future.
Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add another allocator to the common mempool code: a kmalloc/kfree allocator
This will be used by the next patch in the series to replace duplicate
mempool-backed kmalloc allocators in several places in the kernel. It is also
very likely that there will be more users in the future.
Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This will be used by the next patch in the series to replace duplicate
mempool-backed page allocators in 2 places in the kernel. It is also likely
that there will be more users in the future.
Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change proc_dir_entry->size to be loff_t to represent files like
/proc/vmcore for 32bit systems with more than 4G memory.
Needed for seeing correct size for /proc/vmcore for 32-bit systems with >
4G RAM.
Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Create compat_sys_adjtimex and use it an all appropriate places.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We had a copy of the compatibility version of struct timex in each 64 bit
architecture. This patch just creates a global one and replaces all the
usages of the old ones.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Lockd and the NFSv4 server both exercise a race condition where
posix_test_lock() is called either before or after posix_lock_file() to
deal with a denied lock request due to a conflicting lock.
Remove the race condition for the NFSv4 server by adding a new conflicting
lock parameter to __posix_lock_file() , changing the name to
__posix_lock_file_conf().
Keep posix_lock_file() interface, add posix_lock_conf() interface, both
call __posix_lock_file_conf().
[akpm@osdl.org: Put the EXPORT_SYMBOL() where it belongs]
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add __KERNEL__ block.
Use __KERNEL__ to allow ioctl interface to be usable.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add full driver model support for the IPMI driver. It links in the proper
bus and device support.
It adds an "ipmi" driver interface that has each BMC discovered by the
driver (as a device). These BMCs appear in the devices/platform directory.
If there are multiple interfaces to the same BMC, the driver should
discover this and will only have one BMC entry. The BMC entry will have
pointers to each interface device that connects to it.
The device information (statistics and config information) has not yet been
ported over to the driver model from proc, that will come later.
This work was based on work by Yani Ioannou. I basically rewrote it using
that code as a guide, but he still deserves credit :).
[bunk@stusta.de: make ipmi_find_bmc_guid() static]
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Yani Ioannou <yani.ioannou@gmail.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
net/core/flow.c: In function 'flow_cache_flush':
net/core/flow.c:299: warning: statement with no effect
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The return value of this function is never used, so let's be honest and
declare it as void.
Some places where invalidatepage returned 0, I have inserted comments
suggesting a BUG_ON.
[akpm@osdl.org: JBD BUG fix]
[akpm@osdl.org: rework for git-nfs]
[akpm@osdl.org: don't go BUG in block_invalidate_page()]
Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The only user ignores the return value, and the only instanace
(block_sync_page) always returns 0...
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>