# Background
I was investigating spurious non-deterministic EINTR returns from
various 9p file system operations in a Linux guest served from the
qemu 9p server.
## EINTR, ERESTARTSYS and the linux kernel
When a signal arrives that the Linux kernel needs to deliver to user-space
while a given thread is blocked (in the 9p case waiting for a reply to its
request in 9p_client_rpc -> wait_event_interruptible), it asks whatever
driver is currently running to abort its current operation (in the 9p case
causing the submission of a TFLUSH message) and return to user space.
In these situations, the error message reported is generally ERESTARTSYS.
If the userspace processes specified SA_RESTART, this means that the
system call will get restarted upon completion of the signal handler
delivery (assuming the signal handler doesn't modify the process state
in complicated ways not relevant here). If SA_RESTART is not specified,
ERESTARTSYS gets translated to EINTR and user space is expected to handle
the restart itself.
## The 9p TFLUSH command
The 9p TFLUSH commands requests that the server abort an ongoing operation.
The man page [1] specifies:
```
If it recognizes oldtag as the tag of a pending transaction, it should
abort any pending response and discard that tag.
[...]
When the client sends a Tflush, it must wait to receive the corresponding
Rflush before reusing oldtag for subsequent messages. If a response to the
flushed request is received before the Rflush, the client must honor the
response as if it had not been flushed, since the completed request may
signify a state change in the server
```
In particular, this means that the server must not send a reply with the
orignal tag in response to the cancellation request, because the client is
obligated to interpret such a reply as a coincidental reply to the original
request.
# The bug
When qemu receives a TFlush request, it sets the `cancelled` flag on the
relevant pdu. This flag is periodically checked, e.g. in
`v9fs_co_name_to_path`, and if set, the operation is aborted and the error
is set to EINTR. However, the server then violates the spec, by returning
to the client an Rerror response, rather than discarding the message
entirely. As a result, the client is required to assume that said Rerror
response is a result of the original request, not a result of the
cancellation and thus passes the EINTR error back to user space.
This is not the worst thing it could do, however as discussed above, the
correct error code would have been ERESTARTSYS, such that user space
programs with SA_RESTART set get correctly restarted upon completion of
the signal handler.
Instead, such programs get spurious EINTR results that they were not
expecting to handle.
It should be noted that there are plenty of user space programs that do not
set SA_RESTART and do not correctly handle EINTR either. However, that is
then a userspace bug. It should also be noted that this bug has been
mitigated by a recent commit to the Linux kernel [2], which essentially
prevents the kernel from sending Tflush requests unless the process is about
to die (in which case the process likely doesn't care about the response).
Nevertheless, for older kernels and to comply with the spec, I believe this
change is beneficial.
# Implementation
The fix is fairly simple, just skipping notification of a reply if
the pdu was previously cancelled. We do however, also notify the transport
layer that we're doing this, so it can clean up any resources it may be
holding. I also added a new trace event to distinguish
operations that caused an error reply from those that were cancelled.
One complication is that we only omit sending the message on EINTR errors in
order to avoid confusing the rest of the code (which may assume that a
client knows about a fid if it sucessfully passed it off to pud_complete
without checking for cancellation status). This does mean that if the server
acts upon the cancellation flag, it always needs to set err to EINTR. I
believe this is true of the current code.
[1] https://9fans.github.io/plan9port/man/man9/flush.html
[2] https://github.com/torvalds/linux/commit/9523feac272ccad2ad8186ba4fcc891
Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
[groug, send a zero-sized reply instead of detaching the buffer]
Signed-off-by: Greg Kurz <groug@kaod.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
No good reasons to do this outside of v9fs_device_realize_common().
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
We masked the wrong bits, which prevented some of the
32-bit R registers. E.g. "fcnvxf,sgl,sgl fr22R,fr6R".
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Now that we have the prerequisites in target/hppa/,
implement the hardware for a PA7100LC.
This also enables build for hppa-softmmu.
Signed-off-by: Helge Deller <deller@gmx.de>
[rth: Since it is all new code, squashed all branch development
withing hw/hppa/ to a single patch.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This is an extension to the base ISA, but we can use this in
the kernel idle loop to reduce the host cpu time consumed.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
HP-UX 10.20 CD contains "add r0, r0, r27" in a delay slot,
which uses at least 5 temps.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Unknown why this works, but if we return EXCP_ITLB_MISS we
will triple-fault the first userland instruction fetch.
Is it something to do with having a combined I/DTLB?
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Linux sets sr4-sr7 all to the same value, which means that we
need not do any runtime computation to find out what space to
use in forming the GVA.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Real hardware would use an external device to control the power.
But for the moment let's invent instructions in reserved space,
to be used by our custom firmware.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
However since HPPA has a software-managed TLB, and the relevant
TLB manipulation instructions are not implemented, this does not
actually do anything.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Any one TB will have only one space value. If we change spaces,
we change TBs. Thus BE and BEV must exit the TB immediately.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
These instructions force the destination privilege level
of the branch destination to be no higher than current.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This changes the system virtual address width to 64-bit and
incorporates the space registers into load/store operations.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
While the E bit is only used for pa2.0 mfctl,w from sar,
the otherwise reserved bit does not appear to be decoded.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Most aspects of privilege are not yet handled. But this
gives us the start from which to begin checking.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
For system mode, we will need 64-bit virtual addresses even when
we have 32-bit register sizes. Since the rest of QEMU equates
TARGET_LONG_BITS with the address size, redefine everything
related to register size in terms of a new TARGET_REGISTER_BITS.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We don't actually do anything with most of the bits yet,
but at least they have names and we have somewhere to
store them.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
With the addition of default-configs/hppa-softmmu.mak, this
will compile. It is not enabled with this patch, however.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJab54VAAoJEHWtZYAqC0IRZ+IH+QFtVX3R9fVxlSmFtPs7L9+s
a+WbbVbYf0toiTg1taRoYgyGkryc8Gtw8VJrN2iowM8KFjEx+h2cZ3qoRd15GqP6
jFAGb0lc6tjOk0O5pDiJU8hErSrIda8biBp/I0QDz3RkXeGrAZ7FrQemj0FXQjEG
0o+xGstCYKrVfGxrnDysfvyGSDOad0HnBqwc0rerbVjBJe5p8UErP8DSPsNCaj6W
qbSSgySeMnTeXGOwIXgCW43eTEJG13eBQ/rNJRqrcoIXiBd/txPb+c+E1iBBAmrF
XZHxS4v8vP+8rVRgBut4sIr2psx1DZvktHRThJDgu+Cyv6h7c6okQ0wxmo0+9bo=
=k7Fh
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/stefanberger/tags/pull-tpm-2018-01-26-2' into staging
Merge tpm 2018/01/26 v2
# gpg: Signature made Mon 29 Jan 2018 22:20:05 GMT
# gpg: using RSA key 0x75AD65802A0B4211
# gpg: Good signature from "Stefan Berger <stefanb@linux.vnet.ibm.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B818 B9CA DF90 89C2 D5CE C66B 75AD 6580 2A0B 4211
* remotes/stefanberger/tags/pull-tpm-2018-01-26-2:
tpm: add CRB device
tpm: report backend request error
tpm: replace GThreadPool with AIO threadpool
tpm: lookup cancel path under tpm device class
tpm: fix alignment issues
tpm: Set the flags of the CMD_INIT command to 0
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The SPARC code in linux-user/signal.c defines a set of
MC_* constants. On some SPARC hosts these are also defined
by sys/ucontext.h, resulting in build failures:
linux-user/signal.c:2786:0: error: "MC_NGREG" redefined [-Werror]
#define MC_NGREG 19
In file included from /usr/include/signal.h:302:0,
from include/qemu/osdep.h:86,
from linux-user/signal.c:19:
/usr/include/sparc64-linux-gnu/sys/ucontext.h:59:0: note: this is the location of the previous definition
# define MC_NGREG __MC_NGREG
Rename all these constants to SPARC_MC_* to avoid the clash.
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1517318239-15764-1-git-send-email-peter.maydell@linaro.org
tpm_crb is a device for TPM 2.0 Command Response Buffer (CRB)
Interface as defined in TCG PC Client Platform TPM Profile (PTP)
Specification Family “2.0” Level 00 Revision 01.03 v22.
The PTP allows device implementation to switch between TIS and CRB
model at run time, but given that CRB is a simpler device to
implement, I chose to implement it as a different device.
The device doesn't implement other locality than 0 for now (my laptop
TPM doesn't either, so I assume this isn't so bad)
Tested with some success with Linux upstream and Windows 10, seabios &
modified ovmf. The device is recognized and correctly transmit
command/response with passthrough & emu. However, we are missing PPI
ACPI part atm.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>