Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-37-armbru@redhat.com>
json_parser_parse() normally returns the QObject on success. Except
it returns null when its @tokens argument is null.
Its only caller json_message_process_token() passes null @tokens when
emitting a lexical error. The call is a rather opaque way to say json
= NULL then.
Simplify matters by lifting the assignment to json out of the emit
path: initialize json to null, set it to the value of
json_parser_parse() when there's no lexical error. Drop the special
case from json_parser_parse().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-36-armbru@redhat.com>
The classical way to structure parser and lexer is to have the client
call the parser to get an abstract syntax tree, the parser call the
lexer to get the next token, and the lexer call some function to get
input characters.
Another way to structure them would be to have the client feed
characters to the lexer, the lexer feed tokens to the parser, and the
parser feed abstract syntax trees to some callback provided by the
client. This way is more easily integrated into an event loop that
dispatches input characters as they arrive.
Our JSON parser is kind of between the two. The lexer feeds tokens to
a "streamer" instead of a real parser. The streamer accumulates
tokens until it got the sequence of tokens that comprise a single JSON
value (it counts curly braces and square brackets to decide). It
feeds those token sequences to a callback provided by the client. The
callback passes each token sequence to the parser, and gets back an
abstract syntax tree.
I figure it was done that way to make a straightforward recursive
descent parser possible. "Get next token" becomes "pop the first
token off the token sequence". Drawback: we need to store a complete
token sequence. Each token eats 13 + input characters + malloc
overhead bytes.
Observations:
1. This is not the only way to use recursive descent. If we replaced
"get next token" by a coroutine yield, we could do without a
streamer.
2. The lexer reports errors by passing a JSON_ERROR token to the
streamer. This communicates the offending input characters and
their location, but no more.
3. The streamer reports errors by passing a null token sequence to the
callback. The (already poor) lexical error information is thrown
away.
4. Having the callback receive a token sequence duplicates the code to
convert token sequence to abstract syntax tree in every callback.
5. Known bug: the streamer silently drops incomplete token sequences.
This commit rectifies 4. by lifting the call of the parser from the
callbacks into the streamer. Later commits will address 3. and 5.
The lifting removes a bug from qjson.c's parse_json(): it passed a
pointer to a non-null Error * in certain cases, as demonstrated by
check-qjson.c.
json_parser_parse() is now unused. It's a stupid wrapper around
json_parser_parse_err(). Drop it, and rename json_parser_parse_err()
to json_parser_parse().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-35-armbru@redhat.com>
json_lexer_init() takes the function to process a token as an
argument. It's always json_message_process_token(). Makes the code
harder to understand for no actual gain. Drop the indirection.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-34-armbru@redhat.com>
The lexer always returns 0 when char feeding. Furthermore, none of the
caller care about the return value.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180326150916.9602-10-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180823164025.12553-32-armbru@redhat.com>
Now that json-streamer tries not to leak tokens on incomplete parse,
the tokens can be freed twice if QEMU destroys the json-streamer
object during the parser->emit call. To fix this, create the new
empty GQueue earlier, so that it is already in place when the old
one is passed to parser->emit.
Reported-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1467636059-12557-1-git-send-email-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Valgrind complained about a number of leaks in
tests/check-qobject-json:
==12657== definitely lost: 17,247 bytes in 1,234 blocks
All of which had the same root cause: on an incomplete parse,
we were abandoning the token queue without cleaning up the
allocated data within each queue element. Introduced in
commit 95385fe, when we switched from QList (which recursively
frees contents) to g_queue (which does not).
We don't yet require glib 2.32 with its g_queue_free_full(),
so open-code it instead.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463608012-12760-1-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1454089805-5470-12-git-send-email-peter.maydell@linaro.org
Commit 29c75dd "json-streamer: limit the maximum recursion depth and
maximum token count" attempts to guard against excessive heap usage by
limiting total token size (it says "token count", but that's a lie).
Total token size is a rather imprecise predictor of heap usage: many
small tokens use more space than few large tokens with the same input
size, because there's a constant per-token overhead: 37 bytes on my
system.
Tighten this up: limit the token count to 2Mi. Chosen to roughly
match the 64MiB total token size limit.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1448486613-17634-13-git-send-email-armbru@redhat.com>
Replace the contents of the tokens GQueue with a simple struct. This cuts
the amount of memory allocated by tests/check-qjson from ~500MB to ~20MB,
and the execution time from 600ms to 80ms on my laptop. Still a lot (some
could be saved by using an intrusive list, such as QSIMPLEQ, instead of
the GQueue), but the savings are already massive and the right thing to
do would probably be to get rid of json-streamer completely.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1448300659-23559-5-git-send-email-pbonzini@redhat.com>
[Straightforwardly rebased on my patches]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Even though we still have the "streamer" concept, the tokens can now
be deleted as they are read. While doing so convert from QList to
GQueue, since the next step will make tokens not a QObject and we
will have to do the conversion anyway.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1448300659-23559-4-git-send-email-pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
JSONLexer only needs a simple resizable buffer. json-streamer.c
can allocate memory for each token instead of relying on reference
counting of QStrings.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1448300659-23559-2-git-send-email-pbonzini@redhat.com>
[Straightforwardly rebased on my patches, checkpatch made happy]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Simplifies things, because we always check for a specific one.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1448486613-17634-6-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
We limit nesting depth and input size to defend against input
triggering excessive heap or stack memory use (commit 29c75dd
json-streamer: limit the maximum recursion depth and maximum token
count). However, when the nesting limit is exceeded,
parser_context_peek_token()'s assertion fails.
Broken in commit 65c0f1e "json-parser: don't replicate tokens at each
level of recursion".
To reproduce stuff 1025 open braces or brackets into QMP.
Fix by taking the error exit instead of the normal one.
Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1448486613-17634-3-git-send-email-armbru@redhat.com>
The nesting limit from commit 29c75dd "json-streamer: limit the
maximum recursion depth and maximum token count" applies separately to
braces and brackets. This makes no sense. Apply it to their sum,
because that's actually a measure of recursion depth.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1448486613-17634-2-git-send-email-armbru@redhat.com>