
json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8")

Since the JSON grammar doesn't accept U+0000 anywhere, this merely
exchanges one kind of parse error for another.  It's purely for
consistency with qobject_to_json(), which accepts \xC0\x80 (see commit
e2ec3f97680).

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-26-armbru@redhat.com>
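
For readers unfamiliar with modified UTF-8, the following is a minimal sketch of the idea described in the message above, not QEMU's actual mod_utf8_codepoint(). The names sketch_decode, rejected_old and rejected_new are hypothetical, the decoder only handles one- and two-byte sequences, and rejecting a raw NUL byte is an assumption of this sketch. It shows why a check of `cp <= 0` turns a decoded "\xC0\x80" into an error while `cp < 0` lets it through.

```c
#include <stddef.h>

/*
 * Hypothetical helper (NOT QEMU's mod_utf8_codepoint()): decode one
 * sequence of at most two bytes from s[0..n-1].
 * Returns the code point, or -1 for an invalid sequence.
 * On success, *len is set to the number of bytes consumed.
 */
static int sketch_decode(const unsigned char *s, size_t n, size_t *len)
{
    if (n == 0 || s[0] == 0) {
        return -1;              /* assumption: a raw NUL byte is rejected */
    }
    if (s[0] < 0x80) {          /* plain ASCII */
        *len = 1;
        return s[0];
    }
    if ((s[0] & 0xE0) == 0xC0 && n >= 2 && (s[1] & 0xC0) == 0x80) {
        int cp = ((s[0] & 0x1F) << 6) | (s[1] & 0x3F);

        /*
         * Values below 0x80 are overlong in a two-byte sequence.
         * Modified UTF-8 permits exactly one of them: "\xC0\x80",
         * which stands for U+0000.
         */
        if (cp < 0x80 && cp != 0) {
            return -1;
        }
        *len = 2;
        return cp;
    }
    return -1;                  /* longer sequences are out of scope here */
}

/* Check in the style of the old code: code point 0 counts as an error. */
static int rejected_old(int cp)
{
    return cp <= 0;
}

/* Check in the style of the new code: only negative values are errors. */
static int rejected_new(int cp)
{
    return cp < 0;
}
```

With this sketch, decoding "\xC0\x80" yields code point 0, which rejected_old() flags and rejected_new() accepts, while any other overlong two-byte sequence is still invalid.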
Markus Armbruster 7 years ago
parent
commit
4b1c0cd7c7
3 changed files with 3 additions and 9 deletions
  1. qobject/json-lexer.c   (+1 -1)
  2. qobject/json-parser.c  (+1 -1)
  3. tests/check-qjson.c    (+1 -7)

+ 1 - 1
qobject/json-lexer.c

@@ -93,7 +93,7 @@
  *   interpolation = %((l|ll|I64)[du]|[ipsf])
  *
  * Note:
- * - Input must be encoded in UTF-8.
+ * - Input must be encoded in modified UTF-8.
  * - Decoding and validating is left to the parser.
  */
 

+ 1 - 1
qobject/json-parser.c

@@ -200,7 +200,7 @@ static QString *qstring_from_escaped_str(JSONParserContext *ctxt,
             }
         } else {
             cp = mod_utf8_codepoint(ptr, 6, &end);
-            if (cp <= 0) {
+            if (cp < 0) {
                 parse_error(ctxt, token, "invalid UTF-8 sequence in string");
                 goto out;
             }

+ 1 - 7
tests/check-qjson.c

@@ -152,12 +152,6 @@ static void string_with_quotes(void)
 static void utf8_string(void)
 {
     /*
-     * Problem: we can't easily deal with embedded U+0000.  Parsing
-     * the JSON string "this \\u0000" is fun" yields "this \0 is fun",
-     * which gets misinterpreted as NUL-terminated "this ".  We should
-     * consider using overlong encoding \xC0\x80 for U+0000 ("modified
-     * UTF-8").
-     *
      * Most test cases are scraped from Markus Kuhn's UTF-8 decoder
      * capability and stress test at
      * http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
@@ -586,7 +580,7 @@ static void utf8_string(void)
         {
             /* \U+0000 */
             "\xC0\x80",
-            NULL,
+            "\xC0\x80",
             "\\u0000",
         },
         {