
2018-05-24

2018-05-24 14:55:37 by Henrik Grubbström (Grubba) <grubba@grubba.org>

Sql.mysql: Use/support UTF-8 encoded UTF-16.

MySQL/MariaDB default to a "utf8" character set that can only
encode the BMP (at most 3 bytes per character). MySQL/MariaDB 5.5
and later provide an additional character set, "utf8mb4", that also
supports code points outside the BMP. Using this newer character
set, however, requires redefining existing tables, etc.

As a work-around we instead keep using the "utf8" character set by
default, and encode characters outside the BMP as surrogate pairs.
This works seamlessly with old table definitions, with the minor
defect that characters outside the BMP do not collate as single
characters.

Fixes PIKE-112 (#8112).
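
The work-around described in the commit message can be illustrated outside Pike. The following is a minimal Python sketch, not the Pike implementation: the function name `utf8_encode_utf16` is hypothetical, and the approach shown (split each non-BMP code point into a UTF-16 surrogate pair, then UTF-8 encode each half) is the general technique the message calls "UTF-8 encoded UTF-16".

```python
def utf8_encode_utf16(text: str) -> bytes:
    """Encode text as UTF-8, but split code points outside the BMP
    into UTF-16 surrogate pairs first (hypothetical helper)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Standard UTF-16 surrogate computation.
            cp -= 0x10000
            high = 0xD800 | (cp >> 10)
            low = 0xDC00 | (cp & 0x3FF)
            for half in (high, low):
                # "surrogatepass" lets Python emit the 3-byte form
                # of a surrogate, which strict UTF-8 forbids.
                out += chr(half).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


smiley = "\U0001F600"  # a code point outside the BMP
print(len(smiley.encode("utf-8")))     # 4 bytes in standard UTF-8
print(utf8_encode_utf16(smiley).hex())  # two 3-byte sequences
```

Every sequence produced this way is at most 3 bytes long, which is why it fits within the limits of the "utf8" character set. The trade-off is the one the message names: the server sees two characters, not one, so such characters do not collate as single characters.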

421:
   ])));
 }

-string utf8_encode_query (string q, function(string:string) encode_fn)
+string utf8_encode_query (string q,
+                          function(string, mixed|void...:string) encode_fn,
+                          mixed ... extras)
 //! Encodes the appropriate sections of the query with @[encode_fn].
 //! Everything except strings prefixed by an introducer (i.e.
 //! @expr{_something@} or @expr{N@}) is encoded.

430:
   string e = "";
   while (1) {
     sscanf(q, "%[^\'\"]%s", string prefix, string suffix);
-    e += encode_fn (prefix);
+    e += encode_fn (prefix, @extras);

     if (suffix == "") break;

526:
       }
       e += s;
     } else {
-      e += encode_fn (suffix[..end]);
+      e += encode_fn (suffix[..end], @extras);
     }

     q = suffix[end+1..];

685:
      */ \
     if ((send_charset == "utf8") || !_can_send_as_latin1(query)) { \
       CH_DEBUG ("Converting query to utf8.\n"); \
-      query = utf8_encode_query (query, string_to_utf8); \
+      query = utf8_encode_query (query, string_to_utf8, 2); \
       new_send_charset = "utf8"; \
     } \
   } \
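
The diff threads extra arguments through `utf8_encode_query` to the encoder, so that the caller can select the surrogate-pair mode by passing `2`. The pattern can be sketched in Python as follows. This is a heavily simplified illustration under stated assumptions: `encode_query` and `to_utf8` are hypothetical names, the sketch splits the query at quote characters and encodes every piece, and it does not handle the introducer-prefixed strings that the real function leaves unencoded.

```python
import re
from typing import Callable


def to_utf8(text: str, extended: int = 0) -> bytes:
    """Hypothetical encoder: if bit 2 of `extended` is set, split
    non-BMP code points into surrogate pairs before UTF-8 encoding."""
    if extended & 2:
        text = "".join(
            chr(0xD800 | ((ord(c) - 0x10000) >> 10))
            + chr(0xDC00 | ((ord(c) - 0x10000) & 0x3FF))
            if ord(c) > 0xFFFF else c
            for c in text
        )
    return text.encode("utf-8", "surrogatepass")


def encode_query(query: str, encode_fn: Callable[..., bytes],
                 *extras) -> bytes:
    """Encode each piece of the query with encode_fn, forwarding any
    extra arguments unchanged (simplified: no introducer handling)."""
    pieces = re.split(r"(['\"])", query)
    return b"".join(encode_fn(piece, *extras) for piece in pieces)


query = "SELECT '\U0001F600'"
print(encode_query(query, to_utf8))     # standard UTF-8 bytes
print(encode_query(query, to_utf8, 2))  # surrogate-pair bytes
```

Forwarding the extras leaves the query encoder unaware of the encoder's options, so existing callers that pass only the encoder function keep their behavior.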