This repository was archived by the owner on Apr 27, 2023. It is now read-only.

Supporting wide character (UTF-16) strings #8

@EnTerr


I needed to unpack zero-terminated strings in which each character is 2 bytes in little-endian (lo-hi) order, even though almost all of the strings are in the ASCII range. So what I added to unpack is:

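    elseif opt == 'S' then   -- wide-character string, lo-hi (little-endian)

      local chars = {}
      while true do
        local lo, hi = stream:byte(iterator, iterator + 1)
        if not hi then
          break                -- truncated input: stop instead of failing on nil
        end
        local wch = lo + 256 * hi
        iterator = iterator + 2
        if wch == 0 then
          break                -- zero terminator
        end
        -- ASCII passes through; anything else becomes a '~' placeholder
        chars[#chars + 1] = wch < 128 and string.char(wch) or '~'
      end
      table.insert(vars, table.concat(chars))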

    elseif

This is the most controversial/unfinished of my mods, since it assumes little-endian encoding (many apps do lo-hi, even though RFC 2781 says to assume big-endian when there is no byte order mark; see https://en.wikipedia.org/wiki/UTF-16#Byte_order_encoding_schemes ). In addition, I don't check for a byte order mark ( https://en.wikipedia.org/wiki/Byte_order_mark ).
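A minimal sketch of how the byte order could be picked up from a BOM instead of being hard-coded. The names `detect_utf16_order` and `read_unit` are hypothetical helpers, not part of this library, and the fallback when no BOM is present is left to the caller as an assumption:

```lua
-- Hypothetical helper (not part of the library): inspects the two bytes at
-- pos and reports the byte order signalled by a UTF-16 byte order mark.
-- Returns big_endian (boolean) and the position of the first real code unit.
local function detect_utf16_order(stream, pos, default_big_endian)
  local b1, b2 = stream:byte(pos, pos + 1)
  if b1 == 0xFE and b2 == 0xFF then
    return true, pos + 2          -- U+FEFF stored hi-lo: big-endian
  elseif b1 == 0xFF and b2 == 0xFE then
    return false, pos + 2         -- U+FEFF stored lo-hi: little-endian
  end
  return default_big_endian, pos  -- no BOM: keep the caller's default
end

-- Hypothetical helper: reads one 16-bit code unit in the given byte order.
-- Returns nil if fewer than two bytes remain.
local function read_unit(stream, pos, big_endian)
  local b1, b2 = stream:byte(pos, pos + 1)
  if not b2 then return nil end
  if big_endian then return b1 * 256 + b2 end
  return b1 + 256 * b2
end
```

With something like this, the 'S' branch could call `detect_utf16_order` once at the start of the string and then use `read_unit` in the loop, so both lo-hi and hi-lo data would go through the same code path.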

Nor do I correctly handle code points above 127: every one of them is replaced with '~'. Which is a puzzle: how should that be handled correctly in Lua? I am guessing the right thing would be to convert to UTF-8 for the internal string (which matches ASCII for code points below 128). In any case, this is not production ready, but the need exists.
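One way the UTF-8 idea could look, as a sketch rather than a finished patch. `utf8_encode` and `unpack_utf16le_z` are hypothetical names, the input is assumed to be little-endian with no BOM, and unpaired surrogates are replaced with U+FFFD, which is a choice made for this example. Only plain arithmetic is used, so it should run on Lua 5.1 as well as later versions:

```lua
-- Hypothetical helper: encodes one Unicode code point (0..0x10FFFF) as UTF-8.
local function utf8_encode(cp)
  local floor, char = math.floor, string.char
  if cp < 0x80 then
    return char(cp)                              -- 1 byte, identical to ASCII
  elseif cp < 0x800 then
    return char(0xC0 + floor(cp / 0x40), 0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    return char(0xE0 + floor(cp / 0x1000),
                0x80 + floor(cp / 0x40) % 0x40,
                0x80 + cp % 0x40)
  end
  return char(0xF0 + floor(cp / 0x40000),
              0x80 + floor(cp / 0x1000) % 0x40,
              0x80 + floor(cp / 0x40) % 0x40,
              0x80 + cp % 0x40)
end

-- Hypothetical helper: decodes a zero-terminated little-endian UTF-16 string
-- starting at pos. Returns the UTF-8 string and the position just past the
-- terminator (or the position where truncated input ended).
local function unpack_utf16le_z(stream, pos)
  local out = {}
  while true do
    local lo, hi = stream:byte(pos, pos + 1)
    if not hi then break end                     -- truncated input
    pos = pos + 2
    local unit = lo + 256 * hi
    if unit == 0 then break end                  -- zero terminator
    if unit >= 0xD800 and unit < 0xDC00 then
      -- high surrogate: combine with the following low surrogate if present
      local lo2, hi2 = stream:byte(pos, pos + 1)
      local low = hi2 and (lo2 + 256 * hi2)
      if low and low >= 0xDC00 and low < 0xE000 then
        pos = pos + 2
        unit = 0x10000 + (unit - 0xD800) * 0x400 + (low - 0xDC00)
      else
        unit = 0xFFFD                            -- unpaired high surrogate
      end
    elseif unit >= 0xDC00 and unit < 0xE000 then
      unit = 0xFFFD                              -- stray low surrogate
    end
    out[#out + 1] = utf8_encode(unit)
  end
  return table.concat(out), pos
end
```

In the 'S' branch this would amount to replacing the `'~'` fallback with `utf8_encode(wch)`, plus the surrogate handling for code points above U+FFFF.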
