View Issue Details

ID: 0027886
Project: Runner
Category: HTML5
View Status: Public
Last Update: 2019-05-09 15:49
Reporter: YellowAfterlife
Assigned To: Mike Dailly
Priority: Medium
Severity: B - Major
Reproducibility: 100%
Status: Closed
Resolution: Fixed
Platform: Windows
OS: Windows 8
OS Version: 8.1
Product Version:
Target Version:
Fixed in Version:
Summary: 0027886: HTML5: buffer_string and string_byte_ functions are not UTF-8 aware (implementation included)
Description: Currently buffer_read(_, buffer_string), buffer_write(_, buffer_string, _), string_byte_at(_, _), and string_byte_length(_) are not UTF-8 aware, which renders them useless for interoperation with anything that expects the client to handle UTF-8 correctly.

buffer_read and buffer_write use 16-bit integers for char codes, which means that any glyphs of 3 or more bytes are lost on write, even if no external code is involved.

string_byte_ functions return regular (char) length/codes, which makes them useless for their purpose.
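
To illustrate the gap between character length and UTF-8 byte length, here is a minimal JavaScript sketch. It assumes that strings in the HTML5 runner are ordinary JavaScript (UTF-16) strings; `utf8ByteLength` is a hypothetical helper name, not part of the runner.

```javascript
// Hypothetical helper: counts the bytes a string would occupy in UTF-8.
// Assumption: input is a regular JavaScript (UTF-16) string.
function utf8ByteLength(str) {
    let bytes = 0;
    // for...of walks the string by code point, so a surrogate pair
    // is visited once rather than as two 16-bit units.
    for (const ch of str) {
        const cp = ch.codePointAt(0);
        if (cp < 0x80) bytes += 1;          // U+0000..U+007F
        else if (cp < 0x800) bytes += 2;    // U+0080..U+07FF
        else if (cp < 0x10000) bytes += 3;  // U+0800..U+FFFF
        else bytes += 4;                    // U+10000..U+10FFFF
    }
    return bytes;
}

// One glyph of each width: "A", U+00E9, U+20AC, U+1F600.
const sample = "A\u00E9\u20AC\u{1F600}";
console.log(sample.length);          // UTF-16 code units: 5
console.log(utf8ByteLength(sample)); // UTF-8 bytes: 10
```

The two numbers differ for any string containing non-ASCII glyphs, which is the mismatch described above.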

Attached is a project with a test case (reading/writing 1, 2, 3, 4 byte glyphs; polling byte length; polling bytes) to highlight the issues.

Also included is a GML-only implementation of the proposed way of dealing with the issue (UTF-8 range checks). Doing so is standard practice (I believe that the runtime already works like this on native) and produces results that are both accurate and fast enough when implemented at the runtime level.
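
For reference, the range-check approach can be sketched in JavaScript roughly as follows. This is not the attached GML implementation or the runner's code; `utf8Encode` and `utf8Decode` are hypothetical names, and the decoder assumes well-formed input (no validation of continuation bytes or overlong forms).

```javascript
// Hypothetical sketch: encode a JavaScript string to an array of UTF-8 bytes
// by checking which range each code point falls into.
function utf8Encode(str) {
    const out = [];
    for (const ch of str) {
        const cp = ch.codePointAt(0);
        if (cp < 0x80) {
            out.push(cp);
        } else if (cp < 0x800) {
            out.push(0xC0 | (cp >> 6), 0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out.push(0xE0 | (cp >> 12),
                     0x80 | ((cp >> 6) & 0x3F),
                     0x80 | (cp & 0x3F));
        } else {
            out.push(0xF0 | (cp >> 18),
                     0x80 | ((cp >> 12) & 0x3F),
                     0x80 | ((cp >> 6) & 0x3F),
                     0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical sketch: decode UTF-8 bytes back to a string.
// Assumption: the byte sequence is well-formed UTF-8.
function utf8Decode(bytes) {
    let result = "";
    let i = 0;
    while (i < bytes.length) {
        const b = bytes[i];
        let cp, extra;
        if (b < 0x80)      { cp = b;        extra = 0; } // 1-byte sequence
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; } // 2-byte lead
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; } // 3-byte lead
        else               { cp = b & 0x07; extra = 3; } // 4-byte lead
        // Fold in the low 6 bits of each continuation byte.
        for (let k = 1; k <= extra; k++) {
            cp = (cp << 6) | (bytes[i + k] & 0x3F);
        }
        result += String.fromCodePoint(cp);
        i += extra + 1;
    }
    return result;
}
```

Under these assumptions, `utf8Encode("\u20AC")` yields the bytes `0xE2, 0x82, 0xAC`, and `utf8Decode(utf8Encode(s))` returns `s` for any string without unpaired surrogates.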
Additional Information
Tags: No tags attached.
1.4 Found In: 1.4.17
2.x Runtime Found In: 2.1.0.144
2.x Runtime Verified In: 9.9.1.1431

Activities

YellowAfterlife

2017-09-21 14:14

Developer  

a_bug.yyz (773,494 bytes)

Russell Kay

2018-05-18 15:22

Manager   ~0059839

This appears to have been fixed previously (I have a vague memory of Mike doing this in the past); the bug just had not been marked as resolved.