Eigenstate: myrddin-dev mailing list

[PATCH 0/2] Implement bygrapheme()


A week or so ago, Ori suggested to me on IRC that std.bygrapheme()
would be good to have. I'm trying to improve support of zalgo text
in libtermdraw, and I find myself wanting this as well.

This patch has a slightly narrower view than the Unicode spec on
what a “grapheme” is. Unicode's definition of grapheme is context-
and user-dependent. For simplicity of implementation, this patch
treats a grapheme as a codepoint of width > 0 (as determined by
std.cellwidth()), followed by 0 or more codepoints of width 0.

If the argument to bygrapheme() doesn't start with a grapheme, or
if it isn't valid UTF-8, the function will attempt to read off
enough bytes to generate something that would display with positive
width.
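
For illustration only, the rule described above can be sketched in
Python. This is not the code from the patch. cell_width() is a
hypothetical stand-in for std.cellwidth() that only distinguishes
zero from non-zero width using Unicode general categories, and
by_grapheme() is an assumed name. Attaching leading zero-width
codepoints to the next positive-width codepoint is one reading of
the fallback behavior. Invalid UTF-8 is not modelled, because a
Python str is already decoded.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Hypothetical stand-in for std.cellwidth().

    Returns 0 for combining and format characters, 1 otherwise.
    Real terminal widths (e.g. 2 for wide CJK) are ignored, since
    the rule only needs zero versus non-zero.
    """
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    return 1


def by_grapheme(s: str):
    """Yield chunks of s: one positive-width codepoint plus any
    zero-width codepoints that follow it."""
    start = 0
    i = 0
    n = len(s)
    while i < n:
        # Fallback (assumed): input that starts with zero-width
        # codepoints keeps reading until something has width.
        while i < n and cell_width(s[i]) == 0:
            i += 1
        # The positive-width base codepoint, if any remains.
        if i < n:
            i += 1
        # Zero or more trailing zero-width codepoints.
        while i < n and cell_width(s[i]) == 0:
            i += 1
        yield s[start:i]
        start = i
```

Under these assumptions, "e" followed by U+0301 and then "a" splits
into two chunks: the accented "e" and the plain "a".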

I'm coming at this entirely from the perspective of libtermdraw;
hopefully it is not too awkward for other applications.

Also, in order to make the test patch readable, patch 1 removes all
0x00 bytes from lib/std/test/utf.myr, so that git knows to display
diffs correctly.
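
As background, git classifies a file as binary when it finds a NUL
byte near the start of the content. A rough Python sketch of that
heuristic follows. looks_binary() is a hypothetical name, and the
8000-byte window is an assumption based on git's source rather than
a documented guarantee.

```python
def looks_binary(data: bytes, first_few_bytes: int = 8000) -> bool:
    """Approximate git's check: a NUL byte in the leading window
    marks the content as binary (window size is an assumption)."""
    return b"\0" in data[:first_few_bytes]
```

A test file containing literal 0x00 bytes would trip this check,
which is consistent with the "Bin" entry in the diffstat below.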


S. Gilles (2):
  Make lib/std/test/utf.myr a non-binary file
  Implement bygrapheme()

 lib/std/test/utf.myr | Bin 1781 -> 4927 bytes
 lib/std/utf.myr      |  25 +++++++++++++++++++++++++
 2 files changed, 25 insertions(+)

-- 
2.15.0


Follow-Ups:
[PATCH 1/2] Make lib/std/test/utf.myr a non-binary file, "S. Gilles" <sgilles@xxxxxxxxxxxx>
[PATCH 2/2] Implement bygrapheme(), "S. Gilles" <sgilles@xxxxxxxxxxxx>