The length of ContentString is wrong when the content contains emoji

Hi, the getLength() function of ContentString returns the wrong length when the field str contains emoji. I think this is a JavaScript problem, not yours, but you should use the following style of writing:

  let str = this.str
  return [...str].length

Test code:

  var str = "👶"
  console.log(str.length) // output 2
  console.log([...str].length) // output 1

Hi @sky_terra,

It’s not wrong. It returns the length of the UTF-16 encoded string (internally just an array of 16-bit code units). Many people (like you) prefer to calculate the size using grapheme clusters instead. The downside is that repeatedly calculating grapheme clusters is fairly expensive. Hence most programming deals with lengths in UTF-8/16/32 code units instead of grapheme clusters.
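
To make the three ways of counting concrete, here is a minimal sketch in plain JavaScript. The helper names (utf16Length, codePointLength, graphemeLength) are made up for illustration and are not part of Yjs. It assumes a runtime that ships Intl.Segmenter (recent browsers and Node.js versions). Note that the spread form [...str] iterates code points, so it matches the grapheme count for a single emoji like 👶 but not for sequences joined with zero-width joiners.

```javascript
// Hypothetical helpers for illustration only; none of these are Yjs APIs.

// Number of UTF-16 code units (what String.prototype.length reports).
function utf16Length(str) {
  return str.length;
}

// Number of Unicode code points (the string iterator yields code points).
function codePointLength(str) {
  return [...str].length;
}

// Number of grapheme clusters, assuming Intl.Segmenter is available.
function graphemeLength(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let count = 0;
  for (const _segment of segmenter.segment(str)) {
    count++;
  }
  return count;
}

const baby = "\u{1F476}"; // 👶
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"; // 👨‍👩‍👧 (three emoji joined by ZWJ)

console.log(utf16Length(baby), codePointLength(baby), graphemeLength(baby)); // 2 1 1
console.log(utf16Length(family), codePointLength(family), graphemeLength(family)); // 8 5 1
```

The grapheme count needs a segmentation pass over the whole string on every call, which is the cost mentioned above, while the UTF-16 length is simply read from the string.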

Personally, I’m not a strong advocate of either approach. However, the truth is that literally all web editors deal with UTF-16 code units and not with grapheme clusters. Hence it makes sense to use the conventional approach in Yjs as well. Yrs will support both grapheme cluster size and code point size.


Thanks for your detailed reply, it is very important information for me to understand Yjs.