Yes, that matcher used to be much more greedy, which resulted in lots of false matches. The export would show broken images for random Unicode sequences that the parser assumed were emoji but actually weren't.

I'm in favor of some support being added for the missed "single-byte" emoji (those that fit in one UTF-16 code unit), since there are quite a few of those, but it looks like it'd be very difficult to get an entirely accurate mapping.

I was unable to come up with a reliable way to match emoji in the same way that Discord does. It's worth noting that Discord's emoji support is particularly extensive: for example, it supports compound emoji, where multiple separate emoji can be combined to render a different one (e.g. an emoji for man, woman, and a child renders the emoji for family). Overall, I just thought it wasn't worth the effort, because missing some emoji was not as bad as incorrectly matching unrelated text and breaking the export. After all, the emoji still gets rendered, just not using Twitter's/Discord's image set.

The most naive approach would be to use our list of emoji (which has over 4000 items in it) to check every series of characters we encounter, prioritizing from longest to shortest. This would also prevent #599 from becoming an issue again.

Lastly, it's also worth noting that our emoji list needs an update as well, since it's missing emoji from Emoji 14.0 and 15.0, which will soon be released on Discord. Getting that list or updating the current one probably won't be as easy as last time: I couldn't find the emoji.json in the new Discord APKs. (There seems to be a problem with that anyways, but that needs a bit more investigating from my side.) There is also this database, but I think it just contains what a version of our emoji list updated to Emoji 14.0 would contain.

Oh wait, I just noticed that this is probably the same as [...]. I will send this comment anyways, no harm in doing so.
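The naive approach mentioned above (check each position against the known emoji list, trying the longest entries first) could be sketched roughly as follows. This is only an illustration under assumptions: `NaiveEmojiMatcher`, `KnownEmoji`, and `Split` are hypothetical names, and the five-entry array stands in for the real list of over 4000 items.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of the exporter's code base.
public static class NaiveEmojiMatcher
{
    // Tiny stand-in for the real emoji list (assumption: entries are UTF-16 strings).
    private static readonly string[] KnownEmoji =
    {
        "\U0001F468\u200D\U0001F469\u200D\U0001F466", // family: man + ZWJ + woman + ZWJ + boy
        "\U0001F468",   // man
        "\U0001F469",   // woman
        "\U0001F466",   // boy
        "\u2642\uFE0F", // male sign followed by the emoji presentation selector
    };

    // Longest entries first, so a compound emoji wins over its components.
    private static readonly string[] ByLengthDescending =
        KnownEmoji.OrderByDescending(e => e.Length).ToArray();

    // Returns the length of the longest known emoji starting at index, or 0 if none.
    private static int MatchLengthAt(string input, int index)
    {
        foreach (var emoji in ByLengthDescending)
        {
            if (index + emoji.Length <= input.Length &&
                string.CompareOrdinal(input, index, emoji, 0, emoji.Length) == 0)
            {
                return emoji.Length;
            }
        }

        return 0;
    }

    // Splits the input into consecutive segments, each flagged as emoji or plain text.
    public static List<(string Text, bool IsEmoji)> Split(string input)
    {
        var result = new List<(string Text, bool IsEmoji)>();
        var plainStart = 0;
        var i = 0;

        while (i < input.Length)
        {
            var length = MatchLengthAt(input, i);
            if (length == 0)
            {
                i++;
                continue;
            }

            // Flush any plain text collected before this emoji.
            if (i > plainStart)
                result.Add((input.Substring(plainStart, i - plainStart), false));

            result.Add((input.Substring(i, length), true));
            i += length;
            plainStart = i;
        }

        if (plainStart < input.Length)
            result.Add((input.Substring(plainStart), false));

        return result;
    }
}
```

Because only exact list entries can match, unrelated symbols are never mistaken for emoji, which is the trade-off discussed above. The cost is a scan of the whole list at every position; with thousands of entries, a trie or a single alternation regex generated from the list would likely be a better fit.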
The current emoji matcher appears to be very crude.

    private static readonly IMatcher StandardEmojiNodeMatcher = new RegexMatcher(
    (this does not match all emojis in Discord but it's reasonably accurate enough)
    Capture any country flag emoji (two regional indicator surrogate pairs)

Unfortunately, it turns out that emoji codepoints are a mess. The Unicode values for Discord's current version of emoji can be found here.

The most glaring issue with the current matcher seems to be the omission of all the emoji that can be expressed in a single UTF-16 code unit, i.e. without a surrogate pair (loosely, "a single byte"), except those from \u2600 to \u26FF.

- Specifically excluding ♂️, ♀️, and ♾️ from being rendered in IgnoredEmojiTextNodeMatcher (I believe Discord used to skip these but the current version seems to have images for them).
- False positives for \u26** non-emoji such as ☭ or ✃.
- Attempting to render surrogate pairs that don't map to a valid Unicode code point.

The first bullet point above is trivial to correct, but the second and third both need a significant amount of hardcoding to fix a relatively small issue. I think at this point it's a matter of deciding how much is worth the effort. It really doesn't help that Unicode's official files are quite messy (and the one I linked isn't even ordered fully by codepoint despite claiming that it is).
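To make the points above a bit more concrete, here is a rough sketch of three checks: detecting a flag as two regional indicator surrogate pairs, telling a range test on \u2600 to \u26FF apart from an allow-list, and rejecting malformed surrogates. Everything here is an assumption for illustration: `EmojiChecks` and its members are hypothetical names, the regex is just one way to express "two regional indicators", and the allow-list is a small hand-picked sample.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helpers for illustration; none of these names come from the exporter.
public static class EmojiChecks
{
    // A flag is two regional indicator symbols (U+1F1E6..U+1F1FF).
    // In UTF-16 each is the high surrogate D83C followed by a low surrogate DDE6..DDFF.
    private static readonly Regex FlagRegex =
        new Regex(@"(?:\uD83C[\uDDE6-\uDDFF]){2}");

    // Small sample of characters in \u2600..\u26FF that are emoji.
    // A real allow-list would be generated from an emoji data file.
    private static readonly HashSet<char> MiscSymbolEmoji = new HashSet<char>
    {
        '\u2600', // ☀ sun
        '\u2614', // ☔ umbrella with rain drops
        '\u2640', // ♀ female sign
        '\u2642', // ♂ male sign
        '\u267E', // ♾ infinity
    };

    public static bool ContainsFlag(string text) => FlagRegex.IsMatch(text);

    // Range test: accepts every character in the block, so ☭ (U+262D) is a false positive.
    public static bool IsInMiscSymbolsRange(char c) => c >= '\u2600' && c <= '\u26FF';

    // Allow-list test: only characters listed as emoji pass.
    public static bool IsKnownMiscSymbolEmoji(char c) => MiscSymbolEmoji.Contains(c);

    // True when every surrogate in the text belongs to a well-formed high/low pair.
    public static bool HasWellFormedSurrogates(string text)
    {
        for (var i = 0; i < text.Length; i++)
        {
            if (char.IsHighSurrogate(text[i]))
            {
                // A high surrogate must be followed immediately by a low surrogate.
                if (i + 1 >= text.Length || !char.IsLowSurrogate(text[i + 1]))
                    return false;

                i++; // skip the low half of the pair
            }
            else if (char.IsLowSurrogate(text[i]))
            {
                // A low surrogate without a preceding high surrogate is malformed.
                return false;
            }
        }

        return true;
    }
}
```

The range test is cheap but produces exactly the kind of false positives listed above, while the allow-list is accurate only as far as its data goes, which is where the hardcoding effort comes in.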