[bugfix] Extend parser to handle more non-Latin hashtags (#3700)

* Allow marks after NFC normalization

Includes regression test for the Tamil example from #3618

* Disallow just numbers + marks + underscore as hashtag
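
The two rules in this message can be sketched as below. This is a minimal illustration, not the project's actual implementation: the name isPermittedInHashtag comes from the diff, but its body here is an assumption, isValidHashtag is a hypothetical helper, and the input is assumed to have been NFC-normalized already.

```go
package main

import (
	"fmt"
	"unicode"
)

// isPermittedInHashtag reports whether r may appear inside a hashtag.
// The name is taken from the diff; the body is an assumption: letters,
// numbers, combining marks (those that survive NFC), and underscore.
func isPermittedInHashtag(r rune) bool {
	return unicode.IsLetter(r) ||
		unicode.IsNumber(r) ||
		unicode.IsMark(r) ||
		r == '_'
}

// isValidHashtag is a hypothetical helper. It accepts a tag (without
// the leading '#') only if every rune is permitted and at least one
// rune is a letter, so tags made up solely of numbers, marks, and
// underscores are rejected.
func isValidHashtag(tag string) bool {
	hasLetter := false
	for _, r := range tag {
		if !isPermittedInHashtag(r) {
			return false
		}
		if unicode.IsLetter(r) {
			hasLetter = true
		}
	}
	return hasLetter
}

func main() {
	// The Tamil word below contains combining marks that NFC cannot
	// fold into precomposed letters, so marks must be permitted.
	for _, tag := range []string{"தமிழ்", "go_2025", "2025", "_1", "a-b"} {
		fmt.Printf("%q -> %v\n", tag, isValidHashtag(tag))
	}
}
```

Under these assumptions the Tamil tag and go_2025 are accepted, while 2025, _1, and a-b are rejected.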
Vyr Cossont
2025-01-31 02:42:55 -08:00
committed by GitHub
parent ab758cc233
commit b9e0689359
5 changed files with 48 additions and 37 deletions


@@ -177,7 +177,7 @@ func (p *hashtagParser) Parse(
// Ignore initial '#'.
continue
-case !isPlausiblyInHashtag(r) &&
+case !isPermittedInHashtag(r) &&
!isHashtagBoundary(r):
// Weird non-boundary character
// in the hashtag. Don't trust it.