How does it perform on really messy datasets, like social media comments with emojis, slang, or mixed languages? | discoverkit