
I'm more than willing to forgive ChatGPT for wrong readings of Japanese names.

Even common names can have non-standard readings. Native Japanese speakers mix up name readings too, and sometimes need kana to be sure. Personally, apart from extremely common names, I've yet to encounter a Japanese name whose reading I didn't at least second-guess.

So I don't think ChatGPT is bad in that regard, as long as it can at least offer a few possible readings. Name readings can be so damn arbitrary.



You're right about Japanese names in general, but current LLMs' mistakes can go far beyond the range of possible readings. I've seen errors on the level of 田中 (Tanaka) being rendered as Suzuki--in other words, one common name being replaced with another.

I'm sure this problem can be solved. The linked article suggests a promising approach--more and better Japanese data.


Arbitrary example: 上川. It can be read either うえかわ (Uekawa) or かみかわ (Kamikawa), or even other variants. Which one is it? Depends on where the family is originally from, I guess.

It's not only people's names; it's place names too. Example: https://ja.m.wikipedia.org/wiki/%E5%85%AB%E5%B9%A1
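
To make the distinction in this thread concrete, here's a rough Python sketch of "offer every plausible reading" versus "produce something outside the plausible set". The table and the function names (`CANDIDATE_READINGS`, `candidate_readings`, `is_plausible_reading`) are made up for illustration, and the readings listed are a few common ones, not an exhaustive list.

```python
# Hypothetical lookup table: each written form maps to several known readings.
# Illustrative only; real names and places can have more variants than these.
CANDIDATE_READINGS = {
    "上川": ["うえかわ", "かみかわ"],
    "八幡": ["やはた", "やわた", "はちまん"],
    "田中": ["たなか"],
}


def candidate_readings(written):
    """Return all readings we know for a written name (empty list if unknown)."""
    return list(CANDIDATE_READINGS.get(written, []))


def is_plausible_reading(written, reading):
    """True if the reading is at least one of the known candidates."""
    return reading in candidate_readings(written)


if __name__ == "__main__":
    # Ambiguous name: the honest answer is a list, not a single guess.
    print(candidate_readings("上川"))
    # Picking the wrong candidate is forgivable; this kind of miss is not.
    print(is_plausible_reading("田中", "すずき"))
```

Picking かみかわ when the family says うえかわ is the forgivable kind of mistake; reading 田中 as Suzuki fails even the membership check.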



