Apologies, I'm an outsider to the field, but what exactly are you referring to here? The whole vector-space semantic embedding that was popularized by works like word2vec?
I have to wonder if English is really the best language for NLP research. Things like Winograd schemas (sentences where resolving an ambiguous pronoun requires world knowledge rather than grammatical cues), which have attracted a lot of attention, simply aren't possible in other languages.
Why not start working with more structured, agglutinative* languages like Japanese/Korean and the Indic family (Sanskrit especially)?
How about other European languages? Are they better structured empirically? I hear German is very grammatical, and that Hungarian is ... erm, odd?
(* Note: I know the occidental tradition likes to split off the Indic tongues, and that Indo-European languages are not usually classed as agglutinative. I don't subscribe to this view. I use "agglutinative" in the sense of Panini: "particles" sticking to stems/roots/words; phonetic modifications are irrelevant to the grammar.)
> I hear German is very grammatical, and that Hungarian is ... erm, odd?
Just want to point out that "grammatical" probably isn't the word you want here. Every language is grammatical by definition in the sense that there are rules that govern its sound system, word formation system, syntax, etc.
The concept you're getting at, though--that some languages are easier for computer programs and/or speakers of Indo-European languages to understand--is sound.
"Regular" would be the classic linguistics term, would it not? Although computer science limits the term to the use of regular languages in the Chomsky hierarchy sense (that is, more specifically to regular expressions and the languages they describe), I am under the impression linguistics as a whole treats regularity as a multivariate spectrum. Some languages have more regularity in terms of grammar productions or morphology than English.
That points to isolating languages [1], and I think "highly isolating" may be the more useful distinction for this specific example. (Modern English is rather analytic, having dropped most, but not all, inflections in the Middle English era. Mandarin Chinese is much more isolating than Modern English.)
One reason not to is that the amount of available training data in those languages is many orders of magnitude smaller.
FWIW, it seems the structure you're talking about exploiting is at the morphological and syntactic level, which modern language models tend to handle effectively. Semantics are a much harder problem.
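As a toy illustration of why that structure comes cheaply: subword tokenization already splits an agglutinative word into stem and suffix pieces, so the model sees the morphology directly. The greedy longest-match below is a simplified stand-in for BPE/WordPiece with a hand-picked vocabulary, not any real tokenizer; nothing analogous exists for semantics.

    # Hypothetical subword vocabulary mixing Turkish-style and English-style pieces.
    VOCAB = ["ev", "kitap", "ler", "lar", "im", "de", "da", "un", "help", "ful", "ness"]

    def segment(word, vocab=VOCAB):
        """Greedy longest-match segmentation into known subword units."""
        pieces, i = [], 0
        while i < len(word):
            for j in range(len(word), i, -1):   # try the longest piece first
                if word[i:j] in vocab:
                    pieces.append(word[i:j])
                    i = j
                    break
            else:
                pieces.append(word[i])          # unknown character falls through as-is
                i += 1
        return pieces

    print(segment("evlerimde"))      # ['ev', 'ler', 'im', 'de']
    print(segment("unhelpfulness"))  # ['un', 'help', 'ful', 'ness']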
> Things like Winograd schemas, which have attracted a lot of attention, simply aren't possible in other languages.
I do not think that is correct. Anaphora exists in many languages. Check out the Anaphora article on Wikipedia and click through the different language versions; there are example sentences for many languages.
There are translations of the Winograd schemas into a couple of languages. Granted, I found some of the translations a little unnatural, but they are still understandable and expose the problem.