Exploring the effect of experience on a recursive neural network model of structural preferences

Fabrizio Costa,1 Paolo Frasconi,1 Patrick Sturt2 & Vincenzo Lombardo3
1 University of Florence, 2 University of Glasgow, 3 University of Turin



Experience-based theories of structural disambiguation preferences (e.g., Mitchell et al., 1995) claim that people disambiguate in a way that is consistent with their past experience of syntactic configurations. If this is correct, it is important to ask which features of linguistic experience inform people's disambiguation preferences. One way to address this question is to test the effect of different linguistic "experiences" on an explicit model, thus generating predictions for human sentence processing.

We have previously presented a recursive neural network model, which is trained to disambiguate by recognising the correct partial tree (henceforth "incremental tree") spanning the sentence from the first word to the current word, given a (usually very large) set of alternatives generated from a large-scale treebank grammar.  The model has been shown to capture some well-known structural preferences in human parsing.  Here we test the effect of linguistic experience on this model.
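The selection task described above can be illustrated with a minimal sketch. It assumes only that the model assigns a numeric score to each candidate incremental tree and that the highest-scoring candidate is chosen; the names `pick_tree`, `correct_choice_rate`, and the toy scoring function are hypothetical and stand in for the trained network, which is not reproduced here.

```python
def pick_tree(candidates, score):
    """Return the candidate incremental tree with the highest score."""
    return max(candidates, key=score)


def correct_choice_rate(items, score):
    """Fraction of items for which the top-scoring candidate is the gold tree.

    `items` is a list of (candidates, gold) pairs.
    """
    hits = sum(pick_tree(candidates, score) == gold for candidates, gold in items)
    return hits / len(items)


# Toy stand-in for the network: prefer shorter bracketings (an assumption
# made purely so the example runs; it is not the model's criterion).
def toy_score(tree):
    return -len(tree)


toy_items = [
    (["(S (NP a) (VP b))", "(S (NP a) (VP (V b)))"], "(S (NP a) (VP b))"),
    (["(S (NP (NP a) (PP b)))", "(S (NP a))"], "(S (NP (NP a) (PP b)))"),
]
toy_rate = correct_choice_rate(toy_items, toy_score)
```

With a real model, `score` would be the network's output for a candidate tree, and the rate corresponds to the "correct choice" percentages reported in the experiments.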

Experiment 1. One network was trained on full incremental trees from a large treebank sample. Another was trained on reduced trees from the same sample, from which we removed all nodes not c-commanding the right frontier of the incremental tree. The tree reduction had no adverse effect on disambiguation performance; in fact, performance improved (81.2% correct choices with full trees vs. 85.0% with reduced trees). This indicates that the model does not rely on information that is deeply embedded beyond the right frontier (cf. Right Roof Constraint; Ross, 1967).
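One possible reading of the tree reduction is sketched below. It rests on two assumptions that the text does not spell out: the nodes on the right frontier are themselves retained, and c-command is approximated by sisterhood, so that only the left sisters of right-frontier nodes survive, stripped of their internal structure. The `Node` class and the functions `reduce_tree` and `to_brackets` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def reduce_tree(node: Node) -> Node:
    """Keep the right frontier and the sisters of right-frontier nodes."""
    if not node.children:
        return Node(node.label)
    *left_sisters, rightmost = node.children
    # Left sisters c-command the frontier below this node; keep their labels
    # but drop everything they dominate.
    kept = [Node(sister.label) for sister in left_sisters]
    # The rightmost child continues the right frontier, so recurse into it.
    kept.append(reduce_tree(rightmost))
    return Node(node.label, kept)


def to_brackets(node: Node) -> str:
    """Render a tree as a labelled bracketing."""
    if not node.children:
        return node.label
    inner = " ".join(to_brackets(child) for child in node.children)
    return f"({node.label} {inner})"


# Incremental tree for the prefix "the dog saw a"
full_tree = Node("S", [
    Node("NP", [Node("D", [Node("the")]), Node("N", [Node("dog")])]),
    Node("VP", [
        Node("V", [Node("saw")]),
        Node("NP", [Node("D", [Node("a")])]),
    ]),
])
reduced = to_brackets(reduce_tree(full_tree))
# reduced == "(S NP (VP V (NP (D a))))"
```

In this toy case the subject NP and the verb lose their internal structure, while the path from the root down to the current word is preserved.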

Experiment 2.  Syntactic disambiguation requires choosing the correct attachment site (anchor), and the correct tree fragment (connection path) to connect the current word with the previous incremental tree.  Experiment 2 showed that the network has very high accuracy in anchor prediction (91.5%), and that, given the correct anchor, its performance can almost be matched by choosing the most frequent connection path.  Thus, the network appears to give priority to its choice of anchor over its choice of connection path.
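The frequency baseline mentioned in Experiment 2 can be sketched as follows, assuming that connection-path frequencies are counted over the training sample and that the candidates are restricted to paths compatible with the correct anchor. The path labels and the function name `most_frequent_path` are hypothetical; the actual conditioning used for the frequency counts is not specified in the text.

```python
from collections import Counter

# Hypothetical connection paths observed in a training sample.
training_paths = ["NP->PP", "VP->PP", "NP->PP", "NP->SBAR", "VP->PP", "NP->PP"]
path_counts = Counter(training_paths)


def most_frequent_path(candidate_paths, counts):
    """Among the paths available at the given anchor, pick the most frequent.

    Paths never seen in training count as zero; ties go to the first candidate.
    """
    return max(candidate_paths, key=lambda path: counts[path])


# Candidates compatible with an (assumed correct) NP anchor.
candidates_at_anchor = ["NP->SBAR", "NP->PP"]
baseline_choice = most_frequent_path(candidates_at_anchor, path_counts)
# baseline_choice == "NP->PP"
```

If such a baseline comes close to the network's accuracy once the anchor is fixed, most of the network's discriminative work lies in the anchor choice, which is the point made above.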

Experiment 3. Several networks were trained, each on a sample of text that was identical except for the relative frequencies of high and low relative clause attachments (0% to 100% low attachment). The resulting networks were tested on a sample of unseen relative clause ambiguities from the treebank. The different biases clearly affected the networks' preferences, although there was also an underlying low-attachment bias; even the network trained with 50% high and 50% low attachments showed a reliable low-attachment preference. A further network, trained on a sample with no relative clause ambiguities, also showed a reliable low-attachment bias. Thus, the network exploits underlying biases as well as experience of specific examples, and it is able to generalise from experience to process novel ambiguities.
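The manipulation of attachment frequencies in Experiment 3 could be set up along the following lines. This is a sketch under stated assumptions: the relative clause items are available in two pools (high and low attachment), a fixed number is drawn for each training sample, and only their proportion varies. The function `build_sample`, the pool contents, and the particular proportions are hypothetical; the remaining, constant part of each training text is omitted.

```python
import random


def build_sample(high_items, low_items, n_items, low_fraction, seed=0):
    """Draw n_items relative clause examples with a given share of low attachments."""
    rng = random.Random(seed)
    n_low = round(n_items * low_fraction)
    n_high = n_items - n_low
    sample = rng.sample(low_items, n_low) + rng.sample(high_items, n_high)
    rng.shuffle(sample)  # avoid presenting all low attachments first
    return sample


# Placeholder pools standing in for treebank relative clause attachments.
high_pool = [("high", i) for i in range(100)]
low_pool = [("low", i) for i in range(100)]

# One training sample per bias condition, from 0% to 100% low attachment.
samples = {
    fraction: build_sample(high_pool, low_pool, 40, fraction)
    for fraction in (0.0, 0.25, 0.5, 0.75, 1.0)
}
```

Training one network per entry in `samples` and testing each on the same held-out ambiguities would then separate the effect of the manipulated frequencies from any bias that persists across conditions.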