In the context of my master’s thesis, I developed a deep learning model to predict natural language inference (NLI, also called textual entailment). In NLI, the objective of a model is to analyse pairs of sentences and to predict whether the first one, called the premise, entails the second one, called the hypothesis. A sentence entails another if all the information contained in the second is also true in the first, that is, if the information in the premise is a superset of that in the hypothesis. In addition to entailment, an NLI model must also be able to predict contradiction between sentences, and to detect when they are unrelated (that is, when they neither entail nor contradict each other, a situation labelled neutral).

Below are typical examples of premises, hypotheses and labels that could be fed to an NLI model to train or test it:

premise: Two boys are playing football in a field.
hypothesis: Children are playing outside.
label: entailment

premise: A man throws a frisbee to his dog.
hypothesis: A man and his dog are laying on the couch.
label: contradiction

premise: Men are sitting at a table.
hypothesis: People are playing cards.
label: neutral
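
In practice, corpora such as SNLI distribute these triples as JSON lines. As a minimal sketch of what loading them looks like (the field names match SNLI’s published format; the file path is only illustrative):

```python
import json

# Each line of an SNLI .jsonl file holds one premise/hypothesis/label triple.
with open("snli_1.0_train.jsonl") as f:  # illustrative path
    for line in f:
        example = json.loads(line)
        premise = example["sentence1"]
        hypothesis = example["sentence2"]
        label = example["gold_label"]  # entailment / contradiction / neutral
        # The label is "-" when annotators did not agree; such pairs are
        # usually discarded before training.
```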

To predict NLI correctly, a model needs to be able to “understand” the meaning of sentences, with all their specificities and subtleties. This is especially difficult, as natural language is complex and often ambiguous. However, thanks to recent advances in deep learning and natural language processing, many new models have been proposed for NLI, with impressive degrees of success. One of them is the Enhanced Sequential Inference Model (ESIM), introduced by Chen et al. in their 2016 paper.
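
At its core, ESIM encodes the premise and the hypothesis with BiLSTMs, soft-aligns the two encodings against each other with attention, and passes the result through a second BiLSTM and a classifier. Below is a minimal PyTorch sketch of the soft-alignment (“local inference”) step only; the function name is mine, and masking of padded positions is omitted for brevity:

```python
import torch
import torch.nn.functional as F

def soft_align(a, b):
    """Soft-align two encoded sentences, as in ESIM (Chen et al.).

    a: (batch, len_a, dim) BiLSTM states of the premise
    b: (batch, len_b, dim) BiLSTM states of the hypothesis
    """
    # Unnormalised alignment score between every premise/hypothesis word pair.
    e = torch.bmm(a, b.transpose(1, 2))            # (batch, len_a, len_b)

    # Each premise word attends over the hypothesis words, and vice versa.
    a_tilde = torch.bmm(F.softmax(e, dim=2), b)    # (batch, len_a, dim)
    b_tilde = torch.bmm(F.softmax(e, dim=1).transpose(1, 2), a)

    # "Enhancement": concatenate each state with its aligned counterpart,
    # their difference and their element-wise product.
    m_a = torch.cat([a, a_tilde, a - a_tilde, a * a_tilde], dim=-1)
    m_b = torch.cat([b, b_tilde, b - b_tilde, b * b_tilde], dim=-1)
    return m_a, m_b
```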

In my thesis, I selected ESIM as a baseline and implemented it with PyTorch. I then modified the model to incorporate two additional lexical entailment metrics, computed from the Word2Hyp and LEAR word embeddings, and called the resulting neural network LEAN, for Lexical Entailment Augmented Network.
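
The exact way the metrics are integrated is described in the thesis; as a hedged illustration of the general idea, the sketch below computes one entailment score per premise/hypothesis word pair from a specialised embedding space. Plain cosine similarity stands in for the actual Word2Hyp and LEAR metrics, and the function name is mine:

```python
import torch
import torch.nn.functional as F

def lexical_entailment_scores(p_vecs, h_vecs):
    """One score per word pair from a lexical-entailment embedding space.

    p_vecs: (batch, len_p, dim) premise word vectors
    h_vecs: (batch, len_h, dim) hypothesis word vectors
    """
    p = F.normalize(p_vecs, dim=-1)
    h = F.normalize(h_vecs, dim=-1)
    # Cosine similarity is only a placeholder here: LEAN's actual scores
    # come from the Word2Hyp and LEAR spaces, which are specialised to
    # capture entailment between words.
    return torch.bmm(p, h.transpose(1, 2))         # (batch, len_p, len_h)
```

One such score matrix per embedding space is then combined with ESIM’s own representations inside the network.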

After training and testing the two models on the SNLI and MultiNLI data sets, I observed that LEAN improved classification accuracy over the ESIM baseline by up to 0.4% on SNLI and 1.5% on MultiNLI.

More information about this project can be found in my master’s thesis, available here. I also wrote a complete literature review on the subject of NLI, essentially an extended version of the related-work section of my thesis. Finally, my implementations of the ESIM and LEAN models can be found on my GitHub page.