I recently read a post by Ben Afia asking the community how to handle the scenarios where artificial intelligence, in the context of NLP, falls down when it comes to the emotional meaning behind language.
In his post, Ben highlighted that British people tend to communicate in a passive-aggressive manner. To give you context, let's look at the sentence “That’s very interesting”. If you say this to a Swede, the Swede would probably think “He’s impressed”, yet a British person would probably assume “I don’t like it”. Same sentence, different context. In this article I am not explaining the use of simple entity mapping in the dispatching model, which allows us to map multiple words to one entity. For example, you could use the dispatcher to map every instance of "hood" to "trunk", which would work for this use case. This should be your first option in most cases.
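For readers who have not seen entity mapping before, here is a minimal sketch of the idea in plain Python. It is not how any particular NLP service implements it; the table name CANONICAL_ENTITIES and the function normalise_entities are hypothetical, and the "boot" entry is my own addition alongside the "hood" example above.

```python
import re

# Hypothetical synonym table: regional word -> canonical entity value.
# "hood" -> "trunk" is the example from the text; "boot" is an assumed extra entry.
CANONICAL_ENTITIES = {
    "hood": "trunk",
    "boot": "trunk",
}


def normalise_entities(sentence: str) -> str:
    """Replace each known regional word with its canonical entity value."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        # Whole words only, so "childhood" is left untouched.
        return CANONICAL_ENTITIES.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", swap, sentence)
```

Running the sentence through normalise_entities before it reaches a single model means that model only ever has to learn the word "trunk".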
Given that we should all be striving to build more human, customer-focused brands and, by extension, chatbots that support the brand, Ben asked what approaches chatbot developers take to solve this. I found this interesting as I have come across similar scenarios in NLP before. The requirement, as I understood it, is to understand the context of a sentence. As he already identified, though, the locale can dramatically change the meaning. In my humble opinion, the key thing to remember is that this applies only to some sentences. Many times the context may not differ at all. In Ben's use case, his sentences are in English, so I always start with the ISO 639-1 language code, which is "en" for English.
For my fictitious use case, let's assume our client is a gym, and to start, I am assuming the question I need to understand is "Where can I keep my personal belongings when I go to the gym?" The client has told us that the company policy is to lock them up in your trunk or bonnet, so the response seems simple until you deal with the issues Ben highlighted.
Let's take the sentence "Lock your personal belongings in the trunk", which is what the company policy is. Except I grew up in South Africa, and my response would be "Do I look like an elephant? I don't have a trunk". In South Africa, it's called a boot. These variances are what we are trying to map here. Assuming you are an inclusive company, you can't just say "To hell with those pesky South Africans", so what do you do?
A first take may lead us to create one model that understands all the other English-speaking countries and then a separate model for South Africa. Problem solved, right? Sadly, no, because what happens when you find out that what Americans call a car hood is a car bonnet elsewhere? Now you need three NLP models, and so on. If you go down this path, you end up with a lot of duplication. In no time at all, this becomes unsustainable.
So to solve this, I personally chose to implement a dispatcher. The dispatcher does NOT try to work out the context. It aggregates the responses of all three models. It takes our sentence (input or output) and passes it to our base model and to the two variance models we identified. In our case, UK English is our base model. The dispatcher then uses all the LUIS (Language Understanding) models (Base, USA, SA) to determine which model best matches the user input or output.
Take the request "What does an international gym membership cost?". The client has told us it's 100 US dollars regardless of where you live. The base model therefore gives us a very high probability that the user wants the cost of an international membership, and so we can answer that it's 100 dollars (I'm assuming the company has one global flat rate for simplicity). The other two models (SA and USA) would return a very low probability because they were never trained on this sentence composition, given that we never identified a variance for it in our training data. The action we perform is therefore based on the highest percentage match, and the SA and USA model responses are discarded in this instance.
Next, take the request "Do I need to lock my personal belongings in my hood?". When the dispatcher passes this to the three models, the base model would only understand that we want to know how to lock something up. It would not understand "hood", because neither the base model nor the SA model was trained on that word. The USA model would understand "hood", though, so its percentage match is higher. Mapping the base model and USA model responses together gives us a far better understanding of the actual context.
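The dispatcher logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword scorers are toy stand-ins for trained LUIS models (they do not call LUIS at all), and the names Prediction, make_keyword_model, dispatch and the "StoreBelongings" intent are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Prediction:
    model: str    # which model produced this result
    intent: str   # what the model thinks the user wants
    score: float  # confidence between 0.0 and 1.0


Model = Callable[[str], Prediction]


def make_keyword_model(name: str, intent: str, keywords: List[str]) -> Model:
    """Toy stand-in for a trained model: score is the fraction of keywords found."""

    def predict(sentence: str) -> Prediction:
        words = set(sentence.lower().replace("?", "").split())
        hits = sum(1 for keyword in keywords if keyword in words)
        return Prediction(name, intent, hits / len(keywords))

    return predict


def dispatch(sentence: str, models: List[Model]) -> Prediction:
    """Send the sentence to every model and keep the highest-scoring answer."""
    predictions = [model(sentence) for model in models]
    return max(predictions, key=lambda p: p.score)


# Assumed vocabulary per model: each one only knows its own regional word.
MODELS = [
    make_keyword_model("base", "StoreBelongings", ["lock", "belongings", "trunk"]),
    make_keyword_model("usa", "StoreBelongings", ["lock", "belongings", "hood"]),
    make_keyword_model("sa", "StoreBelongings", ["lock", "belongings", "boot"]),
]

best = dispatch("Do I need to lock my personal belongings in my hood?", MODELS)
```

Here the base and SA stand-ins each match two of their three keywords, while the USA stand-in matches all three, so the dispatcher keeps the USA result. Note that the dispatcher itself never interprets the sentence; it only compares scores.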
At this point, all the logic is mapped to variances in the sentences, but as Ben says, the nuances would not be picked up if the sentences are the same, and that is where NLP falls short. An example of how this approach works for us is the question "How much is a Gym membership?". Our client has given us a price list of $100 in the USA, £80 in the UK and R1000 in South Africa. Simply passing this sentence to the base model would not enable us to give the user the correct response, though, because our reply of "Membership costs £80 a month, due on the first." would not be correct for a customer in any country other than the UK.
In our case, we also use the dispatcher to collect other inputs, and we pass variables like the language code into it. So let's assume we called the call centre from the USA or were browsing from the USA. In this case, "en-us" is passed to the dispatcher, and as the base and the USA model were both trained to understand the context, the matching values for the two models would be very close. To help in these instances where both models are fairly confident they match, we use the ISO codes to judge which one to use for the correct context.
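A minimal sketch of that tie-break, assuming the model scores have already been collected. The names LOCALE_TO_MODEL, REPLIES and pick_model, the 0.1 margin, and the "en-gb" and "en-za" codes are my own assumptions for illustration; the USA and South African reply wording is also assumed, and only the three prices come from the client's price list.

```python
from typing import Dict, Optional

# Hypothetical mapping from the locale code passed into the dispatcher to a model.
LOCALE_TO_MODEL = {"en-gb": "base", "en-us": "usa", "en-za": "sa"}

# Replies per model, using the prices from the client's price list.
REPLIES = {
    "base": "Membership costs £80 a month, due on the first.",
    "usa": "Membership costs $100 a month, due on the first.",
    "sa": "Membership costs R1000 a month, due on the first.",
}


def pick_model(scores: Dict[str, float],
               locale: Optional[str] = None,
               margin: float = 0.1) -> str:
    """Pick the best-scoring model, letting the locale break near-ties."""
    best = max(scores, key=scores.get)
    preferred = LOCALE_TO_MODEL.get((locale or "").lower())
    # Only let the locale win when its model is within `margin` of the best score.
    if preferred in scores and scores[best] - scores[preferred] <= margin:
        return preferred
    return best


# Both models are confident, so the locale decides which reply to send.
scores = {"base": 0.92, "usa": 0.90, "sa": 0.10}
reply = REPLIES[pick_model(scores, locale="en-us")]
```

The margin keeps the locale from overriding a model that is clearly the better match, which is why the flat-rate international membership question would still be answered by the base model.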
Of course, this doesn't solve the issue of homonyms, which are words that have the same spelling and pronunciation but different meanings. A great example is left (the verb) and left (the noun). Bet you didn't think of that, Ben.
I would be curious to see how other developers approach this, so feel free to leave comments.