Technology assessment of the current major chatbot offerings

hyang · February 7, 2019, 10:45pm

I spent some time today studying the documentation of chatbot offerings from a few major technology vendors: Amazon Lex, Facebook Messenger, Google DialogFlow, IBM Watson, and BotKit, recently acquired by Microsoft. My ranking of these platforms in terms of technical capability is the following: IBM Watson > Microsoft BotKit > Google DialogFlow = Amazon Lex > Facebook Messenger.

My assessment is based purely on the documentation. Since I have not tried any of the listed systems, you need to take my opinion with a giant grain of salt. An additional disclaimer: I am one of the main developers of Juji, a chatbot creation platform, so I will comment on the comparison with Juji wherever appropriate.

Agency

The first technical capability I look for is what I call agency, i.e., is the platform meant for building real agents that have their own execution loops? There is an important distinction between bots that just respond to user input or external events and those that run their own execution loops. The former share a system event loop and are reactive; the latter give each agent its own execution loop, so the agent can potentially be proactive, i.e., act on its own or have its own mind, so to speak. It is the latter that should be regarded as real agents.

Among the technologies listed above, only IBM Watson bots seem to have agency. In a Watson bot, the dialog is defined as trees of nodes, where each node is a production rule (i.e., If-Then). The developer does not control the execution of the dialog; instead, the agent itself runs an execution loop that goes through these trees and acts accordingly. In principle, this enables the agent to be proactive. I do not know whether IBM Watson actually fires rules proactively in practice, but this is precisely how a Juji bot works. Each Juji bot actually has two execution loops, one reactive and one proactive, that run simultaneously, so the bot may speak at any time on its own, not just react to user input. The rest of the offerings are either deployed as webhooks (DialogFlow, Lex) or run as callback handlers (BotKit). These are all reactive bots that cannot act on their own.
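
To make the distinction concrete, here is a minimal sketch in Python. It is not taken from Watson, Juji, or any other product; the names `Rule`, `Agent`, and `demo_agent` are hypothetical, the rules live in a flat list rather than a tree, and a simple tick counter stands in for real time. The point is only that the agent owns the loop: on every pass it checks its rules, so a rule can fire when the user has said nothing at all.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class Rule:
    """A production rule: if condition(state) holds, run action(state)."""
    name: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], str]


@dataclass
class Agent:
    """Hypothetical agent that runs its own loop over a list of rules."""
    rules: List[Rule]
    state: dict = field(
        default_factory=lambda: {"idle_ticks": 0, "last_input": None}
    )
    inbox: Deque[str] = field(default_factory=deque)
    outbox: List[str] = field(default_factory=list)

    def tick(self) -> None:
        """One pass of the loop: absorb any input, then try the rules."""
        if self.inbox:
            self.state["last_input"] = self.inbox.popleft()
            self.state["idle_ticks"] = 0
        else:
            self.state["last_input"] = None
            self.state["idle_ticks"] += 1
        for rule in self.rules:
            if rule.condition(self.state):
                self.outbox.append(rule.action(self.state))
                break  # fire at most one rule per tick

    def run(self, ticks: int) -> None:
        """The agent's own execution loop (ticks stand in for real time)."""
        for _ in range(ticks):
            self.tick()


def demo_agent() -> Agent:
    """Build an agent with one reactive rule and one proactive rule."""
    rules = [
        # Reactive: fires only when the user has just said something.
        Rule("reply",
             lambda s: s["last_input"] is not None,
             lambda s: f"You said: {s['last_input']}"),
        # Proactive: fires after three silent ticks, with no user input.
        Rule("nudge",
             lambda s: s["idle_ticks"] == 3,
             lambda s: "Are you still there?"),
    ]
    return Agent(rules)
```

If you put "hello" in the inbox and run the agent for five ticks, the outbox ends up with a reply followed by an unprompted nudge. A webhook or callback handler has no place to put the second rule, because its code only runs when a message arrives.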

Abstraction

The second technical aspect I examine is the abstraction of conversation embedded in the system. One factor I look at is the concept of natural language understanding (NLU) used. The majority of NLU systems are modeled on the concept of intent, which refers to what the user wants to do. This reflects a fundamental bias of these systems, which originated in academic research where the goal is to help users accomplish certain tasks; hence understanding the intent of user utterances is central.

I regard the reliance on intent as a severe limitation, because human-bot conversation may not be about the user’s intent at all. For example, what about the bot’s intent? Considering only the user’s intent limits the application of bots to some boring application domains such as customer support, question answering, internet of things, or e-commerce, where users ask questions that bots try to answer or voice wishes that bots try to fulfill. For more interesting applications such as marketing research, job interviews, gaming characters, customer on-boarding, educational companions, mental health assistants, and so on, the chatbots need to have their own agenda, which the commonly used intent concept simply does not cover. Among the systems, the only exception is BotKit, where intent is not explicitly hard-coded in the system. On this front, Juji goes a step further: a Juji bot can have a complex and explicit agenda that goes beyond the simple intent of either the user or the bot.

Another factor in the conversation abstraction is the unit of dialog considered in the system. Most systems treat a turn as the basic unit of conversation. This is too fine-grained, because the developers of the bot then have to think of all the possible user utterances at each turn and respond to them accordingly. This is a task developers are not well suited to, and the system should give them as much help as possible. BotKit is again the only solution that introduces a higher-level concept. BotKit has the concept of a thread, which handles a sequence of turns. However, this is not good enough, because a thread can only be executed, and the only thing one can do with threads is to jump among them.

Juji’s abstraction is called a topic, which may contain zero, one, or multiple turns. Most importantly, a topic is a first-class citizen in the Juji platform: one can create a topic on the fly, pass arguments to it, pass a topic around, look up a topic, and do all kinds of things with it. This flexibility enables Juji to supply a large library of reusable mini-conversations (represented as topics, of course) that users can simply compose into a full bot. Developers are largely relieved of the burden of trying to anticipate the user’s next input, because Juji has many reusable topics that handle all kinds of user digressions and misbehaviors that a developer is not well equipped to anticipate.
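
As an illustration of what "first-class" buys you, here is a small Python sketch. This is not Juji's actual API or syntax; `Topic`, `LIBRARY`, `register`, `make_ask_topic`, and `compose` are all hypothetical names, and a topic here only models the bot's side of the conversation. It shows a topic being created on the fly, taking arguments, being looked up by name, and being composed with others.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Topic:
    """Hypothetical topic: a named value producing zero or more bot turns."""
    name: str
    turns: Callable[..., List[str]]

    def __call__(self, **kwargs) -> List[str]:
        # Arguments are passed straight through to the topic body.
        return self.turns(**kwargs)


# A library of reusable topics that can be looked up by name.
LIBRARY: Dict[str, Topic] = {}


def register(topic: Topic) -> Topic:
    """Add a topic to the library and return it."""
    LIBRARY[topic.name] = topic
    return topic


def make_ask_topic(field_name: str) -> Topic:
    """Create a one-turn topic on the fly."""
    return Topic(f"ask-{field_name}",
                 lambda **_: [f"What is your {field_name}?"])


def compose(name: str, *topics: Topic) -> Topic:
    """Build a bigger topic by running smaller topics in order."""
    def turns(**kwargs) -> List[str]:
        out: List[str] = []
        for topic in topics:
            out.extend(topic(**kwargs))
        return out
    return Topic(name, turns)


# A topic that takes an argument, and a topic with zero turns.
register(Topic("greet", lambda user="there", **_: [f"Hello, {user}!"]))
register(Topic("noop", lambda **_: []))
```

With these definitions, `compose("interview", LIBRARY["greet"], make_ask_topic("role"), LIBRARY["noop"])` gives a new topic that can itself be registered, passed around, or composed again. A thread that can only be executed and jumped to offers none of these operations.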

NLP

With the popularity of deep learning (DL) and machine learning (ML) technology, it is not surprising that most of the systems have natural language processing (NLP) capabilities based on them. In my opinion, pursuing competitive advantage in raw NLP performance alone is a rather futile exercise, because these technologies are rapidly being commoditized and the differences among vendors are minimal. However, they are must-haves in a chatbot platform. As far as I can tell, BotKit is the only one that does not currently have these DL/ML-based NLP capabilities, but I am sure Microsoft will remedy that soon enough.

The real competitive advantage is the ease and speed with which a new NLP model can be deployed in production. Here DialogFlow and Lex seem to be very capable, as they are effectively plumbing mechanisms for data flows, so new NLP models should be easy to plug in. The NLP integration story of Watson and Facebook (wit.ai) is not as clear, because the NLP capability seems to be part of the system. That is actually a weakness, because one wants to iterate on NLP models often and fast. Juji has a complete story on NLP integration. One can use Juji’s built-in NLP models, run one’s own code in Juji’s sandbox, or call out to third-party code easily.

In summary, I consider agency to be the most important aspect of building AI. Without agency, there is no real AI; therefore Watson is ranked first among the major technology vendors. Proper abstraction is also very important for building conversational agents; BotKit is on the right path, hence its second place.

NLP should be comparatively the least important factor when one considers choosing a chatbot platform, because the distinction in NLP is minimal at best and ephemeral in any case: all the good ideas are published as papers and everyone can implement them. Also, major vendors all freely release their data sets publicly, because NLP is not the core business of any of them, so they have all chosen to commoditize the complement.
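
The design point here is that the dialog logic should depend only on a narrow interface to the NLP model, so the model can be replaced without touching the bot. A minimal Python sketch of that idea follows. It does not reflect how any of the platforms above are implemented; `NluModel`, `keyword_model`, `length_model`, and `Bot` are hypothetical, and the two toy "models" are stand-ins for real ones.

```python
from typing import Callable

# Assumed interface: an NLU model is any function from utterance to label.
NluModel = Callable[[str], str]


def keyword_model(text: str) -> str:
    """Toy stand-in for a built-in model: label by keyword."""
    return "greeting" if "hello" in text.lower() else "other"


def length_model(text: str) -> str:
    """Toy stand-in for a newer model: label by utterance length."""
    return "long" if len(text.split()) > 5 else "short"


class Bot:
    """Hypothetical bot whose dialog code never looks inside the model."""

    def __init__(self, nlu: NluModel) -> None:
        self.nlu = nlu

    def swap_model(self, nlu: NluModel) -> None:
        """Deploy a new model without changing any dialog logic."""
        self.nlu = nlu

    def handle(self, text: str) -> str:
        """Tag the utterance with whatever the current model returns."""
        return f"[{self.nlu(text)}] {text}"
```

A bot built as `Bot(keyword_model)` can be switched over with `bot.swap_model(length_model)` while it keeps running. When the model is baked into the platform, this kind of swap is exactly what becomes hard.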