'Feels More Human', Say Users as Facebook Open-Sources Its Blender Chatbot

VentureBeat | April 30, 2020

  • FAIR claims that Blender, which is available in open source on GitHub, is the largest-ever open-domain chatbot.

  • Blender promises to make interactions with conversational AI systems like Alexa, Siri, and Cortana more natural than before.

  • To achieve Blender’s state-of-the-art performance, researchers at FAIR focused on two engineering steps: blending skills and generation strategy.

Facebook AI Research (FAIR), Facebook’s AI and machine learning division, today detailed work on a comprehensive AI chatbot framework called Blender. FAIR claims that Blender, which is available in open source on GitHub, is the largest-ever open-domain chatbot and outperforms existing approaches to generating dialogue while “feel[ing] more human,” according to human evaluators.

FAIR says Blender is the culmination of years of research to combine empathy, knowledge, and personality into one system. To this end, the underlying models — which benefit from improved decoding and skill blending techniques — contain up to 9.4 billion parameters (configuration variables that define skill on a given problem), or 3.6 times more than previous systems.

Blender promises to make interactions with conversational AI systems like Alexa, Siri, and Cortana more natural than before, whether in enterprise, industrial, or consumer-facing contexts. That’s because the Blender models are able to ask and answer a wide range of questions; display knowledge about specific topics; and express sentiments like empathy, seriousness, or playfulness as circumstances dictate.


Blending skills and generation strategies

To achieve Blender’s state-of-the-art performance, researchers at FAIR focused on two engineering steps: blending skills and generation strategy.

“Blending skills” refers to selecting fine-tuning tasks that allow a model to outperform larger models that lack such tuning. As the FAIR researchers point out in a paper, chatbot improvements can be attained by fine-tuning models on data that emphasizes desirable conversational skills. As it turns out, tuning can also minimize undesirable traits learned from large data sets, such as toxicity.

With respect to generation strategy, the choice of decoding algorithm (the algorithm used to generate text from a language model) has an outsized impact on a chatbot’s responses. Because the length of a bot’s responses tends to correspond to human judgments of quality, decoders that strike an appropriate balance are desirable. Responses that are too short are typically perceived as dull or showing a lack of interest, while those that are too long imply waffling or distraction.

Over the course of these engineering steps, the researchers tested three types of model architectures, all of which used Transformers as a base. Transformers — a Google innovation — contain neurons (mathematical functions) arranged in layers that transmit signals from input data and adjust the strength (weights) of each connection, as with all deep neural networks. That’s how they extract features and learn to make predictions, but Transformers also have attention. This means every output element is connected to every input element and the weightings between them are calculated dynamically.

First up was a retriever model that, given a dialogue history (or context) as input, selected the next dialogue response by scoring a large set of candidate responses and outputting the highest-scoring one. The FAIR researchers employed a poly-encoder architecture that encoded features of the context using representations attended to by each candidate response, which they say resulted in improved performance while remaining “tractable” to compute, compared with other architectures, like cross-encoders.
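The score-and-select step of a retriever can be sketched in a few lines. This is a minimal illustration that assumes the context and each candidate have already been embedded as vectors and that uses a plain dot product as the score; it is not the poly-encoder itself, and the function names and toy vectors are invented for the example.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_response(
    context_vec: Sequence[float],
    candidates: List[Tuple[str, Sequence[float]]],
) -> str:
    """Return the candidate text whose embedding scores highest against the context."""
    best_text, _ = max(candidates, key=lambda cand: dot(context_vec, cand[1]))
    return best_text


# Toy embeddings, invented for illustration only.
candidates = [
    ("I love dogs too!", [0.9, 0.1]),
    ("The weather is nice today.", [0.1, 0.9]),
]
best = retrieve_response([1.0, 0.0], candidates)
```

A real system would compute the embeddings with trained encoders and score a far larger candidate set; only the selection logic is shown here.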

The second model was a generator that produced responses rather than retrieving them from a fixed set. Three models were considered by size, ranging from 90 million parameters to 2.7 billion parameters to 9.4 billion parameters.

The third model attempted to address issues with the generator, namely its tendency to synthesize repetitive responses and to “hallucinate” knowledge. It took a “retrieve and refine” (RetNRef) approach, where the above-described retrieval model produced a response when provided a dialogue history, which was then appended to the input sequence of the generator. In this way, the generator learned when to copy elements of responses from the retriever and when not to so it could output more interesting, engaging, and “vibrant” responses. (Retriever models produce human-written responses that tend to include more vibrant language than standard generative models.)
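As a rough sketch of the input construction described above, the retrieved response can be appended to the dialogue history before the generator sees it. The separator string and function name below are hypothetical; the article does not say how the two parts are delimited.

```python
from typing import List

# Hypothetical marker separating the history from the retrieved response.
SEPARATOR = " [RETRIEVED] "


def build_refine_input(history: List[str], retrieved: str) -> str:
    """Join the dialogue turns, then append the retriever's suggested response."""
    return "\n".join(history) + SEPARATOR + retrieved


example_input = build_refine_input(
    ["Hi!", "Do you like music?"],
    "I play the cello.",
)
```

The generator is then trained on inputs of this shape, which is how it can learn when to copy from the retrieved text and when to ignore it.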

The FAIR team paired a Wizard Generative model with another retriever that together determined when to incorporate knowledge into chatbot responses. The two models produce a set of initial knowledge candidates and then rank those candidates, after which they select a single sentence and use it to condition response generation. A classifier chooses whether to perform retrieval or not on a per-dialogue basis so as to avoid serving knowledge when it’s not required.




For the generative models, the FAIR researchers used a beam search decoder method to generate responses to given dialogue contexts. Beam search maintains a set of partially decoded sequences, called hypotheses, that are appended to form sequences and then scored so the best sequences bubble to the top.

To control the length of the chatbot’s responses, the FAIR team considered two approaches: a hard constraint on the minimum generation length and a classifier that predicted the length of responses and set the minimum generation length constraint to its corresponding prediction. The latter was more complex but resulted in variable-length responses to questions, ensuring the chatbot served long responses when they seemed appropriate.
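The interaction between beam search and a hard minimum-length constraint can be illustrated with a toy example. The sketch below is a simplified beam search over a hand-written next-token table, with no length normalization; the `toy_model` probabilities and all names are made up for illustration and do not come from Blender.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "<eos>"  # end-of-sequence marker used by this toy example


def beam_search(
    next_token_probs: Callable[[List[str]], Dict[str, float]],
    beam_size: int = 2,
    min_length: int = 0,
    max_length: int = 10,
) -> List[str]:
    """Return the highest-scoring token sequence found by beam search."""
    beams: List[Tuple[List[str], float]] = [([], 0.0)]  # (tokens, log-probability)
    finished: List[Tuple[List[str], float]] = []
    for _ in range(max_length):
        expanded = []
        for tokens, score in beams:
            for token, prob in next_token_probs(tokens).items():
                if prob <= 0.0:
                    continue
                # Hard constraint: a hypothesis may not end before min_length tokens.
                if token == EOS and len(tokens) < min_length:
                    continue
                expanded.append((tokens + [token], score + math.log(prob)))
        expanded.sort(key=lambda hyp: hyp[1], reverse=True)
        beams = []
        for tokens, score in expanded[:beam_size]:
            if tokens[-1] == EOS:
                finished.append((tokens[:-1], score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    pool = finished if finished else beams
    return max(pool, key=lambda hyp: hyp[1])[0]


def toy_model(prefix: List[str]) -> Dict[str, float]:
    """Hand-written next-token probabilities, for illustration only."""
    if not prefix:
        return {"yes": 0.6, "well": 0.4}
    if prefix[-1] == "yes":
        return {EOS: 0.7, "indeed": 0.3}
    if prefix[-1] == "well":
        return {"maybe": 0.9, EOS: 0.1}
    return {EOS: 1.0}


short_reply = beam_search(toy_model, min_length=0)
longer_reply = beam_search(toy_model, min_length=2)
```

Without the constraint the search settles on the one-word reply; with `min_length=2` it is forced to keep decoding and returns a two-word reply instead. The classifier-based approach would set `min_length` per response rather than using one fixed value.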


Training the models

To prep the various models that make up Blender, the researchers first performed pretraining, a step that conditions machine learning models for particular tasks. They used Facebook’s own Fairseq, a toolkit that supports the training of custom language models, with data samples from a Reddit corpus containing 1.5 billion comments (with two sets of 360,000 comments each reserved for validation and testing) pruned for known bots, non-English subreddits, deleted comments, comments with a URL, and comments of a certain length.
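The pruning rules listed above amount to a per-comment filter. The following is a minimal sketch under stated assumptions: the field names, the `[deleted]` and `[removed]` markers, and the length thresholds are placeholders chosen for illustration, since the article does not give the actual heuristics or cutoffs.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Comment:
    author: str
    subreddit: str
    body: str


def keep_comment(
    comment: Comment,
    known_bots: Set[str],
    non_english_subreddits: Set[str],
    min_chars: int = 5,      # hypothetical lower bound
    max_chars: int = 2048,   # hypothetical upper bound
) -> bool:
    """Return True if the comment passes every pruning rule."""
    body = comment.body.strip()
    if comment.author in known_bots:
        return False
    if comment.subreddit in non_english_subreddits:
        return False
    if body in ("[deleted]", "[removed]"):
        return False
    if "http://" in body or "https://" in body:
        return False
    if not (min_chars <= len(body) <= max_chars):
        return False
    return True


comments: List[Comment] = [
    Comment("alice", "cooking", "Slow roasting really brings out the flavor."),
    Comment("helper_bot", "cooking", "I am a bot and this action was automatic."),
    Comment("bob", "de", "Das ist ein sehr guter Punkt."),
    Comment("carol", "music", "[deleted]"),
    Comment("dave", "music", "Listen here: https://example.com/track"),
]
kept = [
    c for c in comments
    if keep_comment(c, known_bots={"helper_bot"}, non_english_subreddits={"de"})
]
```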

Next, the FAIR team fine-tuned the models using another Facebook-developed suite — ParlAI — designed for training and testing dialogue models. One training corpus selected was ConvAI2, which contains 140,000 utterances involving paired volunteers getting to know each other by asking and answering friendly questions. Another was Empathetic Dialogues, which consists of 50,000 crowdsourced utterances grounded in an emotional situation. Yet another data set — the Wizard of Wikipedia — comprises 194,000 utterances of 1,250 topics, where each conversation begins with a randomly chosen topic and the goal is to display expert knowledge.

A fourth fine-tuning data set — Blended Skill Talk — aimed to blend the previous three sets (ConvAI2, Empathetic Dialogues, and Wizard of Wikipedia) to combine their respective skills during dialogue. Here, 76,000 utterances were collected with a guided and unguided human speaker, where the guided speaker could select utterances suggested by bots trained on the three individual data sets.



Post-training, the researchers evaluated Blender’s performance by comparing it with Google’s latest Meena chatbot, a machine learning model with 2.6 billion parameters. Human volunteers were tasked with answering two questions, “Who would you prefer to talk to for a long conversation?” and “Which speaker sounds more human?”, given 100 publicly released and randomized logs from Meena and the same number of logs generated by Blender. In each case, the volunteers were shown a series of dialogues between humans paired with the respective chatbots.

The topics of conversation ranged from cooking, music, movies, and pets to yoga, veganism, instruments, and malls, with the Blender models often going into detail when asked and naming relevant stores, bands, movies, actors, pet species, and pet names. In one example, Blender offered a nuanced answer to a question about how Bach compared with Justin Bieber, while a request that Blender write a song indeed yielded lyrics, although nothing particularly poetic.

When presented with chats showing Meena in action and chats showing Blender in action, 67% of the evaluators said the best-performing Blender-powered chatbot (the one with a generative model containing 9.4 billion parameters, fine-tuned on the Blended Skill Talk corpus) sounded more human. About 75% said they’d rather have a long conversation with the 2.7 billion-parameter fine-tuned model than with Meena. And in an A/B comparison between human-to-human and human-to-Blender conversations, the volunteers expressed a preference for models fine-tuned on Blended Skill Talk 49% of the time, while models trained only on public domain conversations were preferred just 36% of the time.

Problematically, further experiments showed that Blender sometimes produced responses in the style of offensive samples from the training corpora, mostly from Reddit comments. The FAIR researchers say that fine-tuning on the Blended Skill Talk data set mitigated this to an extent, but that addressing it comprehensively would require using an unsafe word filter and a kind of safety classifier.

“We’re excited about the progress we’ve made in improving open-domain chatbots,” wrote Facebook in a blog post. “However, building a truly intelligent dialogue agent that can chat like a human remains one of the largest open challenges in AI today … True progress in the field depends on reproducibility — the opportunity to build upon the best technology possible. We believe that releasing models is essential to enable full, reliable insights into their capabilities.”

The pretrained and fine-tuned Blender models with 90 million parameters, 2.7 billion parameters, and 9.4 billion parameters are available on GitHub, along with a script for interacting with the bot (with safety filtering built in). All code for model evaluation and fine-tuning, including the data sets themselves, is available in ParlAI.




Other News

Modulos Launches a Data-Centric AI Platform That Simplifies the Development of Trustworthy AI Applications

Modulos | May 23, 2022

Data-centric AI software company Modulos AG today announced the availability of its revolutionary data-centric AI platform. The platform enables companies to identify flaws in their data in a fraction of the time required by conventional data cleaning methods. These practical recommendations then help users build better AI/ML models based on the improved data.

Recent studies of how data scientists spend their time regularly highlight that curating data and then manually inspecting and cleaning it can take up to 80% of their time. (Ref: hbr.org) These efforts by highly trained specialists lengthen the time and increase the cost of AI/ML projects. Even with all this human effort spent on improving data quality, only 13% of AI/ML applications make it into production. The Modulos platform recommendations can reduce the time spent on data cleaning and quality checks by pinpointing exactly which data samples most affect the performance of AI models trained with them.

"The goal of data-centric AI is to shift the focus of AI development from fine-tuning models to curating better data. AI trained on flawed data can't result in accurate and trustworthy models. That's why most of the human effort in building AI systems should focus on data quality," said Kevin Schawinski, CEO of Modulos.

The European Union is currently working on an EU AI Act which will set the global standard for how AI products and services must be developed and brought to market. Amongst the key requirements of this Act is that the data used to train AI is high quality, complete and fair.



Perpetua launches first-to-market self-serve Amazon DSP optimization software unlocking an over 20% increase in performance

Perpetua | July 29, 2022

Perpetua, a leader in eCommerce advertising software, today becomes the first to offer clients a self-serve platform for creating, optimizing, and measuring Amazon DSP (Demand-Side Platform) ads. Designed to help unlock scalable revenue generation on the Amazon DSP for agencies and brand aggregators, Perpetua's Amazon DSP Optimization empowers users to build each component of a DSP order in a single, linear workflow. Paired with 11 pre-built audience templates embedded with industry best-practices and Perpetua's industry-leading optimization engines, advertisers can seamlessly create full Amazon DSP orders in seconds.

"The Amazon DSP is incredibly effective at driving full-funnel growth for advertisers. We saw an opportunity to help increase efficiencies for our largest customers by developing a self-serve solution that enables them to scale their business and drive growth," said Perpetua Co-President Adam Epstein.

As the industry pivots into a post-cookie world, the Perpetua self-serve Amazon DSP software is the leading choice for advertisers looking to target audiences by lifestyle segments and shopping behavior. The platform provides unparalleled transparency and the capacity for advertisers to manage ads, bid in real time, and track and optimize performance across Amazon properties.

Highlights of Perpetua's Amazon DSP offering:

  • Create audiences, generate creatives, and attach them to line items in seconds

  • New 'Target Markets' creator reduces the time spent building custom audiences in an external workflow

  • Full customization capabilities to target your intended audience, no matter how broad or narrow in scope

  • Dynamically shift budgets between line items to maximize your performance and budget utilization

  • ASIN-level reporting across Amazon Sponsored Ads and Amazon DSP with consolidated dashboards

  • Access to advanced reporting via Amazon Marketing Cloud (AMC)

About Perpetua

Perpetua is building the growth infrastructure for eCommerce which includes optimization and reporting technology for the world's smartest eCommerce businesses. Through the platform, advertisers create goals based on strategy and leverage Perpetua's best in class experts and automation to execute tactically. Integrations with Amazon, Instacart and Google ensure brands achieve optimal reach and engagement across the full shopper journey, and provide unified performance intelligence for maximum visibility. Perpetua is an Ascential company and has offices in San Francisco, Toronto, London and Tokyo.



CallMiner Named Best Overall AI-based Analytics Company in 2022 AI Breakthrough Awards

CallMiner | June 28, 2022

CallMiner, the leading provider of conversation intelligence to drive business improvement, announced today that it has been named the Best Overall AI-based Analytics Company in the 5th annual AI Breakthrough Awards. CallMiner was recognized for its ability to help organizations analyze customer interactions at scale, such as the voice or text-based conversations that happen in an organization’s customer service center, to uncover insights and take action. These insights help companies improve customer experience, strengthen brand loyalty, increase operational efficiency, influence sales outcomes and more, ultimately driving business transformation and growth.

Powered by machine learning algorithms and artificial intelligence (AI), the CallMiner platform can organize and bring value out of structured and unstructured data, including contact center conversations, chats, emails, social media, surveys and other customer interactions. Because understanding emotion within these conversations is increasingly important, CallMiner’s AI technology takes sentiment analysis one step further than the competition, identifying emotions within a conversation, such as surfacing moments when customers are stressed, angry or elated by marrying words and acoustic measures to form a complete picture of customer emotions in an interaction.

“Organizations hold a massive amount of untapped data, particularly within their customer service and contact centers, which leaves meaningful insights on the table. Yet, it’s impossible to uncover these insights with human power alone. Technology, like CallMiner’s AI-powered conversation intelligence platform, can deliver the right insights at the right time to both customers and internal stakeholders, enabling organizations to truly be customer-centric. Being recognized as the Best Overall AI-based Analytics Company by the AI Breakthrough Awards is further proof of CallMiner’s industry-leading AI capabilities and our ability to drive value for our customers,” said Rick Britt, VP of AI at CallMiner.

The AI Breakthrough Awards recognize innovation across a range of AI and machine learning-related categories, including AI platforms, deep learning, business intelligence, natural language processing and industry-specific AI applications. Market intelligence analysts from AI Breakthrough evaluated nearly 3,000 nominations from around the world, reviewing, scoring and analyzing each entry to name the top performers.

About CallMiner

CallMiner is the global leader in conversation intelligence to drive business performance improvement. Powered by artificial intelligence and machine learning, CallMiner delivers the industry’s most comprehensive platform to analyze omnichannel customer interactions at scale, allowing organizations to interpret sentiment and identify patterns to reveal deep understanding from every conversation. By connecting the dots between insights and action, CallMiner enables companies to identify areas of opportunity to drive business improvement, growth and transformational change more effectively than ever before.



Veryfi Reports 750% Year-Over-Year Growth in AI-Driven Intelligent Document Processing Platform Use

Veryfi | August 17, 2022

Veryfi, using artificial intelligence (AI) technology to instantly transform documents into structured data, today announced that the company has seen 750% year-over-year growth in the Veryfi OCR API Platform usage. This follows a successful Intelligent Automation Week event in Chicago, where Veryfi announced its momentum in powering the world’s leading finance, ERP (enterprise resource planning), and accounting software providers.

The growth of Veryfi’s OCR API Platform usage signals that companies are seeing better time-to-value with Intelligent Document Processing (IDP), rather than Robotic Process Automation (RPA) alone. Additionally, IDP solutions with pre-trained AI outperform home-grown solutions built using commodity OCR and AI tooling, in terms of accuracy and time-to-value. IDP delivers the most accurate data extraction technology, providing the fastest time to value and greatest efficiency. Over the past five years, Veryfi’s AI-Driven OCR API Platform with Day 1 Accuracy* was pre-trained on hundreds of millions of documents of all types, for 85 currencies, 39 languages, and 110 data fields.

“Our AI-driven IDP solution is the first to promise Day 1 Accuracy with no humans in the loop, and our customers’ growing trust in our platform is a clear sign that we’re living up to that promise. We’re absolutely thrilled that our customers are joining us on our mission to eliminate manual data entry, and are seeing increased efficiency, revenue, and time to value, starting on Day 1, with our platform,” said Ernest Semerda, co-founder and CEO of Veryfi.

RPA isn’t an out-of-the-box AI solution; as a standalone solution, it requires implementation and training that dramatically delays a business' time to value. When RPA is coupled with Veryfi IDP, however, customers benefit from pre-trained AI that delivers Day 1 Accuracy and perfectly complements the enhanced automation capabilities of RPA solutions.

With Veryfi, enterprises can accelerate financial document processing by 200 times, with significantly fewer errors than humans can achieve, and companies can more effectively comply with the increasing number of international security and privacy regulations. According to the Everest Group, “Many organizations are devoting more financial and human resources to deploy intelligent document processing capabilities. Success by forward-looking organizations is driving confidence in a market expected to grow 70-80% over the next two years to US $1.1 billion.”

By eliminating manual data entry, Veryfi enables organizations to accurately capture, extract and transform documents such as receipts, invoices, purchase orders, checks, credit cards, and W-9 forms into structured data, at scale. Veryfi uses advanced AI/ML technology, trained by hundreds of millions of documents over the past four years, to extract data and transform it into a structured format for 85 currencies, 39 languages, and 110 defined fields such as vendor, total, bill to/ship to, purchase order and invoice numbers, any line item (product name, SKU, description), taxes, and more, which can then be accessed for a wide variety of business applications.

About Veryfi

Veryfi empowers organizations to capture, extract and transform unstructured documents including receipts, invoices, purchase orders, checks, credit cards, and W-2s into structured data at scale. The company’s technology reduces or eliminates manual data entry and unlocks valuable business intelligence in seconds. Trusted by enterprises and software companies alike, Veryfi’s AI-driven platform delivers fast, accurate, and secure data to hundreds of companies globally.
