Google rolls out Open Images V6 with localized narratives to strengthen AI

silicon angle | February 27, 2020

  • Google rolled out a new version of Open Images that adds millions of additional data points.

  • Along with human action annotations and image-level labels, it adds a new type of multimodal annotation called localized narratives.

  • Google has created localized narratives for about 500,000 Open Images files so far and expects V6 to further stimulate progress toward genuine scene understanding.

Today, Google’s Open Images corpus for computer vision tasks got a boost with a new form of multimodal annotation called localized narratives. The release also expands the dataset's annotations with a large set of new visual relationships (e.g., “dog catching a flying disk”), human action annotations (e.g., “person jumping”), and image-level labels (e.g., “paisley”). In Open Images V6, localized narratives are available for 500k images. To facilitate comparison with previous work, Google is also releasing localized narrative annotations for the full 123k images of the COCO dataset.

Google says localized narratives could create “potential avenues of research” for studying how people describe images, which could lead to interface design insights (and subsequent improvements) across web, desktop, and mobile apps.


In 2016, Google introduced Open Images, a dataset of millions of labeled images spanning thousands of object categories. Major updates arrived in 2018 and 2019, bringing with them 15.4 million bounding boxes for 600 object categories. In many regards it is the largest annotated image dataset available for training the latest deep convolutional neural networks for computer vision tasks. As of version 5, released last May, the Open Images dataset included 9M images annotated with 36M image-level labels, 15.8M bounding boxes, 2.8M instance segmentations, and 391k visual relationships.

“Along with the data set itself, the associated Open Images challenges have spurred the latest advances in object detection, instance segmentation, and visual relationship detection,” said Jordi Pont-Tuset, a research scientist at Google Research.

As Pont-Tuset explains, one of the motivations behind localized narratives is to leverage the connection between vision and language, which is typically done via image captioning (i.e., images paired with written descriptions of their content). But image captioning lacks visual “grounding.” To mitigate this, some researchers have drawn bounding boxes for the nouns in captions after the fact — in contrast to localized narratives, where every word in the description is grounded.


The localized narratives in Open Images were generated by annotators who spoke a description of each image while hovering a computer mouse over the regions they were describing. The annotators then manually transcribed their descriptions, after which Google researchers aligned each transcription with an automatic speech transcription, ensuring that the speech, text, and mouse trace were correct and synchronized.
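To make the idea of synchronized speech, text, and mouse trace concrete, here is a minimal Python sketch of how each word could be paired with the part of the trace drawn while it was spoken. The `TimedWord` and `TracePoint` types, their field names, and the toy data are assumptions made for illustration; they are not Google's published format or alignment method.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TimedWord:
    text: str
    start: float  # seconds from the start of the recording (assumed unit)
    end: float


@dataclass
class TracePoint:
    x: float  # assumed normalized image coordinates in [0, 1]
    y: float
    t: float  # seconds from the start of the recording


def ground_words(
    words: List[TimedWord], trace: List[TracePoint]
) -> List[Tuple[str, List[TracePoint]]]:
    """Pair each word with the trace points recorded while it was spoken."""
    grounded = []
    for word in words:
        segment = [p for p in trace if word.start <= p.t <= word.end]
        grounded.append((word.text, segment))
    return grounded


def bounding_box(points: List[TracePoint]) -> Optional[Tuple[float, float, float, float]]:
    """Return (x_min, y_min, x_max, y_max) for a trace segment, or None if empty."""
    if not points:
        return None
    xs = [p.x for p in points]
    ys = [p.y for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical toy narration and mouse trace.
words = [
    TimedWord("dog", 0.0, 0.5),
    TimedWord("catching", 0.5, 1.0),
    TimedWord("disk", 1.0, 1.5),
]
trace = [
    TracePoint(0.20, 0.60, 0.1),
    TracePoint(0.30, 0.70, 0.4),
    TracePoint(0.50, 0.40, 0.8),
    TracePoint(0.70, 0.20, 1.2),
    TracePoint(0.80, 0.25, 1.4),
]

grounded = ground_words(words, trace)
boxes = {text: bounding_box(segment) for text, segment in grounded}
```

In this sketch every word, not only the nouns, ends up with its own trace segment, which is the property that separates localized narratives from captions with boxes added after the fact.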

“Speaking and pointing simultaneously is very intuitive, which allowed us to give the annotators very vague instructions about the task. We hope that it will further stimulate progress toward genuine scene understanding,” said Pont-Tuset.

Because annotators received only vague instructions, the resulting data opens potential avenues of research into how people describe images. For example, annotators use different styles to indicate the spatial extent of an object (circling, scratching, underlining, and so on), and studying these styles could bring valuable insights for the design of new user interfaces.

“To get a sense of the amount of additional data these localized narratives represent, the total length of mouse traces is ~6400 km long, and if read aloud without stopping, all the narratives would take ~1.5 years to listen to!”

New Visual Relationships, Human Actions, and Image-Level Annotations

In addition to the localized narratives, Google increased the number of visual relationship annotation types in Open Images V6 by an order of magnitude (up to 1.4k), adding, for example, “a person riding a skateboard,” “persons holding hands,” and “dog catching a flying disk.”
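As a rough illustration of what a relationship "type" might mean, the Python sketch below models each annotation as a subject, relation, object triplet and counts the distinct triplets. The `Relationship` type, the way the three examples are split into fields, and the sample list are assumptions for illustration, not the dataset's actual schema.

```python
from collections import Counter
from typing import NamedTuple


class Relationship(NamedTuple):
    subject: str
    relation: str
    obj: str


# Hypothetical annotations; the same triplet can occur in many images.
annotations = [
    Relationship("person", "riding", "skateboard"),
    Relationship("person", "holding hands", "person"),
    Relationship("dog", "catching", "flying disk"),
    Relationship("person", "riding", "skateboard"),
]

# Under this assumption, a "type" is one distinct triplet, however often it occurs.
type_counts = Counter(annotations)
num_types = len(type_counts)
```

Here four annotations collapse into three types, which is the sense in which a figure such as 1.4k would count kinds of relationships and not individual annotated instances.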


People in images have been at the core of computer vision research since its inception, and understanding what those people are doing is of utmost importance for many applications. That is why Open Images V6 also includes 2.5M annotations of humans performing standalone actions, such as “jumping”, “smiling”, or “laying down”. Google's image research also extends to healthcare: the company recently said its AI can detect skin diseases with accuracy comparable to dermatologists.

In short, Open Images V6 is a significant qualitative and quantitative step towards improving the unified annotations for image classification, object detection, visual relationship detection, and instance segmentation, and takes a novel approach in connecting vision and language with localized narratives. Let’s hope that Open Images V6 will further stimulate progress towards genuine scene understanding.


