Google rolls out Open Images V6 with localized narratives to strengthen AI

Sujata Bondge | February 27, 2020

  • Google rolled out a new version of Open Images that adds millions of additional data points.

  • Along with new visual relationships, human action annotations, and image-level labels, it adds a new type of multimodal annotation called localized narratives.

  • With these additions, Google has created localized narratives for about 500,000 Open Images images so far, and it expects V6 to further stimulate progress toward genuine scene understanding.


Today, Google’s Open Images corpus for computer vision tasks got a boost: the dataset’s annotations now include a large set of new visual relationships (e.g., “dog catching a flying disk”), human action annotations (e.g., “person jumping”), and image-level labels (e.g., “paisley”), as well as a new form of multimodal annotation called localized narratives. In Open Images V6, localized narratives are available for 500k of its images. Additionally, to facilitate comparison with previous work, Google is also releasing localized narrative annotations for the full 123k images of the COCO dataset.
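The localized narratives are distributed as JSON Lines, one record per narrative. The sketch below parses a single record; the field names (`image_id`, `caption`, `timed_caption`, `traces`) follow the Localized Narratives release, but treat the exact schema as an assumption to be verified against the downloaded files.

```python
import json

# A single localized-narrative record, built inline so the example is
# self-contained. Field names are assumptions based on the public release.
record_jsonl = json.dumps({
    "image_id": "1000",
    "caption": "A dog catching a flying disk in a park.",
    # Each utterance carries the time window in which it was spoken.
    "timed_caption": [
        {"utterance": "A dog", "start_time": 0.0, "end_time": 0.6},
        {"utterance": "catching a flying disk", "start_time": 0.6, "end_time": 1.8},
    ],
    # Mouse-trace points: normalized image coordinates plus a timestamp.
    "traces": [[{"x": 0.41, "y": 0.52, "t": 0.1},
                {"x": 0.45, "y": 0.50, "t": 0.5}]],
})

record = json.loads(record_jsonl)
for segment in record["timed_caption"]:
    print(f'{segment["start_time"]:.1f}-{segment["end_time"]:.1f}s: '
          f'{segment["utterance"]}')
```

Because every utterance and every trace point is timestamped, the description can be replayed in sync with the mouse movement.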
 

Google says this last addition could create “potential avenues of research” for studying how people describe images, which could lead to interface design insights (and subsequent improvements) across the web, desktop, and mobile apps.
 

 

In 2016, Google introduced Open Images, a dataset of millions of labeled images spanning thousands of object categories. Major updates arrived in 2018 and 2019, bringing with them 15.4 million bounding boxes for 600 object categories. It is in many respects the largest annotated image dataset available for training the latest deep convolutional neural networks for computer vision tasks. With the introduction of version 5 last May, the Open Images dataset included 9M images annotated with 36M image-level labels, 15.8M bounding boxes, 2.8M instance segmentations, and 391k visual relationships.

“Along with the data set itself, the associated Open Images challenges have spurred the latest advances in object detection, instance segmentation, and visual relationship detection,” said Jordi Pont-Tuset, a research scientist at Google Research.


As Pont-Tuset explains, one of the motivations behind localized narratives is to leverage the connection between vision and language, which is typically done via image captioning (i.e., images paired with written descriptions of their content). But image captioning lacks visual “grounding.” To mitigate this, some researchers have drawn bounding boxes for the nouns in captions after the fact — in contrast to localized narratives, where every word in the description is grounded.

 

The localized narratives in Open Images were generated by annotators who provided spoken descriptions of images while hovering over regions they were describing with a computer mouse. The annotators manually transcribed their description, after which Google researchers aligned it with automatic speech transcriptions, ensuring that the speech, text, and mouse trace were correct and synchronized.
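The synchronization described above can be sketched as a simple time-window join: for each transcribed utterance, keep the mouse-trace points whose timestamps fall inside the interval in which that utterance was spoken. This is a simplified illustration of time-based alignment, not Google's actual annotation pipeline.

```python
def ground_utterances(timed_caption, trace_points):
    """Attach to each utterance the mouse-trace points recorded while
    it was being spoken (simplified time-window alignment)."""
    grounded = []
    for seg in timed_caption:
        points = [p for p in trace_points
                  if seg["start_time"] <= p["t"] <= seg["end_time"]]
        grounded.append({"utterance": seg["utterance"], "points": points})
    return grounded

# Synthetic example data (times in seconds, coordinates normalized).
timed_caption = [
    {"utterance": "a dog", "start_time": 0.0, "end_time": 0.8},
    {"utterance": "on the grass", "start_time": 0.8, "end_time": 1.6},
]
trace_points = [{"x": 0.4, "y": 0.5, "t": 0.3},
                {"x": 0.6, "y": 0.7, "t": 1.2}]

for g in ground_utterances(timed_caption, trace_points):
    print(g["utterance"], "->", len(g["points"]), "trace point(s)")
```

Because the join is purely timestamp-based, every word in the narrative inherits a spatial grounding from the trace, which is what distinguishes localized narratives from after-the-fact bounding boxes on nouns.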
 

“Speaking and pointing simultaneously is very intuitive, which allowed us to give the annotators very vague instructions about the task. We hope that it will further stimulate progress toward genuine scene understanding,” said Pont-Tuset.
 

Because speaking and pointing simultaneously is so intuitive, Google could give the annotators very loose instructions about the task. The resulting data also opens potential avenues of research into how people describe images: annotators exhibit different styles when indicating the spatial extent of an object (circling, scratching, underlining, and so on), and studying these styles could yield valuable insights for the design of new user interfaces.
 

“To get a sense of the amount of additional data these localized narratives represent, the total length of mouse traces is ~6400 km long, and if read aloud without stopping, all the narratives would take ~1.5 years to listen to!”
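To make the quoted totals concrete, a bit of arithmetic (using the article's figures of roughly 500,000 narratives, ~6400 km of traces, and ~1.5 years of audio) gives the implied per-narrative averages:

```python
# Per-narrative averages implied by the quoted totals. Purely illustrative
# arithmetic; the totals and the 500k count come from the article above.
narratives = 500_000
trace_km_total = 6_400          # total mouse-trace length, ~6400 km
listen_years_total = 1.5        # total narration time, ~1.5 years

trace_m_per_narrative = trace_km_total * 1_000 / narratives
seconds_total = listen_years_total * 365 * 24 * 3600
seconds_per_narrative = seconds_total / narratives

print(f"~{trace_m_per_narrative:.1f} m of mouse trace per narrative")
print(f"~{seconds_per_narrative:.0f} s of narration per narrative")
# → ~12.8 m of mouse trace per narrative
# → ~95 s of narration per narrative
```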
 

New Visual Relationships, Human Actions, and Image-Level Annotations


In addition to the localized narratives, Open Images V6 increases the number of visual relationship annotation types by an order of magnitude (up to 1.4k), adding, for example, “a person riding a skateboard”, “persons holding hands”, and “dog catching a flying disk”.
 

 

People in images have been at the core of computer vision interest since the field's inception, and understanding what those people are doing is of utmost importance for many applications. That is why Open Images V6 also includes 2.5M annotations of humans performing standalone actions, such as “jumping”, “smiling”, or “laying down”. Google's image AI is contributing to healthcare as well: the company recently confirmed that its AI can detect skin diseases with accuracy comparable to dermatologists.
 

In short, Open Images V6 is a significant qualitative and quantitative step towards improving the unified annotations for image classification, object detection, visual relationship detection, and instance segmentation, and takes a novel approach in connecting vision and language with localized narratives. Let’s hope that Open Images V6 will further stimulate progress towards genuine scene understanding.



