Google rolled out a new version of Open Images that adds millions of additional data points.
Along with human action annotations, visual relationships, and image-level labels, it adds a new type of multimodal annotation called localized narratives. With these additions, Google has created localized narratives for about 500,000 Open Images images so far, and it expects V6 to further stimulate progress toward genuine scene understanding.
Today, Google’s Open Images corpus for computer vision tasks got a boost: Open Images V6 expands the dataset with a large set of new visual relationships (e.g., “dog catching a flying disk”), human action annotations (e.g., “person jumping”), and image-level labels (e.g., “paisley”), as well as a new form of multimodal annotation called localized narratives. In Open Images V6, localized narratives are available for 500k of its images. Additionally, to facilitate comparison with previous work, Google also released localized narrative annotations for the full 123k images of the COCO dataset.
Google says this last addition could create “potential avenues of research” for studying how people describe images, which could lead to interface design insights (and subsequent improvements) across the web, desktop, and mobile apps.
In 2016, Google introduced Open Images, a dataset of millions of labeled images spanning thousands of object categories. Major updates arrived in 2018 and 2019, bringing with them 15.4 million bounding boxes for 600 object categories, making the dataset a valuable resource for training the latest deep convolutional neural networks for computer vision tasks. With the introduction of version 5 last May, the Open Images dataset included 9M images annotated with 36M image-level labels, 15.8M bounding boxes, 2.8M instance segmentations, and 391k visual relationships.
“Along with the dataset itself, the associated Open Images challenges have spurred the latest advances in object detection, instance segmentation, and visual relationship detection,” said Jordi Pont-Tuset, a research scientist at Google Research.
As Pont-Tuset explains, one of the motivations behind localized narratives is to leverage the connection between vision and language, which is typically done via image captioning (i.e., images paired with written descriptions of their content). But image captioning lacks visual “grounding.” To mitigate this, some researchers have drawn bounding boxes for the nouns in captions after the fact — in contrast to localized narratives, where every word in the description is grounded.
The localized narratives in Open Images were generated by annotators who provided spoken descriptions of images while hovering a computer mouse over the regions they were describing. The annotators then manually transcribed their descriptions, after which Google researchers aligned them with automatic speech transcriptions, ensuring that the speech, text, and mouse trace were correct and synchronized.
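The result is a per-image record in which every spoken word carries a time window that can be matched against timestamped mouse positions. The sketch below illustrates this alignment; the field names (`caption`, `timed_caption`, `traces`) follow the Localized Narratives release format, but the record itself is a made-up example, not real annotation data.

```python
# Illustrative sketch of a localized-narrative record and how a word
# can be grounded to the mouse-trace points recorded while it was spoken.
# All values here are invented for demonstration.

record = {
    "image_id": "example_0001",  # hypothetical image ID
    "caption": "a dog catching a flying disk",
    # Each utterance carries the time window in which it was spoken.
    "timed_caption": [
        {"utterance": "a dog", "start_time": 0.0, "end_time": 0.8},
        {"utterance": "catching", "start_time": 0.8, "end_time": 1.4},
        {"utterance": "a flying disk", "start_time": 1.4, "end_time": 2.2},
    ],
    # Mouse-trace points: normalized (x, y) coordinates plus a timestamp t.
    "traces": [[
        {"x": 0.20, "y": 0.60, "t": 0.1},
        {"x": 0.25, "y": 0.55, "t": 0.7},
        {"x": 0.50, "y": 0.40, "t": 1.0},
        {"x": 0.80, "y": 0.20, "t": 1.8},
    ]],
}

def trace_points_for(record, utterance):
    """Return the mouse-trace points recorded while `utterance` was spoken."""
    segment = next(s for s in record["timed_caption"]
                   if s["utterance"] == utterance)
    return [p for trace in record["traces"] for p in trace
            if segment["start_time"] <= p["t"] <= segment["end_time"]]

# Every word is grounded: "a dog" maps to the points hovered at 0.0-0.8 s.
points = trace_points_for(record, "a dog")
print(len(points))  # 2 points fall inside that window
```

This is what distinguishes localized narratives from after-the-fact bounding boxes: the grounding falls out of the synchronized timestamps rather than a separate annotation pass.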
“Speaking and pointing simultaneously is very intuitive, which allowed us to give the annotators very vague instructions about the task. We hope that it will further stimulate progress toward genuine scene understanding,” said Pont-Tuset.
This setup creates potential avenues of research for studying how people describe images. For example, annotators exhibit different styles when indicating the spatial extent of an object (circling, scratching, underlining, etc.), and studying these styles could bring valuable insights for the design of new user interfaces.
“To get a sense of the amount of additional data these localized narratives represent, the total length of mouse traces is ~6400 km long, and if read aloud without stopping, all the narratives would take ~1.5 years to listen to!”
New Visual Relationships, Human Actions, and Image-Level Annotations
In addition to the localized narratives, Open Images V6 increases the number of visual relationship annotation types by an order of magnitude (up to 1.4k), adding, for example, “a person riding a skateboard”, “persons holding hands”, and “dog catching a flying disk”.
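Conceptually, each such annotation can be viewed as a (subject, relation, object) triple attached to an image. The sketch below uses invented triples mirroring the examples above; the actual Open Images release stores label IDs and bounding boxes in CSV files, so this is a simplification of the structure, not the file format.

```python
# Hedged sketch: visual relationships as (subject, relation, object)
# triples. The triples below are made up to echo the article's examples.

triples = [
    ("person", "riding", "skateboard"),
    ("persons", "holding", "hands"),
    ("dog", "catching", "flying disk"),
]

def with_relation(triples, relation):
    """Return all triples whose relation matches."""
    return [t for t in triples if t[1] == relation]

print(with_relation(triples, "catching"))
# [('dog', 'catching', 'flying disk')]
```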
People in images have been at the core of computer vision interests since its inception, and understanding what those people are doing is of utmost importance for many applications. That is why Open Images V6 also includes 2.5M annotations of humans performing standalone actions, such as “jumping”, “smiling”, or “laying down”. Google’s image-recognition work extends to healthcare as well; the company recently reported that one of its deep learning systems identifies skin conditions at a level comparable to dermatologists.
In short, Open Images V6 is a significant qualitative and quantitative step towards improving the unified annotations for image classification, object detection, visual relationship detection, and instance segmentation, and takes a novel approach in connecting vision and language with localized narratives. Let’s hope that Open Images V6 will further stimulate progress towards genuine scene understanding.