Google launches TyDi QA, a dataset to decide the uniqueness of languages

Google has introduced TyDi QA, a question-answering dataset that attempts to capture the idiosyncrasies and features of different languages.
TyDi QA contains about 200,000 question-answer pairs across 11 languages.
For each question, researchers ran a Google search to find a suitable Wikipedia article in the same language and asked people to highlight the answer in it.

Google hopes to encourage the development of AI capable of understanding the different ways in which languages express meaning. To this end, company researchers detailed TyDi QA, a question-answering dataset that covers 11 languages and is inspired by typological diversity, the notion that different languages express meaning in structurally different ways.

TyDi QA is something of a complement to the English-language Natural Questions corpus Google released the previous year, and it attempts to capture the idiosyncrasies and features of languages like Japanese and Arabic. The researchers point out, for example, that English changes words to indicate one object (“book”) versus many (“books”), and that Arabic has a third form to indicate two of something (“كتابان”, kitaban) beyond just singular (“كتاب”, kitab) or plural (“كتب”, kutub).

Because we selected a set of languages that are typologically distant from each other for this corpus, we expect models performing well on this dataset to generalize across a large number of the languages in the world.

~ Jonathan Clark, Research Scientist at Google, wrote in a blog post.

TyDi QA includes over 200,000 question-answer pairs from languages representing a “diverse range” of linguistic phenomena and data challenges, many of which use non-Latin alphabets (such as Arabic, Bengali, Korean, Russian, Telugu, and Thai) and form words in complex ways (including Arabic, Finnish, Indonesian, Kiswahili, and Russian). The languages also range from those with an abundance of available data on the web (English and Arabic) to those with very little (Bengali and Kiswahili).
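
To make the point about writing systems concrete, here is a minimal sketch that guesses the dominant script of a string from Unicode character names. The helper `dominant_script` is a hypothetical name introduced for illustration and is not part of TyDi QA or any Google tooling; it relies only on the fact that Unicode character names usually begin with the script name.

```python
import unicodedata
from collections import Counter


def dominant_script(text: str) -> str:
    """Return the most common script among the letters in `text`.

    Hypothetical helper for illustration only: it takes the first word of
    each letter's Unicode name (e.g. "ARABIC LETTER KAF" -> "ARABIC").
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():  # skip spaces, punctuation, and combining marks
            counts[unicodedata.name(ch, "UNKNOWN").split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "NONE"


for sample in ["books", "كتابان", "সফেদা ফল খেতে কেমন?"]:
    print(sample, "->", dominant_script(sample))
```

A real system would likely use a proper script property database instead of character names, but the sketch shows why text in several TyDi QA languages cannot be handled with Latin-only assumptions.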

Unlike Google Assistant, Google Translate will soon offer real-time translation. The questions in the TyDi QA dataset were collected from people who wanted an answer but did not yet know it, which avoids questions that merely reuse the words of the answer. To inspire questions, the researchers showed people a Wikipedia passage written in their native language, which made the task easier and more doable for them. They were then asked to write a question, any question, provided it was not answered by the passage and they really wanted to know the answer. For instance: “Does a passage about ice make you think of ice lollies in summer? Great! Ask who invented ice lollies.”

It is important to note that the questions were written directly in each language, not translated, so many questions differ from those seen in an English-first corpus. (For example, “সফেদা ফল খেতে কেমন?”, or “What does the sapote taste like?”)

For each question, the researchers ran a Google search to find the most relevant Wikipedia article in the appropriate language and asked a person to find and highlight the answer in that article. In some languages, they discovered that words were represented very differently in questions and answers, so differently that they expect designing a system that successfully selects an answer from a Wikipedia article to be a challenge.
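
The highlighting step and the word-mismatch problem can be sketched in a few lines. This is an illustration under stated assumptions, not the actual TyDi QA data format: the `HighlightedAnswer` record, its character-offset fields, the `word_overlap` helper, and the toy English article text are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HighlightedAnswer:
    """Hypothetical record: a question plus a highlighted span in an article."""
    question: str
    article: str
    start: int  # character offset where the highlight begins
    end: int    # character offset just past the highlight

    @property
    def answer(self) -> str:
        return self.article[self.start:self.end]


def word_overlap(question: str, answer: str) -> float:
    """Fraction of distinct question words that appear verbatim in the answer."""
    q = {w.strip("?.,!").lower() for w in question.split()}
    a = {w.strip("?.,!").lower() for w in answer.split()}
    q.discard("")
    return len(q & a) / len(q) if q else 0.0


# Toy article text, invented purely for this example.
article = "The sapote is a tropical fruit. Its flavour is sweet."
highlight = "Its flavour is sweet."
start = article.index(highlight)
example = HighlightedAnswer(
    question="What does the sapote taste like?",
    article=article,
    start=start,
    end=start + len(highlight),
)

print(example.answer)
print(word_overlap(example.question, example.answer))
```

In this toy case the question and the highlighted answer share no words at all ("taste" versus "flavour"), so a system that only matches surface strings would have nothing to go on. The article suggests the effect is stronger in languages where the same word takes very different forms.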

To track the progress of the community, they have established a leaderboard where participants can assess the quality of their machine learning systems.

We hope that this dataset will push the research community to innovate in ways that create more useful question-answering systems for users around the world.

~ Jonathan Clark, Research Scientist at Google, wrote in a blog post.

What is TyDi QA?

Typologically Diverse Question Answering (TyDi QA) is a benchmark for information-seeking question answering in typologically diverse languages. The dataset covers 11 such languages with 200K question-answer pairs. The languages of TyDi QA are diverse with regard to their typology (the set of linguistic features that each language expresses), so the researchers expect models that perform well on this set to generalize across a large number of the languages in the world. The researchers also present a quantitative analysis of the data quality and example-level qualitative linguistic analyses of observed language phenomena that would not be found in English-only corpora. To provide a realistic information-seeking task and avoid priming effects, questions are written by people who want to know the answer but don’t know it yet, and the data is collected directly in each language without the use of translation.
