Uber AI scientists describe Fiber, a framework for distributed AI model training

  • Uber describes Fiber, an AI development and distributed training platform for methods including reinforcement learning and population-based learning.

  • The team says that Fiber expands the accessibility of large-scale parallel computation without the need for specialized hardware or equipment.

  • Fiber introduces the concept of job-backed processes, which can run remotely on different machines or locally on the same machine, and it uses containers to encapsulate each process's running environment so that everything is self-contained.


A preprint paper coauthored by Uber AI scientists and Jeff Clune, a research team leader at San Francisco startup OpenAI, describes Fiber, an AI development and distributed training platform for methods including reinforcement learning (which spurs AI agents to complete goals via rewards) and population-based learning. The team says that Fiber expands the accessibility of large-scale parallel computation without the need for specialized hardware or equipment, enabling non-experts to reap the benefits of genetic algorithms, in which a population of agents evolves rather than a single agent.


Fiber — which was developed to power large-scale parallel scientific computation projects like POET — is available in open source on GitHub as of this week. It supports Linux systems running Python 3.6 and up and Kubernetes clusters running on public cloud environments like Google Cloud, and the research team says that it can scale to hundreds or even thousands of machines.


As the researchers point out, increasing computation underlies many recent advances in machine learning, with more and more algorithms relying on distributed training to process enormous amounts of data. (OpenAI Five, OpenAI’s Dota 2-playing bot, was trained on 256 graphics cards and 128,000 processor cores on Google Cloud.) But reinforcement and population-based methods pose challenges for reliability, efficiency, and flexibility that some frameworks fall short of satisfying.



Fiber addresses these challenges with a lightweight strategy to handle task scheduling. It leverages cluster management software for job scheduling and tracking, doesn’t require preallocating resources, and can dynamically scale up and down on the fly, allowing users to migrate from one machine to multiple machines seamlessly.


Fiber comprises an API layer, backend layer, and cluster layer. The first layer provides basic building blocks for processes, queues, pools, and managers, while the backend handles tasks like creating and terminating jobs on different cluster managers. As for the cluster layer, it taps different cluster managers to help manage resources and keep tabs on different jobs, reducing the number of items Fiber needs to track.
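
Concretely, the paper describes the API layer as mirroring Python's standard multiprocessing interface, so code written for a single machine needs few changes to run on a cluster. The snippet below is a minimal sketch under that assumption, using the open source fiber package; the worker function and pool size are illustrative, not taken from Uber's examples.

    # Minimal sketch of Fiber's API layer, assuming the open source `fiber`
    # package mirrors Python's multiprocessing interface as described above.
    # The worker function and pool size are illustrative only.
    import fiber

    def evaluate(candidate):
        # Stand-in for an expensive evaluation, e.g. scoring one member of a
        # population in a population-based training run.
        return candidate * candidate

    if __name__ == "__main__":
        # Each task is dispatched to a job-backed process, so the same code
        # can run on a laptop or be scheduled onto a cluster backend.
        pool = fiber.Pool(processes=4)
        print(pool.map(evaluate, range(16)))
        pool.close()
        pool.join()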


Fiber introduces the concept of job-backed processes, where processes can run remotely on different machines or locally on the same machine, and it makes use of containers to encapsulate the running environment (e.g., required files, input data, and dependent packages) of each process to ensure everything is self-contained. The framework has built-in error handling when running a pool of workers, so crashed workers can quickly recover. Helpfully, Fiber does all this while directly interacting with computer cluster managers, such that running a Fiber application is akin to running a normal app on a cluster.
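
To illustrate the job-backed process idea, the sketch below starts a handful of workers through the same interface; assuming Fiber mirrors multiprocessing.Process, whether each worker runs locally or as a remote, containerized job is decided by the configured cluster backend rather than by the code itself. The worker body is a placeholder.

    # Hedged sketch of job-backed processes; the worker body is a placeholder.
    import fiber

    def train_worker(worker_id):
        # Stand-in for one worker's training loop. In Fiber's pool model,
        # tasks held by a crashed worker are re-queued so a fresh process
        # can pick them up.
        print(f"worker {worker_id} running")

    if __name__ == "__main__":
        procs = [fiber.Process(target=train_worker, args=(i,)) for i in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()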




In experiments, Fiber had a response time of a couple of milliseconds. With a population size of 2,048 workers (i.e., processor cores), it scaled better than two baseline techniques, with running time gradually decreasing as the number of workers increased (in other words, training with the full 2,048 workers took less time than with only 32). With 512 workers, finishing 50 iterations of a training workload took 50 seconds, compared with the popular IPyParallel framework’s 1,400 seconds.
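
For context, framework overhead of this kind is typically measured with a trivial workload. The harness below is a hypothetical illustration (not the paper's benchmark) of how per-iteration time for a pool of workers might be timed; the same loop could be pointed at multiprocessing or IPyParallel to produce a rough comparison.

    # Hypothetical timing harness, not the benchmark used in the paper.
    import time
    import fiber

    def noop(_):
        return None

    if __name__ == "__main__":
        pool = fiber.Pool(processes=32)
        start = time.time()
        for _ in range(50):            # 50 "iterations" of a trivial workload
            pool.map(noop, range(32))  # one no-op task per worker
        print(f"50 iterations took {time.time() - start:.2f}s")
        pool.close()
        pool.join()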

[Our work shows] that Fiber achieves many goals, including efficiently leveraging a large amount of heterogeneous computing hardware, dynamically scaling algorithms to improve resource usage efficiency, reducing the engineering burden required to make [reinforcement learning] and population-based algorithms work on computer clusters, and quickly adapting to different computing environments to improve research efficiency.

-Uber AI Scientists


They added, “We expect it will further enable progress in solving hard [reinforcement learning] problems with [reinforcement learning] algorithms and population-based methods by making it easier to develop these methods and train them at the scales necessary to truly see them shine.”


Fiber’s reveal comes days after Google released SEED RL, a framework that scales AI model training to thousands of machines. Google said that SEED RL could facilitate training at millions of frames per second on a machine while reducing costs by up to 80%, potentially leveling the playing field for startups that couldn’t previously compete with large AI labs.


