Google is working on a project to make its AI chips run PyTorch, the world’s most popular AI software framework, in an effort to weaken Nvidia’s grip on AI computing. According to Reuters, the initiative is part of Google’s plan to make its Tensor Processing Units (TPUs) a real alternative to Nvidia’s dominant GPUs.
New Software Initiative Targets Developer Adoption
The project, called TorchTPU internally, seeks to remove a key barrier slowing TPU adoption. Google wants to make its chips fully compatible with PyTorch, which most AI developers already use. The company is also considering making parts of the software open source to speed uptake among customers.
PyTorch is an open-source project heavily backed by Meta and one of the most widely used tools for developers building AI models. Nvidia’s engineers have spent years ensuring that PyTorch runs fast on its chips. Google, by contrast, has long relied on a different framework, JAX, for its internal work. This software gap has made it harder for customers to adopt TPUs.
Closing the Performance Gap
Google Cloud has ramped up production and sales of TPUs to external customers since 2022. But the mismatch between PyTorch and JAX means most developers face significant extra work to get TPUs performing as well as Nvidia’s GPUs. Enterprise customers have told Google that TPUs are harder to adopt because they have historically required a switch to JAX. If TorchTPU succeeds, it could lower switching costs for companies seeking alternatives to Nvidia.
Meta Joins the Effort
Google is working closely with Meta, the creator and steward of PyTorch, to speed development. The two tech giants have been discussing deals for Meta to access more TPUs. Early offerings for Meta were structured as Google-managed services, with Google providing operational support.
Meta has a strategic interest in making TPUs easier to use. The move could help Meta lower inference costs and diversify its AI infrastructure away from Nvidia’s GPUs, giving it more negotiating power. This year, Google began selling TPUs directly into customers’ data centers rather than limiting access to its own cloud. Amin Vahdat was named head of AI infrastructure this month, reporting directly to CEO Sundar Pichai.