- Exo supports Llama, Mistral, LLaVA, Qwen, and DeepSeek models
- Runs on Linux, macOS, Android, and iOS, but not Windows
- An AI model requiring 16 GB of RAM can run across two 8 GB laptops
Running large language models (LLMs) generally requires expensive, high-performance hardware with substantial memory and GPU power. However, the Exo software offers an alternative by enabling distributed artificial intelligence (AI) inference across a network of devices.
The software lets users combine the computing power of multiple computers, smartphones, and even single-board computers (SBCs) such as the Raspberry Pi to run models that would otherwise be inaccessible.
This decentralized approach shares similarities with the SETI@home project, which distributed computing tasks across volunteer machines. By leveraging a peer-to-peer (P2P) network, Exo removes the need for a single powerful system, making AI inference more accessible to individuals and organizations.
How Exo distributes AI workloads
Exo aims to challenge the dominance of large technology companies in AI development. By decentralizing inference, it seeks to give individuals and small organizations more control over AI models, similar to initiatives focused on expanding access to GPU resources.
“The fundamental constraint with AI is compute,” says Alex Cheema, co-founder of Exo Labs. “If you don’t have the compute, you can’t compete. But if you create this distributed network, maybe we can.”
The software dynamically partitions LLMs across the devices available on a network, assigning model layers according to each machine’s available memory and processing power. Supported LLMs include Llama, Mistral, LLaVA, Qwen, and DeepSeek.
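To make the idea concrete, here is a minimal sketch of memory-weighted layer partitioning. It is an illustration under stated assumptions, not Exo's actual implementation: the `Device` class and `partition_layers` helper are hypothetical, and a real scheduler would also weigh processing power and network topology.

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    memory_gb: float  # memory this device can contribute

def partition_layers(devices: list[Device], num_layers: int) -> dict[str, range]:
    """Give each device a contiguous slice of layers proportional
    to its share of the network's total memory (hypothetical helper)."""
    total_mem = sum(d.memory_gb for d in devices)
    assignment: dict[str, range] = {}
    start = 0
    for i, d in enumerate(devices):
        if i == len(devices) - 1:
            end = num_layers  # last device absorbs any rounding remainder
        else:
            end = start + round(num_layers * d.memory_gb / total_mem)
        assignment[d.name] = range(start, end)
        start = end
    return assignment

# Example: a 32-layer model split across an 8 GB laptop,
# a 16 GB desktop, and an 8 GB SBC.
devices = [Device("laptop", 8), Device("desktop", 16), Device("sbc", 8)]
print(partition_layers(devices, 32))
# {'laptop': range(0, 8), 'desktop': range(8, 24), 'sbc': range(24, 32)}
```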
Users can install Exo on Linux, macOS, Android, or iOS, although Windows support is not currently available. A minimum Python version of 3.12.0 is required, along with additional dependencies for Linux systems equipped with NVIDIA GPUs.
One of Exo's main strengths is that, unlike traditional setups that rely on high-end GPUs, it allows collaboration between different hardware configurations.
For example, an AI model requiring 16 GB of RAM can run on two 8 GB laptops working together. A more demanding model such as DeepSeek R1, which requires about 1.3 TB of RAM, could theoretically run on a cluster of 170 Raspberry Pi 5 devices with 8 GB of RAM each.
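The Raspberry Pi figure follows from simple arithmetic, as the back-of-the-envelope check below shows (using the article's ~1.3 TB figure; the quoted 170 devices leaves headroom beyond the bare minimum).

```python
import math

model_gb = 1300   # ~1.3 TB reported for DeepSeek R1
device_gb = 8     # RAM per Raspberry Pi 5

print(math.ceil(model_gb / device_gb))  # 163 devices at the bare minimum
# The article's ~170 leaves headroom for the OS and runtime overhead on each Pi.
```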
Network speed and latency are critical concerns. Exo's developers acknowledge that adding low-performance devices can increase inference latency, but they maintain that overall throughput improves with each device added to the network.
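A toy pipeline model (an illustration under simplified assumptions, not an Exo benchmark) shows why both claims can hold at once: splitting the work across more devices adds a network hop per stage, so a single request takes longer, yet pipelined requests complete more often.

```python
def pipeline_stats(num_devices: int, compute_s: float, hop_s: float):
    """Assume the model's work splits evenly and each stage adds one network hop."""
    stage_s = compute_s / num_devices + hop_s
    latency_s = num_devices * stage_s   # one request must cross every stage
    throughput = 1 / stage_s            # pipelined requests finish once per stage time
    return latency_s, throughput

for n in (1, 2, 4, 8):
    latency, tput = pipeline_stats(n, compute_s=4.0, hop_s=0.1)
    print(f"{n} devices: latency {latency:.1f}s, throughput {tput:.2f} req/s")
# Latency creeps up (4.1 -> 4.2 -> 4.4 -> 4.8 s) while throughput climbs
# (0.24 -> 0.48 -> 0.91 -> 1.67 req/s).
```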
Security risks also arise when multiple machines share workloads, requiring safeguards to prevent data leaks and unauthorized access.
Adoption is another hurdle, because AI tool developers currently rely on large-scale data centers. Exo's low-cost approach may appeal, but it simply cannot match the speed of these high-end AI clusters.
Via CNX Software