The NVIDIA Eos Is A Groundbreaking AI Supercomputer That Is Capable of up to 18.4 Exaflops

Peter_Brosdahl

The NVIDIA Eos is an AI supercomputer composed of 576 NVIDIA DGX H100 systems in a large-scale enterprise DGX SuperPOD. It is named in honor of the Greek goddess of the dawn, who opened the gates of dawn to begin each day, and NVIDIA says the supercomputer reflects its commitment to advancing AI technology. Eos holds the number nine spot on the TOP500 list of the world's fastest computers.

See full article...
 
So I have to ask... if a company wants their own version of Eos, with building a data center, cooling, power, connectivity, staffing, and finally the compute itself, how much are we talking? Personally... I'm betting on right around 2 billion.
 
I am curious, and not just about this system specifically, though it is included: is this still like the days of old on a Prime system, where if you are the only one logged in you get all the horses to yourself (provided the code can parallelize enough)? Or is it about how many simultaneous things it can cooperatively do?

I know I don't know much about large-scale computing, other than what little I learned in the 1980s on a system that was small even in its day, or as an end user working with TAO at a couple of jobs I have had.
 
@Elf_Boy If you're doing AI queries today... heck, even my computer can be conversationally fast on a moderately sized 20 GB LLM. So I'd imagine something like this would be for many users, OR for scientific AI use. Right now I bet the real challenge is getting an LLM that can work AND restricting its dataset to a specific rule set for responses, meaning it can't give 'wrong' data or hallucinate. But it could derive results and still need to be checked. Perhaps even feeding its results to another process to verify.

I think the question is more: what are the specific use cases this is intended for? Sure, it could handle tens of thousands, if not millions, of generic AI questions and conversations. BUT could it also do X?
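
That "feeding its results to another process" idea is basically a generate-then-verify pipeline. Here is a minimal Python sketch of the shape of it, with the model calls stubbed out; the function names and the source-whitelist check are illustrative assumptions, not any particular product's API:

```python
# Sketch of "generate, then verify": one pass drafts an answer, a second
# pass rejects anything that strays outside an approved source set.
# Both model calls are stubs; a real deployment would wire in actual
# inference endpoints.
from dataclasses import dataclass, field

@dataclass
class Answer:
    text: str
    sources: list[str] = field(default_factory=list)

def generate(prompt: str) -> Answer:
    """Stand-in for the first LLM pass (e.g., a local 20 GB model)."""
    return Answer(text="stubbed draft answer", sources=["doc-42"])

def verify(answer: Answer, allowed_sources: set[str]) -> bool:
    """Second pass: only accept drafts whose cited sources all fall
    inside the approved rule set -- one crude guard against hallucination."""
    return bool(answer.sources) and set(answer.sources) <= allowed_sources

def answer_query(prompt: str, allowed_sources: set[str]) -> str:
    draft = generate(prompt)
    if not verify(draft, allowed_sources):
        return "No verified answer available."  # fail closed, don't guess
    return draft.text

print(answer_query("What does policy doc-42 say?", {"doc-42", "doc-7"}))
```

Nothing here stops the verifier itself from being wrong, which is why the "still need to be checked" caveat above matters.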
 
So I have to ask... if a company wants their own version of Eos, with building a data center, cooling, power, connectivity, staffing, and finally the compute itself, how much are we talking? Personally... I'm betting on right around 2 billion.
At the time, Aurora was funded by the DOE and was one of the fastest - this was Allllllllll the way back in 2019. It was 2 exaflops for $500 million.

That said - that’s just the computer cost. You still need the infrastructure to run it: almost 25 MW of power, insane HVAC capacity, and a suitable building for installation.
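
For a rough sense of scale on that 25 MW figure, a back-of-envelope calculation (the electricity rate is an assumed ballpark, not a quoted number):

```python
# Back-of-envelope on the ~25 MW draw quoted above. Only the power
# figure comes from the post; the $/kWh rate is an assumption.
power_kw = 25_000            # ~25 MW
rate_per_kwh = 0.10          # assumed rate; varies widely by region
hours_per_year = 24 * 365

annual_cost = power_kw * rate_per_kwh * hours_per_year
print(f"~${annual_cost / 1e6:.0f}M per year")  # -> ~$22M per year
```

And that is electricity alone, before cooling overhead (PUE above 1.0) and staffing.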
 
I am curious, and not just about this system specifically, though it is included: is this still like the days of old on a Prime system, where if you are the only one logged in you get all the horses to yourself (provided the code can parallelize enough)? Or is it about how many simultaneous things it can cooperatively do?

I know I don't know much about large-scale computing, other than what little I learned in the 1980s on a system that was small even in its day, or as an end user working with TAO at a couple of jobs I have had.
The compute on these systems is typically sliced up and leased out in time chunks, like: "whatever" university needs this much compute for this amount of time, and it will cost X dollars.

Someone getting all the beans, even for a short amount of time, would be very expensive. These massive compute centers exist so that many people can get chunks for a certain amount of time to complete their work.
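
As a toy illustration of that leasing model, here is a sketch where users are billed for GPU-hours rather than the whole machine; the flat $/GPU-hour rate and the quote function are made-up assumptions for the example:

```python
# Toy model of "compute leased in time chunks": users reserve GPU-hours,
# not the machine. The rate is invented; real centers use tiered pricing.
def quote(gpus: int, hours: float, rate_per_gpu_hour: float = 2.50) -> float:
    return gpus * hours * rate_per_gpu_hour

# A university group renting 64 GPUs for a week:
print(f"${quote(gpus=64, hours=7 * 24):,.0f}")    # -> $26,880

# "All the beans": every GPU in a 576-system DGX H100 pod (8 GPUs each):
print(f"${quote(gpus=576 * 8, hours=24):,.0f}")   # -> $276,480 for one day
```

In practice the slicing is enforced by a batch scheduler (Slurm is common on large clusters), which queues jobs and walls off each user's allocation.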
 