Personal LLM - pipe dream, really?

Elf_Boy

Preferably under Win 11, as I expect that running Linux in a VM would lose too much CPU/GPU power. Or has that changed?

Other than using Copilot, I have no experience or knowledge of the inner workings of an LLM.

My hope is to set up an LLM that has persistent memory, one where I can create characters (as in a book/fiction), describe settings and the like, and then ask for prose.

Is this doable?
 
You'll likely want to use Docker with WSL (Windows Subsystem for Linux) to run these; you shouldn't lose much in the way of performance.

Depending on how fancy you want to get, something like Langflow can help stitch together a few different sources, but it might be fancier than you're looking for. You'd use Ollama as the AI backend there.

The basic setup would be one Docker container running Ollama (where you pull down the model(s) you want to use), then another container running a chat agent (or Langflow) that hits the Ollama service.
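For a rough idea, a minimal sketch of that two-container backend might look like the commands below. The image name, volume, and port come from the public Ollama Docker image; the model name is just an example, and you'd still add a chat front-end container on top of this.

```shell
# Start the Ollama backend (add --gpus=all if the NVIDIA container toolkit is set up)
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull a model into that container (llama3.2 is an example; pick whatever fits your VRAM)
docker exec -it ollama ollama pull llama3.2

# Quick sanity check against the Ollama HTTP API
curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "prompt": "Hello", "stream": false}'
```

A chat agent container would then point at `http://localhost:11434` (or the Ollama container's name on a shared Docker network) as its backend URL.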
 
....and Hyper-V is required for WSL, so if you don't want to install/enable it (I leave it off on most of my systems, as it's caused issues for me in the past), you can just use VirtualBox instead.

If you're more serious about performance, dual-boot into Linux; you can probably do this in a live environment (boot and run off USB).
 
Oh, and running just about ANY decently sized LLM will consume your video card. A dual-GPU box might be doable: one GPU for your LLM/AI, the other for gaming and such. Might be an idea to make my 7900 XTX an LLM-only card... but that would cut down the PCIe lanes for a 5090, provided I get one. More and more on the fence about that now.
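For a rough sense of why the card gets eaten: a common back-of-the-envelope estimate is parameter count times bytes per parameter at your quantization, plus some overhead for KV cache and activations. A quick sketch (the numbers are ballpark, not vendor specs):

```python
def approx_vram_gb(params_billion: float, bits_per_param: float, overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate: weight size at the given quantization plus a flat
    overhead for KV cache / activations. Ballpark only."""
    weight_gb = params_billion * 1e9 * (bits_per_param / 8) / 1e9
    return weight_gb + overhead_gb

# A 7B model at 4-bit quantization: ~3.5 GB of weights plus overhead
print(round(approx_vram_gb(7, 4), 1))   # ~5.0 GB
# The same model at fp16: ~14 GB of weights, most of a 24 GB card already
print(round(approx_vram_gb(7, 16), 1))  # ~15.5 GB
```

That's why a 24 GB card like a 7900 XTX handles quantized mid-size models comfortably but fills up fast at higher precision or larger parameter counts.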
 
Thank you all.

I do have Hyper-V enabled. I've played around a little with VMs: getting Win 98 going to play old games, that kind of thing, or playing with a Linux system. How could I verify I have the Linux subsystem installed/enabled? I think I do, but it was a while back.
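To check whether WSL is already there, you can ask it directly from PowerShell or cmd; these are standard `wsl.exe` flags on Win 10/11:

```shell
# Shows whether WSL is installed, the default WSL version, and the default distro
wsl --status

# Lists installed distros with their running state and WSL version (1 or 2)
wsl --list --verbose

# If nothing is installed yet, this installs WSL plus a default Ubuntu distro
wsl --install
```

If `wsl --status` errors out or the list is empty, the subsystem isn't set up yet.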

I get that the LLM will heavily use the GPU, but help me understand why I would need two GPUs to game rather than just shutting down/pausing the VM while gaming?

What are Docker and Langflow? (I'll start googling shortly :) )

I have a few USB sticks laying around (who doesn't), and booting off one is no issue. Last time I looked at dual booting (with a boot manager), there were some quite serious concerns about Linux corrupting Windows volumes, and it was recommended to mount them read-only. That was a very long time ago, so I'm thinking it has likely been fixed, but I don't know for sure that it has. I don't want to deal with a boot manager, so yeah, USB. Should I pick up a new one of a certain size or larger, and is the read/write speed of the USB relevant?

Looking at specs, my mobo has both USB-A and USB-C 3.2 Gen 1 and Gen 2 ports. (Looking that up.) OK, Gen 2 is 10 Gbps; can a USB stick even get that fast? I haz researching to do. Hmmm, I have a 1 TB stick here somewhere if I can find it. Time to start looking.
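On whether a stick can "get that fast": 10 Gbps is the bus ceiling, not the flash speed, and the conversion is just bits to bytes. Most budget sticks top out well below the bus either way. Quick arithmetic:

```python
def gbps_to_mb_per_sec(gbps: float) -> float:
    """Convert a link rate in gigabits/s to megabytes/s (8 bits per byte, decimal units)."""
    return gbps * 1000 / 8

print(gbps_to_mb_per_sec(5))   # USB 3.2 Gen 1: 625 MB/s theoretical ceiling
print(gbps_to_mb_per_sec(10))  # USB 3.2 Gen 2: 1250 MB/s theoretical ceiling
```

So the port won't be the bottleneck for a live-boot stick; the stick's own flash (especially sustained and random write speed) is what matters.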
 
Oh yeah, 100% you can shut down the VM/AI to game. I thought you were saying you wanted one running 24/7. You CAN do that with Backyard and even have it accessible over the network.

I found Backyard AI MUCH easier than running a little Linux VM on my box, though I have done both.
 
Training or Inference ?

Pure inference is possible on a dual-EPYC box with 768 GB of RAM running DeepSeek (without any GPU). It can generate at a rate of 7 to 8 tokens/sec.
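To put 7-8 tokens/sec in perspective for prose generation: a common rule of thumb is roughly 0.75 English words per token, so the conversion is simple arithmetic (the words-per-token ratio is an approximation, not a fixed constant):

```python
def words_per_minute(tokens_per_sec: float, words_per_token: float = 0.75) -> float:
    """Approximate prose output rate from a token generation rate."""
    return tokens_per_sec * words_per_token * 60

print(round(words_per_minute(7)))  # ~315 words/min at 7 tokens/sec
print(round(words_per_minute(8)))  # ~360 words/min at 8 tokens/sec
```

That's faster than reading speed for most people, so it's quite usable for interactive fiction writing, even if it feels slow next to a GPU-backed setup.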
 