𞋴𝛂𝛋𝛆

  • 2 Posts
  • 17 Comments
Joined 2 years ago
Cake day: June 9th, 2023

  • There are over 100k homeless people within 100 miles of me right now. I have fallen through the cracks of this system and been subject to it directly after a broken neck and back. No one can survive on their own with the benefits and getting those benefits is nearly impossible now. It takes years of effort that is demeaning and degrading with dozens of intentional loopholes and cost barriers for systemic denials and terrible treatment. There are even treatments for my problems with stem cells in Japan, but the inbred halfwits of Western normalized cultural mysticism prevent any stem cell research and treatments here in this Luddite backwater.

  • I haven’t looked into the issue of PCIe lanes and the GPU.

    I don’t think it should matter with a smaller PCIe bus, in theory, if I understand correctly (unlikely). The only time a lot of data is transferred is when the model layers are initially loaded. Like with Oobabooga when I load a model, most of the time my desktop RAM monitor widget does not even have the time to refresh and tell me how much memory was used on the CPU side. What is loaded in the GPU is around 90% static. I have a script that monitors this so that I can tune the maximum number of layers. I leave overhead room for the context to build up over time but there are no major changes happening aside from initial loading. One just sets the number of layers to offload on the GPU and loads the model. However many seconds that takes is irrelevant startup delay that only happens once when initiating the server.
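A monitoring script along those lines could look roughly like the sketch below. It assumes an NVIDIA card with `nvidia-smi` on the PATH; the function names are made up for illustration and this is not the actual script mentioned above.

```python
import subprocess


def parse_gpu_memory(csv_line):
    """Parse one 'used, total' line (values in MiB) as printed by nvidia-smi."""
    used, total = (int(field.strip()) for field in csv_line.split(","))
    return used, total


def gpu_memory_mib(gpu_index=0):
    """Ask nvidia-smi for used and total memory on one GPU (assumes NVIDIA)."""
    out = subprocess.run(
        [
            "nvidia-smi",
            f"--id={gpu_index}",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_memory(out.strip().splitlines()[0])


def headroom_fraction(used, total):
    """Fraction of VRAM still free for the context to grow into."""
    return (total - used) / total


# Typical use on a machine with an NVIDIA GPU:
#   used, total = gpu_memory_mib()
#   print(f"{used}/{total} MiB, headroom {headroom_fraction(used, total):.0%}")
```

If the headroom stays comfortably above zero after a long chat, there is probably room to offload another layer or two; if it gets close to zero, back the layer count off.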

    So assuming the kernel modules and hardware support the narrower link, it should work… I think. There are laptops that have options for an external Thunderbolt GPU too, so I don’t think a full-width PCIe bus is too baked in.
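As a rough back-of-the-envelope check on the "startup delay only" argument, the sketch below estimates how long it takes to push a full VRAM load across links of different widths. The per-lane figures are approximate theoretical one-direction rates, and the function name is hypothetical; real loads are usually limited by disk speed as well, so treat this as a lower bound on the link alone.

```python
# Approximate usable one-direction bandwidth per PCIe lane, in GB/s.
PCIE_GBPS_PER_LANE = {3: 0.985, 4: 1.969, 5: 3.938}


def load_seconds(payload_gb, generation, lanes):
    """Time to move payload_gb across a PCIe link of the given generation and width."""
    return payload_gb / (PCIE_GBPS_PER_LANE[generation] * lanes)


# Filling 16 GB of VRAM over PCIe 3.0:
for lanes in (16, 4, 1):
    print(f"x{lanes}: {load_seconds(16, 3, lanes):.1f} s")
```

Even a single PCIe 3.0 lane works out to well under half a minute for 16 GB, which only has to be paid once when the server starts.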



  • 𞋴𝛂𝛋𝛆@lemmy.world to Selfhosted@lemmy.world: Consumer GPUs to run LLMs
    22 days ago
    Anything under 16 GB of VRAM is a no-go. Your number of CPU cores is important. Use Oobabooga Textgen for an advanced llama.cpp setup that splits the model between the CPU and GPU. You'll need at least 64 GB of RAM, or be willing to offload layers to NVMe with DeepSpeed. I can run up to a 72b model with 4-bit quantization in GGUF on a 12700 laptop with a mobile 3080 Ti, which has 16 GB of VRAM (the mobile version is like that).
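To get a first guess at how many layers to offload, a simple estimate is enough. The sketch below assumes every layer is the same size and that a quantized model takes exactly `bits_per_weight` bits per parameter; real GGUF 4-bit files run somewhat larger, so the result is a starting point to tune from, not a guarantee. The function name and the reserve figure are illustrative assumptions.

```python
def layers_that_fit(params_billion, bits_per_weight, n_layers, vram_gb, reserve_gb):
    """Estimate how many equal-sized layers fit in VRAM after reserving room for context."""
    # Billions of parameters times bytes per parameter gives gigabytes.
    model_gb = params_billion * bits_per_weight / 8
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0)
    return min(n_layers, int(usable_gb // per_layer_gb))


# A 70b model at 4 bits with 80 layers on a 16 GB card, keeping 2.5 GB back:
print(layers_that_fit(70, 4, 80, 16, 2.5))
```

The remaining layers stay in system RAM and run on the CPU, which is why the core count and RAM size matter so much.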

    I prefer to run an 8×7b mixture-of-experts model because only 2 of the 8 experts are ever active at the same time. I run it as a 4-bit quantized GGUF, and it takes 56 GB total to load. Once loaded, it runs at about the speed of a 13b model but has ~90% of the capabilities of a 70b. The streaming speed is faster than my fastest reading pace.
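The "only 2 of 8" behavior comes from a router that scores every expert for each token and keeps the best two. Below is a generic top-k gating sketch in plain Python, written under the assumption of softmax weights renormalized over the chosen experts; it is not the code of any particular model.

```python
import math


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalise their softmax weights."""
    ranked = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:k]
    exps = [math.exp(router_logits[i]) for i in ranked]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(ranked, exps)]


# Eight experts, one token: only two of them do any work.
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.2]
print(top_k_route(logits))
```

All eight experts still have to sit in memory, which is why the model loads like a big one but streams like a small one.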

    A 70b model streams at my slowest tenable reading pace.

    Both of these options are far more capable than any of the smaller model sizes, even if you screw around with training. Unfortunately, this streaming speed is still pretty slow for most advanced agentic stuff. Maybe if I had 24 to 48 GB of VRAM it would be different; I cannot say.

    If I were building now, I would be looking at which hardware options have the largest L1 cache and the most cores that support the most advanced AVX instructions. Generally, anything with efficiency cores drops AVX-512, because the E-cores lack it and the CPU schedulers in kernels are usually unable to handle that asymmetry, so consumer junk has poor advanced-AVX support. I suspect a lot of the problems Intel has had in recent years come from how they tried to block consumer stuff from accessing the advanced P-core instructions, which were only blocked in microcode. Getting at them requires disabling the E-cores or setting up CPU set isolation in Linux or BSD distros.
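A quick way to see which AVX variants a Linux machine actually exposes is to read the `flags` line in `/proc/cpuinfo`. The helper below is a small sketch for x86 Linux only; the function name is made up for illustration.

```python
def avx_flags(cpuinfo_text):
    """Return the sorted AVX-related CPU flags from /proc/cpuinfo text (x86 Linux)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = line.split(":", 1)[1].split()
            return sorted(f for f in flags if f.startswith("avx"))
    return []


# Typical use on Linux:
#   with open("/proc/cpuinfo") as f:
#       print(avx_flags(f.read()))
```

If nothing starting with `avx512` shows up in the list, the kernel is not seeing AVX-512 on that machine, whatever the silicon might contain.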

    You need good Linux support even if you run Windows. Most of the good and advanced AI stuff will be done through WSL if you haven’t ditched Windows for whatever reason. Use https://linux-hardware.org/ to see support for devices.

    The reason I mention avoiding consumer E-cores is that some articles have been popping up lately about all-P-core hardware.
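On a hybrid chip under Linux, one way to keep an inference process off the E-cores is to pin it to the P-cores. The sketch below assumes the kernel exposes the P-core list at `/sys/devices/cpu_core/cpus`, which is where recent kernels put it on hybrid Intel parts as far as I know; treat that path and the helper names as assumptions and check your own machine.

```python
import os


def parse_cpu_list(text):
    """Turn a kernel CPU list such as '0-3,8,10-11' into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_p_cores(path="/sys/devices/cpu_core/cpus"):
    """Restrict the current process to the CPUs listed at path (Linux only)."""
    with open(path) as f:
        cpus = parse_cpu_list(f.read())
    os.sched_setaffinity(0, cpus)  # 0 means the calling process
    return cpus
```

Calling `pin_to_p_cores()` before starting the model loader keeps its threads on the performance cores; child processes inherit the affinity.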

    The main constraint for the CPU is the L2 to L1 cache bus width. Researching this deeply may be beneficial.

    Splitting the load between multiple GPUs may be an option too. As of a year ago, the cheapest way to get a 16 GB GPU in a machine was a secondhand 12th gen Intel laptop with a 3080 Ti, by a considerable margin once everything is added up. It is noisy, it gets hot, and I often hate it and wish I had gotten a server-like setup for AI, but I have something and that is what matters.


  • Most of the sand went up in Los Angeles County proper. It was diverted near the end of imports. The Newport Beach area has a ton too, but down here in South Orange County the beaches were not as supplemented and have already gone back to their rocky nature.

    This area is actually one of the few deep-water upwelling regions on the planet. The reason is the combination of wind direction and shore angle, plus the fact that the water just off the coast is quite deep. A few hundred feet offshore the water can easily hit 100+ feet deep in many areas, and there are valleys descending underwater. For example, there is a dive park on Catalina Island over near the Casino. At around 50 feet from the shore the depth is already at the recreational dive limits; there is a ship at 106 feet down, IIRC from two decades ago.

    Any sand gets washed downhill and into the valleys. That is why certain beaches, like Santa Monica and Newport, were built up with massive depths of sand. Eventually it will all wash away. There is no shallow coral reef structure or anything like that around Los Angeles; the water is too deep and cold to support it. During the summer, if it is calm for a few weeks, the upper thermocline will be around 20 feet down, but it only takes one solid wind event and it will be back up around 8 feet down. On Catalina there were 3 major thermoclines in the middle of summer, and at 100 feet it was quite chilly.

    But yeah, all that white sand is from Australia and stopped getting imported around 20 years ago IIRC. Natural beaches here are rocky with small spots of sedimentary sand.