Author Archives: Massed Compute
What’s a great piece of advice when you venture down a creative path that requires efficiency? Don’t reinvent the wheel. In a nutshell, that’s model merging. Model merging, in the context of AI, is the practice of taking existing AI models, including Large Language Models (LLMs), and using them to create your own. Like in […]
NVIDIA’s CEO, Jensen Huang, revealed some of the most exciting technological innovations during his keynote presentation at Computex 2024 in early June. Having surpassed Apple to become the second most valuable company in the U.S., NVIDIA and its research team are working on breakthroughs that will drive further adoption of AI. Below we explain a […]
In the era of Artificial Intelligence (AI), Large Language Models (LLMs) are redefining how we interact with technology, how we work, and how we process and understand information. By performing complex language-related tasks, LLMs are opening new possibilities for many industries, such as healthcare, retail, finance, and law. Have you considered how LLMs could help you in your day-to-day tasks? […]
Update: Looking for Llama 3.1 70B GPU Benchmarks? Check out our blog post on Llama 3.1 70B Benchmarks. On April 18, 2024, the AI community welcomed the release of Llama 3 70B, a state-of-the-art large language model (LLM). This model is the next generation of the Llama family and supports a broad range of use […]
Often considered the “brain” of a computer, processors interpret and execute programs and tasks. In an ever-evolving tech landscape, it’s crucial to understand the instances in which various types of processors perform best. Below, let’s break down GPUs and CPUs, what they do, and when to use one over the other for various workloads. What’s […]
ComfyUI provides users with a simple yet effective graph/nodes interface that streamlines the creation and modification of image generation tasks. The nodes in this interface correspond to different components of the image generation process, including text prompts, image inputs, and various AI-powered filters and augmentations. By connecting these nodes, users can create complex and dynamic […]
Recently, there have been a few posts about how open-source models like Llama 3 are catching up to the performance level of some proprietary models. Andrew Reed from Hugging Face created a visual progress tracker to compare various models. It clearly shows a growing trend: open-source models are gaining ground. […]
vLLM
  4xA6000: 14.7, 14.7, 15.2, 15.0, 15.0 tokens/sec (average: 14.92)
  2xH100: 20.3, 20.5, 20.3, 21.0, 20.7 tokens/sec (average: 20.56)
Hugging Face TGI
  4xA6000: 12.38, 12.53, 12.60, 12.55, 12.33 tokens/sec (average: 12.48)
  2xH100: 21.29, 21.40, 21.50, 21.60, 21.41 tokens/sec (average: 21.44)
Purely looking at a token/sec result, Hugging Face TGI produces the most tokens/sec on […]
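The averages above are simple arithmetic means over five runs per configuration. A minimal sketch of that calculation, using the run values from the table (the dictionary layout and the average helper are illustrative, not from the original benchmark scripts):

```python
# Recompute the per-configuration average token/sec from the five runs
# listed in the table above. Structure is a hypothetical illustration.
runs = {
    ("vLLM", "4xA6000"): [14.7, 14.7, 15.2, 15.0, 15.0],
    ("vLLM", "2xH100"): [20.3, 20.5, 20.3, 21.0, 20.7],
    ("Hugging Face TGI", "4xA6000"): [12.38, 12.53, 12.60, 12.55, 12.33],
    ("Hugging Face TGI", "2xH100"): [21.29, 21.40, 21.50, 21.60, 21.41],
}

def average(values):
    # Arithmetic mean, rounded to two decimals as in the table.
    return round(sum(values) / len(values), 2)

for (server, gpus), samples in runs.items():
    print(f"{server} on {gpus}: {average(samples)} tokens/sec")
```

Running this reproduces the table's averages (14.92, 20.56, 12.48, and 21.44 tokens/sec), confirming the reported numbers are plain five-run means.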
Introduction – Multiple LLM APIs
If you haven’t already, go back and read Part 1 of this series. In this guide, we take a look at how you can serve multiple models in the same VM. As you start to decide how you want to serve models as an inference endpoint, you have a few […]