What’s a great piece of advice when you set out on a creative path that demands efficiency? Don’t reinvent the wheel. In a nutshell, that’s model merging.
Model merging in the context of AI is the practice of taking existing AI models, including Large Language Models (LLMs), and combining them into a new model of your own. Much as music producers sample hit tracks to create something new, you can build on successful AI models to serve your needs. The new model will be a modified, expanded or improved version of the ideas you’ve “borrowed.”
What’s the benefit of model merging in AI?
When working on a machine learning or deep learning project, we want to create a model that makes accurate predictions.
By combining the strengths of different models, you can improve your model’s capacity to understand data, which can ultimately reduce errors (since mistakes made by individual models can be offset by others).
It’s worth noting that model merging does require computational resources, mainly enough memory to load the weights of every model you combine. However, the improved accuracy and reduced need for retraining can save costs in the long run (since you can merge in new models that account for new data trends instead of training from scratch).
How can you merge models?
To merge LLMs, you can typically follow three basic steps:
- Install mergekit, a Python toolkit designed to merge pre-trained models using various techniques.
- Specify which models and parameters you want to use, typically in a YAML configuration file that mergekit reads. Hugging Face provides access to thousands of open source models.
- Finally, run mergekit to merge the models.
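mergekit reads the merge recipe from a small YAML file. The sketch below assumes the simplest recipe, a weighted average (mergekit’s `linear` method), and builds that file’s text in Python. The repository IDs are hypothetical placeholders for two Hugging Face models that share the same architecture.

```python
def build_linear_config(models, weights, dtype="float16"):
    """Return the text of a mergekit YAML config for a weighted linear merge.

    `models` are Hugging Face repo IDs (hypothetical here); `weights` gives
    each model's relative contribution to the averaged weights.
    """
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    lines = ["models:"]
    for model, weight in zip(models, weights):
        lines.append(f"  - model: {model}")
        lines.append("    parameters:")
        lines.append(f"      weight: {weight}")
    lines.append("merge_method: linear")
    lines.append(f"dtype: {dtype}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical repo IDs: replace them with real models of the same architecture.
    print(build_linear_config(["your-org/model-a", "your-org/model-b"], [0.5, 0.5]))
```

Save the printed text as `merge_config.yml` and, with mergekit installed, run `mergekit-yaml merge_config.yml ./merged-model` from the terminal to write the merged model to `./merged-model`.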
To merge models, you should have a basic understanding of Python and of how to run commands in a terminal.
For a more in-depth description of model merging steps, check out this video on how to Merge LLMs to Make the Best Performing AI Model.
What are the most common model merging techniques?
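The most common model merging techniques include Task Vector Arithmetic, SLERP, TIES and DARE. All of them work directly on model weights, and most of them operate on task vectors: the difference between a fine-tuned model’s weights and the weights of the base model it was trained from.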
What are vectors in a neural network?
In a neural network, a vector is an ordered sequence of numbers. A model’s weights, which set the strength (or relevance) of the connections through which neurons pass information to each other, can be stored this way.
These weights (adjusted during training and fine-tuning) play a big part in how models perform and how accurate their answers are.
What is a neuron or neural network in AI?
A neuron in a neural network is a basic processing unit loosely inspired by the neurons in the human brain.
These neurons are the building blocks that form a neural network, and they are responsible for performing mathematical calculations and receiving and transmitting information through the network.
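As a rough illustration, the calculation a single neuron performs can be sketched in a few lines of Python. This sketch assumes a sigmoid activation for simplicity. The layers in real LLMs use other activation functions and compute many neurons at once as matrix operations.

```python
import math


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs plus a bias, then a sigmoid."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Each weight scales how much its input contributes to the neuron's output.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))  # weighted sum is 0.0, so this prints 0.5
```

The weights passed in here are exactly the kind of numbers that the merging techniques below combine.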
Task Vector Arithmetic
With Task Vector Arithmetic, task vectors are combined linearly: adding a task vector to a model’s weights teaches it a task (learning), and subtracting one removes it (forgetting).
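A minimal sketch of task vector arithmetic on toy weight lists is shown below. The “models” are hypothetical three-number vectors standing in for full weight tensors, and the per-vector `scales` parameter is an illustrative choice: a positive scale adds a skill, a negative one removes it.

```python
def task_vector(finetuned, base):
    """Task vector: fine-tuned weights minus base weights, element by element."""
    return [f - b for f, b in zip(finetuned, base)]


def apply_task_vectors(base, task_vectors, scales):
    """Add (scale > 0) or subtract (scale < 0) scaled task vectors from the base weights."""
    merged = list(base)
    for tv, scale in zip(task_vectors, scales):
        merged = [m + scale * t for m, t in zip(merged, tv)]
    return merged


base = [0.0, 1.0, 2.0]
math_model = [0.5, 1.0, 2.0]  # hypothetical fine-tune #1
code_model = [0.0, 1.5, 2.0]  # hypothetical fine-tune #2
tvs = [task_vector(math_model, base), task_vector(code_model, base)]
print(apply_task_vectors(base, tvs, [1.0, 1.0]))  # [0.5, 1.5, 2.0]: both changes added
print(apply_task_vectors(base, tvs[:1], [-1.0]))  # [-0.5, 1.0, 2.0]: task 1 "forgotten"
```

Because every step is plain addition and scaling, any number of task vectors can be combined this way.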
How many models can you merge with this technique?
This technique allows you to merge multiple models.
SLERP (Spherical Linear Interpolation)
With SLERP, instead of averaging two models’ weights along a straight line, the weights are interpolated along the arc of a sphere between them. This preserves the geometric properties and distinct characteristics of each model while finding a middle ground between the two.
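The interpolation can be sketched as follows, treating each model’s weights as one flat vector (an assumption for illustration; tools such as mergekit apply it tensor by tensor). The factor `t` controls where the result sits between the two models.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight vectors v0 and v1.

    t=0 returns v0 and t=1 returns v1. When either vector is ~zero or the two
    are (nearly) parallel, the angle is ill-defined, so this falls back to
    plain linear interpolation.
    """
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    lerp = [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    if norm0 < eps or norm1 < eps:
        return lerp
    # Angle between the vectors, from their normalized dot product.
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        return lerp
    # Coefficients that move the result along the arc instead of the straight chord.
    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


mid = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
print([round(x, 4) for x in mid])  # [0.7071, 0.7071]; a plain average gives [0.5, 0.5]
```

The midpoint of two perpendicular unit vectors stays on the unit circle, whereas a plain average shrinks it toward the origin. Since the formula involves exactly two endpoints, it also explains why SLERP merges two models at a time.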
How many models can you merge with this technique?
This technique allows you to merge only two models at a time.
TIES and DARE
Task Vector Arithmetic and SLERP often don’t take into account how the parameters of different models may interfere with each other. For example, one model may push a parameter up while another pushes it down, or many small, redundant changes may drown out the important ones.
This can degrade performance when combining multiple models and cause important information to be lost in the merging process.
TIES and DARE are more advanced techniques designed to solve this problem of conflicting parameters. They are similar in that both prune task vectors before merging them.
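TIES (TrIm, Elect Sign & Merge) first trims each task vector down to the parameters that changed the most, then elects a single sign (positive or negative) for each parameter so the models agree on a direction, and finally merges only the values that match that sign.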
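The three TIES steps can be sketched on toy weight lists as below. This is a simplified reading of the method under stated assumptions: the `density` parameter (the fraction of each task vector kept after trimming) is chosen for illustration, and the final scaling factor used in practice is omitted.

```python
def ties_merge(base, finetuned_models, density=0.5):
    """Simplified TIES merge: trim, elect sign, disjoint mean, then add to base."""
    n = len(base)
    # Task vectors: how far each fine-tuned model moved from the base.
    tvs = [[f - b for f, b in zip(model, base)] for model in finetuned_models]
    # 1. Trim: keep only the top `density` fraction of entries by magnitude.
    k = max(1, int(round(density * n)))
    trimmed = []
    for tv in tvs:
        keep = set(sorted(range(n), key=lambda i: abs(tv[i]), reverse=True)[:k])
        trimmed.append([tv[i] if i in keep else 0.0 for i in range(n)])
    merged = []
    for i in range(n):
        # 2. Elect sign: the sign of the summed trimmed values for this parameter.
        sign_positive = sum(tv[i] for tv in trimmed) >= 0
        # 3. Disjoint mean: average only non-zero values that agree with that sign.
        agreeing = [tv[i] for tv in trimmed if tv[i] != 0.0 and (tv[i] > 0) == sign_positive]
        delta = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(base[i] + delta)
    return merged


base = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, -0.1, 0.5, 0.0]  # hypothetical fine-tuned weights
model_b = [0.8, 0.2, -0.6, 0.05]  # hypothetical fine-tuned weights
print(ties_merge(base, [model_a, model_b]))  # [0.9, 0.0, -0.6, 0.0]
```

The third parameter is pushed in opposite directions by the two models. Instead of averaging those changes toward zero, TIES keeps the change whose sign wins the election, while the small changes to the second and fourth parameters are trimmed away.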
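DARE (Drop And REscale) randomly resets a large fraction of the task-vector values to zero, then rescales the remaining values so that the model’s overall behavior is preserved.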
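DARE’s drop-and-rescale step can be sketched on a single model’s task vector as follows. The `drop_rate` and `seed` parameters are illustrative assumptions. In practice (for example, in mergekit’s `dare_ties` and `dare_linear` methods), the pruned task vectors are then merged with TIES or task arithmetic.

```python
import random


def dare(base, finetuned, drop_rate=0.9, seed=0):
    """Drop And REscale applied to one fine-tuned model's task vector.

    Each delta (fine-tuned minus base) is dropped, i.e. set to zero, with
    probability `drop_rate`; survivors are multiplied by 1 / (1 - drop_rate)
    so the expected value of each delta is unchanged.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    rng = random.Random(seed)  # seeded so the random mask is reproducible
    scale = 1.0 / (1.0 - drop_rate)
    result = []
    for b, f in zip(base, finetuned):
        delta = f - b
        if rng.random() < drop_rate:
            delta = 0.0  # drop: this parameter reverts to the base weight
        else:
            delta *= scale  # rescale: compensate for the dropped deltas
        result.append(b + delta)
    return result


base = [0.0, 0.0, 0.0, 0.0]
finetuned = [0.2, -0.1, 0.4, 0.3]  # hypothetical fine-tuned weights
print(dare(base, finetuned, drop_rate=0.5))
```

Each output is either the base weight or the base weight plus the rescaled delta. The rescaling is what lets DARE discard most of a task vector without shrinking its average effect.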
How many models can you merge with these techniques?
TIES and DARE can be used to merge multiple models.
Creating a new model through model merging
The use of tools like mergekit allows businesses and individuals to combine LLMs into one. By choosing the right model merging techniques, you can develop AI systems that cater to your specific needs.
Remember, you’ll need hardware and GPUs powerful enough to merge your models. If you don’t want to invest in building your own, renting a cloud GPU is an excellent option.
AI Developer GPU
If you’re developing an AI model, you need serious compute power. Get access to virtually unlimited NVIDIA GPUs and on-demand virtual machines backed by Tier III data centers, at half the market price.
Massed Compute offers the industry’s fastest, high-performance computing solutions with the most flexible and affordable plans. Check out our offerings.