
Parameter-Efficient Fine-Tuning (PEFT) for Large AI Models

guolipa


Posted on 2024-5-9 10:14:54

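Pretrained language models (PLMs), with ChatGPT as the best-known example, keep growing in size, and full fine-tuning on consumer-grade hardware is no longer feasible. Storing and deploying a separately fine-tuned model for every downstream task is also very expensive, because each fine-tuned model is as large as the original pretrained model. Parameter-Efficient Fine-Tuning (PEFT) methods were proposed to address both problems: they adapt a PLM to a variety of downstream tasks without fine-tuning all of its parameters. A PEFT method trains only a small number of existing or additional parameters and keeps most of the pretrained parameters frozen, which lowers compute and storage costs, while state-of-the-art PEFT techniques can reach performance comparable to full fine-tuning.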
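To make the storage argument concrete, here is a rough back-of-the-envelope sketch. All numbers are hypothetical and chosen only for illustration (a 7B-parameter model, 2 bytes per parameter, PEFT parameters equal to 0.1% of the base model, 10 downstream tasks); none of them come from the original post.

```python
def storage_gb(num_params, bytes_per_param=2):
    """Checkpoint size in GB (10**9 bytes) for a given parameter count."""
    return num_params * bytes_per_param / 1e9

# Hypothetical sizes, for illustration only.
base_params = 7_000_000_000   # a 7B-parameter PLM
peft_params = 7_000_000       # PEFT parameters per task (0.1% of the base model)
num_tasks = 10

# Full fine-tuning: one complete copy of the model per task.
full_ft_total = num_tasks * storage_gb(base_params)

# PEFT: one shared frozen base model plus a small set of parameters per task.
peft_total = storage_gb(base_params) + num_tasks * storage_gb(peft_params)

print(f"full fine-tuning: {full_ft_total:.2f} GB")
print(f"PEFT:             {peft_total:.2f} GB")
```

Under these assumptions, full fine-tuning stores ten complete copies of the model, while PEFT stores the base model once plus ten small per-task parameter sets.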

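Hugging Face has open-sourced PEFT, a library for efficiently fine-tuning large models. The library supports the following four families of methods: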

LoRA: LoRA: Low-Rank Adaptation of Large Language Models
Prefix Tuning: Prefix-Tuning: Optimizing Continuous Prompts for Generation, P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks
P-Tuning: GPT Understands, Too
Prompt Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning

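LLM-Adapters [1] extends the PEFT library. It is an easy-to-use framework that integrates various adapters into LLMs, so that adapter-based PEFT methods can be applied to LLMs for different tasks. Beyond the LoRA, Prefix Tuning, P-Tuning, and Prompt Tuning methods supported by PEFT, it mainly adds three methods: AdapterH, AdapterP, and Parallel.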

AdapterH: Parameter-Efficient Transfer Learning for NLP
AdapterP: MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer
Parallel: Towards a Unified View of Parameter-Efficient Transfer Learning

PEFT methods fall into three categories, each adapting a different part of the PLM to the downstream task:

Prefix/Prompt-Tuning: add k extra trainable prefix tokens to the model's input or hidden layers (these prefixes are continuous pseudo-tokens that do not correspond to real tokens), and train only the prefix parameters;
Adapter-Tuning: insert small neural network layers or modules into every layer of the pretrained model; these newly inserted modules are called adapters, and only the adapter parameters are trained during downstream fine-tuning;
LoRA: learn low-rank matrices with few parameters to approximate the update to the model's weight matrix W, and optimize only the low-rank matrices during training.
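The three categories above can be sketched in a few lines of plain Python. This is a toy illustration, not the PEFT or LLM-Adapters implementation: the function names, the tiny dimensions, and the zero initialization of the adapter's up-projection are assumptions made for this sketch. The LoRA part follows the usual form h = Wx + (alpha / r) * B(Ax), with B initialized to zero so that training starts from the frozen model's behavior.

```python
import random

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

def count(*matrices):
    """Total number of entries across the given matrices."""
    return sum(len(M) * len(M[0]) for M in matrices)

def prepend_prefix(prefix, token_embeddings):
    """Prefix/Prompt-Tuning at the input: k trainable vectors precede the real tokens."""
    return prefix + token_embeddings

def adapter_forward(h, W_down, W_up):
    """Adapter-Tuning: bottleneck module with a residual connection, h + W_up relu(W_down h)."""
    z = [max(0.0, v) for v in matvec(W_down, h)]
    return [a + b for a, b in zip(h, matvec(W_up, z))]

def lora_forward(W, A, B, x, alpha, r):
    """LoRA: h = W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]

rng = random.Random(0)
d, r, k, bottleneck, alpha = 8, 2, 3, 2, 4   # toy sizes, for illustration only

W = rand_matrix(d, d, rng)                   # frozen pretrained weight
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
h_frozen = matvec(W, x)

# LoRA: A is random, B starts at zero, so the initial update B A is zero.
A, B = rand_matrix(r, d, rng), zeros(d, r)
lora_matches_at_init = lora_forward(W, A, B, x, alpha, r) == h_frozen

# Adapter: with a zero up-projection (an assumption of this sketch) it starts as the identity.
W_down, W_up = rand_matrix(bottleneck, d, rng), zeros(d, bottleneck)
adapter_is_identity_at_init = adapter_forward(h_frozen, W_down, W_up) == h_frozen

# Prefix: k trainable vectors are prepended to 5 (frozen) token embeddings.
prefix = rand_matrix(k, d, rng)
sequence = prepend_prefix(prefix, rand_matrix(5, d, rng))

trainable = {
    "full": count(W),                  # d * d
    "lora": count(A, B),               # r * d + d * r
    "adapter": count(W_down, W_up),    # 2 * d * bottleneck
    "prefix": count(prefix),           # k * d
}
print(trainable)
```

In each case the pretrained matrix W is never modified; only the small extra matrices (prefix, W_down and W_up, or A and B) would receive gradient updates. With realistic sizes, where d is in the thousands and r or the bottleneck width stays small, the gap between the full and PEFT parameter counts is far larger than in this toy setting.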
