Categories
  Article Categories
DeepSpeed-Chat Llama/Llama-2
blog · DeepSpeed-Chat for llama/llama2 — Introduction: DeepSpeed-Chat is a general system framework for RLHF training that makes it easy, fast, affordable, and scalable to train ChatGPT-like models (GitHub). Updated
FULL PARAMETER FINE-TUNING FOR LARGE LANGUAGE MODELS WITH LIMITED RESOURCES
paper · FULL PARAMETER FINE-TUNING FOR LARGE LANGUAGE MODELS WITH LIMITED RESOURCES — Introduction: This paper focuses on training large models under limited resources and proposes the LOMO optimizer, which enables full-parameter fine-tuning of large models
QLoRA: Efficient Finetuning of Quantized LLMs
paper · QLORA: Efficient Finetuning of Quantized LLMs — Abstract: The authors propose QLORA, an efficient finetuning method that can finetune a 65B-parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance, thereby reducing memory usage
deepspeed
DeepSpeed config keys and usage — train_batch_size: sets the batch size used during training. gradient_accumulation_steps: sets the number of gradient-accumulation steps to reduce communication overhead and memory usage. fp16: sets whether to use mixed precision
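As a minimal sketch of how these keys fit together (the toy model, batch sizes, and optimizer settings below are hypothetical placeholders, not values from the post), such a config dict can be passed to deepspeed.initialize:

```python
import torch
import deepspeed

# Toy model used purely for illustration (hypothetical, not from the post).
model = torch.nn.Linear(1024, 1024)

# Minimal config covering the keys mentioned above; all values are placeholders.
ds_config = {
    "train_batch_size": 32,            # global batch size per optimizer step
    "gradient_accumulation_steps": 4,  # micro-batches accumulated before each step
    "fp16": {"enabled": True},         # enable mixed-precision training
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

# deepspeed.initialize wraps the model and builds the optimizer from the config.
# A script like this is normally started with the `deepspeed` launcher.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```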
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
paper · MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices — Introduction: MobileBERT is designed to be as deep as $BERT_{large}$, while each layer is made much narrower
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
paper · Distilling Task-Specific Knowledge from BERT into Simple Neural Networks — 1 Introduction; 2 Related Work — Model compression: A prominent line of work focuses on compressing large neural networks to accelerate inference. Early pioneering work