When fine-tuning a large language model (LLaMA, Qwen, etc.), a common way to reduce GPU memory usage is to load the model in 4-bit with bitsandbytes:
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NF4 (NormalFloat4) quantization
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the matmuls in bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    use_cache=False,
    device_map="cuda:0",
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config,
)
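For reference, when this load succeeds, transformers swaps the model's nn.Linear layers for bitsandbytes Linear4bit modules. A quick check like the one below (my addition, not from the original post) confirms the quantization actually took effect:

from bitsandbytes.nn import Linear4bit

# Count how many linear layers were swapped for 4-bit modules.
n_4bit = sum(isinstance(m, Linear4bit) for _, m in model.named_modules())
print(f"Linear4bit modules: {n_4bit}")

# Rough size on device; a 7B model in NF4 should land around 4-5 GiB.
print(f"Memory footprint: {model.get_memory_footprint() / 1024**3:.2f} GiB")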
I searched online for a number of fixes, but the errors persisted. The messages looked roughly like this:
RuntimeError:
        python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them to your LD_LIBRARY_PATH. ...

AttributeError: module 'bitsandbytes.nn' has no attribute 'Linear4bit'

The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.

AttributeError: 'NoneType' object has no attribute 'cquantize_blockwise_bf16_nf4'

The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
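All of these messages point at one root cause: the bitsandbytes wheel installed from PyPI was built without CUDA support, which at the time was the default situation on Windows. A minimal sanity check (my addition; assumes bitsandbytes imports at all) makes that easy to see:

import torch
import bitsandbytes as bnb

print(torch.cuda.is_available())      # True means PyTorch itself sees the GPU
print(bnb.__version__)                # the bitsandbytes build in use
print(hasattr(bnb.nn, "Linear4bit"))  # False on old or CPU-only builds

Running `python -m bitsandbytes` (the self-test suggested in the RuntimeError above) prints the same diagnostics in more detail.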
The fix that finally worked: uninstall the PyPI build and install a Windows wheel compiled with CUDA support:
pip uninstall bitsandbytes
pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.0-py3-none-win_amd64.whl
Wheels for other versions can be found and downloaded here:
https://github.com/jllllll/bitsandbytes-windows-webui/releases/
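After installing the wheel, it's worth rerunning `python -m bitsandbytes` and a quick import check before retrying the fine-tuning script; for example (assuming the 0.41.0 wheel installed cleanly):

import bitsandbytes as bnb
from bitsandbytes.nn import Linear4bit  # no more AttributeError on 0.41.0

print(bnb.__version__)                  # expect 0.41.0

With a GPU-enabled build in place, the 4-bit loading code at the top runs without the errors above.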