
Running the Discrete and Integrated GPUs Together

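By default, most motherboards disable the integrated GPU as soon as they detect a discrete GPU. The steps below keep the integrated GPU active while the system still detects the discrete GPU.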

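  1. Press Del during boot to enter the BIOS, then go to Advanced > System Agent (SA) Configuration (the northbridge settings)

  2. Open Graphics Configuration (display settings)

  3. Find the integrated-graphics setting (labeled iGPU, iGPUx or similar) and set it to Enabled (the default is Auto). Set Primary Display to the integrated GPU so that the discrete GPU is used only as a compute card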
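After saving the BIOS settings, one way to confirm that both GPUs are visible to the OS is to count the display controllers reported by lspci. This is a minimal sketch; count_gpus is a hypothetical helper, and it assumes the usual lspci class names. Depending on the card and BIOS settings, the NVIDIA GPU may be listed as either a VGA compatible controller or a 3D controller, so both are counted.

```shell
#!/bin/sh
# Count display controllers in lspci output read from stdin.
# Matches the three PCI display classes: VGA, 3D and generic display controllers.
count_gpus() {
  grep -Eic 'vga compatible controller|3d controller|display controller'
}

# Usage: lspci | count_gpus    # expect 2 when both GPUs are enabled
```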

Driver Installation

Installing from the fnOS App Store

The fnOS app store offers an NVIDIA driver package, but only version 560.28 (as of 2025-04-30).
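Because the store package can lag behind NVIDIA's releases, it may help to compare the installed driver version with the version a workload needs. A minimal sketch, assuming GNU coreutils (for sort -V); version_ge is a hypothetical helper and the minimum version in the usage comment is only an example. The nvidia-smi query prints the driver version of each GPU.

```shell
#!/bin/sh
# Succeed if version $1 is greater than or equal to version $2.
# sort -V orders dotted version strings numerically; the smaller one comes first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage (570.00 is a hypothetical minimum):
#   installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   version_ge "$installed" 570.00 || echo "driver $installed is older than required"
```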

Uninstalling the Driver

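sudo apt purge 'nvidia-*'    # Remove all NVIDIA driver packages and their config files (quoted so the shell does not expand the glob)
sudo apt autoremove          # Remove dependencies that are no longer needed
sudo /usr/bin/nvidia-uninstall  # Only present after a .run install; the path may differ with the install method
sudo rmmod nvidia_drm nvidia_modeset nvidia  # Unload modules that are still loaded
sudo rm -f /etc/X11/xorg.conf                        # Main Xorg config file
sudo rm -f /etc/X11/xorg.conf.d/10-nvidia*.conf      # Related Xorg snippets
sudo rm -f /lib/modprobe.d/nvidia-installer-disable-nouveau.conf  # Nouveau blacklist written by the installer
# Keep the open-source Nouveau driver from loading automatically.
# "sudo echo ... >> file" fails because the redirection runs in the unprivileged shell, so use tee -a.
echo "blacklist nouveau" | sudo tee -a /etc/modprobe.d/blacklist.conf
echo "options nouveau modeset=0" | sudo tee -a /etc/modprobe.d/blacklist.conf
sudo update-initramfs -u     # Rebuild the initramfs so the changes take effect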
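Running the Nouveau blacklist commands more than once appends duplicate entries to blacklist.conf. The sketch below only appends a line when an identical line is not already present. append_once is a hypothetical helper; because the target lives under /etc/modprobe.d, it is assumed to run as root (for example in a root shell).

```shell
#!/bin/sh
# Append line $2 to file $1 unless an identical line already exists.
# grep -x matches whole lines, -F treats the text literally; a missing file is simply created.
append_once() {
  file=$1 line=$2
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as root):
#   append_once /etc/modprobe.d/blacklist.conf "blacklist nouveau"
#   append_once /etc/modprobe.d/blacklist.conf "options nouveau modeset=0"
#   update-initramfs -u
```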

Reboot the system with reboot, then verify:

Check the kernel module status

lsmod | grep nvidia  # Should print nothing

Check the APT package list

apt list --installed | grep nvidia  # Should list no NVIDIA packages

Check nvidia-smi: the command should no longer be available (the shell reports command not found).
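The three checks above can be combined into one function that reports what is left over. A sketch under these assumptions: nvidia_leftovers is a hypothetical helper, it reads /proc/modules (the file lsmod itself reads) unless another path is given, and it uses dpkg-query so that packages that were removed but not purged are not counted as installed.

```shell
#!/bin/sh
# Report NVIDIA driver leftovers; returns 1 if anything is found.
# $1: module list to inspect (default /proc/modules).
nvidia_leftovers() {
  modules=${1:-/proc/modules}
  found=0
  if grep -q '^nvidia' "$modules" 2>/dev/null; then
    echo "kernel module still loaded"; found=1
  fi
  # Keep only packages whose status is fully installed.
  if command -v dpkg-query >/dev/null 2>&1 &&
     dpkg-query -W -f='${Status}\n' '*nvidia*' 2>/dev/null | grep -q '^install ok installed'; then
    echo "nvidia packages still installed"; found=1
  fi
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi still on PATH"; found=1
  fi
  [ "$found" -eq 0 ] && echo "clean"
  return "$found"
}

# Usage: nvidia_leftovers && echo "safe to reinstall"
```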

Debug: unmet dependencies, and installing the missing dependency fails because files cannot be overwritten

The following packages have unmet dependencies:
 libegl-nvidia0 : Depends: libnvidia-egl-wayland1 but it is not going to be installed
 nvidia-egl-icd : Depends: nvidia-egl-common but it is not going to be installed
 nvidia-kernel-support : Depends: nvidia-modprobe (>= 535) but it is not going to be installed
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).

Force-install these packages with file overwriting allowed; once they are in place, continue with the uninstall:

sudo dpkg -i --force-overwrite /var/cache/apt/archives/nvidia-egl-common_535.216.01-1~deb12u1_amd64.deb \
/var/cache/apt/archives/libnvidia-egl-wayland1_1%3a1.1.10-1_amd64.deb \
/var/cache/apt/archives/nvidia-modprobe_535.161.07-1~deb12u1_amd64.deb
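The file names above contain version strings that change with every release. As a sketch, the cached .deb files can be looked up by package name instead; cached_debs is a hypothetical helper, and if the cache holds several versions of one package, all of them are printed, so check the list before passing it to dpkg.

```shell
#!/bin/sh
# Print the cached .deb files for each package name given after the cache directory.
cached_debs() {
  cache_dir=$1; shift
  for pkg in "$@"; do
    for deb in "$cache_dir/${pkg}"_*.deb; do
      # An unmatched glob stays literal, so skip names that do not exist.
      if [ -e "$deb" ]; then printf '%s\n' "$deb"; fi
    done
  done
}

# Usage:
#   debs=$(cached_debs /var/cache/apt/archives nvidia-egl-common libnvidia-egl-wayland1 nvidia-modprobe)
#   echo "$debs"                       # confirm one version per package
#   sudo dpkg -i --force-overwrite $debs
```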

Manual Installation with the .run File

  1. Install the required dependencies (the installation notes are at the bottom of the driver download page)

apt-get update               # Refresh the package lists
apt-get install -y gcc make  # Compiler and build tool used to build the kernel modules
  2. Download the driver

    1. Download the driver from the NVIDIA website (https://www.nvidia.cn/geforce/drivers/), choosing a Linux production-branch release

    2. Upload the .run file to fnOS.

  3. Install the driver

chmod +x ./NVIDIA-Linux-x86_64-*.run
./NVIDIA-Linux-x86_64-*.run

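Notes:

  • Choose the proprietary kernel modules (NVIDIA Proprietary) rather than the open-source ones (MIT/GPL) to get full CUDA functionality

  • The installer compiles kernel modules, so the build tools and the headers for the running kernel must be installed

  • The installer may report that it cannot generate 32-bit configuration files; this can be ignored

  • There is no need to generate an Xorg configuration file, because fnOS currently has no local graphical interface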
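After installation, one way to confirm which module flavor ended up loaded is to read /proc/driver/nvidia/version. This sketch assumes, based on the format of recent drivers, that builds using the open modules include the phrase "Open Kernel Module" in the first (NVRM version) line; module_flavor is a hypothetical helper.

```shell
#!/bin/sh
# Print "open" or "proprietary" based on the NVRM line of the given file.
# Default: /proc/driver/nvidia/version, which exists only while the driver is loaded.
module_flavor() {
  version_file=${1:-/proc/driver/nvidia/version}
  if ! [ -r "$version_file" ]; then
    echo "driver not loaded" >&2
    return 1
  fi
  if head -n 1 "$version_file" | grep -q 'Open Kernel Module'; then
    echo open
  else
    echo proprietary
  fi
}

# Usage: [ "$(module_flavor)" = proprietary ] || echo "open modules are loaded"
```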

Debug: kernel source tree not found

ERROR: Unable to find the kernel source tree for the currently running kernel. Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed. If you know the correct kernel source files are installed, you may specify the kernel source path with the '--kernel-source-path' command line option.

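apt install linux-headers-$(uname -r)   # Install headers that match the running kernel
ls -l /lib/modules/$(uname -r)/build    # Check the header path; normally a symlink to /usr/src/linux-headers-$(uname -r)

# Repair the symlink manually if the path is wrong (-n replaces an existing link instead of following it):
#sudo ln -sfn /usr/src/linux-headers-$(uname -r) /lib/modules/$(uname -r)/build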
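To catch this error before running the .run installer, the build prerequisites can be checked up front. A minimal sketch: check_build_prereqs is a hypothetical helper, and it assumes a usable header tree has a top-level Makefile reachable through /lib/modules/<release>/build. The parameters exist only so the paths can be overridden.

```shell
#!/bin/sh
# Check the tools and kernel headers needed to build the NVIDIA kernel modules.
# $1: kernel release (default: running kernel); $2: modules root (default: /lib/modules).
check_build_prereqs() {
  release=${1:-$(uname -r)}
  modroot=${2:-/lib/modules}
  missing=0
  for tool in gcc make; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool"; missing=1
    fi
  done
  # The build link must resolve to a header tree that contains a Makefile.
  if ! [ -f "$modroot/$release/build/Makefile" ]; then
    echo "missing kernel headers for $release (try: apt install linux-headers-$release)"
    missing=1
  fi
  return "$missing"
}

# Usage: check_build_prereqs && ./NVIDIA-Linux-x86_64-*.run
```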

Installing the NVIDIA Container Toolkit

The NVIDIA Container Toolkit lets containers use the host's NVIDIA driver and GPUs.

Documentation: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Steps:

  1. Install the prerequisites

sudo apt-get update && sudo apt-get install -y --no-install-recommends curl gnupg2
  2. Configure the package repository (in mainland China, use the USTC mirror)

curl -fsSL https://mirrors.ustc.edu.cn/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://mirrors.ustc.edu.cn/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://nvidia.github.io#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://mirrors.ustc.edu.cn#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

If the official repository is already configured, switch it to the mirror with sudo sed -i 's#nvidia.github.io#mirrors.ustc.edu.cn#g' /etc/apt/sources.list.d/nvidia-container-toolkit.list

  3. Update the package lists

apt-get update
  4. Install the Container Toolkit (the commands below pin version 1.18.0-1)

export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.18.0-1
  sudo apt-get install -y \
      nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
      nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
      libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
      libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}
  5. Configure Docker

sudo nvidia-ctk runtime configure --runtime=docker
  6. Restart Docker

sudo systemctl restart docker
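nvidia-ctk runtime configure --runtime=docker registers an nvidia runtime in Docker's daemon configuration, /etc/docker/daemon.json by default. The hypothetical helper below only checks that file for the runtime entry; the container test in the usage comment follows the sample workload in NVIDIA's install guide, and the ubuntu image is an assumption.

```shell
#!/bin/sh
# Succeed if the Docker daemon config declares an "nvidia" runtime key.
# $1: config path (default: /etc/docker/daemon.json).
has_nvidia_runtime() {
  config=${1:-/etc/docker/daemon.json}
  grep -q '"nvidia"[[:space:]]*:' "$config" 2>/dev/null
}

# Usage:
#   has_nvidia_runtime || echo "run: sudo nvidia-ctk runtime configure --runtime=docker"
#   sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```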

Debug: Failed to trigger CDI refresh

This means the NVIDIA CDI (Container Device Interface) refresh service failed to start.

# Stop the related service
sudo systemctl stop nvidia-cdi-refresh.service

# Generate the CDI spec manually
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Start the service again
sudo systemctl start nvidia-cdi-refresh.service

# Reconfigure nvidia-container-toolkit
sudo nvidia-ctk config --config /etc/nvidia-container-runtime/config.toml

# Restart the related services
sudo systemctl daemon-reload
sudo systemctl restart nvidia-container-toolkit.service
  • Verify the installation

# Basic functionality check
nvidia-ctk --version
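Beyond the version check, the CDI spec written to /etc/cdi/nvidia.yaml can be inspected, and nvidia-ctk cdi list prints the device names it defines. The hypothetical helper below simply confirms that the spec file exists and declares the nvidia.com/gpu device kind, which is the kind nvidia-ctk cdi generate uses for GPUs.

```shell
#!/bin/sh
# Succeed if the CDI spec declares NVIDIA GPU devices.
# $1: spec path (default: /etc/cdi/nvidia.yaml). Accepts the kind quoted or unquoted.
cdi_spec_ok() {
  spec=${1:-/etc/cdi/nvidia.yaml}
  grep -Eq '^kind: *"?nvidia\.com/gpu' "$spec" 2>/dev/null
}

# Usage: cdi_spec_ok && nvidia-ctk cdi list
```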

Persistence Mode

Official documentation: https://docs.nvidia.com/deploy/driver-persistence/index.html

Check whether persistence mode is on: nvidia-smi -q | grep -i persistence
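For scripts, the CSV query form is easier to parse than grepping nvidia-smi -q: nvidia-smi --query-gpu=persistence_mode --format=csv,noheader prints one line per GPU (Enabled or Disabled). The hypothetical helper below reads that output and succeeds only if at least one GPU is listed and every GPU reports Enabled.

```shell
#!/bin/sh
# Read persistence_mode lines on stdin; exit 0 only if every non-empty line is "Enabled".
all_persistent() {
  awk 'NF { n++; if (NF != 1 || $1 != "Enabled") bad = 1 }
       END { exit (n == 0 || bad) }'
}

# Usage:
#   nvidia-smi --query-gpu=persistence_mode --format=csv,noheader | all_persistent && echo "persistence on"
```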

Enabling

  1. Create the systemd unit file: sudo nano /etc/systemd/system/nvidia-persistenced.service

[Unit]
Description=NVIDIA Persistence Daemon
Documentation=https://docs.nvidia.com/deploy/driver-persistence/
Wants=syslog.target

[Service]
Type=forking
ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced
ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced
PIDFile=/var/run/nvidia-persistenced/nvidia-persistenced.pid

[Install]
WantedBy=multi-user.target
  2. Create a dedicated user (so the daemon does not run as root)

sudo useradd -r -s /sbin/nologin nvidia-persistenced
  3. Create the required directory and set its ownership

# Create the runtime directory
sudo mkdir -p /var/run/nvidia-persistenced
sudo chown nvidia-persistenced:nvidia-persistenced /var/run/nvidia-persistenced
  4. Reload and enable the service

# Reload the systemd configuration
sudo systemctl daemon-reload

# Start automatically at boot
sudo systemctl enable nvidia-persistenced

# Start the service
sudo systemctl start nvidia-persistenced
  5. Verify the service status

# Check the service status
sudo systemctl status nvidia-persistenced

# Follow the logs (Ctrl+C to exit)
sudo journalctl -u nvidia-persistenced -f

# Verify persistence mode
nvidia-smi -q | grep -i persistence

# Confirm the PID file was created
ls -la /var/run/nvidia-persistenced/
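The PID file check can also be scripted. pidfile_alive is a hypothetical helper: it reads the PID from the file and tests for /proc/<pid>, which works even though the daemon runs as the nvidia-persistenced user (kill -0 from a non-root shell would fail with a permission error for another user's process).

```shell
#!/bin/sh
# Succeed if the PID stored in the given file belongs to a running process.
# Default path matches the PIDFile= setting in the unit file above.
pidfile_alive() {
  pidfile=${1:-/var/run/nvidia-persistenced/nvidia-persistenced.pid}
  [ -r "$pidfile" ] || return 1
  pid=$(tr -cd '0-9' < "$pidfile")   # keep digits only, dropping the newline
  [ -n "$pid" ] && [ -d "/proc/$pid" ]
}

# Usage: pidfile_alive && echo "nvidia-persistenced is running"
```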

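Disabling

Turn off persistence mode, stop the persistence daemon and remove it from startup (persistence mode is off by default on Linux).

sudo nvidia-smi -pm 0
sudo systemctl disable --now nvidia-persistenced   # --now also stops the running daemon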