server-parts.eu Blog

NVIDIA H100 PCIe, SXM5 & NVL GPU Comparison: What Is the Difference?

  • 5 hours ago
  • 10 min read

The NVIDIA H100 GPU comes in different form factors — mainly PCIe, SXM5, and the special H100 NVL version — and choosing the wrong one is an expensive mistake.


NVIDIA H100 GPUs

In Stock: PCIe, SXM5 & NVL GPUs



This guide compares H100 PCIe, H100 SXM5, and H100 NVL GPUs technically, explains the bandwidth architecture, covers real server options, and helps you decide which configuration fits your infrastructure and workload.


Comparison: NVIDIA H100 PCIe, SXM5 & NVL GPUs


Quick comparison

Variant | Best for | VRAM | Power
H100 PCIe 80GB | Flexible AI servers, inference, smaller fine-tuning | 80GB | 350–400W
H100 SXM5 80GB | HGX H100, large training, HPC | 80GB | Up to 700W
H100 NVL 94GB | LLM inference, memory-heavy inference | 94GB per GPU | 350–400W


Technical differences

Variant | Memory type | Memory bandwidth | GPU-to-GPU bandwidth
H100 PCIe 80GB | HBM2e | ~2.0 TB/s | PCIe Gen5 / optional NVLink bridge
H100 SXM5 80GB | HBM3 | ~3.35 TB/s | 900 GB/s via NVLink / NVSwitch
H100 NVL 94GB | HBM3 | ~3.9 TB/s | 600 GB/s between paired GPUs



Decision Shortcut: NVIDIA H100 PCIe, SXM5 & NVL GPUs


If you only need one thing from this article:
  • 1 GPU → H100 PCIe

  • 2 GPUs, LLM inference or memory-heavy inference → H100 NVL

  • 2 GPUs, general inference or smaller fine-tuning → H100 PCIe

  • 4–8 GPUs, large model training → H100 SXM5 / HGX H100

  • Large AI clusters, foundation models, HPC → H100 SXM5 with InfiniBand networking


Everything else in this article is the technical reasoning behind that list.
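The shortcut can also be written as a small lookup function. This is only a sketch of the rules listed above: the function name and the workload labels are hypothetical, and the list does not say what to do with 3 GPUs or with inference on more than 2 GPUs, so treating every count from 3 to 8 like the "4–8 GPUs" row is an assumption made here.

```python
def recommend_h100(gpu_count: int, workload: str) -> str:
    """Return the H100 variant suggested by the decision shortcut.

    `workload` uses hypothetical labels chosen for this sketch, e.g.
    "llm_inference", "inference", "fine_tuning", "training", "hpc".
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    if gpu_count == 1:
        return "H100 PCIe"
    if gpu_count == 2:
        # LLM or memory-heavy inference favours the 94GB NVL pair.
        return "H100 NVL" if workload == "llm_inference" else "H100 PCIe"
    if gpu_count <= 8:
        # Assumption: 3 to 8 GPUs are treated like the "4-8 GPUs" row.
        return "H100 SXM5 / HGX H100"
    return "H100 SXM5 with InfiniBand networking"


print(recommend_h100(2, "llm_inference"))
print(recommend_h100(8, "training"))
```

A real sizing exercise would also weigh budget, existing servers, and power limits, which this sketch ignores.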



What Is the Difference Between PCIe, SXM5 & NVL NVIDIA H100 GPUs?


NVIDIA H100 PCIe uses a standard PCIe Gen5 x16 interface, so it works in many enterprise GPU servers and is a flexible choice for inference, smaller training jobs, and standard server deployments. Its main limitation is GPU-to-GPU communication: PCIe Gen5 is fast, but it cannot match a full HGX system with NVLink and NVSwitch.


NVIDIA H100 NVL is a special PCIe-based H100 version built mainly for LLM inference. It offers 94GB memory per GPU and up to 600 GB/s NVLink bandwidth between two GPUs, making it a strong choice for dual-GPU inference without a full HGX platform.


NVIDIA H100 SXM5 does not use a standard PCIe slot. It runs on an HGX H100 platform with NVLink and NVSwitch, giving 4-GPU and 8-GPU servers much stronger GPU-to-GPU communication.


The simple difference:

Form factor | Best for | GPU-to-GPU bandwidth
H100 PCIe | Flexibility, inference, smaller workloads | PCIe Gen5 fabric, optional NVLink bridge for 2 GPUs
H100 NVL | LLM inference, memory-heavy inference, dual-GPU setups | Up to 600 GB/s between paired GPUs
H100 SXM5 | Multi-GPU training, HPC, large models | 900 GB/s via NVLink / NVSwitch


PCIe, NVL, and SXM5 are different H100 configurations. SXM5 needs a complete HGX H100 system, while NVL is a PCIe-based dual-GPU inference solution, not a replacement for HGX H100.



What Is the Difference Between H100 PCIe, H100 SXM5, and H100 NVL GPUs?


Variant | Main purpose | Memory
H100 PCIe 80GB | Standard enterprise GPU servers | 80GB
H100 SXM5 80GB | HGX training systems | 80GB
H100 NVL 94GB | LLM inference, usually dual-GPU NVLink pair | 94GB per GPU


  • The H100 PCIe is usually the flexible option.

  • The H100 SXM5 is usually the maximum-performance training option.

  • The H100 NVL is a special version mainly designed for large language model inference, where more memory per GPU and a strong two-GPU NVLink connection can help.


The H100 SXM5 also has much higher memory bandwidth than H100 PCIe: around 3.35 TB/s vs around 2.0 TB/s. The H100 NVL goes even higher at around 3.9 TB/s.
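One way to see why memory bandwidth matters for LLM inference: when a model generates one token at a time for a single request, the GPU has to read roughly all of the model weights from memory for each token. Dividing memory bandwidth by the size of the weights therefore gives a rough upper bound on tokens per second. The sketch below uses the bandwidth figures quoted above; the 40 GB model size is a hypothetical value, and the estimate ignores KV cache traffic, batching, and kernel efficiency, so real numbers will be lower.

```python
# Peak memory bandwidth per GPU in GB/s, from the figures quoted above.
MEMORY_BANDWIDTH_GBPS = {
    "H100 PCIe": 2000,
    "H100 SXM5": 3350,
    "H100 NVL": 3900,
}


def decode_ceiling_tokens_per_s(weights_gb: float, bandwidth_gbps: float) -> float:
    """Rough upper bound for batch-size-1 decoding: one weight read per token."""
    return bandwidth_gbps / weights_gb


WEIGHTS_GB = 40.0  # hypothetical model footprint, not a figure from this article

for variant, bandwidth in MEMORY_BANDWIDTH_GBPS.items():
    ceiling = decode_ceiling_tokens_per_s(WEIGHTS_GB, bandwidth)
    print(f"{variant}: about {ceiling:.0f} tokens/s at best")
```

The absolute values matter less than the ratio: under this simple model the ceiling scales directly with memory bandwidth.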



NVIDIA H100 PCIe, SXM5 & NVL GPUs: Bandwidth Architecture in Detail


Connection | Bandwidth | Topology / memory type
H100 SXM5 NVLink 4 | 900 GB/s per GPU | HGX H100 with NVSwitch
H100 PCIe Gen5 x16 | 128 GB/s | PCIe fabric
H100 PCIe with NVLink bridge | Up to 600 GB/s between 2 GPUs | Two-GPU bridge only
H100 SXM5 memory bandwidth | ~3.35 TB/s | HBM3
H100 PCIe memory bandwidth | ~2.0 TB/s | HBM2e
H100 NVL memory bandwidth | ~3.9 TB/s | HBM3


The SXM5 gives you much stronger GPU-to-GPU communication.


That matters when several GPUs must work together as one system - for example:

  • large language model training

  • tensor parallelism

  • model parallel workloads

  • large HPC simulations

  • multi-GPU scientific computing

  • workloads where GPUs exchange data constantly


For inference, smaller fine-tuning jobs, computer vision, recommendation systems, and single-GPU workloads, H100 PCIe can still be very strong.
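To get a feel for the gap between the links, it can help to compute how long an ideal transfer would take on each one. The sketch below uses the peak figures from the table above, which are aggregate peak values, so sustained one-direction throughput will be lower in practice. The 10 GB payload is a hypothetical amount of data exchanged between GPUs in one step.

```python
# Peak link bandwidth in GB/s, from the table above.
LINK_BANDWIDTH_GBPS = {
    "PCIe Gen5 x16": 128,
    "NVLink bridge (2 GPUs)": 600,
    "NVLink 4 / NVSwitch (SXM5)": 900,
}


def transfer_ms(payload_gb: float, bandwidth_gbps: float) -> float:
    """Ideal time in milliseconds to move payload_gb over a link."""
    return payload_gb / bandwidth_gbps * 1000.0


PAYLOAD_GB = 10.0  # hypothetical data volume per step, not from this article

for link, bandwidth in LINK_BANDWIDTH_GBPS.items():
    print(f"{link}: {transfer_ms(PAYLOAD_GB, bandwidth):.1f} ms")
```

If a training step exchanges data like this many times per second, the difference between the PCIe fabric and NVLink adds up quickly, which is why communication-heavy workloads favor SXM5.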



NVIDIA H100 PCIe, SXM5 & NVL GPUs: Full Technical Specifications


Architecture
  • All variants: NVIDIA Hopper architecture

  • GPU: GH100

  • Manufacturing process: TSMC 4N

  • Tensor Cores: 4th generation

  • Transformer Engine: Yes

  • FP8 support: Yes

  • Confidential Computing: Yes


CUDA cores

Variant | CUDA cores
H100 PCIe | 14,592
H100 SXM5 | 16,896
H100 NVL | 16,896


H100 NVL matches the H100 SXM5 core count rather than the lower H100 PCIe count, and is mainly designed for LLM inference in dual-GPU systems.


Tensor Core performance

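Precision | H100 PCIe | H100 SXM5 | H100 NVL
FP64 | 26 TFLOPS | 34 TFLOPS | 34 TFLOPS
FP64 Tensor Core | 51 TFLOPS | 67 TFLOPS | 67 TFLOPS
FP32 | 51 TFLOPS | 67 TFLOPS | 67 TFLOPS
TF32 Tensor Core | 756 TFLOPS | 989 TFLOPS | 989 TFLOPS
FP16 / BF16 Tensor Core | 1,513 TFLOPS | 1,979 TFLOPS | 1,979 TFLOPS
FP8 Tensor Core | 3,026 TFLOPS | 3,958 TFLOPS | 3,958 TFLOPS

The TF32, FP16 / BF16, and FP8 Tensor Core figures are peak values with sparsity. Dense throughput is half of these numbers.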


VRAM and memory bandwidth

Variant | VRAM | Memory type | Memory bandwidth
H100 PCIe | 80GB | HBM2e | ~2.0 TB/s
H100 SXM5 | 80GB | HBM3 | ~3.35 TB/s
H100 NVL | 94GB per GPU | HBM3 | ~3.9 TB/s


GPU-to-GPU interconnect

Variant | Interconnect
H100 PCIe | PCIe Gen5 x16, 128 GB/s
H100 PCIe with NVLink bridge | Up to 600 GB/s between two GPUs
H100 NVL | Up to 600 GB/s between paired GPUs
H100 SXM5 | Fourth-generation NVLink, 900 GB/s per GPU
HGX H100 8-GPU systems | NVLink + NVSwitch topology
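
The 128 GB/s figure for PCIe Gen5 x16 is a bidirectional total, and it follows from the link parameters. PCIe Gen5 signals at 32 GT/s per lane and uses 128b/130b encoding, so the short calculation below reproduces the headline number and shows the slightly lower usable rate in one direction. Protocol overhead beyond line encoding is ignored here, so real throughput is lower still.

```python
GT_PER_S_PER_LANE = 32           # PCIe Gen5 raw signalling rate per lane
LANES = 16                       # x16 slot
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding


def pcie_gb_per_s(gt_per_s: float, lanes: int, efficiency: float = 1.0) -> float:
    """One-direction bandwidth in GB/s (one transfer carries one bit per lane)."""
    return gt_per_s * lanes * efficiency / 8


raw_one_way = pcie_gb_per_s(GT_PER_S_PER_LANE, LANES)
raw_both_ways = 2 * raw_one_way
usable_one_way = pcie_gb_per_s(GT_PER_S_PER_LANE, LANES, ENCODING_EFFICIENCY)

print(f"raw, one direction:    {raw_one_way:.1f} GB/s")
print(f"raw, both directions:  {raw_both_ways:.1f} GB/s")
print(f"after 128b/130b, one direction: {usable_one_way:.1f} GB/s")
```

The NVLink figures in the table are totals per GPU as well, so the comparison is like for like.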


Power draw

Variant | Power draw
H100 PCIe | 350W default / up to 400W configurable
H100 SXM5 | Up to 700W configurable
H100 NVL | 350–400W configurable


Server compatibility

Variant | Server compatibility
H100 PCIe | Enterprise GPU servers with PCIe Gen5 support
H100 SXM5 | HGX H100 baseboard only
H100 NVL | Special PCIe-based dual-GPU configuration with NVLink


MIG support
  • H100 PCIe supports Multi-Instance GPU

  • H100 SXM5 supports Multi-Instance GPU

  • H100 NVL supports Multi-Instance GPU

  • Up to 7 isolated GPU instances per GPU


H100 PCIe is the flexible enterprise server option. H100 SXM5 is the HGX version for large training and HPC. H100 NVL is the special LLM inference version with 94GB memory per GPU and strong dual-GPU NVLink bandwidth.



NVIDIA H100 PCIe, SXM5 & NVL GPUs: Scale-Up vs Scale-Out


SXM5 is better for scale-up.

This means fewer servers, but each server has tightly connected GPUs through NVLink and NVSwitch. This is important when one large model must be split across several GPUs.


PCIe is better for scale-out.

This means more standard servers, usually connected with InfiniBand or Ethernet. It is often more flexible and easier to deploy in existing data centers.


H100 NVL sits between these two options.

It is PCIe-based, but built for dual-GPU LLM inference, with more memory than standard H100 PCIe and fast NVLink between two GPUs. It is not the same as a 4-GPU or 8-GPU HGX H100 SXM5 system.


Strategy | Best option | Best for
Scale-up | H100 SXM5 | Large models, tensor parallelism, heavy GPU-to-GPU communication
Scale-out | H100 PCIe | Inference, smaller training jobs, flexible server deployments
Dual-GPU LLM inference | H100 NVL | Large inference workloads, high memory per GPU
Flexible enterprise AI | H100 PCIe | Standard data center servers, easier deployment
Maximum multi-GPU performance | H100 SXM5 / HGX H100 | 4-GPU and 8-GPU training systems


Choose H100 SXM5 when GPUs inside one server must work very closely together.

Choose H100 PCIe when you want flexible standard servers.

Choose H100 NVL when you need a strong two-GPU setup for LLM inference, especially when GPU memory is important.



Multi-Instance GPU: NVIDIA H100 PCIe, SXM5 & NVL GPUs


MIG is available on NVIDIA H100 GPUs. It allows one physical GPU to be split into smaller isolated GPU instances.


This is useful when several users, teams, or workloads need access to GPU resources without interfering with each other.


For example, one H100 can be used for:

MIG setup | Use case
7 small GPU instances | Many small inference workloads
3 medium GPU instances | Medium inference workloads
1 full GPU | One large training or inference workload


This makes H100 very useful for:

  • AI cloud providers

  • Kubernetes GPU clusters

  • internal enterprise AI platforms

  • research teams

  • multi-tenant inference

  • GPU-as-a-Service environments


MIG is especially important when you do not want every user to reserve a full H100 GPU.
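
The 7 / 3 / 1 split in the table above can be turned into a simple sizing helper: given how much GPU memory a workload needs, pick the smallest MIG instance that fits. The profile names, memory sizes, and instance counts below are assumptions based on the commonly listed MIG profiles for an 80GB H100 and should be checked against NVIDIA's MIG documentation. The 94GB H100 NVL uses different profile sizes, and usable memory per instance is somewhat below the nominal value.

```python
# (profile name, nominal memory in GB, max instances of that profile per GPU)
# Assumed values for an 80GB H100; verify against NVIDIA's MIG documentation.
MIG_PROFILES_80GB = [
    ("1g.10gb", 10, 7),
    ("2g.20gb", 20, 3),
    ("3g.40gb", 40, 2),
    ("7g.80gb", 80, 1),
]


def smallest_profile(required_gb: float) -> tuple[str, int]:
    """Return (profile, max instances per GPU) for the smallest fitting profile."""
    for name, memory_gb, max_instances in MIG_PROFILES_80GB:
        if required_gb <= memory_gb:
            return name, max_instances
    raise ValueError("workload does not fit on one 80GB GPU")


for need_gb in (8, 18, 60):  # hypothetical per-workload memory needs
    profile, count = smallest_profile(need_gb)
    print(f"{need_gb} GB -> {profile}, up to {count} per GPU")
```

This only looks at memory. Compute share, placement rules, and mixed profile layouts are not modeled here.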



NVIDIA H100 PCIe, SXM5 & NVL GPU Workloads


NVIDIA H100 PCIe 80GB

H100 PCIe is a strong choice for inference, fine-tuning, data analytics, and smaller training jobs where standard server architecture is enough.


Good fit for:

  • single-GPU inference

  • 2–4 GPU inference servers

  • smaller LLM fine-tuning

  • enterprise AI pilots

  • multi-tenant GPU platforms

  • Kubernetes-based GPU clusters

  • companies that want H100 performance without HGX complexity


NVIDIA H100 SXM5 80GB

H100 SXM5 is the stronger option for heavy multi-GPU work. It is built for HGX H100 systems with 4 or 8 GPUs, NVLink, and NVSwitch.


Good fit for:

  • large language model training

  • large-scale fine-tuning

  • HPC workloads

  • simulation

  • scientific computing

  • AI research clusters

  • foundation model development

  • workloads where GPU-to-GPU bandwidth matters


This is the version customers usually need when the GPUs must act like one tightly connected system.


NVIDIA H100 NVL 94GB

H100 NVL is a special H100 version mainly designed for LLM inference. It pairs two PCIe form factor H100 cards over NVLink, and each GPU has 94GB of memory.


Good fit for:

  • LLM inference

  • large model serving

  • high batch-size inference

  • dual-GPU deployments

  • customers who need more memory than standard H100 PCIe


It is not the same as an 8-GPU HGX H100 system. It is more of a focused two-GPU inference solution.



NVIDIA H100 Servers: PCIe, SXM5 & NVL GPU Models


These are the server models most commonly used with NVIDIA H100 GPUs, both from new inventory and the secondary market.


SXM5 / HGX H100 servers — for multi-GPU training and HPC:
  • NVIDIA DGX H100 — NVIDIA’s own 8-GPU H100 system with HGX H100, NVLink, NVSwitch, and high-speed networking.

  • Dell PowerEdge XE9680 — 6U enterprise AI server, commonly used with 8× H100 SXM5 GPUs.

  • Dell PowerEdge XE8640 — 4U platform for 4× H100 SXM GPUs, useful when 8 GPUs are not required.

  • Supermicro HGX H100 systems — common in GPU cloud, AI infrastructure, and research environments, usually in 4-GPU or 8-GPU HGX configurations.

  • Lenovo ThinkSystem SR675 V3 / SR680a V3 — Lenovo GPU platforms for AI and HPC workloads, depending on the exact configuration.

  • HPE Cray XD / Apollo-style GPU systems — high-density GPU platforms for enterprise AI, HPC, and research clusters.


PCIe H100 servers — for inference and flexible deployments:
  • Dell PowerEdge R760xa — 2U enterprise GPU server, often used with up to 4× double-wide PCIe GPUs.

  • Dell PowerEdge R760 / R7625 GPU configurations — standard rack servers for smaller PCIe GPU setups.

  • Supermicro PCIe GPU servers — common for GPU cloud, AI labs, and secondary market H100 PCIe systems.

  • Lenovo ThinkSystem SR675 V3 — flexible PCIe GPU platform for AI, HPC, and visualization workloads.

  • HPE ProLiant DL380 / DL385 GPU configurations — useful when the customer already uses HPE infrastructure.


H100 NVL servers — for LLM inference:
  • H100 NVL systems — PCIe-based dual-GPU configurations with NVLink between two GPUs. Best for LLM inference where high GPU memory and strong two-GPU bandwidth matter.


Always check the exact H100 configuration, including GPU form factor, PCIe/SXM5/NVL compatibility, power and cooling, risers, GPU enablement kit, firmware, network cards, rack power capacity, warranty, and testing.


Common Mistakes When Buying NVIDIA H100 PCIe, SXM5 & NVL GPUs


Buying SXM5 without HGX infrastructure

H100 SXM5 modules need a compatible HGX H100 platform. They cannot be installed in a normal PCIe server.


Assuming PCIe and SXM5 perform the same

For single-GPU workloads, the gap is smaller and comes from the PCIe card's lower core count, power limit, and memory bandwidth. For heavy multi-GPU training, SXM5 is much stronger because of NVLink and NVSwitch.


Ignoring power and cooling

H100 SXM5 systems need serious power and cooling planning, especially 4-GPU and 8-GPU HGX servers. Always check rack power, PDU capacity, airflow, cooling, power cables, and redundant PSUs.
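
A quick power budget shows why this planning matters. The sketch below uses the 700W per-GPU figure from this article. The 3,000W allowance for CPUs, memory, NICs, and fans and the 20 kW rack budget are hypothetical values, so replace them with the numbers from your server vendor and data center.

```python
def server_power_kw(gpu_count: int, gpu_watts: float, other_watts: float) -> float:
    """Estimated peak server draw in kW: GPUs plus everything else."""
    return (gpu_count * gpu_watts + other_watts) / 1000


def servers_per_rack(rack_kw: float, server_kw: float) -> int:
    """How many servers fit within a rack power budget."""
    return int(rack_kw // server_kw)


GPU_WATTS = 700       # H100 SXM5 upper limit quoted in this article
OTHER_WATTS = 3000    # hypothetical allowance for CPUs, memory, NICs, fans
RACK_BUDGET_KW = 20   # hypothetical rack power budget

gpus_only_kw = server_power_kw(8, GPU_WATTS, 0)
full_server_kw = server_power_kw(8, GPU_WATTS, OTHER_WATTS)

print(f"8 GPUs alone: {gpus_only_kw:.1f} kW")
print(f"full server estimate: {full_server_kw:.1f} kW")
print(f"servers per rack: {servers_per_rack(RACK_BUDGET_KW, full_server_kw)}")
```

Under these assumptions the GPUs alone account for most of the server's draw, and a rack budget that comfortably holds many standard servers holds only a couple of 8-GPU HGX systems.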


Buying H100 PCIe when you really need HGX

H100 PCIe is excellent for many workloads, but if your model needs strong GPU-to-GPU communication across many GPUs, HGX H100 with SXM5 is usually the better architecture.


Buying HGX H100 when PCIe would be enough

Some workloads do not need SXM5. For inference, smaller fine-tuning, or flexible enterprise AI deployments, H100 PCIe can be easier, cheaper, and more practical.


Confusing H100 NVL with H100 SXM5

H100 NVL is mainly a dual-GPU LLM inference solution with 94GB memory per GPU. H100 SXM5 is for 4-GPU and 8-GPU HGX systems with stronger scale-up architecture.


Forgetting about networking

For serious AI clusters, the GPU is only one part of the system. You also need the right NICs, InfiniBand or Ethernet speed, switches, cables, topology, firmware, and tested ports. A weak network design can limit the value of expensive H100 servers.



Summary: NVIDIA H100 PCIe, SXM5 & NVL GPUs


Specification | H100 PCIe 80GB | H100 SXM5 80GB | H100 NVL 94GB
Best for | Inference, fine-tuning, flexible servers | Training, HPC, large models | LLM inference
VRAM | 80GB | 80GB | 94GB per GPU
Memory type | HBM2e | HBM3 | HBM3
Memory BW | ~2.0 TB/s | ~3.35 TB/s | ~3.9 TB/s
GPU-to-GPU BW | PCIe Gen5 / optional 2-GPU NVLink | 900 GB/s NVLink / NVSwitch | 600 GB/s between paired GPUs
Topology | PCIe fabric | HGX H100 | Dual-GPU NVLink pair
Infrastructure | Standard GPU server | HGX H100 server | Special NVL server config
TDP | 350–400W | Up to 700W | 350–400W
MIG | Yes | Yes | Yes
Scale strategy | Scale-out | Scale-up | Dual-GPU inference

We have NVIDIA H100 GPUs and complete H100 servers available across PCIe, SXM5, and NVL configurations. Tell us what you are building and we will send you a recommended configuration with pricing → Get your configuration.


FAQ: NVIDIA H100 PCIe, SXM5 & NVL GPUs


Can I install an H100 SXM5 in a standard server?

No. H100 SXM5 modules require a compatible HGX H100 system. They cannot be installed in a regular PCIe slot.


Is H100 PCIe much slower than H100 SXM5?

For single-GPU inference, the gap is smaller and comes mainly from the PCIe card's lower core count and memory bandwidth. For multi-GPU training, SXM5 can be much faster because of NVLink and NVSwitch bandwidth.


Does H100 PCIe support NVLink?

Yes. H100 PCIe supports an NVLink bridge between two GPUs in compatible systems. This is not the same as HGX H100 SXM5 with NVSwitch.


What is the main difference between H100 PCIe and H100 SXM5?

H100 PCIe is for flexible standard servers. H100 SXM5 is for HGX systems where multiple GPUs need very fast communication.


What is H100 NVL?

H100 NVL is a special H100 version mainly designed for large language model inference. It offers 94GB memory per GPU and uses NVLink between two GPUs.


Which H100 is best for LLM training?

For serious LLM training, H100 SXM5 in an HGX H100 server is usually the best option because of NVLink, NVSwitch, and high memory bandwidth.


Which H100 is best for LLM inference?

For many inference workloads, H100 PCIe is enough. For larger LLM inference workloads, H100 NVL can be a very strong option because it has more memory per GPU.


How many H100 GPUs do I need?

It depends on model size, precision, batch size, and whether you are training or only serving inference. For small inference, one H100 may be enough. For large training, you may need 4, 8, or many more GPUs across a cluster.
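
A first estimate for inference is the number of GPUs needed just to hold the model weights. The sketch below multiplies parameter count by bytes per parameter and divides by the usable memory per GPU. The 0.8 usable fraction is a hypothetical headroom factor for KV cache, activations, and runtime overhead, and the 70-billion-parameter model is only an example. Training needs far more memory than this because of gradients and optimizer state.

```python
import math

BYTES_PER_PARAM = {"fp16": 2, "bf16": 2, "fp8": 1}


def min_gpus_for_weights(
    params_billion: float,
    precision: str,
    vram_gb: float,
    usable_fraction: float = 0.8,  # hypothetical headroom factor
) -> int:
    """GPUs needed to hold the weights only, for inference sizing."""
    weights_gb = params_billion * BYTES_PER_PARAM[precision]
    return math.ceil(weights_gb / (vram_gb * usable_fraction))


# Example: a hypothetical 70B-parameter model on 80GB and 94GB GPUs.
for precision in ("fp16", "fp8"):
    for name, vram_gb in (("80GB H100", 80), ("94GB H100 NVL", 94)):
        count = min_gpus_for_weights(70, precision, vram_gb)
        print(f"70B {precision} on {name}: at least {count} GPU(s)")
```

With these assumptions, the extra memory on H100 NVL is enough to drop one GPU from the count in both precisions, which is the kind of case where the 94GB variant pays off.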


Should I buy H100 PCIe, NVL or SXM5?

Choose PCIe if you want flexibility, standard servers, and good inference performance. Choose NVL if you need a two-GPU setup for LLM inference with more memory per GPU. Choose SXM5 if you need maximum multi-GPU performance for large training or HPC workloads.



