server-parts.eu Blog

NVIDIA A100 PCIe vs SXM4 Comparison: What Is the Difference?

  • May 10
  • 10 min read

The NVIDIA A100 comes in four variants — PCIe 40GB, PCIe 80GB, SXM4 40GB, and SXM4 80GB — and choosing the wrong one is an expensive mistake.


NVIDIA A100 GPUs

In Stock: PCIe & SXM4 - 40GB & 80GB



This guide compares every A100 variant technically, explains the bandwidth architecture accurately, covers real server options, and helps you decide which configuration fits your infrastructure and workload.




Comparison: NVIDIA A100 GPUs PCIe VS SXM4

| Variant | VRAM | Memory type | Memory BW | GPU-to-GPU BW | TDP |
|---|---|---|---|---|---|
| A100 PCIe 40 GB | 40 GB | HBM2 | ~1.55 TB/s | ~64 GB/s (PCIe fabric) | 250W |
| A100 PCIe 80 GB | 80 GB | HBM2e | ~2.0 TB/s | ~64 GB/s (PCIe fabric) | 300W |
| A100 SXM4 40 GB | 40 GB | HBM2 | ~1.6 TB/s | 600 GB/s (NVLink 3.0) | 400W |
| A100 SXM4 80 GB | 80 GB | HBM2e | ~2.0 TB/s | 600 GB/s (NVLink 3.0) | 400W |



Decision Shortcut: NVIDIA A100 GPUs PCIe VS SXM4


If you only need one thing from this article:

  • 1 GPU → PCIe, 40 GB or 80 GB

  • 2–4 GPUs, inference or light fine-tuning → PCIe is usually sufficient

  • 4–8 GPUs, large model training → SXM4 80 GB

  • Scaling to large clusters, foundation models → SXM4 80 GB, InfiniBand networking


Everything else in this article is the technical reasoning behind that shortlist.



What Is the Difference Between PCIe & SXM4 (NVIDIA A100)?


PCIe A100 uses a standard PCIe Gen4 x16 slot, so it works in many regular GPU servers. It is flexible, easier to deploy, and usually the better choice for single-GPU workloads, inference, and smaller training jobs. The limitation is GPU-to-GPU bandwidth. A PCIe Gen4 x16 link offers around 64 GB/s of theoretical bidirectional bandwidth (about 32 GB/s in each direction), and GPU-to-GPU traffic shares the PCIe fabric with everything else on it. As you add more GPUs, communication can become a bottleneck.


SXM4 A100 is different. It does not plug into a normal PCIe slot. It mounts on an NVIDIA HGX A100 baseboard, where the GPUs are connected through NVLink (direct links on 4-GPU boards, NVSwitch on 8-GPU boards). This gives each GPU up to 600 GB/s GPU-to-GPU bandwidth, with a much stronger topology for 4-GPU and 8-GPU systems.
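The two headline figures can be reproduced with simple arithmetic. The sketch below assumes the nominal published rates (PCIe 4.0 at 16 GT/s per lane with 128b/130b encoding, and NVLink 3.0 with 12 links per A100 at 25 GB/s per direction per link). The function names are illustrative only, and real transfers achieve less than these theoretical peaks.

```python
# Back-of-envelope link bandwidth for the two A100 interconnects.
# Nominal published rates are assumed; measured throughput is lower.

def pcie_gb_per_s_one_way(gt_per_s: float = 16.0, lanes: int = 16) -> float:
    """Usable GB/s in one direction for a PCIe link with 128b/130b encoding."""
    encoding_efficiency = 128 / 130
    return gt_per_s * lanes * encoding_efficiency / 8  # bits -> bytes


def nvlink_gb_per_s_total(links: int = 12, gb_per_s_one_way: float = 25.0) -> float:
    """Aggregate bidirectional GB/s across all NVLink links on one GPU."""
    return links * gb_per_s_one_way * 2


pcie_one_way = pcie_gb_per_s_one_way()
pcie_both_ways = 2 * pcie_one_way
nvlink_total = nvlink_gb_per_s_total()

print(f"PCIe Gen4 x16: {pcie_one_way:.1f} GB/s per direction, "
      f"{pcie_both_ways:.1f} GB/s bidirectional")
print(f"NVLink 3.0:    {nvlink_total:.0f} GB/s bidirectional per GPU")
print(f"Ratio:         {nvlink_total / pcie_both_ways:.1f}x")
```

The PCIe result lands just under the rounded 64 GB/s figure because of encoding overhead, which is why the number is usually quoted with a "~".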


The simple difference:

| Form factor | Best for | GPU-to-GPU bandwidth |
|---|---|---|
| PCIe A100 | Flexibility, inference, smaller workloads | ~64 GB/s shared PCIe fabric |
| SXM4 A100 | Multi-GPU training, HPC, large models | 600 GB/s via NVLink/NVSwitch |


One important point: PCIe and SXM4 are not interchangeable. An SXM4 A100 cannot be installed in a normal PCIe server. You need a complete HGX A100 system.



What Is the Difference Between NVIDIA A100 40GB & 80GB (PCIe VS SXM4)


The difference is not only memory size. The A100 40GB uses HBM2, while the A100 80GB uses faster, higher-capacity HBM2e stacks on the same 5,120-bit memory bus.

| Variant | Memory type | Memory bandwidth | Bus width |
|---|---|---|---|
| A100 40GB | HBM2 | ~1.55–1.6 TB/s | 5,120-bit |
| A100 80GB | HBM2e | ~2.0 TB/s | 5,120-bit |


This means the 80GB version gives you more VRAM and higher bandwidth. That matters for large AI models, fine-tuning, large batch inference, and memory-heavy HPC workloads.
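As a rough check, peak memory bandwidth is the bus width in bytes multiplied by the per-pin data rate. The sketch below uses the A100's 5,120-bit interface for both capacities. The per-pin rates are assumptions back-derived from NVIDIA's published bandwidth figures, not values quoted in this article.

```python
# Peak HBM bandwidth = (bus width in bytes) x (data rate per pin).
# Per-pin rates below are approximate and assumed for illustration.

A100_BUS_WIDTH_BITS = 5120  # five active 1,024-bit HBM stacks


def hbm_bandwidth_gb_s(bus_width_bits: int, gbit_per_s_per_pin: float) -> float:
    """Theoretical peak memory bandwidth in GB/s."""
    return bus_width_bits / 8 * gbit_per_s_per_pin


ASSUMED_PIN_RATES_GBIT_S = {
    "A100 40GB (HBM2)": 2.43,
    "A100 80GB SXM4 (HBM2e)": 3.186,
}

bandwidth = {
    name: hbm_bandwidth_gb_s(A100_BUS_WIDTH_BITS, rate)
    for name, rate in ASSUMED_PIN_RATES_GBIT_S.items()
}

for name, gb_s in bandwidth.items():
    print(f"{name}: {gb_s:.0f} GB/s ({gb_s / 1000:.2f} TB/s)")
```

With the bus width held constant, the whole bandwidth gain of the 80GB card comes from the faster HBM2e signalling.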


Choose A100 40GB when the model fits comfortably, for example smaller training jobs, quantised inference, or budget-focused deployments.


Choose A100 80GB for larger models, heavier fine-tuning, multi-tenant MIG use, and workloads where memory bandwidth is important.


One important point: an 80GB A100 is not a 40GB A100 with a memory upgrade. The HBM stacks are part of the GPU package, so you cannot upgrade a 40GB A100 to 80GB later.



NVIDIA A100 PCIe vs SXM4: Bandwidth Architecture in Detail

| Connection | Bandwidth | Topology |
|---|---|---|
| A100 SXM4 NVLink 3.0 (GPU-to-GPU) | 600 GB/s per GPU | Full mesh via NVSwitch, all 8 GPUs simultaneously |
| A100 PCIe Gen4 x16 (GPU-to-GPU) | ~64 GB/s per link | Shared PCIe fabric, bandwidth split across all GPUs |
| A100 SXM4 memory bandwidth (80 GB) | ~2.0 TB/s | HBM2e |
| A100 PCIe memory bandwidth (80 GB) | ~2.0 TB/s | HBM2e |
| A100 PCIe memory bandwidth (40 GB) | ~1.55 TB/s | HBM2 |


The 600 GB/s NVLink bandwidth on SXM4 is dedicated per GPU and works across all GPUs in an HGX system. PCIe bandwidth is much lower and shared across the PCIe fabric, so performance can drop when many GPUs communicate at the same time.


For large language model training and other communication-heavy workloads, SXM4 can be much faster than PCIe. For inference, computer vision, or smaller workloads, the gap is usually smaller and PCIe can still perform very well.
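To get a feel for why communication-heavy training is so sensitive to the interconnect, the sketch below estimates the time for one ideal ring all-reduce of a gradient buffer. It is a simplified model under stated assumptions: a 14 GB payload (7B parameters in FP16), 8 GPUs, per-direction bandwidth taken as half of the bidirectional figures above, and no latency, contention, or compute overlap. Real frameworks overlap communication with computation, so the end-to-end gap is smaller than the raw ratio.

```python
# Ideal ring all-reduce: each GPU sends (and receives) 2 * (N - 1) / N
# times the payload. Latency, contention and overlap are ignored.

def ring_allreduce_seconds(payload_gb: float, num_gpus: int,
                           link_gb_per_s: float) -> float:
    """Lower-bound time for one ring all-reduce over a given link speed."""
    if num_gpus < 2:
        return 0.0  # nothing to exchange on a single GPU
    traffic_gb = 2 * (num_gpus - 1) / num_gpus * payload_gb
    return traffic_gb / link_gb_per_s


GRADIENT_GB = 14.0           # assumed: 7B parameters x 2 bytes (FP16)
NUM_GPUS = 8
PCIE_ONE_WAY_GB_S = 32.0     # assumed: half of ~64 GB/s bidirectional
NVLINK_ONE_WAY_GB_S = 300.0  # assumed: half of 600 GB/s bidirectional

pcie_s = ring_allreduce_seconds(GRADIENT_GB, NUM_GPUS, PCIE_ONE_WAY_GB_S)
nvlink_s = ring_allreduce_seconds(GRADIENT_GB, NUM_GPUS, NVLINK_ONE_WAY_GB_S)

print(f"PCIe:   {pcie_s * 1000:.0f} ms per all-reduce")
print(f"NVLink: {nvlink_s * 1000:.0f} ms per all-reduce")
print(f"Ratio:  {pcie_s / nvlink_s:.1f}x")
```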



NVIDIA A100 PCIe & SXM4 GPUs: Full Technical Specifications


Architecture
  • All variants: Ampere, GA100 die, TSMC 7nm, 54.2 billion transistors


CUDA cores
  • All variants: 6,912


Tensor Core performance (FP16)
  • All variants: 312 TFLOPS (dense)


TF32 Tensor Core performance
  • All variants: 156 TFLOPS (dense)


FP64 performance:
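  • All variants: 19.5 TFLOPS with FP64 Tensor Cores (9.7 TFLOPS standard FP64), identical across all four variants. The A100 remains a cost-effective option for double-precision HPC workloads, although H100 and H200 offer higher peak FP64 throughput.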


VRAM and memory bandwidth:

| Variant | VRAM | Type | Bandwidth | Bus width |
|---|---|---|---|---|
| PCIe 40 GB | 40 GB | HBM2 | ~1.55 TB/s | 5,120-bit |
| PCIe 80 GB | 80 GB | HBM2e | ~2.0 TB/s | 5,120-bit |
| SXM4 40 GB | 40 GB | HBM2 | ~1.6 TB/s | 5,120-bit |
| SXM4 80 GB | 80 GB | HBM2e | ~2.0 TB/s | 5,120-bit |


GPU-to-GPU interconnect:
  • PCIe variants: PCIe Gen4 fabric, ~64 GB/s per link, shared topology

  • SXM4 variants: NVLink 3.0, 600 GB/s per GPU, full mesh via NVSwitch


Power draw
  • NVIDIA A100 PCIe 40 GB: 250W

  • NVIDIA A100 PCIe 80 GB: 300W

  • NVIDIA A100 SXM4 40 GB: 400W

  • NVIDIA A100 SXM4 80 GB: 400W


Server compatibility
  • PCIe variants: Servers with a free double-width PCIe x16 slot (Gen4 for full bandwidth) and enough power and cooling for a 250–300W card

  • SXM4 variants: HGX A100 baseboard only


MIG support:
  • All variants: Yes — up to 7 isolated instances per GPU



NVIDIA A100 PCIe & SXM4 GPUs: Scale-Up vs Scale-Out


SXM4 is better for scale-up: This means fewer servers, but each server has tightly connected GPUs through NVLink and NVSwitch. This is important when one large model must be split across several GPUs.


PCIe is better for scale-out: This means more standard servers, usually connected with InfiniBand or Ethernet. It is often cheaper, more flexible, and a good fit for inference, data parallel training, and multi-tenant workloads.

| Strategy | Best option | Best for |
|---|---|---|
| Scale-up | SXM4 | Large models, tensor parallelism, heavy GPU-to-GPU communication |
| Scale-out | PCIe | Inference, smaller training jobs, flexible server deployments |

  • Choose SXM4 when GPUs inside one server must work very closely together.

  • Choose PCIe when you want flexible, standard servers that can scale across many nodes.



Multi-Instance GPU (MIG): NVIDIA A100 PCIe & SXM4 GPUs


MIG is available on all NVIDIA A100 variants. It allows one physical A100 GPU to be split into up to 7 isolated GPU instances. Each instance gets its own dedicated memory and compute resources, so different users or workloads can run on the same GPU without interfering with each other.


For example, an A100 80GB can be split into:

| MIG setup | Use case |
|---|---|
| 7× ~10GB instances | Many small inference workloads |
| 3× ~20GB instances | Medium inference workloads |
| 1× full 80GB GPU | One large workload |


This makes the A100 especially useful for inference, Kubernetes GPU clusters, research teams, and multi-tenant environments.
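When sizing a MIG deployment, a first pass is to pick the smallest slice that fits each workload and see how many of them one GPU can host. The sketch below does that for an A100 80GB. The profile names follow NVIDIA's MIG naming, the memory sizes are nominal (usable memory per slice is slightly lower), and the helper functions are hypothetical, written only for illustration.

```python
import math

# Nominal A100 80GB MIG profiles: name -> (memory in GB, max instances per GPU).
# Ordered from smallest to largest so the first fit is the smallest fit.
A100_80GB_MIG_PROFILES = {
    "1g.10gb": (10, 7),
    "2g.20gb": (20, 3),
    "3g.40gb": (40, 2),
    "7g.80gb": (80, 1),
}


def smallest_profile(required_gb: float) -> tuple[str, int]:
    """Return (profile name, max instances per GPU) for the smallest fitting slice."""
    for name, (memory_gb, max_instances) in A100_80GB_MIG_PROFILES.items():
        if required_gb <= memory_gb:
            return name, max_instances
    raise ValueError(f"{required_gb} GB does not fit on a single A100 80GB")


def gpus_needed(workload_count: int, required_gb: float) -> int:
    """Physical GPUs needed to host identical workloads, one per MIG instance."""
    _, max_instances = smallest_profile(required_gb)
    return math.ceil(workload_count / max_instances)


# Example: 20 small inference services that each need about 8 GB.
profile, per_gpu = smallest_profile(8)
print(f"Profile {profile}: {per_gpu} instances per GPU, "
      f"{gpus_needed(20, 8)} GPUs for 20 workloads")
```

This only checks memory. Compute share per slice also matters for latency-sensitive services, so treat the result as a starting point rather than a final count.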



NVIDIA A100 PCIe & SXM4 GPUs Workloads


NVIDIA A100 PCIe 40 GB:

Single-GPU inference, model evaluation, smaller training runs up to ~7B parameters in FP16, and deployments where standard PCIe server compatibility matters more than peak performance.


NVIDIA A100 PCIe 80 GB:

Inference on larger models up to ~40B in FP16 on a single GPU, fine-tuning up to ~13B parameters, multi-tenant inference with MIG, and organisations that need 80 GB VRAM without HGX infrastructure.


NVIDIA A100 SXM4 40 GB:

Multi-GPU training where inter-GPU bandwidth is the bottleneck and 40 GB VRAM per GPU is sufficient. A cost-effective option for cluster builds where the model fits within 40 GB per card.


NVIDIA A100 SXM4 80 GB:

The most capable A100 configuration. Multi-GPU training up to ~65B parameters, HPC and scientific computing requiring FP64, and large-scale inference clusters. The configuration we sell most often for serious training and HPC workloads.
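The parameter counts above follow from how much memory the model weights need. A minimal sketch of that arithmetic is shown below, assuming weights only: KV cache, activations, gradients, and optimizer state all come on top, and training typically needs several times the weight footprint. The function and table names are illustrative.

```python
# Weight memory = parameter count x bytes per parameter.
# This covers weights only; runtime overheads are not included.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str = "fp16") -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]


def fits_in_vram(params_billions: float, vram_gb: float,
                 precision: str = "fp16") -> bool:
    """True if the weights alone fit; real workloads need extra headroom."""
    return weight_memory_gb(params_billions, precision) <= vram_gb


for size in (7, 13, 40, 65):
    gb = weight_memory_gb(size)
    print(f"{size}B in FP16: {gb:.0f} GB of weights, "
          f"fits 40 GB: {fits_in_vram(size, 40)}, "
          f"fits 80 GB: {fits_in_vram(size, 80)}")
```

A 40B model in FP16 uses essentially all of an 80 GB card for weights alone, which is why quantisation or a second GPU is often needed in practice at that size.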



NVIDIA A100 Servers: PCIe & SXM4 Models


These are the server models most commonly deployed with NVIDIA A100 GPUs, available both from new inventory and on the secondary market.


SXM4 / HGX A100 servers — for multi-GPU training and HPC:

  • NVIDIA DGX A100 — NVIDIA's own reference system. 8× A100 SXM4 80 GB, 640 GB total VRAM, fully interconnected via NVSwitch. The original A100 training benchmark and the platform all others are measured against.


  • Dell PowerEdge XE8545 — 4U, dual AMD EPYC 7003 (Milan), 4× A100 SXM4 GPUs (40 GB or 80 GB) on an NVLink board. Up to 2 TB DDR4 across 32 DIMMs, 10× 2.5" NVMe bays, air-cooled, standard rack depth. Peak power draw at full GPU load approximately 2,877W. Well-supported through Dell OpenManage and iDRAC 9.


  • Supermicro SYS-420GP-TNAR — 4U, dual Intel Xeon Platinum, 8× A100 SXM4 80 GB on HGX A100 baseboard. Up to 8 TB DDR4 across 32 DIMMs, redundant 3,000W PSUs. One of the most widely deployed 8-GPU A100 servers in GPU cloud infrastructure and research clusters.


  • HPE Apollo 6500 Gen10 Plus (XL675d node) — 6U chassis, dual AMD EPYC, up to 8× A100 SXM4 GPUs via HGX A100 NVLink baseboard, or up to 10× double-wide PCIe GPUs. Up to 4 TB DDR4. One of the few A100 platforms offering both air cooling and integrated direct liquid cooling (DLC). Supports InfiniBand, HPE Slingshot, and high-speed Ethernet.


  • Lenovo ThinkSystem SR670 V2 — 3U, dual Intel Xeon Scalable (Ice Lake), 4× A100 SXM4 GPUs via HGX A100 NVLink. Lenovo Neptune liquid-to-air hybrid cooling on SXM4 configurations. Also configurable with up to 8× double-wide A100 PCIe. Withdrawn from new Lenovo orders but widely available on the secondary market.


  • GIGABYTE G492-ZD0 / G492-ZD2 — 4U, dual AMD EPYC 7002/7003, 8× A100 SXM4 40 GB or 80 GB on HGX A100 baseboard. Up to 8 TB DDR4, redundant 3,000W PSUs, air-cooled. The G492-ZD2 adds additional PCIe expansion slots versus the ZD0. Popular in hyperscale and research deployments.



PCIe A100 servers — for inference, fine-tuning, and scale-out deployments:

  • Dell PowerEdge R750xa — 2U, up to 4× A100 PCIe GPUs. Air-cooled, standard rack depth, full Dell OpenManage and iDRAC9 support. The most flexible A100 PCIe platform for organisations without HGX infrastructure.


  • Dell PowerEdge R7525 — 2U, up to 2× A100 PCIe GPUs, dual AMD EPYC. Used in HPC environments for single or dual-GPU A100 workloads. Entry point for organisations scaling out gradually.


  • Supermicro SYS-220HE-FTNR — 2U, up to 4× double-wide PCIe GPUs including A100 PCIe. Compact, widely available on secondary market, common for inference deployments.


  • HPE ProLiant DL380 Gen10 Plus — 2U, up to 3× A100 PCIe GPUs. Enterprise workhorse for A100 PCIe inference within existing HPE infrastructure. Air-cooled, full iLO management.


  • HPE Apollo 6500 Gen10 Plus (PCIe config) — same chassis as the SXM4 version, configurable with up to 10× double-wide or 16× single-wide PCIe GPUs per node — the highest validated PCIe GPU density on any A100 platform.


  • Lenovo ThinkSystem SR670 V2 (PCIe config) — same chassis, configured for up to 8× double-wide A100 PCIe GPUs. Covers both form factors in one platform depending on configuration.


  • GIGABYTE G492-HA0 / G481-HA0 — Intel Xeon-based, up to 10× A100 PCIe GPUs. GIGABYTE's PCIe A100 platforms for HPC clusters and research environments preferring Intel Xeon over AMD EPYC.



Common Mistakes When Buying NVIDIA A100 PCIe & SXM4 GPUs


Buying SXM4 without HGX infrastructure:

SXM4 modules require an HGX A100 baseboard — they cannot be installed in a standard PCIe server. If you buy SXM4 modules without a compatible system, they are unusable. Always confirm server compatibility before ordering.


Choosing 40 GB when 80 GB is needed:

The 40 GB and 80 GB variants use different memory stacks (HBM2 versus HBM2e) that are built into the GPU package. You cannot upgrade memory later; you need a different GPU entirely. In our experience, teams that start with 40 GB to save cost frequently wish they had bought 80 GB from the start.


Assuming PCIe and SXM4 perform identically for all workloads:

For single-GPU inference, the difference is minimal. For multi-GPU training with heavy inter-GPU communication, PCIe can be significantly slower — up to ~2× on communication-bound workloads like large language model training. The form factor choice is a throughput and architecture decision, not just an infrastructure preference.


Ignoring MIG for inference deployments:

Buying multiple GPUs for inference when MIG on fewer GPUs could serve the same workload is a common and expensive mistake. Evaluate MIG slice configurations before deciding on GPU count — a single A100 80 GB can serve up to 7 isolated workloads simultaneously.


Confusing scale-up and scale-out requirements:

SXM4 is the right choice when you need to scale up within a node: large models, tensor parallelism, tight GPU synchronisation. PCIe is the right choice when you need to scale out across many nodes: data parallelism, inference fleets, multi-tenant infrastructure. Buying SXM4 for a workload that only needed scale-out is a common and costly over-specification.







Summary: NVIDIA A100 PCIe vs SXM4 GPUs


| | PCIe 40 GB | PCIe 80 GB | SXM4 40 GB | SXM4 80 GB |
|---|---|---|---|---|
| Best for | Single-GPU inference | Inference, fine-tuning | Multi-GPU training | Training, HPC, large models |
| VRAM | 40 GB HBM2 | 80 GB HBM2e | 40 GB HBM2 | 80 GB HBM2e |
| Memory BW | ~1.55 TB/s | ~2.0 TB/s | ~1.6 TB/s | ~2.0 TB/s |
| GPU-to-GPU BW | ~64 GB/s (shared) | ~64 GB/s (shared) | 600 GB/s (full mesh) | 600 GB/s (full mesh) |
| Topology | PCIe fabric | PCIe fabric | NVLink + NVSwitch | NVLink + NVSwitch |
| Infrastructure | Any PCIe Gen4 server | Any PCIe Gen4 server | HGX A100 only | HGX A100 only |
| TDP | 250W | 300W | 400W | 400W |
| MIG | Yes | Yes | Yes | Yes |
| Scale strategy | Scale-out | Scale-out | Scale-up | Scale-up |

We have NVIDIA A100 GPUs and complete A100 servers in stock across all variants. Tell us what you are building and we will send you a recommended configuration with pricing.



FAQ: NVIDIA A100 PCIe vs SXM4 GPUs


Can I install an A100 SXM4 in a standard server?

No. SXM4 modules require an HGX A100 baseboard. They cannot be installed in a regular PCIe server slot. Always confirm server compatibility before ordering.


Can I upgrade an A100 40GB to 80GB?

No. The two variants use different memory stacks (HBM2 versus HBM2e) that are built into the GPU package. There is no upgrade path; you need a different GPU entirely.


Do A100 PCIe GPUs support NVLink?

The A100 PCIe supports NVLink Bridge between two cards in the same server. This is not the same as SXM4, which uses NVLink plus NVSwitch for full-mesh connectivity across all GPUs at 600 GB/s per GPU. PCIe NVLink Bridge is limited to two GPUs only.


What is the difference between HBM2 and HBM2e on the A100?

HBM2 is used on 40GB variants (~1.55–1.6 TB/s bandwidth). HBM2e is used on 80GB variants (~2.0 TB/s) and runs at a higher data rate on the same 5,120-bit memory bus. The 80GB delivers more VRAM and higher memory bandwidth, not just extra capacity.


How many A100s do I need to run a large language model?

As a rough guide: 7B parameters needs 1–2 A100 80GB GPUs, 13B needs 2–4, 70B needs 8–16, and 175B+ needs 64 or more. Actual requirements depend on precision, batch size, and parallelism strategy.


What is MIG and do all A100 variants support it?

MIG (Multi-Instance GPU) partitions a single A100 into up to 7 isolated instances, each with dedicated memory and compute. All four A100 variants support it. A single A100 80GB can serve up to 7 isolated workloads simultaneously, making it worth evaluating before buying multiple GPUs for inference.


Is the A100 PCIe significantly slower than SXM4 for training?

For single-GPU workloads, the difference is minimal. For multi-GPU training with heavy inter-GPU communication, PCIe can be considerably slower — 64 GB/s versus 600 GB/s GPU-to-GPU bandwidth is a real bottleneck on large model training.


Can I mix A100 40GB and 80GB GPUs in the same server?

Not recommended. Mixed VRAM creates uneven memory distribution across GPUs, complicates model sharding, and reduces efficiency. Match your variants across the system.


What are the power requirements for A100 PCIe vs SXM4?

PCIe draws 250W (40GB) or 300W (80GB). SXM4 draws 400W on both variants. In an 8-GPU SXM4 server the GPUs alone account for 3,200W (8 × 400W) before CPUs, memory, storage, and fans are counted, so power, cooling, and rack capacity need to be planned accordingly.
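A quick power budget can be sketched from the TDP figures above. The GPU numbers come from this article; the host overhead for CPUs, memory, storage, and fans is a placeholder assumption that you should replace with the server vendor's figures.

```python
# Rough per-server power budget from GPU TDP plus an assumed host overhead.

TDP_WATTS = {
    "PCIe 40GB": 250,
    "PCIe 80GB": 300,
    "SXM4 40GB": 400,
    "SXM4 80GB": 400,
}


def server_power_watts(variant: str, gpu_count: int,
                       host_overhead_watts: float = 0.0) -> float:
    """GPU TDP total plus whatever the rest of the server is assumed to draw."""
    return TDP_WATTS[variant] * gpu_count + host_overhead_watts


ASSUMED_HOST_OVERHEAD_WATTS = 1500  # hypothetical; check the vendor's spec sheet

gpus_only = server_power_watts("SXM4 80GB", 8)
with_host = server_power_watts("SXM4 80GB", 8, ASSUMED_HOST_OVERHEAD_WATTS)
pcie_gpus_only = server_power_watts("PCIe 80GB", 4)

print(f"8x SXM4 80GB, GPUs only:        {gpus_only:.0f} W")
print(f"8x SXM4 80GB, with host (est.): {with_host:.0f} W")
print(f"4x PCIe 80GB, GPUs only:        {pcie_gpus_only:.0f} W")
```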


When should I choose PCIe over SXM4?

Choose PCIe for inference, smaller training jobs, or standard rack deployments where flexibility matters more than peak inter-GPU bandwidth. Choose SXM4 when running large model training across multiple GPUs where tight GPU interconnect is critical.


