Template:AMD Radeon Instinct

AMD Radeon Instinct GPUs (has render output, no matrix units)
Model
(Code name)
Launch Architecture
fab
LLVM
target[1]
Transistors
& die size
Core Fillrate[a][b][c] Vector TFLOPS[a][d] Memory TBP Bus
interface
Config[e] Clock[a]
(MHz)
Texture
(GT/s)
Pixel
(GP/s)
FP16 FP32 FP64 Size
(GB)
Bus type
& width
Bandwidth
(GB/s)
Clock
(MT/s)
Radeon Instinct MI6
(Polaris 10)[2][3][4][5][6][7]
Jun 20, 2017 GCN 4
GloFo 14LP
gfx803 5.7×10⁹
232 mm²
2304:144:32
36 CU
1120
1233
161.3
177.6
35.84
39.46
5.161
5.682
5.161
5.682
0.323
0.355
16 GDDR5
256-bit
224 7000 150 W PCIe 3.0
×16
Radeon Instinct MI8
(Fiji)[2][3][4][8][9][10]
GCN 3
TSMC 28 nm
gfx803 8.9×10⁹
596 mm²
4096:256:64
64 CU
1000 256.0 64.00 8.192 8.192 0.512 4 HBM
4096-bit
512 1000 175 W
Radeon Instinct MI25
(Vega 10)[2][3][4][11][12][13][14]
GCN 5
GloFo 14LP
gfx900 12.5×10⁹
510 mm²
1400
1500
358.4
384.0
89.60
96.00
22.94
24.58
11.47
12.29
0.717
0.768
16 HBM2
2048-bit
484 1890 300 W
Radeon Instinct MI50
(Vega 20)[15][16][17][18][19][20]
Nov 18, 2018 GCN 5
TSMC N7
gfx906 13.2×10⁹
331 mm²
3840:240:64
60 CU
1450
1725
348.0
414.0
92.80
110.4
22.27
26.50
11.14
13.25
5.568
6.624
16
32
HBM2
4096-bit
1024 2000 300 W PCIe 4.0
×16
Radeon Instinct MI60
(Vega 20)[16][21][22][23]
4096:256:64
64 CU
1500
1800
384.0
460.8
96.00
115.2
24.58
29.49
12.29
14.75
6.144
7.373
32
AMD Instinct GPUs (has matrix units, no render output)
Model
(Code name)
Launch Architecture
fab
LLVM
target[1]
Transistors
& die size
Core Vector TFLOPS[a][d] Matrix speedup[f] Memory TBP Bus
interface
Config[e] Clock[a]
(MHz)
INT8[g] FP16[h] FP32 FP64 FP32 FP64 S.Sparse Size
(GB)
Bus type
& width
Bandwidth
(GB/s)
Clock
(MT/s)
AMD Instinct MI100
(Arcturus)[24][25][26]
Nov 16, 2020 CDNA 1
TSMC N7
gfx908 25.6×10⁹
750 mm²
7680:480:-
120 CU
1000
1502
122.9
184.6
122.9
184.6
15.36
23.07
7.680
11.54
32 HBM2
4096-bit
1228.8 2400 300 W PCIe 4.0
×16
AMD Instinct MI210
(Aldebaran)[27][28][29]
Mar 22, 2022 CDNA 2
TSMC N6
gfx90a 28×10⁹
~770 mm²
6656:416:-
104 CU
(1 × GCD)[i]
1000
1700
106.5
181.0
106.5
181.0
13.31
22.63
13.31
22.63
64 HBM2E
4096-bit
1638.4 3200 300 W
AMD Instinct MI250
(Aldebaran)[30][31][32]
Nov 8, 2021 58×10⁹
1540 mm²
13312:832:-
208 CU
(2 × GCD)
213.0
362.1
213.0
362.1
26.62
45.26
26.62
45.26
2 × 64 HBM2E
2 × 4096-bit[j]
2 × 1638.4 500 W
560 W (Peak)
AMD Instinct MI250X
(Aldebaran)[33][31][34]
14080:880:-
220 CU
(2 × GCD)
225.3
383.0
225.3
383.0
28.16
47.87
28.16
47.87
AMD Instinct MI300A
(Antares)[35][36][37][38]
Dec 6, 2023 CDNA 3
TSMC N5 & N6
gfx942 146×10⁹
1017 mm²
14592:912:-
228 CU
(6 × XCD)

24 Zen 4 x86 cores
(3 × CCD)[i]

2100 1961.2 980.6 122.6 61.3 128 HBM3
8192-bit
5300 5200 550 W
760 W (Liquid Cooling)
PCIe 5.0
×16
AMD Instinct MI300X
(Aqua Vanjaram)[39][40][41][42]
153×10⁹
1017 mm²
19456:1216:-
304 CU
(8 × XCD)
2614.9 1307.4 163.4 81.7 192 750 W
AMD Instinct MI350X[43][44] CDNA 4
TSMC N3 & N6
gfx950 185×10⁹
1017 mm²
16384:1024:-
256 CU
(8 × XCD)
2200 4600[k] 144.2 144.2 72.1 288 HBM3e
8192-bit
8000 8000 1000 W PCIe 5.0
×16 (OAM)
AMD Instinct MI355X 2400 288 1400 W
  a. Boost values (if available) are stated below the base value in italics.
  b. Texture fillrate is calculated as the number of texture mapping units multiplied by the base (or boost) core clock speed.
  c. Pixel fillrate is calculated as the number of render output units multiplied by the base (or boost) core clock speed.
  d. Precision performance is calculated from the base (or boost) core clock speed, based on an FMA operation.
  e. Unified shaders : Texture mapping units : Render output units, and compute units (CU).
  f. The matrix unit exists in addition to the main vector (SIMD) processing unit to accelerate the matrix-multiplication operations common in machine learning applications. It can optimize the multiplication of common data types, resulting in an integer-multiple (typically 2×) increase in TFLOPS. Since CDNA 3, it can also use structured sparsity for a 2× increase in TFLOPS for all data types. In CDNA 4, the speedup has been removed for FP32 and FP64; the unit instead focuses on speedups for low-precision formats (INT8, MXFP4/6/8, OCP-FP8, FP16, BF16).
  g. CDNA 3 and later support packed FP8 (E5M2, E4M3) at the same level of performance.
  h. CDNA supports BF16 at half the performance of FP16. CDNA 2 and later support BF16 at the same performance level as FP16. CDNA 3 and later support TF32 at the same performance level as FP16.
  i. GCD refers to a Graphics Compute Die. Each GCD is a separate piece of silicon. The same applies to XCDs and CCDs.
  j. CDNA 2-based cards adopt a design using two dies on the same package. The dies are linked by a 400 GB/s bidirectional Infinity Fabric link and are addressed as individual GPUs by the host system.
  k. Matrix only.
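
The fillrate and vector-TFLOPS notes above describe simple products, so a few table entries can be recomputed as a sanity check. The following is a minimal sketch, not an official AMD formula sheet. The function names are hypothetical. The `rate` parameter (the fraction of FP32 throughput available at another precision) is an assumption inferred from the ratios in the table rather than something the notes state. The bandwidth helper uses the usual relation of bus width in bytes times transfer rate, which the notes do not spell out.

```python
"""Recompute a few table entries from the relations given in the notes."""


def fillrate_g_per_s(units: int, clock_mhz: float) -> float:
    """Texture (GT/s) or pixel (GP/s) fillrate: TMUs or ROPs times core clock."""
    return units * clock_mhz / 1000.0


def vector_tflops(shaders: int, clock_mhz: float, rate: float = 1.0) -> float:
    """Vector TFLOPS, counting one FMA per shader per clock as two operations.

    `rate` is an assumed throughput ratio relative to FP32
    (for example 1/16 for FP64 on Polaris 10, inferred from the table).
    """
    return 2 * shaders * clock_mhz * rate / 1e6


def bandwidth_gb_per_s(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * transfer_rate_mts / 1000.0


if __name__ == "__main__":
    # Radeon Instinct MI6 (Polaris 10): 2304:144:32 at a 1120 MHz base clock.
    print(f"texture  {fillrate_g_per_s(144, 1120):.1f} GT/s")   # table lists 161.3
    print(f"pixel    {fillrate_g_per_s(32, 1120):.2f} GP/s")    # table lists 35.84
    print(f"FP32     {vector_tflops(2304, 1120):.3f} TFLOPS")   # table lists 5.161
    print(f"FP64     {vector_tflops(2304, 1120, 1 / 16):.3f} TFLOPS")  # table lists 0.323
    # 256-bit GDDR5 at 7000 MT/s.
    print(f"memory   {bandwidth_gb_per_s(256, 7000):.0f} GB/s")  # table lists 224
```

Boost-clock figures follow by passing the boost clock instead of the base clock, for example 1800 MHz for the Radeon Instinct MI60. The table rounds to about four significant figures, so recomputed values can differ in the last digit.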