# XLA architecture

XLA (Accelerated Linear Algebra) is a machine learning (ML) compiler that
optimizes linear algebra, providing improvements in execution speed and memory
usage. This page provides a brief overview of the objectives and architecture
of the XLA compiler.

## Objectives

Today, XLA supports several ML framework frontends (including PyTorch,
TensorFlow, and JAX) and is part of the OpenXLA project, an ecosystem of
open-source compiler technologies for ML that's developed collaboratively by
leading ML hardware and software organizations. Before the OpenXLA project was
created, XLA was developed inside the TensorFlow project, but the fundamental
objectives remain the same:

* **Improve execution speed.** Compile subgraphs to reduce the execution time
  of short-lived ops and eliminate overhead from the runtime, fuse pipelined
  operations to reduce memory overhead, and specialize known tensor shapes to
  allow for more aggressive constant propagation.

* **Improve memory usage.** Analyze and schedule memory usage, eliminating
  many intermediate storage buffers.

* **Reduce reliance on custom ops.** Remove the need for many custom ops by
  improving the performance of automatically fused low-level ops to match the
  performance of custom ops that were originally fused by hand.

* **Improve portability.** Make it relatively easy to write a new backend for
  novel hardware, so that a large fraction of ML models can run unmodified on
  that hardware. This is in contrast with the approach of specializing
  individual monolithic ops for new hardware, which requires models to be
  rewritten to make use of those ops.

## How it works

The XLA compiler takes model graphs from ML frameworks defined in
[StableHLO](https://github.com/openxla/stablehlo) and compiles them into machine
instructions for various architectures. StableHLO defines a versioned operation
set (HLO = high level operations) that provides a portability layer between ML
frameworks and the compiler.
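
For example, with JAX as the frontend you can look at the StableHLO that gets
handed to XLA. This is a minimal sketch, assuming a recent JAX release; the
function `f` is just an illustrative placeholder:

```python
import jax
import jax.numpy as jnp

def f(x, y):
    # A small computation: elementwise multiply and add, then a reduction.
    return jnp.sum(x * y + 1.0)

x = jnp.ones((8,), dtype=jnp.float32)
y = jnp.ones((8,), dtype=jnp.float32)

# Lower the traced function to StableHLO, the input format of the XLA compiler.
lowered = jax.jit(f).lower(x, y)
print(lowered.as_text())  # prints the StableHLO module as MLIR text
```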

In general, the compilation process that converts the model graph into a
target-optimized executable includes these steps:

1. XLA performs several built-in optimization and analysis passes on the
   StableHLO graph that are target-independent, such as
   [CSE](https://en.wikipedia.org/wiki/Common_subexpression_elimination),
   target-independent operation fusion, and buffer analysis for allocating
   runtime memory for the computation. During this optimization stage, XLA
   also converts the StableHLO dialect into an internal HLO dialect.

2. XLA sends the HLO computation to a backend for further HLO-level
   optimizations, this time with target-specific information and needs in
   mind. For example, the GPU backend may perform operation fusions that are
   beneficial specifically for the GPU programming model and determine how to
   partition the computation into streams. At this stage, backends may also
   pattern-match certain operations or combinations thereof to optimized
   library calls. The sketch after this list shows how to inspect the
   optimized HLO these passes produce.

3. The backend then performs target-specific code generation. The CPU and GPU
   backends included with XLA use [LLVM](http://llvm.org) for low-level IR,
   optimization, and code generation. These backends emit the LLVM IR necessary
   to represent the HLO computation in an efficient manner, and then invoke
   LLVM to emit native code from this LLVM IR.
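
To observe the result of these optimization stages, you can compile the lowered
computation and dump the HLO that XLA produced for the target backend. A sketch
under the same assumptions as the earlier example (recent JAX; `f`, `x`, and
`y` as defined above):

```python
# Continuing the earlier example: compile for the default backend and inspect
# the optimized, target-specific HLO.
compiled = jax.jit(f).lower(x, y).compile()
print(compiled.as_text())  # fused operations typically show up as `fusion`
                           # computations in this dump
```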

Within this process, the XLA compiler is modular in the sense that it is easy to
slot in an alternative backend to
[target some novel HW architecture](./developing_new_backend.md). The GPU
backend currently supports NVIDIA GPUs via the LLVM NVPTX backend. The CPU
backend supports multiple CPU ISAs.
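
As a quick way to check which backend XLA is targeting from a frontend, JAX
reports the active platform at runtime. A minimal sketch, assuming a recent
JAX release:

```python
import jax

print(jax.default_backend())  # e.g. 'cpu', 'gpu', or 'tpu'
print(jax.devices())          # the devices available to that backend
```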