2015: Keras, TensorFlow
2017: Caffe2, PyTorch, ONNX
A framework is intended for model development.
A framework improves developer efficiency: trying out ideas faster (debugging, interactive development, simplicity, intuitiveness).
A framework improves infrastructure efficiency: running computation faster (efficient implementation, scalability, portable model definitions, cross-platform requirements).
A good framework strikes a balance between developer efficiency and infrastructure efficiency.
Declarative (define-and-run) frameworks: examples include Theano, Caffe, MXNet, TensorFlow, and Caffe2.
In these frameworks, we declare and compile a model once, then repeatedly execute it in a virtual machine.
Easy to optimize.
Easy to serialize for production deployment.
Non-intuitive programming model.
Difficult to design and maintain.
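The declare-compile-execute pattern can be sketched in a few lines of plain Python. This is a hypothetical toy API for illustration only, not any real framework: the model is a symbolic graph, compiled once into an execution plan, then run many times by a small "virtual machine".

```python
# Toy declarative framework: build a symbolic graph, compile, then execute.

class Node:
    def __init__(self, op, inputs=()):
        self.op, self.inputs = op, inputs

def placeholder():               # declared symbolically; holds no value yet
    return Node("input")

def add(a, b): return Node("add", (a, b))
def mul(a, b): return Node("mul", (a, b))

def compile_graph(output):
    """Topologically order the graph once; a real compiler would also
    fuse ops, plan memory, and target specific hardware here."""
    order, seen = [], set()
    def visit(n):
        if id(n) in seen:
            return
        seen.add(id(n))
        for i in n.inputs:
            visit(i)
        order.append(n)
    visit(output)
    return order

def run(order, feeds):
    """The 'virtual machine': executes the compiled plan on concrete values."""
    vals = {}
    for n in order:
        if n.op == "input":
            vals[id(n)] = feeds[n]
        elif n.op == "add":
            vals[id(n)] = vals[id(n.inputs[0])] + vals[id(n.inputs[1])]
        elif n.op == "mul":
            vals[id(n)] = vals[id(n.inputs[0])] * vals[id(n.inputs[1])]
    return vals[id(order[-1])]

x, y = placeholder(), placeholder()
z = add(mul(x, y), y)            # declaration only; nothing is computed yet
plan = compile_graph(z)          # compile once...
print(run(plan, {x: 3, y: 4}))   # ...execute many times -> 16
```

Because the whole graph exists before any execution, the compiler can optimize and serialize it, which is exactly why this style is easy to deploy but harder to debug interactively.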
Imperative (define-by-run) frameworks: examples include PyTorch and Chainer.
In these frameworks, we define and construct the model by running the computation itself; there is no separate execution engine.
Intuitive to write programs.
Easy to design, debug, and iterate.
Difficult to optimize: there is no separate graph representation or domain-specific language for the framework to analyze ahead of time.
Hard to deploy on multiple platforms.
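The define-by-run style can be sketched with plain Python (toy code, not a real framework): the model *is* the program, so ordinary control flow, printing, and debuggers just work, but there is no graph for a compiler to optimize or serialize.

```python
# Toy imperative (define-by-run) model: computation happens eagerly,
# line by line, as the Python code executes.

def model(x, steps):
    h = x
    for _ in range(steps):      # ordinary Python loop, no special graph ops
        h = h * 2 + 1
        if h > 100:             # data-dependent branching, impossible to
            break               # express directly in many static graphs
    return h

print(model(3, 10))             # values exist immediately; print/pdb work anywhere
```

Contrast with the declarative style: here nothing is declared up front, so iterating on the model is easy, but a deployment toolchain sees only opaque Python code.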
Research to Production at Facebook:
PyTorch → Caffe2 (2017): Reimplementation took weeks or months.
PyTorch → ONNX → Caffe2 (2018): Enabled transfer of models or model fragments between frameworks.
PyTorch + Caffe2 (2019-Present): Combining the advantages of developer efficiency and infrastructure efficiency.
Many frameworks start adopting such a combination:
Keras/TF-Eager + TensorFlow
Gluon + MXNet
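One common mechanism behind these hybrids is tracing: run the eagerly-written function once with proxy inputs and record the ops, yielding a graph for optimization and deployment (the idea behind, e.g., tracing in TorchScript or `tf.function`). The sketch below is hypothetical illustration, not a real API.

```python
# Toy tracer: capture an eager, define-by-run function as a static graph.

class Tracer:
    """Stands in for a tensor; records every op applied to it."""
    def __init__(self, tape, name):
        self.tape, self.name = tape, name

    def _record(self, op, other):
        out = Tracer(self.tape, f"t{len(self.tape)}")
        self.tape.append((out.name, op, self.name, other))
        return out

    def __mul__(self, other): return self._record("mul", other)
    def __add__(self, other): return self._record("add", other)

def trace(fn):
    tape = []
    fn(Tracer(tape, "x"))       # run the eager code once with a proxy input
    return tape                 # the recorded graph, ready to optimize/deploy

def model(x):
    return x * 2 + 1            # written eagerly, as in PyTorch

graph = trace(model)
print(graph)                    # [('t0', 'mul', 'x', 2), ('t1', 'add', 't0', 1)]
```

The known limitation of tracing shows up immediately in this sketch: a Python `if` on the proxy value would be baked in at trace time, so data-dependent control flow is not captured faithfully.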
Understand your need:
Developer Efficiency: Algorithm Research? Startup? Proof of Concept?
Infrastructure Efficiency: System Research? Cross-Platform? Scale?
Learn one framework and focus on your problem.
It's fine to switch.
The goal of frameworks is to improve productivity.
Within the tech stack, there is the libraries layer on top of frameworks: TF-Serving, CoreML, Clipper, Ray, etc.
On top of libraries, we have the applications layer: Detectron, FairSeq, Magenta, GluonNLP, etc.
Below the frameworks, we have the layer of runtime, compilers, and optimizers: cuDNN, NNPACK, TVM, ONNX, etc.
The lowest layer is hardware: CPU, GPU, DSP, FPGA/ASIC.
Do NOT use AlexNet!
Unification helps everyone: ONNX bridges the gap between high-level APIs and framework frontends on one side, and hardware vendor libraries and devices on the other.
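The interchange idea can be sketched in miniature. The JSON schema below is illustrative only and is NOT the ONNX format: the point is that once a graph is written down in a framework-neutral form, any vendor backend can reconstruct and execute it.

```python
# Toy interchange format: a frontend serializes its graph; any backend
# that understands the (hypothetical) schema can run it.
import json

graph = {"inputs": ["x"],
         "nodes": [{"out": "t0", "op": "Mul", "in": ["x"], "const": 2},
                   {"out": "t1", "op": "Add", "in": ["t0"], "const": 1}],
         "output": "t1"}

wire = json.dumps(graph)             # exporter side: frontend emits this

def run_backend(serialized, feeds):  # importer side: a vendor runtime
    g = json.loads(serialized)
    env = dict(feeds)
    for n in g["nodes"]:
        v = env[n["in"][0]]
        env[n["out"]] = v * n["const"] if n["op"] == "Mul" else v + n["const"]
    return env[g["output"]]

print(run_backend(wire, {"x": 5}))   # -> 11
```

With N frontends and M backends, a shared format reduces N×M pairwise converters to N exporters plus M importers, which is the economic argument for ONNX.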
Invest in experiment management.
Use Computer Science conventional wisdom: programming language, compilers, scientific computation, databases, etc.
Things change across the stack:
Applications layer: quantization trades a small amount of accuracy for speed.
Libraries layer: auto-quantization interfaces increase ease of use.
Frameworks layer: quantized training, auto-scaling, etc.
Runtime, compilers, optimizers layer: high-performance fixed-point math.
Hardware layer: quantized computation primitives.
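The quantization that every layer of this stack revolves around is, at its core, a simple affine mapping. A minimal sketch in plain Python (the scale and zero-point values are illustrative assumptions): map floats to 8-bit integers with a scale and zero-point, store and compute in integers, and dequantize at the end.

```python
# Affine (scale + zero-point) quantization of floats into uint8.

def quantize(xs, scale, zero_point):
    # round to the nearest integer step, shift by zero_point, clamp to uint8
    return [max(0, min(255, round(x / scale) + zero_point)) for x in xs]

def dequantize(qs, scale, zero_point):
    return [(q - zero_point) * scale for q in qs]

xs = [-1.0, 0.0, 0.5, 1.0]
scale, zp = 2.0 / 255, 128      # assumed range of roughly [-1, 1] over uint8
qs = quantize(xs, scale, zp)    # small integers: cheap to store and compute on
ys = dequantize(qs, scale, zp)  # close to xs, within one quantization step
```

Each layer then specializes this idea: applications pick the accuracy/speed trade-off, libraries expose auto-quantization interfaces, frameworks support quantized training, compilers emit fast fixed-point math, and hardware provides integer primitives.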