API Reference
March
class horizon_plugin_pytorch.march.March
BPU platform.
- BAYES: Bayes platform
- BERNOULLI2: Bernoulli2 platform
- BAYES_E: Bayes platform
qconfig
horizon_plugin_pytorch.quantization.get_default_qconfig(activation_fake_quant: Optional[str] = 'fake_quant', weight_fake_quant: Optional[str] = 'fake_quant', activation_observer: Optional[str] = 'min_max', weight_observer: Optional[str] = 'min_max', activation_qkwargs: Optional[Dict] = None, weight_qkwargs: Optional[Dict] = None)
Get the default qconfig.
Parameters
- activation_fake_quant – FakeQuantize type for activations. Default is fake_quant. Available options are fake_quant, lsq, and pact.
- weight_fake_quant – FakeQuantize type for weights. Default is fake_quant. Available options are fake_quant, lsq, and pact.
- activation_observer – Observer type for activations. Default is min_max. Available options are min_max, fixed_scale, clip, percentile, clip_std, mse, and kl.
- weight_observer – Observer type for weights. Default is min_max. Available options are min_max, fixed_scale, clip, percentile, clip_std, and mse.
- activation_qkwargs – A dict containing the activation Observer type, the arguments of the activation FakeQuantize, and the arguments of the activation Observer.
- weight_qkwargs – A dict containing the weight Observer type, the arguments of the weight FakeQuantize, and the arguments of the weight Observer.
qconfig definition examples
- Example usage for RDK X3:
default_qat_8bit_fake_quant_qconfig = get_default_qconfig(
    activation_fake_quant="fake_quant",
    weight_fake_quant="fake_quant",
    activation_observer="min_max",
    weight_observer="min_max",
    activation_qkwargs=None,
    weight_qkwargs={"qscheme": torch.per_channel_symmetric, "ch_axis": 0},
)
default_qat_out_8bit_fake_quant_qconfig = get_default_qconfig(
    activation_fake_quant=None,
    weight_fake_quant="fake_quant",
    activation_observer=None,
    weight_observer="min_max",
    activation_qkwargs=None,
    weight_qkwargs={"qscheme": torch.per_channel_symmetric, "ch_axis": 0},
)
default_calib_8bit_fake_quant_qconfig = get_default_qconfig(
    activation_fake_quant="fake_quant",
    weight_fake_quant="fake_quant",
    activation_observer="percentile",
    weight_observer="min_max",
    activation_qkwargs=None,
    weight_qkwargs={"qscheme": torch.per_channel_symmetric, "ch_axis": 0},
)
default_calib_out_8bit_fake_quant_qconfig = (
    default_qat_out_8bit_fake_quant_qconfig
)
default_qat_8bit_lsq_quant_qconfig = get_default_qconfig(
    activation_fake_quant="lsq",
    weight_fake_quant="lsq",
    activation_observer="min_max",
    weight_observer="min_max",
    activation_qkwargs={"use_grad_scaling": True, "averaging_constant": 1.0},
    weight_qkwargs={
        "qscheme": torch.per_channel_symmetric,
        "ch_axis": 0,
        "use_grad_scaling": True,
        "averaging_constant": 1.0,
    },
)
- Example usage for RDK Ultra and RDK X5:
default_qat_8bit_fake_quant_qconfig = get_default_qconfig(
    activation_fake_quant="fake_quant",
    weight_fake_quant="fake_quant",
    activation_observer="min_max",
    weight_observer="min_max",
    activation_qkwargs=None,
    weight_qkwargs={"qscheme": torch.per_channel_symmetric, "ch_axis": 0},
)
default_qat_8bit_weight_32bit_out_fake_quant_qconfig = get_default_qconfig(
    activation_fake_quant=None,
    weight_fake_quant="fake_quant",
    activation_observer=None,
    weight_observer="min_max",
    activation_qkwargs=None,
    weight_qkwargs={"qscheme": torch.per_channel_symmetric, "ch_axis": 0},
)
default_calib_8bit_fake_quant_qconfig = get_default_qconfig(
    activation_fake_quant="fake_quant",
    weight_fake_quant="fake_quant",
    activation_observer="percentile",
    weight_observer="min_max",
    activation_qkwargs=None,
    weight_qkwargs={"qscheme": torch.per_channel_symmetric, "ch_axis": 0},
)
# Reuses the 32-bit-output QAT qconfig defined above in this block.
default_calib_8bit_weight_32bit_out_fake_quant_qconfig = (
    default_qat_8bit_weight_32bit_out_fake_quant_qconfig
)
default_qat_8bit_weight_16bit_act_fake_quant_qconfig = get_default_qconfig(
    activation_fake_quant="fake_quant",
    weight_fake_quant="fake_quant",
    activation_observer="min_max",
    weight_observer="min_max",
    activation_qkwargs={"dtype": qint16},
    weight_qkwargs={"qscheme": torch.per_channel_symmetric, "ch_axis": 0},
)
default_calib_8bit_weight_16bit_act_fake_quant_qconfig = get_default_qconfig(
    activation_fake_quant="fake_quant",
    weight_fake_quant="fake_quant",
    activation_observer="percentile",
    weight_observer="min_max",
    activation_qkwargs={"dtype": qint16},
    weight_qkwargs={"qscheme": torch.per_channel_symmetric, "ch_axis": 0},
)
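The weight_qkwargs in the examples above request per-channel symmetric quantization along ch_axis 0. The following pure-Python sketch illustrates what that could mean for a min_max style observer. The helper name per_channel_symmetric_scales and the rule scale = max(|w|) / quant_max are assumptions made for illustration, not the plugin's actual implementation.

```python
from typing import List


def per_channel_symmetric_scales(
    weight: List[List[float]], quant_max: int = 127
) -> List[float]:
    """Return one scale per output channel (row) of a 2-D weight.

    Assumption: symmetric quantization maps the largest absolute value
    of each channel to quant_max, so the zero point is always 0.
    """
    scales = []
    for channel in weight:
        max_abs = max(abs(v) for v in channel)
        # Guard against an all-zero channel, which would give a zero scale.
        scales.append(max_abs / quant_max if max_abs > 0 else 1.0)
    return scales


# Two output channels (ch_axis 0), each with its own value range.
weight = [[0.5, -1.27], [0.0, 0.254]]
scales = per_channel_symmetric_scales(weight)
```

Each row gets its own scale, so a channel with small weights is not forced to share the coarse step size of a channel with large weights.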
Fake quantization operator
class horizon_plugin_pytorch.quantization.FakeQuantize(observer: type = <class 'horizon_plugin_pytorch.quantization.observer.MovingAverageMinMaxObserver'>, saturate: bool = None, in_place: bool = False, compat_mask: bool = True, channel_len: int = 1, **observer_kwargs)
Simulate the quantize and dequantize operations at training time.
The output of this module is given by
    fake_quant_x = clamp(floor(x / scale + 0.5), quant_min, quant_max) * scale
- scale defines the scale factor used for quantization.
- zero_point specifies the quantized value to which 0 in floating point maps.
- quant_min specifies the minimum allowable quantized value.
- quant_max specifies the maximum allowable quantized value.
- fake_quant_enabled controls the application of fake quantization on tensors. Note that statistics can still be updated.
- observer_enabled controls statistics collection on tensors.
- dtype specifies the quantized dtype that is being emulated with fake quantization. The allowable values are qint8 and qint16. The values of quant_min and quant_max should be chosen to be consistent with the dtype.
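The formula above can be traced by hand with a scalar sketch in plain Python. The function name fake_quantize and the QUANT_RANGES table are illustrative assumptions; the ranges are simply the signed 8-bit and 16-bit integer ranges that would be consistent with qint8 and qint16. The sketch follows the formula as written, which has no zero_point term.

```python
import math

# Signed integer ranges assumed to correspond to the two dtypes.
QUANT_RANGES = {"qint8": (-128, 127), "qint16": (-32768, 32767)}


def fake_quantize(x: float, scale: float, dtype: str = "qint8") -> float:
    """Quantize then dequantize a single value, following the formula above."""
    quant_min, quant_max = QUANT_RANGES[dtype]
    q = math.floor(x / scale + 0.5)        # round to nearest, ties upward
    q = max(quant_min, min(quant_max, q))  # clamp to the integer range
    return q * scale                       # back to floating point


in_range = fake_quantize(1.3, 0.5)                 # 2.6 rounds to 3
saturated = fake_quantize(1000.0, 0.5)             # clamped to 127
wide = fake_quantize(1000.0, 0.5, dtype="qint16")  # fits in 16 bits
```

With scale 0.5, the value 1000.0 saturates for qint8 but is represented exactly for qint16, which is why quant_min and quant_max must match the dtype.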
Parameters
- observer – Module for observing statistics on input tensors and calculating scale and zero-point.
- saturate – Whether to zero out the gradient for values outside the quantization range.
- in_place – Whether to perform fake quantization in place.
- compat_mask – Whether to pack the bool mask into a bitfield when saturate = True.
- channel_len – Size of the data along the channel dimension.
- observer_kwargs – Arguments for the observer module.
observer
User provided module that collects statistics on the input tensor and provides a method to calculate scale and zero-point.
extra_repr()
Set the extra representation of the module
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
forward(x)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
set_qparams(scale: Union[torch.Tensor, Sequence, float], zero_point: Optional[Union[torch.Tensor, Sequence, int]] = None)
Set the quantization parameters (qparams); symmetric by default.
classmethod with_args(**kwargs)
Wrapper that allows creation of class factories.
This can be useful when there is a need to create classes with the same constructor arguments, but different instances. Can be used in conjunction with _callable_args
Example:
>>> Foo.with_args = classmethod(_with_args)
>>> foo_builder = Foo.with_args(a=3, b=4).with_args(answer=42)
>>> foo_instance1 = foo_builder()
>>> foo_instance2 = foo_builder()
>>> id(foo_instance1) == id(foo_instance2)
False
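The doctest above relies on names that are not defined in it. A self-contained sketch of the same class-factory idea, built on functools.partial, could look as follows. The _PartialWrapper helper and the Foo class are hypothetical stand-ins written for this illustration and are not the plugin's code.

```python
import functools


class _PartialWrapper:
    """Callable factory that remembers constructor keyword arguments."""

    def __init__(self, partial):
        self.partial = partial

    def __call__(self, *args, **kwargs):
        # Every call constructs a fresh instance.
        return self.partial(*args, **kwargs)

    def with_args(self, **kwargs):
        # Chain additional keyword arguments onto the existing factory.
        return _PartialWrapper(functools.partial(self.partial, **kwargs))


class Foo:
    def __init__(self, a=0, b=0, answer=None):
        self.a, self.b, self.answer = a, b, answer

    @classmethod
    def with_args(cls, **kwargs):
        return _PartialWrapper(functools.partial(cls, **kwargs))


foo_builder = Foo.with_args(a=3, b=4).with_args(answer=42)
foo_instance1 = foo_builder()
foo_instance2 = foo_builder()
```

Both instances share the same constructor arguments but are distinct objects, which is the property the doctest checks with id().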
QAT
horizon_plugin_pytorch.quantization.convert(module: torch.nn.modules.module.Module, mapping: Optional[Dict[Type[torch.nn.modules.module.Module], Type[torch.nn.modules.module.Module]]] = None, inplace: bool = False, remove_qconfig: bool = True, fast_mode: bool = False)
Convert modules.
Convert submodules of the input module to a different module type according to mapping, by calling the from_float method on the target module class. The qconfig is removed at the end if remove_qconfig is set to True.
Parameters
- module – Input module.
- mapping – A dictionary that maps from source module type to target module type. It can be overwritten to allow swapping user-defined modules.
- inplace – Carry out model transformations in place; the original module is mutated.
- remove_qconfig – Whether to remove the qconfig attributes at the end of conversion.
- fast_mode – Whether to accelerate the forward pass of the quantized model. If set to True, the quantized model cannot be compiled.
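The mapping and from_float mechanism described for convert can be pictured with a small stand-alone sketch. All classes here (Container, FloatLinear, QuantizedLinear) and the convert_sketch function are hypothetical and only mimic the swap pattern on a toy module tree; the real function operates on torch.nn.Module objects.

```python
import copy


class Container:
    def __init__(self, **children):
        self.children = dict(children)


class FloatLinear:
    def __init__(self, weight):
        self.weight = weight
        self.qconfig = "hypothetical-qconfig"
        self.children = {}


class QuantizedLinear:
    def __init__(self, weight):
        self.weight = weight
        self.children = {}

    @classmethod
    def from_float(cls, mod):
        new = cls(mod.weight)
        new.qconfig = mod.qconfig  # carried over until stripped at the end
        return new


def _swap(module, mapping):
    for name, child in list(module.children.items()):
        _swap(child, mapping)
        target = mapping.get(type(child))
        if target is not None:
            module.children[name] = target.from_float(child)


def _strip_qconfig(module):
    if hasattr(module, "qconfig"):
        del module.qconfig
    for child in module.children.values():
        _strip_qconfig(child)


def convert_sketch(module, mapping, inplace=False, remove_qconfig=True):
    if not inplace:
        module = copy.deepcopy(module)  # leave the caller's model untouched
    _swap(module, mapping)
    if remove_qconfig:
        _strip_qconfig(module)
    return module


model = Container(fc=FloatLinear(weight=[1.0, 2.0]))
converted = convert_sketch(model, {FloatLinear: QuantizedLinear})
kept = convert_sketch(model, {FloatLinear: QuantizedLinear}, remove_qconfig=False)
```

With inplace=False the original tree still holds the float module, while the returned copy holds the swapped one.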
horizon_plugin_pytorch.quantization.convert_fx(graph_module: torch.fx.graph_module.GraphModule, inplace: bool = False, convert_custom_config_dict: Optional[Dict[str, Any]] = None, _remove_qconfig: bool = True, fast_mode: bool = False) → horizon_plugin_pytorch.quantization.fx.graph_module.QuantizedGraphModule
Convert a calibrated or trained model to a quantized model.
Parameters
- graph_module – A prepared and calibrated/trained model (GraphModule).
- inplace – Carry out model transformations in place; the original module is mutated.
- convert_custom_config_dict – Dictionary of custom configurations for the convert function:
    convert_custom_config_dict = {
        # We automatically preserve all attributes, this option is
        # just in case and not likely to be used.
        "preserved_attributes": ["preserved_attr"],
    }
- _remove_qconfig – Option to remove the qconfig attributes in the model after conversion. For internal use only.
- fast_mode – Whether to accelerate the forward pass of the quantized model. If set to True, the quantized model cannot be compiled.
Returns
A quantized model (GraphModule).
Example:
    # prepared_model: the model after prepare_fx/prepare_qat_fx and
    # calibration/training
    quantized_model = convert_fx(prepared_model)
horizon_plugin_pytorch.quantization.fuse_fx(model: torch.nn.modules.module.Module, fuse_custom_config_dict: Optional[Dict[str, Any]] = None) → horizon_plugin_pytorch.quantization.fx.graph_module.GraphModuleWithAttr
Fuse modules like conv+add+bn+relu etc.
Fusion rules are defined in horizon_plugin_pytorch.quantization.fx.fusion_pattern.py
Parameters
- model – A torch.nn.Module model.
- fuse_custom_config_dict – Dictionary of custom configurations for fuse_fx, e.g.
    fuse_custom_config_dict = {
        # We automatically preserve all attributes, this option is
        # just in case and not likely to be used.
        "preserved_attributes": ["preserved_attr"],
    }
Example:
    from horizon_plugin_pytorch.quantization import fuse_fx
    m = fuse_fx(m)
horizon_plugin_pytorch.quantization.fuse_known_modules(mod_list, is_qat=False, additional_fuser_method_mapping=None)
Fuse modules.
Return a list of modules that fuses the operations specified in the input module list.
Fuses only the following sequences of modules: conv, bn; conv, bn, relu; conv, relu; conv, bn, add; conv, bn, add, relu; conv, add; conv, add, relu; linear, bn; linear, bn, relu; linear, relu; linear, bn, add; linear, bn, add, relu; linear, add; linear, add, relu. For these sequences, the first element in the output module list performs the fused operation. The rest of the elements are set to nn.Identity().
horizon_plugin_pytorch.quantization.fuse_modules(model, modules_to_fuse, inplace=False, fuser_func=<function fuse_known_modules>, fuse_custom_config_dict=None)
Fuses a list of modules into a single module.
Fuses only the following sequences of modules: conv, bn; conv, bn, relu; conv, relu; conv, bn, add; conv, bn, add, relu; conv, add; conv, add, relu; linear, bn; linear, bn, relu; linear, relu; linear, bn, add; linear, bn, add, relu; linear, add; linear, add, relu. For these sequences, the first element in the output module list performs the fused operation. The rest of the elements are set to nn.Identity().
Parameters
- model – Model containing the modules to be fused.
- modules_to_fuse – List of lists of module names to fuse. Can also be a list of strings if there is only a single list of modules to fuse.
- inplace – Bool specifying whether fusion happens in place on the model. By default a new model is returned.
- fuser_func – Function that takes in a list of modules and outputs a list of fused modules of the same length. For example, fuser_func([convModule, BNModule]) returns the list [ConvBNModule, nn.Identity()]. Defaults to fuse_known_modules.
- fuse_custom_config_dict – Custom configuration for fusion.
    # Example of fuse_custom_config_dict
    fuse_custom_config_dict = {
        # Additional fuser_method mapping
        "additional_fuser_method_mapping": {
            (torch.nn.Conv2d, torch.nn.BatchNorm2d): fuse_conv_bn
        },
    }
Returns
Model with fused modules. A new copy is created if inplace=False.
Examples:
>>> m = M().eval()
>>> # m is a module containing the sub-modules below
>>> modules_to_fuse = [['conv1', 'bn1', 'relu1'],
...                    ['submodule.conv', 'submodule.relu']]
>>> fused_m = fuse_modules(m, modules_to_fuse)
>>> output = fused_m(input)
>>> m = M().eval()
>>> # Alternately provide a single list of modules to fuse
>>> modules_to_fuse = ['conv1', 'bn1', 'relu1']
>>> fused_m = fuse_modules(m, modules_to_fuse)
>>> output = fused_m(input)
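Two behaviours described for fuse_modules can be sketched without torch: modules_to_fuse may be a single list of names or a list of lists, and after fusion the first name in each group holds the fused operation while the rest become identity placeholders. The functions normalize_modules_to_fuse and fuse_sketch below are hypothetical, and modules are represented by plain strings rather than real nn.Module objects.

```python
from typing import Dict, List


def normalize_modules_to_fuse(modules_to_fuse) -> List[List[str]]:
    """Accept either ['a', 'b'] or [['a', 'b'], ['c', 'd']]."""
    if modules_to_fuse and all(isinstance(n, str) for n in modules_to_fuse):
        return [list(modules_to_fuse)]
    return [list(group) for group in modules_to_fuse]


def fuse_sketch(modules: Dict[str, str], modules_to_fuse) -> Dict[str, str]:
    """Return a fused copy of a name -> op-type mapping."""
    fused = dict(modules)  # new mapping, mirroring inplace=False
    for group in normalize_modules_to_fuse(modules_to_fuse):
        ops = "+".join(modules[name] for name in group)
        fused[group[0]] = "Fused(" + ops + ")"  # first element does the work
        for name in group[1:]:
            fused[name] = "Identity"            # the rest are placeholders
    return fused


modules = {"conv1": "conv", "bn1": "bn", "relu1": "relu", "fc": "linear"}
fused = fuse_sketch(modules, ["conv1", "bn1", "relu1"])
```

Keeping the placeholder entries means the set of module names is unchanged after fusion, so code that refers to bn1 or relu1 by name still resolves.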
horizon_plugin_pytorch.quantization.prepare_qat(model: torch.nn.modules.module.Module, mapping: Optional[Dict[Type[torch.nn.modules.module.Module], Type[torch.nn.modules.module.Module]]] = None, inplace: bool = False, optimize_graph: bool = False, hybrid: bool = False, optimize_kwargs: Optional[Dict[str, Tuple]] = None, example_inputs: Any = None, qconfig_setter: Optional[Union[Tuple[horizon_plugin_pytorch.quantization.qconfig_template.QconfigSetterBase, ...], horizon_plugin_pytorch.quantization.qconfig_template.QconfigSetterBase]] = None, verbose: int = 0)
Prepare qat.
Prepare a copy of the model for quantization-aware training and convert it to the quantized version.
Quantization configuration should be assigned preemptively to individual submodules in .qconfig attribute.
Parameters
- model – Input model to be prepared (modified in place when inplace is True).
- mapping – Dictionary that maps float modules to the quantized modules that replace them.
- inplace – Carry out model transformations in place; the original module is mutated.
- optimize_graph – Whether to apply special-purpose processing to the original model. Currently this only supports using torch.fx to fix the cat input scale (only used on Bernoulli).
- hybrid – Whether to generate a hybrid model in which some intermediate operations are computed in float. This functionality currently has the following constraints:
  1. The hybrid model cannot pass check_model and cannot be compiled.
  2. Some quantized operations cannot directly accept input from float operations, so the user needs to insert QuantStub manually.
- optimize_kwargs – A dict for graph optimization with the following format:
    optimize_kwargs = {
        # optional, specify which type of optimization to do. Only
        # support "unify_inputs_scale" now
        "opt_types": ("unify_inputs_scale",),
        # optional, modules start with qualified name to optimize
        "module_prefixes": ("backbone.conv",),
        # optional, modules in these types will be optimized
        "module_types": (horizon.nn.qat.conv2d,),
        # optional, functions to optimize
        "functions": (torch.clamp,),
        # optional, methods to optimize. Only support
        # FloatFunctional methods now
        "methods": ("add",),
    }
- example_inputs – Model inputs, used to trace the model or check the model structure.
- qconfig_setter – Qconfig setter. Only needed when using a qconfig template.
- verbose – Whether to check the model structure. It has two levels:
  0: do nothing
  1: check the model structure:
    a. whether the model has shared ops
    b. whether the model has unfused operations
    c. the model quantization config
horizon_plugin_pytorch.quantization.prepare_qat_fx(model: Union[torch.nn.modules.module.Module, torch.fx.graph_module.GraphModule], qconfig_dict: Optional[Dict[str, Any]] = None, prepare_custom_config_dict: Optional[Dict[str, Any]] = None, optimize_graph: bool = False, hybrid: bool = False, hybrid_dict: Optional[Dict[str, List]] = None, opset_version: str = 'hbdk3', example_inputs: Any = None, qconfig_setter: Optional[Union[Tuple[horizon_plugin_pytorch.quantization.qconfig_template.QconfigSetterBase, ...], horizon_plugin_pytorch.quantization.qconfig_template.QconfigSetterBase]] = None, verbose: int = 0) → horizon_plugin_pytorch.quantization.fx.graph_module.ObservedGraphModule
Prepare a model for quantization aware training.
Parameters
- model – torch.nn.Module model or GraphModule model (possibly from fuse_fx).
- qconfig_dict – A dictionary with the following configurations:
    qconfig_dict = {
        # optional, global config
        "": qconfig,
        # optional, used for module types
        "module_type": [
            (torch.nn.Conv2d, qconfig),
            ...,
        ],
        # optional, used for module names
        "module_name": [
            ("foo.bar", qconfig),
            ...,
        ],
        # priority (in increasing order):
        # global, module_type, module_name, module.qconfig
        # qconfig == None means quantization should be
        # skipped for anything matching the rule.
        # The qconfig of a function or method is the same as the
        # qconfig of its parent module. If it needs to be set
        # separately, please wrap this function as a module.
    }
- prepare_custom_config_dict – Customization configuration dictionary for the quantization tool:
    prepare_custom_config_dict = {
        # We automatically preserve all attributes, this option is
        # just in case and not likely to be used.
        "preserved_attributes": ["preserved_attr"],
    }
- optimize_graph – Whether to apply special-purpose processing to the original model. Currently this only supports using torch.fx to fix the cat input scale (only used on Bernoulli).
- hybrid – Whether to prepare the model in hybrid mode. The default value is False, in which case the model runs on BPU completely. It should be True if the model is quantized by model convert or contains some CPU ops. In hybrid mode, ops that aren’t supported by BPU and ops that are specified by the user will run on CPU.
  How to set qconfig: The qconfig in hybrid mode is the same as the qconfig in non-hybrid mode. For a BPU op, we should ensure the input of this op is quantized: the activation qconfig of its previous non-quantstub op should not be None, even if that op is a CPU op.
  How to specify a CPU op: Define the CPU module_name or module_type in hybrid_dict.
- hybrid_dict – A dictionary that defines user-specified CPU ops:
    hybrid_dict = {
        # optional, used for module types
        "module_type": [torch.nn.Conv2d, ...],
        # optional, used for module names
        "module_name": ["foo.bar", ...],
    }
    # priority (in increasing order): module_type, module_name
    # To set a function or method as CPU op, wrap it as a module.
- opset_version – Specifies the version of the opset that determines the behavior of hybrid mode. Ops that are in the quantized opset will be considered quantized ops and run on BPU, while ops that are not in the quantized opset but are in the float opset will be marked as hybrid (float) ops and run on CPU. Valid options are “hbdk3” and “hbdk4”.
- example_inputs – Model inputs, used to trace the model or check the model structure.
- qconfig_setter – Qconfig setter. Only needed when using a qconfig template.
- verbose – Whether to check the model structure. It has two levels:
  0: do nothing
  1: check the QAT model structure:
    a. whether the model has shared ops
    b. whether the model has unfused operations
    c. the model quantization config
Returns
A GraphModule with fake quant modules (configured by qconfig_dict), ready for quantization aware training
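The priority order stated in the qconfig_dict comments (global, module_type, module_name, module.qconfig, in increasing order) can be sketched as a small resolver. The function resolve_qconfig, the placeholder classes Conv2d and Linear, the string qconfigs, and the exact-match rule for module names are all assumptions made for illustration.

```python
_UNSET = object()  # distinguishes "no module.qconfig" from qconfig=None


class Conv2d:  # placeholder for torch.nn.Conv2d
    pass


class Linear:  # placeholder for torch.nn.Linear
    pass


def resolve_qconfig(qconfig_dict, module_name, module_type, module_qconfig=_UNSET):
    """Apply rules from lowest to highest priority; later rules override."""
    resolved = qconfig_dict.get("")                      # global
    for candidate_type, qconfig in qconfig_dict.get("module_type", []):
        if module_type is candidate_type:
            resolved = qconfig
    for candidate_name, qconfig in qconfig_dict.get("module_name", []):
        if module_name == candidate_name:                # assumed exact match
            resolved = qconfig
    if module_qconfig is not _UNSET:                     # module.qconfig wins
        resolved = module_qconfig
    return resolved


qconfig_dict = {
    "": "global_cfg",
    "module_type": [(Conv2d, "conv_cfg")],
    "module_name": [("foo.bar", None)],  # None: skip quantization
}
```

A sentinel is used for the module-level qconfig because None is itself a meaningful value that disables quantization for the matching module.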
Example:
    import torch
    from horizon_plugin_pytorch.quantization import get_default_qat_qconfig
    from horizon_plugin_pytorch.quantization import prepare_qat_fx

    qconfig = get_default_qat_qconfig()

    def train_loop(model, train_data):
        model.train()
        for image, target in train_data:
            ...

    # float_model and train_data are provided by the user
    qconfig_dict = {"": qconfig}
    prepared_model = prepare_qat_fx(float_model, qconfig_dict)
    # Run QAT training
    train_loop(prepared_model, train_data)
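The interaction between hybrid_dict and opset_version described for prepare_qat_fx can be summarized as a placement rule: user-specified ops go to CPU, ops in the quantized opset go to BPU, and ops found only in the float opset go to CPU. The sketch below encodes that rule. The opset contents, the op names, and the function place_op are hypothetical, since the real opsets are defined by the toolchain.

```python
# Hypothetical opset contents, for illustration only.
QUANTIZED_OPSET = {"conv2d", "relu"}
FLOAT_OPSET = {"conv2d", "relu", "softmax"}


def place_op(op_type, op_name, hybrid_dict=None):
    """Return 'BPU' or 'CPU' for one op under the rules described above."""
    hybrid_dict = hybrid_dict or {}
    # Ops named by the user always run on CPU.
    if op_name in hybrid_dict.get("module_name", []):
        return "CPU"
    if op_type in hybrid_dict.get("module_type", []):
        return "CPU"
    # Otherwise the opset decides.
    if op_type in QUANTIZED_OPSET:
        return "BPU"
    if op_type in FLOAT_OPSET:
        return "CPU"
    raise ValueError("op is in neither opset: " + op_type)


default_conv = place_op("conv2d", "backbone.conv")
float_only = place_op("softmax", "head.softmax")
forced_cpu = place_op("conv2d", "foo.bar", {"module_name": ["foo.bar"]})
```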
Extended tracer and wrap of torch.fx.
This file defines a tracer that inherits from torch.fx.Tracer and an extended wrap that allows wrapping of user-defined modules or methods, which helps users optimize their own modules with torch.fx.
horizon_plugin_pytorch.utils.fx_helper.wrap(skip_compile: bool = False)
Extend torch.fx.wrap.
This function can be:
- called or used as a decorator on a string to register a builtin function as a “leaf function”
- called or used as a decorator on a function to register this function as a “leaf function”
- called or used as a decorator on a subclass of torch.nn.Module to register this module as a “leaf module”, and register all user-defined methods in this class as “leaf methods”
- called or used as a decorator on a class method to register it as a “leaf method”
Parameters
skip_compile – Whether the wrapped part should not be compiled.
Returns
The actual decorator.
Return type
wrap_inner
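Because wrap takes skip_compile and returns the actual decorator (wrap_inner), it is a decorator factory. A minimal stand-alone sketch of that shape is shown below. The registry LEAF_REGISTRY and the function wrap_sketch are hypothetical; the real function also handles modules and methods and integrates with torch.fx tracing, which this sketch does not attempt.

```python
# Hypothetical registry: leaf name -> skip_compile flag.
LEAF_REGISTRY = {}


def wrap_sketch(skip_compile=False):
    """Return a decorator that records its target as a leaf."""

    def wrap_inner(target):
        # A string names a builtin function; otherwise use the object's name.
        name = target if isinstance(target, str) else target.__name__
        LEAF_REGISTRY[name] = skip_compile
        return target  # the wrapped object is returned unchanged

    return wrap_inner


@wrap_sketch()
def double(x):
    return x * 2


# Called directly on a string instead of being used as a decorator.
wrap_sketch(skip_compile=True)("len")
```

Note the parentheses in @wrap_sketch(): the outer call supplies skip_compile, and its return value is what decorates the function.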
ONNX
horizon_plugin_pytorch.utils.onnx_helper.export_to_onnx(model, args, f, export_params=True, verbose=False, training=<TrainingMode.EVAL: 0>, input_names=None, output_names=None, operator_export_type=<OperatorExportTypes.ONNX_FALLTHROUGH: 3>, opset_version=11, do_constant_folding=True, dynamic_axes=None, keep_initializers_as_inputs=None, custom_opsets=None)
Export a (float or QAT) model into ONNX format.
Parameters
- model (torch.nn.Module/torch.jit.ScriptModule/ScriptFunction) – The model to be exported.
- args (tuple or torch.Tensor) – args can be structured as one of the following:
  a. ONLY A TUPLE OF ARGUMENTS:
       args = (x, y, z)
     The tuple should contain model inputs such that model(*args) is a valid invocation of the model. Any non-Tensor arguments will be hard-coded into the exported model; any Tensor arguments will become inputs of the exported model, in the order they occur in the tuple.
  b. A TENSOR:
       args = torch.Tensor([1])
     This is equivalent to a 1-ary tuple of that Tensor.
  c. A TUPLE OF ARGUMENTS ENDING WITH A DICTIONARY OF NAMED ARGUMENTS:
       args = (x,
               {'y': input_y,
                'z': input_z})
     All but the last element of the tuple will be passed as non-keyword arguments, and named arguments will be set from the last element. If a named argument is not present in the dictionary, it is assigned the default value, or None if a default value is not provided.
- f – A file-like object or a string containing a file name. A binary protocol buffer will be written to this file.
- export_params (bool, default True) – If True, all parameters will be exported.
- verbose (bool, default False) – If True, prints a description of the model being exported to stdout, and doc_string will be added to the graph. doc_string may contain a mapping of module scope to node name in future torch onnx.
- training (enum, default TrainingMode.EVAL) –
  TrainingMode.EVAL: export the model in inference mode.
  TrainingMode.PRESERVE: export the model in inference mode if model.training is False and in training mode if model.training is True.
  TrainingMode.TRAINING: export the model in training mode. Disables optimizations which might interfere with training.
- input_names (list of str, default empty list) – Names to assign to the input nodes of the graph, in order.
- output_names (list of str, default empty list) – Names to assign to the output nodes of the graph, in order.
- operator_export_type (enum, default ONNX_FALLTHROUGH) –
  OperatorExportTypes.ONNX: Export all ops as regular ONNX ops (in the default opset domain).
  OperatorExportTypes.ONNX_FALLTHROUGH: Try to convert all ops to standard ONNX ops in the default opset domain.
  OperatorExportTypes.ONNX_ATEN: All ATen ops (in the TorchScript namespace “aten”) are exported as ATen ops.
  OperatorExportTypes.ONNX_ATEN_FALLBACK: Try to export each ATen op (in the TorchScript namespace “aten”) as a regular ONNX op. If we are unable to do so, fall back to exporting an ATen op.
- opset_version (int, default 11) – By default we export the model to the opset version of the onnx submodule.
- do_constant_folding (bool, default True) – Apply the constant-folding optimization. Constant-folding will replace some of the ops that have all constant inputs with pre-computed constant nodes.
- dynamic_axes (dict<str, list(int)/dict<int, str>>, default empty dict) – By default the exported model will have the shapes of all input and output tensors set to exactly match those given in args (and example_outputs when that arg is required). To specify axes of tensors as dynamic (i.e. known only at run-time), set dynamic_axes to a dict with schema:
  KEY (str): an input or output name. Each name must also be provided in input_names or output_names.
  VALUE (dict or list): If a dict, keys are axis indices and values are axis names. If a list, each element is an axis index.
- keep_initializers_as_inputs (bool, default None) – If True, all the initializers (typically corresponding to parameters) in the exported graph will also be added as inputs to the graph. If False, then initializers are not added as inputs to the graph, and only the non-parameter inputs are added as inputs. This may allow for better optimizations (e.g. constant folding) by backends/runtimes.
- custom_opsets (dict<str, int>, default empty dict) – A dict with schema:
  KEY (str): opset domain name
  VALUE (int): opset version
  If a custom opset is referenced by model but not mentioned in this dictionary, the opset version is set to 1.
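Two of the rules in the parameter list above lend themselves to a short sketch: how the three accepted shapes of args break down into positional and named arguments, and how a custom opset that is missing from custom_opsets falls back to version 1. The helper names split_export_args and opset_version_for, and the domain names used in the example, are hypothetical; plain Python values stand in for tensors.

```python
def split_export_args(args):
    """Split args into (positional, named) following cases a, b and c."""
    if not isinstance(args, tuple):
        args = (args,)                    # case b: a single tensor
    if args and isinstance(args[-1], dict):
        return args[:-1], dict(args[-1])  # case c: trailing dict of names
    return args, {}                       # case a: positional only


def opset_version_for(domain, custom_opsets=None):
    """Look up a custom opset version, defaulting to 1 when unlisted."""
    return (custom_opsets or {}).get(domain, 1)


case_a = split_export_args((1, 2, 3))
case_b = split_export_args(5)
case_c = split_export_args((1, {"y": 2, "z": 3}))
listed = opset_version_for("my_domain", {"my_domain": 2})
unlisted = opset_version_for("other_domain", {"my_domain": 2})
```

One consequence of case c is that a model whose last positional input is itself a dict cannot be passed in form a without ambiguity, since the trailing dict would be read as named arguments.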