Pack/unpack floating-point data from/to a given number of bits.
#include <FloatPacker.h>
Public Types
typedef uint32_t Packdest
    Type into which we pack.

Public Member Functions
FloatPacker (int nbits, int nmantissa, double scale=1, bool is_signed=true, bool round=false)
    Constructor.
bool errcheck (std::string &err) const
    Check to see if an error occurred.
Packdest pack (double src) const
    Pack a value.
double unpack (Packdest val) const
    Unpack the value VAL.

Pack/unpack floating-point data from/to a given number of bits.
The format is specified by the following parameters.
nbits - The total number of bits in the representation.
scale - Scale factor; the input is divided by this before storing.
nmantissa - The number of bits to use for the mantissa and sign bit.
is_signed - Flag to tell if we should use a sign bit.
round - Flag to tell if we should round or truncate.
From these we derive:
npack = nmantissa, if is_signed is false
      = nmantissa-1, if is_signed is true
nexp = nbits - nmantissa
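The derivation above can be sketched in a few lines of C++. This is only an illustration of the arithmetic, not FloatPacker's own code: the FieldWidths struct, the derive_widths function, and the range checks are hypothetical names and assumptions introduced here.

```cpp
#include <cassert>

// Hypothetical helper type for this sketch; not part of FloatPacker.
struct FieldWidths {
  int nsign;  // 1 if a sign bit is present, otherwise 0
  int nexp;   // number of exponent bits
  int npack;  // number of mantissa bits actually stored
};

// Derive the field widths from the format parameters, following
//   npack = nmantissa      (unsigned)
//   npack = nmantissa - 1  (signed)
//   nexp  = nbits - nmantissa
inline FieldWidths derive_widths(int nbits, int nmantissa, bool is_signed) {
  // Assumed sanity limits for the sketch: Packdest is a 32-bit type, and
  // the mantissa field must fit inside the total width.
  assert(nbits >= 1 && nbits <= 32);
  assert(nmantissa >= (is_signed ? 1 : 0) && nmantissa <= nbits);

  FieldWidths w;
  w.nsign = is_signed ? 1 : 0;
  w.npack = nmantissa - w.nsign;
  w.nexp = nbits - nmantissa;
  return w;
}
```

For example, nbits = 16 and nmantissa = 12 with a sign bit gives 1 sign bit, 4 exponent bits and 11 stored mantissa bits, which add up to the full 16 bits.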
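The format consists of, in order from high bits to low bits:
sign - One sign bit, present only if is_signed is true.
exponent - nexp bits of biased exponent.
mantissa - npack bits of mantissa.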
The number is stored in normalized form, with an exponent bias of 2^(nexp-1). If the (biased) exponent is zero, however, the mantissa is stored in denormalized form. If nexp==0, this gives a fixed-point representation in the range [0,1). Zero is represented by all bits 0; if we have a sign bit, we can also represent -0 by all bits 0 except for the sign bit.
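To make the nexp==0 case concrete, here is a minimal sketch of an unsigned fixed-point encoding of a value in [0,1), with the truncate/round choice exposed as a flag. It is not taken from FloatPacker: pack_fixed and unpack_fixed are hypothetical names, the encoding value = m / 2^npack is an assumption, and saturating at the largest code when rounding would overflow is a choice made for this sketch only.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Sketch only: store x in [0,1) as an unsigned integer of npack bits,
// assuming the encoding value = m / 2^npack.
inline std::uint32_t pack_fixed(double x, int npack, bool round) {
  assert(npack >= 1 && npack <= 31);
  assert(x >= 0.0 && x < 1.0);

  const double steps = std::ldexp(1.0, npack);  // 2^npack
  const std::uint32_t maxval = (std::uint32_t(1) << npack) - 1;

  const double scaled = x * steps;
  // Truncation drops the fraction; rounding goes to the nearest step.
  double q = round ? std::floor(scaled + 0.5) : std::floor(scaled);
  // Rounding a value just below 1 would need npack+1 bits, so this
  // sketch saturates at the largest representable code instead.
  if (q > maxval) q = maxval;
  return static_cast<std::uint32_t>(q);
}

// Inverse of pack_fixed: recover m / 2^npack.
inline double unpack_fixed(std::uint32_t m, int npack) {
  return std::ldexp(static_cast<double>(m), -npack);
}
```

With npack = 4, the value 0.3 truncates to code 4 (0.25) but rounds to code 5 (0.3125), which shows why the round flag changes the result for the same number of bits.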
CxxUtils::FloatPacker::FloatPacker (int nbits, int nmantissa, double scale = 1, bool is_signed = true, bool round = false)
Constructor.
nbits - The number of bits in the packed representation.
nmantissa - The number of bits to use for the mantissa and sign bit.
scale - Divide the input number by this before packing.
is_signed - If true, then one mantissa bit is used for a sign.
round - If true, numbers will be rounded. Otherwise, they will be truncated.
bool CxxUtils::FloatPacker::errcheck (std::string & err) const
Check to see if an error occurred.
err[out] - If an error occurred, a description of it.
FloatPacker::Packdest CxxUtils::FloatPacker::pack (double src) const
Pack a value.
src - Value to pack.
For now, we convert floats to doubles before packing.
double CxxUtils::FloatPacker::unpack (Packdest val) const
Unpack the value VAL.
val - The packed data. It should start with the low bit, and any extraneous bits should have been masked off.
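The masking requirement on val can be met with a small helper before calling unpack. The sketch below reads "start with the low bit" as meaning the packed value occupies the low-order nbits of the word, which is an interpretation rather than something the text spells out; mask_low_bits and extract_field are hypothetical names, not part of FloatPacker.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the low nbits of word and clear everything above them.
inline std::uint32_t mask_low_bits(std::uint32_t word, int nbits) {
  assert(nbits >= 1 && nbits <= 32);
  // Shifting a 32-bit value by 32 is undefined, so handle it separately.
  if (nbits == 32) return word;
  return word & ((std::uint32_t(1) << nbits) - 1);
}

// Pull an nbits-wide field that starts at bit `offset` of a larger word
// down to bit 0, then mask off whatever sat above it.
inline std::uint32_t extract_field(std::uint32_t word, int offset, int nbits) {
  assert(offset >= 0 && offset + nbits <= 32);
  return mask_low_bits(word >> offset, nbits);
}
```

A caller that stores several packed values in one 32-bit word would shift and mask each field this way before handing it to unpack.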