diff options
Diffstat (limited to 'circuitpython/extmod/ulab/docs/manual/source/ulab-programming.rst')
| -rw-r--r-- | circuitpython/extmod/ulab/docs/manual/source/ulab-programming.rst | 911 |
1 files changed, 911 insertions, 0 deletions
diff --git a/circuitpython/extmod/ulab/docs/manual/source/ulab-programming.rst b/circuitpython/extmod/ulab/docs/manual/source/ulab-programming.rst new file mode 100644 index 0000000..ff1788b --- /dev/null +++ b/circuitpython/extmod/ulab/docs/manual/source/ulab-programming.rst @@ -0,0 +1,911 @@ + +Programming ulab +================ + +Earlier we have seen, how ``ulab``\ ’s functions and methods can be +accessed in ``micropython``. This last section of the book explains, how +these functions are implemented. By the end of this chapter, not only +would you be able to extend ``ulab``, and write your own +``numpy``-compatible functions, but through a deeper understanding of +the inner workings of the functions, you would also be able to see what +the trade-offs are at the ``python`` level. + +Code organisation +----------------- + +As mentioned earlier, the ``python`` functions are organised into +sub-modules at the C level. The C sub-modules can be found in +``./ulab/code/``. + +The ``ndarray`` object +---------------------- + +General comments +~~~~~~~~~~~~~~~~ + +``ndarrays`` are efficient containers of numerical data of the same type +(i.e., signed/unsigned chars, signed/unsigned integers or +``mp_float_t``\ s, which, depending on the platform, are either C +``float``\ s, or C ``double``\ s). Beyond storing the actual data in the +void pointer ``*array``, the type definition has eight additional +members (on top of the ``base`` type). Namely, the ``dtype``, which +tells us, how the bytes are to be interpreted. Moreover, the +``itemsize``, which stores the size of a single entry in the array, +``boolean``, an unsigned integer, which determines, whether the arrays +is to be treated as a set of Booleans, or as numerical data, ``ndim``, +the number of dimensions (``uint8_t``), ``len``, the length of the array +(the number of entries), the shape (``*size_t``), the strides +(``*int32_t``). The length is simply the product of the numbers in +``shape``. + +The type definition is as follows: + +.. code:: c + + typedef struct _ndarray_obj_t { + mp_obj_base_t base; + uint8_t dtype; + uint8_t itemsize; + uint8_t boolean; + uint8_t ndim; + size_t len; + size_t shape[ULAB_MAX_DIMS]; + int32_t strides[ULAB_MAX_DIMS]; + void *array; + } ndarray_obj_t; + +Memory layout +~~~~~~~~~~~~~ + +The values of an ``ndarray`` are stored in a contiguous segment in the +RAM. The ``ndarray`` can be dense, meaning that all numbers in the +linear memory segment belong to a linar combination of coordinates, and +it can also be sparse, i.e., some elements of the linear storage space +will be skipped, when the elements of the tensor are traversed. + +In the RAM, the position of the item +:math:`M(n_1, n_2, ..., n_{k-1}, n_k)` in a dense tensor of rank +:math:`k` is given by the linear combination + +:raw-latex:`\begin{equation} +P(n_1, n_2, ..., n_{k-1}, n_k) = n_1 s_1 + n_2 s_2 + ... + n_{k-1}s_{k-1} + n_ks_k = \sum_{i=1}^{k}n_is_i +\end{equation}` where :math:`s_i` are the strides of the tensor, defined +as + +:raw-latex:`\begin{equation} +s_i = \prod_{j=i+1}^k l_j +\end{equation}` + +where :math:`l_j` is length of the tensor along the :math:`j`\ th axis. +When the tensor is sparse (e.g., when the tensor is sliced), the strides +along a particular axis will be multiplied by a non-zero integer. If +this integer is different to :math:`\pm 1`, the linear combination above +cannot access all elements in the RAM, i.e., some numbers will be +skipped. Note that :math:`|s_1| > |s_2| > ... > |s_{k-1}| > |s_k|`, even +if the tensor is sparse. The statement is trivial for dense tensors, and +it follows from the definition of :math:`s_i`. For sparse tensors, a +slice cannot have a step larger than the shape along that axis. But for +dense tensors, :math:`s_i/s_{i+1} = l_i`. + +When creating a *view*, we simply re-calculate the ``strides``, and +re-set the ``*array`` pointer. + +Iterating over elements of a tensor +----------------------------------- + +The ``shape`` and ``strides`` members of the array tell us how we have +to move our pointer, when we want to read out the numbers. For technical +reasons that will become clear later, the numbers in ``shape`` and in +``strides`` are aligned to the right, and begin on the right hand side, +i.e., if the number of possible dimensions is ``ULAB_MAX_DIMS``, then +``shape[ULAB_MAX_DIMS-1]`` is the length of the last axis, +``shape[ULAB_MAX_DIMS-2]`` is the length of the last but one axis, and +so on. If the number of actual dimensions, ``ndim < ULAB_MAX_DIMS``, the +first ``ULAB_MAX_DIMS - ndim`` entries in ``shape`` and ``strides`` will +be equal to zero, but they could, in fact, be assigned any value, +because these will never be accessed in an operation. + +With this definition of the strides, the linear combination in +:math:`P(n_1, n_2, ..., n_{k-1}, n_k)` is a one-to-one mapping from the +space of tensor coordinates, :math:`(n_1, n_2, ..., n_{k-1}, n_k)`, and +the coordinate in the linear array, +:math:`n_1s_1 + n_2s_2 + ... + n_{k-1}s_{k-1} + n_ks_k`, i.e., no two +distinct sets of coordinates will result in the same position in the +linear array. + +Since the ``strides`` are given in terms of bytes, when we iterate over +an array, the void data pointer is usually cast to ``uint8_t``, and the +values are converted using the proper data type stored in +``ndarray->dtype``. However, there might be cases, when it makes perfect +sense to cast ``*array`` to a different type, in which case the +``strides`` have to be re-scaled by the value of ``ndarray->itemsize``. + +Iterating using the unwrapped loops +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The following macro definition is taken from +`vector.h <https://github.com/v923z/micropython-ulab/blob/master/code/numpy/vector/vector.h>`__, +and demonstrates, how we can iterate over a single array in four +dimensions. + +.. code:: c + + #define ITERATE_VECTOR(type, array, source, sarray) do { + size_t i=0; + do { + size_t j = 0; + do { + size_t k = 0; + do { + size_t l = 0; + do { + *(array)++ = f(*((type *)(sarray))); + (sarray) += (source)->strides[ULAB_MAX_DIMS - 1]; + l++; + } while(l < (source)->shape[ULAB_MAX_DIMS-1]); + (sarray) -= (source)->strides[ULAB_MAX_DIMS - 1] * (source)->shape[ULAB_MAX_DIMS-1]; + (sarray) += (source)->strides[ULAB_MAX_DIMS - 2]; + k++; + } while(k < (source)->shape[ULAB_MAX_DIMS-2]); + (sarray) -= (source)->strides[ULAB_MAX_DIMS - 2] * (source)->shape[ULAB_MAX_DIMS-2]; + (sarray) += (source)->strides[ULAB_MAX_DIMS - 3]; + j++; + } while(j < (source)->shape[ULAB_MAX_DIMS-3]); + (sarray) -= (source)->strides[ULAB_MAX_DIMS - 3] * (source)->shape[ULAB_MAX_DIMS-3]; + (sarray) += (source)->strides[ULAB_MAX_DIMS - 4]; + i++; + } while(i < (source)->shape[ULAB_MAX_DIMS-4]); + } while(0) + +We start with the innermost loop, the one recursing ``l``. ``array`` is +already of type ``mp_float_t``, while the source array, ``sarray``, has +been cast to ``uint8_t`` in the calling function. The numbers contained +in ``sarray`` have to be read out in the proper type dictated by +``ndarray->dtype``. This is what happens in the statement +``*((type *)(sarray))``, and this number is then fed into the function +``f``. Vectorised mathematical functions produce *dense* arrays, and for +this reason, we can simply advance the ``array`` pointer. + +The advancing of the ``sarray`` pointer is a bit more involving: first, +in the innermost loop, we simply move forward by the amount given by the +last stride, which is ``(source)->strides[ULAB_MAX_DIMS - 1]``, because +the ``shape`` and the ``strides`` are aligned to the right. We move the +pointer as many times as given by ``(source)->shape[ULAB_MAX_DIMS-1]``, +which is the length of the very last axis. Hence the the structure of +the loop + +.. code:: c + + size_t l = 0; + do { + ... + l++; + } while(l < (source)->shape[ULAB_MAX_DIMS-1]); + +Once we have exhausted the last axis, we have to re-wind the pointer, +and advance it by an amount given by the last but one stride. Keep in +mind that in the the innermost loop we moved our pointer +``(source)->shape[ULAB_MAX_DIMS-1]`` times by +``(source)->strides[ULAB_MAX_DIMS - 1]``, i.e., we re-wind it by moving +it backwards by +``(source)->strides[ULAB_MAX_DIMS - 1] * (source)->shape[ULAB_MAX_DIMS-1]``. +In the next step, we move forward by +``(source)->strides[ULAB_MAX_DIMS - 2]``, which is the last but one +stride. + +.. code:: c + + (sarray) -= (source)->strides[ULAB_MAX_DIMS - 1] * (source)->shape[ULAB_MAX_DIMS-1]; + (sarray) += (source)->strides[ULAB_MAX_DIMS - 2]; + +This pattern must be repeated for each axis of the array, and this is +how we arrive at the four nested loops listed above. + +Re-winding arrays by means of a function +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In addition to un-wrapping the iteration loops by means of macros, there +is another way of traversing all elements of a tensor: we note that, +since :math:`|s_1| > |s_2| > ... > |s_{k-1}| > |s_k|`, +:math:`P(n1, n2, ..., n_{k-1}, n_k)` changes most slowly in the last +coordinate. Hence, if we start from the very beginning, (:math:`n_i = 0` +for all :math:`i`), and walk along the linear RAM segment, we increment +the value of :math:`n_k` as long as :math:`n_k < l_k`. Once +:math:`n_k = l_k`, we have to reset :math:`n_k` to 0, and increment +:math:`n_{k-1}` by one. After each such round, :math:`n_{k-1}` will be +incremented by one, as long as :math:`n_{k-1} < l_{k-1}`. Once +:math:`n_{k-1} = l_{k-1}`, we reset both :math:`n_k`, and +:math:`n_{k-1}` to 0, and increment :math:`n_{k-2}` by one. + +Rewinding the arrays in this way is implemented in the function +``ndarray_rewind_array`` in +`ndarray.c <https://github.com/v923z/micropython-ulab/blob/master/code/ndarray.c>`__. + +.. code:: c + + void ndarray_rewind_array(uint8_t ndim, uint8_t *array, size_t *shape, int32_t *strides, size_t *coords) { + // resets the data pointer of a single array, whenever an axis is full + // since we always iterate over the very last axis, we have to keep track of + // the last ndim-2 axes only + array -= shape[ULAB_MAX_DIMS - 1] * strides[ULAB_MAX_DIMS - 1]; + array += strides[ULAB_MAX_DIMS - 2]; + for(uint8_t i=1; i < ndim-1; i++) { + coords[ULAB_MAX_DIMS - 1 - i] += 1; + if(coords[ULAB_MAX_DIMS - 1 - i] == shape[ULAB_MAX_DIMS - 1 - i]) { // we are at a dimension boundary + array -= shape[ULAB_MAX_DIMS - 1 - i] * strides[ULAB_MAX_DIMS - 1 - i]; + array += strides[ULAB_MAX_DIMS - 2 - i]; + coords[ULAB_MAX_DIMS - 1 - i] = 0; + coords[ULAB_MAX_DIMS - 2 - i] += 1; + } else { // coordinates can change only, if the last coordinate changes + return; + } + } + } + +and the function would be called as in the snippet below. Note that the +innermost loop is factored out, so that we can save the ``if(...)`` +statement for the last axis. + +.. code:: c + + size_t *coords = ndarray_new_coords(results->ndim); + for(size_t i=0; i < results->len/results->shape[ULAB_MAX_DIMS -1]; i++) { + size_t l = 0; + do { + ... + l++; + } while(l < results->shape[ULAB_MAX_DIMS - 1]); + ndarray_rewind_array(results->ndim, array, results->shape, strides, coords); + } while(0) + +The advantage of this method is that the implementation is independent +of the number of dimensions: the iteration requires more or less the +same flash space for 2 dimensions as for 22. However, the price we have +to pay for this convenience is the extra function call. + +Iterating over two ndarrays simultaneously: broadcasting +-------------------------------------------------------- + +Whenever we invoke a binary operator, call a function with two arguments +of ``ndarray`` type, or assign something to an ``ndarray``, we have to +iterate over two views at the same time. The task is trivial, if the two +``ndarray``\ s in question have the same shape (but not necessarily the +same set of strides), because in this case, we can still iterate in the +same loop. All that happens is that we move two data pointers in sync. + +The problem becomes a bit more involving, when the shapes of the two +``ndarray``\ s are not identical. For such cases, ``numpy`` defines +so-called broadcasting, which boils down to two rules. + +1. The shapes in the tensor with lower rank has to be prepended with + axes of size 1 till the two ranks become equal. +2. Along all axes the two tensors should have the same size, or one of + the sizes must be 1. + +If, after applying the first rule the second is not satisfied, the two +``ndarray``\ s cannot be broadcast together. + +Now, let us suppose that we have two compatible ``ndarray``\ s, i.e., +after applying the first rule, the second is satisfied. How do we +iterate over the elements in the tensors? + +We should recall, what exactly we do, when iterating over a single +array: normally, we move the data pointer by the last stride, except, +when we arrive at a dimension boundary (when the last axis is +exhausted). At that point, we move the pointer by an amount dictated by +the strides. And this is the key: *dictated by the strides*. Now, if we +have two arrays that are originally not compatible, we define new +strides for them, and use these in the iteration. With that, we are back +to the case, where we had two compatible arrays. + +Now, let us look at the second broadcasting rule: if the two arrays have +the same size, we take both ``ndarray``\ s’ strides along that axis. If, +on the other hand, one of the ``ndarray``\ s is of length 1 along one of +its axes, we set the corresponding strides to 0. This will ensure that +that data pointer is not moved, when we iterate over both ``ndarray``\ s +at the same time. + +Thus, in order to implement broadcasting, we first have to check, +whether the two above-mentioned rules can be satisfied, and if so, we +have to find the two new sets strides. + +The ``ndarray_can_broadcast`` function from +`ndarray.c <https://github.com/v923z/micropython-ulab/blob/master/code/ndarray.c>`__ +takes two ``ndarray``\ s, and returns ``true``, if the two arrays can be +broadcast together. At the same time, it also calculates new strides for +the two arrays, so that they can be iterated over at the same time. + +.. code:: c + + bool ndarray_can_broadcast(ndarray_obj_t *lhs, ndarray_obj_t *rhs, uint8_t *ndim, size_t *shape, int32_t *lstrides, int32_t *rstrides) { + // returns True or False, depending on, whether the two arrays can be broadcast together + // numpy's broadcasting rules are as follows: + // + // 1. the two shapes are either equal + // 2. one of the shapes is 1 + memset(lstrides, 0, sizeof(size_t)*ULAB_MAX_DIMS); + memset(rstrides, 0, sizeof(size_t)*ULAB_MAX_DIMS); + lstrides[ULAB_MAX_DIMS - 1] = lhs->strides[ULAB_MAX_DIMS - 1]; + rstrides[ULAB_MAX_DIMS - 1] = rhs->strides[ULAB_MAX_DIMS - 1]; + for(uint8_t i=ULAB_MAX_DIMS; i > 0; i--) { + if((lhs->shape[i-1] == rhs->shape[i-1]) || (lhs->shape[i-1] == 0) || (lhs->shape[i-1] == 1) || + (rhs->shape[i-1] == 0) || (rhs->shape[i-1] == 1)) { + shape[i-1] = MAX(lhs->shape[i-1], rhs->shape[i-1]); + if(shape[i-1] > 0) (*ndim)++; + if(lhs->shape[i-1] < 2) { + lstrides[i-1] = 0; + } else { + lstrides[i-1] = lhs->strides[i-1]; + } + if(rhs->shape[i-1] < 2) { + rstrides[i-1] = 0; + } else { + rstrides[i-1] = rhs->strides[i-1]; + } + } else { + return false; + } + } + return true; + } + +A good example of how the function would be called can be found in +`vector.c <https://github.com/v923z/micropython-ulab/blob/master/code/numpy/vector/vector.c>`__, +in the ``vector_arctan2`` function: + +.. code:: c + + mp_obj_t vectorise_arctan2(mp_obj_t y, mp_obj_t x) { + ... + uint8_t ndim = 0; + size_t *shape = m_new(size_t, ULAB_MAX_DIMS); + int32_t *xstrides = m_new(int32_t, ULAB_MAX_DIMS); + int32_t *ystrides = m_new(int32_t, ULAB_MAX_DIMS); + if(!ndarray_can_broadcast(ndarray_x, ndarray_y, &ndim, shape, xstrides, ystrides)) { + mp_raise_ValueError(translate("operands could not be broadcast together")); + m_del(size_t, shape, ULAB_MAX_DIMS); + m_del(int32_t, xstrides, ULAB_MAX_DIMS); + m_del(int32_t, ystrides, ULAB_MAX_DIMS); + } + + uint8_t *xarray = (uint8_t *)ndarray_x->array; + uint8_t *yarray = (uint8_t *)ndarray_y->array; + + ndarray_obj_t *results = ndarray_new_dense_ndarray(ndim, shape, NDARRAY_FLOAT); + mp_float_t *rarray = (mp_float_t *)results->array; + ... + +After the new strides have been calculated, the iteration loop is +identical to what we discussed in the previous section. + +Contracting an ``ndarray`` +-------------------------- + +There are many operations that reduce the number of dimensions of an +``ndarray`` by 1, i.e., that remove an axis from the tensor. The drill +is the same as before, with the exception that first we have to remove +the ``strides`` and ``shape`` that corresponds to the axis along which +we intend to contract. The ``numerical_reduce_axes`` function from +`numerical.c <https://github.com/v923z/micropython-ulab/blob/master/code/numerical/numerical.c>`__ +does that. + +.. code:: c + + static void numerical_reduce_axes(ndarray_obj_t *ndarray, int8_t axis, size_t *shape, int32_t *strides) { + // removes the values corresponding to a single axis from the shape and strides array + uint8_t index = ULAB_MAX_DIMS - ndarray->ndim + axis; + if((ndarray->ndim == 1) && (axis == 0)) { + index = 0; + shape[ULAB_MAX_DIMS - 1] = 0; + return; + } + for(uint8_t i = ULAB_MAX_DIMS - 1; i > 0; i--) { + if(i > index) { + shape[i] = ndarray->shape[i]; + strides[i] = ndarray->strides[i]; + } else { + shape[i] = ndarray->shape[i-1]; + strides[i] = ndarray->strides[i-1]; + } + } + } + +Once the reduced ``strides`` and ``shape`` are known, we place the axis +in question in the innermost loop, and wrap it with the loops, whose +coordinates are in the ``strides``, and ``shape`` arrays. The +``RUN_STD`` macro from +`numerical.h <https://github.com/v923z/micropython-ulab/blob/master/code/numpy/numerical/numerical.h>`__ +is a good example. The macro is expanded in the +``numerical_sum_mean_std_ndarray`` function. + +.. code:: c + + static mp_obj_t numerical_sum_mean_std_ndarray(ndarray_obj_t *ndarray, mp_obj_t axis, uint8_t optype, size_t ddof) { + uint8_t *array = (uint8_t *)ndarray->array; + size_t *shape = m_new(size_t, ULAB_MAX_DIMS); + memset(shape, 0, sizeof(size_t)*ULAB_MAX_DIMS); + int32_t *strides = m_new(int32_t, ULAB_MAX_DIMS); + memset(strides, 0, sizeof(uint32_t)*ULAB_MAX_DIMS); + + int8_t ax = mp_obj_get_int(axis); + if(ax < 0) ax += ndarray->ndim; + if((ax < 0) || (ax > ndarray->ndim - 1)) { + mp_raise_ValueError(translate("index out of range")); + } + numerical_reduce_axes(ndarray, ax, shape, strides); + uint8_t index = ULAB_MAX_DIMS - ndarray->ndim + ax; + ndarray_obj_t *results = NULL; + uint8_t *rarray = NULL; + ... + +Here is the macro for the three-dimensional case: + +.. code:: c + + #define RUN_STD(ndarray, type, array, results, r, shape, strides, index, div) do { + size_t k = 0; + do { + size_t l = 0; + do { + RUN_STD1((ndarray), type, (array), (results), (r), (index), (div)); + (array) -= (ndarray)->strides[(index)] * (ndarray)->shape[(index)]; + (array) += (strides)[ULAB_MAX_DIMS - 1]; + l++; + } while(l < (shape)[ULAB_MAX_DIMS - 1]); + (array) -= (strides)[ULAB_MAX_DIMS - 2] * (shape)[ULAB_MAX_DIMS-2]; + (array) += (strides)[ULAB_MAX_DIMS - 3]; + k++; + } while(k < (shape)[ULAB_MAX_DIMS - 2]); + } while(0) + +In ``RUN_STD``, we simply move our pointers; the calculation itself +happens in the ``RUN_STD1`` macro below. (Note that this is the +implementation of the numerically stable Welford algorithm.) + +.. code:: c + + #define RUN_STD1(ndarray, type, array, results, r, index, div) + ({ + mp_float_t M, m, S = 0.0, s = 0.0; + M = m = *(mp_float_t *)((type *)(array)); + for(size_t i=1; i < (ndarray)->shape[(index)]; i++) { + (array) += (ndarray)->strides[(index)]; + mp_float_t value = *(mp_float_t *)((type *)(array)); + m = M + (value - M) / (mp_float_t)i; + s = S + (value - M) * (value - m); + M = m; + S = s; + } + (array) += (ndarray)->strides[(index)]; + *(r)++ = MICROPY_FLOAT_C_FUN(sqrt)((ndarray)->shape[(index)] * s / (div)); + }) + +Upcasting +--------- + +When in an operation the ``dtype``\ s of two arrays are different, the +result’s ``dtype`` will be decided by the following upcasting rules: + +1. Operations with two ``ndarray``\ s of the same ``dtype`` preserve + their ``dtype``, even when the results overflow. + +2. if either of the operands is a float, the result automatically + becomes a float + +3. otherwise + + - ``uint8`` + ``int8`` => ``int16``, + + - ``uint8`` + ``int16`` => ``int16`` + + - ``uint8`` + ``uint16`` => ``uint16`` + + - ``int8`` + ``int16`` => ``int16`` + + - ``int8`` + ``uint16`` => ``uint16`` (in numpy, the result is a + ``int32``) + + - ``uint16`` + ``int16`` => ``float`` (in numpy, the result is a + ``int32``) + +4. When one operand of a binary operation is a generic scalar + ``micropython`` variable, i.e., ``mp_obj_int``, or ``mp_obj_float``, + it will be converted to a linear array of length 1, and with the + smallest ``dtype`` that can accommodate the variable in question. + After that the broadcasting rules apply, as described in the section + `Iterating over two ndarrays simultaneously: + broadcasting <#Iterating_over_two_ndarrays_simultaneously:_broadcasting>`__ + +Upcasting is resolved in place, wherever it is required. Notable +examples can be found in +`ndarray_operators.c <https://github.com/v923z/micropython-ulab/blob/master/code/ndarray_operators.c>`__ + +Slicing and indexing +-------------------- + +An ``ndarray`` can be indexed with three types of objects: integer +scalars, slices, and another ``ndarray``, whose elements are either +integer scalars, or Booleans. Since slice and integer indices can be +thought of as modifications of the ``strides``, these indices return a +view of the ``ndarray``. This statement does not hold for ``ndarray`` +indices, and therefore, the return a copy of the array. + +Extending ulab +-------------- + +The ``user`` module is disabled by default, as can be seen from the last +couple of lines of +`ulab.h <https://github.com/v923z/micropython-ulab/blob/master/code/ulab.h>`__ + +.. code:: c + + // user-defined module + #ifndef ULAB_USER_MODULE + #define ULAB_USER_MODULE (0) + #endif + +The module contains a very simple function, ``user_dummy``, and this +function is bound to the module itself. In other words, even if the +module is enabled, one has to ``import``: + +.. code:: python + + + import ulab + from ulab import user + + user.dummy_function(2.5) + +which should just return 5.0. Even if ``numpy``-compatibility is +required (i.e., if most functions are bound at the top level to ``ulab`` +directly), having to ``import`` the module has a great advantage. +Namely, only the +`user.h <https://github.com/v923z/micropython-ulab/blob/master/code/user/user.h>`__ +and +`user.c <https://github.com/v923z/micropython-ulab/blob/master/code/user/user.c>`__ +files have to be modified, thus it should be relatively straightforward +to update your local copy from +`github <https://github.com/v923z/micropython-ulab/blob/master/>`__. + +Now, let us see, how we can add a more meaningful function. + +Creating a new ndarray +---------------------- + +In the `General comments <#General_comments>`__ sections we have seen +the type definition of an ``ndarray``. This structure can be generated +by means of a couple of functions listed in +`ndarray.c <https://github.com/v923z/micropython-ulab/blob/master/code/ndarray.c>`__. + +ndarray_new_ndarray +~~~~~~~~~~~~~~~~~~~ + +The ``ndarray_new_ndarray`` functions is called by all other +array-generating functions. It takes the number of dimensions, ``ndim``, +a ``uint8_t``, the ``shape``, a pointer to ``size_t``, the ``strides``, +a pointer to ``int32_t``, and ``dtype``, another ``uint8_t`` as its +arguments, and returns a new array with all entries initialised to 0. + +Assuming that ``ULAB_MAX_DIMS > 2``, a new dense array of dimension 3, +of ``shape`` (3, 4, 5), of ``strides`` (1000, 200, 10), and ``dtype`` +``uint16_t`` can be generated by the following instructions + +.. code:: c + + size_t *shape = m_new(size_t, ULAB_MAX_DIMS); + shape[ULAB_MAX_DIMS - 1] = 5; + shape[ULAB_MAX_DIMS - 2] = 4; + shape[ULAB_MAX_DIMS - 3] = 3; + + int32_t *strides = m_new(int32_t, ULAB_MAX_DIMS); + strides[ULAB_MAX_DIMS - 1] = 10; + strides[ULAB_MAX_DIMS - 2] = 200; + strides[ULAB_MAX_DIMS - 3] = 1000; + + ndarray_obj_t *new_ndarray = ndarray_new_ndarray(3, shape, strides, NDARRAY_UINT16); + +ndarray_new_dense_ndarray +~~~~~~~~~~~~~~~~~~~~~~~~~ + +The functions simply calculates the ``strides`` from the ``shape``, and +calls ``ndarray_new_ndarray``. Assuming that ``ULAB_MAX_DIMS > 2``, a +new dense array of dimension 3, of ``shape`` (3, 4, 5), and ``dtype`` +``mp_float_t`` can be generated by the following instructions + +.. code:: c + + size_t *shape = m_new(size_t, ULAB_MAX_DIMS); + shape[ULAB_MAX_DIMS - 1] = 5; + shape[ULAB_MAX_DIMS - 2] = 4; + shape[ULAB_MAX_DIMS - 3] = 3; + + ndarray_obj_t *new_ndarray = ndarray_new_dense_ndarray(3, shape, NDARRAY_FLOAT); + +ndarray_new_linear_array +~~~~~~~~~~~~~~~~~~~~~~~~ + +Since the dimensions of a linear array are known (1), the +``ndarray_new_linear_array`` takes the ``length``, a ``size_t``, and the +``dtype``, an ``uint8_t``. Internally, ``ndarray_new_linear_array`` +generates the ``shape`` array, and calls ``ndarray_new_dense_array`` +with ``ndim = 1``. + +A linear array of length 100, and ``dtype`` ``uint8`` could be created +by the function call + +.. code:: c + + ndarray_obj_t *new_ndarray = ndarray_new_linear_array(100, NDARRAY_UINT8) + +ndarray_new_ndarray_from_tuple +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This function takes a ``tuple``, which should hold the lengths of the +axes (in other words, the ``shape``), and the ``dtype``, and calls +internally ``ndarray_new_dense_array``. A new ``ndarray`` can be +generated by calling + +.. code:: c + + ndarray_obj_t *new_ndarray = ndarray_new_ndarray_from_tuple(shape, NDARRAY_FLOAT); + +where ``shape`` is a tuple. + +ndarray_new_view +~~~~~~~~~~~~~~~~ + +This function crates a *view*, and takes the source, an ``ndarray``, the +number of dimensions, an ``uint8_t``, the ``shape``, a pointer to +``size_t``, the ``strides``, a pointer to ``int32_t``, and the offset, +an ``int32_t`` as arguments. The offset is the number of bytes by which +the void ``array`` pointer is shifted. E.g., the ``python`` statement + +.. code:: python + + a = np.array([0, 1, 2, 3, 4, 5], dtype=uint8) + b = a[1::2] + +produces the array + +.. code:: python + + array([1, 3, 5], dtype=uint8) + +which holds its data at position ``x0 + 1``, if ``a``\ ’s pointer is at +``x0``. In this particular case, the offset is 1. + +The array ``b`` from the example above could be generated as + +.. code:: c + + size_t *shape = m_new(size_t, ULAB_MAX_DIMS); + shape[ULAB_MAX_DIMS - 1] = 3; + + int32_t *strides = m_new(int32_t, ULAB_MAX_DIMS); + strides[ULAB_MAX_DIMS - 1] = 2; + + int32_t offset = 1; + uint8_t ndim = 1; + + ndarray_obj_t *new_ndarray = ndarray_new_view(ndarray_a, ndim, shape, strides, offset); + +ndarray_copy_array +~~~~~~~~~~~~~~~~~~ + +The ``ndarray_copy_array`` function can be used for copying the contents +of an array. Note that the target array has to be created beforehand. +E.g., a one-to-one copy can be gotten by + +.. code:: c + + ndarray_obj_t *new_ndarray = ndarray_new_ndarray(source->ndim, source->shape, source->strides, source->dtype); + ndarray_copy_array(source, new_ndarray); + +Note that the function cannot be used for forcing type conversion, i.e., +the input and output types must be identical, because the function +simply calls the ``memcpy`` function. On the other hand, the input and +output ``strides`` do not necessarily have to be equal. + +ndarray_copy_view +~~~~~~~~~~~~~~~~~ + +The ``ndarray_obj_t *new_ndarray = ...`` instruction can be saved by +calling the ``ndarray_copy_view`` function with the single ``source`` +argument. + +Accessing data in the ndarray +----------------------------- + +Having seen, how arrays can be generated and copied, it is time to look +at how the data in an ``ndarray`` can be accessed and modified. + +For starters, let us suppose that the object in question comes from the +user (i.e., via the ``micropython`` interface), First, we have to +acquire a pointer to the ``ndarray`` by calling + +.. code:: c + + ndarray_obj_t *ndarray = MP_OBJ_TO_PTR(object_in); + +If it is not clear, whether the object is an ``ndarray`` (e.g., if we +want to write a function that can take ``ndarray``\ s, and other +iterables as its argument), we find this out by evaluating + +.. code:: c + + mp_obj_is_type(object_in, &ulab_ndarray_type) + +which should return ``true``. Once the pointer is at our disposal, we +can get a pointer to the underlying numerical array as discussed +earlier, i.e., + +.. code:: c + + uint8_t *array = (uint8_t *)ndarray->array; + +If you need to find out the ``dtype`` of the array, you can get it by +accessing the ``dtype`` member of the ``ndarray``, i.e., + +.. code:: c + + ndarray->dtype + +should be equal to ``B``, ``b``, ``H``, ``h``, or ``f``. The size of a +single item is stored in the ``itemsize`` member. This number should be +equal to 1, if the ``dtype`` is ``B``, or ``b``, 2, if the ``dtype`` is +``H``, or ``h``, 4, if the ``dtype`` is ``f``, and 8 for ``d``. + +Boilerplate +----------- + +In the next section, we will construct a function that generates the +element-wise square of a dense array, otherwise, raises a ``TypeError`` +exception. Dense arrays can easily be iterated over, since we do not +have to care about the ``shape`` and the ``strides``. If the array is +sparse, the section `Iterating over elements of a +tensor <#Iterating-over-elements-of-a-tensor>`__ should contain hints as +to how the iteration can be implemented. + +The function is listed under +`user.c <https://github.com/v923z/micropython-ulab/tree/master/code/user/>`__. +The ``user`` module is bound to ``ulab`` in +`ulab.c <https://github.com/v923z/micropython-ulab/tree/master/code/ulab.c>`__ +in the lines + +.. code:: c + + #if ULAB_USER_MODULE + { MP_ROM_QSTR(MP_QSTR_user), MP_ROM_PTR(&ulab_user_module) }, + #endif + +which assumes that at the very end of +`ulab.h <https://github.com/v923z/micropython-ulab/tree/master/code/ulab.h>`__ +the + +.. code:: c + + // user-defined module + #ifndef ULAB_USER_MODULE + #define ULAB_USER_MODULE (1) + #endif + +constant has been set to 1. After compilation, you can call a particular +``user`` function in ``python`` by importing the module first, i.e., + +.. code:: python + + from ulab import numpy as np + from ulab import user + + user.some_function(...) + +This separation of user-defined functions from the rest of the code +ensures that the integrity of the main module and all its functions are +always preserved. Even in case of a catastrophic failure, you can +exclude the ``user`` module, and start over. + +And now the function: + +.. code:: c + + static mp_obj_t user_square(mp_obj_t arg) { + // the function takes a single dense ndarray, and calculates the + // element-wise square of its entries + + // raise a TypeError exception, if the input is not an ndarray + if(!mp_obj_is_type(arg, &ulab_ndarray_type)) { + mp_raise_TypeError(translate("input must be an ndarray")); + } + ndarray_obj_t *ndarray = MP_OBJ_TO_PTR(arg); + + // make sure that the input is a dense array + if(!ndarray_is_dense(ndarray)) { + mp_raise_TypeError(translate("input must be a dense ndarray")); + } + + // if the input is a dense array, create `results` with the same number of + // dimensions, shape, and dtype + ndarray_obj_t *results = ndarray_new_dense_ndarray(ndarray->ndim, ndarray->shape, ndarray->dtype); + + // since in a dense array the iteration over the elements is trivial, we + // can cast the data arrays ndarray->array and results->array to the actual type + if(ndarray->dtype == NDARRAY_UINT8) { + uint8_t *array = (uint8_t *)ndarray->array; + uint8_t *rarray = (uint8_t *)results->array; + for(size_t i=0; i < ndarray->len; i++, array++) { + *rarray++ = (*array) * (*array); + } + } else if(ndarray->dtype == NDARRAY_INT8) { + int8_t *array = (int8_t *)ndarray->array; + int8_t *rarray = (int8_t *)results->array; + for(size_t i=0; i < ndarray->len; i++, array++) { + *rarray++ = (*array) * (*array); + } + } else if(ndarray->dtype == NDARRAY_UINT16) { + uint16_t *array = (uint16_t *)ndarray->array; + uint16_t *rarray = (uint16_t *)results->array; + for(size_t i=0; i < ndarray->len; i++, array++) { + *rarray++ = (*array) * (*array); + } + } else if(ndarray->dtype == NDARRAY_INT16) { + int16_t *array = (int16_t *)ndarray->array; + int16_t *rarray = (int16_t *)results->array; + for(size_t i=0; i < ndarray->len; i++, array++) { + *rarray++ = (*array) * (*array); + } + } else { // if we end up here, the dtype is NDARRAY_FLOAT + mp_float_t *array = (mp_float_t *)ndarray->array; + mp_float_t *rarray = (mp_float_t *)results->array; + for(size_t i=0; i < ndarray->len; i++, array++) { + *rarray++ = (*array) * (*array); + } + } + // at the end, return a micropython object + return MP_OBJ_FROM_PTR(results); + } + +To summarise, the steps for *implementing* a function are + +1. If necessary, inspect the type of the input object, which is always a + ``mp_obj_t`` object +2. If the input is an ``ndarray_obj_t``, acquire a pointer to it by + calling ``ndarray_obj_t *ndarray = MP_OBJ_TO_PTR(arg);`` +3. Create a new array, or modify the existing one; get a pointer to the + data by calling ``uint8_t *array = (uint8_t *)ndarray->array;``, or + something equivalent +4. Once the new data have been calculated, return a ``micropython`` + object by calling ``MP_OBJ_FROM_PTR(...)``. + +The listing above contains the implementation of the function, but as +such, it cannot be called from ``python``: it still has to be bound to +the name space. This we do by first defining a function object in + +.. code:: c + + MP_DEFINE_CONST_FUN_OBJ_1(user_square_obj, user_square); + +``micropython`` defines a number of ``MP_DEFINE_CONST_FUN_OBJ_N`` macros +in +`obj.h <https://github.com/micropython/micropython/blob/master/py/obj.h>`__. +``N`` is always the number of arguments the function takes. We had a +function definition ``static mp_obj_t user_square(mp_obj_t arg)``, i.e., +we dealt with a single argument. + +Finally, we have to bind this function object in the globals table of +the ``user`` module: + +.. code:: c + + STATIC const mp_rom_map_elem_t ulab_user_globals_table[] = { + { MP_OBJ_NEW_QSTR(MP_QSTR___name__), MP_OBJ_NEW_QSTR(MP_QSTR_user) }, + { MP_OBJ_NEW_QSTR(MP_QSTR_square), (mp_obj_t)&user_square_obj }, + }; + +Thus, the three steps required for the definition of a user-defined +function are + +1. The low-level implementation of the function itself +2. The definition of a function object by calling + MP_DEFINE_CONST_FUN_OBJ_N() +3. Binding this function object to the namespace in the + ``ulab_user_globals_table[]`` |
