Vectorization helpers.

#include "CxxUtils/features.h"
#include "CxxUtils/inline_hints.h"
#include <cstdlib>
#include <cstring>
#include <type_traits>
#include "CxxUtils/vec_fb.h"

Classes
struct	CxxUtils::vecDetail::vec_typedef< T, N >
	Check the type and the size of the vector.

struct	CxxUtils::vecDetail::vec_type< VEC >
	Deduce the element type from a vectorized type.

struct	CxxUtils::vecDetail::vec_mask_type< VEC >
	Deduce the type of the mask returned by relational operations, for a vectorized type.

struct	CxxUtils::vecDetail::bool_pack_helper::bool_pack<... >

Namespaces
	CxxUtils

	CxxUtils::vecDetail

	CxxUtils::vecDetail::bool_pack_helper
	Helper for static asserts for argument packs.

Macros
#define	WANT_VECTOR_FALLBACK 0

Typedefs
template<bool... bs>
using	CxxUtils::vecDetail::bool_pack_helper::all_true = std::is_same< bool_pack< bs..., true >, bool_pack< true, bs... > >

template<typename T , size_t N>
using	CxxUtils::vec = typename vecDetail::vec_typedef< T, N >::type
	Define a nice alias for the vectorized type.

template<class VEC >
using	CxxUtils::vec_type_t = typename vecDetail::vec_type< VEC >::type
	Define a nice alias for the element type of a vectorized type.

template<class VEC >
using	CxxUtils::vec_mask_type_t = typename vecDetail::vec_mask_type< VEC >::type
	Define a nice alias for the mask type for a vectorized type.

Functions
template<class VEC >
constexpr ATH_ALWAYS_INLINE size_t	CxxUtils::vec_size ()
	Return the number of elements in a vectorized type.

template<class VEC >
constexpr ATH_ALWAYS_INLINE size_t	CxxUtils::vec_size (const VEC &)
	Return the number of elements in a vectorized type.

template<typename VEC , typename T >
ATH_ALWAYS_INLINE void	CxxUtils::vbroadcast (VEC &v, T x)
	Copy a scalar to each element of a vectorized type.

template<typename VEC >
ATH_ALWAYS_INLINE void	CxxUtils::vload (VEC &dst, vec_type_t< VEC > const *src)

template<typename VEC >
ATH_ALWAYS_INLINE void	CxxUtils::vstore (vec_type_t< VEC > *dst, const VEC &src)

template<typename VEC >
ATH_ALWAYS_INLINE void	CxxUtils::vselect (VEC &dst, const VEC &a, const VEC &b, const vec_mask_type_t< VEC > &mask)

template<typename VEC >
ATH_ALWAYS_INLINE void	CxxUtils::vmin (VEC &dst, const VEC &a, const VEC &b)

template<typename VEC >
ATH_ALWAYS_INLINE void	CxxUtils::vmax (VEC &dst, const VEC &a, const VEC &b)

template<typename VEC >
ATH_ALWAYS_INLINE bool	CxxUtils::vany (const VEC &mask)

template<typename VEC >
ATH_ALWAYS_INLINE bool	CxxUtils::vnone (const VEC &mask)

template<typename VEC >
ATH_ALWAYS_INLINE bool	CxxUtils::vall (const VEC &mask)

template<typename VEC1 , typename VEC2 >
ATH_ALWAYS_INLINE void	CxxUtils::vconvert (VEC1 &dst, const VEC2 &src)
	Fill dst with the result of a static cast of each element of src.

template<size_t... Indices, typename VEC , typename VEC1 >
ATH_ALWAYS_INLINE void	CxxUtils::vpermute (VEC1 &dst, const VEC &src)
	Fill dst with a permutation of the elements of src, selected by Indices.

template<size_t... Indices, typename VEC , typename VEC1 >
ATH_ALWAYS_INLINE void	CxxUtils::vpermute2 (VEC1 &dst, const VEC &src1, const VEC &src2)
	Fill dst with a permutation of the elements of src1 and src2, selected by Indices.

Detailed Description

Vectorization helpers.

Author: Scott Snyder <snyder@bnl.gov>; Christos Anastopoulos (helper methods)

Date: Mar, 2020

GCC and clang provide built-in types for writing vectorized code, using the vector_size attribute. This usually results in code that is much easier to read and more portable than one would get by using intrinsics directly. However, these types are still non-standard, and some operations are awkward to express with them.

This file provides some helpers for writing vectorized code in C++.

A vectorized type may be named CxxUtils::vec<T, N>. Here T is the element type, which should be an elementary integer or floating-point type, and N is the number of elements in the vector, which should be a power of 2. The resulting type is either a built-in vector type, if the vector_size attribute is supported, or a fallback C++ class intended to be (mostly) functionally equivalent (see vec_fb.h).
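As a rough illustration of how such an alias can sit on top of the vector_size attribute, here is a minimal sketch assuming GCC or clang. The namespace sketch and its vec_typedef/vec members, as well as float4 and axpy, are hypothetical names written for this example and are not the CxxUtils definitions. Note that vector_size takes a size in bytes, so the element count has to be scaled by sizeof(T).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for CxxUtils::vec<T, N>; assumes GCC or clang.
namespace sketch {

template <typename T, std::size_t N>
struct vec_typedef {
  static_assert(N > 0 && (N & (N - 1)) == 0, "N should be a power of 2");
  // vector_size is specified in bytes, so scale the element count.
  typedef T type __attribute__((vector_size(N * sizeof(T))));
};

template <typename T, std::size_t N>
using vec = typename vec_typedef<T, N>::type;

}  // namespace sketch

// Four floats occupy 128 bits, i.e. one SSE register.
using float4 = sketch::vec<float, 4>;

// a * x + y, computed lane by lane; the scalar a is broadcast to all lanes.
inline float4 axpy(float a, float4 x, float4 y) {
  return a * x + y;
}
```

The arithmetic reads like scalar code; the compiler maps it onto vector instructions for the target.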

The GCC, clang and fallback vector types support: ++, --, +, -, *, /, %, =, &, |, ^, ~, >>, <<, !, &&, ||, ==, !=, >, <, >=, <=, sizeof, and initialization from brace-enclosed lists.

Furthermore, the GCC and clang vector types support the ternary operator.
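A minimal sketch of relational operators and the ternary operator working together, assuming GCC or clang built-in vectors (the fallback class from vec_fb.h is not exercised here). The typedef float4 and the function relu are hypothetical names for this example. A comparison yields a signed integer mask vector, which then selects lanes without branching.

```cpp
#include <cassert>

// Assumes GCC or clang; the fallback class from vec_fb.h is not used here.
typedef float float4 __attribute__((vector_size(16)));

// Replace negative lanes with zero, branch-free.
inline float4 relu(float4 x) {
  float4 zero = {0.0f, 0.0f, 0.0f, 0.0f};
  // Relational operators yield a signed integer vector:
  // -1 (all bits set) in lanes where the comparison holds, 0 elsewhere.
  auto mask = x > zero;
  // Lane-wise select: result[i] = mask[i] ? x[i] : zero[i].
  return mask ? x : zero;
}
```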

We also support some additional operations.

Deducing useful types:

CxxUtils::vec_type_t<VEC> is the element type of VEC.
CxxUtils::vec_mask_type_t<VEC> is the vector type returned by relational operations.
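One plausible way to deduce both types with decltype is sketched below, assuming GCC or clang built-in vectors. The aliases elem_t and mask_t are hypothetical stand-ins for vec_type_t and vec_mask_type_t, not their actual definitions: the element type is what subscripting yields, and the mask type is what a comparison yields.

```cpp
#include <type_traits>
#include <utility>

typedef float float4 __attribute__((vector_size(16)));

// Hypothetical stand-in for vec_type_t: the type produced by subscripting
// an lvalue of VEC, with reference and cv-qualifiers stripped.
template <class VEC>
using elem_t = typename std::remove_cv<
    typename std::remove_reference<decltype(std::declval<VEC&>()[0])>::type>::type;

// Hypothetical stand-in for vec_mask_type_t: the type produced by
// applying a relational operator to two VECs.
template <class VEC>
using mask_t = decltype(std::declval<VEC>() < std::declval<VEC>());
```

For a float vector the mask has the same total width and lane count, with signed integer lanes of the same size as float.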

Deducing the number of elements in a vectorized type:

CxxUtils::vec_size<VEC>() is the number of elements in VEC.
CxxUtils::vec_size(const VEC&) is the number of elements in VEC.
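Because every lane has the same type, the element count follows from the total size divided by the element size. The sketch below assumes GCC or clang. vec_size_sketch and elem_t are hypothetical names, and the second overload exists only so that VEC can be deduced from an object.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

typedef float float4 __attribute__((vector_size(16)));
typedef double double4 __attribute__((vector_size(32)));

// Hypothetical element-type helper (a stand-in for vec_type_t).
template <class VEC>
using elem_t = typename std::remove_reference<decltype(std::declval<VEC&>()[0])>::type;

// The lane count is the total size divided by the element size.
template <class VEC>
constexpr std::size_t vec_size_sketch() {
  return sizeof(VEC) / sizeof(elem_t<VEC>);
}

// Overload that deduces VEC from an object; the argument's value is not used.
template <class VEC>
constexpr std::size_t vec_size_sketch(const VEC&) {
  return vec_size_sketch<VEC>();
}
```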

Initializing with a value:

CxxUtils::vbroadcast (VEC& v, T x) initializes each element of v with x.
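A minimal sketch of a broadcast, assuming GCC or clang built-in vectors. vbroadcast_sketch is a hypothetical name, not the CxxUtils implementation. A plain per-lane loop is shown; with optimization enabled, compilers can typically reduce such a loop to a single broadcast or shuffle instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

typedef float float4 __attribute__((vector_size(16)));
typedef int int4 __attribute__((vector_size(16)));

// Hypothetical stand-in for CxxUtils::vbroadcast: assign x to every lane.
template <typename VEC, typename T>
inline void vbroadcast_sketch(VEC& v, T x) {
  using E = typename std::remove_reference<decltype(v[0])>::type;
  constexpr std::size_t N = sizeof(VEC) / sizeof(E);
  for (std::size_t i = 0; i < N; ++i) {
    v[i] = static_cast<E>(x);  // convert to the element type for each lane
  }
}
```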

Load from/store to array:

CxxUtils::vload (VEC& dst, const vec_type_t<VEC>* src) loads elements from src to dst
CxxUtils::vstore (vec_type_t<VEC>* dst, const VEC& src) stores elements from src to dst

Basic Algorithms:
CxxUtils::vselect (VEC& dst, const VEC& a, const VEC& b, const vec_mask_type_t<VEC>& mask) copies elements from a or b to dst, depending on the value of mask: dst[i] = mask[i] ? a[i] : b[i]
CxxUtils::vmin (VEC& dst, const VEC& a, const VEC& b) sets dst[i] = min(a[i], b[i])
CxxUtils::vmax (VEC& dst, const VEC& a, const VEC& b) sets dst[i] = max(a[i], b[i])
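The load, store, select, min and max operations above can be sketched with memcpy and the vector ternary operator, assuming GCC or clang. All names ending in _sketch, plus min4, are hypothetical and are not the CxxUtils implementations. memcpy imposes no alignment requirement on the array and is typically compiled into a single unaligned vector load or store.

```cpp
#include <cassert>
#include <cstring>

typedef float float4 __attribute__((vector_size(16)));

// Hypothetical stand-ins for vload/vstore.
inline void vload_sketch(float4& dst, const float* src) {
  std::memcpy(&dst, src, sizeof(dst));
}
inline void vstore_sketch(float* dst, const float4& src) {
  std::memcpy(dst, &src, sizeof(src));
}

// Hypothetical stand-ins for vselect/vmin/vmax built on the vector ternary.
template <typename VEC, typename MASK>
inline void vselect_sketch(VEC& dst, const VEC& a, const VEC& b, const MASK& mask) {
  dst = mask ? a : b;  // dst[i] = mask[i] ? a[i] : b[i]
}
template <typename VEC>
inline void vmin_sketch(VEC& dst, const VEC& a, const VEC& b) {
  dst = a < b ? a : b;  // dst[i] = min(a[i], b[i])
}
template <typename VEC>
inline void vmax_sketch(VEC& dst, const VEC& a, const VEC& b) {
  dst = a > b ? a : b;  // dst[i] = max(a[i], b[i])
}

// Usage: element-wise minimum of two 4-element float arrays.
inline void min4(const float* x, const float* y, float* out) {
  float4 a, b, m;
  vload_sketch(a, x);
  vload_sketch(b, y);
  vmin_sketch(m, a, b);
  vstore_sketch(out, m);
}
```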

Bool reductions:

CxxUtils::vany(const VEC& mask) returns true if at least one value in mask is true.
CxxUtils::vnone(const VEC& mask) returns true if all values in mask are false.
CxxUtils::vall(const VEC& mask) returns true if all values in mask are true.
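A minimal sketch of the three reductions, assuming GCC or clang built-in vectors, where a mask lane counts as true when it is nonzero. The helpers lanes, vany_sketch, vnone_sketch and vall_sketch are hypothetical names; the real header may reduce masks differently.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

typedef float float4 __attribute__((vector_size(16)));

// Number of lanes in a (mask) vector type.
template <typename MASK>
constexpr std::size_t lanes() {
  return sizeof(MASK) / sizeof(std::declval<MASK&>()[0]);
}

// Hypothetical stand-ins for vany/vnone/vall. Relational operators
// produce -1 for true and 0 for false, so "true" means nonzero.
template <typename MASK>
inline bool vany_sketch(const MASK& mask) {
  for (std::size_t i = 0; i < lanes<MASK>(); ++i) {
    if (mask[i] != 0) return true;
  }
  return false;
}
template <typename MASK>
inline bool vnone_sketch(const MASK& mask) {
  return !vany_sketch(mask);
}
template <typename MASK>
inline bool vall_sketch(const MASK& mask) {
  for (std::size_t i = 0; i < lanes<MASK>(); ++i) {
    if (mask[i] == 0) return false;
  }
  return true;
}
```

A typical use is an early exit: test whether any lane still needs work before running another vectorized iteration.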

Conversions/Casting:

CxxUtils::vconvert (VEC1& dst, const VEC2& src) Fills dst with the result of a static_cast of every element of src to the element type of dst. dst[i] = static_cast<vec_type_t<VEC1>>(src[i])
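A minimal per-lane sketch of such a conversion, assuming GCC or clang built-in vectors. vconvert_sketch is a hypothetical name, not the CxxUtils implementation. The two vectors must have the same number of lanes, but their total widths may differ (e.g. 4 floats to 4 doubles). As an alternative, clang and GCC 9 and later also provide the __builtin_convertvector built-in for this purpose.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

typedef float float4 __attribute__((vector_size(16)));
typedef int int4 __attribute__((vector_size(16)));
typedef double double4 __attribute__((vector_size(32)));

// Hypothetical stand-in for CxxUtils::vconvert:
// dst[i] = static_cast<element type of dst>(src[i]).
template <typename VEC1, typename VEC2>
inline void vconvert_sketch(VEC1& dst, const VEC2& src) {
  using E = typename std::remove_reference<decltype(dst[0])>::type;
  constexpr std::size_t N = sizeof(VEC1) / sizeof(E);
  static_assert(sizeof(VEC2) / sizeof(src[0]) == N, "lane counts must match");
  for (std::size_t i = 0; i < N; ++i) {
    dst[i] = static_cast<E>(src[i]);
  }
}
```

Converting float to int follows the usual static_cast rule and truncates toward zero.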

Permutations:

The destination is a vector with the same element type as the source vector(s), but with an element count equal to the number of indices specified.

CxxUtils::vpermute<mask> (VEC1& dst, const VEC& src) Fills dst with a permutation of src according to mask. mask is a list of integers that specifies the elements to be extracted from src: dst[i] = src[mask[i]], where mask[i] is the ith integer in the mask.
CxxUtils::vpermute2<mask> (VEC1& dst, const VEC& src1, const VEC& src2) Fills dst with a permutation of src1 and src2 according to mask. mask is a list of integers that specifies the elements to be extracted from src1 and src2; let N be the number of elements in each source vector. An index i in the interval [0,N) indicates that element number i from the first input vector should be placed in the corresponding position in the result vector. An index i in the interval [N,2N) indicates that element number i-N from the second input vector should be placed in the corresponding position in the result vector.
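The index-pack interface can be sketched with a braced initializer expanded over the indices, assuming GCC or clang. vpermute_sketch and vpermute2_sketch are hypothetical names, not the CxxUtils implementations. The range checks reuse the all_true/bool_pack trick listed under Typedefs. For vpermute2, i % N gives the source lane for both halves of [0,2N).

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

typedef float float2 __attribute__((vector_size(8)));
typedef float float4 __attribute__((vector_size(16)));

// The bool_pack trick: all_true<bs...> is std::true_type iff every b is true.
template <bool...> struct bool_pack {};
template <bool... bs>
using all_true = std::is_same<bool_pack<bs..., true>, bool_pack<true, bs...>>;

// Hypothetical stand-in for vpermute: dst[i] = src[Indices[i]].
// dst has one lane per index, so it may be narrower than src.
template <std::size_t... Indices, typename VEC, typename VEC1>
inline void vpermute_sketch(VEC1& dst, const VEC& src) {
  constexpr std::size_t N = sizeof(VEC) / sizeof(src[0]);
  static_assert(sizeof...(Indices) == sizeof(VEC1) / sizeof(dst[0]),
                "one index per destination lane");
  static_assert(all_true<(Indices < N)...>::value, "indices must be in [0,N)");
  dst = VEC1{src[Indices]...};
}

// Hypothetical stand-in for vpermute2: index i < N picks src1[i],
// index i in [N,2N) picks src2[i-N]; i % N is the source lane either way.
template <std::size_t... Indices, typename VEC, typename VEC1>
inline void vpermute2_sketch(VEC1& dst, const VEC& src1, const VEC& src2) {
  constexpr std::size_t N = sizeof(VEC) / sizeof(src1[0]);
  static_assert(sizeof...(Indices) == sizeof(VEC1) / sizeof(dst[0]),
                "one index per destination lane");
  static_assert(all_true<(Indices < 2 * N)...>::value, "indices must be in [0,2N)");
  dst = VEC1{(Indices < N ? src1[Indices % N] : src2[Indices % N])...};
}
```

For example, vpermute2_sketch<0, 4, 1, 5>(dst, a, b) interleaves the low halves of a and b.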

For good performance, use vector types that match the register width of the target ISA, e.g. 128 bits for SSE and 256 bits for AVX.

Specifying a combination that is not valid for the current architecture causes the compiler to synthesize the instructions using a narrower mode, but this might not always produce optimal code for all operations.
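As a rough illustration, assuming GCC or clang on x86, the vector width can be chosen from the compiler's predefined ISA macros (__AVX512F__ and __AVX__ are defined when the corresponding instruction set is enabled, e.g. with -mavx). The names kSimdBytes, kFloatLanes, vfloat and sum are hypothetical and are not part of CxxUtils. This sketch picks one width per translation unit; it does not replace function multiversioning.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Pick the vector width from the ISA this translation unit is compiled for.
#if defined(__AVX512F__)
constexpr std::size_t kSimdBytes = 64;  // 512-bit registers
#elif defined(__AVX__)
constexpr std::size_t kSimdBytes = 32;  // 256-bit registers
#else
constexpr std::size_t kSimdBytes = 16;  // 128-bit registers (e.g. SSE2)
#endif

constexpr std::size_t kFloatLanes = kSimdBytes / sizeof(float);

// A float vector that exactly fills one register of the selected width.
typedef float vfloat __attribute__((vector_size(kSimdBytes)));

// Sum an array with full-width vectors, then finish the tail with scalars.
inline float sum(const float* p, std::size_t n) {
  vfloat acc = {};  // all lanes zero
  std::size_t i = 0;
  for (; i + kFloatLanes <= n; i += kFloatLanes) {
    vfloat v;
    std::memcpy(&v, p + i, sizeof(v));  // unaligned load
    acc += v;
  }
  float total = 0.0f;
  for (std::size_t k = 0; k < kFloatLanes; ++k) total += acc[k];  // horizontal sum
  for (; i < n; ++i) total += p[i];  // scalar tail
  return total;
}
```

Because the lanes are summed in a different order than a scalar loop would use, floating-point results may differ slightly from a sequential sum for non-exact inputs.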

Consider using Function Multiversioning (CxxUtils/features.h) if you need to target multiple ISAs efficiently.

Definition in file vec.h.

Macro Definition Documentation

◆ WANT_VECTOR_FALLBACK

#define WANT_VECTOR_FALLBACK 0

Definition at line 144 of file vec.h.
