Version 7.1.0

Sept 9, 2020

Release Notes

BitMagic v.7.1.0 focuses on improving support for ARM CPU.

  1. Fixed bug (rooted in 6.x.x versions) of AND operation. This is rare, but potentially serious issue, please upgrade your version asap.
  2. ARM CPU is getting much attention nowdays, BitMagic library succinct containers are very useful in situations of limited memory and edge computing. This release improves suppirt there.

    BitMagic v.7.1.0 was formally tested and verified to work on 32-bit ARM (Raspberry Pi). Formal ARM testing showed certain pain points of BitMagic (bitscan algorithms reliance on POPCNT). As part of 7.1.0 release we introduced a different approach to bitscan based on CLZ (count trailing/leading zeroes) which is available natively on ARM. This resulted in significant improvement in performance of bvector<>::enumerator on ARM.

    ARM testing also showed ways to improve Rank-Select algorithms and succinct containers based on Rank. Some optimizations are still in progress, subject of follow up releases.

    ARM NEON SIMD support is planned for the future.

  3. Technical notes

    ARM testing identified bitscan algorithm as a common hotspot on ARM (Raspberry Pi 8GB, overclocked to 2GHz). x86 used either LUT based bit-counting or hardware accelerated POPCNT instruction for bitscan decode. BitScan is used to decode a bit-vector into indexes of set bits (inverted list). It is also an important operation for decode of bit-transposed succinct structures.

    There are multiple variants of BitScan of various levels of (in-)efficiency, here are two common and practically useful variants.

    BitScan based on POPCNT (best for x86)

    
    template <typename B >
    unsigned short
    bitscan_popcnt(bm::id_t w, B* bits, unsigned short offs) 
    {
        unsigned pos = 0;
        while (w)
        {
            bm::id_t t = w & -w;
            bits[pos++] = (B)(bm::word_bitcount(t - 1) + offs);
            w &= w - 1;
        }
        return (unsigned short)pos;
    }
    
    

    BitScan based on Trailing Zeroes (LZCNT) (best for ARM)

                                                          
    
    template < typename B >
    unsigned short bitscan_bsf(unsigned w, B* bits) 
    {
        unsigned short pos = 0;
        while (w)
        {
            bits[pos++] = count_trailing_zeros_u32(w);
            w &= w - 1;
        }
        return pos;
    }
    
    
    

    Practical difference in the context of a C++ program implementing STL-like iterator or bit-visiting algorithms can be up to 30%.

    Download v7.1.0 (Sourceforge) GitHub

Follow us on Twitter