Huffman Coding library implemented in C

Question

I have written a small library for C that implements the Huffman coding algorithm as outlined in David Huffman’s paper on Minimum-Redundancy Codes, and a small test program to implement it. The library consists of two primary functions, huffman_encode() and huffman_decode(), and a series of helper functions. ~~The program also uses a linked list library which I have written and included below although I am not looking for a review of the linked list in this question.~~ Edit: The linked list library has been fully removed from the latest version of the code

Here is what I would like critiqued:

Current runtime memory usage is quite high (my test program allocates >60kb), how can I reduce on this? Are there redundancies I can get rid of or can I improve my implementation of the algorithm? I’m fairly sure I can ditch the linked list…
What are some ways I can reduce the header size to improve the compression ratios?
I feel like my code is over-engineered, how could I simplify things without losing performance?
How is my documentation/commenting (Particularly in huffman.c)? I know that the code is quite obfuscated in places but I was wondering if it made the library easy enough to understand
Are there any general performance chokes that can be removed?
Would there be any major issues porting my code to other systems? I have never used bit level operations before and I’ve heard that endianness can be a problem when using them
Are there any bugs you can find? I have tested it extensively but there’s always the chance that I’ve missed something. Are there any input strings that can break my code?

I compiled this code on the latest version of Debian Linux (4.9.88-1+deb9u1) using:

clang *.c -o huffman -std=c11 -Wall -Wextra -Wpedantic -O2

huffman.h

#ifndef HUFFMAN_H
#define HUFFMAN_H

/* Header files */

#include <stdbool.h>
#include <stdint.h>

/* Return values */

#define EXIT_SUCCESS 0
#define MEM_ERROR 1
#define INPUT_ERROR 2

/* Node identifiers, might change to enumeration */

#define INTERNAL_NODE 0
#define CHARACTER_NODE 1

/* Size of the header with no characters stored */

#define HEADER_BASE_SIZE 10

/* Huffman Tree node */

typedef struct huffman_node_t {
    size_t freq;
    struct huffman_node_t * child[2];
    char c;
} huffman_node_t;

/* User Functions */

int huffman_decode(uint8_t * input, char ** output);
int huffman_encode(char * input, uint8_t ** output);

/* Helper Functions */

/* Encoding */

int huff_tree_from_freq(size_t * freq, huffman_node_t ** head_node);
int node_compare(const void * first_node, const void * second_node);
int node_compare_priority(const void * first_node, const void * second_node);
void code_array_from_tree(huffman_node_t * node, uint8_t huffman_array[256][2], uint8_t bits_set);

/* Decoding */

char is_char_node(uint8_t byte, uint8_t bits_left, huffman_node_t * node);
int huff_tree_from_codes(huffman_node_t ** head_node, uint8_t huffman_array[256][2]);
int add_char_to_tree(huffman_node_t * node, char c, uint8_t byte, uint8_t bits_left, uint8_t curr_bit);

/* Universal */

huffman_node_t * create_char_node(char c, size_t freq);
huffman_node_t * create_internal_node(huffman_node_t * first_child, huffman_node_t * second_child);
void destroy_huff_tree(huffman_node_t * node);
void node_print(const void * element);

#endif

huffman.c

/* 
 *  Filename:   huffman.c
 *  Author:     Jess Turner
 *  Date:       17/07/18
 *  Licence:    GNU GPL V3
 *
 *  Encodes and decodes a byte stream using huffman coding
 *
 *  Return/exit codes:
 *      EXIT_SUCCESS    - No error
 *      MEM_ERROR       - Memory allocation error
 *      INPUT_ERROR     - No input given
 *
 *  User Functions:
 *      - huffman_encode()      - Encodes a string using Huffman coding
 *      - huffman_decode()      - Decodes a Huffman encoded string 
 *
 *  Helper Functions:
 *
 *      Encoding:
 *          - huff_tree_from_freq()     - Generate a Huffman tree from a frequency analysis
 *          - code_array_from_tree()    - Generate a "code array" from the huffman tree, used for fast encoding
 *          - node_compare()            - Calculate the difference in frequency between two nodes
 *          - node_compare_priority()   - Modified version of node_compare() which prioritises character nodes over internal nodes when frequencies are equal
 *
 *      Decoding:
 *          - huff_tree_from_codes()    - Generates a Huffman tree from a stored "code array"
 *          - is_char_node()            - Determine if a given byte is a character node in a Huffman tree
 *          - add_char_to_tree()        - Adds a character and its encoding byte to a Huffman tree
 *
 *      Universal:
 *          - create_char_node()        - Generate a character node
 *          - create_internal_node()    - Generate an internal node
 *          - destroy_huff_tree()       - Traverses the tree and frees all memory associated with it
 *          - node_print()              - Debugging function used to print information about a node, can be passed to dlist_operate() to print all nodes in the priority queue
 *
 *  Data structures:
 *
 *      Code array:
 *          - Fast way to encode data using the information generated from a Huffman tree and an easy way to store a representation of the tree
 *          - 2D array that represents each byte to be encoded and how it is encoded allowing for O(1) time to determine how a given byte is encoded
 *          - Position in the array (i.e. code_array[0-255]) represents the byte to be encoded
 *          - The first element at each position (i.e. code_array[byte][0]) represents the way the byte is encoded
 *          - The seconde element at each position (i.e. code_array[byte][1]) represents the number of bits that encode the byte
 *
 *      Huffman tree:
 *          - Binary tree that operates much like any other Huffman tree
 *          - Contains two types of nodes, internal nodes and character nodes
 *          - Every node contains either the frequency of the character it represents or the combined frequencies of its child nodes
 *
 *  Encoded data format:
 *
 *      - Header
 *          - Compressed string length (uint32_t stored as 4 uint8_t's)
 *          - Decompressed string length (uint32_t stored as 4 uint8_t's)
 *          - Header size (uint16_t stored as 2 uint8_t's)
 *          - Huffman tree stored as a "code array" (3 bytes per character: encoded character, encoding byte, number of bits set)
 *      - Encoded data
 *
 *  The future:
 *      - (Possibly) Modify decoding algorithm to use a hash table lookup rather than tree recursion, might be faster
 *      - Find way to reduce header size, possibly using the huffman algorithm twice to encode the header?
 *      - Look into using a terminator byte instead of storing length, might reduce total size
 *      - Combine with duplicate string removal and make full LZW compression
 *
 */

#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include "dlist_library.h"
#include "huffman.h"

int huff_tree_from_freq(size_t * freq, huffman_node_t ** head_node) {
    dlist_t * node_queue;
    huffman_node_t * char_node_list[256] = { NULL };
    huffman_node_t * first_temp_node, * second_temp_node, * internal_node;
    size_t node_count = 0;

    if(!(node_queue = dlist_init(sizeof(huffman_node_t *))))
        return MEM_ERROR;

    for(uint16_t i = 0; i < 256; i++)
        if(freq[i] && !(char_node_list[node_count++] = create_char_node(i - 128, freq[i])))
            return MEM_ERROR;

    for(size_t i = 0; i < node_count; i++)
        if(dlist_push(&char_node_list[i], node_queue) != LIST_OK)
            return MEM_ERROR;

    dlist_sort(&node_compare, node_queue);

    while(node_count-- > 1) {
        dlist_pop(&first_temp_node, node_queue);
        dlist_pop(&second_temp_node, node_queue);

        if(!(internal_node = create_internal_node(first_temp_node, second_temp_node)))
            return MEM_ERROR;

        if(dlist_push(&internal_node, node_queue) != LIST_OK)
            return MEM_ERROR;

        dlist_sort(&node_compare_priority, node_queue);
    }

    dlist_pop(head_node, node_queue);
    dlist_destroy(node_queue);

    return EXIT_SUCCESS;
}

void destroy_huff_tree(huffman_node_t * node)
{
    if(node->child[0]) {
        destroy_huff_tree(node->child[0]);
        destroy_huff_tree(node->child[1]);
    }

    free(node);

    return;
}

int huffman_encode(char * input, uint8_t ** output)
{
    size_t freq[256]        = { 0 };
    uint16_t header_size    = HEADER_BASE_SIZE;
    uint32_t length         = strlen(input);

    for(size_t i = 0; i < length; i++)
        freq[input[i] + 128]++;

    for(uint16_t i = 0; i < 256; i++)
        if(freq[i])
            header_size += 3;

    /* Handle strings with either one unique byte or zero bytes */

    if(header_size == HEADER_BASE_SIZE) {
        return INPUT_ERROR;
    } else if(header_size == HEADER_BASE_SIZE + 3) {
        for(uint16_t i = 0; i < 256; i++)
            if(freq[i])
                ++freq[i > 0 ? i - 1 : i + 1];
            
        header_size += 3;
    }

    huffman_node_t * head_node = NULL;

    if(huff_tree_from_freq(freq, &head_node) != EXIT_SUCCESS)
        return MEM_ERROR;

    uint8_t codes[256][2] = {{ 0 }};

    code_array_from_tree(head_node, codes, 0);
    destroy_huff_tree(head_node);

    size_t encoded_bit_len = 0;

    /* Use the generated code array to calculate the byte length of the output */

    for(size_t i = 0; i < length; i++)
        encoded_bit_len += codes[input[i] + 128][1];

    size_t encoded_byte_len = (encoded_bit_len >> 3) + !!(encoded_bit_len & 0x7); /* Calculate bit length / 8, add one if there's a remainder */

    uint8_t * str_out = NULL;

    if(!(str_out = calloc(encoded_byte_len + header_size + 1, sizeof(uint8_t))))
        return MEM_ERROR;

    /* Write header information */

    /* Bit level hack to store uint32_t's and uint16_t's in an array of uint8_t's */

    str_out[0] = (uint8_t)length;
    str_out[1] = (uint8_t)(length >> 0x8);
    str_out[2] = (uint8_t)(length >> 0x10);
    str_out[3] = (uint8_t)(length >> 0x18); 

    str_out[4] = (uint8_t)encoded_byte_len;
    str_out[5] = (uint8_t)(encoded_byte_len >> 0x8);
    str_out[6] = (uint8_t)(encoded_byte_len >> 0x10);
    str_out[7] = (uint8_t)(encoded_byte_len >> 0x18);

    str_out[8] = (uint8_t)header_size;
    str_out[9] = (uint8_t)(header_size >> 0x8);

    size_t byte_pos = HEADER_BASE_SIZE;

    /* Store the encoding information */

    for(uint16_t i = 0; i < 256; i++) {
        if(codes[i][1]) {
            str_out[byte_pos++] = i;
            str_out[byte_pos++] = codes[i][0];
            str_out[byte_pos++] = codes[i][1];
        }
    }

    /* Encode output stream */

    for(size_t i = 0, bit_pos = 0; i < length; i++) {
        for(size_t j = 0; j < codes[input[i] + 128][1]; j++) {
            str_out[byte_pos] |= ((codes[input[i] + 128][0] >> j) & 0x1) << bit_pos;

            if(++bit_pos == 8) {
                bit_pos = 0;
                byte_pos++;
            }
        }
    }

    *output = str_out;

    return EXIT_SUCCESS;
}

int huffman_decode(uint8_t * input, char ** output)
{
    size_t byte_pos         = 0;
    uint8_t bit_pos         = 0;
    uint8_t min_length      = 8;
    uint8_t codes[256][2]   = {{ 0 }};
    uint32_t char_count     = 0;

    /* Extract header information and build code array */

    uint32_t length = * (uint32_t *) &input[0];
    uint16_t header_size = * (uint16_t *) &input[8];

    for(byte_pos = HEADER_BASE_SIZE; byte_pos < header_size; byte_pos += 3) {
        codes[input[byte_pos]][0] = input[byte_pos + 1];
        codes[input[byte_pos]][1] = input[byte_pos + 2];

        if(codes[input[byte_pos]][1] < min_length)
            min_length = codes[input[byte_pos]][1]; /* By knowing the smallest encoding length we can speed up the recursive decoding */
    }

    char * str_out = NULL;

    if(!(str_out = calloc(length + 1, sizeof(char))))
        return MEM_ERROR;

    huffman_node_t * head_node = NULL;

    if(huff_tree_from_codes(&head_node, codes) == MEM_ERROR)
        return MEM_ERROR;

    /* Decode input stream */

    while(char_count < length) {
        for(uint8_t i = 0, byte = 0; i < 8; i++) {
            byte |= ((input[byte_pos] >> bit_pos) & 0x1) << i;
            
            if(++bit_pos == 8) {
                bit_pos = 0;
                byte_pos++;
            }

            if(i + 1 >= min_length && (str_out[char_count] = is_char_node(byte, i + 1, head_node)) != '') {
                char_count++;
                break;
            }
        }
    }

    destroy_huff_tree(head_node);

    str_out[char_count] = '';
    *output = str_out;

    return EXIT_SUCCESS;
}

char is_char_node(uint8_t byte, uint8_t bits_left, huffman_node_t * node)
{
    static uint8_t bit_pos = 0;

    return (!bits_left) ?
            (bit_pos = 0, !node->child[0]) ?
                node->c :
                '' :
            is_char_node(byte, bits_left - 1, node->child[((byte >> bit_pos++) & 0x1)]); /* This return is the best and worst line of code I've ever written */
}

void code_array_from_tree(huffman_node_t * node, uint8_t huffman_array[256][2], uint8_t bits_set)
{
    static uint8_t byte = '';

    if(node->child[0]) {
        byte &= ~(0x1 << bits_set);
        code_array_from_tree(node->child[0], huffman_array, bits_set + 1);
        byte |= 0x1 << bits_set;
        code_array_from_tree(node->child[1], huffman_array, bits_set + 1);
    } else {
        huffman_array[node->c + 128][0] = byte;
        huffman_array[node->c + 128][1] = bits_set;
    }
}

int huff_tree_from_codes(huffman_node_t ** head_node, uint8_t huffman_array[256][2])
{
    if(!(*head_node = malloc(sizeof(huffman_node_t))))
        return MEM_ERROR;

    (*head_node)->child[0] = NULL;
    (*head_node)->child[1] = NULL;

    for(uint16_t i = 0; i < 256; i++)
        if(huffman_array[i][1])
            if(add_char_to_tree(*head_node, (char)(i - 128), huffman_array[i][0], huffman_array[i][1] - 1, 0) != EXIT_SUCCESS)
                return MEM_ERROR;

    return EXIT_SUCCESS;
}

int add_char_to_tree(huffman_node_t * node, char c, uint8_t byte, uint8_t bits_left, uint8_t curr_bit)
{
    const uint8_t branch = (byte >> curr_bit) & 0x1;

    if(!node->child[branch]) {
        if(!(node->child[branch] = malloc(sizeof(huffman_node_t))))
            return MEM_ERROR;
    
        node->child[branch]->child[0] = NULL;
        node->child[branch]->child[1] = NULL;
    }

    if(bits_left) {
        return add_char_to_tree(node->child[branch], c, byte, bits_left - 1, curr_bit + 1);
    } else {
        node->child[branch]->c = c;
        return EXIT_SUCCESS;
    }
}

huffman_node_t * create_char_node(char c, size_t freq) {
    huffman_node_t * node;

    if(!(node = malloc(sizeof(huffman_node_t))))
        return NULL;

    node->freq = freq;
    node->child[0] = NULL;
    node->child[1] = NULL;
    node->c = c;

    return node;
}

huffman_node_t * create_internal_node(huffman_node_t * first_child, huffman_node_t * second_child) {
    huffman_node_t * node;

    if(!(node = malloc(sizeof(huffman_node_t))))
        return NULL;

    node->freq = first_child->freq + second_child->freq;
    node->child[0] = first_child;
    node->child[1] = second_child;

    return node;
}

int node_compare_priority(const void * first_node, const void * second_node) {
    return !((*(huffman_node_t **)first_node)->freq - (*(huffman_node_t **)second_node)->freq) ?
                0 :
                (*(huffman_node_t **)second_node)->child[0] ?
                    -1 :
                    0;
}

int node_compare(const void * first_node, const void * second_node) {
    return (*(huffman_node_t **)first_node)->freq - (*(huffman_node_t **)second_node)->freq;
}

void node_print(const void * element)
{
    printf("Frequency: %zun", (*(huffman_node_t **)(element))->freq);

    if((*(huffman_node_t **)(element))->child[0])
        printf("Node has children...n");
    else
        printf("Node has no children, character is "%c"n", (*(huffman_node_t **)(element))->c);
}

main.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "huffman.h"

int main()
{
    uint8_t * encoded = NULL;
    char * decoded = NULL;
    char * test_strings[] = {
                "test string",
                "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890!"£$%^&*()-=_+\|,./<>?[]{}'#@~`¬n",
                "test",
                "Hello, world!",
                "This is a test string",
                "My name is Jeff",
                "Test",
                "tteesstt",
                "test",
                "ab",
                "Ω≈ç√∫˜µ≤≥÷",
                "ЁЂЃЄЅІЇЈЉЊЋЌЍЎЏАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя",
                "If you're reading this, you've been in a coma for almost 20 years now. We're trying a new technique. We don't know where this message will end up in your dream, but we hope it works. Please wake up, we miss you.",
                "a",
                "aaaaaaaaaaaaaa",
                "",
                "Powerلُلُصّبُلُلصّبُررً ॣ ॣh ॣ ॣ冗",
                "Hello, world! This is a test string test string If you're reading this, you've been in a coma for almost 20 years now. We're trying a new technique. We don't know where this message will end up in your dream, but we hope it works. Please wake up, we miss you. tteesstt"
            };

    for(size_t i = 0; i < sizeof(test_strings) / sizeof(char *); i++) {
        printf("nEncoding string %zu...", i);
        fflush(stdout);

        if(huffman_encode(test_strings[i], &encoded) != EXIT_SUCCESS) {
            fprintf(stderr, "nError: Failed to encode string "%s"!n", test_strings[i]);
            continue;
        }

        printf("Done!nAttempting to decode...");
        fflush(stdout);

        if(huffman_decode(encoded, &decoded) != EXIT_SUCCESS) {
            fprintf(stderr, "nError: Failed to decode string!nEncoded string was "%s"n", test_strings[i]);
            free(encoded);
            continue;
        }

        printf("Done!nValidating...");
        
        if(!strcmp(test_strings[i], decoded))
            printf("Success!n");
        else
            printf("Failed! Got "%s"!n", decoded);

        free(decoded);
        free(encoded);
    }

    return 0;
}

Note: I am not looking for a review of any code below here in this question, I am saving this section for another review but I’m including it here because the rest of the code relies on it.

Edit: Since I wrote this code I was able to refactor it to completely remove this section

dlist_library.h

#ifndef DLIST_H
#define DLIST_H

#include <stdlib.h>                                                     /* Needed for size_t */

/* Return values */

#define LIST_OK 0
#define MEM_ERROR 1                                                     /* Memory allocation error */
#define SIZE_ERROR 2                                                    /* list dimension error */
#define INDEX_ERROR 3                                                   /* No data at given index */

/* List element data structure */

typedef struct list_element_t
{
    void * data;                                                        /* Contains the data stored at this node */
    struct list_element_t * next;                                       /* Contains the pointer to the next element, or NULL if it's the tail node */
} dlist_element_t;

/* List master data structure */

typedef struct
{
    dlist_element_t *   head;                                           /* Pointer to the head of the list */
    dlist_element_t *   tail;                                           /* Pointer to the tail of the list */
    size_t              data_width;                                     /* The size of each element in the list */
    int                 status;
} dlist_t;

/* User Functions */

dlist_element_t * dlist_search(void const * const data, int (*compare)(const void * first_element, const void * second_element), dlist_t * list); /* Search the list for an occurance of a given data value using a user defined comparison function */
dlist_t * dlist_init(size_t data_size);                                                                     /* Initialise the list data structure */
int dlist_insert_before(void const * const data, dlist_element_t * element, dlist_t * list);                /* Insert an element into the list at the position before a specified element */
int dlist_insert_after(void const * const data, dlist_element_t * element, dlist_t * list);                 /* Insert an element into the list at the position after a specified element */
int dlist_peek(void * const data, dlist_t * list);                                                          /* Check the contents of the element at the end of the list without popping the list */
int dlist_pop(void * const data, dlist_t * list);                                                           /* Pop an element from the front of the list, deals with cleanup when the head node is empty */
int dlist_push(void const * const data, dlist_t * list);                                                    /* Push an element to the back of the list, creates a new block when tail node is full */
int dlist_remove(dlist_element_t * element, dlist_t * list);                                                /* Remove an element from the list and connect the two elements next to it */
void dlist_destroy(dlist_t * list);                                                                         /* Destroy the list data structure and any associated nodes */
void dlist_operate(void(*operation)(const void * element), dlist_t * list);                                 /* Perform a user defined action on every single element stored in the list */
void dlist_sort(int (*compare)(const void * first_element, const void * second_element), dlist_t * list);   /* Sort all elements in the list using a merge sort */

/* Internal Functions */

dlist_element_t * dlist_merge_sorted(int (*compare)(const void * first_element, const void * second_element), dlist_element_t * head, dlist_element_t * second_head);
void dlist_sort_split(dlist_element_t * source, dlist_element_t ** front, dlist_element_t ** back);
void dlist_sort_internal(int (*compare)(const void * first_element, const void * second_element), dlist_element_t ** head);

#endif

dlist_library.c

/* 
 * Filename:    dlist_library.c
 * Author:      Jess Turner
 * Date:        10/08/18
 * Licence:     GNU GPL V3
 *
 * Library for a generic, dynamically allocated singly linked list
 *
 * Return/exit codes:
 *      LIST_OK         - No error
 *      SIZE_ERROR      - list size error (invalid block size or number of datas)
 *      MEM_ERROR       - Memory allocation error
 *      INDEX_ERROR     - Couldn't pop data from the list
 *
 * All functions returning pointers will return NULL on memory allocation faliure, else they will specify an error in list->status for the user to handle
 *
 * Todo:
 *      - Add secure versions of dlist_destroy(), dlist_pop(), and dlist_remove() to overwrite memory blocks that are no longer in use
 *      - Add a parameter to dlist_destroy() and dlist_remove() containing a function pointer detailing how to delete the data stored in each node
 */

#include <dlist_library.h>
#include <stdio.h>
#include <string.h>

dlist_t * dlist_init(size_t data_width)
{
    dlist_t * list;

    if(!(list = malloc(sizeof(dlist_t))))
        return NULL;

    list->tail          = NULL;
    list->head          = NULL;
    list->data_width    = data_width;
    list->status        = LIST_OK;

    return list;
}

void dlist_destroy(dlist_t * list)
{
    if(list == NULL)
        return;

    while(list->head) {
        dlist_element_t * temp = list->head->next;
        free(list->head);
        list->head = temp;
    }
    
    list->status        = 0;
    list->data_width    = 0;
    list->tail          = NULL;

    free(list);
}

int dlist_push(void const * const data, dlist_t * list)
{
    dlist_element_t * new_element;

    if(!(new_element = malloc(sizeof(dlist_element_t))) || !(new_element->data = malloc(list->data_width))) {
        if(new_element)
            free(new_element);
        return list->status = MEM_ERROR;
    }

    memcpy(new_element->data, data, list->data_width);

    if(list->head == NULL)
        list->head = new_element;
    else
        list->tail->next = new_element;

    list->tail = new_element;
    list->tail->next = NULL;

    return list->status = LIST_OK;
}

int dlist_pop(void * const data, dlist_t * list)
{
    if(list->head == NULL)
        return list->status = INDEX_ERROR;

    memcpy(data, list->head->data, list->data_width);

    free(list->head->data);
    dlist_element_t * temp = list->head;
    list->head = list->head->next;
    free(temp);

    return list->status = LIST_OK;
}

int dlist_remove(dlist_element_t * element, dlist_t * list)
{
    if(element == NULL || list->head == NULL)
        return list->status = INDEX_ERROR;

    if(list->head == element) {
        list->head = list->head->next;
        return list->status = LIST_OK;
    }

    dlist_element_t * prev = NULL;
    dlist_element_t * curr = list->head;

    while(curr != element) {
        prev = curr;
        curr = curr->next;

        if(curr == NULL)
            return list->status = INDEX_ERROR;
    }

    prev->next = curr->next;

    return list->status = LIST_OK;
}

int dlist_insert_after(void const * const data, dlist_element_t * element, dlist_t * list)
{
    if(list->head == NULL)
        return dlist_push(data, list);

    dlist_element_t * new_element;

    if(!(new_element = malloc(sizeof(dlist_element_t))) || !(new_element->data = malloc(list->data_width))) {
        if(new_element)
            free(new_element);
        return list->status = MEM_ERROR;
    }

    memcpy(new_element->data, data, list->data_width);

    new_element->next = element->next;
    element->next = new_element;

    if(element == list->tail)
        list->tail = new_element;

    return list->status = LIST_OK;
}

int dlist_insert_before(void const * const data, dlist_element_t * element, dlist_t * list)
{
    if(list->head == NULL)
        return dlist_push(data, list);

    dlist_element_t * prev = list->head;
    dlist_element_t * curr = prev->next;

    while(curr != NULL && curr != element) {
        curr = curr->next;
        prev = prev->next;
    }

    if(curr == NULL)
        return list->status = INDEX_ERROR;

    dlist_element_t * new_element;

    if(!(new_element = malloc(sizeof(dlist_element_t))) || !(new_element->data = malloc(list->data_width))) {
        if(new_element)
            free(new_element);
        return list->status = MEM_ERROR;
    }

    memcpy(new_element->data, data, list->data_width);

    if(curr == list->head) {
        new_element->next = curr;
        list->head = new_element;
    } else {
        new_element->next = prev->next;
        prev->next = new_element;
    }

    return list->status = LIST_OK;
}

dlist_element_t * dlist_search(void const * const data, int (*compare)(const void * first_element, const void * second_element), dlist_t * list)
{
    dlist_element_t * curr;

    for(curr = list->head; curr != NULL; curr = curr->next)
        if(!(*compare)(curr->data, data))
            return curr;

    list->status = INDEX_ERROR;

    return NULL;
}

int dlist_peek(void * const data, dlist_t * list)
{
    if(list->head == NULL)
        return list->status = INDEX_ERROR;

    memcpy(data, list->head->data, list->data_width);

    return list->status = LIST_OK;
}

void dlist_sort_split(dlist_element_t * source, dlist_element_t ** front, dlist_element_t ** back)
{
    dlist_element_t * slow = source;
    dlist_element_t * fast = source->next;
 
    while(fast != NULL) {
        fast = fast->next;
        if(fast != NULL) {
            slow = slow->next;
            fast = fast->next;
        }
    }

    *front = source;
    *back = slow->next;
    slow->next = NULL;
}

dlist_element_t * dlist_merge_sorted(int (*compare)(const void * first_element, const void * second_element), dlist_element_t * head, dlist_element_t * second_head)
{
    dlist_element_t * result = NULL;

    if(head == NULL)
        return second_head;
    else if(second_head == NULL)
        return head;
     
    if(compare(head->data, second_head->data) < 0) {
        result = head;
        result->next = dlist_merge_sorted(compare, head->next, second_head);
    } else {
        result = second_head;
        result->next = dlist_merge_sorted(compare, head, second_head->next);
    }

    return result;
}

void dlist_sort_internal(int (*compare)(const void * first_element, const void * second_element), dlist_element_t ** head)
{
    dlist_element_t * back = NULL;
    dlist_element_t * front = NULL;

    if(*head == NULL || (*head)->next == NULL)
        return;

    dlist_sort_split(*head, &front, &back);

    dlist_sort_internal(compare, &front);
    dlist_sort_internal(compare, &back);

    *head = dlist_merge_sorted(compare, front, back);
}

void dlist_sort(int (*compare)(const void * first_element, const void * second_element), dlist_t * list)
{
    if(list->head == NULL)
        list->status = INDEX_ERROR;

    dlist_sort_internal(compare, &list->head);

    for(dlist_element_t * curr = list->head; curr != NULL; curr = curr->next)
        if(curr->next == NULL)
            list->tail = curr;
}

void dlist_operate(void(*operation)(const void * element), dlist_t * list)
{
    for(dlist_element_t * curr = list->head; curr != NULL; curr = curr->next)
        operation(curr->data);
}

c++compression memory management memory optimization performance

harold · Answer

There is a lot here, so I only address a couple of points.

What are some ways I can reduce the header size to improve the compression ratios?

An effective trick is switching to using canonical Huffman codes, which gives you an unambiguous mapping from a table of code lengths to the table of codes. So this enables decoding while the header only stores, for each symbol, the length of the corresponding code, which is much smaller than storing the entire code as well. DEFLATE uses this (it goes much further to compress its header too, so if you want to make it even smaller you can look there for ideas), that RFC also provides code for decoding the set of lengths to a table of canonical Huffman codes.

The encoder should also use the same lengths-to-codes implementation to ensure consistency, then the Huffman tree building (if you do it at all!) only serves to find the lengths of the codes. That would let you simplify code_array_from_tree a little since it doesn't need to build codes any more, only lengths.

Are there any general performance chokes that can be removed?

Yes, first a small one. The way you encode, only one bit is written per iteration. You could append to a small bit-buffer held in an integer and then write out full bytes as they become available:

uint32_t buffer = 0;
int numbits = 0;
for (int i = 0; i < datasize; i++) {
    int len = codes[input[i]].length;
    // make room for code
    buffer <<= len;
    numbits += len;
    // save code to buffer
    buffer |= codes[input[i]].code;
    // output all the complete bytes in the buffer
    while (numbits >= 8) {
        output[out_pos++] = buffer >> (numbits - 8);
        numbits -= 8;
    }
}
// TODO: save bits that remain in the buffer

Since the output is byte based and every byte is only written once, it's easy to adapt to streaming the output to a file rather than constructing all of in memory.

There is a much larger performance problem in the decoder. You have used a bit-by-bit tree-walking decoding algorithm, that algorithm is inherently slow since it keeps traversing that tree. It also looks like this implementation of it can only handle codes up to 8 bits. Limiting code lengths is very reasonable, upping the maximum code length has diminishing returns that diminish very quickly, and long codes make table based decoding more complicated. 8 bits is a fairly short limit, suitable for text files but not for binary files.

There are various table-based decoding strategies, that all have in common that you won't be walking down the actual Huffman tree, but they differ in their details. In the simplest variant, you use a table of 2^k entries where k is your maximum code length (you have k=8 now, making this table tiny, DEFLATE has k=15). This table maps every possible state of your k-bit decoding buffer to a (length, symbol) pair corresponding with the first code in the buffer. You could interpret that table as caching the result of every possible way to do the tree-walk, combined with every possible "left-over" bit string (the part of the buffer after the first code). An other way to interpret it as the final limit of using wider and wider decoding tries (with the usual binary tree being a 2-way trie), leaving you with a trie that is only a single level deep so it becomes just an array.

The decoding itself then can become something like:

while (true) {
    uint32_t buffer = peekKBits(bit_position);
    struct sym s = decoding_table[buffer];
    bit_position += s.length;
    output[out_pos++] = s.symbol;
    if (s.symbol == END)
        break;
}

peekKBits must support peeking slightly beyond the end of the data, until the END symbol is aligned with the start of the buffer. It can read from a stream if you want, the position passed to it will never decrease, it just has to support temporarily buffering a constant amount of data. The output can also be streamed.

This looks simple, so maybe the complexity went into building the table? Actually not really, that's not so bad either, for example (this depends on the order in which you pack bits though, so this is just one possible way):

for (int i = 0; i < symbols; i++) {
    uint32_t code = codes[i].code;
    int padlength = MAX_CODE_LENGTH - codes[i].length;
    uint32_t padmask = (1U << padlength) - 1;
    // write an entry into the decoding table for every
    // possible "padding" of the code up to the buffer size
    struct sym s = { .length = codelength, .symbol = i };
    for (uint32_t padding = 0; padding <= padmask; padding++)
        decoding_table[(code << padlength) | padding] = s;
}

The size of the table used in that simple variant does not scale well with the maximum supported code length, there are many variants with smaller tables. An obvious one is a multi-level table, which can land anywhere in the middle between the two extremes. There are more complex options that get the table size down very well while staying away from true multi-level decoding, essentially replacing the initial steps with some arithmetic. If you keep k=8 then none of that is necessary, even the simplest table-based decoding algorithm would have a tiny table.

To clarify the encoding and decoding processes, let's take this example: "ABACABAAAAAAA".

B and C are the least common so, if you're using tree building, the corresponding nodes would be merged first, and then that merged node is merged with the A node, leading to codes like this:

A: 1
B: 00
C: 01

So the whole string translated to the corresponding codes would be ‭1 00 1 01 1 00 1 1 1 1 1 1 1 which can be packed into two bytes like this ‭10010110 01111111. I chose this example so it would cut a code through the middle on the byte boundary, because this typically occurs and must be handled.

Different packing orders are possible, for example DEFLATE packs the first item into the bottom bits of a byte first. That doesn't fundamentally matter for this decoding technique, the padding bits would be at the top instead of at the bottom and peekKbits would concatenate the other way around. The principle remains the same, but the diagrams become more confusing.

Since the longest code was 2 bits, we could use a 2-bit decoding buffer and a 4-entry decoding table. Obviously such small sizes are not typical, so I will also show what would happen with an 8-bit decoding buffer and the corresponding 256-entry table. First the small example though.

The first 2-bit buffer would be ‭10 (here: [10]010110). 10 was not a code, it's really 1 with some extra stuff after it which we don't know what it is yet. This is why the table-building step appends the padding bits, here the padding is 0. Anyway that 4-entry decoding table would look like this:

index symbol length
   00      B      2
   01      C      2
   10      A      1
   11      A      1

The buffer holds the value 10, so looking at table[2] we decode an A and advance the bit-position by 1. Next the buffer is here: 1[00]10110, decode a B, advance by 2, etc:

100[10]110 -> A, 1
1001[01]10 -> C, 2
100101[10] -> A, 1

The next buffer crosses between the bytes, 1001011[0 0]1111111. That's not difficult but it must be dealt with. For example, with the stateless peekKbits it could work like:

uint32_t peekKBits(int bit_pos) {
    int byte_pos = bit_pos >> 3;
    int bit_offset = 14 - (bit_pos & 7);
    uint32_t concat = (input[byte_pos] << 8) | input[byte_pos + 1];
    return (concat >> bit_offset) & 3;
}

Which reads two bytes (ensure they exist, this will read a byte beyond the input data so it has to be extended), concatenates them, shifts the concatenated string right until the two target bits at the bottom bits, and then chops off the top. Statefully flowing the data through a shift register, like I showed for encoding, is of course also possible.

Anyway, however we get that next buffer, it'll be 00, with one bit from the first byte and one bit from the second byte. It will decode to (B, 2), and after that there are only some boring As to decode.

A bigger decoding table is really the same idea, but with more padding bits, so many entries of the table will correspond to the same (symbol, length) pair. For a 256 entry table, all 8-bit values of the form 00xxxxxx would map to B, 2, that's 64 entries just for B. The decoding process would now go like this:

[10010110] 01111111 -> A, 1
1[0010110 0]1111111 -> B, 2
100[10110 011]11111 -> A, 1
1001[0110 0111]1111 -> C, 2
100101[10 011111]11 -> A, 1
1001011[0 0111111]1 -> B, 2
10010110 0[1111111 x] -> A, 1 (using padded input)
more A's

If you limit code lengths in the decoder, the encoder should enforce this limit as well, otherwise it can produce undecodable files for unlucky inputs. There are various ways to do it, such as rebalancing the Huffman tree, modifying the frequencies and rebuilding the tree, using some totally different algorithm such as Package-Merge, fudging the lengths themselves while ensuring they remain a valid set of Huffman code lengths, or even directly building the set of lengths using a heuristic.

Would there be any major issues porting my code to other systems? I have never used bit level operations before and I've heard that endianness can be a problem when using them

Endianness determines what order the bytes of multi-byte types are laid out in in memory. That matters when you directly reinterpret the memory of some type as the memory of some other type, as in unions and casting char* to int* and then reading/writing to it, that sort of thing. Bitwise operators operate on integers, not memory, so they are unaffected by endianness.

Huffman Coding library implemented in C

One Answer

Add your own answers!

Ask a Question