C memory management (#231)

hobovsky · ggorlen · DonaldKellett · web-flow · commit 0a0a83e888a9 · 2021-01-16T12:27:29.000+01:00
* Initial stub

* markdown

* code style

* Memory management

* fix invalid link

* Info on Criterion

* tests, stub on random

* wording

* Assertions

* preloaded

* random utilities

* example test suite

* example test suite

* todo: not on size of returned array

* note on input mutation

* initial version, checklist

* Naive approach

* fix markdown

* Organization

* Preallocated buffer

* Query + allocation + calculation

* Solution returns status

* Symmetric user functions, organization

* 2D arrays: flat

* Hide one of paragraphs because it's probably too complex.

* 2D arrays

* typos, wording

* Apply suggestions from code review

Co-authored-by: Greg Gorlen &lt;gsgorlen@gmail.com&gt;

* Apply suggestions from code review

Co-authored-by: Donald Sebastian Leung &lt;donaldsebleung@gmail.com&gt;

* Apply suggestions from code review

Co-authored-by: Steffan &lt;40404519+Steffan153@users.noreply.github.com&gt;

* Organization

* Organization

* constants

* Memory allocated by tests

* Memory managed by the user

* 2d arrays

* Note on casts of linear buffers

* Note on cast between 1d and 2d

* Array of constants

* Headers of examples

* Apply suggestions from code review

Co-authored-by: Steffan &lt;40404519+Steffan153@users.noreply.github.com&gt;

* Apply suggestions from code review

Co-authored-by: Steffan &lt;40404519+Steffan153@users.noreply.github.com&gt;

* Fix example section

* Apply suggestions from code review

Co-authored-by: Donald Sebastian Leung &lt;donaldsebleung@gmail.com&gt;

* Apply suggestions from code review

* Remove string constants, replace them with enums

* Apply suggestions from code review

Co-authored-by: Steffan &lt;40404519+Steffan153@users.noreply.github.com&gt;

* Apply suggestions from code review

Co-authored-by: Donald Sebastian Leung &lt;donaldsebleung@gmail.com&gt;

Co-authored-by: Greg Gorlen &lt;gsgorlen@gmail.com&gt;
Co-authored-by: Donald Sebastian Leung &lt;donaldsebleung@gmail.com&gt;
Co-authored-by: Steffan &lt;40404519+Steffan153@users.noreply.github.com&gt;
diff --git a/content/languages/c/authoring/memory-management-techniques.md b/content/languages/c/authoring/memory-management-techniques.md
@@ -6,19 +6,378 @@ sidebar: "language:c"
 
 # Memory Management in C kata
 
-_TBD_
+Unlike many modern, high-level languages, C does not manage memory automatically. Manual memory management is a very vast and complex topic, with many possible ways of achieving the goal depending on a specific case, caveats, and pitfalls.
+
+
+## General information
+
+### Specification
+
+Whenever a kata passes in a pointer to the user's solution or requires it to return or manipulate a pointer or data referenced by a pointer, it should **explicitly** and **clearly** provide all information necessary to carry out the operation correctly. See the paragraph on [related guidelines](/languages/c/authoring/#working-with-pointers-and-memory-management) in ["C: creating and translating a kata"](/languages/c/authoring/) tutorial.
+When the structure, layout, or allocation scheme of pointed data is not described, users cannot know how to implement requirements without causing either a crash or a memory leak.
+
+
+### Interface
+
+It often happens that the solution function has to accept and return more values than just these related to the kata task itself. There can be more parameters required for tracking the memory, sizes of allocated buffers, statuses, etc. Depending on exact requirements, these parameters can be passed in and returned as separate function arguments, or can be packed together into some kind of structure. Examples in this article assume the former, but authors are free to decide otherwise.
+
+
+### Arrays and strings
+
+Since C-strings and arrays of other types are similar from the perspective of memory management, most of the techniques presented here apply equally to handling memory holding integers, floats, and characters, zero-terminated or not.
+
+
+## Memory Management Patterns
+
+In C, unlike for example Python, Java, C#, or Javascript, dynamically allocated memory is not managed by the runtime. It's considered to be a resource like any other, for example, a file, a DB or network connection, or a hardware device. The program itself has to take care of it properly, allocating it when necessary, and freeing when no longer needed.
+
+In kata, the memory can be managed either by the test suite, by the user, or both. Authors can choose the way how their kata should deal with memory and they can pick any ownership strategy. However, they should be aware of the advantages and disadvantages of each such strategy, and when and which applies the best. 
+
+
+### Statically allocated constant data
+
+The best way to prevent problems with memory allocation is to avoid unnecessary memory allocation. This advice might sound tricky, but there are simply many kata that require dynamic memory allocation or operation on data pointed by pointers, while it's simply not necessary and could be avoided. One commonly occurring example of such a situation is when a kata requires returning a pointer to a string which could be replaced by a constant. It seems to appear particularly often when translating kata from other languages. Returning a string in high-level languages is not a problem, but in C it always raises questions of who should allocate it and how it should be allocated. Consider replacing the string with an `enum`. For example, if the requirement for the JavaScript version is: _"Return the string 'BLACK' if a black pawn will be captured first, 'WHITE' if a white one, and 'NONE' if all pawns are safe."_, the C version should preferably provide and use the named constants `BLACK`, `WHITE`, and `NONE`.
+
+<details>
+    <summary>Example</summary>
+
+Solution:
+
+```c
+//Since Codewars does not allow header files for kata, declarations need to be repeated
+//This definition has to be provided by the solution stub snippet.
+typedef enum Player { BLACK, WHITE, NONE } Player;
+
+Player who_won(const char* board) //typedef used for return type 
+{
+    if(...) {
+      return BLACK;    //return constant instead of an allocated string
+    } else if (...) {
+      return WHITE;
+    } else {
+      return NONE;
+    }
+}
+```
+
+Tests:
+
+```c
+//Since Codewars does not allow header files for kata, declarations need to be repeated
+typedef enum Player { BLACK, WHITE, NONE } Player;
+
+Player who_won(const char* board);
+
+//helper function
+static const char* stringify(Player player) {
+  static const char* const strings[] = {"BLACK", "WHITE", "NONE"};
+  return strings[player];
+}
+
+Test(fixed_tests, no_one_won) {
+ 
+  Player winner = who_won("BWWB");
+
+  //remember to turn enum values into strings to get better error messages
+  cr_assert_eq(winner, NONE, "Expected: [%s], but was: [%s]", stringify(NONE), stringify(winner));
+}
+```
+
+</details>
+
+
+### Memory managed by tests (i.e. caller)
+
+One set of possible techniques assumes that the caller is the owner of allocated memory and tests should be responsible for allocating and releasing it. Memory is always allocated by the test suite, and the test suite can decide whether it wants to use memory allocated automatically (i.e. on the stack), dynamically (for example with `malloc`), or in some other available way. The test suite is also responsible for releasing it, if necessary. Such allocated buffer is passed to the user's solution to work on, and it's filled with the requested data.
+
+Sometimes it's perfectly known how large the result will be before the solution is called, or it's possible to pre-allocate a buffer that will be large enough for every call. For example, if the test suite asks to generate `n` Fibonacci numbers, it means that the resulting array needs to have the size of at least `n`. Sometimes the exact size is not known exactly, but it's possible to accurately estimate its upper bound. For example, a function that removes punctuation from a string needs to work on a buffer at least as large as an input string, but the result can turn out to be a bit smaller. In such cases, the test suite can allocate the buffer which would be big enough to keep the result, and pass it to the solution function:
+
+<details>
+    <summary>Example</summary>
+    
+Solution:
+
+```c
+//function prototype can use size hints
+void calculate_numbers(size_t n, int result [n]) {
+    //...actual calculations
+}
+```
+
+Tests:
+
+```c
+void calculate_numbers(size_t n, int result [n]);
+Test(fixed_tests, small_inputs) {
+
+    //requested amount of nubers
+    const int to_generate = 4;
+
+    //array allocated on stack,
+    //the required size is perfectly known
+    int result_array[to_generate];
+
+    //pass the array to the function, and expect
+    //it to be filled with the result
+    calculate_numbers(to_generate, result_array);
+
+    //...perform assertions, verify correctness of returned numbers...
+
+    //no need to deallocate the array
+}
+
+Test(random_tests, large_inputs) {
+
+    const int MAX_TEST = 10000000;
+
+    //dynamically allocate an array large enough to fit all possible answers.
+    //allocate it once, and reuse it through the tests.
+    int* array = malloc(sizeof(int) * MAX_TEST);
+
+    //ten random tests
+    for(int i=0; i<10; ++i) {
+
+        //randomize the input
+        int n = rand() % MAX_TEST + 1;
+
+        //use preallocated array
+        calculate_numbers(n, array);
+
+        //...perform assertions, verify correctness of returned numbers...
+    }
+
+    //release the memory after all tests
+    free(array);
+}
+```
+
+</details>
+
+This technique is often overlooked by kata authors, but it greatly simplifies the way how user solutions are built and how they communicate with the test suite. The user's solution does not have to worry about allocations or error handling, and can focus on its task. The test suite can use any allocation technique it wants, like automatic allocation on the stack, or dynamic allocation on a heap. Buffers can be allocated once and reused across many test calls.
+
+The biggest problem with allocated memory is that its size has to be known or possible to estimate before calling the user's solution. It's very often the case, but sometimes such estimation is not possible or easy. There are ways to work around this problem and work with memory allocated by the caller even when its size is not known upfront, but they are out of the scope of this article. In such cases, kata can use memory allocated by the user.
+
+
+### Mixed approach: `malloc` in the solution and `free` in tests
+
+In the vast majority of cases when a kata requires the solution to allocate memory, authors choose the naive approach of allocating the memory in the solution, and releasing it with `free` in the test suite after performing all necessary assertions. This mimics the behavior known from high-level languages where returning an array or object from inside of the user's solution is perfectly valid, but it's not always the best, or even correct, way of working with unmanaged memory in C.
+
+This approach is useful when the size of the result is not known before the call. The solution is responsible for finding the correct size and returning it along with the pointer to the buffer itself, and the test suite is responsible for freeing it after every call.
+
+<details>
+    <summary>Example</summary>
+
+Kata task:
+
+> Given a natural number `n`, return all prime numbers up to and including `n`.
+
+Solution:
+
+```c
+//get all prime numbers less than upto
+//use an output parameter to return the size of the result
+int* get_primes(int upto, int* size) {
+
+    //the solution allocates required memory
+    int* result = malloc(sizeof(int) * ...);
+
+    //... fill result with primes
+    //...
+
+    *size = ...; //assign amount of primes
+    return result;
+}
+```
+
+Test suite:
+
+```c
+Test(fixed_tests, should_return_2_and_3_for_4) {
+    
+    int expected[] {2, 3}, expected_size = 2;
+    int actual_size;
+    
+    //call user solution and expect it to allocate the returned array
+    int* actual = get_primes(4, &actual_size);
+
+    //...assert on actual_size
+    //...assert on contents of actual
+
+    //after performing all necessary assertions,
+    //free the array allocated by the user solution
+    free(actual);
+}
+```
+
+</details>
+
+This approach works in a way similar to functions like `strdup` or `asprintf`, which allocate required memory and pass its ownership to the caller. It's a good fit for Codewars kata because it's simple, effective, and works well in Codewars' code runner.
+
+A potential issue with the mixed approach is not related to Codewars, but to "real world" C programming and design. It might not work well for complex memory structures, or when a callee has to do advanced book-keeping and tracking of allocated memory. It also does not work well when passing data between modules (for example, between libraries, or from a library to the main program).
+
+
+### Memory managed by the solution
+
+The opposite of managing memory in the test suite is the approach of delegating the responsibility to the solver. This way, tests do not need to worry about problematic aspects of memory management, kata authors give freedom of implementation to users, and can reduce the boilerplate required to implement memory management.
+
+This idea boils down to asking users to provide their equivalents of allocation and de-allocation functions. The solution function is responsible not only for solving the task but also for allocation of memory and storing of book-keeping information. The clean-up function is responsible for releasing resources.
+
+There are many possible ways of implementing the allocation scheme and corresponding clean-up function, but an example implementation could be:
+
+<details>
+    <summary>Example</summary>
+
+Kata task:
+
+> Given the initial generation of a Game of Life population, return the state and size of the game world after `n` generations.
+
+Solution:
+
+```c
+//solution function, which allocates all required memory and solves the task
+char** game_of_life(int generations, char** initial_generation, int* world_h, int* world_w) {
+
+    char** world = ...; //allocating memory for the world map
+    
+    for(int i=0 i < generations; ++i) {
+        //... actual game, which potentially requires additional (re)allocations
+    }
+
+    //return the final state of the game world to the caller
+    return world;
+}
+
+//clean-up function
+void destroy_world(char** world) {
+    //... deallocate all memory appropriately in a way
+    //which matches how the game_of_life allocated it.
+}
+```
+
+Tests:
+
+```c
+int world_w = 3, world_h = 3;
+char** initial_generation = ...; //set up a GoL glider
+int generations = 25;
+
+//invoke solution function, which also allocates memory
+char** actual = game_of_life(generations, initial_generation, &world_h, &world_w);
+
+//... perform assertions on the world map and verify the state of its cells
+
+//call the clean-up function, which deallocates all memory
+destroy_world(actual);
+
+//...at this point memory is deallocated, no need to call free
+```
+
+</details>
+
+Memory management by a callee is not a common requirement for Codewars kata. It can be useful when the memory is structured in a complex way, or when it has to be tracked in some particular way. It mimics the behavior of C libraries, which often provide symmetrical de/allocation functions, and/or use opaque pointers as elements of their interface.
 
-## Arrays and strings
 
-- malloc in solution and free in tests
-- pass in a preallocated buffer (use size hints if possible)
-- two functions: get size, allocate in tests, run solution
-- two functions: solution with allocation, deallocation. Bookkeeping information managed by user or passed as additional `void*`
-- one function: accept buffer+size, return retsult or error and required size
-- avoid string constants, use named symbols
-- reporting size:output param, structure, sentinel teminators
 ## Two-dimensional arrays
 
-- N+1 allocations
-- Flat
-- TOC + Flat
+Some kata require the user solution to return a two-dimensional array, for example, a 2D matrix, or an array of C-strings. Such scenarios are a bit more complex, because not only does the higher-order array have to be properly managed, but all its individual entries as well. The exact approach selected for the allocation of such structures depends on the scenario because different techniques are suitable for square or rectangular arrays, jagged arrays, arrays of null-terminated strings, etc.
+
+
+Just as any form of memory, 2D arrays can be managed by the test suite, user solution, or both. As long as the size of the 2D array is known before calling a solution and does not change through the course of calculations, the test suite can choose to perform all necessary allocations and pass the memory to the solution function ready to use. This is a very good approach when working with chessboards, sudokus, matrices and mazes of predetermined sizes, etc. However, in the case that the size of the answer cannot be easily determined beforehand, the mixed approach or memory management by the callee with a clean-up function provided by the user can be better.
+
+:::note Note on examples
+For simplicity, this section uses the terms "2D array", "array of arrays", and "matrix" interchangeably and assumes row-major order, i.e. data can be accessed with `array[row][col]`.
+:::
+
+### Naive approach: N+1 allocations
+
+This is the most common approach of using dynamically allocated multidimensional arrays. An array of pointers to rows is allocated first, and each row is allocated individually afterwards.
+
+<details>
+    <summary>Example</summary>
+
+Allocation:
+
+```c
+//allocate array of rows first
+char** world = malloc(sizeof(char*) * world_h);
+for(int i=0; i < world_h; ++i) {
+
+    //allocate every row individually
+    world[i] = malloc(world_w);
+}
+```    
+
+Deallocation:
+
+```c
+//... deallocate all memory also row by row
+for(int i=0; i < world_h; ++i)
+    free(world[i]);
+
+free(world);
+}
+```
+
+</details>
+
+The advantage of individually allocated rows is that it works well for jagged arrays.
+
+This approach, despite appearing to be simple, is affected by issues mostly related to performance. It tends to be slow since each dynamic allocation requires a memory lookup. It can also cause excessive memory fragmentation.
+
+Additionally, it is sometimes unnecessarily used to return an array of data (usually strings) that could be turned into constants.
+
+
+### Array of string constants
+
+This approach is related to [returning a statically allocated const data](#statically-allocated-constant-data) but extended to arrays. Some kata require the user to return an array of strings, which could be turned into constants. In such a case, string constants should be replaced with `enum`, and just a one-dimensional, dynamically allocated array of enum values should be used.
+
+The only tricky part is stringification of the values if they are going to be displayed or used as a part of assertion messages.
+
+
+### Flat array
+
+Very often overlooked, but a very good approach to represent 2D arrays is to store them in a regular, linear array of `T[ ]`, potentially supported by some type casts between a linear buffer and two-dimensional matrix. 
+
+<details>
+    <summary>Example</summary>
+
+```c
+//declaration of solution accepting a two-dimentional array
+void play_game_of_life(size_t world_h, size_t world_w, char world_2d[world_h][world_w]);
+
+size_t world_h = ...;
+size_t world_w = ...;
+
+//allocating a linear buffer of memory for the world map
+char* world_linear = malloc(world_h * world_w);
+
+//cast a linear buffer to a 2D array
+char (*world_2d)[world_h] = (char (*)[world_h])world_linear;
+
+for(size_t row = 0; row < world_h; ++row) {
+    for(size_t col = 0; col < world_w; ++col) {
+    
+        //...set up the world...
+
+        //access a cell in linear buffer, or
+        world_linear[row * world_w + col] = 'x'; //set a cell as alive
+
+        //access a cell in 2d array
+        world_2d[row][col] = ' '; //set a cell as dead
+    }
+}
+
+//pass the 2d array to user solution
+play_game_of_life(world_h, world_w, world_2d);
+
+//deallocate all memory at once
+free(world_linear);
+```
+
+</details>
+
+This way, the complexity of memory management is greatly reduced since all necessary memory can be allocated and freed with a single call to `malloc` (or equivalent) and `free`.
+
+The drawback of the version with casts between linear and 2D arrays is that it is best suited for perfectly rectangular arrays, i.e. arrays whose sub-arrays all have equal length. However, the version without casts can be effectively used when bounds between inner arrays can be efficiently determined, for example, each row of a matrix has a well-known length, rows of a Pascal's triangle have precisely defined, although different, lengths, and string entries are clearly terminated.
+
+This method also does not fit perfectly the scenario when such an array should be *returned* from a function. The function still has to specify its return type as `T*`, and the caller has to either work with the linear form of the array or perform the cast on its own.