Remarks from review #233 (#240)

hobovsky · Steffan153 · web-flow · commit 07e602287df8 · 2021-01-15T18:01:26.000+01:00
* Remarks from review

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Steffan <40404519+Steffan153@users.noreply.github.com>

Co-authored-by: Steffan <40404519+Steffan153@users.noreply.github.com>
diff --git a/content/authoring/guidelines/submission-tests.md b/content/authoring/guidelines/submission-tests.md
@@ -15,6 +15,7 @@ NOTE: There are many kinds of kata, and some guidelines might simply not apply t
 ## General Guidelines
 
 - **Conform to [General Coding Guidelines][authoring-guidelines-general-coding]**: tests are code too and as such should keep up to code quality standards.
+- **Tests should verify outcomes of solutions, and not their specific approach or implementation details.** A solution that returns correct answers and satisfies all required characteristics should be accepted regardless of the algorithm used or the details of its implementation.
 - **Be familiar with the testing framework used by Codewars for your [language][languages].** Some languages use traditional, widely known testing frameworks, like `JUnit` (for Java) or `chai` (for JavaScript), and some others may use frameworks dedicated specifically for use by Codewars (for example Python). Know what kind of assertions are available, how to test objects, types, collections, etc. Know techniques provided by the framework (test case generators, parameterized test cases, etc.) and assertion libraries. You can visit the [reference page for your language][languages] to find more detailed information.
 - **The test suite should be organized**, splitting the tests into different groups and subgroups. Each test framework provides tools to do that, generally with a possibility to give names to each (sub)group. Meaningful names are helpful to the user when there are a lot of tests. For example `test arrays of odd length`, `test arrays of even length`, `test arrays of negative numbers` are informative while `test batch 1`, `test batch 2`, `test batch 3` are not.
 - **Know your language**, what can be tested, and what cannot. Know how to test floating-point values, equalities, equivalences, and such.
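
As an illustration of verifying outcomes rather than implementation, here is a minimal sketch in plain Python. The task (return any two distinct indices whose values sum to a target) and the helper names `check_pair` and `check_float` are hypothetical and not part of any Codewars framework. The checker accepts every valid answer instead of comparing against one specific expected pair, and floating-point results are compared with a tolerance rather than with `==`.

```python
import math


def check_pair(numbers, target, answer):
    """Accept any valid answer: two distinct indices whose values sum to target."""
    i, j = answer
    return i != j and numbers[i] + numbers[j] == target


def check_float(actual, expected, rel_tol=1e-9):
    """Compare floats with a tolerance (the tolerance values are assumptions)."""
    return math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=1e-12)
```

Both `(0, 3)` and `(2, 1)` are accepted for `numbers = [1, 2, 3, 4]` and `target = 5`, so a solution is not rejected merely for finding a different valid pair than the author's solution would.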
@@ -37,6 +38,7 @@ Some test suites require a reference solution to generate the expected value(s)
   - Reference solution being accessible to users by mistake.
   - Input mutation by the user solution which can affect the input passed to the reference solution, or make assertion messages confusing.
   - Incorrect implementation of the reference solution leading to the rejection of valid users' solutions.
+- **The reference solution, if used, does not have to be the same as the one in the "Reference Solution" snippet.** While the "Reference Solution" snippet serves its specific purpose and is [controlled by its own set of quality guidelines][authoring-guidelines-reference-solution], the reference solution used by performance tests can use a different, more efficient approach, to make sure that it does not consume too much of the time limit available for the user solution.
 - **The reference solution should not be revealed to the user.** When an assertion fails or the test suite crashes, some testing frameworks print fragments of source code which caused the failure to the console. It may happen that such printed failure messages or stack traces expose information about the solution which should not be revealed, so the place where the expected solution is computed is not a trivial choice at all.
 - **The reference solution shouldn't be accessible to the user solution.** It should not be possible to call the reference solution directly, or implement the user solution as an alias or wrapper around the reference solution. The reference solution should be completely inaccessible outside the submit tests. In particular, it should not be a global and/or public function. Check the [reference page for your language][languages] to see how to prevent this problem in your tests.
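
The points above about input mutation and accessibility could be sketched roughly as follows in plain Python. The names `run_case` and `_reference` are hypothetical, and sorting stands in for an arbitrary kata task. The reference is defined locally instead of as a module-level function, and both solutions receive their own copy of the input. This only reduces accidental exposure; proper hiding is language-specific and is covered by the reference page for each language.

```python
import copy


def run_case(user_solution, values):
    """Run one test case and return (passed, expected)."""

    # Hypothetical reference solution, kept local so it is not a global name.
    def _reference(xs):
        return sorted(xs)

    # Compute the expected value first, on a private copy of the input.
    expected = _reference(copy.deepcopy(values))
    # The user solution also gets its own copy, so mutation cannot leak
    # into the original input or into the assertion message.
    actual = user_solution(copy.deepcopy(values))
    return actual == expected, expected
```

A user solution that clears its argument and returns an empty list fails the case, while the caller's original list stays intact and can still be shown in the failure message.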
 
@@ -63,7 +65,7 @@ Fixed tests are tests with predetermined inputs and outputs, and do not change b
 
 ## Random Tests
 
-Random tests are uncommon in "real life" coding and are somewhat specific to Codewars. They are required to reject invalid approaches based on input probing, hard-coding, and other workarounds. The goal of random tests is to make the inputs unpredictable so that only solutions that are actually solving the task may pass.
+Random tests are uncommon in "real life" coding and are somewhat specific to Codewars. They are required to reject invalid approaches based on input probing, hard-coding, and other workarounds. The goal of random tests is to make the expected return values and their order unpredictable so that only solutions that are actually solving the task may pass.
 
 - **Random tests should generate test cases for all scenarios** which cannot be completely tested with fixed tests. If necessary, build different kinds of random input generators. If a specific kind of input has a very low chance of occurring purely at random (e.g. generating a palindrome), it's better to build a specific random generator that can enforce this kind of input rather than rely on 1000 random tests and just pray for the specific case to come up. Sometimes it can be a good idea to keep one fully random generator, because it may generate cases you didn't think about.
 - **Random tests should ensure that it's infeasible to pass tests by counting test cases.** Cases shouldn't be grouped by output type or behavior, especially if the expected output is a boolean variable (e.g. checking that some input satisfies some criteria), or when it comes to error checking (solution throwing an exception in some specific situations). The order of tested scenarios should be unpredictable. One possible way to achieve this is to generate and collect a set of random inputs for all required scenarios and shuffle them before the actual testing. If there are some fixed tests for particularly tricky scenarios which can be skipped by counting, they should be shuffled into the set of random inputs.
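
One possible shape for the "generate, collect, shuffle" approach described above, using the palindrome example. This is a sketch in plain Python; the generator names, the number of cases per scenario, and the two fixed cases are assumptions chosen for illustration.

```python
import random
import string


def random_palindrome(rng, max_half=10):
    """Dedicated generator: builds a palindrome instead of hoping for one."""
    half = "".join(rng.choice(string.ascii_lowercase)
                   for _ in range(rng.randint(1, max_half)))
    middle = rng.choice(["", rng.choice(string.ascii_lowercase)])
    return half + middle + half[::-1]


def random_non_palindrome(rng, max_len=20):
    """Fully random string, retried until it is not a palindrome."""
    while True:
        s = "".join(rng.choice(string.ascii_lowercase)
                    for _ in range(rng.randint(2, max_len)))
        if s != s[::-1]:
            return s


def build_cases(rng, per_scenario=50):
    """Collect cases for every scenario, then shuffle them together."""
    cases = [(random_palindrome(rng), True) for _ in range(per_scenario)]
    cases += [(random_non_palindrome(rng), False) for _ in range(per_scenario)]
    cases += [("a", True), ("ab", False)]  # fixed tricky cases, shuffled in
    rng.shuffle(cases)
    return cases
```

Because `True` and `False` cases are interleaved in an unpredictable order, a solution cannot pass by counting test cases or by returning a constant for a whole group.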
@@ -79,11 +81,11 @@ Random tests are uncommon in "real life" coding and are somewhat specific to Cod
 Some kata require solutions to be fast enough. For example, the author may only wish to accept solutions completing in (sub-)linear time. Building such test suites is not an easy task!
 
 - **Performance tests can be implemented in terms of random tests, by testing with large random inputs.** However, to make debugging easier, it can be worth having a separate set of random tests with small inputs first.
-- If the difficulty of a kata is roughly proportional to the size of the input, it's usually **better to have a few tests with large input rather than many with medium-sized inputs.** For example, 100 tests with huge numbers or arrays is usually better than 1000 tests with moderately large arrays/numbers (but see remarks on size of inputs above and the problem below about building the inputs too).
+- If possible, **performance of a solution should be evaluated by gauging the size of the inputs, rather than the number of tests.** If the difficulty of a kata is roughly proportional to the size of the input, it's usually better to have a few tests with large input rather than many with medium-sized inputs. For example, 100 tests with huge numbers or arrays are usually better than 1000 tests with moderately large arrays/numbers. At the same time, authors should ensure that large inputs are generated and handled in a way that does not have too much negative impact on the test suite.
 - **The difference between accepted and rejected solutions should be easy to spot.** Ideally, accepted solutions should complete well under the time limit, while rejected solutions should time out consistently. Otherwise, you risk that solutions with valid complexity characteristics will time out, and users will be frustrated looking for micro-optimizations. Achieving this generally calls for very large inputs, so be careful!
 - **Performance tests should be consistent between runs.** It should not happen that one and the same solution sometimes passes, and sometimes fails, depending on randomized inputs.
-- **The reference solution, if used, does not have to be the same as the one in the "Reference Solution" snippet.** While the "Reference Solution" snippet serves its specific purpose and is [controlled by its own set of quality guidelines][authoring-guidelines-reference-solution], the reference solution used by performance tests can use a different, more efficient approach, to make sure that it does not consume too much of a time limit available for the user solution.
 - **Make sure that what you measure is what you want**, when solutions are to be rejected based on their performance. With huge inputs, the random generation of the inputs may be more time-consuming than the computation of the expected result. Hence the overall timing indication in the output panel is generally useless to ensure that the performance tests are actually discriminating the different kinds of solutions as expected. The evaluation of the time actually used by the user's solution should be done (and compared to the reference solution if any) excluding the input generation time.
+- **Kata should not call for micro-optimizations when not necessary.** Performance tests should leave some freedom for users and give some leeway to solutions based on various approaches to the problem or with slight differences in implementation, as long as they satisfy general performance criteria.
 - When maintaining a kata with performance requirements, it can be useful to have access to a solution whose time complexity is supposed to be rejected. It can be used by maintainers to gauge the size of inputs for performance tests, to make sure they consistently fail. Storing it, properly commented, in the test suite, can be very helpful.
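
A rough sketch of how a maintainer might gauge input sizes while excluding input generation from the measurement. It uses only the standard library; the duplicate-detection task and all function names are hypothetical. `has_duplicate_slow` plays the role of the stored solution whose time complexity is supposed to be rejected.

```python
import random
import time


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def has_duplicate_fast(xs):
    """O(n) approach that should be accepted."""
    return len(set(xs)) != len(xs)


def has_duplicate_slow(xs):
    """O(n^2) approach that the performance tests are meant to reject."""
    return any(xs[i] == xs[j]
               for i in range(len(xs))
               for j in range(i + 1, len(xs)))


def measure(n, rng):
    """Time both approaches on one input of size n."""
    # The input is generated outside the timed region. All values are
    # distinct, which is the worst case for the quadratic approach.
    data = rng.sample(range(10 * n), n)
    _, fast = timed(has_duplicate_fast, data)
    _, slow = timed(has_duplicate_slow, data)
    return fast, slow
```

Running `measure` for increasing `n` shows at which input size the gap between the two approaches becomes large enough for the slow one to time out consistently while the fast one finishes well under the limit.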
 
 ## Tests with Additional Restrictions