Test Cases for Problems

Test cases are used to automate the marking of student code submissions. 

There are four types of test cases, which are described below. By default, all test cases are Validate test cases; these make up the vast majority of test cases used on Grok.

Enabling Other Types of Test Cases

By default, all test cases are Validate test cases. You can enable the other types using the Show Advanced option on the Problem (see screenshot below), then change the Test Case Type to one of the types described below.

Types of Test Cases 

By default, all test cases are Validate test cases. See above for how to enable the other types of test cases.

The four types sit in a 2×2 grid. Along one dimension:
  • Correctness test cases (Validate and Assess) are typically used to assess the correctness of the solution.
  • Elegance test cases (Suggest and Review) are typically used to assess the style and elegance of the solution (rather than its correctness).
The other dimension is whether the test case is a Normal test case (Validate, Suggest) or an Assignment test case (Assess, Review).

Validate (Normal, Correctness)
  • Used to check that the submitted code passes the requirements of the problem. 
  • Visible to students when marking (though you can "hide" certain information about the test case in the test case message that the student receives).
  • Submissions which fail the Validate test cases will be marked as "failed" within Grok.
  • Submissions which fail the Validate test cases will not receive Grok points (if there are points available). 
Suggest (Normal, Elegance)
  • Used to make suggestions for improvements to the submitted code. 
  • Visible to students when marking (though you can "hide" certain information about the test case in the test case message that the student receives).
  • Submissions which fail the Suggest test cases will still be marked as "passed" within Grok.
  • Submissions which fail the Suggest test cases will get Grok points (if there are points available). 
  • Often used for linting.
Assess (Assignment, Correctness)
  • Used to check that the submitted code passes the requirements of the problem.
  • Hidden from students until the Show Assessments date.
  • Submissions which fail the Assess test cases will still be marked as "passed" within Grok. 
  • Used to assign marks that are more nuanced than a simple pass/fail - this marking must be done outside of Grok.

Review (Assignment, Elegance)
  • Used to make suggestions for improvements to the submitted code. 
  • Hidden from students until the Show Assessments date. 

What's the practical difference between Validate and Suggest test cases? 

Essentially, the differences are:

  1. The student experience of passing vs not passing the Validate tests. This does not affect a tutor's ability to see the student's latest Marked submission.
  2. Validate test cases only run if the previous one passed, whereas ALL of the Suggest test cases run. There is an advanced option to "group" Validate test cases, in which case all tests in the group run at once. See below for how to set this up.

Using Test Cases for Assessments

With test cases, Grok envisions that you set an appropriate minimum acceptance criterion using the Validate test cases (e.g. Is the output format correct? Does the submission pass the example input/output pairs in the problem description?), and use the Assess test cases to test harder or more nuanced cases. The idea is that students who pass more Assess test cases achieve higher marks.

Generally, we (Grok) assume that any student who passes the Validate test cases will receive a passing mark for the assessment. That's not a requirement, of course, but it is our recommendation and how the system was designed to work. That's not to say you can't assess students who don't pass all the Validate cases - you absolutely can review their submissions and assign them a mark.

Checker Options

Each test case has a number of optional checker options which are passed to the checker; the appropriate checker must be selected for these options to take effect. The most common output checker used for Python code is the "Differ", which diffs the program's input and output against the expected input and output of the test case.
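
As a rough sketch of the idea (this is not Grok's actual checker code, and the function name is made up for illustration), a differ-style comparison takes the full expected transcript of a test case and the actual transcript produced by the student's program, and reports any mismatch:

```python
# Conceptual sketch only - not Grok's Differ implementation.
import difflib

def differ(expected: str, actual: str) -> bool:
    """Return True if the transcripts match; otherwise print a diff and return False."""
    if expected == actual:
        return True
    diff = difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile="expected",
        tofile="actual",
    )
    print("".join(diff))
    return False

# A mismatch in case or punctuation fails unless the relevant
# friendliness options (described below) are enabled.
print(differ("Hello, World!\n", "hello world\n"))   # False
```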

Friendliness

The friendliness options only apply when using the Differ checker. They are set in the following interface:

The differ operates by comparing the expected input and output to the actual input and output.

  • Space (ignore whitespace) - removes all whitespace from the actual and expected before diffing
  • Punct (ignore punctuation) - removes all punctuation from the actual and expected before diffing
  • Case (ignore case) - converts the actual and expected to lowercase before diffing
  • Sort (ignore line order) - lexicographically sorts all of the lines of actual and expected before diffing
  • Sort | Uniq (ignore line order and duplicates) - does `sort` and then removes duplicate lines in actual and expected before diffing
  • Norm floats (round floating point numbers) - round all floating point numbers to the provided number of significant figures in actual and expected before diffing (note significant figures not decimal places)
  • Slice from (ignore lines from N) - remove the first N lines of output from actual and expected before diffing. This happens before sort and sort | uniq.
  • Slice to (ignore lines to N) - remove all lines after line N from actual and expected before diffing. If N is negative, it counts back from the last line (like Python slices). This happens before sort and sort | uniq.
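
To make the effect of each option concrete, here is a rough sketch of the normalisations (assumed behaviour based on the descriptions above, not Grok's actual code); each would be applied to both the expected and actual transcripts before diffing:

```python
# Sketch of the friendliness normalisations (assumed behaviour, not
# Grok's actual code). Each is applied to both transcripts before diffing.
import re
import string

def ignore_whitespace(text):
    return re.sub(r"\s+", "", text)                   # Space

def ignore_punctuation(text):
    return text.translate(str.maketrans("", "", string.punctuation))  # Punct

def ignore_case(text):
    return text.lower()                               # Case

def ignore_line_order(text, unique=False):
    lines = sorted(text.splitlines())                 # Sort
    if unique:                                        # Sort | Uniq
        lines = sorted(set(lines))
    return "\n".join(lines)

def norm_floats(text, sig_figs=3):
    # Round every float to N *significant figures* (not decimal places),
    # e.g. 3.14159 -> 3.14 at 3 significant figures.
    repl = lambda m: format(float(m.group()), f".{sig_figs}g")
    return re.sub(r"-?\d+\.\d+", repl, text)

def slice_lines(text, start=0, stop=None):
    # Slice from / Slice to: keep only lines[start:stop]; applied before
    # any sorting. A negative stop counts back from the last line, like
    # a Python slice.
    return "\n".join(text.splitlines()[start:stop])
```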

Grouping Tests

Sometimes it's useful to group tests together, and only allow subsequent tests to run if all tests in a group pass. By default, all members of a test group are run, regardless of whether other members of the group fail.
An example suite of tests might look like the following:
  • Test 1: Simple validation test with example to ensure code runs at all
  • Tests 2 - 5: Validation tests of edge cases. Only run if Test 1 passes.
  • Tests 6 - 8: Assess tests to assist manual marking. Only run if all Tests 2 - 5 pass.
To achieve this, put the second set of tests into a group.
To form a group, ensure the tests are part of a continuous sequence (for instance, tests {#2, #3, #4, #5} would be valid, but {#2, #4, #5} would form two groups) and give them each the same string in the Group field within the test options. It helps to use a meaningful string, e.g. "Edge cases". If you do not specify a test group label, each test is treated as being in its own group of size 1.
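
For illustration, the contiguity rule can be thought of as partitioning the ordered list of tests into runs of identical group labels, with unlabelled tests each forming a group of one. This is a sketch of the assumed behaviour, not Grok's internal code:

```python
# Sketch of how group labels partition an ordered test list (assumed
# behaviour, not Grok's internal code). Tests with the same label only
# form a single group if they are contiguous.
from itertools import groupby

def partition_into_groups(tests):
    """tests is an ordered list of (test_name, group_label_or_None) pairs."""
    groups = []
    for label, run in groupby(tests, key=lambda t: t[1]):
        run = list(run)
        if label is None:
            groups.extend([t] for t in run)   # unlabelled: groups of size 1
        else:
            groups.append(run)
    return groups

tests = [
    ("Test 1", None),
    ("Test 2", "Edge cases"),
    ("Test 3", None),               # breaks the "Edge cases" sequence
    ("Test 4", "Edge cases"),
    ("Test 5", "Edge cases"),
]
print([[name for name, _ in g] for g in partition_into_groups(tests)])
# [['Test 1'], ['Test 2'], ['Test 3'], ['Test 4', 'Test 5']]
```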

Changing behaviour when a Validation test fails

Another way to get the results of all tests, even when an earlier validation test fails, is to alter the problem's validation test behaviour.

Options are:

  • Stop on fail - If a validation test fails, do not run any further tests beyond the current group, and do not show the results of any later tests in the current group
  • Stop on fail but show all group results - As above, but show all results for the current test group (even if they occur after the failing test)
  • Continue on fail - Run and show results of all tests, regardless of earlier test failures.
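
To make the run/show distinction concrete, here is a rough sketch of the assumed semantics (not Grok's implementation; the function and mode names are made up for the sketch). Each test is a zero-argument function returning True or False, and the tests are already partitioned into groups:

```python
# Sketch of the three validation behaviours (assumed semantics, not
# Grok's implementation). Every test in the current group still runs,
# but later groups stop running after a failure unless the behaviour
# is "continue on fail".

def run_validation(groups, mode="stop"):
    """mode: "stop", "stop_show_group" or "continue"."""
    shown = []                              # results visible to the student
    for group in groups:
        failed = False
        for test in group:
            result = test()                 # all tests in the group still run
            if not failed or mode != "stop":
                shown.append((test.__name__, result))   # hidden after a failure in "stop" mode
            if not result:
                failed = True
        if failed and mode != "continue":
            break                           # later groups are not run at all
    return shown
```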