Why a formal specification?

This section provides a rigorous mathematical definition of how any JSON Schema can be constructed from a predefined set of primitives (a formal grammar), and of how such a schema can be validated against arbitrary JSON documents. Before giving the formal specification, we briefly explain what advantages a fixed grammar for JSON Schema offers.

A grammar for JSON Schema is very similar to a grammar of a natural language (e.g. English, Spanish, French, etc.), since it tells us which schemas (or sentences, in the case of a spoken language) are permitted and which are not. Formal grammars are generally used when we want a clean and easy-to-process specification of some object or process. They are used, for example, to specify programming languages and inputs to compilers, as well as in text processing and many other areas.

Below we provide an (incomplete) list of some of the advantages of having a formal grammar for JSON Schema:

What happens when there is no formal grammar?

To illustrate what sort of issues one faces when there is no formal and completely unambiguous specification of JSON Schema, we designed several tests that check properties which were not fully defined in the latest JSON Schema draft. These tests were then run on five existing implementations of the latest JSON Schema draft. As we will see, the main problem is that when checked against the same schemas and the same documents, the validators return different results, which is certainly not the outcome one desires. Note that these tests do not reflect on the quality of the validators; they simply illustrate what happens when the specification allows a lot of freedom in interpreting how certain aspects of the schema should work. Below we explain the different tests and list the validators that were used, together with the validation results.

The tests

Here we describe the schemas and documents used in each test, and provide some intuition on which aspects of the JSON Schema specification they are intended to examine.

T1

The first test, denoted T1, checks if the validator considers JSON documents to be ordered or not (although this does not refer to the schema specification, it still illustrates that all elements participating in the validation process should be formally specified).

{
 "type": "array",
 "uniqueItems": true
}

The document we validate against this schema is intended to check whether the order of keys matters when comparing two JSON objects that are otherwise identical. It is given below:

[ {"a": 3, "b": 4}, {"b": 4, "a": 3} ]
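
The comparison that T1 probes can be sketched in a few lines of Python. This is only an illustration, assuming that JSON objects are unordered collections of key/value pairs; the helper name `items_are_unique` is hypothetical and not taken from any of the validators discussed here.

```python
import json


def items_are_unique(array):
    """Return True if no two items of `array` are equal as JSON values.

    Assumption: json.loads maps JSON objects to Python dicts, and dict
    equality ignores key order, which is the unordered reading of JSON.
    Caveat: Python's == also treats True and 1 as equal; this sketch
    does not correct for that.
    """
    for i, left in enumerate(array):
        for right in array[i + 1:]:
            if left == right:
                return False
    return True


# The T1 document: two objects that differ only in key order.
document = json.loads('[ {"a": 3, "b": 4}, {"b": 4, "a": 3} ]')
result = items_are_unique(document)  # False under the unordered reading
```

Under this reading the two items are equal, so the document violates "uniqueItems". A validator that compares the serialised text of the items instead would consider them distinct and accept the document.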

T2

The second test, T2, checks how validators handle the keyword "$ref", which is intended to override any other specification given at the same level (see here for more details).

{
 "definitions": {
  "a": {"type": "string"}
 },
 "$ref": "#/definitions/a",
 "type": "integer"
}

The document in T2 is simply the string "hola".
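
The two readings of "$ref" that T2 separates can be sketched as follows. This is a deliberately tiny validator that understands only "$ref" and "type"; the function names and the `ref_overrides_siblings` flag are hypothetical and are introduced here purely for illustration.

```python
def resolve(root, ref):
    """Resolve a local reference such as '#/definitions/a' against `root`."""
    if not ref.startswith("#/"):
        raise ValueError("only local '#/...' references are handled")
    node = root
    for part in ref[2:].split("/"):
        node = node[part]
    return node


# Only the two types that appear in T2 are covered.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


def validate(root, schema, value, ref_overrides_siblings):
    """Validate `value`, under one of two readings of '$ref'.

    Note: this sketch does not guard against cyclic references.
    """
    if "$ref" in schema:
        target = resolve(root, schema["$ref"])
        ok = validate(root, target, value, ref_overrides_siblings)
        if ref_overrides_siblings:
            return ok          # sibling keywords are ignored
        if not ok:
            return False       # siblings are checked in addition
    if "type" in schema:
        return TYPE_CHECKS[schema["type"]](value)
    return True


T2_SCHEMA = {
    "definitions": {"a": {"type": "string"}},
    "$ref": "#/definitions/a",
    "type": "integer",
}

accepted_if_overriding = validate(T2_SCHEMA, T2_SCHEMA, "hola", True)
accepted_if_combined = validate(T2_SCHEMA, T2_SCHEMA, "hola", False)
```

With the overriding reading, "hola" is accepted because only the referenced schema applies. With the combined reading it is rejected, since the sibling "type": "integer" is also enforced.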

T3

In T3 we use a schema which requires two properties "a" and "b" to be present, but also specifies that the document is a number which is a multiple of 3. This schema is intended to test how the validator handles enforcing types.

{
 "type": "object",
 "required": ["a", "b"],
 "multipleOf": 3
}

The document is simply the number 4.
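
To see which keywords of the T3 schema can reject a given document, the sketch below checks each keyword separately. It assumes the common convention that "required" constrains only objects and "multipleOf" constrains only numbers; that convention, like the helper name `failing_keywords`, is an assumption of this example and is not prescribed by the text above.

```python
def failing_keywords(schema, value):
    """List the keywords of `schema` that `value` violates.

    Assumption: type-specific keywords are ignored for values of another
    type ('required' for non-objects, 'multipleOf' for non-numbers).
    Only the three keywords used in T3 are handled.
    """
    failures = []
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if schema.get("type") == "object" and not isinstance(value, dict):
        failures.append("type")
    if "required" in schema and isinstance(value, dict):
        if any(key not in value for key in schema["required"]):
            failures.append("required")
    if "multipleOf" in schema and is_number:
        if value % schema["multipleOf"] != 0:
            failures.append("multipleOf")
    return failures


T3_SCHEMA = {"type": "object", "required": ["a", "b"], "multipleOf": 3}

failures_for_4 = failing_keywords(T3_SCHEMA, 4)
```

Under these assumptions the number 4 violates both "type" and "multipleOf", whereas an object such as {"a": 1, "b": 2} violates nothing, because "multipleOf" is simply skipped for it.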

T4

Our final test (T4) uses a schema containing a cyclical reference (which is allowed in the current draft of JSON Schema).

{
 "definitions": {
  "a": {"$ref": "#/definitions/a"}
 },
 "anyOf": [
  {"$ref": "#/definitions/a"},
  {"type": "string"}
 ]
}

The document is simply the string "hola".
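
A validator can detect the kind of cycle used in T4 before it starts validating. The sketch below follows a chain of "$ref" keywords and reports whether it ever returns to a reference it has already visited. It handles only direct chains of local references, which is an assumption made to keep the example short; the function names are hypothetical.

```python
def resolve(root, ref):
    """Resolve a local reference such as '#/definitions/a' against `root`."""
    if not ref.startswith("#/"):
        raise ValueError("only local '#/...' references are handled")
    node = root
    for part in ref[2:].split("/"):
        node = node[part]
    return node


def has_ref_cycle(root, ref):
    """Return True if following '$ref' from `ref` revisits a reference.

    Only chains where each target is itself a schema with a '$ref' are
    followed; cycles through other keywords are outside this sketch.
    """
    seen = set()
    while True:
        if ref in seen:
            return True
        seen.add(ref)
        target = resolve(root, ref)
        if not isinstance(target, dict) or "$ref" not in target:
            return False
        ref = target["$ref"]


T4_SCHEMA = {
    "definitions": {"a": {"$ref": "#/definitions/a"}},
    "anyOf": [{"$ref": "#/definitions/a"}, {"type": "string"}],
}

cyclic = has_ref_cycle(T4_SCHEMA, "#/definitions/a")
```

For the T4 schema the check reports a cycle, because the definition "a" refers directly to itself. A validator without such a check could keep resolving the same reference indefinitely.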

The validators used

The validators were downloaded from their respective GitHub repositories or dedicated Web pages. For those that also have an online version we list the corresponding links.

V1: JSON Schema for Python

V2: A JavaScript validator for JSON Schema

V3: JSON Schema validator in Java (also available online)

V4: Ruby JSON Schema Validator

V5: Another Python validator

We would like to remark that all of these validators claim to pass the JSON Schema test suite.

Results and discussion

The results of the tests, which we ran in September 2015, are summarised in the table below (please keep on reading for the changes in new versions of the validators). Here Y means that the validator accepts the document of the corresponding test, N that the document does not validate, and '--' that the validator does not support this type of schema.

      V1   V2   V3   V4   V5
T1    N    Y    Y    N    Y
T2    Y    N    Y    N    Y
T3    N    Y    N    N    N
T4    --   --   N    --   --

As we can see, different validators implement different features of the JSON Schema specification draft in a different way, which is certainly not the way we want them to behave, since using a different validator might result in accepting a different class of documents.

To start off, as T1 demonstrates, the validators do not even agree on the proper definition of a JSON document. In particular, it seems that some of them consider documents to be ordered, while others do not.

Next, T2 shows that the validators do not treat the $ref keyword uniformly either. In particular, some of them give it priority over other keywords at the same level, while others do not.

As far as T3 is concerned, it seems that most validators do enforce the type of the JSON document correctly, although even here there are some exceptions.

Finally, using cyclic references, although permitted by the JSON Schema draft, is not supported by most validators, and in the one case that it was supported, the returned result was not correct. On the other hand, it is also debatable how such a property should be implemented, since it could lead to potentially infinite validation sequences.

Therefore, the lack of a strict formalisation leaves us with different implementations which can disagree in certain cases. This can cause problems, for instance when developers who use different validators wish to exchange JSON documents and expect them to satisfy a certain agreed-upon schema, since the results they get might be different. In order to avoid such scenarios, in the rest of this section we provide a formal grammar for JSON Schema and define its semantics in an unambiguous way. In particular, we make sure that JSON files are interpreted correctly (T1), that references and types are enforced as intended (T2 and T3), and that cyclic references which can lead to infinite validation (T4) are not permitted.

All of these recommendations are implemented in the formal specification we present and are also available in our WWW 2016 paper Foundations of JSON Schema.

What is interesting is that between the time we ran the tests (September 2015) and the day our paper was published, the validators we used were updated, so we reran the tests in July 2016. The updated results are given below:

      V1   V2   V3   V4   V5
T1    N    N    N    N    N
T2    Y    N    Y    N    Y
T3    N    N    N    N    N
T4    --   --   N    --   --

As we can see, the problems with T1 are now completely resolved and all the validators interpret JSON documents correctly, and the same holds for T3, as the new versions now enforce the type of a document as intended. Unfortunately, T2 still suffers from the same issues, since the specification is not precise in that respect, and there were no changes with respect to T4.

Interestingly enough, the changes that were implemented are in accordance with the formal specification we (independently) proposed, which shows how the theoretical ideas we present go hand in hand with the solutions JSON developers came up with in practice.