Updating the Data Model

Schema Validation and Submission

Data administrators can configure custom data submission validations for Song by creating and submitting Song schemas. These schemas act as blueprints for validating submissions, ensuring that every piece of data adheres to the requirements specified by the administrators. This validation process guarantees that all essential fields are included and that the data within these fields conforms to the designated data types or permitted value sets.

Integration with Song's Base Schema

Song merges all admin-defined schemas with its pre-existing base schema. Therefore, when creating your schemas, it is important to reference the base schema to avoid specifying conflicting properties.

Song Base Schema

The Song base schema can be restrictive for data models outside of cancer research contexts, as it requires tumor and normal samples. We are aware of this limitation and are currently working on a new data-agnostic submission system. For more information, check out our software currently under development section.

Building Schemas

Schema Basics

The following schema specifies that any data submission using the analysisType exampleSchema must contain two fields, field1 (a string) and field2 (a number):

```
{
  "name": "exampleSchema",
  "schema": {
    "type": "object",
    "required": ["field1", "field2"],
    "properties": {
      "field1": {
        "type": "string"
      },
      "field2": {
        "type": "number"
      }
    }
  }
}
```
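As a rough illustration of what required and type enforce, the following Python sketch (standard library only) checks a decoded submission against exampleSchema. It mirrors only those two keywords and the string and number types; Song performs the real validation server-side, and the helper names here are hypothetical.

```python
# Minimal sketch: mirrors only "required" and "type" for exampleSchema.
# Not Song's validator; matches_type and check_example_schema are hypothetical helpers.
import json

EXAMPLE_SCHEMA = {
    "type": "object",
    "required": ["field1", "field2"],
    "properties": {
        "field1": {"type": "string"},
        "field2": {"type": "number"},
    },
}


def matches_type(value, json_type):
    """Return True if a decoded JSON value matches a JSON Schema type name."""
    if json_type == "string":
        return isinstance(value, str)
    if json_type == "number":
        # bool is a subclass of int in Python, but JSON true/false are not numbers.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    raise ValueError(f"type not covered by this sketch: {json_type}")


def check_example_schema(document):
    """Return a list of error messages; an empty list means the document passed."""
    errors = []
    for field in EXAMPLE_SCHEMA["required"]:
        if field not in document:
            errors.append(f"missing required field: {field}")
    for field, rule in EXAMPLE_SCHEMA["properties"].items():
        if field in document and not matches_type(document[field], rule["type"]):
            errors.append(f"{field} must be of type {rule['type']}")
    return errors


print(check_example_schema(json.loads('{"field1": "a word", "field2": 10.2}')))
# []
print(check_example_schema(json.loads('{"field1": 5}')))
# ['missing required field: field2', 'field1 must be of type string']
```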

JSON Schema supports many different type values. Commonly used type values, and one closely related keyword, are:

- string: Textual data, e.g. "a word"
- number: Numeric data (integer or float), e.g. -5, 10, -5.8, 10.2
- integer: Integer values, e.g. 16, 0, -20
- boolean: Boolean values (true or false)
- object: Key-value pairs where keys are strings and values can be any type
- array: Ordered lists of items, which can contain any data type
- enum: Strictly a keyword rather than a type; restricts a value to a fixed set of values
- null: Represents a null value

JSON Schema can also include various additional constraints:

Regex Patterns: Fields can use regex patterns to enforce specific formatting rules
```
"field1": {
    "type": "string",
    "pattern": "^[A-Za-z]+$"
}
```
Required Fields: Defines which fields must be present in the data object
```
"required": ["field1", "field2"]
```
Array Constraints: Allows setting minimum (minItems) and maximum (maxItems) array lengths
```
"field3": {
    "type": "array",
    "minItems": 1,
    "maxItems": 5
}
```
```

Conditional Logic (if-then): Logic to enforce required fields based on conditions.
```
"if": {
    "properties": { "field4": { "const": "value1" } }
},
"then": {
    "required": ["field5"]
}
```
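The pattern and minItems/maxItems constraints above can be approximated in a few lines of Python. This is a simplified sketch, not a JSON Schema implementation: check_pattern and check_array_length are hypothetical helpers, and Python's re module stands in for the ECMA-262 regex dialect that JSON Schema specifies.

```python
import re

FIELD1_PATTERN = "^[A-Za-z]+$"


def check_pattern(value, pattern):
    """Mirror JSON Schema "pattern": the regex may match anywhere in the string."""
    # JSON Schema patterns are not implicitly anchored, so re.search (not
    # re.fullmatch) is the closer analogue; ^ and $ in the pattern do the anchoring.
    # Caveat: Python's $ also matches just before a trailing newline; ECMA-262's does not.
    return re.search(pattern, value) is not None


def check_array_length(items, min_items, max_items):
    """Mirror JSON Schema "minItems"/"maxItems" for a decoded JSON array."""
    return min_items <= len(items) <= max_items


print(check_pattern("Hello", FIELD1_PATTERN))   # True
print(check_pattern("Hello1", FIELD1_PATTERN))  # False: digits are not allowed
print(check_pattern("Hello1", "[A-Za-z]+"))     # True: an unanchored pattern matches a substring
print(check_array_length([], 1, 5))             # False: below minItems
print(check_array_length(["a", "b"], 1, 5))     # True
```

The third call shows why the example pattern includes ^ and $: without them, any string containing a single letter would pass.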

Basic Example

In the context of Song, a basic schema requires, at a minimum, a name (which defines the analysisType) and a schema containing at least one field. In the example below, this field is called experiment:

```
{
  "name": "basicSchemaExample",
  "schema": {
    "type": "object",
    "required": [
      "experiment"
    ],
    "properties": {
      "experiment": {}
    }
  }
}
```


name is the name of the schema, which identifies the schema for validation purposes
schema contains the schema definition
type is the data type of the schema, in this case, an object
required is a list of fields that must be included in any data submission validated against this schema
properties are the fields that the schema expects. In this example, the schema expects an experiment field.

The analysisType is defined in data submissions to Song. This field informs Song which data model the submission should be validated against. Below is an example of a mock data submission:

```
{
  "studyId": "MICR-CA",
  "analysisType": {
    "name": "basicSchemaExample"
  },
  "experiment": "myNewExperiment"
}
```

In this example, the schema named basicSchemaExample is used to validate the data submission.
The analysisType field specifies that the submission should adhere to the basicSchemaExample model, ensuring that all required fields, such as experiment, are present and correctly formatted.
The studyId field is defined and required by Song's base schema; it identifies the group (study) that this collection of data belongs to.
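To show how analysisType selects a schema, here is a hypothetical Python sketch that looks up the registered schema by analysisType.name and reports missing required fields. REGISTERED_SCHEMAS, BASE_REQUIRED, and the helper functions are illustrative assumptions rather than Song's API, and BASE_REQUIRED is only a small subset of what Song's base schema actually requires (it also expects sample information, for example).

```python
import json

# Hypothetical registry standing in for the schemas registered with Song.
REGISTERED_SCHEMAS = {
    "basicSchemaExample": {
        "type": "object",
        "required": ["experiment"],
        # An empty schema ({}) accepts any value for experiment.
        "properties": {"experiment": {}},
    },
}

# Small, illustrative subset of the base schema's required fields.
BASE_REQUIRED = ["studyId", "analysisType"]


def select_schema(submission):
    """Pick the registered schema named by analysisType.name."""
    name = submission["analysisType"]["name"]
    if name not in REGISTERED_SCHEMAS:
        raise KeyError(f"unknown analysisType: {name}")
    return REGISTERED_SCHEMAS[name]


def missing_fields(submission):
    """List required fields absent from the submission, checking base fields first."""
    missing = [field for field in BASE_REQUIRED if field not in submission]
    if missing:
        return missing
    schema = select_schema(submission)
    return [field for field in schema["required"] if field not in submission]


submission = json.loads("""
{
  "studyId": "MICR-CA",
  "analysisType": {"name": "basicSchemaExample"},
  "experiment": "myNewExperiment"
}
""")
print(missing_fields(submission))  # []
```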

Detailed Examples

Let's break down some more complex schema examples. We will pull from a reference schema that can be found here. In the following sections, we will provide snippets of this schema along with explanations of the structure, function, and any embedded logic.

Required Fields: Here, our Schema dictates that "donor", "specimen", "workflow", and "experiment" are required fields:

```
{
  "name": "quickStartSchema",
  "schema": {
    "type": "object",
    "required": ["donor", "specimen", "workflow", "experiment"],
    "properties": { ...
```

Enum, Types, and Patterns: Within workflow, we can see the use of propertyNames, enum, required fields, types, and regex patterns:
```
"workflow": {
    "propertyNames": {
        "enum": ["workflowName", "workflowShortName", "workflowVersion", "genomeBuild", "inputs", "sessionId", "runId"]
    },
    "required": ["workflowName", "genomeBuild", "inputs"],
    "type": "object",
    "properties": {
        "workflowName": {
            "type": "string",
            "pattern": "^[a-zA-Z][a-zA-Z0-9 _\\-]+[a-zA-Z0-9]+$"
        },
        "workflowShortName": {
            "type": "string",
            "pattern": "^[a-zA-Z][a-zA-Z0-9_\\-]+[a-zA-Z0-9]+$"
        },
        "workflowVersion": {
            "type": "string"
        },
        "genomeBuild": {
            "type": "string",
            "enum": ["GRCh37", "GRCh38_hla_decoy_ebv", "GRCh38_Verily_v1"]
        },
        ...
```


workflow defines an object containing workflow-related properties.
propertyNames limits the property names allowed within workflow to those listed in its enum.
required specifies that workflowName, genomeBuild, and inputs must be present within the workflow object.
type indicates that workflow is an object and therefore contains nested key-value pairs.
workflowName requires a string ("type": "string") that matches the specified regex pattern ("pattern": "^[a-zA-Z][a-zA-Z0-9 _\\-]+[a-zA-Z0-9]+$"; the \\- in the JSON source is an escaped hyphen, \- in the regex itself). This ensures the value starts with a letter, contains only letters, digits, spaces, underscores, and hyphens, ends with a letter or digit, and is at least three characters long.
genomeBuild requires a string ("type": "string") that can only be one of the specified values ("enum": ["GRCh37", "GRCh38_hla_decoy_ebv", "GRCh38_Verily_v1"]).
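The workflow rules described above can be checked with a short Python sketch. It assumes the decoded workflow object is a dict, covers only propertyNames/enum, required, the workflowName pattern, and the genomeBuild enum, and uses a hypothetical helper name, workflow_errors. The raw string below reproduces the regex that the JSON string "\\-" decodes to.

```python
import re

# The JSON "\\-" decodes to the regex \- (an escaped hyphen), reproduced here as a raw string.
WORKFLOW_NAME_PATTERN = r"^[a-zA-Z][a-zA-Z0-9 _\-]+[a-zA-Z0-9]+$"
ALLOWED_KEYS = {"workflowName", "workflowShortName", "workflowVersion",
                "genomeBuild", "inputs", "sessionId", "runId"}
REQUIRED_KEYS = ["workflowName", "genomeBuild", "inputs"]
GENOME_BUILDS = {"GRCh37", "GRCh38_hla_decoy_ebv", "GRCh38_Verily_v1"}


def workflow_errors(workflow):
    """Return error messages for a decoded workflow object (hypothetical helper)."""
    errors = []
    # propertyNames + enum: every key must be one of the listed names.
    for key in workflow:
        if key not in ALLOWED_KEYS:
            errors.append(f"unexpected property: {key}")
    # required: these keys must be present.
    for key in REQUIRED_KEYS:
        if key not in workflow:
            errors.append(f"missing required property: {key}")
    # type + pattern for workflowName.
    if "workflowName" in workflow:
        name = workflow["workflowName"]
        if not isinstance(name, str) or re.search(WORKFLOW_NAME_PATTERN, name) is None:
            errors.append(f"workflowName does not match pattern: {name!r}")
    # enum for genomeBuild.
    if "genomeBuild" in workflow and workflow["genomeBuild"] not in GENOME_BUILDS:
        errors.append(f"genomeBuild not in enum: {workflow['genomeBuild']!r}")
    return errors


print(workflow_errors({"workflowName": "DNA Seq-Alignment",
                       "genomeBuild": "GRCh38_hla_decoy_ebv", "inputs": []}))  # []
print(workflow_errors({"workflowName": "ab", "genomeBuild": "GRCh37", "inputs": []}))
# ["workflowName does not match pattern: 'ab'"]  (the pattern needs at least three characters)
```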

minItems & maxItems: The JSON Schema below is a simplified workflow property that shows the usage of minItems and maxItems:
```
"workflow": {
    "propertyNames": {
        "enum": ["inputs"]
    },
    "required": ["inputs"],
    "type": "object",
    "properties": {
        "inputs": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "tumourAnalysisId": {
                        "type": "string",
                        "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
                    },
                    "normalAnalysisId": {
                        "type": "string",
                        "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
                    },
                    "analysisType": {
                        "type": "string"
                    }
                }
            },
            "minItems": 1,
            "maxItems": 2
        }
    }
}
```

"minItems": 1 means that a submission following this schema must include at least one object in the inputs array. Each object may contain analysisType, normalAnalysisId, and tumourAnalysisId; because the items schema lists no required fields, none of these keys is mandatory within an object.
"maxItems": 2 means that a single submission can include at most two objects in the inputs array.

The minItems and maxItems constraints apply to the number of objects within the inputs array, not to the individual fields within each object.
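The following Python sketch illustrates how minItems and maxItems count array elements, and how the ID patterns apply only to keys that are present. inputs_errors is a hypothetical helper, and the sketch assumes analysis IDs are standard lowercase UUID strings (8-4-4-4-12 hex digits), such as those produced by Python's uuid module.

```python
import re
import uuid

# Assumes the standard 8-4-4-4-12 UUID layout in lowercase hex.
UUID_PATTERN = r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
ID_FIELDS = ("tumourAnalysisId", "normalAnalysisId")


def inputs_errors(inputs, min_items=1, max_items=2):
    """Check a decoded inputs array against minItems/maxItems and the ID patterns."""
    errors = []
    # minItems/maxItems count array elements, whatever each element contains.
    if not min_items <= len(inputs) <= max_items:
        errors.append(f"inputs must have {min_items}-{max_items} items, got {len(inputs)}")
    for index, item in enumerate(inputs):
        # The items schema lists no "required" keys, so each ID is only
        # checked when it is present.
        for field in ID_FIELDS:
            if field in item and re.search(UUID_PATTERN, str(item[field])) is None:
                errors.append(f"inputs[{index}].{field} is not a lowercase UUID")
    return errors


print(inputs_errors([{"tumourAnalysisId": str(uuid.uuid4())}]))  # []
print(inputs_errors([{}]))  # []: an empty object still counts as one item
print(inputs_errors([]))    # ['inputs must have 1-2 items, got 0']
```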

Conditional Logic: The schema segment below demonstrates conditional if/then logic that determines whether the fields causeOfDeath and survivalTime are required. Note that if and then sit alongside properties, not inside it:
```
"properties": {
    ...
    "vitalStatus": {
        "type": "string",
        "enum": ["Alive", "Deceased"]
    }
},
"if": {
    "properties": {
        "vitalStatus": {
            "const": "Deceased"
        }
    }
},
"then": {
    "required": ["causeOfDeath", "survivalTime"]
}
```
This conditional schema structure allows for dynamic validation based on the value of vitalStatus, ensuring that causeOfDeath and survivalTime are only required when vitalStatus is Deceased.
- If vitalStatus is "Deceased", then the submission must include causeOfDeath and survivalTime.
- If vitalStatus is "Alive", then there are no additional requirements needed.
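- const is a validation keyword specifying that a property's value must exactly equal the given value (here, "Deceased").
- If vitalStatus is omitted entirely, the if condition is treated as satisfied, so causeOfDeath and survivalTime become required; making vitalStatus itself a required field avoids this surprise.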
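The following hypothetical Python helper mirrors the vitalStatus rule, including the case where vitalStatus is absent. It assumes the donor object has already been decoded into a dict; the sample values passed in are made up.

```python
def deceased_rule_errors(donor):
    """Mirror the vitalStatus if/then rule for one decoded donor object."""
    # "if": {"properties": {"vitalStatus": {"const": "Deceased"}}}
    # A "properties" rule only constrains keys that are present, so the
    # condition also holds when vitalStatus is missing.
    condition_holds = "vitalStatus" not in donor or donor["vitalStatus"] == "Deceased"
    if not condition_holds:
        return []  # "then" is skipped when "if" fails
    # "then": {"required": ["causeOfDeath", "survivalTime"]}
    return [
        f"missing required field: {field}"
        for field in ("causeOfDeath", "survivalTime")
        if field not in donor
    ]


print(deceased_rule_errors({"vitalStatus": "Alive"}))
# []
print(deceased_rule_errors({"vitalStatus": "Deceased", "causeOfDeath": "Died of cancer"}))
# ['missing required field: survivalTime']
print(deceased_rule_errors({}))
# ['missing required field: causeOfDeath', 'missing required field: survivalTime']
```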

Null Values: Null values can provide flexibility by allowing a property to be explicitly null when no valid string value is applicable or known. The schema segment below shows the use of a null enum value for a relapseType property.
```
"relapseType": {
    "type": ["string", "null"],
    "enum": [
        "Distant recurrence/metastasis",
        "Local recurrence",
        "Local recurrence and distant metastasis",
        "Progression (liquid tumours)",
        null
    ]
}
```


If relapseType is a string, it must exactly match one of the values listed in the enum array ("Distant recurrence/metastasis", "Local recurrence", "Local recurrence and distant metastasis", "Progression (liquid tumours)").

If relapseType is null, as in the key-value pair below, it is also considered valid according to the schema:
```
{
  "relapseType": null
}
```
This allows for scenarios where relapseType has no defined value or where its value is intentionally absent or unknown.
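A minimal Python sketch of the combined type and enum check for relapseType follows. It relies on JSON null decoding to Python None; relapse_type_ok is a hypothetical helper, and it treats a missing relapseType as acceptable because property rules only apply to keys that are present (whether the field is required depends on the rest of the schema).

```python
import json

RELAPSE_TYPES = [
    "Distant recurrence/metastasis",
    "Local recurrence",
    "Local recurrence and distant metastasis",
    "Progression (liquid tumours)",
    None,  # JSON null decodes to Python None
]


def relapse_type_ok(document):
    """Mirror "type": ["string", "null"] plus the enum for relapseType."""
    if "relapseType" not in document:
        return True  # property rules only apply when the key is present
    value = document["relapseType"]
    # type: the value must be a string or null ...
    if not (value is None or isinstance(value, str)):
        return False
    # ... and enum: it must exactly equal one of the listed values.
    return value in RELAPSE_TYPES


print(relapse_type_ok(json.loads('{"relapseType": null}')))   # True
print(relapse_type_ok({"relapseType": "Local recurrence"}))  # True
print(relapse_type_ok({"relapseType": "local recurrence"}))  # False: enum matching is case-sensitive
```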

Minimum & Maximum: The minimum and maximum keywords provide a straightforward way to enforce numerical constraints, ensuring that data falls within specified ranges or limits.
```
"treatmentDuration": {
    "type": "integer",
    "minimum": 0
}
```
Here, "minimum": 0 ensures that treatmentDuration only accepts non-negative integers; minimum is inclusive, so 0 itself is allowed.
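The following Python sketch mirrors "type": "integer" with "minimum": 0. It assumes draft 6 or later JSON Schema semantics, under which a number with a zero fractional part, such as 3.0, counts as an integer; the helper names are hypothetical.

```python
import json


def is_json_integer(value):
    """Approximate the JSON Schema "integer" type for a decoded JSON value."""
    # bool is an int subclass in Python, but JSON true/false are not integers.
    if isinstance(value, bool):
        return False
    # Draft 6+ treats numbers with a zero fractional part (e.g. 3.0) as integers.
    return isinstance(value, int) or (isinstance(value, float) and value.is_integer())


def treatment_duration_ok(value, minimum=0):
    """Mirror "type": "integer" plus the inclusive "minimum" keyword."""
    return is_json_integer(value) and value >= minimum


print(treatment_duration_ok(0))                  # True: minimum is inclusive
print(treatment_duration_ok(-1))                 # False
print(treatment_duration_ok(2.5))                # False: not an integer
print(treatment_duration_ok(json.loads("3.0")))  # True under draft 6+ rules
```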

Want to learn more?
If you want to learn more about JSON Schema, take a look at the following JSON Schema guide.

Updating the Schema

You can update Song schemas using the Song server's Swagger UI or using curl commands.

Using the Swagger UI

The Song Swagger UI provides a user-friendly interface to interact with Song's API endpoints. You can access it at the Song server URL appended with /swagger-ui.html. For the quickstart, this will be http://localhost:8080/swagger-ui.html.

To update the schema using the Swagger UI:

Locate the Schema API endpoints: From the Schema dropdown, find the POST RegisterAnalysisType endpoint.
Select Try it out and input your API key and schema: enter your authorization token in the authorization field (Bearer {API-Key}) and place your new schema in the request field.

API keys are brokered by Keycloak and accessible when logged in to the Stage UI. For the Overture QuickStart, Stage can be accessed at http://localhost:3000.
- Log in through the Stage UI by selecting Login at the top right. The default credentials when using the Overture QuickStart are username admin and password admin123.
- Generate a new API token by selecting Profile and Token from the user dropdown at the top right of the Stage UI, then selecting Generate New Token.
Select Execute: expected responses, as well as response codes and descriptions, are documented within the Swagger UI.

Verifying Schemas
To verify your schema has successfully been added, you can use the GET ListAnalysisTypes endpoint found under the Schema dropdown. If updating a pre-existing schema, use the GET GetAnalysisTypeVersion endpoint.

Using the Curl command

The following curl command makes a POST request with the required authorization tokens, headers and data:

```
curl -X POST "http://localhost:8080/schemas" \
  -H "accept: */*" \
  -H "Authorization: Bearer {Insert-API-Key}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "example_demo",
    "schema": {
      "type": "object",
      "required": ["experiment"],
      "properties": {
        "experiment": {
          "type": "object",
          "required": ["experiment_type"],
          "properties": {
            "experiment_type": { "type": "string", "enum": ["WGS", "RNA-Seq"] }
          }
        }
      }
    }
  }'
```

-X POST "http://localhost:8080/schemas" specifies the request method (POST) and the target URL, Song's schemas endpoint.
-H "accept: */*" adds an HTTP header specifying that the client accepts any type of response
-H "Authorization: Bearer {Insert-API-Key}" adds an HTTP header for authorization, with a Bearer token (replace {Insert-API-Key} with the actual API key).
-H "Content-Type: application/json" Adds an HTTP header specifying the content type of the request body as JSON.
-d {...} is the data to be sent with the POST request. This is the JSON payload defining the schema.
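If you prefer scripting the registration in Python, the sketch below builds an equivalent POST request with the standard library's urllib.request. It assumes the QuickStart URL http://localhost:8080 and keeps the {Insert-API-Key} placeholder; the payload mirrors the curl example's schema, with experiment_type declared under properties. The request is only sent if you uncomment the urlopen call.

```python
import json
import urllib.request

SONG_URL = "http://localhost:8080"  # QuickStart default; adjust for your deployment
API_KEY = "{Insert-API-Key}"        # placeholder: replace with your actual API key

new_schema = {
    "name": "example_demo",
    "schema": {
        "type": "object",
        "required": ["experiment"],
        "properties": {
            "experiment": {
                "type": "object",
                "required": ["experiment_type"],
                "properties": {
                    "experiment_type": {"type": "string", "enum": ["WGS", "RNA-Seq"]}
                },
            }
        },
    },
}

# Same method, URL, headers, and JSON body as the curl command.
request = urllib.request.Request(
    f"{SONG_URL}/schemas",
    data=json.dumps(new_schema).encode("utf-8"),
    headers={
        "accept": "*/*",
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Uncomment to send the request to a running Song server:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```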

Useful Links

Here are some resources to help with the creation of new schemas for your projects:

Understanding JSON Schema guide: This comprehensive guide provides detailed information on JSON Schema formatting.
Example schema: For a practical reference, you can examine this sample schema used in CanCOGeN's VirusSeq Portal.
Base schema reference: Song incorporates a base schema that combines with all user schemas. When developing your schemas, it's crucial to reference this base schema to avoid conflicting properties and ensure compatibility with Song's base schema structure.

There's no need to write your JSON Schema manually. Several existing tools can help you format your data efficiently:

For basic schemas, JSONschema.net or Liquid Technologies' Online JSON to Schema Converter are excellent resources. These tools allow you to easily convert JSON to JSON Schema.
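To give a sense of what JSON-to-schema converters do, here is a minimal Python sketch that infers a skeleton schema from one sample document. infer_schema is a hypothetical helper, far simpler than the tools mentioned above: it marks every observed key as required and assumes all array items look like the first one.

```python
import json

# bool must be checked before int, because bool is a subclass of int in Python.
JSON_TYPES = [(bool, "boolean"), (int, "integer"), (float, "number"), (str, "string")]


def infer_schema(value):
    """Infer a skeleton JSON Schema from one decoded sample value."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, dict):
        return {
            "type": "object",
            "required": list(value),  # every observed key becomes required
            "properties": {key: infer_schema(item) for key, item in value.items()},
        }
    if isinstance(value, list):
        # Assumes every item looks like the first; an empty list stays unconstrained.
        return {"type": "array", "items": infer_schema(value[0])} if value else {"type": "array"}
    for python_type, json_type in JSON_TYPES:
        if isinstance(value, python_type):
            return {"type": json_type}
    raise TypeError(f"not a JSON value: {value!r}")


sample = json.loads('{"experiment": "myNewExperiment", "field2": 10.2}')
print(json.dumps({"name": "inferredExample", "schema": infer_schema(sample)}, indent=2))
```

Treat the output as a starting point only: the name inferredExample is made up, and you would still add enum, pattern, and conditional rules by hand and relax any required fields that should be optional.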

What's Next?

With your data model updated, we next need to configure an accurate index mapping to enable our downstream search and portal UI components.
