Real World Testing

Note

The features described on this page are available in the following Firely Server editions:

  • Firely Scale - 🌍 / 🇺🇸

  • Firely CMS Compliance - 🇺🇸

The Real World Testing functionality of Firely Server is designed to fulfill the requirements of the ONC Health IT Certification Program defined by the 21st Century Cures Act. See ONC Health IT Certification Program - Real World Testing Resource Guide for background.

Real World Testing (RWT) is a process for recording and analyzing statistical data about the REST API behaviour of Firely Server. It allows for retrospectively gathering insights into the response codes of FHIR CRUD requests, as well as custom operations. The functionality enables the collection of all statistics needed for the Firely Server Real World Testing Plans.

Technically, Firely Server executes pre-defined queries against a dataset of statistical data stored in an external InfluxDB. The queries are written in Flux and distributed as Library resources in the Firely Server admin db.

Introduction

Firely Server provides an API for executing the mentioned Flux queries remotely against the metrics collected in the InfluxDB backend. The response contains aggregations in a denominator/numerator style, as outlined by the Firely Real World Testing plan. This API allows users with access to the administration endpoint to query PII-free data and gain insights into Firely Server usage.

The operation is based on the Library resource, which must contain a base64 encoded Flux query. This query can include placeholders for parameters that are dynamically replaced when the operation is executed. The RWT operation follows the FHIR Asynchronous Interaction Request Pattern, similar to Bulk Data Export, providing a robust mechanism for handling intensive data processing tasks. Read more about the async request flow.

Note

The Real World Testing operation requires the use and maintenance of externally provided components: InfluxDB, Telegraf and an OpenTelemetry collector. Firely Server does not provide these capabilities out of the box. See InfluxDB OSS, Telegraf OSS and OpenTelemetry Collector OSS for more details.

Configuration

To start using Real World Testing you will first have to add the relevant plugins (Vonk.Plugin.RealWorldTesting) to the PipelineOptions in the appsettings.

"PipelineOptions": {
   "PluginDirectory": "./plugins",
   "Branches": [
     {
       "Path": "/",
       "Include": [
         "Vonk.Core",
         "Vonk.Fhir.R4",
         //"Vonk.Fhir.R5",
         "Vonk.Repository.Sqlite.SqliteVonkConfiguration",
         ...
       ],
       "Exclude": [
         "Vonk.Subscriptions.Administration"
       ]
     },
     {
       "Path": "/administration",
       "Include": [
         "Vonk.Core",
         "Vonk.Fhir.R4",
         "Vonk.Plugin.RealWorldTesting",
         "Vonk.Repository.Sqlite.SqliteTaskConfiguration",
         "Vonk.Repository.Sqlite.SqliteAdministrationConfiguration",
         "Vonk.Plugins.Terminology",
         "Vonk.Administration",
         ...
       ],
       "Exclude": [
         "Vonk.Core.Operations"
       ]
       ... etc

Note

RealWorldTesting works as an asynchronous operation. To store all operation-related information, it is necessary to enable a “Task Repository” on the admin database. Please enable the relevant “Vonk.Repository.[database-type].[database-type]TaskConfiguration” in the administration pipeline options, depending on the database type you use for the admin database. All supported databases can be used as a task repository. In the example above we have enabled the task repository for SQLite: “Vonk.Repository.Sqlite.SqliteTaskConfiguration”.

Please make sure that $realworldtesting and $realworldtestingstatus are enabled in the administration operations in the settings:

{
  "Administration": {
    ...
    "Operations": {
        "$realworldtesting": {
            "Enabled": true
        },
        "$realworldtestingstatus": {
            "Enabled": true
        }
    }
    ...
  }
  ...
}

RWT also needs the connection settings for InfluxDB:

"RealWorldTesting": {
    "InfluxDbOptions": {
        "Host": "https://influxdb-host-url",
        "Bucket": "bucket-name",
        "Token": "bucket-connection-token",
        "Organization": "organization-name"
    }
}

InfluxDB organizes data in buckets and organizations, so the same bucket must be used for writing data to and reading data from the backend. However, it is advised to use tokens with different access rights, since querying data during an RWT operation requires read access only.

In addition, there is the following configuration section for the Real World Testing operation itself:

"RealWorldTesting": {
    "RepeatPeriod": 60000,
    "InfluxDbOptions": {
        // ... see above
    }
}

In RepeatPeriod you can configure the polling interval (in milliseconds) for checking the Task queue for a new operation task.

In addition to the configuration for reading statistics from InfluxDB, OpenTelemetry tracing must be enabled in the appsettings, because the RWT operations rely on the OpenTelemetry traces generated by Firely Server. The endpoint to which the traces are sent must be configured as well:

"OpenTelemetryOptions": {
    "EnableTracing": true,
    "Endpoint": "http://otlp-collector-url:4317"
}

The specified endpoint should point to the gRPC endpoint of the OpenTelemetry collector, which is connected to a Telegraf instance for processing OpenTelemetry traces.

See also

For more details on configuring OpenTelemetry, refer to OpenTelemetry.

As part of the OpenTelemetry collector configuration, one has to (at least) specify:

  • a receiver exposing OTLP over the gRPC protocol,

  • a processor filtering out the liveness and readiness checks from the statistics,

  • a processor selecting the requests,

  • an exporter targeting a Telegraf service,

  • a service connecting the above components.

Below is an example configuration in which the OpenTelemetry collector receives traces from Firely Server and forwards them to Telegraf:

receivers:
  otlp:
    protocols:
      grpc:
  ...

exporters:
  otlp/telegraf:
    endpoint: http://telegraf.influxdb.svc.cluster.local:4311
    tls:
      insecure: true

processors:
  batch: {}
  filter/health: #https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/filterprocessor
    error_mode: ignore
    traces:
      span:
        - 'attributes["url.path"] == "/$$liveness"'
        - 'attributes["url.path"] == "/$$readiness"'
  filter/requestmeter:
    spans:
      include:
        match_type: strict
        attributes:
          - key: "scope"
            value: "request"

service:
  pipelines:
    traces/requestmeter:
      receivers: [otlp]
      exporters: [otlp/telegraf]
      processors: [filter/health, filter/requestmeter, batch]
    ...

Firely Server also requires a specific Telegraf config. In particular,

  • an input corresponding to the output of the OpenTelemetry collector

  • a processor for executing a Starlark script converting traces into metric points

  • an output for sending the metrics to InfluxDB

Below is an example configuration:

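[[inputs.opentelemetry]]
  # Must match the port targeted by the collector's otlp/telegraf exporter
  service_address = "0.0.0.0:4311"
  metrics_schema = "prometheus-v2"
[[processors.starlark]]
  script = "/etc/telegraf/scripts/starlark.star"
[[outputs.influxdb_v2]]
  urls = ["http://influxdb.influxdb.svc.cluster.local:8086"]
  token = "<influxdb-write-token>"
  # Same organization and bucket as configured under InfluxDbOptions
  organization = "organization-name"
  bucket = "bucket-name"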

The script for the Starlark processor should be placed in the specified location and should look like this:

load("json.star", "json")

def apply(metric):
    if "attributes" in metric.fields:
        attrs_json = metric.fields["attributes"]
        attrs = json.decode(attrs_json)

        # if it is a request, move the measurement to the "requests" collection
        if "scope" in attrs and attrs["scope"] == "request":
            metric.name = "requests"
            attrs.pop("scope") # remove scope from attributes
        else:
            return metric #if it is not a request, return the metric as is

        # copy attributes to tags and drop
        for k, v in attrs.items():
            metric.tags[k] = str(v)
        metric.fields.pop("attributes")

        # Collect only duration field and drop the rest
        fields_to_remove = [field for field in metric.fields if field != "duration_nano"]

        # Drop unwanted fields
        for field in fields_to_remove:
            metric.fields.pop(field)
    else:
        return None #if there are no attributes, drop this trace

    return metric

Please ensure that Telegraf forwards all metrics to the same InfluxDB bucket as configured under InfluxDbOptions. After executing any REST API request against Firely Server, the corresponding traces should be visible in InfluxDB.
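To check expectations about the Starlark processor offline, the logic of the script can be restated in plain Python. This is only a sketch of the same steps and not something Telegraf runs. The metric is modelled as a (name, tags, fields) triple, and the sample measurement name, attribute names and field values are hypothetical.

```python
import json


def apply(name, tags, fields):
    """Plain-Python restatement of the Starlark apply() steps."""
    if "attributes" not in fields:
        return None  # no attributes: the trace is dropped
    attrs = json.loads(fields["attributes"])
    if attrs.get("scope") != "request":
        return name, tags, fields  # not a request: returned unchanged
    attrs.pop("scope")
    # Copy the remaining attributes to tags (as strings).
    new_tags = {**tags, **{key: str(value) for key, value in attrs.items()}}
    # Keep only the duration field.
    new_fields = {k: v for k, v in fields.items() if k == "duration_nano"}
    return "requests", new_tags, new_fields


# Hypothetical span as it might arrive from the OpenTelemetry input.
sample_fields = {
    "attributes": json.dumps({"scope": "request", "status_code": 200}),
    "duration_nano": 1500000,
    "span_kind": "SPAN_KIND_SERVER",
}
converted = apply("spans", {}, sample_fields)
print(converted)
```

A span without the request scope passes through untouched, and a metric without an attributes field is dropped, mirroring the two early exits of the script.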

Note

Real World Testing is a powerful feature that requires careful configuration and setup. It is recommended to test your queries and configurations in a staging environment before deploying to production.

Note

To demonstrate the required setup for the RWT feature on a Kubernetes cluster, we have added the required dependencies to the Firely Server helm chart, and the values.yaml contains basic settings for the influxdb2, telegraf and opentelemetry collector charts. However, we highly recommend deploying and configuring InfluxDB, Telegraf and the OpenTelemetry collector independently.

Using Real World Testing

To initiate a Real World Testing operation, construct a request to the administration endpoint with the necessary parameters, such as the URL of the Library resource containing the query, and any additional parameters specified within the Library resource. For example:

GET {{BASE_URL}}/administration/$realworldtesting?url=https://fire.ly/fhir/Library/rwt-all-requests&from=2024-03-18T14:34:16.772Z&to=2024-03-18T14:34:52.453Z

Alternatively, a POST request can be executed. In that case the query parameters are passed as a Parameters resource in the request body:

POST {{BASE_URL}}/administration/$realworldtesting
{
    "resourceType": "Parameters",
    "parameter": [
        {
            "name": "url",
            "valueUri": "https://fire.ly/fhir/Library/rwt-all-requests"
        },
        {
            "name": "from",
            "valueDateTime": "2024-03-18T14:34:16.772Z"
        },
        {
            "name": "to",
            "valueDateTime": "2024-03-18T14:34:52.453Z"
        }
    ]
}

This request triggers the execution of the specified Flux query against the InfluxDB dataset, with the provided parameters dynamically injected into the query.
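Both request styles can be prepared programmatically. The following Python sketch only builds the URL and the request body; it does not send anything. The helper names and the base URL https://server.example.org are hypothetical, and treating every extra parameter as a dateTime is an assumption taken from the example above.

```python
from urllib.parse import urlencode


def build_rwt_get_url(base_url, library_url, query_params):
    """Return the kick-off URL for a GET $realworldtesting request."""
    params = {"url": library_url, **query_params}
    # urlencode percent-encodes reserved characters such as ':' and '/'.
    query = urlencode(params)
    return f"{base_url.rstrip('/')}/administration/$realworldtesting?{query}"


def build_rwt_parameters(library_url, query_params):
    """Return the Parameters resource for a POST $realworldtesting request."""
    parameters = [{"name": "url", "valueUri": library_url}]
    for name, value in query_params.items():
        # Assumption: every extra parameter is a dateTime, as in the example.
        parameters.append({"name": name, "valueDateTime": value})
    return {"resourceType": "Parameters", "parameter": parameters}


period = {"from": "2024-03-18T14:34:16.772Z", "to": "2024-03-18T14:34:52.453Z"}
library = "https://fire.ly/fhir/Library/rwt-all-requests"

get_url = build_rwt_get_url("https://server.example.org", library, period)
post_body = build_rwt_parameters(library, period)
print(get_url)
```

Percent-encoding the query string avoids ambiguity when a parameter value itself contains characters such as ':' or '/'.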

Operation Response

Upon successful initiation, the operation returns a 202 status code with a Content-Location header pointing to a status endpoint where the operation’s progress and results can be monitored:

{{BASE_URL}}/administration/$realworldtestingstatus?_id=7e700b18-d8b0-40da-8deb-f6d1d6a51b23

There are six possible status options:

  1. Queued

  2. Active

  3. Complete

  4. Failed

  5. CancellationRequested

  6. Cancelled

  • If a task is Queued or Active, GET $realworldtestingstatus returns the status in the X-Progress header.

  • If a task is Complete, GET $realworldtestingstatus returns the results in a result Bundle (see the example below).

  • If a task is Failed, GET $realworldtestingstatus returns HTTP status code 500 with an OperationOutcome.

  • If a task is CancellationRequested or Cancelled, GET $realworldtestingstatus returns HTTP status code 410 (Gone).

{
    "resourceType": "Bundle",
    "type": "batch-response",
    "entry": [
        {
            "response": {
                "status": "200 OK",
                "location": "{{BASE_URL}}/administration/$realworldtesting?url=https://fire.ly/fhir/Library/rwt-all-requests&from=2024-03-18T14:34:16.772Z&to=2024-03-18T14:34:52.453Z"
            },
            "resource": {
                "resourceType": "Parameters",
                "parameter": [
                    {
                        "name": "value",
                        "valueInteger": 42
                    }
                ]
            }
        }
    ]
}
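A client typically polls the status endpoint until the task leaves the Queued or Active state. The Python sketch below shows one way to interpret the responses described above. It assumes that an in-progress task answers with HTTP 202 and a completed task with HTTP 200, as is usual for the FHIR asynchronous pattern; this page only states the 500 and 410 codes explicitly. The fetch callable is a hypothetical stand-in for the actual HTTP call and must return a (status code, headers, body) tuple.

```python
import time


def interpret_status(status_code, headers, body=None):
    """Map a $realworldtestingstatus response to a (state, detail) pair."""
    if status_code == 202:  # assumption: Queued or Active
        return "in-progress", headers.get("X-Progress")
    if status_code == 200:  # assumption: Complete, body is the result Bundle
        values = [
            parameter[key]
            for entry in (body or {}).get("entry", [])
            for parameter in entry.get("resource", {}).get("parameter", [])
            for key in parameter
            if key.startswith("value")
        ]
        return "complete", values
    if status_code == 410:  # CancellationRequested or Cancelled
        return "cancelled", None
    if status_code == 500:  # Failed, body is an OperationOutcome
        return "failed", body
    return "unexpected", status_code


def wait_for_result(fetch, interval_seconds=5.0, max_attempts=20):
    """Call fetch() until the task is no longer in progress."""
    for _ in range(max_attempts):
        state, detail = interpret_status(*fetch())
        if state != "in-progress":
            return state, detail
        time.sleep(interval_seconds)
    return "timeout", None


# Canned responses instead of real HTTP calls, to show the flow.
responses = iter([
    (202, {"X-Progress": "Active"}, None),
    (200, {}, {"resourceType": "Bundle", "entry": [{"resource": {
        "resourceType": "Parameters",
        "parameter": [{"name": "value", "valueInteger": 42}]}}]}),
])
outcome = wait_for_result(lambda: next(responses), interval_seconds=0)
print(outcome)
```

Passing the HTTP call in as a function keeps the polling logic independent of any particular HTTP client library.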

Default RWT metrics

By default, the admin db of Firely Server contains the following Library resources with Flux queries:

This metric reports the total number of requests per custom operation.

This metric reports the total number of requests over all REST API interactions.

Library Resource Requirements

To evaluate additional statistics, you can create custom Flux queries stored within Library resources. The following requirements must be met:

  • The Library resource must be a valid FHIR Library resource according to the specification.

  • The content.data element must contain the base64-encoded Flux query to be executed against InfluxDB.

  • The parameter element may contain one or more ParameterDefinition values. The following ParameterDefinition types are allowed: string, integer, decimal, date, dateTime. These definitions describe the parameters that are expected as placeholders in the Flux query and that must be supplied in the $realworldtesting operation request.

Note

The Flux query in the Library resource must be designed to return a single numeric value. Ensure that your query aggregates or processes the data accordingly. Keep in mind that the Library needs to be added to the administration database.
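Some of these requirements can be checked before uploading a Library. The following Python sketch is a hypothetical pre-flight check, not a Firely Server feature. It verifies the allowed parameter types, that content.data decodes as base64 text, and that every {name} placeholder has a matching parameter definition. The placeholder detection is a simple regular-expression heuristic, and full FHIR validation is out of scope.

```python
import base64
import re

ALLOWED_TYPES = {"string", "integer", "decimal", "date", "dateTime"}
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def check_library(library):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if library.get("resourceType") != "Library":
        problems.append("resourceType must be 'Library'")
    defined = set()
    for definition in library.get("parameter", []):
        name, kind = definition.get("name"), definition.get("type")
        defined.add(name)
        if kind not in ALLOWED_TYPES:
            problems.append(f"parameter '{name}' has unsupported type '{kind}'")
    contents = library.get("content", [])
    if not contents or "data" not in contents[0]:
        problems.append("content.data is missing")
        return problems
    try:
        raw = base64.b64decode(contents[0]["data"], validate=True)
        query = raw.decode("utf-8")
    except ValueError:  # covers invalid base64 and invalid UTF-8
        problems.append("content.data is not base64-encoded UTF-8 text")
        return problems
    for name in sorted(set(PLACEHOLDER.findall(query)) - defined):
        problems.append(f"placeholder '{{{name}}}' has no matching parameter definition")
    return problems


query = 'from(bucket: "{bucket}") |> range(start: {from}, stop: {to}) |> count()'
library = {
    "resourceType": "Library",
    "parameter": [
        {"name": "from", "type": "dateTime"},
        {"name": "bucket", "type": "boolean"},  # deliberately wrong type
    ],
    "content": [{"data": base64.b64encode(query.encode("utf-8")).decode("ascii")}],
}
problems = check_library(library)
for problem in problems:
    print(problem)
```

In this deliberately broken sample, the check reports the unsupported boolean type and the undefined {to} placeholder.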

An example Library can be found below:

{
    "id": "rwt-all-requests",
    "resourceType": "Library",
    "type": {
        "coding": [
            {
                "system": "http://terminology.hl7.org/CodeSystem/library-type",
                "code": "logic-library",
                "display": "Logic Library"
            }
        ]
    },
    "url": "https://fire.ly/fhir/Library/rwt-all-requests",
    "version": "1.0.0",
    "name": "rwt-get-all-requests",
    "title": "RWT All requests",
    "subtitle": "RWT query to collect all requests for a specific period of time",
    "status": "active",
    "experimental": true,
    "date": "2024-03-05T00:00:00+00:00",
    "publisher": "Firely",
    "description": "RWT query to collect all requests for a specific period of time from InfluxDb",
    "copyright": "Firely",
    "parameter": [
        {
            "name": "from",
            "use": "in",
            "min": 1,
            "max": "1",
            "type": "dateTime",
            "documentation": "Start date of the period to be queried"
        },
        {
            "name": "to",
            "use": "in",
            "min": 1,
            "max": "1",
            "type": "dateTime",
            "documentation": "End date of the period to be queried"
        },
        {
            "name": "bucket",
            "use": "in",
            "min": 1,
            "max": "1",
            "type": "string",
            "documentation": "InfluxDb bucket to be queried"
        }
    ],
    "content": [
        {
            "contentType": "text/plain",
            "title": "Get all requests query",
            "data": "ZnJvbShidWNrZXQ6ICJ7YnVja2V0fSIpCiAgfD4gcmFuZ2Uoc3RhcnQ6IHtmcm9tfSwgc3RvcDoge3RvfSkKICB8PiBmaWx0ZXIoZm46IChyKSA9PiByWyJfbWVhc3VyZW1lbnQiXSA9PSAicmVxdWVzdHMiKQogIHw+IGNvdW50KCkKICB8PiBncm91cCgpCiAgfD4gc3VtKCk="
        }
    ]
}
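The data element holds the Flux query as base64 text. A minimal Python sketch for producing and inspecting such a value is shown below; the helper names are hypothetical, and only the first line of the example query is encoded to keep the output short.

```python
import base64


def encode_query(flux_query):
    """Return the base64 text to store in Library.content.data."""
    return base64.b64encode(flux_query.encode("utf-8")).decode("ascii")


def decode_query(data):
    """Return the Flux query text stored in Library.content.data."""
    return base64.b64decode(data).decode("utf-8")


first_line = 'from(bucket: "{bucket}")'
encoded = encode_query(first_line)
print(encoded)
print(decode_query(encoded))
```

Decoding the data value of an existing Library in the same way is a quick method to review which query and placeholders it actually contains.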

Inserting Request Data Into Flux Query

In addition to the general Flux syntax rules, there is one rule for injecting $realworldtesting operation parameters into a query: a name enclosed in curly braces is treated as a placeholder and is replaced with the value of the corresponding query parameter from the $realworldtesting request.

Here is an example of a complete Flux query containing the placeholder parameters {bucket}, {from} and {to}:

from(bucket: "{bucket}")
|> range(start: {from}, stop: {to})
|> filter(fn: (r) => r["_measurement"] == "requests")
|> count()
|> group()
|> sum()

The {bucket} placeholder is special, because it is used to inject the bucket value from the appsettings. Keep this in mind when using it in a query. Placeholder parameters are replaced if:

  1. The Library resource defines parameters with the same names as a placeholder name (text in between opening and closing curly braces)

  2. $realworldtesting request supplies those parameters

Note

There are some restrictions for the parameter values that can be injected. Currently , , |, >, (, ), are not allowed symbols, and the $realworldtesting operation request will return HTTP 400 (BadRequest) if any of those symbols are present.
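The substitution and the restriction on parameter values can be illustrated with a short Python sketch. This is not the Firely Server implementation. The set of rejected symbols is an assumption based on the note above, and the bucket value is passed in by hand here, whereas the server takes it from the appsettings.

```python
import re

# Assumption: symbols rejected in parameter values, based on the note above.
FORBIDDEN = set(",|>()")


def inject(query, params):
    """Replace {name} placeholders in a Flux query with parameter values."""
    for name, value in params.items():
        bad = FORBIDDEN.intersection(str(value))
        if bad:
            # The server answers such a request with HTTP 400 (BadRequest).
            raise ValueError(f"parameter '{name}' contains {sorted(bad)}")

    def replace(match):
        name = match.group(1)
        # Placeholders without a supplied value are left untouched.
        return str(params[name]) if name in params else match.group(0)

    return re.sub(r"\{(\w+)\}", replace, query)


query = 'from(bucket: "{bucket}") |> range(start: {from}, stop: {to})'
params = {
    "bucket": "bucket-name",
    "from": "2024-03-18T14:34:16.772Z",
    "to": "2024-03-18T14:34:52.453Z",
}
result = inject(query, params)
print(result)
```

Rejecting symbols such as | and > before substitution prevents a parameter value from appending extra pipeline stages to the query.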