APIs

Datalake is an accelerator specific to power utilities. When enabled, it can be accessed via the menu button in the top-left corner of the Utilihive Console.

Utilihive Datalake exposes a set of standard APIs.

Main HTTP API

The main HTTP API is JSON REST and primarily implements inbound endpoints accepting CloudEvents messages. The API implements endpoints for structured-mode messages, message batches and binary-mode messages for all payload types.

The OpenAPI specification is served at /openapi.json and fully documents all data structures.
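As a rough illustration of the three message shapes mentioned above, the following sketch builds request headers and bodies for structured-mode, batch, and binary-mode CloudEvents 1.0 messages. The content types and the `ce-` header prefix come from the CloudEvents HTTP binding; the source, type, and payload values used with it are hypothetical, and the actual endpoint paths should be taken from /openapi.json.

```python
import json
import uuid
from datetime import datetime, timezone


def make_event(source: str, event_type: str, data: dict) -> dict:
    """Build a CloudEvents 1.0 envelope.

    specversion, id, source and type are the required attributes;
    time and datacontenttype are optional.
    """
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": data,
    }


def structured_request(event: dict) -> tuple:
    """Structured mode: the whole event travels in the JSON body."""
    headers = {"Content-Type": "application/cloudevents+json"}
    return headers, json.dumps(event).encode("utf-8")


def batch_request(events: list) -> tuple:
    """Batch mode: a JSON array of structured events."""
    headers = {"Content-Type": "application/cloudevents-batch+json"}
    return headers, json.dumps(events).encode("utf-8")


def binary_request(event: dict) -> tuple:
    """Binary mode: attributes become ce-* headers, data becomes the body."""
    headers = {
        f"ce-{name}": str(value)
        for name, value in event.items()
        if name not in ("data", "datacontenttype")
    }
    headers["Content-Type"] = event.get("datacontenttype", "application/json")
    return headers, json.dumps(event["data"]).encode("utf-8")
```

Each helper returns a `(headers, body)` pair that can be handed to any HTTP client, which keeps the sketch independent of a particular endpoint or library.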

Authentication

Inbound message endpoints are intended for machine users, such as integrations, and use HTTP Basic authentication. Other endpoint groups, such as endpoints for operating workflows, accept Identity JWT authentication.
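The two authentication styles can be sketched as request headers. The Basic header follows the standard HTTP Basic scheme. Sending the JWT with the `Bearer` scheme is an assumption here, since this page only states that Identity JWTs are accepted; the credentials and token are placeholders.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """HTTP Basic: base64 of 'username:password' in the Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def jwt_auth_header(token: str) -> dict:
    """Assumption: the Identity JWT is sent using the Bearer scheme."""
    return {"Authorization": f"Bearer {token}"}
```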

GraphQL API

The GraphQL API is the main query interface and is intended for client applications.

It exposes queries over two graphs, one for each distinct set of data models:

  1. One graph spanned by the core data models used in standard message payloads and database schemas.

  2. One graph spanned by the domain models, typically the CIM profile for the instance.

The Datalake tab in the Utilihive console provides a built-in GraphiQL IDE for writing GraphQL queries and for viewing documentation over all types, fields, queries and arguments.
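Outside the GraphiQL IDE, the same documentation can usually be retrieved with a standard GraphQL introspection query. The sketch below only builds the JSON request body in the common GraphQL-over-HTTP shape. It assumes that introspection is enabled on the instance, and it leaves the endpoint URL out because this page does not state it.

```python
import json

# Standard introspection query listing the available root queries.
INTROSPECTION_QUERY = """
query ListQueries {
  __schema {
    queryType {
      fields {
        name
        description
        args { name }
      }
    }
  }
}
"""


def graphql_request_body(query: str, variables=None, operation_name=None) -> bytes:
    """Encode a query as the conventional {"query", "variables", "operationName"} JSON body."""
    payload = {"query": query}
    if variables:
        payload["variables"] = variables
    if operation_name:
        payload["operationName"] = operation_name
    return json.dumps(payload).encode("utf-8")
```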

Authentication

In addition to HTTP Basic users accepted by the main API, the GraphQL API accepts Identity JWT authentication.

Authorization

Authorization is based on whitelisting Identity organizations and roles. When a GraphQL query is requested by an authenticated Identity user, the server checks that the user matches at least one whitelisting rule for the given query.

HTTP Basic users are authorized for all queries.
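The check described above can be sketched as follows. The `Rule` and `User` shapes are hypothetical and not the Datalake configuration format, and the matching semantics (same organization and at least one shared role) are an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule shape: one organization plus the roles it whitelists.
    organization: str
    roles: frozenset


@dataclass(frozen=True)
class User:
    organization: str
    roles: frozenset
    basic_auth: bool = False  # True for HTTP Basic (machine) users


def is_authorized(user: User, query_rules: list) -> bool:
    """Return True if the user may run a query guarded by query_rules."""
    if user.basic_auth:
        return True  # HTTP Basic users are authorized for all queries
    # An Identity user needs to match at least one whitelisting rule.
    return any(
        user.organization == rule.organization and bool(user.roles & rule.roles)
        for rule in query_rules
    )
```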

Source compositions

To enable queries that combine data from multiple source systems, Datalake can be configured with source compositions: tree structures of sources in which every leaf maps directly to the source field of CloudEvents messages.

These source compositions may include their own set of whitelisting rules, which cascade and apply to the whole subtree. For any query that fetches data bound to sources, these additional authorization rules apply.

A typical configuration scenario is to authorize a set of roles for a customer organization by default and authorize tenant organizations to access a restricted subset of sources.
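The scenario above can be sketched with a small tree. The `SourceNode` structure, the source names, and the rule format are all hypothetical; the sketch also assumes that a rule matched on a node grants access to every leaf beneath it, which is one plausible reading of "cascade".

```python
from dataclasses import dataclass, field


@dataclass
class SourceNode:
    # Hypothetical node shape; leaves carry a CloudEvents source value as name.
    name: str
    rules: list = field(default_factory=list)  # (organization, role) pairs
    children: list = field(default_factory=list)


def allowed_sources(node, organization, roles, inherited=False):
    """Collect the leaf sources the user may read.

    A rule matched on a node is assumed to apply to its whole subtree.
    """
    granted = inherited or any(
        org == organization and role in roles for org, role in node.rules
    )
    if not node.children:
        return [node.name] if granted else []
    leaves = []
    for child in node.children:
        leaves.extend(allowed_sources(child, organization, roles, granted))
    return leaves


# Customer roles are whitelisted at the root; a tenant only for one subtree.
tree = SourceNode("all", rules=[("customer", "reader")], children=[
    SourceNode("region-a", rules=[("tenant-a", "reader")], children=[
        SourceNode("urn:example:scada-a"),
        SourceNode("urn:example:ami-a"),
    ]),
    SourceNode("urn:example:scada-b"),
])
```

With this tree, a `reader` in the customer organization sees all three leaves, while a `reader` in `tenant-a` sees only the two leaves under `region-a`.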

Datalake Flows

Datalake maintains a set of standard integrations for flow-server.

Previous Datalake API versions

Previous Datalake API versions are implemented as flows for backwards compatibility with legacy clients and applications.

Mapping of Legacy CIM Messages

Previous Datalake versions include data models based on the IEC 61968-9 "Interface Standard for Meter Reading and Control" standard. The flows for previous API versions simply transform these legacy CIM messages to CloudEvents format, retaining the payload and setting a CloudEvents type corresponding to the type of the CIM message.

The mapping from the payload to the core data models of the current version is delegated to specific payload handlers for the given type. In general, these mappings output the same data structures as for a CIM RDF/XML FullModel, and the data is queryable with the same semantics.

The master resource identifier (mRID) of the legacy data models is handled in a special way in order to ensure a legal RDF identifier (entityId). Although CIM strongly recommends using a UUID for the mRID, and also states that the mRID should be mapped directly to the RDF identifier, this is not always followed in practice. Therefore, if the mRID is not already a UUID, the mappings will construct and assign a UUID v5 as entityId using the "null" UUID as namespace (00000000-0000-0000-0000-000000000000). The original mRID is still retained and represented as any other value. In RDF terms, such CIM objects lacking a strongly defined identifier may be regarded as blank nodes, and the process of minting a new, globally unique identifier is referred to as skolemization.

Datalake clients that use both the legacy CIM API and the CloudEvents API must apply the same skolemization of mRIDs when assigning entityId. Flows may use the uh:uuid5() function available in the JSONiq data mapper.
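For clients outside flow-server, the same rule can be sketched with Python's standard `uuid` module. The function name `entity_id` is hypothetical. The sketch assumes the mRID string is used as-is as the UUID v5 name, and it relies on `uuid.UUID` to decide what counts as a UUID, which is more lenient than a strict canonical-form check; both points should be verified against the output of uh:uuid5() before relying on it.

```python
import uuid

# The "null" UUID used as the UUID v5 namespace.
NULL_NAMESPACE = uuid.UUID("00000000-0000-0000-0000-000000000000")


def entity_id(mrid: str) -> str:
    """Return the mRID itself if it parses as a UUID, else a UUID v5 derived from it."""
    try:
        # Note: uuid.UUID also accepts variants such as upper case or
        # missing hyphens and normalizes them to canonical form.
        return str(uuid.UUID(mrid))
    except ValueError:
        # Skolemization: mint a deterministic identifier from the mRID.
        return str(uuid.uuid5(NULL_NAMESPACE, mrid))
```

Because UUID v5 is deterministic, the same non-UUID mRID always yields the same entityId, which is what allows the legacy CIM API and the CloudEvents API to refer to the same object.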

CIM RDF/XML API

Datalake includes flows exposing a message interface according to the IEC 61970-552 specification commonly used for CIM model exchange in the energy domain.

The flows expose a generic restApi that accepts the CIM RDF/XML message, map the data to a CloudEvents message of type BatchStateChangeEvent using the md:Model.scenarioTime from the model header as the effective timestamp for the batch of state changes, and finally post the message to the main HTTP API.
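To make the role of the model header concrete, the following sketch reads md:Model.scenarioTime from a CIM RDF/XML document with the Python standard library. It only illustrates where the effective timestamp comes from; the function name is hypothetical and this is not the flow's actual implementation.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

MD = "{http://iec.ch/TC57/61970-552/ModelDescription/1#}"


def scenario_time(rdf_xml: str) -> datetime:
    """Return md:Model.scenarioTime from the model header as a UTC datetime."""
    root = ET.fromstring(rdf_xml)
    element = next(root.iter(MD + "Model.scenarioTime"), None)
    if element is None or not element.text:
        raise ValueError("model header has no md:Model.scenarioTime")
    # Older Python versions do not parse a trailing 'Z', so normalize it first.
    parsed = datetime.fromisoformat(element.text.strip().replace("Z", "+00:00"))
    return parsed.astimezone(timezone.utc)
```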

FullModel flow

The following shows an example CIM RDF/XML FullModel message:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:md="http://iec.ch/TC57/61970-552/ModelDescription/1#"
         xmlns:cim="http://iec.ch/TC57/2015/CIM-schema-cim15#">
  <md:FullModel rdf:about="_f92c6c2c-59c5-4a53-8f86-2053dfe710ed">
    <md:Model.scenarioTime>2024-05-01T09:50:37.000Z</md:Model.scenarioTime>
    <md:Model.created>2024-04-30T15:33:31.000Z</md:Model.created>
    <md:Model.description>Example model export</md:Model.description>
    <md:Model.version>1</md:Model.version>
    <md:Model.profile>http://example.com/CIM/Assets/1/5</md:Model.profile>
    <md:Model.modelingAuthoritySet/>
  </md:FullModel>
  <cim:Substation rdf:ID="_07937ca9-dfd7-4904-9a2d-1b7388162fab">
    <cim:IdentifiedObject.mRID>07937ca9-dfd7-4904-9a2d-1b7388162fab</cim:IdentifiedObject.mRID>
    <cim:IdentifiedObject.name>Example</cim:IdentifiedObject.name>
    <cim:Substation.r0equivalent>0</cim:Substation.r0equivalent>
    <cim:Substation.r1equivalent>0</cim:Substation.r1equivalent>
    <cim:Substation.r2equivalent>0</cim:Substation.r2equivalent>
    <cim:Substation.x0equivalent>0</cim:Substation.x0equivalent>
    <cim:Substation.x1equivalent>0</cim:Substation.x1equivalent>
    <cim:Substation.x2equivalent>0</cim:Substation.x2equivalent>
    <cim:PowerSystemResource.Location rdf:resource="#_a77aa2f6-6d5c-4146-8b5b-59cba34ed558"/>
    <cim:PowerSystemResource.PermissionArea rdf:resource="#_535510ac-f44a-4f3f-b4de-237ce291cba2"/>
  </cim:Substation>
  <cim:Location rdf:ID="_a77aa2f6-6d5c-4146-8b5b-59cba34ed558">
    <cim:IdentifiedObject.mRID>a77aa2f6-6d5c-4146-8b5b-59cba34ed558</cim:IdentifiedObject.mRID>
    <cim:Location.mainAddress>
      <cim:StreetAddress>
        <cim:StreetAddress.streetDetail>
          <cim:StreetDetail>
            <cim:StreetDetail.addressGeneral>Example street</cim:StreetDetail.addressGeneral>
          </cim:StreetDetail>
        </cim:StreetAddress.streetDetail>
      </cim:StreetAddress>
    </cim:Location.mainAddress>
    <cim:Location.CoordinateSystem rdf:resource="#_28f42a62-594d-49a1-8d37-2be180751e50"/>
  </cim:Location>
  <cim:PermissionArea rdf:ID="_535510ac-f44a-4f3f-b4de-237ce291cba2">
    <cim:IdentifiedObject.mRID>535510ac-f44a-4f3f-b4de-237ce291cba2</cim:IdentifiedObject.mRID>
    <cim:IdentifiedObject.name>Example</cim:IdentifiedObject.name>
  </cim:PermissionArea>
  <cim:CoordinateSystem rdf:ID="_28f42a62-594d-49a1-8d37-2be180751e50">
    <cim:IdentifiedObject.mRID>28f42a62-594d-49a1-8d37-2be180751e50</cim:IdentifiedObject.mRID>
    <cim:CoordinateSystem.crsUrn>urn:ogc:def:crs:OGC:1.3:CRS84</cim:CoordinateSystem.crsUrn>
  </cim:CoordinateSystem>
  <cim:PositionPoint rdf:ID="_f3c41e40-0f45-4bcd-a029-1fee5b296de6">
    <cim:PositionPoint.sequenceNumber>0</cim:PositionPoint.sequenceNumber>
    <cim:PositionPoint.xPosition>10.0</cim:PositionPoint.xPosition>
    <cim:PositionPoint.yPosition>60.0</cim:PositionPoint.yPosition>
    <cim:PositionPoint.Location rdf:resource="#_a77aa2f6-6d5c-4146-8b5b-59cba34ed558"/>
  </cim:PositionPoint>
</rdf:RDF>

Posting the message to the endpoint for the FullModel flow will populate the columnar database as follows:

  • All literal properties are represented as separate rows in the valueState table with a string-encoded value.

  • Compound objects, such as Location.mainAddress, are also represented by a valueState row, but with a JSON-encoded value.

  • Links between objects, defined by elements with the rdf:resource attribute, are represented as a row in the linkState table.

  • PositionPoint elements are handled in a special way: Points are grouped by location so that the whole geometry of each location is represented as a row in the geoValueState table.
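The classification rules in the list above can be sketched as a small parser. The tuples it returns are hypothetical stand-ins for rows in the valueState, linkState, and geoValueState tables; the real column layout and encodings are not described on this page.

```python
import json
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
MD = "{http://iec.ch/TC57/61970-552/ModelDescription/1#}"


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.split("}", 1)[-1]


def ref(value: str) -> str:
    """Turn '_<uuid>' or '#_<uuid>' into the bare identifier."""
    return value.lstrip("#_")


def to_plain(elem):
    """Convert a nested element into dicts keyed by local names."""
    if len(elem) == 0:
        return elem.text
    return {local(child.tag): to_plain(child) for child in elem}


def map_full_model(rdf_xml: str):
    """Split a FullModel into (value rows, link rows, geometries per location)."""
    values, links, points = [], [], defaultdict(list)
    for obj in ET.fromstring(rdf_xml):
        if obj.tag.startswith(MD):
            continue  # the model header carries metadata, not object state
        entity = ref(obj.get(RDF + "ID", ""))
        if local(obj.tag) == "PositionPoint":
            props = {local(p.tag): p for p in obj}
            location = ref(props["PositionPoint.Location"].get(RDF + "resource"))
            points[location].append((
                int(props["PositionPoint.sequenceNumber"].text),
                float(props["PositionPoint.xPosition"].text),
                float(props["PositionPoint.yPosition"].text),
            ))
            continue
        for prop in obj:
            target = prop.get(RDF + "resource")
            if target is not None:  # link between objects
                links.append((entity, local(prop.tag), ref(target)))
            elif len(prop) > 0:  # compound object, JSON-encoded
                values.append((entity, local(prop.tag), json.dumps(to_plain(prop))))
            else:  # literal property, string-encoded
                values.append((entity, local(prop.tag), prop.text or ""))
    # One geometry per location, with points ordered by sequence number.
    geometries = {
        loc: [(x, y) for _, x, y in sorted(pts)] for loc, pts in points.items()
    }
    return values, links, geometries
```

Applied to the example message above, this would yield literal rows such as the substation name, one JSON-encoded row for Location.mainAddress, link rows for each rdf:resource reference, and a single geometry for the location.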

Given a configured CIM profile for the Datalake instance, the GraphQL API queries the database in a generic fashion and maps it back onto the objects defined by the profile.

If a complete FullModel file is too big for flow-server’s memory, it can simply be split up into multiple files. Although the RDF identifiers use local URI syntax, the inserts are independent of the existing state of the database. The only constraint when splitting a FullModel into multiple files is that all PositionPoints linked to the same Location must reside in the same file.

DifferenceModel flow

The following shows an example CIM RDF/XML DifferenceModel message:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:md="http://iec.ch/TC57/61970-552/ModelDescription/1#"
         xmlns:dm="http://iec.ch/TC57/61970-552/DifferenceModel/1#"
         xmlns:cim="http://iec.ch/TC57/2015/CIM-schema-cim15#">
  <md:DifferenceModel rdf:about="_f92c6c2c-59c5-4a53-8f86-2053dfe710ed">
    <md:Model.scenarioTime>2024-05-05T15:32:11.000Z</md:Model.scenarioTime>
    <md:Model.created>2024-05-04T23:13:31.000Z</md:Model.created>
    <md:Model.description>Example model update</md:Model.description>
    <md:Model.version>2</md:Model.version>
    <md:Model.profile>http://example.com/CIM/Assets/1/5</md:Model.profile>
    <md:Model.modelingAuthoritySet/>
    <dm:reverseDifferences rdf:parseType="Statements">
      <cim:Substation rdf:ID="_07937ca9-dfd7-4904-9a2d-1b7388162fab">
        <cim:IdentifiedObject.name>Example</cim:IdentifiedObject.name>
        <cim:PowerSystemResource.PermissionArea rdf:resource="#_535510ac-f44a-4f3f-b4de-237ce291cba2"/>
      </cim:Substation>
      <cim:PermissionArea rdf:ID="_535510ac-f44a-4f3f-b4de-237ce291cba2">
        <cim:IdentifiedObject.mRID>535510ac-f44a-4f3f-b4de-237ce291cba2</cim:IdentifiedObject.mRID>
        <cim:IdentifiedObject.name>Example</cim:IdentifiedObject.name>
      </cim:PermissionArea>
    </dm:reverseDifferences>
    <dm:forwardDifferences rdf:parseType="Statements">
      <cim:Substation rdf:ID="_07937ca9-dfd7-4904-9a2d-1b7388162fab">
        <cim:IdentifiedObject.name>Updated example</cim:IdentifiedObject.name>
        <cim:PowerSystemResource.PermissionArea rdf:resource="#_b48a8c1c-ee0b-4ea5-8134-41d6da3ab027"/>
      </cim:Substation>
      <cim:PermissionArea rdf:ID="_b48a8c1c-ee0b-4ea5-8134-41d6da3ab027">
        <cim:IdentifiedObject.mRID>b48a8c1c-ee0b-4ea5-8134-41d6da3ab027</cim:IdentifiedObject.mRID>
        <cim:IdentifiedObject.name>New permission area example</cim:IdentifiedObject.name>
      </cim:PermissionArea>
    </dm:forwardDifferences>
  </md:DifferenceModel>
</rdf:RDF>

This example effectively updates the substation with a new permission area and a new name.

Internally in the database, the reverseDifferences are nulled (not to be confused with soft deletes) at the effective timestamp defined by the md:Model.scenarioTime property in order to store the history of states in a time-invariant way. For more information on the mechanism for nulling states, see the Persistence section of this documentation.

The forwardDifferences are mapped the same way as FullModel objects. Updates, such as the substation’s IdentifiedObject.name, are collapsed so that only the forward difference state is passed on to the message that is formed and sent to the main HTTP API.

Note that to fully null an object, the reverseDifferences must include all attributes of the object, as encouraged by the specification. Deletes do not cascade to other objects.
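The collapsing of reverse and forward differences can be sketched with plain dictionaries. The keying by `(entityId, property)` and the use of `None` for a nulled state are assumptions made for illustration, not the actual message format, and the identifiers are shortened placeholders for the UUIDs in the example.

```python
def collapse_differences(reverse: dict, forward: dict) -> dict:
    """Combine reverse and forward differences into one set of state changes.

    Both inputs map (entity_id, property) to a value. A property that is
    only in the reverse differences is nulled (None); a property that is
    also in the forward differences keeps only the forward value.
    """
    changes = {key: None for key in reverse if key not in forward}
    changes.update(forward)
    return changes


# Placeholder identifiers standing in for the UUIDs of the example message.
reverse = {
    ("substation", "IdentifiedObject.name"): "Example",
    ("substation", "PowerSystemResource.PermissionArea"): "area-old",
    ("area-old", "IdentifiedObject.mRID"): "area-old",
    ("area-old", "IdentifiedObject.name"): "Example",
}
forward = {
    ("substation", "IdentifiedObject.name"): "Updated example",
    ("substation", "PowerSystemResource.PermissionArea"): "area-new",
    ("area-new", "IdentifiedObject.mRID"): "area-new",
    ("area-new", "IdentifiedObject.name"): "New permission area example",
}
changes = collapse_differences(reverse, forward)
```

In this sketch the old permission area ends up fully nulled only because every one of its attributes appears in the reverse differences, which mirrors the note above.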