Data Models

Datalake is an accelerator specific to power utilities. When enabled, it can be accessed via the menu button in the top-left corner of the Utilihive Console.

Datalake favors generic data models and open-schema principles in order to serve a wide variety of use cases. The core data models are built on entity-attribute-value (EAV) principles, embrace Resource Description Framework (RDF) semantics, and are designed to keep persisted data compliant with common semantic technologies and data formats. The message envelope and the various interfaces also follow well-defined industry standards. By employing data models built on these design principles, Datalake seeks to enable machine understanding and to ease integration to and from other systems.

Datalake also allows plugging in custom domain models (such as CIM profiles) and can project the persisted data onto these models at query time, enabling users and clients to consume their custom models directly over the APIs without further mappings or transformations.

All data models are fully documented, either by OpenAPI specifications or by a GraphQL schema, available through the corresponding API that exposes them.

Base Data Structure for Time Series

All data entities in Datalake are defined with respect to time; in other words, as time series.

A series, defined as a sequence of observations indexed by time, is keyed on a source and a seriesId, which is a composite of entityType, attribute, and entityId.

The source is a URI-reference identifying the context in which the series of observations was produced, representing the logical partition of the series. This would typically include information about the system or application producing the data, but may also be used to represent multiple versions of the same series, for instance by using the fragment part of the URI.

The seriesId is analogous to a class instance, with entityType, attribute, and entityId corresponding to the class name, field, and object id, respectively. Hence, the entity type, attribute, and entity id further specify what has been observed and by which particular entity or object.

A single observation row is thus uniquely keyed on source, seriesId, and time, and is represented by a common data model.
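To make the keying concrete, the following is a minimal sketch in Kotlin of the composite keys described above. The type and field names are illustrative assumptions, not the actual Datalake schema.

```kotlin
import java.time.Instant

// Illustrative sketch of the composite keys described above;
// type and field names are assumptions, not the actual Datalake schema.

// seriesId: a composite of entityType, attribute, and entityId,
// analogous to a class name, field, and object id.
data class SeriesId(
    val entityType: String,
    val attribute: String,
    val entityId: String,
)

// A series is keyed on a source (a URI-reference identifying the
// producing context) plus the seriesId.
data class SeriesKey(
    val source: String,
    val seriesId: SeriesId,
)

// A single observation row is uniquely keyed on source, seriesId, and time.
data class ObservationKey(
    val series: SeriesKey,
    val time: Instant,
)

fun main() {
    val key = ObservationKey(
        series = SeriesKey(
            source = "example:mdm-system:xyz",
            seriesId = SeriesId("UsagePoint", "forwardWh", "usagepoint-123"),
        ),
        time = Instant.parse("2024-01-01T00:00:00Z"),
    )
    println(key)
}
```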

Derived Data Structures and Observation Types

Different modes of time series imply differences in data management and schema design. Datalake operates with the following observation types, each managed in a separate table:

DataPoint

Standard time series consisting of a sequence of data points indexed by time (regular or irregular). Each data point may carry a value that is simple (univariate) or encode multiple measurements (multivariate).

CategoricalEvent

An event occurring at a point in time representing a categorical value, such as a particular alarm, notice, or alert. For such observations, one would typically want to query, e.g., histograms across category and time and perform roll-up aggregations at some TTL (instead of applying lifecycle policies for direct deletion).

ValueStateChangeEvent

An event occurring at a point in time representing a change of state that is effective until the next state change occurs.

LinkStateChangeEvent

An event occurring at a point in time representing a change of link state between two objects that is effective until the next link state change occurs.

GeoStateChangeEvent

Change of geolocation state, a special case of ValueStateChangeEvent.

The set of observation types is represented with a union type and shares the same table for persistence.
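As an illustration of the union, the observation types could be modeled as a Kotlin sealed hierarchy. This is a sketch of the concept only; the payload fields are assumptions and not the actual Datalake types.

```kotlin
import java.time.Instant

// Sketch of the observation-type union as a sealed hierarchy;
// payload fields are illustrative assumptions.
sealed interface Observation {
    val time: Instant
}

// A data point in a regular or irregular time series; the value may be
// simple (univariate) or encode multiple measurements (multivariate).
data class DataPoint(
    override val time: Instant,
    val values: Map<String, Double>,
) : Observation

// A categorical value occurring at a point in time, e.g. an alarm.
data class CategoricalEvent(
    override val time: Instant,
    val category: String,
) : Observation

// A change of state effective until the next state change occurs.
data class ValueStateChangeEvent(
    override val time: Instant,
    val value: String,
) : Observation

// A change of link state between two objects.
data class LinkStateChangeEvent(
    override val time: Instant,
    val targetEntityId: String,
    val linked: Boolean,
) : Observation

// A change of geolocation state, a special case of value state change.
data class GeoStateChangeEvent(
    override val time: Instant,
    val latitude: Double,
    val longitude: Double,
) : Observation

fun main() {
    val obs: Observation = CategoricalEvent(Instant.now(), category = "cpuThresholdReached")
    println(obs)
}
```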

Datalake defines various formats for how a set of observations across different dimensions may be formulated. For further details, refer to the OpenAPI specification, which includes complete schemas for each type and all corresponding envelopes.

Examples

Examples of various observations follow.

| source | entityType | attribute | entityId | Type | Description |
|---|---|---|---|---|---|
| example:mdm-system:xyz | UsagePoint | forwardWh | usagepoint-123 | Fixed-interval DataPoint | Energy consumed on a usage point. |
| example:hes-system:xyz | Meter | 0.0.50.4.1.1.12.0.0.0.0.0.0.0.0.3.72.0 | meter-123 | Fixed-interval DataPoint | Energy consumed on a meter, using a CIM reading type to describe the attribute. |
| example:monitoring:xyz#10minRollingWindow | Service | cpu | service-123 | Fixed-interval DataPoint | 10-min rolling average over CPU consumption of a service. |
| example:monitoring:xyz#daily | Service | cpuPeak | service-123 | Varying-interval DataPoint | Daily CPU peak for a service. |
| example:monitoring:xyz | Service | cpuThresholdReached | service-123 | CategoricalEvent (alert) | CPU threshold was reached for a service. |
| example:monitoring:xyz | Service | cpuThresholdChange | service-123 | ValueStateChangeEvent | CPU threshold changed for a service. |

Due to the nature of open-schema designs, many responsibilities around structure and convention are delegated to the domain model. For instance, a series of 24-hour wind direction forecasts at 1000 m altitude for a given geohash cell may be represented by different combinations of source and seriesId:

| source | entityType | attribute | entityId |
|---|---|---|---|
| example:weather-forecast:gfs#24h | geohash | windDirectionDegrees1000m | 9q8yy9mf |
| example:weather-forecast:gfs#24h | geohash1000m | windDirectionDegrees | 9q8yy9mf |
| example:weather-forecast:gfs?altitude=1000m#24h | geohash | windDirectionDegrees | 9q8yy9mf |

The set of queries to be performed against the time series typically dictates the exact model. For instance, for querying vertical wind profiles across locations, it could be natural to treat the geohash as the entity type; but for deriving wind momentum flux at any given altitude, operating on attributes that are independent of the altitude and passing in the altitude as a parameter (through either the entity type or the source) could be a better structure.
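To illustrate the trade-off, the sketch below constructs the three equivalent series keys from the table above. The SeriesId and SeriesKey types are hypothetical, repeated here so the example is self-contained.

```kotlin
// Three equivalent ways to key the same 24-hour wind direction forecast;
// the types are illustrative assumptions, not the actual Datalake schema.
data class SeriesId(val entityType: String, val attribute: String, val entityId: String)
data class SeriesKey(val source: String, val seriesId: SeriesId)

fun main() {
    // Altitude encoded in the attribute: natural for querying vertical
    // wind profiles per geohash cell.
    val byAttribute = SeriesKey(
        "example:weather-forecast:gfs#24h",
        SeriesId("geohash", "windDirectionDegrees1000m", "9q8yy9mf"),
    )

    // Altitude encoded in the entity type: the attribute stays
    // altitude-independent.
    val byEntityType = SeriesKey(
        "example:weather-forecast:gfs#24h",
        SeriesId("geohash1000m", "windDirectionDegrees", "9q8yy9mf"),
    )

    // Altitude encoded as a query parameter in the source URI: both the
    // entity type and the attribute stay altitude-independent.
    val bySource = SeriesKey(
        "example:weather-forecast:gfs?altitude=1000m#24h",
        SeriesId("geohash", "windDirectionDegrees", "9q8yy9mf"),
    )

    listOf(byAttribute, byEntityType, bySource).forEach(::println)
}
```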

Following open-schema design principles enables full flexibility in how states are stored.
Consider two scenarios:

  • The object and its state are static, meaning they are not expected to change over time.
    Storing the serialized JSON object as a single value is a good approach in this case.

| source | entityType | entityId | attribute | value | version | effectiveTime |
|---|---|---|---|---|---|---|
| example | Location | 2a799c7e-8a78-4fee-acd1-32ef5bb6af4e | mainAddress | {"streetDetail":{"name":"Storgate","number":"1","withinTownLimits":true},"townDetail":{"code":"0166","country":"Norway","name":"Oslo"}} | 1 | 2020-01-01T16:18:01 |
| example | EndDevice | ff675279-b1e6-47f2-9270-dca8a9a49785 | endDeviceInfo | {"isSolidState":true,"phaseCount":1,"ratedCurrent":16.0,"ratedVoltage":230.0,"assetModel":{"modelNumber":"A-XXX","modelVersion":"3.5"},"capability":{"autonomousDst":true,"communication":true,"connectDisconnect":true,"demandResponse":true,"electricMetering":true,"gasMetering":false,"metrology":false,"onRequestRead":true,"outageHistory":true,"reverseFlow":true}} | 1 | 2024-01-15T02:11:43 |

Updating even one of the object's values requires updating the whole object, since its entire representation is stored as a single value.
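A minimal sketch of the whole-object approach, assuming a hypothetical StateRow shape and using kotlinx.serialization to store the full state as one serialized value:

```kotlin
import kotlinx.serialization.Serializable
import kotlinx.serialization.encodeToString
import kotlinx.serialization.json.Json

// Hypothetical row shape for a state observation; names are assumptions.
data class StateRow(
    val source: String,
    val entityType: String,
    val entityId: String,
    val attribute: String,
    val value: String, // serialized JSON of the whole object
    val version: Int,
    val effectiveTime: String,
)

@Serializable
data class StreetDetail(val name: String, val number: String, val withinTownLimits: Boolean)

@Serializable
data class TownDetail(val code: String, val country: String, val name: String)

@Serializable
data class MainAddress(val streetDetail: StreetDetail, val townDetail: TownDetail)

fun main() {
    val address = MainAddress(
        StreetDetail("Storgate", "1", withinTownLimits = true),
        TownDetail("0166", "Norway", "Oslo"),
    )
    // The whole object is stored as one value; changing any field means
    // writing a new version of the entire serialized object.
    val row = StateRow(
        source = "example",
        entityType = "Location",
        entityId = "2a799c7e-8a78-4fee-acd1-32ef5bb6af4e",
        attribute = "mainAddress",
        value = Json.encodeToString(address),
        version = 1,
        effectiveTime = "2020-01-01T16:18:01",
    )
    println(row)
}
```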

  • Another scenario is one where the state is expected to change over time. In such a case, normalizing the object's state and storing each value independently is more appropriate.

| source | entityType | entityId | attribute | value | version | effectiveTime |
|---|---|---|---|---|---|---|
| example | Asset | 301dbc7a-249a-475d-b76d-ec44bbb45536 | mrId | v1425 | 1 | 2017-01-06T20:03:54 |
| example | Asset | 301dbc7a-249a-475d-b76d-ec44bbb45536 | type | CONNECTION | 1 | 2017-01-06T20:03:54 |
| example | Asset | 301dbc7a-249a-475d-b76d-ec44bbb45536 | names | [{"name":"123456789","nameType":{"name":"OSM id","description":"OpenStreetMap ID"}}] | 1 | 2017-01-06T20:03:54 |
| example | Asset | 301dbc7a-249a-475d-b76d-ec44bbb45536 | description | 2-way branch | 1 | 2017-01-06T20:03:54 |
| example | Asset | 301dbc7a-249a-475d-b76d-ec44bbb45536 | description | 3-way branch | 2 | 2017-10-30T10:35:01 |
| example | Asset | 301dbc7a-249a-475d-b76d-ec44bbb45536 | description | 3-way branch | 3 | 2019-07-26T23:12:16 |

In the example above, the description changed over time, and it was possible to update only the changed state, because each state is stored separately as an independent value.
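A sketch of the normalized approach: each attribute is its own row, so a change appends a new version of just that attribute. The StateRow type and nextVersion helper are hypothetical.

```kotlin
// Hypothetical row shape for normalized state; names are assumptions.
data class StateRow(
    val source: String,
    val entityType: String,
    val entityId: String,
    val attribute: String,
    val value: String,
    val version: Int,
    val effectiveTime: String,
)

// Appends a new version of a single attribute without touching the
// other attributes of the same entity.
fun nextVersion(history: List<StateRow>, attribute: String, value: String, effectiveTime: String): StateRow {
    val latest = history.filter { it.attribute == attribute }.maxByOrNull { it.version }
        ?: error("no prior state for $attribute")
    return latest.copy(value = value, version = latest.version + 1, effectiveTime = effectiveTime)
}

fun main() {
    val asset = listOf(
        StateRow("example", "Asset", "301dbc7a-249a-475d-b76d-ec44bbb45536", "mrId", "v1425", 1, "2017-01-06T20:03:54"),
        StateRow("example", "Asset", "301dbc7a-249a-475d-b76d-ec44bbb45536", "description", "2-way branch", 1, "2017-01-06T20:03:54"),
    )
    // Only the description changes; mrId is left untouched.
    val updated = nextVersion(asset, "description", "3-way branch", "2017-10-30T10:35:01")
    println(updated) // version = 2
}
```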

Domain Models

Datalake can be configured with a set of domain models and will generate API extensions for validating and querying data against each model. CIM profiles according to IEC 61970 and related specifications are especially well supported, and they unlock a richer set of Datalake features.