Data Models

Datalake is an accelerator specific to power utilities. When enabled, it can be accessed via the menu button in the top-left corner of the Utilihive Console.

Utilihive Datalake strives for a balance between strictness and flexibility in its data models. Stricter data models perform better, guard data integrity, and yield fewer operational errors. However, the very concept of a data lake also requires enough flexibility to encapsulate the complete data domain that spans all relevant business processes.

Therefore, the data models that Utilihive Datalake operates with on the backend data store are designed to be only as strict as they have to be. Utilihive Datalake imposes strictness in the integration layer, where more restrictive data structures can be mapped to and from the base models on the backend.

The majority of data models that are exposed outwards to the user space are based on industry standards, specifically IEC 61968-9 2nd Edition (i.e., CIM 2.0). These are well-defined data structures and serve as a message-based interface to the data lake. Even though the backend store is designed to handle a much larger data domain, some attributes are inherited from this industry standard and employed in the base models.

Master Data

The following image outlines Datalake’s canonical models and relations:

[Figure: a flowchart connecting the Datalake data models.]

Master data reflects business assets and their relations. Utilihive Datalake models the master data domain as a time-dependent and directed multigraph whose members are composed of the following base objects:

  • A base node object with attributes such as a unique identifier, a validity period, a node class and subclass (which can be user defined), and other attributes to hold generic properties (e.g., additional identifiers, tags, and the full data body).

  • A base linkage object, which includes attributes denoting the source and target object, class and subclass, and additional metadata for the link.
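The two base objects can be pictured with a minimal Python sketch. Everything named here (`BaseNode`, `BaseLinkage`, `snapshot`, and all field names) is a hypothetical illustration and not the actual Datalake schema. The sketch assumes that the graph at a given point in time consists of the nodes whose validity period covers that time, plus the links between them.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class BaseNode:
    """Hypothetical base node; field names are assumptions."""
    node_id: str
    node_class: str
    node_subclass: Optional[str] = None
    valid_from: Optional[datetime] = None  # None means open-ended
    valid_to: Optional[datetime] = None    # None means open-ended
    body: dict = field(default_factory=dict)


@dataclass
class BaseLinkage:
    """Hypothetical directed link between two nodes."""
    source_id: str
    target_id: str
    link_class: str
    link_subclass: Optional[str] = None
    metadata: dict = field(default_factory=dict)


def is_valid_at(node: BaseNode, at: datetime) -> bool:
    """True if `at` lies in the half-open interval [valid_from, valid_to)."""
    starts_ok = node.valid_from is None or node.valid_from <= at
    ends_ok = node.valid_to is None or at < node.valid_to
    return starts_ok and ends_ok


def snapshot(nodes, links, at):
    """Return the ids of nodes valid at `at` and the links between them."""
    live = {n.node_id for n in nodes if is_valid_at(n, at)}
    live_links = [l for l in links if l.source_id in live and l.target_id in live]
    return live, live_links


usage_point = BaseNode("up-1", "UsagePoint", valid_from=datetime(2019, 1, 1))
meter = BaseNode("m-1", "EndDevice", "Meter",
                 datetime(2020, 1, 1), datetime(2023, 1, 1))
# A multigraph allows several links between the same pair of nodes.
links = [
    BaseLinkage("up-1", "m-1", "installed"),
    BaseLinkage("up-1", "m-1", "billing"),
]
```

Calling `snapshot([usage_point, meter], links, datetime(2021, 6, 1))` keeps both nodes and both parallel links, while a date after the meter's validity period leaves only the usage point and no links.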

Object Classes/Nouns

Utilihive Datalake implements objects from CIM 2.0 through bijective mappings to and from these base objects. These more restrictive objects, such as a meter or usage point, are semantically interpreted in order to perform domain-specific analytics and operations.
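As a rough illustration of what a bijective mapping means here, the following Python sketch maps a restrictive object to a generic node and back without losing information. The `Meter` class, its fields, and the dictionary keys are assumptions made for the example; they are not taken from the Datalake API or from CIM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Meter:
    """Hypothetical restrictive object; fields are illustrative only."""
    mrid: str
    serial_number: str
    usage_point_id: Optional[str] = None


def meter_to_node(meter: Meter) -> dict:
    """Map the strict object onto a generic base node (assumed layout)."""
    return {
        "id": meter.mrid,
        "class": "EndDevice",
        "subclass": "Meter",
        "body": {
            "serialNumber": meter.serial_number,
            "usagePointId": meter.usage_point_id,
        },
    }


def node_to_meter(node: dict) -> Meter:
    """Inverse mapping; rejects nodes that are not meters."""
    if node["class"] != "EndDevice" or node.get("subclass") != "Meter":
        raise ValueError("node is not a meter")
    body = node["body"]
    return Meter(
        mrid=node["id"],
        serial_number=body["serialNumber"],
        usage_point_id=body.get("usagePointId"),
    )


original = Meter("m-1", "SN-0001", "up-1")
node = meter_to_node(original)
```

The mapping is bijective in the sense that a round trip in either direction returns the starting value: `node_to_meter(meter_to_node(original)) == original`, and mapping the node to a meter and back yields the same node.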

All master data objects implement the IdentifiedObject interface. These include the following:

| Type | Description |
| --- | --- |
| IdentifiedObjectDto | Based on CIM’s root class; represents any object. In practice, this is used in the data lake to model customer-specific entities, such as medium and high voltage assets. These generically identified objects may include a tree of implicitly linked child objects. |
| CustomerAgreementDto | Enables operations on the customer level and encapsulates the service category and billing information. |
| UsagePointDto | A logical installation point in the network, typically including a geographical location used in GIS operations. A usage point may or may not have an end device or meter installed. |
| LocationDto | A physical location, including address and geospatial coordinates. Locations may be linked with multiple usage points and/or generic objects. |
| EndDeviceDto | Any device, physical or virtual, that collects or represents readings and events. The CIM specification states that readings and events may be linked to a usage point (and possibly other object classes such as customers or groupings). In Utilihive Datalake, this is always solved by linking through a virtual end device that populates the usage point or name identifiers, which reference back to the originating object(s). |
| UsagePointGroupDto | Represents a homogeneous tree of usage points. Any group or usage point may be a member of multiple other groups, which can be used to model a network of usage points. |
| EndDeviceGroupDto | Represents a homogeneous tree of end devices. Any group or end device may be a member of multiple other groups, which can be used to model a network of end devices. |

Object Relations

The MasterDataLinkageDto type defines linkages between objects. Linkages are directed and always defined in pairs. Links can also be made implicit by populating attributes in each model. The following relations between nouns are allowed:

  • CustomerAgreementConfig ↔ UsagePointConfig (one-to-many)

  • UsagePointConfig ↔ EndDeviceConfig (one-to-many)

  • UsagePointLocationConfig ↔ UsagePointConfig (one-to-many)

  • UsagePointGroups ↔ UsagePointGroups (many-to-many)

  • UsagePointGroups ↔ UsagePoint (many-to-many)

  • EndDeviceGroups ↔ EndDeviceGroups (many-to-many)

  • EndDeviceGroups ↔ EndDevice (many-to-many)

  • UsagePointLocationConfig ↔ Unknown (one-to-many)

  • Unknown ↔ * (many-to-many)
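To make the cardinalities concrete, here is a small Python sketch of how a proposed link could be checked against such a list. It is an illustration only: the `ALLOWED` table, the `can_add` helper, and the reading of each relation name as a pair of nouns with the "one" side on the left are assumptions, not documented Datalake behavior.

```python
# (source class, target class) -> cardinality; "*" matches any target class.
ALLOWED = {
    ("CustomerAgreementConfig", "UsagePointConfig"): "one-to-many",
    ("UsagePointConfig", "EndDeviceConfig"): "one-to-many",
    ("UsagePointLocationConfig", "UsagePointConfig"): "one-to-many",
    ("UsagePointGroups", "UsagePointGroups"): "many-to-many",
    ("UsagePointGroups", "UsagePoint"): "many-to-many",
    ("EndDeviceGroups", "EndDeviceGroups"): "many-to-many",
    ("EndDeviceGroups", "EndDevice"): "many-to-many",
    ("UsagePointLocationConfig", "Unknown"): "one-to-many",
    ("Unknown", "*"): "many-to-many",
}


def cardinality(source_class, target_class):
    """Look up the relation, falling back to a wildcard target."""
    return (ALLOWED.get((source_class, target_class))
            or ALLOWED.get((source_class, "*")))


def can_add(existing, source, target):
    """Check a proposed link; objects are (class, id) tuples.

    `existing` is a list of (source, target) pairs already stored.
    """
    kind = cardinality(source[0], target[0])
    if kind is None:
        return False  # relation between these classes is not allowed
    if kind == "one-to-many":
        # Assumption: the target may have only one source of this class.
        for src, tgt in existing:
            if tgt == target and src[0] == source[0] and src != source:
                return False
    return True


existing = [(("CustomerAgreementConfig", "ca-1"), ("UsagePointConfig", "up-1"))]
```

With this data, linking agreement `ca-1` to a second usage point is accepted, while linking a second agreement `ca-2` to `up-1` is rejected because the usage point already belongs to an agreement.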

Implicit links defining the direct parent reference chain in the data models (e.g., endDevice.usagePoint.customerAgreement) enable operation on a linked entity as a whole and simplify the initial master data imports where all objects are linked.
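The idea of an implicit parent chain can be sketched in Python as follows. The nested dictionary layout, the `PARENT_ATTR` table, and the `explicit_links` helper are hypothetical; the sketch also assumes that "defined in pairs" means one link in each direction.

```python
# Which attribute holds the parent for each kind of object (assumed names,
# following the endDevice.usagePoint.customerAgreement chain).
PARENT_ATTR = {
    "endDevice": "usagePoint",
    "usagePoint": "customerAgreement",
}


def explicit_links(kind, obj):
    """Walk the parent chain and emit directed (source_id, target_id) pairs."""
    links = []
    while kind in PARENT_ATTR:
        parent_kind = PARENT_ATTR[kind]
        parent = obj.get(parent_kind)
        if parent is None:
            break  # chain ends early, e.g. a meter without a usage point
        links.append((parent["id"], obj["id"]))  # parent -> child
        links.append((obj["id"], parent["id"]))  # child -> parent
        kind, obj = parent_kind, parent
    return links


end_device = {
    "id": "m-1",
    "usagePoint": {
        "id": "up-1",
        "customerAgreement": {"id": "ca-1"},
    },
}

links = explicit_links("endDevice", end_device)
# links == [("up-1", "m-1"), ("m-1", "up-1"), ("ca-1", "up-1"), ("up-1", "ca-1")]
```

A single nested import document therefore carries the same information as four explicit directed links, which is why the chain is convenient for initial imports.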

Readings

The base model for readings represents time series data and includes specific attributes such as a timestamp, a value, quality flags, a reference to the physical properties that are measured, and other generic attributes to hold metadata.

Readings are tightly coupled with meters but can be scoped by other master data entities as well.
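A reading with the attributes listed above might look like the following Python sketch. The `Reading` class, its field names, the `"suspect"` quality flag, and the `"kWh"` reading type are all assumptions for illustration and do not reflect the actual base model.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Reading:
    """Hypothetical time series point; field names are assumptions."""
    timestamp: datetime
    value: float
    reading_type: str                    # reference to the measured property
    quality: tuple = ()                  # quality flags, e.g. ("suspect",)
    end_device_id: Optional[str] = None  # scoping by meter
    metadata: dict = field(default_factory=dict)


def total_by_device(readings, reading_type, excluded_flags=("suspect",)):
    """Sum values per end device, skipping readings with an excluded flag."""
    totals = {}
    for r in readings:
        if r.reading_type != reading_type:
            continue
        if any(flag in excluded_flags for flag in r.quality):
            continue
        totals[r.end_device_id] = totals.get(r.end_device_id, 0.0) + r.value
    return totals


readings = [
    Reading(datetime(2024, 1, 1, 0), 1.5, "kWh", end_device_id="m-1"),
    Reading(datetime(2024, 1, 1, 1), 2.0, "kWh", end_device_id="m-1"),
    Reading(datetime(2024, 1, 1, 1), 9.0, "kWh", ("suspect",), "m-1"),
    Reading(datetime(2024, 1, 1, 0), 4.0, "kWh", end_device_id="m-2"),
    Reading(datetime(2024, 1, 1, 0), 230.0, "V", end_device_id="m-2"),
]

totals = total_by_device(readings, "kWh")
# totals == {"m-1": 3.5, "m-2": 4.0}
```

Scoping by another master data entity, such as a usage point, would follow the same pattern with a different key.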

Alarms and Events

The base model for alarms and events is designed in the same fashion as readings but is optimized towards data points that occur at irregular time intervals.

Arbitrary File Objects

Utilihive Datalake also supports the storage and indexing of arbitrary files (e.g., BLOBs). Upon ingestion, the file is tokenized and analyzed, extracting all relevant metadata to be mapped to a base model.

The standard APIs include functionality to perform full-text search, filtering, and querying based on the extracted metadata. However, file objects are disjoint from the other data entities.