Overview

Datalake is an accelerator specific to power utilities. When enabled, it can be accessed via the menu button in the top-left corner of the Utilihive Console.

Utilihive Datalake is a big data store and analytics solution that integrates seamlessly with the energy sector, enabling utilities to deliver data-driven solutions and innovate energy services.

Datalake provides a centralized data repository that focuses on handling these domain-specific entities:

  • Master data - complete history of assets and business entities.

  • Readings - time series data from sensors and end devices such as smart meters.

  • Events - representing alerts and notifications.

In addition, Utilihive Datalake can store and index arbitrary files, and maintains structural metadata and lineage over all data objects.

Concept

The main concept behind a data lake is to provide low-cost and scalable storage of data at different levels of structure and refinement, ranging from raw, natural form to data that is readily consumed by business processes. The motivation is to capture and exploit information across a big data domain, which is commonly characterized by high data volume, velocity, variety and veracity.

Hence, the key challenge of a data lake typically comes down to finding a good balance between flexibility and performance: the data must be sufficiently structured to enable efficient exploitation and analytics, yet must still support flexible operations across a broad range of data entities.

Utilihive Datalake addresses this challenge by providing constructs for partitioning data along two orthogonal axes:

  1. Data refinement, ranging from unstructured file objects to highly specific tabular structures.

  2. Data access tier, ranging from cold and infrequently accessed data to hot and frequently accessed data.

Using this principle, Utilihive Datalake is able to offer real-time and low-latency analytics as well as batch processing capabilities across a vast data repository.
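The two axes can be pictured as a pair of independent labels attached to every data object. The following is a minimal sketch of that idea only, not the Utilihive Datalake API: the names `Refinement`, `AccessTier`, `DataObject`, `access_tier` and `partition_key`, the intermediate levels, and the 7-day and 90-day windows are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class Refinement(Enum):
    """Hypothetical refinement levels, from raw files to specific tables."""
    RAW_FILE = 1
    SEMI_STRUCTURED = 2
    TABULAR = 3


class AccessTier(Enum):
    """Hypothetical access tiers, from rarely to frequently accessed."""
    COLD = 1
    WARM = 2
    HOT = 3


@dataclass(frozen=True)
class DataObject:
    name: str
    refinement: Refinement
    last_accessed: datetime


def access_tier(obj, now,
                hot_window=timedelta(days=7),
                warm_window=timedelta(days=90)):
    """Derive the tier from how recently the object was accessed.

    The window lengths are illustrative assumptions, not product defaults.
    """
    age = now - obj.last_accessed
    if age <= hot_window:
        return AccessTier.HOT
    if age <= warm_window:
        return AccessTier.WARM
    return AccessTier.COLD


def partition_key(obj, now):
    """Combine both axes; neither value constrains the other."""
    return (obj.refinement, access_tier(obj, now))


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
recent_table = DataObject("readings_2024_01", Refinement.TABULAR,
                          datetime(2024, 1, 30, tzinfo=timezone.utc))
old_file = DataObject("scan_2023.pdf", Refinement.RAW_FILE,
                      datetime(2023, 1, 1, tzinfo=timezone.utc))
print(partition_key(recent_table, now))
print(partition_key(old_file, now))
```

Because the axes are orthogonal, a raw file can be hot and a highly refined table can be cold; the key simply records both facts side by side.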

Capabilities

The main capabilities of Utilihive Datalake are summarized below:

  • Storage and indexing of data at scale and at low cost.

  • Classical data warehouse capabilities for domain entities.

  • APIs that follow industry standards.

  • Highly configurable storage tiers and data lifecycle policies.

  • Flexible schema for inbound data objects: records are linked and structured at time of use (schema-on-read).

  • Cloud native and cloud agnostic.
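To illustrate the schema-on-read capability listed above, the sketch below stores inbound records as untouched JSON strings and applies a schema only when they are read. It shows the general technique, not how Utilihive Datalake implements it: `RAW_RECORDS`, `READING_SCHEMA`, `read_with_schema` and the field names are hypothetical.

```python
import json
from datetime import datetime

# Inbound records are kept exactly as received; their shapes may differ.
RAW_RECORDS = [
    '{"meter": "M-1", "ts": "2024-01-01T00:00:00+00:00", "kwh": "1.5"}',
    '{"meter": "M-2", "ts": "2024-01-01T00:15:00+00:00", "kwh": 2, "firmware": "1.0.3"}',
]

# The schema is defined by the reader: field name -> conversion function.
READING_SCHEMA = {
    "meter": str,
    "ts": datetime.fromisoformat,
    "kwh": float,
}


def read_with_schema(raw_records, schema):
    """Parse each raw record and keep only the fields named in the schema,
    converting every value with the schema's conversion function."""
    rows = []
    for raw in raw_records:
        obj = json.loads(raw)
        rows.append({field: cast(obj[field]) for field, cast in schema.items()})
    return rows


for row in read_with_schema(RAW_RECORDS, READING_SCHEMA):
    print(row["meter"], row["ts"].isoformat(), row["kwh"])
```

A different consumer could read the same raw records with another schema, for example one that also extracts `firmware`, without any change to what was stored.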