Implement User Defined Functions
The Data Archiving Library provides several APIs for indexing data which you should implement:
-
SimpleUDF.getKeys(metadata, message)
or MultiKeysUDF.getMultipleKeys(metadata, message)
or SplittedUDF.getSplittedKeys(metadata, message)
aggregate(keys, messages)
You can implement this API or reuse the default implementation:
transformForDeadLetter(message)
You can use this API to setup your function, including the registration of user defined metrics:
open(parameters, runtimeContext)
This topic describes these methods. For complete details, see: JavaDoc
For each message, use this method to provide the index attributes to the Data Archiving Library when you have a single value in each attribute. The index attributes can also be extracted from the message.
- The
metadata
argument provides this information: - The time when data was ingested into the HERE platform (
INGESTION_TIME
) - The time when data was processed by the Data Archiving Library (
PROCESSING_TIME
) - The output catalog and index layer ID (
SINK_CATALOG
, SINK_LAYER
) - The input catalog and stream layer ID (
SOURCE_CATALOG
, SOURCE_LAYER
) - The message ID of the input data (
MESSAGE_ID
)
- Note that the attribute names need to match the
indexDefinitions
of the index layer. - The
timewindow
type is required and only one timewindow
should be returned from this method. - If an index definition of type
heretile
is present, then the zoom level of tile value should match the zoom level defined in index layer. - If there is an error or exception in this implemented method, then that message will not be archived.
For each message, use this method to provide the index attributes to the Data Archiving Library when you have multiple values in each attribute. The index attributes can also be extracted from the message.
- With
MultiKeysUDF
, you must return a list of values for each attribute. If you do not return a list of values for each attribute, you will get validation exceptions. - All input parameters have the same content as the
SimpleUDF.getKeys
method. - All validation rules for each value in the values list of each attribute is the same as the
SimpleUDF.getKey
method. - If there is an error or exception in this implemented method, that message will not be archived.
-
All list of values of each attribute will be combined together. Here are some examples:
Example 1:
Input
attribute1: {value1_1, value1_2}, attribute2: {value2_1, value2_2}
Output
{value1_1, value2_1}, {value1_1, value2_2}, {value1_2, value2_1}, {value1_2, value2_2}
Example 2:
Input
attribute1: {value1}, attribute2: {value2_1, value2_2}, attribute3: {value3}
Output
{value1, value2_1, value3}, {value1, value2_2, value3}
Use this method to provide the index attributes to the Data Archiving Library when you have the same value that maps to multiple attributes and you want to split the message into multiple messages. The index attributes can also be extracted from the message.
- With
SplittedUDF
, you can split a message into smaller messages with different indexes. - The indexing attributes for each smaller message will be the key of returned
Map<Map<String, Object>, byte[]>
with corresponding smaller message value as byte[]
. Each attribute is represented as Map<String, Object>
. - All input parameters have the same content as the
SimpleUDF.getKeys
method. - All validation rules for each value of
Map<String, Object>
are the same as the SimpleUDF.getKey
method. - If there is an error or exception in this implemented method, that message will not be archived.
aggregate(keys, messages)
For each group of messages (based on indexing attributes in an aggregation window), this method correlates the group to the attributes (keys) specified in the function parameter keys
.
For the error strategy 'deadletter', the user may define a function to transform a message before archiving in the deadletter layer. See deadletter strategy for more information.
open(parameters, runtimeContext)
Initialization method that is called before the actual working methods (like getKeys/aggregate
) and thus suitable for one time setup work. By default, this method is a no-op.