A technical preview function is in a state where it can be tried. Development is complete, documentation is available, and there are samples, tutorials and hands-on labs as appropriate.
The community is looking for feedback on the function before releasing it. This feedback may result in changes to the external interfaces.
Discovery Engine Open Metadata Access Service (OMAS)
The Discovery Engine OMAS provides APIs and events for metadata discovery tools that are surveying the data landscape and recording information in metadata repositories.
The Open Discovery Framework (ODF) provides a comprehensive set of open APIs that describe the interaction between metadata discovery tools and a metadata server. The aim is to make it easy for metadata discovery tools to work with open metadata repositories.
The capabilities defined in the ODF fall into 4 broad categories.
- The metadata server APIs - these are implemented by the Discovery Engine OMAS and include:
  - Discovery configuration API - for configuring discovery engines and services - and also retrieving this configuration.
  - Asset catalog API - for finding assets in the metadata repository.
  - Asset store API - for retrieving a specific asset's metadata and connector.
  - Annotation store API - for storing new metadata about the asset.
- The discovery services - these are the specialist plugin services that each perform a particular type of analysis. These are implemented by the metadata discovery tool (or interface with the discovery tool's APIs to drive specific types of analysis).
- The discovery engines - these manage the work of a collection of related discovery services.
- The discovery server - this hosts one or more discovery engines. It provides a REST API to request specific analysis on particular assets, monitor progress of the discovery services and review the results. In Egeria, the discovery server is implemented by the Asset Analysis OMES running in an engine host.
Figure 1 shows how these capabilities work together.
Figure 1: Interfaces of the Discovery Engine OMAS
1. The engine host server retrieves configuration from the Governance Engine OMAS.
2. When a discovery engine receives a request to analyse an asset, it retrieves the annotations from previous analysis of this asset.
3. While the discovery service is running, it writes new annotations about the asset through the Discovery Engine OMAS.
More details of this processing follow.
Discovery Engine Configuration
The configuration of the discovery engines, and of the discovery services that they support, is managed in the metadata server through the Governance Engine OMAS.
The Engine Host OMAG Server is typically located close to the data assets to minimize the network traffic resulting from the analysis. Where the data assets are distributed in multiple locations, it is possible to deploy an Engine Host server in each location so the discovery workload is kept close to the data.
A single Discovery Engine OMAS can support multiple engine hosts deployed in this way.
The Asset Analysis OMES on the engine host server is configured with the location of the metadata server where the Discovery Engine OMAS is running along with the names of the discovery engines it will host. The same discovery engine can simultaneously run on multiple engine host servers. This means the Asset Analysis OMES can host all of the discovery engines it needs to analyse the assets at its location.
When the Asset Analysis OMES starts in the engine host, it calls the Governance Engine OMAS to retrieve the configuration for each of its discovery engines (see Figure 1, number 1). It also connects to the Governance Engine OMAS's out topic to receive any updates on this configuration while it is running.
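The start-up behaviour described above can be pictured with a small, self-contained sketch. It is not Egeria code: `EngineConfig`, `FakeGovernanceEngineClient` and `EngineHost` are hypothetical names invented for illustration, and the out topic is modelled as a simple in-memory list of listeners. The sketch only shows the shape of the interaction: fetch the configuration for each hosted engine once, then keep it current from update events.

```python
from dataclasses import dataclass, field


@dataclass
class EngineConfig:
    """Hypothetical stand-in for one discovery engine's configuration."""
    engine_name: str
    version: int = 1


class FakeGovernanceEngineClient:
    """Hypothetical in-memory stand-in for the metadata server's configuration API."""

    def __init__(self, configs):
        self._configs = {c.engine_name: c for c in configs}
        self._listeners = []

    def get_engine_config(self, engine_name):
        return self._configs[engine_name]

    def subscribe(self, listener):
        # Models connecting to the out topic.
        self._listeners.append(listener)

    def publish_update(self, config):
        # Models the metadata server announcing a configuration change.
        self._configs[config.engine_name] = config
        for listener in self._listeners:
            listener(config)


@dataclass
class EngineHost:
    """Hypothetical engine host that caches the configuration of its engines."""
    engine_names: list
    client: FakeGovernanceEngineClient
    configs: dict = field(default_factory=dict)

    def start(self):
        # Retrieve the configuration for each hosted discovery engine.
        for name in self.engine_names:
            self.configs[name] = self.client.get_engine_config(name)
        # Then listen for configuration updates while running.
        self.client.subscribe(self._on_update)

    def _on_update(self, config):
        # Ignore updates for engines this host does not run.
        if config.engine_name in self.configs:
            self.configs[config.engine_name] = config
```

In this sketch a host that is started with only `"quality-engine"` picks up a later `publish_update` for that engine and ignores updates for any engine it does not host, which mirrors how the same engine definition can be shared by several engine hosts.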
The discovery engine's configuration includes the list of discovery request types that it supports. Each request type is linked to the discovery service that should run when that type of discovery is requested against a specific asset. This is shown in Figure 2.
Figure 2: Discovery Engine Configuration
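As a rough illustration of this configuration, the sketch below models a discovery engine as a lookup table from request type to discovery service. All names (`DiscoveryEngine`, `ProfileColumnsService`, `DetectDuplicatesService`, the request type strings and the asset identifier) are hypothetical and do not come from the Egeria APIs.

```python
class ProfileColumnsService:
    """Hypothetical discovery service that profiles an asset's columns."""

    def run(self, asset_guid):
        return f"profiled columns of {asset_guid}"


class DetectDuplicatesService:
    """Hypothetical discovery service that looks for duplicate records."""

    def run(self, asset_guid):
        return f"checked {asset_guid} for duplicates"


class DiscoveryEngine:
    """Hypothetical engine: maps each request type to one discovery service."""

    def __init__(self, name, request_types):
        self.name = name
        self._request_types = dict(request_types)

    def supported_request_types(self):
        return sorted(self._request_types)

    def run_request(self, request_type, asset_guid):
        try:
            service_class = self._request_types[request_type]
        except KeyError:
            raise ValueError(
                f"{self.name} does not support request type {request_type!r}"
            ) from None
        # A fresh service instance is created for every request.
        return service_class().run(asset_guid)


engine = DiscoveryEngine(
    "quality-engine",
    {
        "profile-columns": ProfileColumnsService,
        "detect-duplicates": DetectDuplicatesService,
    },
)
```

Calling `engine.run_request("profile-columns", "asset-1")` selects and runs `ProfileColumnsService`, while an unknown request type raises `ValueError`. Keeping the mapping in configuration rather than in code is what lets the same engine definition be changed in the metadata server without redeploying the engine host.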
Processing Discovery Requests
When a discovery request is made, the discovery engine creates an instance of the discovery service and gives it access to a discovery context. The discovery context provides three things: access to the existing metadata known about the asset, a connector to access the data stored in the asset, and a store to record the new metadata that the service discovers about the asset. Behind the scenes, the discovery context calls the Discovery Engine OMAS both to retrieve metadata about the asset and its connector (see Figure 1, number 2) and to store the new metadata (Figure 1, number 3).
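The role of the discovery context can be sketched as follows. This is an illustrative model only: `FakeDiscoveryEngineOMAS`, `DiscoveryContext`, `RowCountService` and `process_request` are hypothetical names, the metadata server is an in-memory object, and the "connector" simply returns a list of rows rather than opening a real data source.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDiscoveryEngineOMAS:
    """Hypothetical in-memory stand-in for the metadata server."""
    assets: dict
    annotations: dict = field(default_factory=dict)

    def get_asset(self, guid):
        return self.assets[guid]

    def get_annotations(self, guid):
        return list(self.annotations.get(guid, []))

    def add_annotation(self, guid, annotation):
        self.annotations.setdefault(guid, []).append(annotation)


class DiscoveryContext:
    """Hypothetical context handed to a discovery service for one asset."""

    def __init__(self, omas, asset_guid):
        self._omas = omas
        self.asset_guid = asset_guid

    def asset_metadata(self):
        return self._omas.get_asset(self.asset_guid)["metadata"]

    def connector(self):
        # Stands in for a connector: here it just returns the asset's rows.
        return self._omas.get_asset(self.asset_guid)["rows"]

    def previous_annotations(self):
        return self._omas.get_annotations(self.asset_guid)

    def record(self, annotation):
        self._omas.add_annotation(self.asset_guid, annotation)


class RowCountService:
    """Hypothetical discovery service that records the number of rows."""

    def run(self, context):
        rows = context.connector()
        context.record({"type": "row-count", "value": len(rows)})


def process_request(omas, service_class, asset_guid):
    # The engine creates a new service instance and a context per request.
    context = DiscoveryContext(omas, asset_guid)
    service_class().run(context)
```

The discovery service in this sketch never talks to the metadata server directly. It only sees the context, which is the design point of the paragraph above: the service concentrates on analysis while the context handles retrieval and storage.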
The Open Discovery Framework (ODF) provides more information about the discovery engines and discovery services along with the metadata APIs.
In Egeria, both the metadata server where the Discovery Engine OMAS runs and the engine host where the Asset Analysis OMES runs are types of OMAG Servers. More information on the operation of the engine host can be found under the Engine Services.
An overview of automated metadata discovery approaches is available in the discovery and stewardship feature.