There are two ways of adding a metadata ingestion source.
- You are going to contribute the custom source directly to the DataHub project.
- You are writing the custom source for yourself and are not going to contribute it back (yet).

If you are contributing the source (the first case), follow steps 1 to 9 below. If you are building it for yourself, you can skip steps 4-9 (though writing tests and docs for yourself is still worthwhile) and follow the documentation on how to use custom ingestion sources without forking DataHub.
:::note
This guide assumes that you've already followed the metadata ingestion developing guide to set up your local environment.
:::
We use pydantic for configuration, and all models must inherit from `ConfigModel`. The file source is a good example.
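
As a rough illustration of a configuration model, the sketch below declares a few typed fields and lets pydantic validate them. `MySourceConfig` and its fields are hypothetical, and the local `ConfigModel` class is only a stand-in built on pydantic's `BaseModel` so that the snippet runs without DataHub installed; a real source would inherit from DataHub's own `ConfigModel`.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class ConfigModel(BaseModel):
    """Stand-in for DataHub's ConfigModel (assumption: used here only so the
    sketch runs with pydantic alone)."""


class MySourceConfig(ConfigModel):
    """Hypothetical config for an imaginary source; field names are illustrative."""

    host: str                       # required, the recipe must supply it
    port: int = 5432                # optional, with a default
    database: Optional[str] = None  # optional, no default value


# Values as they might arrive from a recipe file; the port is given as a string.
config = MySourceConfig(host="localhost", port="5433")

# A missing required field is rejected at construction time.
missing_host_rejected = False
try:
    MySourceConfig(port=1)
except ValidationError:
    missing_host_rejected = True
```

Declaring the fields with types means that a malformed recipe fails early with a validation error instead of part-way through an ingestion run.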
The reporter interface enables the source to report statistics, warnings, failures, and other information about the run.
Some sources use the default `SourceReport` class, but others inherit and extend that class.
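
The idea of extending a report can be sketched as follows. The `SourceReport` class here is a simplified stand-in written for this example, not DataHub's implementation, and `MySourceReport`, `add_warning`, and `report_skipped` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SourceReport:
    """Simplified stand-in for a default report (not the DataHub class)."""

    events_produced: int = 0
    warnings: List[str] = field(default_factory=list)
    failures: List[str] = field(default_factory=list)

    def add_warning(self, message: str) -> None:
        self.warnings.append(message)


@dataclass
class MySourceReport(SourceReport):
    """Hypothetical extension that also tracks tables skipped by a filter."""

    tables_skipped: List[str] = field(default_factory=list)

    def report_skipped(self, table: str) -> None:
        self.tables_skipped.append(table)


report = MySourceReport()
report.events_produced += 1
report.add_warning("column type not recognised for orders.total")
report.report_skipped("tmp_scratch")
```

Extending the report keeps source-specific statistics next to the generic ones, so they are all surfaced together at the end of a run.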
The core of the source is the `get_workunits` method, which produces a stream of MCE objects.
The file source is a good and simple example.
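
A minimal sketch of the streaming pattern, assuming a generator-based `get_workunits`: `WorkUnit` and `MySource` below are stand-ins invented for this example, and the payload is a plain dictionary rather than a real MCE object.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class WorkUnit:
    """Stand-in for a work unit wrapping one metadata event."""

    id: str
    payload: Dict[str, str]


class MySource:
    """Hypothetical source that emits one work unit per table name."""

    def __init__(self, tables: List[str]) -> None:
        self.tables = tables
        self.events_produced = 0

    def get_workunits(self) -> Iterable[WorkUnit]:
        # Yield lazily so that downstream consumers can process events as
        # they are produced instead of waiting for the full list.
        for table in self.tables:
            self.events_produced += 1
            yield WorkUnit(id=f"my-source-{table}", payload={"dataset": table})


source = MySource(["orders", "customers"])
workunits = list(source.get_workunits())
```

Because the method is a generator, nothing is produced until the caller iterates, which keeps memory use flat for sources with many entities.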
The `MetadataChangeEventClass` is defined in the metadata models, which are generated under `metadata-ingestion/src/datahub/metadata/schema_classes.py`. There are also some convenience methods for commonly used operations.
Declare the source's pip dependencies in the `plugins` variable of the setup script.
Declare the source under the `entry_points` variable of the setup script. This enables the source to be listed when running `datahub check plugins`, and sets up the source's shortened alias for use in recipes.
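
The two declarations might look roughly like the fragment below. The plugin name `my-source`, the `requests` dependency, and the module path `my_package.my_source:MySource` are hypothetical, and the entry point group name is an assumption that should be checked against the actual setup script.

```python
from typing import Dict, List, Set

# Extra pip dependencies per plugin (hypothetical plugin and dependency).
plugins: Dict[str, Set[str]] = {
    "my-source": {"requests"},
}

# Entry point declaration: "<alias> = <module path>:<class name>".
# The group name below is an assumption; verify it in the setup script.
entry_points: Dict[str, List[str]] = {
    "datahub.ingestion.source.plugins": [
        "my-source = my_package.my_source:MySource",
    ],
}

# A mapping of this shape can be handed to setuptools as `extras_require`,
# so that installing the "my-source" extra pulls in its dependencies.
extras_require = {name: sorted(deps) for name, deps in plugins.items()}
```

The alias on the left-hand side of the entry point is the short name that a recipe would use to refer to the source.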
Tests go in the `tests` directory. We use the pytest framework.
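
A pytest-style test is a plain function whose name starts with `test_` and which uses bare `assert` statements. The sketch below tests a hypothetical helper, `build_dataset_names`, which stands in for logic inside a source; neither name comes from DataHub.

```python
from typing import List


def build_dataset_names(tables: List[str], prefix: str) -> List[str]:
    """Hypothetical helper standing in for logic inside a source."""
    return [f"{prefix}.{table}" for table in tables]


def test_build_dataset_names_prefixes_each_table() -> None:
    result = build_dataset_names(["orders", "customers"], "shop")
    assert result == ["shop.orders", "shop.customers"]


def test_build_dataset_names_handles_no_tables() -> None:
    assert build_dataset_names([], "shop") == []
```

In a real layout the helper would live in the source module and the test functions in a file under the tests directory, where pytest discovers them by name.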
Create a copy of `source-docs-template.md` and edit all relevant components.
Add the plugin to the table under CLI Sources List, and add the source's documentation underneath the sources folder.
If the source is SQLAlchemy-based, add it to the `get_platform_from_sqlalchemy_uri` function in `sql_common.py`.
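
To illustrate what such a registration amounts to, here is a simplified sketch of a URI-to-platform lookup. It is not the body of the real function: `platform_from_uri`, the prefix table, the `myplatform` entry, and the `"external"` fallback are all assumptions made for this example.

```python
from typing import Dict

# Hypothetical mapping from SQLAlchemy URI prefix to platform name.
PLATFORM_BY_PREFIX: Dict[str, str] = {
    "postgresql": "postgres",
    "mysql": "mysql",
    "myplatform": "myplatform",  # the hypothetical new source being added
}


def platform_from_uri(sqlalchemy_uri: str) -> str:
    """Return the platform whose prefix matches the URI, else a fallback."""
    for prefix, platform in PLATFORM_BY_PREFIX.items():
        if sqlalchemy_uri.startswith(prefix):
            return platform
    return "external"  # assumed fallback for unrecognised URIs
```

Matching on the prefix lets driver-qualified URIs such as `postgresql+psycopg2://...` resolve to the same platform as the bare scheme.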
Add the platform's logo image to the images folder and register it to be ingested at boot.