Merge pull request #33 from clarin-eric/develop
Merge the develop branch where the 1.2 release was prepared
menzowindhouwer authored Jan 7, 2021
2 parents 1d7270a + 19bfa7b commit 155404a
Showing 38 changed files with 3,641 additions and 1,202 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -8,3 +8,9 @@ run/
nb-configuration.xml
nbactions.xml

.classpath
.factorypath
.project
.settings
.vscode
.DS_Store
2 changes: 1 addition & 1 deletion .travis.yml
@@ -1,5 +1,5 @@
sudo: false
language: java
jdk:
- openjdk8
- openjdk11

40 changes: 22 additions & 18 deletions README.md
@@ -25,10 +25,10 @@ OAI-PMH **endpoint**.

# Building

Building this app requires JDK 8 and Apache Maven. It can be built
Building this app requires JDK 11 and Apache Maven. It can be built
simply using the command:

```mvn clean package assembly:assembly```
```mvn clean install```

If you use a Java IDE, it is highly likely it also offers a simple way
to do the above.
@@ -37,7 +37,7 @@ You can also use the `build.sh` script to run a build within an environment
provisioned with suitable versions of the JDK and Maven. This requires Docker.

The above build process creates a package named
`oai-harvest-manager-x.y.z.tar.gz` (where x.y.z is a version number).
`target/oai-harvest-manager-x.y.z.tar.gz` (where x.y.z is a version number).

# Running the Application

@@ -60,6 +60,8 @@ override the timeout value defined in `config.xml`, if any. The first
parameter that does not contain = is taken as the configuration file
name.

If you used `build.sh` to run a build, you can use `run.sh config.xml` to run this build.


# Configuration

@@ -77,8 +79,9 @@ file. The configuration file is composed of four sections:
listed.

To get a clear idea of the structure of the configuration file, see
the [sample configuration files](src/main/resources) in juxtaposition
with the explanation for each section below.
the [sample configuration files](src/main/resources) or the
[CLARIN configuration files](https://github.com/clarin-eric/oai-harvest-config) in
juxtaposition with the explanation for each section below.

## Configuring Settings

@@ -142,7 +145,11 @@ action types are available:
- The *transform* action applies a mapping, defined in an XSLT file,
to the metadata record. This can be used, among other things, for
semantic mapping between metadata schemata. See the included
configuration files for an example.
configuration files for an example. The XSLT receives various parameters:
1. ```config``` the configuration file used
2. ```provider_name``` the provider name
3. ```provider_uri``` the endpoint
4. ```record_identifier``` the id of the record to transform
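
As a minimal sketch of how a stylesheet could pick up these parameters: the four parameter names come from the list above, while everything else (the XSLT version, the identity template, and the provenance comment) is an illustrative assumption rather than part of the shipped stylesheets. The parameters are treated as plain strings here, which the text above does not confirm, so `config` is declared but left unused.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Parameters named in the list above; their types are an assumption. -->
    <xsl:param name="config"/>
    <xsl:param name="provider_name"/>
    <xsl:param name="provider_uri"/>
    <xsl:param name="record_identifier"/>

    <!-- Write a provenance comment, then process the record itself. -->
    <xsl:template match="/">
        <xsl:comment>
            <xsl:value-of select="concat(' provider: ', $provider_name,
                                         ' (', $provider_uri, '), record: ',
                                         $record_identifier, ' ')"/>
        </xsl:comment>
        <xsl:apply-templates/>
    </xsl:template>

    <!-- Identity template: copy every node and attribute unchanged. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

A stylesheet like this leaves the record untouched and only prepends a comment, which can be a convenient starting point for checking which values actually arrive.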

For each provider, the first format definition that the provider
supports will determine the action sequence to be executed. If one of
@@ -174,6 +181,10 @@ For each provider, the following can be defined:
delay and timeout) can be overwritten for a specific provider by
adding them as attributes to the provider element.

- The attribute *exclusive*, when set to true, indicates that the
provider should be harvested on its own, i.e. no other harvesting threads
should be active. This can be used when a provider has some very large records.

- The provider element may contain multiple *set* child elements,
which specify the names of OAI-PMH sets to be harvested.
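
A provider entry combining these options might look roughly like the fragment below. This is a hypothetical sketch: the `url` attribute name, the endpoint address, the timeout value, and the set names are placeholders assumed for illustration, so check the sample configuration files for the exact syntax.

```xml
<!-- Hypothetical provider entry; attribute name "url" and all values
     are placeholders, not taken from the shipped configuration. -->
<provider url="https://example.org/oai" exclusive="true" timeout="60">
    <!-- Each set element names one OAI-PMH set to harvest. -->
    <set>set-one</set>
    <set>set-two</set>
</provider>
```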

@@ -182,8 +193,10 @@ a *centre registry*. So far, this registry is only used by the CLARIN community.
The registry is specified by its URL. All the provider endpoints defined in the
registry will be harvested. Sometimes, it might be necessary to exclude an
endpoint from the ones defined in the registry. This can be done by specifying
its URL in the configuration file used for harvesting. Please review the
instructions in the configuration files supplied in the package.
its URL in the configuration file used for harvesting. In other cases,
an endpoint loaded from the registry needs its own configuration, such as a
specific timeout; this can be done in a similar vein to excluding an endpoint. Please review the
instructions in the configuration files supplied in the package.

# Static Providers

@@ -222,10 +235,6 @@ convenient for debugging specific providers.

# Implementation Notes

Saxon is used as the XPath engine, although only standard APIs are
used and hence changing to a different XPath processor would be
trivial.

Processing for each provider runs in a separate thread. It is not
possible to target a single provider with multiple threads (except in
the special case where sets are used; then it is possible to mention
@@ -259,9 +268,4 @@ action actionSequences, and 5 each for the directories ```cmdi``` and

The pooling implementation is particularly important when
transformations are used, as preparing a transformation object
involves parsing the XSLT, potentially a time-consuming process.


# Build Status

[![Build Status](https://travis-ci.org/TheLanguageArchive/oai-harvest-manager.png?branch=master)](https://travis-ci.org/TheLanguageArchive/oai-harvest-manager)
involves parsing the XSLT, potentially a time-consuming process.
8 changes: 2 additions & 6 deletions assembly.xml
@@ -12,11 +12,9 @@

<!-- Do the XML files in the resources directory. Please note that Maven
will also include them in the jar. -->
<include>config*.xml</include>
<include>oai2.xsl</include>
<include>config*.xml</include>
<include>addOAISetName.xsl</include>
<include>olac2cmdi.xsl</include>
<include>sil_to_iso6393.xml</include>
<include>filter.xsl</include>

<!-- Do include the log4j properties file in the resources directory
while excluding it from the jar. Please refer to the configuration of
@@ -29,8 +27,6 @@
<outputDirectory></outputDirectory>
<includes>
<include>run-harvester.sh</include>
<include>expand-map.sh</include>
<include>expandMap.xsl</include>
</includes>
<fileMode>0755</fileMode>
<filtered>true</filtered>
2 changes: 1 addition & 1 deletion build.sh
@@ -18,7 +18,7 @@

#configuration
APP_NAME="oai-harvest-manager"
MAVEN_IMAGE="maven:3.6.3-jdk-8"
MAVEN_IMAGE="maven:3.6.3-jdk-11"
CLEAN_CACHE=${CLEAN_CACHE:-false}

SCRIPT_DIR="$( cd "$(dirname "$0")" ; pwd -P )"