
Preparation release 0.8.1 #1123

Merged
merged 28 commits into master from release-0.8.1 on Sep 14, 2024

Conversation

lfoppiano
Collaborator

This PR contains the updates for the release 0.8.1

@coveralls

Coverage Status

coverage: 40.787%. remained the same
when pulling b6a2a20 on release-0.8.1
into 694f0ed on master.

@lfoppiano added this to the 0.8.1 milestone on Jun 10, 2024
@coveralls

Coverage Status

coverage: 40.799% (+0.01%) from 40.787%
when pulling 4675511 on release-0.8.1
into 694f0ed on master.

@coveralls

Coverage Status

coverage: 40.787%. remained the same
when pulling f1d703c on release-0.8.1
into 694f0ed on master.

@coveralls

Coverage Status

coverage: 40.787%. remained the same
when pulling c408076 on release-0.8.1
into 694f0ed on master.

@lfoppiano
Collaborator Author

lfoppiano commented Jun 22, 2024

I've run the evaluation with a partial glutton (around 80-90M records).

Since I don't have a GPU machine I can log into, I:

  1. first ran the extraction using the client + an instance on GPU + the partial glutton,
  2. renamed the files from .grobid.tei.xml to .fulltext.tei.xml, and then
  3. ran the evaluation without regenerating the grobid extraction (see the sketch below).
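
A rough sketch of steps 2 and 3 (the rename loop is a reconstruction; the jatsEval task and the -Prun=0 flag for skipping re-extraction are assumptions based on the usual Grobid end-to-end evaluation setup, and the path is a placeholder):

    # step 2: rename the client output so the evaluation treats it as pre-generated fulltext
    find /path/to/PMC_sample_1943 -name '*.grobid.tei.xml' | while read -r f; do
        mv "$f" "${f%.grobid.tei.xml}.fulltext.tei.xml"
    done

    # step 3: evaluation only, without regenerating the Grobid extraction
    ./gradlew jatsEval -Pp2t=/path/to/PMC_sample_1943 -Prun=0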

Since I did not use the standard method, this should be taken with a pinch of salt.

TLDR: Header metadata and citation context performances have decreased; the rest has increased.

======= Header metadata ======= 

Evaluation on 1943 random PDF files out of 1941 PDF (ratio 1.0).

======= Strict Matching ======= (exact matches)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

abstract             82.45        16.78        16.48        16.63        1911   
authors              95.68        79.94        79.65        79.79        1941   
first_author         98.93        95.29        94.95        95.12        1941   
keywords             94.22        64.99        63.62        64.3         1380   
title                95.65        80.39        79.52        79.95        1943   

all (micro avg.)     93.39        67.94        67.21        67.57        9116   
all (macro avg.)     93.39        67.48        66.84        67.16        9116   


======== Soft Matching ======== (ignoring punctuation, case and space characters mismatches)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

abstract             92.1         63.83        62.69        63.25        1911   
authors              95.97        81.28        80.99        81.14        1941   
first_author         99.01        95.66        95.31        95.48        1941   
keywords             95.5         73.65        72.1         72.87        1380   
title                97.43        88.87        87.91        88.38        1943   

all (micro avg.)     96           81.2         80.33        80.77        9116   
all (macro avg.)     96           80.66        79.8         80.22        9116   


==== Levenshtein Matching ===== (Minimum Levenshtein distance at 0.8)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

abstract             97.68        91.05        89.43        90.23        1911   
authors              97.18        87.02        86.71        86.86        1941   
first_author         99.1         96.12        95.78        95.95        1941   
keywords             97.05        84.16        82.39        83.27        1380   
title                98.55        94.17        93.15        93.66        1943   

all (micro avg.)     97.91        90.91        89.93        90.42        9116   
all (macro avg.)     97.91        90.51        89.49        89.99        9116   


= Ratcliff/Obershelp Matching = (Minimum Ratcliff/Obershelp similarity at 0.95)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

abstract             96.88        87.11        85.56        86.33        1911   
authors              96.36        83.14        82.84        82.99        1941   
first_author         98.93        95.29        94.95        95.12        1941   
keywords             96.35        79.42        77.75        78.58        1380   
title                98.15        92.3         91.3         91.8         1943   

all (micro avg.)     97.33        87.97        87.02        87.49        9116   
all (macro avg.)     97.33        87.45        86.48        86.96        9116   

===== Instance-level results =====

Total expected instances:       1943
Total correct instances:        195 (strict) 
Total correct instances:        786 (soft) 
Total correct instances:        1274 (Levenshtein) 
Total correct instances:        1121 (ObservedRatcliffObershelp) 

Instance-level recall:  10.04   (strict) 
Instance-level recall:  40.45   (soft) 
Instance-level recall:  65.57   (Levenshtein) 
Instance-level recall:  57.69   (RatcliffObershelp) 

======= Citation metadata ======= 

Evaluation on 1943 random PDF files out of 1941 PDF (ratio 1.0).

======= Strict Matching ======= (exact matches)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

authors              97.58        83.04        76.32        79.54        85778  
date                 99.23        94.61        84.26        89.13        87067  
first_author         98.53        89.78        82.5         85.99        85778  
inTitle              96.19        73.23        71.88        72.55        81007  
issue                99.68        91.11        87.76        89.41        16635  
page                 98.61        94.57        83.7         88.81        80501  
title                97.21        79.67        75.31        77.43        80736  
volume               99.44        96.02        89.83        92.82        80067  

all (micro avg.)     98.31        87.22        80.75        83.86        597569 
all (macro avg.)     98.31        87.76        81.44        84.46        597569 


======== Soft Matching ======== (ignoring punctuation, case and space characters mismatches)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

authors              97.65        83.51        76.76        79.99        85778  
date                 99.23        94.61        84.26        89.13        87067  
first_author         98.55        89.95        82.66        86.15        85778  
inTitle              97.85        84.92        83.35        84.13        81007  
issue                99.68        91.11        87.76        89.41        16635  
page                 98.61        94.57        83.7         88.81        80501  
title                98.82        91.44        86.43        88.87        80736  
volume               99.44        96.02        89.83        92.82        80067  

all (micro avg.)     98.73        90.62        83.89        87.13        597569 
all (macro avg.)     98.73        90.77        84.34        87.41        597569 


==== Levenshtein Matching ===== (Minimum Levenshtein distance at 0.8)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

authors              98.45        89.22        82           85.46        85778  
date                 99.23        94.61        84.26        89.13        87067  
first_author         98.58        90.16        82.85        86.35        85778  
inTitle              98.03        86.18        84.59        85.37        81007  
issue                99.68        91.11        87.76        89.41        16635  
page                 98.61        94.57        83.7         88.81        80501  
title                99.14        93.81        88.66        91.16        80736  
volume               99.44        96.02        89.83        92.82        80067  

all (micro avg.)     98.9         91.97        85.14        88.42        597569 
all (macro avg.)     98.9         91.96        85.46        88.56        597569 


= Ratcliff/Obershelp Matching = (Minimum Ratcliff/Obershelp similarity at 0.95)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

authors              98           85.98        79.03        82.36        85778  
date                 99.23        94.61        84.26        89.13        87067  
first_author         98.53        89.8         82.52        86           85778  
inTitle              97.65        83.5         81.95        82.72        81007  
issue                99.68        91.11        87.76        89.41        16635  
page                 98.61        94.57        83.7         88.81        80501  
title                99.08        93.4         88.28        90.77        80736  
volume               99.44        96.02        89.83        92.82        80067  

all (micro avg.)     98.78        91.02        84.26        87.51        597569 
all (macro avg.)     98.78        91.13        84.67        87.75        597569 

===== Instance-level results =====

Total expected instances:               90125
Total extracted instances:              85898
Total correct instances:                38759 (strict) 
Total correct instances:                50899 (soft) 
Total correct instances:                55786 (Levenshtein) 
Total correct instances:                52324 (RatcliffObershelp) 

Instance-level precision:       45.12 (strict) 
Instance-level precision:       59.26 (soft) 
Instance-level precision:       64.94 (Levenshtein) 
Instance-level precision:       60.91 (RatcliffObershelp) 

Instance-level recall:  43.01   (strict) 
Instance-level recall:  56.48   (soft) 
Instance-level recall:  61.9    (Levenshtein) 
Instance-level recall:  58.06   (RatcliffObershelp) 

Instance-level f-score: 44.04 (strict) 
Instance-level f-score: 57.83 (soft) 
Instance-level f-score: 63.38 (Levenshtein) 
Instance-level f-score: 59.45 (RatcliffObershelp) 

Matching 1 :    68335

Matching 2 :    4155

Matching 3 :    1859

Matching 4 :    662

Total matches : 75011

======= Citation context resolution ======= 

Total expected references:       90125 - 46.38 references per article
Total predicted references:      85898 - 44.21 references per article

Total expected citation contexts:        139835 - 71.97 citation contexts per article
Total predicted citation contexts:       115386 - 59.39 citation contexts per article

Total correct predicted citation contexts:       97290 - 50.07 citation contexts per article
Total wrong predicted citation contexts:         18096 (wrong callout matching, callout missing in NLM, or matching with a bib. ref. not aligned with a bib.ref. in NLM)

Precision citation contexts:     84.32
Recall citation contexts:        69.57
fscore citation contexts:        76.24
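
(For reference, these figures follow directly from the counts above: precision = 97290/115386 ≈ 84.32, recall = 97290/139835 ≈ 69.57, and f-score = 2*P*R/(P+R) ≈ 76.24.)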

======= Fulltext structures ======= 

Evaluation on 1943 random PDF files out of 1941 PDF (ratio 1.0).

======= Strict Matching ======= (exact matches)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

figure_title         96.63        31.47        24.64        27.64        7281   
reference_citation   59.15        57.42        58.68        58.05        134196 
reference_figure     94.74        61.21        65.9         63.47        19330  
reference_table      99.22        83.01        88.39        85.62        7327   
section_title        94.73        76.39        67.76        71.82        27619  
table_title          98.76        57.29        50.29        53.56        3971   

all (micro avg.)     90.54        60.41        60.32        60.36        199724 
all (macro avg.)     90.54        61.13        59.28        60.02        199724 


======== Soft Matching ======== (ignoring punctuation, case and space characters mismatches)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

figure_title         98.52        78.72        61.63        69.13        7281   
reference_citation   61.86        61.68        63.03        62.34        134196 
reference_figure     94.6         61.69        66.41        63.97        19330  
reference_table      99.2         83.19        88.58        85.8         7327   
section_title        95.43        81.25        72.07        76.38        27619  
table_title          99.35        81.87        71.87        76.55        3971   

all (micro avg.)     91.49        65.76        65.67        65.72        199724 
all (macro avg.)     91.49        74.73        70.6         72.36        199724 


====================================================================================

@lfoppiano
Collaborator Author

I'm attaching all the results as files for completeness:

@kermitt2
Owner

kermitt2 commented Jul 2, 2024

Hi Luca! I think there is a major issue with the jvm version indicated by the Kotlin jvmToolchain:

kotlin {
        jvmToolchain(17)
    }

The classes and jar become incompatible with JVMs lower than 17 (class file version 61 corresponds to Java 17, while a JVM 11 only reads up to version 55), so it's not possible to run grobid anymore with a JVM 11:

Error: LinkageError occurred while loading main class org.grobid.trainer.NameAddressTrainer
        java.lang.UnsupportedClassVersionError: org/grobid/trainer/NameAddressTrainer has been compiled by a more recent version of the Java Runtime (class file version 61.0), this version of the Java Runtime only recognizes class file versions up to 55.0

In addition, it has blocking consequences for other modules and libraries using grobid which can't be run with jvm 17.

The solution seems to be to simply move everything to Java 11:

    kotlin {
        jvmToolchain(11)
    }

although setting source compatibility to Java 11 directly is not working:

    sourceCompatibility = 1.11
    targetCompatibility = 1.11

gives

lopez@smallbook:~/grobid$ ./gradlew clean install

FAILURE: Build failed with an exception.

* Where:
Build file '/home/lopez/grobid/build.gradle' line: 268

* What went wrong:
Could not determine the dependencies of task ':grobid-core:shadowJar'.
> The new Java toolchain feature cannot be used at the project level in combination with source and/or target compatibility

@kermitt2
Owner

kermitt2 commented Jul 2, 2024

It seems the Java 11 compatibility is broken by the recent changes in FundingAcknowledgementParser:

./gradlew clean install

> Task :grobid-core:compileJava
/home/lopez/grobid/grobid-core/src/main/java/org/grobid/core/engines/FundingAcknowledgementParser.java:193: error: cannot find symbol
                List<OffsetPosition> annotationsPositionTokens = annotations.stream().map(AnnotatedXMLElement::getOffsetPosition).toList();
                                                                                                                                 ^
  symbol:   method toList()
  location: interface Stream<OffsetPosition>
/home/lopez/grobid/grobid-core/src/main/java/org/grobid/core/engines/FundingAcknowledgementParser.java:253: error: cannot find symbol
            .map(AnnotatedXMLElement::getOffsetPosition).toList());
                                                        ^
  symbol:   method toList()
  location: interface Stream<OffsetPosition>
/home/lopez/grobid/grobid-core/src/main/java/org/grobid/core/engines/FundingAcknowledgementParser.java:259: error: cannot find symbol
                .toList();
                ^
  symbol:   method toList()
  location: interface Stream<OffsetPosition>
/home/lopez/grobid/grobid-core/src/main/java/org/grobid/core/engines/FundingAcknowledgementParser.java:266: error: cannot find symbol
                    .toList();
                    ^
  symbol:   method toList()
  location: interface Stream<Integer>
/home/lopez/grobid/grobid-core/src/main/java/org/grobid/core/engines/FundingAcknowledgementParser.java:294: error: cannot find symbol
                            .toList());
                            ^
  symbol:   method toList()
  location: interface Stream<BoundingBox>
/home/lopez/grobid/grobid-core/src/main/java/org/grobid/core/engines/FundingAcknowledgementParser.java:304: error: cannot find symbol
                        String coordsAsString = String.join(";", postMergeBoxes.stream().map(BoundingBox::toString).toList());
                                                                                                                   ^
  symbol:   method toList()
  location: interface Stream<String>
/home/lopez/grobid/grobid-core/src/main/java/org/grobid/core/engines/FundingAcknowledgementParser.java:372: error: cannot find symbol
                    .toList();
                    ^
  symbol:   method toList()
  location: interface Stream<AnnotatedXMLElement>
/home/lopez/grobid/grobid-core/src/main/java/org/grobid/core/engines/FundingAcknowledgementParser.java:410: error: cannot find symbol
                        .toList();
                        ^
  symbol:   method toList()
  location: interface Stream<AnnotatedXMLElement>

@lfoppiano
Collaborator Author

lfoppiano commented Jul 2, 2024

Hi @kermitt2,
I was going to do it afterwards, with the idea of upgrading to 17 whatever needs to be upgraded.

I checked grobid-quantities, software-mentions, datastet and they seem to be compatible with JDK 17. I would say that the old modules may stay with an older version.
In any case, I can help you with updating and testing them. Let me know what I can do.

For the second problem, if you want to keep JDK 11 compatibility, you can replace .toList() with .collect(Collectors.toList()).
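
For illustration, a minimal JDK 11-compatible sketch (a made-up stream, not the actual FundingAcknowledgementParser code):

    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    class ToListCompat {
        public static void main(String[] args) {
            // Stream.toList() only exists since JDK 16; under JDK 11 use Collectors.toList()
            List<String> upper = Stream.of("a", "b", "c")
                    .map(String::toUpperCase)
                    .collect(Collectors.toList());   // instead of .toList()
            System.out.println(upper);
        }
    }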

@kermitt2
Owner

kermitt2 commented Jul 3, 2024

I think it's good to move to JDK 17 in general, but we need to update the other modules first, otherwise this is blocking for users. This is also a general issue for everything that depends on Grobid and for existing production environments where Grobid runs. For example, I am currently stuck, having failed to upgrade entity-fishing from JDK 8 to JDK 11, which is very annoying for the users.

I think it's better to ensure JDK 11 compatibility for this release - 17 would be a breaking change for version 0.9.0, especially given that the move to 17 is more for our comfort than for any real practical advantage?

@lfoppiano
Collaborator Author

I think it's good to move to JDK 17 in general, but we need to update the other modules first, otherwise this is blocking for users. This is also a general issue for everything that depends on Grobid and for existing production environments where Grobid runs. For example, I am currently stuck, having failed to upgrade entity-fishing from JDK 8 to JDK 11, which is very annoying for the users.

OK, no problem. I might be too optimistic in thinking that people would have migrated to Docker by now.

Let me help you with entity-fishing. Could you commit and push everything you've done so far to a branch of the project? I will have a look ASAP 😉
If there are other modules that need to be updated, please do let me know.

I think it's better to ensure JDK 11 compatibility for this release - 17 would be a breaking change for version 0.9.0, especially given that the move to 17 is more for our comfort than for any real practical advantage?

Sure. 👍

@lfoppiano
Collaborator Author

@kermitt2 I tested the latest commit 56d351c and it works with JDK 11 on my Apple M2.

@kermitt2
Owner

kermitt2 commented Jul 4, 2024

Thank you very much @lfoppiano, it is also working for me now with JDK 11 on Linux (like you, I usually run JDK 17, which is why I only saw the issue recently).

About entity-fishing, the master has the latest commits if I am not wrong, and running with grobid 0.8.0 and JDK 11 fails because the current version uses an incubator module that disappeared after JDK 1.8. I did not analyze further which dependency uses this module and whether there is a possible replacement in JDK 11.

@coveralls

coveralls commented Jul 17, 2024

Coverage Status

coverage: 40.751% (-0.04%) from 40.787%
when pulling d15e4d2 on release-0.8.1
into 399ef9d on master.

@kermitt2
Owner

I observed the crashes with more PDFs, usually starting from 10-20K I think, and never getting past 25-30K PDFs.
Both with the server running via ./gradlew run and with the Docker image.

When running with gradlew:
JVM 17.0.12 ubuntu build
Linux 6.8.0-40-generic amd64

grobid_client_python with concurrency at 15
grobid service with concurrency unchanged at 10

No crash with jvmToolchain set to JDK 17 after 700K PDFs.
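
For reference, the client side of that setup looks roughly like this (assuming the documented grobid_client_python command line; the paths are placeholders):

    # 15 concurrent requests from the client against the Grobid service
    grobid_client --input ./pdfs --output ./out --n 15 processFulltextDocument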

@lfoppiano
Collaborator Author

Thanks @kermitt2 !
I artificially enlarged the set of PDF documents by simply making three copies and merging them under different names; I then tested on around 30K documents but the JVM did not crash. 😭

I'll try again with a larger dataset; I might need a few more days to assemble it. Meanwhile, if you still have the JVM dump somewhere, could you share it?
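
For the record, the duplication step was roughly of this kind (directory names are placeholders):

    # make three renamed copies of every PDF to artificially enlarge the test set
    mkdir -p pdfs_big
    for i in 1 2 3; do
        for f in pdfs/*.pdf; do
            cp "$f" "pdfs_big/$(basename "$f" .pdf)_copy$i.pdf"
        done
    done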

@lfoppiano
Collaborator Author

I added an additional 40000 unique articles to the previous 30000 and ran again, but could not reproduce the problem. I'm using an 8 vCPU machine with 32 GB of RAM, CRF models only, JDK 17 at runtime, and the JDK 11 version of the bytecode. 😭

As an alternative, to solve the issue with JDK 11, I could try to run entity-fishing with JDK 17 💦 Are there other modules that require JDK 11?

@kermitt2
Owner

Back to the JVM crash problem:

  • I am running on a machine with Ubuntu 22.04 and only JDK 17 installed (from the Ubuntu packages).
  • Having jvmToolchain set to 11 (as in this current branch), Gradle apparently downloads a JDK 11 and uses it for building and running the project, which results in the JVM crashes. I include 2 examples of SIGSEGV errors occurring after running Grobid for a while - the compiled method indicated at the crash point is usually different from one crash to another
  • As visible in the error report, Gradle has downloaded a JVM version 11 (JRE version: OpenJDK Runtime Environment Temurin-11.0.23+9), which is not installed on my system
  • When jvmToolchain is set to 17, my installed JVM 17 is used, and there is no crash
  • When removing jvmToolchain from gradle and indicating sourceCompatibility = 1.11, my installed JVM 17 is used, and there is no crash.

The same behavior happens when using command line ./gradlew run or when using a docker image.

More info on javaToolchains as appearing on my system:

:~/grobid$ ./gradlew -q javaToolchains

 + Options
     | Auto-detection:     Enabled
     | Auto-download:      Enabled

 + Eclipse Adoptium JDK 11.0.23+9
     | Location:           /home/lopez/.gradle/jdks/jdk-11.0.23+9
     | Language Version:   11
     | Vendor:             Eclipse Adoptium
     | Is JDK:             true
     | Detected by:        Auto-provisioned by Gradle

 + Ubuntu JDK 17.0.12+7-Ubuntu-1ubuntu222.04
     | Location:           /usr/lib/jvm/java-17-openjdk-amd64
     | Language Version:   17
     | Vendor:             Ubuntu
     | Is JDK:             true
     | Detected by:        Common Linux Locations

 + Invalid toolchains
     + /usr/lib/jvm/openjdk-17
       | Error:              A problem occurred starting process 'command '/usr/lib/jvm/openjdk-17/bin/java''
  • It could be that this downloaded JDK 11 is not compatible with this Ubuntu system, following this issue: A fatal error has been detected by the Java Runtime Environment (adoptium/adoptium-support#1156)

  • With the current setting, I guess that despite the openjdk:17-jdk-slim base image, it will be a JDK 11 that is used both for build and run, downloaded by gradle when building the image. So this makes the JDK of the base image useless. I suspect it also creates this issue when running on some systems that do not like this JDK 11 selected and downloaded by the gradle javaToolchain.

  • For this release, we could maybe remove jvmToolchain from gradle? Kotlin is just used for testing?

logs-error-2.txt
logs-error-1.txt

@lfoppiano
Collaborator Author

Hi @kermitt2, thanks again, this indeed helps in understanding the problem better. In my test, a JDK 11.0.24 was automatically downloaded by gradle.

Anyway, I pushed a small change that should solve the issue and allow us to keep everything 🤞, in brief:

  • in gradle.properties I added a flag to prevent Gradle from automatically downloading any JDK (with the javaToolchain, this would break the build if toolchain=11 and system=JDK 17),
  • removed any trace of jvmToolchain and reverted to the old working style sourceCompatibility/targetCompatibility = 1.11,
  • added a section to build the Kotlin code without using the jvmToolchain, targeting JDK 11 there as well (see the sketch below).
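
Roughly, the change amounts to something like this (a sketch of the idea rather than the exact diff; the property and task names follow standard Gradle/Kotlin conventions):

    # gradle.properties: prevent Gradle from auto-provisioning a JDK
    org.gradle.java.installations.auto-download=false

    // build.gradle: no jvmToolchain, plain source/target compatibility
    sourceCompatibility = 1.11
    targetCompatibility = 1.11

    // compile the Kotlin (test) code to the same bytecode level
    tasks.withType(org.jetbrains.kotlin.gradle.tasks.KotlinCompile).configureEach {
        kotlinOptions.jvmTarget = "11"
    }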

Regarding the observation with docker, in principle we don't use gradle to run the service, so I'm not sure why it crashes... 🤔

@lfoppiano
Collaborator Author

I've run grobid natively with gradle, built from the latest commits on this branch, on ~70000 documents using JDK 17.0.12 and JDK 11.0.24 (installed with Ubuntu 22.04).
I report no issue with either of them. Maybe the issue was specific to JDK 11.0.23?

@lfoppiano
Collaborator Author

I also tested the docker image resulting from my last change and it was not crashing.
I also investigated why we get the crash when mixing JDK 17/11 in docker, but I cannot find an answer, because the only JDK available is 17 and it is the one used by the script (we use the distribution script rather than gradle run). Anyway, as you previously mentioned, @kermitt2, we can ship a JDK 17/17 version with docker.

@lfoppiano
Collaborator Author

For version 0.8.1 I have set up the infrastructure so that I can reproduce the same end-to-end evaluation results :-)
To run it natively on Linux with DL and conda, I needed to use branch #1010 (not to be added to this release).

@kermitt2
Owner

I made some tests with the updated version, without jvmToolchain and automatic JVM download, and I had no problems anymore. So with sourceCompatibility/targetCompatibility = 1.11, both my local Ubuntu JDK 17 and JDK 11 work fine on a large volume of PDFs.

The problem, I think, was related to the build of the JDK downloaded by jvmToolchain. It was a JDK 11 distribution from Eclipse (OpenJDK Runtime Environment Temurin-11.0.23+9), while normally we should use the Ubuntu-packaged one for safety. It means jvmToolchain might not be reliable in the future, because it might download a JDK built independently of the Linux distribution instead of the one specifically built for the Linux distribution in use.

For the docker image, I suppose the Grobid project was built with the downloaded JDK 11 (in the first build layer), then the Ubuntu JRE 17 from the base image was used at runtime, so there was a possible clash of JDKs there.

I think we're good for the release? :)

@lfoppiano
Collaborator Author

Great!!!!!

I can take care of the release, leaving only the double-checking to you? 😄

@lfoppiano merged commit 4cad850 into master Sep 14, 2024
10 of 12 checks passed
@lfoppiano
Collaborator Author

Grobid

  • Jars ✅
  • Docker CRF ✅
  • Docker Full ✅ (lfoppiano/grobid:0.8.1-full, but can be re-tagged quickly, see below)

The docker images were built with GitHub Actions; I just re-tagged them accordingly. You can save build time by re-tagging the full image and pushing it under grobid:

  docker pull lfoppiano/grobid:0.8.1-full
  docker tag lfoppiano/grobid:0.8.1-full grobid/grobid:0.8.1-full
  docker push grobid/grobid:0.8.1-full

Grobid modules

Here is the list of grobid modules; I did not include the ones that are old, as it's hard to maintain everything. @kermitt2 feel free to add if there are others:

  • Pub2TEI ✅
  • DataStet (updated the DataSeer's version only) ✅
  • Grobid quantities ✅
  • Software Mentions
  • Entity-fishing

Since I cannot control the S3 repository, I usually ship the JARs with the repository as flat dependencies; this requires specifying all the dependencies, but I don't know of anything better.
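
For a Java project consuming such a repo, the flat-dependency approach amounts to something like this in Gradle (a sketch with placeholder directory and artifact names, not the actual module build files):

    repositories {
        // jars shipped inside the module repository instead of resolved from a remote Maven repo
        flatDir {
            dirs 'localLibs'
        }
    }

    dependencies {
        // with a flat-dir repo, every transitive dependency has to be listed explicitly
        implementation name: 'grobid-core', version: '0.8.1'
        implementation name: 'grobid-trainer', version: '0.8.1'
    }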

@kermitt2 do you want me to update Software Mentions and Entity-fishing as well?

@kermitt2
Owner

@lfoppiano all the artifacts for 0.8.1 have been published on https://grobid.s3.eu-west-1.amazonaws.com/repo
I don't have any errors when building from the DIY repo. Can you give me some info about your errors?

@kermitt2
Owner

@lfoppiano I'll update software-mentions, entity-fishing, DataStet sure

@lfoppiano
Collaborator Author

I don't have any particular error, but if I decide to move to a SNAPSHOT version for development I will need to ship the JARs in my repo anyway.

OK. For DataStet I've updated DataSeer's fork (https://github.com/DataSeer/datastet) because I don't have access to your repository. I'm not sure whether I've already pushed up some PRs.

@kermitt2
Owner

I don't have any particular error, but if I decide to move to a SNAPSHOT version for development I will need to ship the JARs in my repo anyway.

Does it mean it is working? You normally have snapshot versions in your local maven repo for development. This DIY stuff is anyway more for Java clients, but you should never need a localLibs/grobid-core-0.8.1.jar added to a project, no?

@lfoppiano
Collaborator Author

lfoppiano commented Sep 14, 2024

Yes it works. :-)

For grobid-quantities and grobid-superconductors I do ship the jars in the repo. In this case, grobid-superconductors also ships grobid-quantities' JAR.

@lfoppiano deleted the release-0.8.1 branch September 14, 2024 12:34
@lfoppiano
Collaborator Author

@kermitt2 for DataStet I've implemented a few useful things: 1) TEI processing, 2) parallel processing for DataSeerML (I know it's obsolete, but it was needed at DataSeer), and 3) refactoring the build to use the grobid-full image.

I will send a couple of PRs next week. It would be good to have a review (no rush) so that I can consolidate my knowledge of the application for the BSO project 😄

@kermitt2
Owner

@lfoppiano So on my side, I have updated software-mentions, grobid-ner, DataStet (standard), entity-fishing, and the grobid demo on HuggingFace.

I will study the PR for DataStet carefully because processing a TEI is likely very complicated. A great addition, I think.

I notice that the Docker image for Grobid is 2 GB larger than before (compressed size) with 0.8.1. Not that it is a problem, I think, but is there any particular reason?

@lfoppiano
Collaborator Author

Great thanks!

The 0.8.1 image that I built via GitHub Actions is 10.92 GB (compressed); version 0.8.0 was approximately 10.5. 🤔 It might be something to do with your build (maybe you built from a source tree with additional models included?).

@kermitt2
Owner

It might be something to do with your build (maybe you built from a source tree with additional models included?).

Ah yes sorry, this is exactly what happened :D
