Logging
A log file is created every time the conversion trigger is pulled. It is placed in the conversion cockpit's doc/logs/ directory.
$CSV2RDF4LOD_HOME/bin/convert.sh and $CSV2RDF4LOD_HOME/bin/convert-aggregate.sh always log messages to:
CSV2RDF4LOD_LOG="doc/logs/csv2rdf4lod_log_e${eID}_`date +%Y-%m-%dT%H_%M_%S`.txt"
The number of logs in this directory is asserted as conversion:num_invocation_logs in the aggregated data dump Turtle file publish/<dataset-id>-<version-id>.ttl.
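A minimal sketch of the naming scheme above and of what conversion:num_invocation_logs counts. The helpers log_name_for and count_invocation_logs are hypothetical (not part of csv2rdf4lod), and treating every csv2rdf4lod_log_*.txt file, including raw ones such as csv2rdf4lod_log_raw_*.txt, as one invocation is an assumption about how the count is made.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of csv2rdf4lod) illustrating the log naming scheme.

# Build a log path the same way as the CSV2RDF4LOD_LOG assignment:
#   doc/logs/csv2rdf4lod_log_e<eID>_<YYYY-MM-DDTHH_MM_SS>.txt
log_name_for() {
  local eID="$1"
  echo "doc/logs/csv2rdf4lod_log_e${eID}_$(date +%Y-%m-%dT%H_%M_%S).txt"
}

# Count the invocation logs in a conversion cockpit (default: current directory).
# Assumption: every csv2rdf4lod_log_*.txt file counts as one invocation.
count_invocation_logs() {
  local cockpit="${1:-.}"
  local n=0 f
  for f in "$cockpit"/doc/logs/csv2rdf4lod_log_*.txt; do
    # When nothing matches, the glob stays literal; skip it.
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

Run from a conversion cockpit, count_invocation_logs should roughly agree with the conversion:num_invocation_logs value in the published Turtle file.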
cr-latest-logs.sh can be run from the conversion cockpit to list all log files produced since the conversion trigger was last pulled. This makes it quick to look for certain types of errors. For example, the last command below looks for Java exceptions; grep -B1 also prints the line before each match, which here names the params file being processed. The first two commands are shown only to indicate where in the conversion root cr-latest-logs.sh is run.
bash-3.2$ cr-pwd.sh
source/hub-healthdata-gov/hospital-compare/version/2012-Jul-17
bash-3.2$ cr-pwd-type.sh
cr:conversion-cockpit
bash-3.2$ grep -B1 Exception `cr-latest-logs.sh`
doc/logs/csv2rdf4lod_log_e1_2012-09-25T19_35_44.txt-manual/HQI_STATE_HCAHPS_MSR.csv.global.e1.params.ttl
doc/logs/csv2rdf4lod_log_e1_2012-09-25T19_35_44.txt:org.openrdf.rio.RDFParseException: Namespace prefix 'agg' used but not defined [line 150]
--
doc/logs/csv2rdf4lod_log_e1_2012-09-25T19_35_44.txt-manual/HQI_STATE_HCAHPS_MSR.csv.global.e1.params.ttl
doc/logs/csv2rdf4lod_log_e1_2012-09-25T19_35_44.txt:org.openrdf.rio.RDFParseException: Namespace prefix 'agg' used but not defined [line 150]
--
doc/logs/csv2rdf4lod_log_e1_2012-09-25T19_35_49.txt-manual/HQI_US_NATIONAL_HCAHPS_MSR.csv.global.e1.params.ttl
doc/logs/csv2rdf4lod_log_e1_2012-09-25T19_35_49.txt:org.openrdf.rio.RDFParseException: Namespace prefix 'agg' used but not defined [line 135]
--
doc/logs/csv2rdf4lod_log_e1_2012-09-25T19_35_49.txt-manual/HQI_US_NATIONAL_HCAHPS_MSR.csv.global.e1.params.ttl
doc/logs/csv2rdf4lod_log_e1_2012-09-25T19_35_49.txt:org.openrdf.rio.RDFParseException: Namespace prefix 'agg' used but not defined [line 135]
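When cr-latest-logs.sh returns many files, the same exception can repeat several times, as above. A small sketch that tallies distinct exception messages instead; summarize_exceptions is a hypothetical helper, and its pattern assumes exceptions appear in the logs as `<ClassName>Exception: <message>`, as in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally distinct Java exception messages across log files.
# Pass the log files as arguments, e.g. the output of cr-latest-logs.sh.
summarize_exceptions() {
  # With no files, grep would wait on stdin; return quietly instead.
  [ "$#" -gt 0 ] || return 0
  # -h: omit file names so identical messages from different logs collapse together
  # -o: print only the "...Exception: message" part of each matching line
  grep -h -o '[A-Za-z0-9_.$]*Exception: .*' "$@" | sort | uniq -c | sort -rn
}
```

From the conversion cockpit: summarize_exceptions `cr-latest-logs.sh`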
Although keeping every log around is useful, the logs can get big. We can trim them down so they take up less space but are still around to indicate how much effort was put into enhancing the dataset.
When we are at the data root (see the csv2rdf4lod automation data root page):
$ cr-pwd.sh
source/
We can run $CSV2RDF4LOD_HOME/bin/util/cr-trim-logs.sh and skim through the size of each log and the size it would become if we trimmed it:
$ cr-trim-logs.sh
...
...
========== source/data-gov/1554/version/2011-Jan-12 ========================================
319M doc/logs total
doc/logs/csv2rdf4lod_log_e1_2011-03-29T13_37_26.txt 24 -> 12
doc/logs/csv2rdf4lod_log_e1_2011-03-29T13_39_16.txt 24 -> 12
doc/logs/csv2rdf4lod_log_e1_2011-03-29T13_42_29.txt 28 -> 12
doc/logs/csv2rdf4lod_log_e1_2011-03-29T13_42_44.txt 328 -> 12
doc/logs/csv2rdf4lod_log_e1_2011-03-29T13_44_41.txt 24 -> 12
doc/logs/csv2rdf4lod_log_e1_2011-03-29T13_44_48.txt 328 -> 12
doc/logs/csv2rdf4lod_log_e1_2011-03-29T13_59_12.txt 19344 -> 4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_00_30.txt 19344 -> 4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_12_31.txt 19344 -> 4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_14_59.txt 28 -> 4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_17_32.txt 28 -> 4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_19_19.txt 28 -> 4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_21_26.txt 28 -> 4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_24_47.txt 28 -> 12
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_25_10.txt 28 -> 4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_34_23.txt 28 -> 4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_35_41.txt 19904 -> 4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_41_59.txt 19908 -> 8
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_44_37.txt 4
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_51_12.txt 19908 -> 8
doc/logs/csv2rdf4lod_log_e1_2011-03-29T14_52_55.txt 19916 -> 8
doc/logs/csv2rdf4lod_log_e1_2011-03-29T15_08_18.txt 4304 -> 8
doc/logs/csv2rdf4lod_log_e1_2011-03-29T15_08_38.txt 19916 -> 8
doc/logs/csv2rdf4lod_log_e1_2011-03-29T15_11_07.txt 40 -> 8
doc/logs/csv2rdf4lod_log_e1_2011-03-29T15_12_19.txt 40 -> 8
doc/logs/csv2rdf4lod_log_e1_2011-03-29T15_14_22.txt 40 -> 8
doc/logs/csv2rdf4lod_log_e1_2011-03-29T15_56_35.txt 44 -> 8
doc/logs/csv2rdf4lod_log_e1_2011-04-12T08_47_21.txt 44 -> 8
doc/logs/csv2rdf4lod_log_raw_2011-03-21T14_20_41.txt 40 -> 12
Note: did not trim logs. Use cr-trim-logs.sh -w to modify doc/logs/*.txt
...
...
To check just the total size of each version's logs (for example, to verify that they ARE taking up a lot of space), filter for the totals:
$ cr-trim-logs.sh | grep total
604K doc/logs total
116K doc/logs total
319M doc/logs total
216K doc/logs total
136K doc/logs total
16K doc/logs total
12K doc/logs total
24K doc/logs total
24K doc/logs total
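To rank those totals, or to get a similar overview without cr-trim-logs.sh, du can be used directly. A sketch, assuming it is run from the data root (source/) with the <source-id>/<dataset-id>/version/<version-id>/doc/logs layout seen in the output above; largest_log_dirs is a hypothetical helper and reports sizes in KiB, so its numbers will not match cr-trim-logs.sh's human-readable totals exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rank the doc/logs directories under a data root by size.
# Assumes the layout <source-id>/<dataset-id>/version/<version-id>/doc/logs
# relative to the data root (source/).
largest_log_dirs() {
  local root="${1:-.}"
  local d
  for d in "$root"/*/*/version/*/doc/logs; do
    # Skip the unexpanded pattern when nothing matches.
    if [ -d "$d" ]; then
      du -sk "$d"   # prints "<size in KiB><TAB><directory>"
    fi
  done | sort -rn   # biggest first
}
```

For example, largest_log_dirs . | head -n 5 lists the five biggest log directories.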
When you're ready to trim the files (and save space), use -w to write the trimmed logs. Note that the totals printed here are still the sizes from before trimming:
$ cr-trim-logs.sh -w | grep total
604K doc/logs total
116K doc/logs total
319M doc/logs total
216K doc/logs total
136K doc/logs total
16K doc/logs total
12K doc/logs total
24K doc/logs total
24K doc/logs total
Running it again without -w shows the new, smaller sizes:
$ cr-trim-logs.sh | grep total
552K doc/logs total
96K doc/logs total
160M doc/logs total # This is still huge b/c it was committed to svn before trimming. Bad!
136K doc/logs total
136K doc/logs total
16K doc/logs total
12K doc/logs total
12K doc/logs total
12K doc/logs total
The Virtuoso load and delete scripts keep their logs in tmp because Virtuoso needs permission to write there.
$CSV2RDF4LOD_HOME/bin/util/virtuoso/vload stores logs to $CSV2RDF4LOD_HOME/tmp/vload/input-files/*.log, with the latest at $CSV2RDF4LOD_HOME/tmp/vload/input-files/latest.log. More properly configured installs will log to $conversion_root/$CSV2RDF4LOD_PUBLISH_OUR_SOURCE_ID/$me/version/$versionID/doc/logs, e.g. /srv/logd/data/source/twc-rpi-edu/cr-vload/version/17f34aca66e186e543d3f1a649fdb0fe/doc/logs/.
$CSV2RDF4LOD_HOME/bin/util/virtuoso/vdelete stores logs to $CSV2RDF4LOD_HOME/tmp/vdelete/*.log, with the latest at $CSV2RDF4LOD_HOME/tmp/vdelete/latest.log.
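To check what the last vload or vdelete run did, reading the end of latest.log is usually enough. A sketch using the paths above; show_latest_virtuoso_log is a hypothetical helper, and it assumes the default tmp locations rather than the doc/logs location used by more properly configured installs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the tail of the most recent vload or vdelete log,
# assuming the default latest.log locations under $CSV2RDF4LOD_HOME/tmp.
show_latest_virtuoso_log() {
  local tool="$1" lines="${2:-20}" log
  case "$tool" in
    vload)   log="$CSV2RDF4LOD_HOME/tmp/vload/input-files/latest.log" ;;
    vdelete) log="$CSV2RDF4LOD_HOME/tmp/vdelete/latest.log" ;;
    *) echo "usage: show_latest_virtuoso_log vload|vdelete [lines]" >&2; return 2 ;;
  esac
  if [ ! -f "$log" ]; then
    echo "no log at $log" >&2
    return 1
  fi
  tail -n "$lines" "$log"
}
```

For example, show_latest_virtuoso_log vload 50 prints the last 50 lines of the latest vload log.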
populate-endpoint.sh loads metadata from all conversions into a named graph and caches query results to static files, to reduce endpoint load when supporting a web site. (It still needs to be generalized beyond LOGD.) It logs to:
${CSV2RDF4LOD_HOME}/log/populate-endpoint.sh/*.log
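No latest.log is mentioned for populate-endpoint.sh, so this sketch picks the newest log by modification time instead; newest_populate_endpoint_log is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of the newest populate-endpoint.sh log.
newest_populate_endpoint_log() {
  local dir="${CSV2RDF4LOD_HOME}/log/populate-endpoint.sh"
  # ls -t sorts newest first; 2>/dev/null hides the error when no logs exist.
  ls -t "$dir"/*.log 2>/dev/null | head -n 1
}
```

For example: less "$(newest_populate_endpoint_log)"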
The Java implementation uses java.util.logging to log. In debugging situations, I might ask you to turn this on; it should rarely be needed.
Turning logging on is parameterized by the CSV2RDF4LOD_CONVERT_DEBUG_LEVEL environment variable and takes effect within $CSV2RDF4LOD_HOME/bin/convert.sh:
javaprops="-Djava.util.logging.config.file=$CSV2RDF4LOD_HOME/bin/logging/finest.properties"
#javaprops=""
So, to log at the finer level:
$ export CSV2RDF4LOD_CONVERT_DEBUG_LEVEL=finer
Valid values include fine, finer, and finest.
$CSV2RDF4LOD_HOME/bin/logging/ contains fine.properties, finer.properties, and finest.properties.
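The exact way convert.sh turns the variable into the javaprops value is not spelled out here. The sketch below is a hypothetical reconstruction, not the actual convert.sh code: it assumes the level name is the properties file's base name, and that no logging config is passed when the variable is unset or the file does not exist.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction (NOT the actual convert.sh code) of how
# CSV2RDF4LOD_CONVERT_DEBUG_LEVEL could select a java.util.logging config file.
java_logging_props() {
  local level="${CSV2RDF4LOD_CONVERT_DEBUG_LEVEL:-}"
  local props="$CSV2RDF4LOD_HOME/bin/logging/${level}.properties"
  if [ -n "$level" ] && [ -f "$props" ]; then
    echo "-Djava.util.logging.config.file=$props"
  fi
  # Otherwise print nothing, i.e. the equivalent of javaprops="".
}
```

Used as javaprops="$(java_logging_props)", this yields the same form of value as the javaprops line shown above.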
(If you REALLY want to get your hands dirty, add your.properties to $CSV2RDF4LOD_HOME/bin/logging/ and set CSV2RDF4LOD_CONVERT_DEBUG_LEVEL to your.)
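A custom configuration is an ordinary java.util.logging properties file. This sketch writes a hypothetical your.properties that sends FINER and above to the console. The keys are standard java.util.logging configuration keys, but the contents of the shipped fine/finer/finest.properties are not shown here, so this is not a copy of them; write_your_properties is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical example: write a custom java.util.logging config named "your"
# next to the shipped fine/finer/finest.properties.
write_your_properties() {
  local dir="$CSV2RDF4LOD_HOME/bin/logging"
  mkdir -p "$dir"
  cat > "$dir/your.properties" <<'EOF'
# Send log records to the console (stderr).
handlers = java.util.logging.ConsoleHandler
# Root logger level: emit FINER and above.
.level = FINER
# The handler filters separately (INFO by default), so lower it to match.
java.util.logging.ConsoleHandler.level = FINER
java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter
EOF
}
```

After running write_your_properties, export CSV2RDF4LOD_CONVERT_DEBUG_LEVEL=your selects it.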