This project intends to finish the word-sense annotations of the WordNet gloss corpus.
See the `INSTALL` file.
- `sensetion-sense-menu-show-synset-id`: set this variable to a non-nil value such as `t` (true) to show the synset IDs in the list of synsets during annotation.
Use `M-x sensetion` to start annotating. The first time you call it, it might take some time to index the files (you can keep using Emacs while it works); on later calls it reads the index files and starts the annotation process.
`M-x sensetion` will ask you for a lemma and a PoS tag. You can press `TAB` to complete the lemma. This builds a buffer with the instances of this lemma+PoS that have not been annotated yet. Unannotated tokens are shown in red/pink, previously annotated tokens in green, and newly annotated tokens in blue (all these colors can be customized). For annotated tokens, dark colors indicate confidence in the annotation. You can navigate through annotatable tokens with `<` and `>`.
If you wish to sense-tag a token, press `/` on it. You may select one or many senses, or no sense at all, by pressing the corresponding keys. Senses that are already selected are prefixed by a plus sign (`+`); when satisfied, press enter/return. To quit, press `q`; note that quitting does not undo anything (if you selected an option and then quit, its effects have already been carried out and saved). The number of tokens that still need to be annotated is shown in the mode line, next to the `sensetion` indicator.
If you wish to change a token's lemma, use `l`. If you wish to say a token is not annotatable (i.e., ignore it), use `i`. If you wish to say you are unsure about an annotation, use `?`.
There is support for word collocations, such as phrasal verbs. The tokens that are part of a collocation share a key, which is shown in their bottom left corner. You can unglob a collocation by pressing `u` on any of its tokens. To glob tokens, mark each of them with `m` and then press `g` to create the collocation. If you marked a token by mistake, press `m` on it again to unmark it. If you try to edit the lemma of a token that is part of a collocation, you will be asked whether you would like to edit the token itself or its collocation.
You can move sentences up or down with `C-↑` and `C-↓`. Clustering tokens with the same sense together might be useful.
Note that you can customize most things (like annotation colors) with `M-x customize-group RET sensetion`.
You can call a command using `M-x <command-name>` or by pressing its keybinding. If you find that there are too many editing and navigation commands to memorize, you only need to remember the key `s`, which opens a menu that includes all other commands.
command name | key binding | description |
---|---|---|
sensetion | - | Start sensetion annotation process. |
sensetion-annotate | - | Start annotating a new lemma/PoS tag. |
sensetion-edit-synset | . | Edit sentence source data (be careful!) |
sensetion-edit-sense | / | Annotate sense of selected token at point |
sensetion-edit-lemma | l | Annotate lemma of token at point |
sensetion-edit-ignore | i | Mark token at point as to be ignored in the annotation process |
sensetion-edit-unsure | ? | Marks annotation as done with little confidence |
sensetion-toggle-glob-mark | m | Mark/Unmark token for globbing |
sensetion-glob | g | Glob the marked tokens as a new collocation and ask for its lemma |
sensetion-unglob | u | Unglob collocation of token at point (removes collocation) |
sensetion-toggle-scripts | v | Show/hide super/subscripts |
sensetion-next-selected | > | Move point to next selected token |
sensetion-previous-selected | < | Move point to previous selected token |
sensetion-move-line-up | C-↑ | Move sentence up |
sensetion-move-line-down | C-↓ | Move sentence down |
If the index goes out of sync, you can force re-indexing with `M-x sensetion-make-index`.
- Any annotations are saved to their files at the moment they are done.
- The updated index is saved by default to your annotation directory in the file `.sensetion-index` when you quit Emacs gracefully (that's why it hangs a little). This path is customizable.
We recommend (although it is not strictly necessary) setting up a git repository for the annotation files (see any git tutorial if you are unfamiliar with it). Use `git diff --color-words=.` (note the period `.`) to see the changes you made since the previous commit.
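
As a rough sketch of the git workflow recommended here, the shell functions below put an annotation directory under version control and show what changed since the last commit. The function names `start_tracking` and `review_changes` and the commit message are illustrative assumptions; only `git diff --color-words=.` comes from this README.

```shell
# start_tracking DIR: put an annotation directory under version control.
# Assumes git is installed and a commit identity (user.name/user.email)
# is configured. The function name and commit message are hypothetical.
start_tracking() {
    dir=$1
    git -C "$dir" init --quiet &&
    git -C "$dir" add --all &&
    git -C "$dir" commit --quiet -m "Snapshot before annotating"
}

# review_changes DIR: show what changed since the previous commit.
# The regex "." makes every character its own word, so the diff
# highlights exactly the characters that were edited.
review_changes() {
    git -C "$1" diff --color-words=.
}
```

You would call `start_tracking` once on your annotation directory, then `review_changes` on the same directory after each annotation session, committing whenever you are happy with the result.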
In any case, please back up your work!
- Give clear instructions to reproduce the bug;
- Call `M-x toggle-debug-on-error`, reproduce the bug, and send the backtrace with your report (you may open an issue).
You can assign the function `org-copy-visible` to your copying command key in the annotation buffers by adding the `:bind` lines to your `sensetion` `use-package` declaration:

    (use-package sensetion
      :commands sensetion
      ;; Bind M-w to org-copy-visible in annotation buffers only.
      :bind (:map sensetion-mode-map
                  ("M-w" . org-copy-visible)))
We convert the original XML files to property lists, whose grammar can be found in `glosstag/grammar.txt`.
The script that converts the original XML WN gloss corpus is at `convert.sh`. To re-run the conversion:
- download the WordNet gloss corpus;
  - you can validate it with the `xmllint` utility from the `libxml` package: `xmllint --dtdvalid dtd/glosstag.dtd merged/*.xml`
- download and set up SBCL (although any Common Lisp implementation should work);
- set up Quicklisp;
- create a symbolic link to the `glosstag` directory inside `quicklisp/local-projects/`;
- run: `./convert.sh ~/WordNet-3.0/glosstag/merged/ DESTINATION-PATH/`

  where the first parameter is a directory from the gloss corpus archive and the last parameter is the directory where you want to put the files. (Use absolute paths if you have problems with the command.) Note that the trailing slash in `glosstag/` is important. You must have the glosstag DTD in the same directory as the annotation files.
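
The linking and conversion steps above could be scripted roughly as follows. This is only a sketch: the helper names are hypothetical, the `~/quicklisp` default is an assumption about where Quicklisp is installed, and adding a trailing slash to both directory arguments is an assumption based on the note that the trailing slash matters.

```shell
# Append a trailing slash unless the path already ends with one.
with_trailing_slash() {
    case $1 in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

# link_glosstag GLOSSTAG_DIR [QUICKLISP_DIR]
# Link the glosstag directory into Quicklisp's local-projects directory.
# Pass GLOSSTAG_DIR as an absolute path so the link resolves correctly.
# The ~/quicklisp default is an assumption; adjust it to your setup.
link_glosstag() {
    glosstag_dir=$1
    quicklisp_dir=${2:-${QUICKLISP_DIR:-$HOME/quicklisp}}
    mkdir -p "$quicklisp_dir/local-projects" &&
    ln -s "$glosstag_dir" "$quicklisp_dir/local-projects/glosstag"
}

# run_conversion SOURCE_DIR DESTINATION_DIR
# Call convert.sh from the current directory with both directory
# arguments normalised to end in a slash.
run_conversion() {
    ./convert.sh "$(with_trailing_slash "$1")" "$(with_trailing_slash "$2")"
}
```

For instance, `run_conversion ~/WordNet-3.0/glosstag/merged DESTINATION-PATH` would then invoke `convert.sh` with both paths ending in `/`.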
Under heavy development – user interface is unstable, and the code is still to be generalized so that it can be made useful for annotation of other corpora (maybe even of other stuff).