Word-sense annotation in Emacs

This project intends to finish the word-sense annotations of the WordNet gloss corpus.

Installation

See the INSTALL file.

Customization

sensetion-sense-menu-show-synset-id: set this variable to a non-nil value (for example, t) to show the synset ids in the list of synsets during annotation.

Usage

Use M-x sensetion to start annotating. The first time you call it, it may take some time to index the files (you can keep using Emacs while it works); on later calls it reads the existing index files and starts the annotation process.

M-x sensetion will ask you for a lemma and a PoS tag. You can press TAB to complete the lemma. This will build a buffer with the instances of this lemma+PoS that have not been annotated yet. Unannotated tokens are shown in red/pink, previously annotated tokens in green, and newly annotated tokens in blue (all of these colors can be customized by the user). For annotated tokens, dark colors indicate confidence in the annotation. You can navigate through annotatable tokens with < and >.

If you wish to sense-tag a token, press / on it. You may select one or many senses – or no sense at all – by pressing the appropriate keyboard key. Senses which are already selected are prefixed by a plus sign (+); when satisfied, press enter/return. If you’d like to quit, press q; note that quitting does not undo anything (if you selected an option and then quit, its effects were already carried out and saved). You can see how many tokens still need to be annotated in the mode-line, next to the sensetion indicator.

If you wish to change a token’s lemma use l. If you wish to say a token is not annotatable (i.e., ignore it), use i. If you wish to say you are unsure about an annotation, use ?.

There is support for word collocations, such as phrasal verbs. The tokens that are part of a collocation share a key, which is shown in their bottom left corner. You can unglob a collocation by pressing u on any token of the collocation. To glob tokens, mark each of them with m and then press g to create the collocation. If you marked a token by mistake, you can unmark it by pressing m again. If you try to edit the lemma of a token that is part of a collocation, you will be asked whether you would like to edit the token itself or its collocation.

You can move sentences up or down with C-↑ and C-↓. Clustering tokens with the same sense together might be useful.

Note that you can customize most things (like annotation colors) with M-x customize-group RET sensetion.

Command summary

You can call a command using M-x <command-name> or by pressing its key binding. If there are too many editing and navigation commands to memorize, you only need to remember the command s, which opens a menu listing all other commands.

| Command name | Key binding | Description |
|---|---|---|
| sensetion | - | Start the sensetion annotation process |
| sensetion-annotate | - | Start annotating a new lemma/PoS tag |
| sensetion-edit-synset | `.` | Edit sentence source data (be careful!) |
| sensetion-edit-sense | `/` | Annotate the sense of the token at point |
| sensetion-edit-lemma | `l` | Annotate the lemma of the token at point |
| sensetion-edit-ignore | `i` | Mark the token at point as ignored in the annotation process |
| sensetion-edit-unsure | `?` | Mark the annotation as done with little confidence |
| sensetion-toggle-glob-mark | `m` | Mark/unmark token for globbing |
| sensetion-glob | `g` | Glob the marked tokens as a new collocation and ask for its lemma |
| sensetion-unglob | `u` | Unglob the collocation of the token at point (removes the collocation) |
| sensetion-toggle-scripts | `v` | Show/hide super/subscripts |
| sensetion-next-selected | `>` | Move point to the next selected token |
| sensetion-previous-selected | `<` | Move point to the previous selected token |
| sensetion-move-line-up | `C-↑` | Move sentence up |
| sensetion-move-line-down | `C-↓` | Move sentence down |

Indexing

If the index goes out of sync, you can force re-indexing with M-x sensetion-make-index.

Saving your work

Annotations are saved to their files as soon as they are made.
By default, the updated index is saved to the file .sensetion-index in your annotation directory when you quit Emacs gracefully (which is why Emacs pauses briefly on exit). This path is customizable.

Seeing your work

We recommend (although it is not strictly necessary) setting up a git repository for the annotation files (see any git tutorial if you are unfamiliar with it). Use

git diff --color-words=.

(note the period .) to see the changes you made after the previous commit.
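
As a minimal sketch of that workflow, the commands below set up a repository, take a snapshot, and then review a change. The directory, file name, and file contents are hypothetical stand-ins (a temporary directory is used so the sketch is safe to run); in practice you would run the git commands inside your own annotation directory.

```shell
# Hypothetical annotation directory; a temporary one keeps the sketch harmless.
ANNOTATION_DIR="$(mktemp -d)"
cd "$ANNOTATION_DIR"
printf 'one two three\n' > example.txt   # stand-in for an annotation file

# Put the directory under version control and record a first snapshot.
git init --quiet
git add .
# Identity is passed inline so the commit works without a global git config.
git -c user.name="Annotator" -c user.email="annotator@example.com" \
    commit --quiet -m "Snapshot before annotating"

# Simulate an annotation change, then review it character by character.
printf 'one 2 three\n' > example.txt
DIFF_OUTPUT="$(git diff --color-words=.)"
printf '%s\n' "$DIFF_OUTPUT"
```

Committing after each annotation session keeps every diff small enough to review.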

In any case, please back up your work!

Report bugs

- Give clear instructions to reproduce the bug;
- Call M-x toggle-debug-on-error, reproduce the bug, and send the backtrace with your report (you may open an issue).

FAQ – Frequently Asked Questions

How can I copy and paste annotation text without super/subscripts?

You can assign the function org-copy-visible to your copying command key in the annotation buffers by adding these two lines to your sensetion use-package declaration:

(use-package sensetion
  :commands sensetion
  ;; The two added lines: bind M-w to org-copy-visible in annotation buffers.
  :bind (:map sensetion-mode-map
              ("M-w" . org-copy-visible)))

Annotation format

We convert the original XML files to property lists, whose grammar can be found in glosstag/grammar.txt.

The script that converts the original XML WN gloss corpus is at convert.sh. To re-run the conversion:

- download the WordNet gloss corpus;
  - you can validate it with the xmllint utility from the libxml2 package:
    ```
    xmllint --dtdvalid dtd/glosstag.dtd merged/*.xml
    ```
- download and set up sbcl (although any Common Lisp implementation should work);
- set up quicklisp;
- create a symbolic link to the glosstag directory inside quicklisp/local-projects/;
- run:
  ```
  ./convert.sh ~/WordNet-3.0/glosstag/merged/ DESTINATION-PATH/
  ```
where the first parameter is a directory from the gloss corpus archive and the last parameter is the directory where you want to put the converted files. (Use absolute paths if you have problems with the command.) Note that the trailing slash in glosstag/ is important. You must have the glosstag DTD in the same directory as the annotation files.
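
The symbolic-link step above can be sketched as follows. All paths here are assumptions: temporary directories stand in for a checkout of this repository and for a quicklisp installation, so the sketch can be run without touching a real setup.

```shell
# Hypothetical locations; temporary directories stand in for the real ones.
WORK_DIR="$(mktemp -d)"
REPO_DIR="$WORK_DIR/sensetion.el"                    # assumed checkout of this repository
LOCAL_PROJECTS="$WORK_DIR/quicklisp/local-projects"  # assumed quicklisp layout
mkdir -p "$REPO_DIR/glosstag" "$LOCAL_PROJECTS"

# Link the glosstag directory into quicklisp's local-projects directory.
ln -s "$REPO_DIR/glosstag" "$LOCAL_PROJECTS/glosstag"

# Confirm where the link points.
LINK_TARGET="$(readlink "$LOCAL_PROJECTS/glosstag")"
echo "$LINK_TARGET"
```

On a real machine, replace the two directory variables with the actual checkout and your quicklisp directory (often ~/quicklisp, though this depends on how quicklisp was installed).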

Status

Under heavy development – user interface is unstable, and the code is still to be generalized so that it can be made useful for annotation of other corpora (maybe even of other stuff).
