feat: Synonyms in taxonomized suggestions (#9395)
When creating suggestions for taxonomized fields like categories, the API returns results where one of the synonyms matches. However, those results are not displayed when the canonical name does not include the typed string. This change tries to fix that by returning the synonym that the input matched, so the client can use it to build the suggestions list.
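The problem described above can be illustrated with a small sketch. It assumes a hypothetical client-side filter (`visibleEntries`, not part of this commit) that only shows entries containing the typed text, and uses a subset of the French allergen data from the test fixture in this commit.

```javascript
// Hypothetical filter: keeps only entries that contain the typed text.
// This stands in for whatever filtering the dropdown widget applies.
function visibleEntries(entries, typed) {
  const needle = typed.toLowerCase();
  return entries.filter((entry) => entry.toLowerCase().includes(needle));
}

const typed = "o";

// Canonical names returned by the API for the query "o" (subset).
const suggestions = ["Gluten", "Arachides", "Lait", "Soja"];

// Synonym that matched the query for each canonical name (subset).
const matchedSynonyms = {
  Gluten: "Orge",
  Arachides: "Cacahouètes",
  Lait: "Lactose",
  Soja: "Soja",
};

// Filtering on canonical names hides entries that matched through a synonym.
const byCanonicalName = visibleEntries(suggestions, typed);

// Filtering on the matched synonyms keeps every entry the API returned.
const bySynonym = visibleEntries(
  suggestions.map((name) => matchedSynonyms[name]),
  typed
);

console.log(byCanonicalName);
console.log(bySynonym);
```

In this sketch only "Soja" survives the filter on canonical names, while all four entries survive when the matched synonyms are used instead.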
Naruyoko authored Feb 19, 2024
1 parent 058f154 commit 908603a
Showing 7 changed files with 217 additions and 35 deletions.
23 changes: 18 additions & 5 deletions docs/api/ref/api-v3.yml
@@ -4,8 +4,8 @@ x-stoplight:
info:
title: Open Food Facts Open API V3 - under development
description: |
As a developer, the Open Food Facts API allows you to get information
and contribute to the products database. You can create great apps to
As a developer, the Open Food Facts API allows you to get information
and contribute to the products database. You can create great apps to
help people make better food choices and also provide data to enhance the database.
termsOfService: 'https://openweathermap.org/terms'
contact:
@@ -50,12 +50,12 @@ paths:
in: query
name: fields
description: |-
Comma separated list of fields requested in the response.
Comma separated list of fields requested in the response.
Special values:
Special values:
* "none": returns no fields
* "raw": returns all fields as stored internally in the database
* "all": returns all fields except generated fields that need to be explicitly requested such as "knowledge_panels".
* "all": returns all fields except generated fields that need to be explicitly requested such as "knowledge_panels".
Defaults to "all" for READ requests. The "all" value can also be combined with fields like "attribute_groups" and "knowledge_panels".'
responses:
@@ -239,6 +239,14 @@ paths:
description: Array of sorted strings suggestions in the language requested in the "lc" field.
items:
type: string
matched_synonyms:
type: object
description: |
Dictionary of strings associating canonical names (as seen in the suggestions field) with the synonym that best matches the query. An entry is present for every suggestion, even when the synonym is the same as the canonical name.
This value is present only if the get_synonyms parameter is set.
additionalProperties:
type: string
operationId: get-api-v3-taxonomy_suggestions-taxonomy
description: |-
Open Food Facts uses multilingual [taxonomies](https://wiki.openfoodfacts.org/Global_taxonomies) to normalize entries for categories, labels, ingredients, packaging shapes / materials / recycling instructions and many more fields.
@@ -282,6 +290,11 @@ paths:
in: query
name: limit
description: 'Maximum number of suggestions. Default is 25, max is 400.'
- schema:
type: string
in: query
name: get_synonyms
description: 'Whether or not to include "matched_synonyms" in the response. Set to 1 to include.'
- schema:
type: string
in: query
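As a rough illustration of the new `get_synonyms` query parameter, the sketch below builds a request URL for the taxonomy suggestions endpoint. The helper name `buildSuggestionsUrl` and the base URL are assumptions made for the example; the path and parameter names come from the spec and the integration test in this commit.

```javascript
// Hypothetical helper: builds a taxonomy_suggestions URL.
// The base URL is an assumption; pass whichever server you actually target.
function buildSuggestionsUrl(baseUrl, { tagtype, string, lc, getSynonyms }) {
  const url = new URL("/api/v3/taxonomy_suggestions", baseUrl);
  url.searchParams.set("tagtype", tagtype);
  url.searchParams.set("string", string); // percent-encoded automatically
  url.searchParams.set("lc", lc);
  if (getSynonyms) {
    // Per the spec above, "1" asks the API to include "matched_synonyms".
    url.searchParams.set("get_synonyms", "1");
  }
  return url.toString();
}

const exampleUrl = buildSuggestionsUrl("https://world.openfoodfacts.org", {
  tagtype: "allergens",
  string: "o",
  lc: "fr",
  getSynonyms: true,
});

console.log(exampleUrl);
```

Using `URLSearchParams` rather than string concatenation means that input such as "Café" is encoded before it reaches the query string.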
47 changes: 43 additions & 4 deletions html/js/product-multilingual.js
@@ -542,14 +542,22 @@ function initializeTagifyInput(el) {
autocomplete: true,
whitelist: get_recents(el.id) || [],
dropdown: {
enabled: 0
enabled: 0,
maxItems: 100
}
});

let abortController;
let debounceTimer;
const timeoutWait = 300;

function updateSuggestions() {
const value = input.state.inputText;
const lc = (/^\w\w:/).exec(value);
const term = lc ? value.substring(lc[0].length) : value;
input.dropdown.show(term);
}

input.on("input", function (event) {
const value = event.detail.value;
input.whitelist = null; // reset the whitelist
@@ -565,16 +573,47 @@ function initializeTagifyInput(el) {

abortController = new AbortController();

fetch(el.dataset.autocomplete + "&string=" + value, {
fetch(el.dataset.autocomplete + "&string=" + value + "&get_synonyms=1", {
signal: abortController.signal
}).
then((RES) => RES.json()).
then(function (json) {
input.whitelist = json.suggestions;
input.dropdown.show(value); // render the suggestions dropdown
const lc = (/^\w\w:/).exec(value);
let whitelist = Object.values(json.matched_synonyms);
if (lc) {
whitelist = whitelist.map(function (e) {
return {"value": lc[0] + e, "searchBy": e};
});
}
const synonymMap = Object.create(null);
// eslint-disable-next-line guard-for-in
for (const k in json.matched_synonyms) {
synonymMap[json.matched_synonyms[k]] = k;
}
input.synonymMap = synonymMap;
input.whitelist = whitelist;
updateSuggestions(); // render the suggestions dropdown
});
}, timeoutWait);
}
updateSuggestions();
});

input.on("dropdown:show", function() {
if (!input.synonymMap) {
return;
}
$(input.DOM.dropdown).find("div.tagify__dropdown__item").each(function(_,e) {
let synonymName = e.getAttribute("value");
const lc = (/^\w\w:/).exec(synonymName);
if (lc) {
synonymName = synonymName.substring(3);
}
const canonicalName = input.synonymMap[synonymName];
if (canonicalName && canonicalName !== synonymName) {
e.innerHTML += " (&rarr; <i>" + canonicalName + "</i>)";
}
});
});

input.on("add", function (event) {
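The synonym handling in the script above can be sketched without the DOM or Tagify. The helpers `buildSynonymMap` and `labelFor` are hypothetical names for this example; they mirror the inversion of `matched_synonyms` and the "(→ canonical)" annotation, but return plain strings instead of modifying `innerHTML`.

```javascript
// Invert { canonical: synonym } into { synonym: canonical }.
// Object.create(null) avoids collisions with inherited keys such as "constructor".
function buildSynonymMap(matchedSynonyms) {
  const synonymMap = Object.create(null);
  for (const [canonical, synonym] of Object.entries(matchedSynonyms)) {
    synonymMap[synonym] = canonical;
  }
  return synonymMap;
}

// Build the text shown for one dropdown entry.
// A leading two-letter language prefix such as "fr:" is stripped before lookup.
function labelFor(entryValue, synonymMap) {
  const prefix = /^\w\w:/.exec(entryValue);
  const synonym = prefix ? entryValue.substring(prefix[0].length) : entryValue;
  const canonical = synonymMap[synonym];
  if (canonical && canonical !== synonym) {
    return `${synonym} (→ ${canonical})`;
  }
  return synonym;
}

const synonymMap = buildSynonymMap({
  Arachides: "Cacahouètes",
  Soja: "Soja",
});

console.log(labelFor("fr:Cacahouètes", synonymMap));
console.log(labelFor("Soja", synonymMap));
```

In a page, a label built this way could be assigned through `textContent`, which avoids interpreting taxonomy names as HTML.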
20 changes: 16 additions & 4 deletions lib/ProductOpener/APITaxonomySuggestions.pm
@@ -91,7 +91,10 @@ sub taxonomy_suggestions_api ($request_ref) {
};

# Options define how many suggestions should be returned, in which format etc.
my $options_ref = {limit => request_param($request_ref, 'limit')};
my $options_ref = {
limit => request_param($request_ref, 'limit'),
get_synonyms => request_param($request_ref, 'get_synonyms')
};

# Validate input parameters

@@ -122,9 +125,18 @@ sub taxonomy_suggestions_api ($request_ref) {
}
# Generate suggestions
else {

$response_ref->{suggestions}
= [get_taxonomy_suggestions($tagtype, $search_lc, $string, $context_ref, $options_ref)];
my $options_relavant = {%$options_ref};
delete $options_relavant->{get_synonyms};
my @suggestions
= get_taxonomy_suggestions_with_synonyms($tagtype, $search_lc, $string, $context_ref, $options_relavant);
$log->debug("taxonomy_suggestions_api", @suggestions) if $log->is_debug();
$response_ref->{suggestions} = [map {$_->{tag}} @suggestions];
if ($options_ref->{get_synonyms}) {
$response_ref->{matched_synonyms} = {};
foreach (@suggestions) {
$response_ref->{matched_synonyms}->{$_->{tag}} = ucfirst($_->{matched_synonym});
}
}
}

$log->debug("taxonomy_suggestions_api - stop", {request => $request_ref}) if $log->is_debug();
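For readers less familiar with Perl, here is a JavaScript paraphrase of how the response is assembled in the hunk above. It is only a sketch: `buildResponse` and `ucfirst` are names chosen for this example, and the explicit check on `"0"` approximates Perl truthiness, where the string "0" is false.

```javascript
// Uppercase the first character, like Perl's ucfirst.
function ucfirst(text) {
  return text.charAt(0).toUpperCase() + text.slice(1);
}

// results: array of { tag, matched_synonym } objects, as produced by the
// "_with_synonyms" functions in this commit.
// options.get_synonyms: raw query parameter value (string or undefined).
function buildResponse(results, options) {
  const response = { suggestions: results.map((result) => result.tag) };

  // Approximate Perl truthiness: undefined, "" and "0" are all false.
  const wantSynonyms =
    Boolean(options.get_synonyms) && options.get_synonyms !== "0";

  if (wantSynonyms) {
    response.matched_synonyms = {};
    for (const result of results) {
      response.matched_synonyms[result.tag] = ucfirst(result.matched_synonym);
    }
  }
  return response;
}

const results = [
  { tag: "Arachides", matched_synonym: "cacahouètes" },
  { tag: "Soja", matched_synonym: "soja" },
];

console.log(buildResponse(results, { get_synonyms: "1" }));
console.log(buildResponse(results, {}));
```

The `suggestions` array is always present, so clients that do not send `get_synonyms` see the same response shape as before.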
77 changes: 55 additions & 22 deletions lib/ProductOpener/TaxonomySuggestions.pm
@@ -36,6 +36,7 @@ use Log::Any qw($log);
BEGIN {
use vars qw(@ISA @EXPORT_OK %EXPORT_TAGS);
@EXPORT_OK = qw(
&get_taxonomy_suggestions_with_synonyms
&get_taxonomy_suggestions
); # symbols to export on request
%EXPORT_TAGS = (all => [@EXPORT_OK]);
@@ -80,9 +81,13 @@ sub load_categories_packagings_stats_for_suggestions() {
return $categories_packagings_stats_for_suggestions_ref;
}

=head2 get_taxonomy_suggestions_with_synonyms ($tagtype, $search_lc, $string, $context_ref, $options_ref )
Generate taxonomy suggestions with matched synonyms information.
=head2 get_taxonomy_suggestions ($tagtype, $search_lc, $string, $context_ref, $options_ref )
Generate taxonomy suggestions.
Generate taxonomy suggestions (without matched synonyms information).
=head3 Parameters
@@ -107,7 +112,7 @@ Restart memcached if you want fresh results (e.g. when taxonomy are category sta
=cut

sub get_taxonomy_suggestions ($tagtype, $search_lc, $string, $context_ref, $options_ref) {
sub get_taxonomy_suggestions_with_synonyms ($tagtype, $search_lc, $string, $context_ref, $options_ref) {

$log->debug(
"get_taxonomy_suggestions - start",
@@ -139,7 +144,8 @@ sub get_taxonomy_suggestions ($tagtype, $search_lc, $string, $context_ref, $opti

my @tags = generate_sorted_list_of_taxonomy_entries($tagtype, $search_lc, $context_ref);

my @filtered_tags = filter_suggestions_matching_string(\@tags, $tagtype, $search_lc, $string, $options_ref);
my @filtered_tags
= filter_suggestions_matching_string_with_synonyms(\@tags, $tagtype, $search_lc, $string, $options_ref);
$results_ref = \@filtered_tags;

$log->debug("storing suggestions in cache", {key => $key}) if $log->is_debug();
@@ -152,6 +158,12 @@ sub get_taxonomy_suggestions ($tagtype, $search_lc, $string, $context_ref, $opti
return @$results_ref;
}

sub get_taxonomy_suggestions ($tagtype, $search_lc, $string, $context_ref, $options_ref) {
return
map {$_->{tag}}
get_taxonomy_suggestions_with_synonyms($tagtype, $search_lc, $string, $context_ref, $options_ref);
}

=head2 generate_sorted_list_of_taxonomy_entries($tagtype, $search_lc, $context_ref)
Generate a sorted list of canonicalized taxonomy entries from which we will generate suggestions
@@ -312,28 +324,39 @@ sub match_stringids ($stringid, $fuzzystringid, $synonymid) {

# best_match finds the best matching synonym and reports how well it matches

sub best_match ($stringid, $fuzzystringid, $synonyms_ids_ref) {
sub best_match ($search_lc, $stringid, $fuzzystringid, $synonyms_ref) {

my $best_match = "none";
my $best_type = "none";
my $best_match = 0;

foreach my $synonymid (@$synonyms_ids_ref) {
foreach my $synonym (@$synonyms_ref) {
my $synonymid = get_string_id_for_lang($search_lc, $synonym);
my $match = match_stringids($stringid, $fuzzystringid, $synonymid);
# Prefer to use the earlier ones from the list for when the canonical name has the same match type as a synonym
next if $match eq "none" or $match eq $best_type;
if ($match eq "start") {
# Best match, we can return without looking at the other synonyms
return "start";
$best_type = $match;
$best_match = $synonym;
last;
}
elsif (($match eq "inside")
or (($match eq "fuzzy") and ($best_match eq "none")))
or (($match eq "fuzzy") and ($best_type eq "none")))
{
$best_match = $match;
$best_type = $match;
$best_match = $synonym;
}
}
return $best_match;
return {type => $best_type, match => $best_match};
}

=head2 filter_suggestions_matching_string_with_synonyms ($tags_ref, $tagtype, $search_lc, $string, $options_ref)
Filter a list of potential taxonomy suggestions matching a string with matched synonyms information.
=head2 filter_suggestions_matching_string ($tags_ref, $tagtype, $search_lc, $string, $options_ref)
Filter a list of potential taxonomy suggestions matching a string.
Filter a list of potential taxonomy suggestions matching a string (without matched synonyms information).
By priority, the function returns:
- taxonomy entries that match the input string at the beginning
@@ -357,7 +380,7 @@ By priority, the function returns:
=cut

sub filter_suggestions_matching_string ($tags_ref, $tagtype, $search_lc, $string, $options_ref) {
sub filter_suggestions_matching_string_with_synonyms ($tags_ref, $tagtype, $search_lc, $string, $options_ref) {

my $original_lc = $search_lc;

@@ -424,39 +447,43 @@ sub filter_suggestions_matching_string ($tags_ref, $tagtype, $search_lc, $string
my $tag_xx = display_taxonomy_tag("xx", $tagtype, $canon_tagid);

# Build a list of normalized synonyms in the search language and the wildcard xx: language
my @synonyms_ids = map {get_string_id_for_lang($search_lc, $_)} (
my @synonyms = (
@{deep_get(\%synonyms_for, $tagtype, $search_lc, get_string_id_for_lang($search_lc, $tag)) || []},
@{deep_get(\%synonyms_for, $tagtype, "xx", get_string_id_for_lang("xx", $tag_xx)) || []}
);

# check how well the synonyms match the input string
my $best_match = best_match($stringid, $fuzzystringid, \@synonyms_ids);
my $best_match = best_match($search_lc, $stringid, $fuzzystringid, \@synonyms);

$log->debug(
"synonyms_ids for canon_tagid",
"synonyms for canon_tagid",
{
tagtype => $tagtype,
canon_tagid => $canon_tagid,
tag => $tag,
synonym_ids => \@synonyms_ids,
synonyms => \@synonyms,
best_match => $best_match
}
) if $log->is_debug();

my $to_add = {
tag => $tag,
matched_synonym => $best_match->{match}
};
# matching at start, best matches
if ($best_match eq "start") {
push @suggestions, $tag;
if ($best_match->{type} eq "start") {
push @suggestions, $to_add;
# count matches at start so that we can return only if we have enough matches
$suggestions_count++;
last if $suggestions_count >= $limit;
}
# matching inside
elsif ($best_match eq "inside") {
push @suggestions_c, $tag;
elsif ($best_match->{type} eq "inside") {
push @suggestions_c, $to_add;
}
# fuzzy match
elsif ($best_match eq "fuzzy") {
push @suggestions_f, $tag;
elsif ($best_match->{type} eq "fuzzy") {
push @suggestions_f, $to_add;
}
}
}
@@ -475,4 +502,10 @@ sub filter_suggestions_matching_string ($tags_ref, $tagtype, $search_lc, $string
return @suggestions;
}

sub filter_suggestions_matching_string ($tags_ref, $tagtype, $search_lc, $string, $options_ref) {
return
map {$_->{tag}}
filter_suggestions_matching_string_with_synonyms($tags_ref, $tagtype, $search_lc, $string, $options_ref);
}

1;
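The selection logic of the updated `best_match` can be paraphrased in JavaScript as follows. This is a sketch under assumptions: the real `match_stringids` is not shown in this diff, so `simpleMatchType` is a simplified stand-in that only distinguishes "start", "inside" and "none" (no fuzzy matching), and the synonym lists are illustrative.

```javascript
// Simplified stand-in for match_stringids (assumption, not the real matcher).
function simpleMatchType(needle, candidate) {
  const haystack = candidate.toLowerCase();
  const lowered = needle.toLowerCase();
  if (haystack.startsWith(lowered)) return "start";
  if (haystack.includes(lowered)) return "inside";
  return "none";
}

// Mirrors best_match: returns both the match type and the synonym that matched.
// Earlier synonyms win when a later one has the same match type.
function bestMatch(needle, synonyms, matchType = simpleMatchType) {
  let bestType = "none";
  let match = null;
  for (const synonym of synonyms) {
    const type = matchType(needle, synonym);
    if (type === "none" || type === bestType) continue;
    if (type === "start") {
      // Best possible match: stop looking at the remaining synonyms.
      bestType = type;
      match = synonym;
      break;
    }
    if (type === "inside" || (type === "fuzzy" && bestType === "none")) {
      bestType = type;
      match = synonym;
    }
  }
  return { type: bestType, match };
}

console.log(bestMatch("o", ["Gluten", "Blé", "Orge", "Avoine"]));
console.log(bestMatch("o", ["Arachides", "Cacahouètes"]));
console.log(bestMatch("o", ["Soja", "Soya"]));
```

Returning the matched synonym alongside the match type is what lets the caller attach `matched_synonym` to each suggestion while still sorting suggestions into the start, inside and fuzzy groups.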
6 changes: 6 additions & 0 deletions tests/integration/api_v3_taxonomy_suggestions.t
@@ -77,6 +77,12 @@ my $tests_ref = [
path => '/api/v3/taxonomy_suggestions?tagtype=categories&string=Café&lc=fr',
expected_status_code => 200,
},
{
test_case => 'allergens-string-fr-o-get-synonyms',
method => 'GET',
path => '/api/v3/taxonomy_suggestions?tagtype=allergens&string=o&lc=fr&get_synonyms=1',
expected_status_code => 200,
},
# Packaging suggestions return most popular suggestions first
{
test_case => 'packaging-shapes',
@@ -0,0 +1,29 @@
{
"errors" : [],
"matched_synonyms" : {
"Arachides" : "Cacahouètes",
"Crustacés" : "Homard",
"Fruits à coque" : "Fruits à coque",
"Gluten" : "Orge",
"Lait" : "Lactose",
"Mollusques" : "Mollusques",
"Moutarde" : "Moutarde",
"Poisson" : "Poisson",
"Soja" : "Soja",
"Œufs" : "Œufs"
},
"status" : "success",
"suggestions" : [
"Gluten",
"Œufs",
"Arachides",
"Crustacés",
"Fruits à coque",
"Lait",
"Mollusques",
"Moutarde",
"Poisson",
"Soja"
],
"warnings" : []
}
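A consumer of this response might want to check the documented invariant that every suggestion has a `matched_synonyms` entry, and to see which suggestions were found only through a synonym. The sketch below uses hypothetical helper names and a subset of the expected response above.

```javascript
// Suggestions that lack a matched_synonyms entry (should be empty per the spec).
function missingSynonymEntries(response) {
  const matched = response.matched_synonyms || {};
  return response.suggestions.filter(
    (tag) => !Object.prototype.hasOwnProperty.call(matched, tag)
  );
}

// Suggestions whose matched synonym differs from the canonical name.
function matchedViaSynonym(response) {
  const matched = response.matched_synonyms || {};
  return response.suggestions.filter(
    (tag) => tag in matched && matched[tag] !== tag
  );
}

// Subset of the expected test response shown above.
const sampleResponse = {
  matched_synonyms: {
    Arachides: "Cacahouètes",
    Gluten: "Orge",
    Soja: "Soja",
    "Œufs": "Œufs",
  },
  suggestions: ["Gluten", "Œufs", "Arachides", "Soja"],
};

console.log(missingSynonymEntries(sampleResponse));
console.log(matchedViaSynonym(sampleResponse));
```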