New metadata fields for work entity #210
Good point! And actually, as a first step, I think it'd be helpful to track somewhere which fields we already cover vs. those that are new. As a naive approach, this lists all fields in the list output that are not columns of the data frame:

library(openalexR)
tbl <- oa_fetch(id = "W2755950973")
lst <- oa_fetch(id = "W2755950973", output = "list")
sort(names(lst)[!names(lst) %in% colnames(tbl)])
#> [1] "abstract_inverted_index" "apc_list"
#> [3] "apc_paid" "authorships"
#> [5] "best_oa_location" "biblio"
#> [7] "cited_by_percentile_year" "corresponding_author_ids"
#> [9] "corresponding_institution_ids" "countries_distinct_count"
#> [11] "created_date" "fulltext_origin"
#> [13] "has_fulltext" "indexed_in"
#> [15] "institutions_distinct_count" "keywords"
#> [17] "locations" "locations_count"
#> [19] "mesh" "ngrams_url"
#> [21] "open_access" "primary_location"
#> [23] "primary_topic" "referenced_works_count"
#> [25] "sustainable_development_goals" "title"
#> [27] "topics" "type_crossref"
#> [29] "updated_date"

This of course doesn't mean we're missing coverage for these fields; some of them have been renamed in the df (e.g., …).

So, as a preliminary step, maybe it's worth introducing something to internally track covered fields, like:

#' @keywords internal
covered_fields <- c("title", "authorships", ...)

Then we (or at least I) can get a clearer picture of what we're missing and have a programmatic way to track the introduction of new fields. I can take a stab at this, and then we can reconvene here to decide how to deal with the new fields?

For example, it immediately jumps out to me that apc_list and apc_paid share the same structure and could be combined into a single data frame.

Original:

lst$apc_list
#> $value
#> [1] 3680
#>
#> $currency
#> [1] "USD"
#>
#> $value_usd
#> [1] 3680
#>
#> $provenance
#> [1] "doaj"
lst$apc_paid
#> $value
#> [1] 3680
#>
#> $currency
#> [1] "USD"
#>
#> $value_usd
#> [1] 3680
#>
#> $provenance
#> [1] "doaj"

Formatted:

rbind.data.frame(
c(type = "list", lst$apc_list),
c(type = "paid", lst$apc_paid)
)
#> type value currency value_usd provenance
#> 1 list 3680 USD 3680 doaj
#> 2 paid 3680 USD 3680 doaj
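The same idea can be wrapped in a small helper that also copes with works where one of the two APC fields is missing. This is only a sketch: apc_to_df() and or_na() are hypothetical names (not part of openalexR), and the sample work below is mock data that copies the values printed above.

```r
# Hypothetical helper (not part of openalexR): replace a missing (NULL)
# sub-field with NA so every row has the same columns.
or_na <- function(x) if (is.null(x)) NA else x

# Hypothetical helper: combine the "apc_list" and "apc_paid" fields of one
# work (as returned with output = "list") into one data frame with a
# "type" column. Returns NULL if neither field is present.
apc_to_df <- function(work) {
  fields <- c(list = "apc_list", paid = "apc_paid")
  rows <- lapply(names(fields), function(type) {
    apc <- work[[fields[[type]]]]
    if (is.null(apc)) return(NULL)  # skip APC fields not reported for this work
    # Use [[ ]] (exact matching) so "value" never partially matches "value_usd"
    data.frame(
      type = type,
      value = or_na(apc[["value"]]),
      currency = or_na(apc[["currency"]]),
      value_usd = or_na(apc[["value_usd"]]),
      provenance = or_na(apc[["provenance"]]),
      stringsAsFactors = FALSE
    )
  })
  rows <- Filter(Negate(is.null), rows)
  if (length(rows) == 0) return(NULL)
  do.call(rbind, rows)
}

# Mock work mirroring the apc_list / apc_paid output shown above
work <- list(
  apc_list = list(value = 3680, currency = "USD", value_usd = 3680, provenance = "doaj"),
  apc_paid = list(value = 3680, currency = "USD", value_usd = 3680, provenance = "doaj")
)
apc_to_df(work)
```

Returning NULL when no APC information exists leaves it to the caller (e.g. whatever builds the data frame) to decide whether to store NULL or an empty data frame.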
I totally agree.
Coverage is now tracked in #211. TODO: we need to agree on which other fields to export in the data frame (we already have topics and apc).
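For reference, the covered-fields idea proposed above can be sketched in a few lines of base R. This is not necessarily how #211 implements it: the contents of covered_fields, the uncovered_fields() helper, and the mock work are all illustrative assumptions.

```r
# Hypothetical sketch: names of list-output fields that the data frame
# already covers (illustrative entries only, not the real coverage list).
#' @keywords internal
covered_fields <- c("id", "title", "authorships", "topics", "apc_list", "apc_paid")

# Hypothetical helper: fields present in a work (list output) that are
# not yet covered, sorted for stable, readable output.
uncovered_fields <- function(work, covered = covered_fields) {
  sort(setdiff(names(work), covered))
}

# Mock work standing in for oa_fetch(..., output = "list")
work <- list(
  id = "W2755950973", title = "Example title", authorships = list(),
  topics = list(), keywords = list(), mesh = list()
)
uncovered_fields(work)
#> [1] "keywords" "mesh"
```

Because the comparison is by list-output field name, a field that was renamed in the df only needs to appear in covered_fields under its original name.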
@trangdata
@yjunechoe
Recently, OpenAlex (OA) has added a lot of new metadata to the work entity.
In particular, the API now also reports information on keywords, topics, the grants that funded the research, APC paid, etc.
At the moment, the only way to access this information is to use the "list" format.
TO DO:
Modify the works2df() function so that the data frame also includes this new metadata. This way, the "tibble" and "data.frame" output formats will also include it.
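One possible shape for this change, sketched under assumptions: nested fields such as keywords could be stored as a list-column of small data frames, similar to the nested list-columns openalexR already uses for some fields. This is not the actual works2df() code; keywords_to_df() is a hypothetical helper, and the keyword structure (display_name + score) and the mock works are assumptions made for illustration.

```r
# Hypothetical helper (not actual works2df() code): convert one work's
# "keywords" field into a data frame. The display_name/score structure
# is an assumption about the API payload.
keywords_to_df <- function(keywords) {
  if (length(keywords) == 0) {
    # Keep an empty data frame with the same columns for works without keywords
    return(data.frame(display_name = character(0), score = numeric(0),
                      stringsAsFactors = FALSE))
  }
  data.frame(
    display_name = vapply(keywords, function(k) k[["display_name"]], character(1)),
    score = vapply(keywords, function(k) k[["score"]], numeric(1)),
    stringsAsFactors = FALSE
  )
}

# Mock list output for two works (IDs and values are made up)
works <- list(
  list(id = "W1", keywords = list(
    list(display_name = "Open access", score = 0.6),
    list(display_name = "Bibliometrics", score = 0.4)
  )),
  list(id = "W2", keywords = list())
)

# One row per work; keywords become a list-column of data frames
df <- data.frame(id = vapply(works, function(w) w[["id"]], character(1)),
                 stringsAsFactors = FALSE)
df$keywords <- lapply(works, function(w) keywords_to_df(w[["keywords"]]))
df$keywords[[1]]
```

Returning an empty data frame (rather than NULL) for works without keywords keeps the list-column uniform, which makes later unnesting simpler.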