Implement po_create()/po_update() for creating/updating translations #235
Changed it to add or update, depending on whether or not the |
I think this is better as |
There are a couple of gettext tools I would do well to integrate better with / provide good wrappers for...
R/msgmerge.R
Outdated
#' new messages are added; and translations for deleted message are marked
#' as deprecated and moved to the bottom of the file.
#'
#' @param lang Language identifiers. These are typically two letters (e.g.
This basically repeats the description in `?translate_package`... `@inheritParams` doesn't feel quite right because of how detailed we want to be... Maybe we need a `?language-code` .Rd file we can refer to? `@param languages Language identifiers. See [language-code] for details.`?
Conflicts: DESCRIPTION NAMESPACE
Merge commit '3849f477c94b95af34764c5c96b7c0fce284eedd'
And fundamentally change approach
Ok, reimplemented basically from scratch; I'll add a bunch of tests shortly.
Bringing in system2() code from #257
And fix the bugs thus revealed
tests/testthat/_snaps/po_create.md
Outdated
po_create("jp", verbose = TRUE)
Message <simpleMessage>
Updating 'jp' R translation
Running msgmerge on './po/R-jp.po' succeeded:
I copied this from #257, but it feels a bit wordy to me. What do you think about not showing the call by default?
Hmm, it's coming from `verbose=TRUE` (now the default). Are you suggesting we flip the default, or that we implement `verbose=0,1,2,...`?
I don't think it's necessary to routinely display a message for success. Adding levels of verbosity is unlikely to be worth the effort IMO, so I'd be in favour of `verbose = FALSE` by default and not passing in `verbose` from the wrapping function.
Oh that's not quite right; I think on success we want to see the stdout/stderr from the command line tool, but not the call.
I've tweaked the code to behave the way that I want just to give you a concrete proposal to look at. Let me know what you think.
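To make the proposal above concrete, here is a minimal sketch of a `system2()` wrapper that prints the tool's stdout/stderr on success but only mentions the call when something fails. The name `run_tool()` and its arguments are hypothetical and are not taken from the potools code in this PR.

```r
# Hypothetical helper (not the potools implementation): run a command line
# tool, show its output on success, and include the call only on failure.
run_tool <- function(command, args = character(), verbose = TRUE) {
  # With stdout/stderr = TRUE, system2() returns the captured output and
  # records a non-zero exit code in the "status" attribute (with a warning).
  out <- suppressWarnings(
    system2(command, args, stdout = TRUE, stderr = TRUE)
  )
  status <- attr(out, "status")

  if (!is.null(status) && status != 0) {
    stop(
      sprintf(
        "Running `%s %s` failed (status %d):\n%s",
        command, paste(args, collapse = " "), status,
        paste(out, collapse = "\n")
      ),
      call. = FALSE
    )
  }

  # On success, relay only what the tool itself printed.
  if (verbose && length(out) > 0) {
    message(paste(out, collapse = "\n"))
  }
  invisible(out)
}
```

With this shape, `verbose` only controls whether the tool's own output is relayed; the full call appears solely in the error message.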
R/write_po_file.R
Outdated
@@ -300,7 +300,7 @@ format.po_metadata = function(x, template = FALSE, use_plurals = FALSE, ...) {
 x$email = "EMAIL@ADDRESS"
 x$language = ''
 x$language_team = "LANGUAGE <LL@li.org>"
- x$charset = 'CHARSET'
+ x$charset = 'UTF-8'
I'm not sure why this was `CHARSET` before, but I was getting an error from `tools::checkPoFile()` that it was failing to iconv `CHARSET` to `UTF-8`.
Hmm, note that .pot is not mentioned in `?tools::checkPoFile`. `CHARSET` shows up in templates in base:
Here's what I'm seeing in the `xgettext` docs:
https://www.gnu.org/software/gettext/manual/gettext.html
MIME-Version, Content-Type, Content-Transfer-Encoding
These values are set according to the content of the POT file and the current locale. If the POT file contains charset=UTF-8, it means that the POT file contains non-ASCII characters, and we keep the UTF-8 encoding. Otherwise, when the POT file is plain ASCII, we use the locale’s encoding.
and
Content-Type
Replace ‘CHARSET’ with the character encoding used for your language, in your locale, or UTF-8. This field is needed for correct operation of the msgmerge and msgfmt programs, as well as for users whose locale’s character encoding differs from yours (see Charset conversion).
In the gettext sources, here is where `xgettext` initially populates the default `CHARSET`:
Here `write-po` treats `CHARSET` as ASCII:
This snippet makes me think setting UTF-8 is probably OK, since the default `CHARSET` carries the assumption, probably faulty in general, that the strings in the .pot file are all ASCII:
That said, current r-devel .pot files are all ASCII:
grep -rP "[\x80-\xFF]" ~/svn/R-devel/ --include=*.pot
# <empty>
CRAN would require this to hold true for all packages at the R level, and probably the C level?
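The same all-ASCII check can be sketched in base R, which may be handy where `grep -P` is unavailable. The helper names `has_non_ascii()` and `pot_files_with_non_ascii()` are hypothetical; they only inspect raw bytes and make no assumption about the declared charset.

```r
# Hypothetical helpers: report .pot files containing any byte above 0x7F.
has_non_ascii <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  any(as.integer(bytes) > 127L)
}

pot_files_with_non_ascii <- function(dir) {
  pots <- list.files(dir, pattern = "\\.pot$", recursive = TRUE,
                     full.names = TRUE)
  # Keep only the files where at least one non-ASCII byte was found.
  Filter(has_non_ascii, pots)
}
```

An empty result from `pot_files_with_non_ascii()` corresponds to the `# <empty>` output of the `grep` call above.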
We could probably patch `tools::checkPoFile()` to accept the `CHARSET` ➡️ `ASCII` logic, but I'm not sure that would solve all of the issues of using `checkPoFile()` on .pot files when it was designed for .po.
Maybe I'll back this change out here and we can discuss more thoroughly in a separate issue. I think potools should assume all `.pot` and `.po` files are UTF-8 and then fix any downstream problems that causes.
Hmmm, backing out this change doesn't break any of my tests (or interactive usage) so definitely doesn't need to be here.
R/po_create.R
Outdated
#'
#' `po_create()` creates a new `po/{languages}.po` containing the messages to be
#' translated. If a translation already exists, it'll be updated with any
#' changes to the `.pot` since it was last touched.
Can we clarify what happens to existing translations here too? Details may be overkill, but mentioning fuzzying seems appropriate.
I'll put most of the details in `po_update()` and then link to there from here.
I think the updating here is mostly incidental, just because if you use `msginit` without checking, it will reset the existing translations, which I don't think you ever want. But at the same time, the main point of this function is not the updating, so I don't want to draw too much attention to it.
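The create-or-update decision described here can be sketched as follows. This is an illustration only: `po_command()` is a hypothetical helper, not the function in this PR, and it merely builds the command rather than running it. The `msgmerge --update` and `msginit --input/--output-file/--locale/--no-translator` options are standard gettext flags.

```r
# Hypothetical sketch: choose msgmerge when the .po file already exists so
# that existing translations are kept, and msginit only for a new file.
po_command <- function(po_path, pot_path, lang) {
  if (file.exists(po_path)) {
    list(
      command = "msgmerge",
      args = c("--update", shQuote(po_path), shQuote(pot_path))
    )
  } else {
    list(
      command = "msginit",
      args = c(
        paste0("--input=", shQuote(pot_path)),
        paste0("--output-file=", shQuote(po_path)),
        paste0("--locale=", lang),
        "--no-translator"
      )
    )
  }
}
```

The returned `command` and `args` could then be passed to `system2()`; checking `file.exists()` first is what prevents `msginit` from overwriting an existing translation.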
R/utils.R
Outdated
@@ -167,3 +167,19 @@ is_outdated <- function(src, dst) {
}

is_testing = function() identical(Sys.getenv("TESTTHAT"), "true")

local_test_package <- function(..., .envir = parent.frame()) {
love the look of this, and looks like a great redesign of generating test packages (the current suite is pretty unwieldy/arbitrarily organized). a good companion would be `local_translation_conn()` to keep translations right next to where they'll be used too.
but feels like it should be in `tests/testthat/helper.R`, no?
We've generally been moving away from using helper files, because it seems more natural just to keep all code in one place. Additionally, as these `local_` type helpers grow more complex, it's not crazy that you might also want to test them.
That said, I don't feel particularly strongly about it, and since you already have `helper.R`, that's a more sensible home, so I'll move it there.
Conflicts: NAMESPACE R/po_compile.R R/po_extract.R man/po_compile.Rd man/po_extract.Rd
}

}
po_prefix <- function(type = c("R", "src")) {
the one place this will break is for `base`, which uses `R-base.pot` for R and `R.pot` for src. I'll just add a TODO for now to make sure that case is handled later.
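For illustration, the `base` special case mentioned above could be handled along these lines. `pot_file_name()` is a hypothetical helper, not the `po_prefix()` under review (whose body is not shown here); it assumes the usual `R-{package}.pot` / `{package}.pot` naming.

```r
# Hypothetical sketch of template naming, including the base exception:
# base uses R-base.pot for R-level messages and R.pot for src-level ones.
pot_file_name <- function(package, type = c("R", "src")) {
  type <- match.arg(type)
  if (type == "R") {
    paste0("R-", package, ".pot")
  } else if (package == "base") {
    "R.pot"
  } else {
    paste0(package, ".pot")
  }
}
```

For example, `pot_file_name("potools", "src")` gives `"potools.pot"`, while `pot_file_name("base", "src")` gives `"R.pot"`.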
#' @export
po_create <- function(languages, dir = ".", verbose = !is_testing()) {
  package <- get_desc_data(dir, "Package")
  po_files <- po_language_files(languages, dir)
it might be nice to `warn()` here if e.g. R-pkg.pot is present but pkg.pot is not & the package has a `src/` directory.
OTOH, if the translation is handled entirely by `potools`, this should only happen if `src/` doesn't have any messages to translate, right? So maybe it will be a warning highly prone to false positives...
skipping for now, but file a follow-up issue if you think the warning is worthwhile.
Looking great, thanks!
I don't love the name `tr_add()`, and it's making me doubt that the developer/translator distinction is that useful, because this function might be run by the developer or an R-packages savvy translator.
I tweaked the `.Rproj` to make my usual roxygen2 workflow a little smoother; let me know if you want to revert.
I'm a little surprised that potools doesn't already use `msginit`, so maybe I should instead copy the approach used in `translate_package()`?