Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bind_textdomain_codeset() return value updates #4311

Open
orlitzky opened this issue Dec 15, 2024 · 9 comments
Open

bind_textdomain_codeset() return value updates #4311

orlitzky opened this issue Dec 15, 2024 · 9 comments
Labels
bug Documentation contains incorrect information

Comments

@orlitzky
Copy link

bind_textdomain_codeset() is documented to return a string on success, and false on failure. But there is one success case that also returns false. If you query the codeset for a domain that has not been explicitly set (yet), you will also get false:

php > var_dump(bind_textdomain_codeset("foo", NULL));
bool(false)

This is because the C function returns NULL to indicate that the locale's codeset will be used. Quoting from https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/functions/bind_textdomain_codeset.html

If codeset is a null pointer and domainname is a non-empty string, bind_textdomain_codeset() shall return the current codeset for the named domain, or a null pointer if a codeset has not yet been set.

I bring this up because musl only supports UTF-8, and always returns NULL from its bind_textdomain_codeset(). Its PHP counterpart therefore always appears to fail, when it is working as... well, not quite expected, but as documented.

To summarize:

  • false will also be returned if no codeset has been explicitly set for the domain
  • this function always returns false under musl libc where the only supported codeset is UTF-8
@cmb69
Copy link
Member

cmb69 commented Dec 15, 2024

POSIX is pretty clear here:

If domainname is a null pointer, textdomain() shall return a pointer to the string containing the current text domain.

and

If domainname is a null pointer or an empty string, bindtextdomain() shall make no changes and return a null pointer without changing errno.

If musl (or any other implementation) return NULL here, they are not conforming to the specification. And that errno is unchanged, is to be expected. What is PHP supposed to do? Just say "yeah, the call was successful" would be wrong.

@orlitzky
Copy link
Author

I'm only talking about bind_textdomain_codeset. There's a separate paragraph about its return value:

If codeset is a null pointer and domainname is a non-empty string, bind_textdomain_codeset() shall return the current codeset for the named domain, or a null pointer if a codeset has not yet been set.

@cmb69
Copy link
Member

cmb69 commented Dec 15, 2024

Okay, but than musl could (and likely should) return UTF-8.

@orlitzky
Copy link
Author

Okay, but than musl could (and likely should) return UTF-8.

This was my first thought, too, but if you want to entertain some wild speculation, I think the reasoning is:

  • Consistently returning "UTF-8" would do something equivalent to the right thing, but not quite the right thing itself, in the case where the codeset has not been set yet.
  • If you are handling the codeset-not-set-yet NULL correctly, the current behavior of always returning NULL is also "equivalent to correct"
  • Keeping track of whether or not the codeset has been set yet in some global state would be extra headache when, in either case, we are doing something equivalent to correct
  • NULL is simpler than "UTF-8"

So, we wind up with a consistent NULL return.

I think this would be a lot more sensible after php/php-src#17163, but I filed them separately because changing the implementation is a lot harder to do than updating the docs to reflect reality. FWIW you can get a "failure" from musl, but only via errno. This is the entire implementation:

char *bind_textdomain_codeset(const char *domainname, const char *codeset)
{
        if (codeset && strcasecmp(codeset, "UTF-8"))
                errno = EINVAL;
        return NULL;
}

@cmb69
Copy link
Member

cmb69 commented Dec 15, 2024

bind_textdomain_codeset() is documented to return a string on success, and false on failure

Well, for me it just says "A string on success." That needs to fixed anyway. And yeah, should explicitly document what is returned if the codeset has not been set yet (currently, false).

@cmb69 cmb69 added the bug Documentation contains incorrect information label Dec 15, 2024
orlitzky added a commit to orlitzky/php-src that referenced this issue Dec 15, 2024
…musl

The musl implementation of bind_textdomain_codeset() always returns
NULL. The POSIX-correctness of this is debatable, but it is roughly
equivalent to correct, because musl only support UTF-8, so the NULL
value indicating that the codeset is unchanged from the locale's
codeset (UTF-8) is accurate.

PHP's bind_textdomain_codeset() function however treats NULL as
failure, unconditionally:

  * php/doc-en#4311
  * php#17163

This unfortunately causes false to be returned consistently on musl --
even when nothing unexpected has happened -- and naturally this is
affecting several tests. For now we change two tests to accept "false"
in addition to "UTF-8" so that they may pass on musl. If PHP's
bind_textdomain_codeset() is updated to differentiate between NULL and
NULL-with-errno-set, these tests can also be updated once again to
reject the NULL-with-errno result.

This partially addresses GH php#13696
@arnaud-lb
Copy link
Member

arnaud-lb commented Dec 17, 2024

I'm confused by the spec, as it seems to contradict itself:

 * If domainname is a null pointer or an empty string, bind_textdomain_codeset() shall make no changes and return a null pointer without changing errno.
 * Otherwise, if codeset is a null pointer:
   * If domainname is not bound, the function shall return the implementation-defined default codeset used by the gettext family of functions

But also:

If codeset is a null pointer and domainname is a non-empty string, bind_textdomain_codeset() shall return the current codeset for the named domain, or a null pointer if a codeset has not yet been set.

The terms "If domainname is not bound" and "if a codeset has not yet been set" appear to be equivalent, based on the rest of the spec.

If we consider the first quote, bind_textdomain_codeset(domainname, NULL) is supposed to return the default codeset when no codeset was set before.

If we consider the second quote, it's supposed to return NULL in the same case.

The RETURN VALUE section agrees with the second quote:

A call to the bind_textdomain_codeset() function with a non-empty domainname argument shall return one of the following:

    [...]

    A null pointer without changing errno if no codeset has yet been bound for that text domain

The GNU gettext manpage agrees with the second quote as well:

If no codeset has been set for domain domainname, it returns NULL.

So the second quote is probably the right one.

Given that setting the codeset always fails on Musl (kind of, as it doesn't set errno), then Musl is actually correct to always return NULL from bind_textdomain_codeset(domainname, NULL).

So it would be enough to just document that false will also be returned if no codeset has been explicitly set for the domain.

@orlitzky
Copy link
Author

I'm confused by the spec, as it seems to contradict itself:

Indeed. This is new in POSIX 2024, and all of the docs I can find for the various implementations (solaris, gnu, musl, freebsd) have it returning NULL, so I wonder where the "implementation defined codeset" part came from.

@orlitzky
Copy link
Author

I got nothing but a headache trying to sign up for the POSIX bug tracker / mailing list, but I did find the email address of a technical editor and sent him a note.

@orlitzky
Copy link
Author

Success: https://austingroupbugs.net/view.php?id=1894

orlitzky added a commit to orlitzky/php-src that referenced this issue Dec 18, 2024
Musl has two quirks that are leading to failed internationalization
tests. First is that the return value of bindtextdomain(..., NULL)
will always be false, rather than an "implementation-defined default
directory," because musl does not have an implementation-defined
default directory. One test needs a special case for this.

Second is that the musl implementation of bind_textdomain_codeset()
always returns NULL. The POSIX-correctness of this is debatable, but
it is roughly equivalent to correct, because musl only support UTF-8,
so the NULL value indicating that the codeset is unchanged from the
locale's codeset (UTF-8) is accurate.

PHP's bind_textdomain_codeset() function however treats NULL as
failure, unconditionally:

  * php/doc-en#4311
  * php#17163

This unfortunately causes false to be returned consistently on musl --
even when nothing unexpected has happened -- and naturally this is
affecting several tests. For now we change two tests to accept "false"
in addition to "UTF-8" so that they may pass on musl. If PHP's
bind_textdomain_codeset() is updated to differentiate between NULL and
NULL-with-errno-set, these tests can also be updated once again to
reject the NULL-with-errno result.

This partially addresses GH php#13696
orlitzky added a commit to orlitzky/php-src that referenced this issue Dec 19, 2024
Musl has two quirks that are leading to failed internationalization
tests. First is that the return value of bindtextdomain(..., NULL)
will always be false, rather than an "implementation-defined default
directory," because musl does not have an implementation-defined
default directory. One test needs a special case for this.

Second is that the musl implementation of bind_textdomain_codeset()
always returns NULL. The POSIX-correctness of this is debatable, but
it is roughly equivalent to correct, because musl only support UTF-8,
so the NULL value indicating that the codeset is unchanged from the
locale's codeset (UTF-8) is accurate.

PHP's bind_textdomain_codeset() function however treats NULL as
failure, unconditionally:

  * php/doc-en#4311
  * php#17163

This unfortunately causes false to be returned consistently on musl --
even when nothing unexpected has happened -- and naturally this is
affecting several tests. For now we change two tests to accept "false"
in addition to "UTF-8" so that they may pass on musl. If PHP's
bind_textdomain_codeset() is updated to differentiate between NULL and
NULL-with-errno-set, these tests can also be updated once again to
reject the NULL-with-errno result.

This partially addresses GH php#13696
arnaud-lb pushed a commit to php/php-src that referenced this issue Dec 19, 2024
Musl has two quirks that are leading to failed internationalization
tests. First is that the return value of bindtextdomain(..., NULL)
will always be false, rather than an "implementation-defined default
directory," because musl does not have an implementation-defined
default directory. One test needs a special case for this.

Second is that the musl implementation of bind_textdomain_codeset()
always returns NULL. The POSIX-correctness of this is debatable, but
it is roughly equivalent to correct, because musl only support UTF-8,
so the NULL value indicating that the codeset is unchanged from the
locale's codeset (UTF-8) is accurate.

PHP's bind_textdomain_codeset() function however treats NULL as
failure, unconditionally:

  * php/doc-en#4311
  * #17163

This unfortunately causes false to be returned consistently on musl --
even when nothing unexpected has happened -- and naturally this is
affecting several tests. For now we change two tests to accept "false"
in addition to "UTF-8" so that they may pass on musl. If PHP's
bind_textdomain_codeset() is updated to differentiate between NULL and
NULL-with-errno-set, these tests can also be updated once again to
reject the NULL-with-errno result.

This partially addresses GH #13696
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Documentation contains incorrect information
Projects
None yet
Development

No branches or pull requests

3 participants