-
Notifications
You must be signed in to change notification settings - Fork 608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
deps: Add build recipe for PNG #4423
Conversation
Looks like we're still failing CI here |
indeed... it's the strangest thing. Still trying to get to the bottom of this. I think what's happening is, for systems that have a "too old" version of dynamic PNG libraries installed, now that there's a build recipe available, even if we build newer static PNG libs, the build system is still preferring to link the older shared libraries (even though they're too old). I dunno if this is a clue or a red herring, but it seems the build system might be getting confused about which version of PNG it's found versus the version it's trying to build (at least nominally):
so... in the second line in the above log snippet, it claims to be "Building package PNG 1.5.13 locally", even though it's building 1.6.44. I'm not sure what, if anything, that has to do with the more serious problems:
A lesser problem, but one worth figuring out -- currently, I'm forcing the recipe to only build static libraries, cuz Windows has trouble figuring out what to do with / how to install the dynamic PNG library it just built. (I was having similar trouble on macOS until I set Ideally, we'd only link the static libraries, since that seems to behave properly across platforms, given the opportunity... |
I'm kind of baffled by the CI build failures... I can't seem to pinpoint anything in the logs or the configs that would explain why or how the failures occur on some runs, but not others... The problematic runs are:
And they all fail the same way:
But... considering the rest of the checks pass, it doesn't seem to be an issue with the compiler or compiler version, the VFX202X build image, the version of Python, the version of EXR, the version of OCIO, or whether sse instructions are used. And I can't seem to ascertain a particular constellation of settings that produces a failure. The only thing I could identify that looked suspicious to me is the ordering of paths in the The logs aren't verbose enough for successful builds for me to tell if this is the difference that explains the failures of unsuccessful runs. |
8dd92ae
to
0c95b1d
Compare
Okay, this doesn't seem to be working properly -- CMake is having trouble finding the correct PNG library, and my understanding is, the issue stems from PNG's own build system idiosyncracies. The failures above occur on systems with libpng v1.5 installed, which are initially detected as too-old; but are later "refound" after find_package fails to find the libraries under $PNG_ROOT. This seems to be a problem with either kitware's FindPNG.cmake, PNG's CMake config, or both. This said, it looks like the libpng team has been very actively working on improving and modernizing their CMake-based build system recently, seemingly in anticipation of an imminent libpng 1.8 release. Importantly, two weeks ago, libpng v1.6.44 made some improvements and changes to the CMake build system (see pnggroup/libpng#545 among others) -- which is great -- but the difference between 1.6.43 and 1.6.44 seems to have serious implications for how find_package behaves (or fails to behave) and where targets can be found -- namely, we need find_package to consider the CONFIG introduced in 1.6.44, and likewise set PNG_DIR. However, sometimes, the find_package still fails to (re)find the correct libpng library. Locally, I can produce conditions with the VFX2021 image where find_package is able to find the proper self-built libraries... but only with incremental builds that reuse a build directory, and only after a first build attempt that manages to self-build PNG, but still proceeds to link the too-old system dynamic PNG, as described above. We do have other means for finding PNG -- for one, PkgConfig / png_check_module seems to find our PNG without difficulty -- but I understand this might be problematic on Windows. Alternatively, we can use the output of the libpng-config cli tool for setting PNG_LIBRARIES and PNG_INCLUDE_DIRS (and PNG_FOUND)... but setting those values doesn't seem to sufficiently inform the rest of the build process. And I don't really know enough about CMake or building and linking stuff in general to know what exactly to do next. I feel the best solution involves coercing CMake's own Find mechanism to do the right thing. I'm not sure if that means creating a custom FindPNG.cmake module or patching PNG's CMake config, or what; but I suspect the path of least resistance is contingent on the build system improvements the libpng team is currently working on. This PR was initially motivated by Wheels workflow tasks failing on vanilla macos-14 runners; and while this build_PNG.cmake recipe does seem to help with building the macos-arm64 wheels, it obviously introduces problems under other conditions. So, for the time being, we're bypassing the PNG plugin entirely for the Wheels / Build MacOS ARM64 tasks until we can figure out a more robust solution for PNG. |
@zachlewis Any update on this one? |
I'm afraid not. I'll have another look, but I think I might be stuck... The failures occur when the build system detects a too-old version of libpng:
I'm trying to find a way to force the build system to always find and link "our" PNG; but Also, libpng 1.6.43 and 1.6.44 behave very differently. |
@zachlewis, I run into similar problem during 'Refind' step - the too old version of OpenEXR has been used even thought the locally we build another version according to the current minimal requirements. What I've found in this case is to explicitly instruct |
If that change doesn't break anything, I'm fine with it. What do you think about adding
which would, if I understand correctly, make it look ONLY in the place where we locally built the package, and none of the other usual locations. I believe that might make it impossible to accidentally find the wrong one installed elsewhere in the system. |
To clarify, I mean those are potential additional arguments to the find_package that is called in the re-find step. |
I run a quick test replacing
Sadly, this gives me the same result as the original issue - where 'Refind' detects existing system version of library - as if
So far only this version gives expected result and builds. Note I removed |
I was able to reproduce the problem with a system version of PNG 1.5 installed - which forces to build 1.6 version locally. Although I didn't encounter issue of not detecting local build of PNG. For consistency though, I moved the 'refinding' step into To address the missing symbols in simd_test, it appears that OpenImageIO does not apparently link with PNG if static version of the library is built locally. I'm not sure why this is not necessary in cases when the system version is used. |
Hey @kaarrot, thanks for having a look! Super tricky, right?! I've been doing a ton of experiments... hard to remember what I've tried and not tried, but I've definitely run into the same trouble as you with Robinmap after setting NO_DEFAULT_PATH in the (re)find_package step. I know I've messed with including ${pkgname}_REFIND_VERSION and REQUIRED in the (re)find_package step as well, to no avail. I have found something that does seem to work, but I'm not sure if it's the direction we want to go, nor do I understand why this method works where others haven't... In any case, I'll push another commit to see what fails. With a little help from ChatGPT, I put together a |
hmmm... getting closer...! Not sure why vfx2022-gcc9 and the vfx2021-hobbled tests are still failing, though... |
Actually, looks like those successes were false positives. re-finding with ${pkgname}_REFIND_VERSION REQUIRED reveals elicits proper failures (in that find_package is still finding the system libpng). I think this works for me locally cuz I'm using a newer version of CMake. edit: nope, not the version of CMake either -- I tried testing "CI / VFX2021 hobbled" with CMake 3.24.4 + FetchContent_Declare's OVERRIDE_FIND_PACKAGE option, and... still no dice. I'm not sure why I'm having success locally with my build_dependency_with_fetchcontent macro, but since it doesn't seem to help things here, I'll remove it shortly. |
Signed-off-by: Zach Lewis <[email protected]>
Wow. It looks like I mangled the commit history pretty good here, huh. Instead of trying to reconcile with main and clean stuff up, I've started a fresh branch here: Right now, there's a single added build_PNG.cmake file, which is at parity with the one here (so, same 8 failing checks). If / when I have something worth sharing, I'll close this draft PR and open another. I'm not sure what the next step here would be. I guess we could try a custom FindPNG module? |
If you rebase on top of main and squash the whole thing down to one commit, do you get that nice compact addition of the build_PNG.cmake only? |
Thank you , that worked perfectly! |
Still weird here. You should be able to
|
I had no idea I could do that. Should be less weird now. |
I've been fooling around with this myself. No matter what I do, it's explicitly including the system libpng.so. I'm starting to wonder if some other dependency is pulling it in under our noses. |
(I don't seem to be able to approve the changes in github, but I most certainly do approve!) |
This is very interesting. FWIW, Freetype does seem to require (or at least use if available) PNG, so I wonder if PNG's placement below Freetype here is specifically what's causing problems for us; and it makes me wonder if the build system picking up a dynamic Freetype lib that itself links a dynamic libpng.so is somehow fighting our ability to find our libpng. Man, if we can force the build system to build static deps the whole way down for all plugins... :chefs_kiss: |
I may have spoken too soon... I'm having trouble building the MacOS ARM64-only Python wheels, wherein we're setting -DCMAKE_OSX_ARCHITECTURES="arm64"
It seems libpng has had a little trouble with MacOS ARM64 CMake-based builds (pnggroup/libpng#372) It's suggested in the thread that |
I wonder if the problems I'm experiencing somehow reflect an issue with Freetype, and not PNG. I can verify that arm64 libpng libraries are built successfully...
The problem goes away if I disable support for PNG-compressed OpenType embedded bitmaps in Freetype by passing the following CMake argument to Freetype's build_dependency_with_cmake command: I dunno what a PNG-compressed OpenType embedded bitmap is, or how common they are in the wild... but in my very brief testing, it does seem that oiiotool is able to read and write PNGs and print DroidSans text without issue... |
I feel like you're giving me too much credit! This was far from a skilled application of actual knowledge. Basically, I just tinkered with changing anything and everything I could think of that might be related, all weekend long over the course of 34 CI runs, until finally getting a combination that picked up the right libraries and passed the tests for each of our test matrix cases. It was very frustrating and I felt like I was flying blind most of the time. |
I think your assessment is correct -- Freetype is pulling in its own libpng and the exact order that the linker tries to resolve the symbols is very fragile and can break one or the other. libpng gives us a build-time option to give a prefix to all the symbols (which is one of the changes I made), which prevents the wrong symbols from being linked, but they don't have a build option to change the name of the library, which really would have been handy and solved the whole thing, because then the linker wouldn't pick just one -- it would look like two differently named libraries with totally differently named symbols. I'm almost tempted to propose a PR against libpng to add this option, but it's a pretty big pain for me to get legal clearance to contribute to a project. It's a one-time bureaucratic cost, so well worth it for a project we have a continued need to contribute to, but it's a high threshold for something where I want to make a 3 line change and then probably never send anything to the project again. OK, so I think you're on the right track with the idea that for the python wheels, you can probably build freetype locally without the png option for now and see if anybody ever complains. Let's try that as an additional safeguard. I will also note for the record that for quite some time, I've had it on my list to see if we should jettison libpng entirely in favor of "spng" which is just one .h and one .c file so can easily be vendored, and is supposed to be higher performance read/write as well. I was about one day of frustration away from just doing this and cutting libpng out of our lives. Maybe it's something we should still aim for in the relatively near future to eliminate all future libpng woes. |
Freetype seems to be having trouble linking PNG on MacOS when CMAKE_OSX_ARCHITECTURES = arm64. Signed-off-by: Zach Lewis <[email protected]>
oh my god i broke everything |
Never mind, I don't think this one's my fault... Between now and when things last worked, it looks like our CI / Ubuntu checks are now pulling a newer version of libheif (now 1.19, previously 1.17); and they seem to be running an older version of the OS (now "focal", previously "noble"). It seems unrelated to the changes I've just made (to disable PNG support in Freetype), but I'll happily revert them if needed... but I'm guessing I might see these failures in other PRs now.
That seems like a really good idea. If I have some time and if I can figure out what needs to be done, I'll see if I can submit a PR to the libpng folks.
I'd come across "spng" in my desperation too not long ago, and was gonna ask about it on Slack if we'd completely run out of momentum here... I guess it's really a matter of how much more pain you're willing to put up with if this PR doesn't solve everything foreverish. |
Let's just say that for the purpose of this PR, any test matrix entry that only fails the "heif" test is unrelated and won't prevent us from merging. We will investigate that separately.
I think if this PR is working for now, we should merge it as-is. I think that the most important application of this PR is to help put the final touches on the Python wheel generation so that we are sure it supports PNG, a commonly encountered format (despite being awful). Correct me if I'm wrong, but since WE (via our GHA workflow) are the ones generating the wheel, don't we have the option to make that workflow uninstall any existing problematic/conflicting dependencies from the VM image before we start our own builds? So we should be able to build conflict-free static libraries we need for the python wheel. Sure, somebody building on their own system that contains conflicting dependency versions can still have trouble in some cases, but that's no worse than where we are today, and we can return to the problem to make the build system even more robust after 3.0 release including allowing PyPI installs. |
Let's stick a fork in this sucker. LGTM.
Yes, you're absolutely correct. I was in fact uninstalling homebrew'd installed packages in the Wheels workflow at one point specifically to get rid of Freetype + friends; but upon review, we felt a more robust way to go would be to implement a Anyway -- yes, we can do whatever we want. Except install Docker on the MacOS images, apparently. |
Certainly, if there's a reliable way to ignore the paths, use that rather than uninstalling (which would take more time). |
OK, I will merge this, then. It's no big deal to have further changes to the PNG build if need be. But give me a couple hours to chase down the heif thing, if I can get that solved I will feel better. |
I merged in the unrelated CI fix, and that does appear to have gotten this all unstuck. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion diffs
Co-authored-by: Larry Gritz <[email protected]> Signed-off-by: zachlewis <[email protected]>
Co-authored-by: Larry Gritz <[email protected]> Signed-off-by: zachlewis <[email protected]>
I believe this is ready to go. The one test failure seems unrelated, let's look into it separately. You ready for me to merge this, @zachlewis ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Merge away! Thanks again. |
As requested in AcademySoftwareFoundation#4387. --------- Signed-off-by: Zach Lewis <[email protected]> Signed-off-by: Larry Gritz <[email protected]> Signed-off-by: zachlewis <[email protected]> Co-authored-by: Larry Gritz <[email protected]>
As requested in AcademySoftwareFoundation#4387. --------- Signed-off-by: Zach Lewis <[email protected]> Signed-off-by: Larry Gritz <[email protected]> Signed-off-by: zachlewis <[email protected]> Co-authored-by: Larry Gritz <[email protected]>
As requested in AcademySoftwareFoundation#4387. --------- Signed-off-by: Zach Lewis <[email protected]> Signed-off-by: Larry Gritz <[email protected]> Signed-off-by: zachlewis <[email protected]> Co-authored-by: Larry Gritz <[email protected]>
As requested in AcademySoftwareFoundation#4387. --------- Signed-off-by: Zach Lewis <[email protected]> Signed-off-by: Larry Gritz <[email protected]> Signed-off-by: zachlewis <[email protected]> Co-authored-by: Larry Gritz <[email protected]>
As requested in #4387.