Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul indexer #2207

Open
wants to merge 13 commits into
base: master
Choose a base branch
from
Open

Conversation

Omikhleia
Copy link
Member

@Omikhleia Omikhleia commented Jan 2, 2025

Closes #1339

  • Automatically register the indexer on endpage class hook = using the 0.14.x way so any document class can use it (vs. the 0.12.x way where classes had to explicitly call a method)
  • Use language-dependent sorting (collation) in indexer = Sorting based on ICU for the current language
  • Support page ranges in indexer = either none (all pages are listed without page range collapsing) or expanded (consecutive pages are collapsed in full, e.g. 301-305)
  • Support links to pages in the indexer = hyperlinks if the pdf package is previously loaded
  • Add indexer options minimal/minimal-two for page range collapsing = consecutive pages are collapsed in minimal ranges (e.g. 301-5) or minimal ranges with at least two digits (e.g. 301-05). N.B. the "minimal", "minimal-two" and "expanded" names are loosely derived from those defined in CSL.1
  • Add option for filler between item and page in indexer = dotfill (leaders) (default), fill (space as before), comma (content is not line-flushed). These are the most usual ways of presenting an index, i.e.
    • item . . . . . 50, 60-63
    • item 50, 60-63
    • or item, 50, 60-63
  • Add new styling hooks to the indexer = introduces hooks on entry and pages, (BREAK) removes hard-coded verbatim styling (@davidchisnall see discussion below)
  • Allow indexes to be used anywhere in a document = As for table of contents, use an intermediary file, so the index(es)
    may be used (at the cost of a SILE re-run) anywhere, and not only on their own page at the end of a document... (This does fix several items from Enhancing the indexer package #1339, notably lost index entries if there's no page break before the \printindex and index entries are on that page2)
  • refactor(packages): Change index:item into an indexer package method = It's no longer callable from SIL/XML, so a package method is certainly better than a command.
  • feat(typesetters): Add content to text utility to the base typesetter = at last3
  • refactor(packages): Use typesetter content to text in indexer and tableofcontents
  • docs(packages): Improve indexer package documentation = As it's name says :)

Footnotes

  1. For some consistency, although indexes are not related to CSL. Note that some language may prefer a certain format (e.g. French typography rules usually recommend full expanded ranges, while English rules may prefer a minimal variant), but the package leaves that decision to the user.

  2. It's also more general towards supporting indexes that are not at the end of document on their own pages. I do have cases for indexes early in the document (as for table of contents, one may want them this way, in some books...). Reading the index from an intermediary files also allows for other packages to use it (e.g. one could implement a fancyindex as as did for the ToC in fancytoc.sile, and do their own rendering based on another logic while reusing most of the "common" code)

  3. As noted in the commit log, there are good reasons to do that (I always was convinced of it, at least! Doh)... @davidchisnall Just for the notice, I'll need to backport that to resilient's (actually silex's) modified typesetter, if that proposed change is accepted.

@Omikhleia Omikhleia requested a review from alerque as a code owner January 2, 2025 18:11
@Omikhleia Omikhleia marked this pull request as draft January 2, 2025 18:12
@Omikhleia Omikhleia added bug Software bug issue enhancement Software improvement or feature request modules:packages Issue relates to core or 3rd party packages labels Jan 5, 2025
@davidchisnall
Copy link

I'm not sure if it's a side effect of something else I've done, but this appears to print the index in a monospaced font.

@Omikhleia
Copy link
Member Author

Omikhleia commented Jan 6, 2025

@davidchisnall

I'm not sure if it's a side effect of something else I've done, but this appears to print the index in a monospaced font.

It always did^^
But I'll remove that formatting, as part of my work here -- there's no reason for monospace to be used.

As for table of contents, use an intermediary file, so the index(es)
may be used (at the cost of a SILE re-run) anywhere, and not only
on their own page at the end of a document...
@davidchisnall
Copy link

I've copied the file into my project and am using it there until it's part of a release (yay for modularity!). It seems to be working nicely. Ideally I'd want to be able to split index entries into letter headings and make it use a two-column layout, but I think the latter is independent of this package (though I'm struggling to make it work) and the latter is not vital (and maybe can be done in the style hook?).

Thanks!

@Omikhleia
Copy link
Member Author

Omikhleia commented Jan 8, 2025

It seems to be working nicely.

Thanks, glad it's helpful.
I still have a few commits to propose, but I am close to what I intended here (the rest is mostly some clean-up refactor, and documentation).

Ideally I'd want to be able to split index entries into letter heading

I actually wanted the same, but problem is that we don't have enough ICU support for that. Checking for differences on the first character to insert a "letter milestone" is not sufficient: E.g. In French "Éric" and "Ethan" should possibly both be under E, not under E and É). It might be doable with ICU (diacritic removal, via a transform NFD; [:Nonspacing Mark:] Remove; NFC), but:

  • It would need some extra C code for this in our ICU wrapper...
  • The solution might to be general either, as some languages expect some different rules (e.g. they do expect accented characters to have their own lettered milestone)
  • And even diacritic removal might not be enough for a general solution (e.g. in French œ would be under the "O" letter, but some languages expect some characters like this to be in their own section)

--> On such a topic, a dedicated issue would be required, and one would have to check the literature, regarding how this is addressed in general. (EDIT: maybe extracting the first char, but perform an ICU comparison for equality, might work? Well it might be doable with what we have then, but I still think it would be better in its own dedicated topic1).

... make it use a two-column layout

Wholly independent from this package indeed (and we don't really have robust multicolumn layouts yet).

Footnotes

  1. EDIT Nope: icu = require("justenoughicu"); collator = icu.collation_create("fr", {}); print(icu.compare(collator, "eric", "éric")) --> -1 so not considered equal (though sorting is collation-compliant). Same with backwards option enabled.

@davidchisnall
Copy link

For the sectioning, a hook that provides the name of the current and previous (empty string / nil for the first one) and lets you return nothing or a line break, with per-language defaults and a warning if your language didn't have one would probably be a nice start. The implementation for English is fairly simple but, as you say, for others it can be different (for some languages, I suspect, it's trivial because there's no equivalent notion and everything ends up in a single section).

@Omikhleia
Copy link
Member Author

Omikhleia commented Jan 8, 2025

For the sectioning, a hook that provides the name of the current and previous

As said, wait for this to PR to be finished, reviewed and merged, and feel free to discuss and propose something... (But as far as I am concerned, I'd be against any hook that works only for English.)

The new implementation was no longer callable from SIL or XML as the
pages are now an AST (with links etc.), so a command is no longer
meaningful and it's best to have it as a package method.
Such textual reconstructions from typeset nodes were performed
differently in the tableofcontents and indexer packages, and possibly
elsewhere.
The logic is tied to the typeset nodes and typesetter internal states,
so it's better moved there.
@Omikhleia Omikhleia marked this pull request as ready for review January 8, 2025 19:44
@Omikhleia
Copy link
Member Author

@alerque Ready for review.
I have no idea why the tests fail here (on a busted test for the font manager); they visibly passed on my own fork.

@Omikhleia
Copy link
Member Author

Some dumb illustration of what it does (entries are kinda stupid, but heh)

image

@alerque
Copy link
Member

alerque commented Jan 12, 2025

but problem is that we don't have enough ICU support for that

@Omikhleia If you tell me exactly what function(s) you want exposed (preferably from icu4x) I can expose it to Lua via the rusile module. I'm still mulling over various approaches to a public facing API for that, but as long as we consume it internally in a package we can play around with how it will work because the only breaking change will be the public feature set of the package, not how we reached into ICU to accomplish it.

@alerque
Copy link
Member

alerque commented Jan 12, 2025

For the sectioning, a hook that provides the name of the current and previous (empty string / nil for the first one) and lets you return nothing or a line break

@davidchisnall A hook is fine if we can come up with something we think will have wide utility, but this doesn't sound like a robust way to define such a hook. Also if the current implementation has a callback function for outputting each line, implementing such a sectioning device yourself should be trivial. Just stash the last entry in a local variable (instantiated outside the function) and check it at the top of the function when outputting an entry and output a header first if desired. This is quite a common paradigm for extending/changing SILE behavior already and doesn't require the default implementation to have any hooks or anything. You can even re-use the existing function by stashing it first, then just add your section handling to a new function and call back to the original one.

@alerque
Copy link
Member

alerque commented Jan 12, 2025

  • (BREAK) removes hard-coded verbatim styling

@Omikhleia You mention this as breakage, but it doesn't sound to me like something we need to hold off on for a major release bump since I would consider the existing weird styling a bug not a feature. What you you think? I'm targeting this for the next minor (patch level since we're stuck in ZeroVer mode) release unless I hear suggestion to the contrary.

@Omikhleia
Copy link
Member Author

Omikhleia commented Jan 12, 2025

but it doesn't sound to me like something we need to hold off on for a major release bump

I think the same, IMHO we could just go for it in a minor/patch release and not bother too much.

  • It's not that breaking and your Release Notes usually give enough information to potential users 🐱
  • This package moreover was probably never really used for real:
    • it was still using the 0.12 way to be used (= calling a method from the class, which implies a custom class, which is likely not the case for most users)
    • no one (but me) complained about it for several years
    • no one had ever noticed (but me) that sorting wasn't language-aware etc.
    • same, no one mentioned the hard-coded style behavior...
  • It's marked as "experimental" for some time now in the manual, and I don't think we should necessarily be afraid of slight breaking changes in highly experimental/incomplete packages
    image

Should we change the "experimental" status in the manual, by the way? I am not even sure, we still lag behind LaTeX's capabilities here (with its "see also" entries, multi-level indexes, etc.). We can just move forward, and if it starts being used, then we have interesting topics to address 😹

@alerque
Copy link
Member

alerque commented Jan 12, 2025

Oh yes of course, as long as it's still marked experimental I wouldn't even consider making a breaking release for changes. Just change it for the better and hope for the best ;-)

I agree it probably hasn't seen much use. My own projects have their own implementations (two of them I see). I'll have to work on bringing them in closer to the source by basing on and extending this. That will help get it some exercise and also make it easier to contribute improvements.

In the mean time I see no reason not to move ahead with this sooner rather than later. We can even consider removing the experimental tag after this PR (anytime before the release or whatever release follows when we decide the usage should be considered somewhat stable).

@Omikhleia Omikhleia self-assigned this Jan 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Software bug issue enhancement Software improvement or feature request modules:packages Issue relates to core or 3rd party packages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enhancing the indexer package
3 participants