% vim:ts=1:et:nospell:spelllang=en_gb:ft=tex
\chapter{Implementation}
\label{implementation}
To implement the \emph{ppt2mxp} conversion tool that is the subject of this
thesis, we chose version 8 of the Java programming language \citep{gosling-1}.
Although the author has significant experience with other, arguably more
compelling, languages, several reasons pushed us towards Java, not the least
of them being its ease of use. Java \emph{is} easy to use --- it would not
have become as popular as it is today if it were not. It has a clear and
logical syntax, a consistent structure, and an extensive standard library.
The vast number of libraries available for Java was one of the more
important reasons for this choice. The existence of the Apache POI library
(see section \ref{poi}) was a huge help in reaching our goal; without it, we
would have had to reverse engineer the heavily obfuscated \ppt file format
ourselves, which would undoubtedly have taken up more time than was available
to us. Other libraries, such as Spring, which allows the programmer to use
and reuse components without writing complex systems to instantiate them,
further strengthened our resolve to make Java our primary technology choice.
However, Java is not the only technology used here. \mxp is written entirely
in HTML5, so any tool that works with \mxp sooner or later needs to use HTML5
as well. The widely accepted HTML5 standard makes \mxp presentations highly
portable and runnable on any device with a recent web browser, including
smartphones and tablets \citep{roels-1}.
\fig{conversion}{An original \ppt slide (top) and the converted result
including automated layout in \mxp (bottom)}
\figref{conversion} shows the result of our implementation. We can clearly
see the effects of our automated layout algorithm on the contents of this
slide. In the following sections we discuss how the various technologies were
used to create the \emph{ppt2mxp} tool and the \code{autolayout} container
plug-in, the combination of which accomplishes this result.
\section{Taking \ppt apart}
\label{poi}
When converting one file format into another, the first part of the process
involves getting the data you need out of the original file. This can be
very complicated, as some --- usually proprietary --- file formats are
deliberately designed to discourage this. They obfuscate data, encrypt it,
and structure it in illogical and unexpected ways, amongst other techniques.
The \ppt file format is unfortunately such a format, as Microsoft did not
want to risk other companies making software that works with \ppt files. Of
course, over the years people have managed to crack the format, enabling the
conversion of \ppt presentations into other formats, although such a
conversion usually does not guarantee results that mimic the original
perfectly. Luckily, we do not want a perfect conversion; we want a better
one.
We found the Apache POI library very helpful in this part of the
implementation. The POI\footnote{Originally ``Poor Obfuscation
Implementation'' \citep{sundaram-1}} library is a Java library that provides
an API to access Microsoft document formats. Its most mature (and most
popular) part is HSSF\footnote{``Horrible SpreadSheet Format''}, which is
used by Java developers worldwide to access Microsoft Excel spreadsheet data,
as well as to export data into Excel spreadsheets.
For our purposes, we relied on HSLF\footnote{``Horrible SLideshow Format''},
which provided us with a full API to access the contents of a \ppt
presentation in a myriad of ways. We could access all images at once, or
every bit of text in the whole presentation, but most interesting to us was
the ability to access contents on a per-slide basis. Getting a list of the
slides in a presentation first allowed us to group contents within their
immediate context, under one node per slide in our component tree. As such,
we could loop over the presentation's slides, converting them one by one by
placing the contents of each slide in an equivalent \mxp slide.
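The listing below is a minimal sketch of this per-slide loop, using the HSLF
API as it existed at the time (POI 3.x, the same \code{RichTextRun}-era
classes used in \lstref{bullets-1}). \code{MxpSlide}, \code{convertShape} and
\code{tree} are hypothetical names standing in for our own conversion
classes.
\begin{figure}[h!]
\begin{lstjava}{slides-1}{Looping over the slides of a presentation with HSLF}
// Classes from org.apache.poi.hslf (POI 3.x) and java.io.FileInputStream.
SlideShow ppt = new SlideShow(new FileInputStream("input.ppt"));
for (Slide slide : ppt.getSlides()) {
  MxpSlide target = new MxpSlide();    // one mxp slide per ppt slide
  for (Shape shape : slide.getShapes()) {
    target.add(convertShape(shape));   // convert each shape within its slide context
  }
  tree.addSlide(target);               // group contents under one node per slide
}
\end{lstjava}
\end{figure}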
\subsection{Bullets}
That was unfortunately not the end of it. While HSLF gives us access to all
the text in a presentation, or per slide, it was not immediately clear to us
how it distinguishes between `normal' text and bullet lists. This meant that,
for a long time, our conversion process was incomplete, as all bullets from
the original \ppt presentation appeared as incoherent text runs in our
converted result. We found the \code{RichTextRun} class, which has all the
tools and properties needed to detect bullets and their indentation level,
but we only discovered quite late that we could extract \code{RichTextRun}s
from the \code{TextShape}s we were getting out of the slides.
\begin{figure}[h!]
\begin{lstjava}{bullets-1}{Converting bullet points}
BulletList ul = new BulletList();
Stack<BulletList> listStack = new Stack<>();
int indent = 0;
for (RichTextRun run : textShape.getTextRun().getRichTextRuns()) {
  ListItem li = new ListItem();
  StringContent txt = new StringContent(StringUtils.strip(run.getText()));
  if (run.isBullet()) {
    if (run.getBulletOffset() > indent) {
      // Indentation increased: start a new list nested inside the current one
      indent = run.getBulletOffset();
      listStack.push(ul);
      ul = new BulletList();
      listStack.peek().addItem(ul);
    } else if (run.getBulletOffset() < indent) {
      // Indentation decreased: continue in the parent list
      indent = run.getBulletOffset();
      ul = listStack.pop();
    }
  } else {
    // Current run is not a bullet: unwind to the top bullet level
    indent = 0;
    while (listStack.size() > 0) {
      ul = listStack.pop();
    }
  }
  li.getContents().add(txt);
  ul.addItem(li);
}
// Unwind any remaining nesting so we return the top-level list
while (listStack.size() > 0) {
  ul = listStack.pop();
}
return ul;
\end{lstjava}
\end{figure}
The code in \lstref{bullets-1} shows how we extract bullet points from a
\code{TextShape} object and convert them into the nested bullet format we
need. As you may notice, the original bullet points are not nested in any
way: they are all on the same level of the original object tree, regardless
of their indentation level, which is stored on each bullet object instead.
However, we wanted a more elegant solution in which the indentation is
deduced from the level of nesting, much like HTML has always done.
The solution loops over the list of bullets, checking their indentation
level and building a stack of nested bullet lists accordingly. When the
indentation level increases, a new bullet list is started and added as an
item of the existing bullet list. When the indentation decreases, we go back
to the parent list and continue there. This can go down several levels, and
all the way back up again. Since the objects do not have a direct link to
their parent, we use a stack to keep track of the ancestry at all times.
\subsection{Animations}
Another challenge was dealing with animations and the other ways people
manage to put far more content on one slide than would be advisable. The
animations could not be transferred to \mxp, since \mxp has its own way of
transitioning from each component to the next in the form of a
ZUI\footnote{Zoomable User Interface}. It would technically be possible to
implement additional animations as a separate plug-in for \mxp to provide
equivalents of the animations in \ppt*, but that is beyond the scope of this
thesis. So we could not provide the same animations, yet some people use
those animations not just to show off but to actually show multiple pictures
and blocks of text, one after the other, on the same slide. Without
animations, this content would either not be visible or it would become a
serious layout issue in \mxp.
Our first solution was to limit the number of objects one slide can contain,
automatically moving any additional content onto extra slides. A downside of
this is that we had no way of guessing the correct order in which the content
should appear, so what may have been an intricate choreography of pictures in
\ppt might become an incoherent jumble of images in \mxp. Another solution
would be to scale all content down until it fits side by side on one slide,
and then rely on the ZUI to show the pictures one by one, but the same
problem with the order of appearance manifests itself in this case. In the
end, we decided it would be best to accept that no conversion algorithm is
going to be perfect, and that the author can always manually change the order
after the conversion is done.
With this in mind, we now render the components in the order we get them
from HSLF, hoping that this resembles the original order closely. The
automated layout takes care of any overlapping that might have occurred
originally, so we do not have to worry about that.
\section{Generating \mxp content}
Generating \mxp presentations was the final goal of the first phase of this
thesis. This seemed a fairly easy task at first, until we learned that the
\mxp compiler would not be available to us for most of the year. This meant
we would either not be able to view-test our generated presentations, or we
would have to convert them to browser-ready HTML5 ourselves. We chose the
latter option, as not being able to see our results would not have been very
helpful in implementing and tweaking our conversion tool. As a result, this
task became much more complicated, as we had to emulate the compiler's work
ourselves. Luckily, we had already decided to work with a Java object
representation of the original presentation as an intermediary form, a
so-called component tree, which meant we could easily change the output of
our conversion tool without affecting the rest of the conversion process, and
on a per-component basis.
\subsection{Emulating the \mxp compiler}
Since the \mxp compiler was not functional during most of this thesis'
implementation, we decided to generate an HTML5 file much like the \mxp
compiler would, including the \mxp JavaScript library and plug-ins. This
required us to first learn how \mxp works on the inside, which proved to be
a steep learning curve but gave us more insight into the software than we
would have gotten if we only had to generate \mxp XML and leave the rest to
the compiler.
\subsubsection{Improving the ZUI}
As an exercise, we changed the way the ZUI works. Originally, \mxp used the
CSS3 \code{transform: scale()} property to enlarge or reduce the whole view,
giving the impression of zooming in or out. This is an obvious approach,
simple in its execution and quite foolproof. However, the downside is that
you cannot zoom in very far: current browsers do not take advantage of vector
graphics and fonts even when you use them, and raster-based content obviously
does not scale well anyway. Instead, browsers render the content at its
initial scale, and then treat the result as one big image when it is scaled
or otherwise transformed afterwards. This means you get extremely pixelated
content when zooming in too far.
Through some refactoring, we were able to change this to use the
\code{transform: translateZ()} property instead, along with
\code{transform: perspective()}\footnote{Not to be confused with the
\code{perspective: <number>} property, which yields different results} and
\code{transform-style: preserve-3d}. This means we are now effectively
rendering the presentation in 3D, and moving our viewpoint around in the 3D
space to centre each slide or component in turn.
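To make the difference concrete, the listing below is a hypothetical sketch
of the transform strings our generator emits in each mode; the variable names
are illustrative, and the exact values depend on the slide being centred.
\begin{figure}[h!]
\begin{lstjava}{zui-1}{Scale-based versus translateZ-based zooming}
// Old approach: scale the whole view. Browsers rasterise the content
// first, so zooming in far produces pixelated results.
String scaleZoom = "transform: scale(" + zoomFactor + ");";

// New approach: keep the content at its natural scale and move the
// viewpoint along the Z-axis instead (with transform-style: preserve-3d
// set on the container). perspective() must precede translateZ().
String zZoom = "transform: perspective(" + perspective + "px)"
             + " translateZ(" + depth + "px);";
\end{lstjava}
\end{figure}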
We believe this opens the door to even more visually impressive
presentations, where content can be placed at different points along the
Z-axis. This allows, for example, placing multiple slides behind one another,
making for impressive zoom transitions between slides. The downside is that
the overview may not always show all content, as some content can overlap,
but we trust that authors will use this feature wisely when manually
adjusting the position of their slides. It may, for example, be useful to
group slides together in this way when there is too much content to show on
one slide but creating a second, separate slide would break the flow of the
presentation. In any case, our automated layout plug-in will not currently
generate slides positioned this way.
\subsubsection{Plain HTML5}
After investigating the inner workings of \mxp and studying some example
presentations, we were ready to start generating our own presentations
based on our component tree. This meant every possible component would
have to be written out as valid HTML, with the necessary attributes for
each generated tag and with any child components enclosed. Since our
component tree nodes are nested the way the final HTML should be nested,
this was not a problem.
Our implementation currently includes \code{compile()} methods on every
component object, which is consistent and easy to understand, but which might
not be the best way to implement this, depending on future goals. Instead of
having a separate layer take care of the output, output generation is
currently a cross-cutting concern, which is widely considered poor design. We
have to walk through the entire tree in any case, so performance will always
be linear in the number of components, regardless of where the output logic
lives.
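The following is a hypothetical sketch of this approach; the class names are
illustrative, not the exact ones used in our tool. Every component knows how
to render itself, and renders its children recursively.
\begin{figure}[h!]
\begin{lstjava}{compile-1}{Per-component compile() methods}
import java.util.ArrayList;
import java.util.List;

public interface Component {
  /** Render this node and all of its children as HTML5. */
  String compile();
}

class Slide implements Component {
  private final List<Component> children = new ArrayList<>();

  @Override
  public String compile() {
    StringBuilder html = new StringBuilder("<div class=\"slide\">");
    for (Component child : children) {
      html.append(child.compile()); // recurse down the component tree
    }
    return html.append("</div>").toString();
  }
}
\end{lstjava}
\end{figure}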
The current implementation has the advantage of extensibility: new
components can easily be added, and it is immediately obvious to any new
developer how these components should generate their equivalent HTML code.
However, to replace the output with a different format, it would be better if
this functionality were separated from the components and gathered in a
distinct \code{Writer} class instead. Switching formats would then be as easy
as dropping in a new \code{Writer} class that generates a different format.
Since we initially did not expect to switch outputs, and because we started
by implementing the conversion of one component and then added components as
we needed them, the current implementation --- focused on simplifying the
addition of new components --- was easier for us to work with. Refactoring
this implementation is perhaps an option for future work.
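As an illustration of this alternative, a \code{Writer} along these lines
might look as follows. This is a hypothetical sketch: \code{getChildren()} is
an assumed accessor on the component interface, and the per-type dispatch is
reduced to two cases for brevity.
\begin{figure}[h!]
\begin{lstjava}{writer-1}{Separating output generation into a Writer}
public interface Writer {
  /** Walk the component tree and serialise it to one target format. */
  String write(Component root);
}

class MxpXmlWriter implements Writer {
  @Override
  public String write(Component node) {
    String tag = (node instanceof Slide) ? "slide" : "component";
    StringBuilder out = new StringBuilder("<" + tag + ">");
    for (Component child : node.getChildren()) {
      out.append(write(child)); // recurse; one tag per component type
    }
    return out.append("</" + tag + ">").toString();
  }
}
\end{lstjava}
\end{figure}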
\subsubsection{\mxp XML}
Generating \mxp XML should be simpler than generating the HTML5 output,
although in the end there is probably not much difference. We would not have
to generate unique IDs for every component, nor the preamble content (which
includes the \mxp library itself), but the structure and content would remain
pretty much the same. Instead of \code{<div>} tags we could use \mxp-specific
tags, all of which adds up to a much more readable result.
The advantages for the user would be even bigger. Given a hypothetical \mxp
editor, the output of our conversion tool could be edited in such a tool
immediately, whereas such an editor would presumably not work on plain HTML5.
The reuse of content --- one of the main goals of \mxp --- would also be much
more feasible using the XML format.
\section{Creating layouts}
Implementing an automated layout is not an easy task. At every point in the
process, opportunities arise to use some kind of template, some design choice
that appears to make things easier but turns out to be restrictive when
applied to edge cases. There is also not always a clear distinction between
implementing a template and implementing an automated layout. If you decide
to put all of your components in a row next to each other, have you just
implemented a template or not? What if you make them all the same height, so
the row looks aesthetically pleasing as a whole? What if you do not?
The distinction becomes clearer when we have to work within a defined area,
such as a slide container. In this case, we have to fit our content within
the slide; this could be seen as a template decision, but it is not one our
algorithm makes, so the algorithm itself just tries to fulfil the constraints
it is given. We can then calculate the relative sizes of our components,
scale them down or up together (using the same scale factor) until their
combined size equals the area we need to fill, and puzzle them together in a
way that fits. If no arrangement is possible, we scale everything down some
more and try again, until we find something that works.
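Sketched in code, this fit-and-retry loop could look as follows;
\code{totalArea}, \code{applyScale}, \code{tryPack} and the shrink factor of
0.9 are illustrative names and values, not the exact ones used in our
implementation. Note that scaling linear dimensions by a factor $f$ scales
surface area by $f^2$, hence the square root.
\begin{figure}[h!]
\begin{lstjava}{fit-1}{Scaling content until it fits the container}
// Start from the scale at which the combined component area equals
// the area of the container.
double areaRatio = container.getArea() / totalArea(components);
applyScale(components, Math.sqrt(areaRatio));

// Try to puzzle the components into the container; if no arrangement
// fits, shrink everything a little and try again.
while (!tryPack(components, container)) {
  applyScale(components, 0.9);
}
\end{lstjava}
\end{figure}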
Whether using automated layout or not, the advice not to put too much
content on one slide stands. While an automated layout system may be able to
fit all content on one slide, too much content will still cause information
to be conveyed less effectively. Due to technical limitations of current
browsers, scaling content down and then zooming in to make it fill the screen
is not always an option: browsers treat the content as rasterized images
rather than vector graphics, and scaling them does not yield ideal results.
Scaling an image down and then zooming in on it will show you a blurred
version at best, and a great big coloured blob at worst. Using the
\code{perspective()} and \code{translateZ()} properties yields much better
results, but these are not easily usable within a slide container, as the
content would be rendered \emph{behind} the slide, making the slide seem
empty when seen from the front. If we want the content to be visible on the
slide, we need to render it within the size limits of the slide, which gives
the same blurred results as scaling.
The true power of automated layout generation becomes apparent when no such
boundaries are imposed. If the content does not need to be scaled down, we
can use it all at its original size, whether those sizes are similar or not.
Depending on which options the author of the presentation turns on, we can
then still resize content to match sizes, among other things.
\subsection{Using constraints}
The basis of our automated layout approach is the use of constraints. As a
first step in the process, we assign each component a certain `bounding
box', a set of constraints that dictates how close other components can get
to this particular component. This distance could be chosen arbitrarily or
hard-coded, but we decided to take a more dynamic approach and let it be
10\% of the component's width or height, whichever is larger. This way, the
padding around the content is equal on all sides, and proportionate to the
size of the content.
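A minimal sketch of this padding rule, assuming a hypothetical \code{Bounds}
value class and simple accessors on a component \code{c}:
\begin{figure}[h!]
\begin{lstjava}{bounds-1}{Deriving a component's bounding box}
// The margin is 10% of the largest dimension, so the padding is equal
// on all sides and proportional to the content.
double margin = 0.10 * Math.max(c.getWidth(), c.getHeight());
Bounds box = new Bounds(
    c.getX() - margin, c.getY() - margin,
    c.getWidth() + 2 * margin, c.getHeight() + 2 * margin);
\end{lstjava}
\end{figure}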
The next step in the process decides the size of the components. Two
mechanisms may be applied here, together, separately or not at all. One
mechanism is the equalization of sizes, where we resize all components in
such a manner that they all end up with the same or similar sizes. We can
decide to apply this to the width, the height or the surface area, depending
on the effect we hope to create. If we want to put all components side by
side, making them all the same height might make for an aesthetically
pleasing result, for example; if we want to group them together in a
raster-like manner, similar surface areas are usually a better idea.
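Equalizing by height, for example, could be sketched as follows; the
accessor and \code{resize} names are hypothetical. Every component is scaled
uniformly so that its height matches the average, preserving its aspect
ratio.
\begin{figure}[h!]
\begin{lstjava}{equalize-1}{Equalizing component heights}
double avgHeight = components.stream()
    .mapToDouble(Component::getHeight)
    .average().orElse(0);

for (Component c : components) {
  double f = avgHeight / c.getHeight();          // uniform scale factor
  c.resize(c.getWidth() * f, c.getHeight() * f); // keeps the aspect ratio
}
\end{lstjava}
\end{figure}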
The other mechanism is the scaling of components, in which we scale
everything up or down equally, maintaining relative proportions between
components while reducing or increasing the total used surface area. This
is especially useful when we have to generate an automated layout that
needs to fit within a predefined container with a fixed size, such as a
slide container.
The default approach, when no such fixed-size container is present, is to
apply neither of these mechanisms; we apply both mechanisms together when we
do have to fill a predefined area: we then calculate the average surface
area of all components, resize all components to match this average, and
finally scale everything up or down so that the total area fits the
container. It is possible to override this behaviour: when no fixed size is
defined, we can specify that all components should be sized equally, be it
in width, height or surface area. The latter is the default, but this too
can be overridden: specifying that the components should be placed in a row
or a column instead of a raster-like structure will automatically choose the
matching equalization method. Conversely, when there is a fixed size, we may
specify to retain the original proportions and only scale everything up or
down equally to fit the area.
\subsection{Alternative approaches}
During implementation, we tried several other approaches to improve upon
the automated layout, all of which we ultimately decided to abandon in
favour of the current approach. There were various reasons for abandoning
these paths. Often the reason was that we realized we were influencing the
automated layout process with human design decisions, which is exactly what
we were trying to avoid. Sometimes what we were trying did not yield the
results we hoped for; sometimes the implementation became too complex to
continue, or what seemed like a good idea in theory turned out to be
impossible in practice.
One such idea was to divide components over a raster-like structure, spread
over a set of rows according to a normal distribution --- most components in
the middle rows, a few components in the first and last few rows. We soon
realized that, on the one hand, this would become a kind of template (a very
dynamic one, but still), and on the other hand, it was a lot more complex
than we initially thought. While this feature has been abandoned in the
current implementation, we do think it may still be an option for future
work: although it is a template, it seems dynamic enough to still match the
spirit of our automated layout approach.
Another idea was to use the Z-axis to make components seem equal in size
when viewed from a certain angle as an overview, then zooming in on each
component to reveal its true size. This was abandoned purely for complexity
reasons: it is not enough to just place larger components further back; you
also need to adjust the x and y coordinates to make it seem like the
component is placed right next to another when in reality it is much further
back. This would not be necessary with an orthographic camera mode, but the
HTML5 rendering engines only have a perspective camera mode, which makes
this idea too complex to implement within our limited timeframe.
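To illustrate the correction involved: with a CSS perspective distance $p$,
a component translated by $z$ along the Z-axis (positive towards the viewer)
is projected with a scale factor $s$, so to make it \emph{appear} at screen
position $(x', y')$ relative to the perspective origin, its actual
coordinates must be divided by that same factor:
\[
  s = \frac{p}{p - z}, \qquad x = \frac{x'}{s}, \qquad y = \frac{y'}{s}.
\]
A component pushed back by the full perspective distance ($z = -p$) thus
appears at half size ($s = \tfrac{1}{2}$) and needs its coordinates doubled
to seem adjacent to its neighbours. Keeping these corrections consistent for
every component is what made the idea too complex for our timeframe.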