Data frame: Typed frame vs. type erasure in practice? #214

@SimonHeybrock

I am looking at #174 with regard to the suggested typed frame. I am assuming this would essentially be something like

template <class... Ts>
using xframe = std::map<std::string, std::variant<Ts...>>;

I think I really like the idea of having a std::variant here. In combination with std::visit, implementing algorithms for this looks very convenient. 👍
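To make the std::visit idea concrete, here is a minimal sketch under the assumption that xframe really is the alias above. The function total_size and the choice of std::vector element types are hypothetical and only serve as illustration; they are not taken from #174.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Assumed alias, as guessed in this issue (not confirmed by #174).
template <class... Ts>
using xframe = std::map<std::string, std::variant<Ts...>>;

// Hypothetical algorithm: total number of elements over all variables.
// It does not care which types are in the frame, but it still has to be
// templated on Ts... in order to accept any xframe<Ts...>.
template <class... Ts>
std::size_t total_size(const xframe<Ts...> &frame) {
  std::size_t n = 0;
  for (const auto &entry : frame) {
    // std::visit dispatches on the alternative currently held by the variant.
    n += std::visit([](const auto &values) { return values.size(); },
                    entry.second);
  }
  return n;
}
```

The body is indeed short and convenient, but note that the generic lambda is instantiated once per alternative in Ts..., and total_size itself once per distinct frame type it is called with.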

I have a couple of concerns however, which may be rooted in my lack of understanding of what you are proposing, i.e., do not see this as a criticism of the approach per se:

  1. Does every custom algorithm or function that takes an xframe as an argument have to be templated on Ts...?

    • In general, a frame may contain tens of variables, not all of which are just plain double or int64_t, i.e., we have a combinatorial explosion of the number of xframe types.
    • Many algorithms will not care about most of the other types present in the frame. Do they nevertheless need to depend on (template on) those types?
    • Doesn't this lead to absolutely massive code size? ... and compilation times?
    • Does adding a new type to a frame require recompiling the whole codebase, including all libraries built on xframe?
    • Can binaries of a library be shipped (or rather, would it be useful at all)? If some custom code adds a custom type to the frame that is not known to any of the algorithms in a library, it implies that the algorithm cannot be used with such a frame (even if the custom type is irrelevant to the algorithm).
  2. Considering a Python interface:

    • Do we need to instantiate all possible combinations of types and have separate Python exports for all of them?
    • Do we furthermore need to instantiate and export all functions/algorithms for all possible combinations?
  3. How is compatibility between two xframe objects handled? We need to support, e.g., xframe::operator+=(const xframe &other).

    • We definitely must support an other whose content differs from that of *this (within certain limits). For example, *this might contain some additional variables or coordinates that are not present in other. If the remaining coordinates and variable names match, an operation is still possible and should be supported.
    • For implementing a custom algorithm that takes, e.g., two input frames, we would need to pass two parameter packs to that algorithm, so we would have something like
      template<class... As, template <class...> class A, class... Bs, template <class...> class B>
      void myAlg(A<As...> &frameA, const B<Bs...> &frameB) { /*...*/ }
      Am I missing something? This looks quite complicated. While certainly doable for library internals, it seems too much to ask of an average C++ developer. Is there a way to avoid this (except for having a way too generic template <class F1, class F2> void myAlg;)?
      Again, I am also concerned about the number of instantiations. I assume it could quickly reach hundreds or even thousands in practice when supporting two input frames (at least when doing explicit instantiations such that xframe is usable from Python)? Supporting three or more input frames seems totally impossible, unless maybe we just have a single xframe type for a variant with all possible supported types (not sure this would still be possible with expressions then, since adding a set of expressions to the list of supported types seems unrealistic?)?
  4. Is it possible to provide an intuitive API for the frame type, given that adding/removing variables may change the type?

    • From variable to frame: design idea #174 suggests operator|, which is very different from Python where we would have something like frame["data1"] = variable --- is such an asymmetry between C++ and Python a problem?
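
To illustrate point 3, here is a minimal sketch of an in-place addition between two frames with different type lists, again assuming the xframe alias guessed at the top of this issue. The name add_in_place, the scalar element types, and the error handling are hypothetical choices for illustration, not something proposed in #174.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>

// Assumed alias, as guessed in this issue (not confirmed by #174).
template <class... Ts>
using xframe = std::map<std::string, std::variant<Ts...>>;

// Hypothetical stand-in for xframe::operator+=. The left-hand frame may hold
// extra variables, but every variable in rhs must exist in lhs with the same
// element type.
template <class... As, class... Bs>
void add_in_place(xframe<As...> &lhs, const xframe<Bs...> &rhs) {
  for (const auto &entry : rhs) {
    auto it = lhs.find(entry.first);
    if (it == lhs.end())
      throw std::runtime_error("variable missing in lhs: " + entry.first);
    // Two-variant visit: the lambda is instantiated for every (A, B) pair.
    std::visit(
        [](auto &a, const auto &b) {
          using A = std::decay_t<decltype(a)>;
          using B = std::decay_t<decltype(b)>;
          if constexpr (std::is_same_v<A, B>)
            a += b;
          else
            throw std::runtime_error("element type mismatch");
        },
        it->second, entry.second);
  }
}
```

With two parameter packs the signature stays readable when written against the alias, but each call still instantiates sizeof...(As) × sizeof...(Bs) lambda bodies, which is the growth in instantiations I am worried about.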
