diff --git a/docs/walkthrough/one-dim.ipynb b/docs/walkthrough/one-dim.ipynb
index 86c4b39..3b12192 100644
--- a/docs/walkthrough/one-dim.ipynb
+++ b/docs/walkthrough/one-dim.ipynb
@@ -305,8 +305,8 @@
     "\n",
     "Then, it's time to prepare our data. We'll create a `DataTree`\n",
     "that defines the relationships among all the datasets we're working\n",
-    "with. This is a tree in the mathematical sense, with nodes referencing\n",
-    "the datasets and edges representing the relationships."
+    "with. This is a tree roughly in the mathematical sense, with nodes referencing\n",
+    "the dataset dimensions and edges representing the relationships."
    ]
   },
   {
@@ -355,25 +355,41 @@
    "source": [
     "The first named dataset we include, `tour`, is by default the root node of this data tree.\n",
     "We then can define an arbitrary number of other named data nodes. Here, we add `person`, `hh`,\n",
-    "`odt_skims` and `odt_skims`. Note that these last two are actually two different names for the\n",
+    "`odt_skims` and `dot_skims`. Note that these last two are actually two different names for the\n",
     "same underlying dataset, and for each name we will next define a unique set of relationships.\n",
+    "For each of these other data nodes, we will need to define some way to link each dimension of\n",
+    "them back to the root node, so that for any position in the root node's arrays, we can find\n",
+    "one corresponding value in each of the other datasets variables.\n",
     "\n",
     "All data nodes in this tree are stored as `Dataset` objects. We can give a pandas DataFrame\n",
-    "in this contructor instead, but it will be automatically converted into a one-dimension `Dataset`.\n",
+    "in this constructor instead, but it will be automatically converted into a one-dimension `Dataset`.\n",
     "The conversion is no-copy if possible (and it is usually possible) so no additional memory is\n",
     "consumed in the conversion.\n",
     "\n",
     "The `relationships` defines links of the data tree. Each relationship maps a particular variable\n",
     "in a named upstream dataset to a particular dimension of a named downstream dataset. For example,\n",
     "`\"person.household_id @ hh.HHID\"` tells the tree that the `household_id` variable in the `person` \n",
-    "dataset contains labels (`@`) that map to the `HHID` dimension of the `hh` dataset.\n",
+    "dataset contains labels (`@`) that map to the `HHID` dimension of the `hh` dataset. Similarly,\n",
+    "`\"tour.PERID @ person.PERID\"` tells the tree that the `PERID` variable in the `tour` dataset\n",
+    "contains labels that map to the `PERID` dimension of the `person` dataset. From this, we can\n",
+    "see that any position in the \"tour\" dataset can be mapped to a position in the \"person\" dataset,\n",
+    "in a many-to-one manner, and from there to a position in the \"hh\" dataset, also in a many-to-one\n",
+    "manner. Unlike tours, persons, and households, the `skims` datasets are multi-dimensional, so we need to\n",
+    "map multiple dimensions. For the `odt_skims` dataset, we map the origin TAZ dimension (`otaz`)\n",
+    "to the household TAZ (`hh.TAZ`), and the destination TAZ dimension (`dtaz`) to the tour\n",
+    "destination TAZ (`tour.dest_taz_idx`), and the time period dimension (`time_period`) to the\n",
+    "tour outbound time period (`tour.out_time_period`). This way, even though the skims dataset\n",
+    "is multi-dimensional, we can still find one unique position in the skims dataset for each\n",
+    "position in the tours dataset. The same is done for the `dot_skims` dataset, which actually\n",
+    "contains the same data as `odt_skims`, but the mapping of the dimensions is different, so a\n",
+    "different unique position in the skims dataset is found for each position in the tours dataset.\n",
     "\n",
     "In addition to mapping by label, we can also map by position, by using the `->` operator in the\n",
     "relationship string instead of `@`. In the example above, we map the tour destination TAZ's in\n",
     "this manner, as the `dest_taz_idx` variable in the `tours` dataset contains positional references\n",
     "instead of labels.\n",
     "\n",
-    "A special case for the relationship mapping is available when the source varibable\n",
+    "A special case for the relationship mapping is available when the source variable\n",
     "in the upstream dataset is explicitly categorical. In this case, sharrow checks that\n",
     "the categories exactly match the labels in the referenced downstream dataset dimension,\n",
     "and that there are no missing categorical values. If they do match and there are no\n",