Merged
28 changes: 22 additions & 6 deletions docs/walkthrough/one-dim.ipynb
@@ -305,8 +305,8 @@
"\n",
"Then, it's time to prepare our data. We'll create a `DataTree`\n",
"that defines the relationships among all the datasets we're working\n",
-"with. This is a tree in the mathematical sense, with nodes referencing\n",
-"the datasets and edges representing the relationships."
+"with. This is a tree roughly in the mathematical sense, with nodes referencing\n",
+"the dataset dimensions and edges representing the relationships."
]
},
{
@@ -355,25 +355,41 @@
"source": [
"The first named dataset we include, `tour`, is by default the root node of this data tree.\n",
"We then can define an arbitrary number of other named data nodes. Here, we add `person`, `hh`,\n",
-"`odt_skims` and `odt_skims`. Note that these last two are actually two different names for the\n",
+"`odt_skims` and `dot_skims`. Note that these last two are actually two different names for the\n",
"same underlying dataset, and for each name we will next define a unique set of relationships.\n",
"For each of these other data nodes, we will need to define some way to link each dimension of\n",
"them back to the root node, so that for any position in the root node's arrays, we can find\n",
"one corresponding value in each of the other datasets' variables.\n",
"\n",
"All data nodes in this tree are stored as `Dataset` objects. We can give a pandas DataFrame\n",
-"in this contructor instead, but it will be automatically converted into a one-dimension `Dataset`.\n",
+"in this constructor instead, but it will be automatically converted into a one-dimension `Dataset`.\n",
"The conversion is no-copy if possible (and it is usually possible) so no additional memory is\n",
"consumed in the conversion.\n",
"\n",
"The `relationships` argument defines the links of the data tree. Each relationship maps a particular variable\n",
"in a named upstream dataset to a particular dimension of a named downstream dataset. For example,\n",
"`\"person.household_id @ hh.HHID\"` tells the tree that the `household_id` variable in the `person` \n",
-"dataset contains labels (`@`) that map to the `HHID` dimension of the `hh` dataset.\n",
+"dataset contains labels (`@`) that map to the `HHID` dimension of the `hh` dataset. Similarly,\n",
"`\"tour.PERID @ person.PERID\"` tells the tree that the `PERID` variable in the `tour` dataset\n",
"contains labels that map to the `PERID` dimension of the `person` dataset. From this, we can\n",
"see that any position in the \"tour\" dataset can be mapped to a position in the \"person\" dataset,\n",
"in a many-to-one manner, and from there to a position in the \"hh\" dataset, also in a many-to-one\n",
"manner. Unlike tours, persons, and households, the `skims` datasets are multi-dimensional, so we need to\n",
"map multiple dimensions. For the `odt_skims` dataset, we map the origin TAZ dimension (`otaz`)\n",
"to the household TAZ (`hh.TAZ`), and the destination TAZ dimension (`dtaz`) to the tour\n",
"destination TAZ (`tour.dest_taz_idx`), and the time period dimension (`time_period`) to the\n",
"tour outbound time period (`tour.out_time_period`). This way, even though the skims dataset\n",
"is multi-dimensional, we can still find one unique position in the skims dataset for each\n",
"position in the tours dataset. The same is done for the `dot_skims` dataset, which actually\n",
"contains the same data as `odt_skims`, but the mapping of the dimensions is different, so a\n",
"different unique position in the skims dataset is found for each position in the tours dataset.\n",
"\n",
"In addition to mapping by label, we can also map by position, by using the `->` operator in the\n",
"relationship string instead of `@`. In the example above, we map the tour destination TAZs in\n",
"this manner, as the `dest_taz_idx` variable in the `tour` dataset contains positional references\n",
"instead of labels.\n",
"\n",
-"A special case for the relationship mapping is available when the source varibable\n",
+"A special case for the relationship mapping is available when the source variable\n",
"in the upstream dataset is explicitly categorical. In this case, sharrow checks that\n",
"the categories exactly match the labels in the referenced downstream dataset dimension,\n",
"and that there are no missing categorical values. If they do match and there are no\n",
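To make the relationship semantics in the notebook text above concrete, here is a minimal pure-Python sketch of what label (`@`) and positional (`->`) links resolve to. It does not use sharrow and is not how sharrow implements the lookup. The toy values and the helpers `by_label` and `take` are assumptions made up for illustration, and the time period dimension is left out to keep the example short.

```python
# Toy data; every value below is made up for illustration.
person_perid = [101, 102, 103]            # labels along person.PERID
person_household_id = [9001, 9001, 9002]  # person.household_id variable
hh_hhid = [9001, 9002]                    # labels along hh.HHID
hh_taz = [3, 1]                           # hh.TAZ holds TAZ labels
taz_labels = [1, 2, 3]                    # labels along otaz and dtaz
tour_perid = [103, 101, 101, 102]         # tour.PERID variable
tour_dest_taz_idx = [0, 2, 1, 1]          # tour.dest_taz_idx holds positions

# skim_time[i][j]: value for travel from the TAZ at position i to position j.
skim_time = [
    [1, 12, 13],
    [21, 2, 23],
    [31, 32, 3],
]


def by_label(values, labels):
    """Emulate '@': convert label values into positions along a dimension."""
    position_of = {label: i for i, label in enumerate(labels)}
    return [position_of[v] for v in values]


def take(array, positions):
    """Pick one element of `array` for each position."""
    return [array[p] for p in positions]


# "tour.PERID @ person.PERID": many tours map to one person.
tour_to_person = by_label(tour_perid, person_perid)
# "person.household_id @ hh.HHID": many persons map to one household.
person_to_hh = by_label(person_household_id, hh_hhid)
# Chain the two links so each tour reaches its household.
tour_to_hh = take(person_to_hh, tour_to_person)

# hh.TAZ holds labels, so it is resolved by label against the skim dimension.
home_pos = by_label(take(hh_taz, tour_to_hh), taz_labels)
# tour.dest_taz_idx already holds positions ('->'), so no lookup is needed.
dest_pos = list(tour_dest_taz_idx)

# odt_skims: home is the origin, the tour destination is the destination.
tour_odt = [skim_time[o][d] for o, d in zip(home_pos, dest_pos)]
# dot_skims: same array, with the roles of the two dimensions swapped.
tour_dot = [skim_time[o][d] for o, d in zip(dest_pos, home_pos)]

print(tour_odt)
print(tour_dot)
```

Each tour ends up with exactly one value from the two-dimensional skim array in each direction, which is the property the relationships are meant to guarantee.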