Data morphisms in Catecon contain information that can be used in a composite. For example, f(2) = 6. The domain for a data morphism is generally ℕ, but the codomain may have a complex form such as (𝔽×𝔽)×(ℤ×ℤ)×(ℕ×ℕ).
The original data morphisms in Catecon were simply a mapping from an index to a value, which was good for testing but can take a lot of memory. Contiguous, random, and URL ranges are now included for a more compact representation.
A single data morphism can have as many data values as memory allows, followed by a sequence of ranges. To evaluate a data morphism at some index, the data values are consulted first. If there is a value for the specified index, it is returned. If not, the ranges are searched in sequence for one that contains the specified index.
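The lookup order described above can be sketched in Python. This is only an illustration, not Catecon's actual code: the names `Range`, `DataMorphism`, and `evaluate` are hypothetical, and the sketch assumes a range covers the indices from `start` up to but not including `start + count`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Range:
    # Hypothetical range: covers start .. start + count - 1 (an assumption).
    start: int
    count: int
    value_at: Callable[[int], Any]  # maps an offset within the range to a value

    def contains(self, index: int) -> bool:
        return self.start <= index < self.start + self.count


@dataclass
class DataMorphism:
    values: Dict[int, Any] = field(default_factory=dict)
    ranges: List[Range] = field(default_factory=list)

    def evaluate(self, index: int) -> Any:
        # Explicit data values are consulted first.
        if index in self.values:
            return self.values[index]
        # Otherwise the ranges are searched in sequence.
        for r in self.ranges:
            if r.contains(index):
                return r.value_at(index - r.start)
        # No value and no range covers this index.
        return None


# f(2) = 6 comes from the explicit values; other indices fall through to the range.
f = DataMorphism(values={2: 6}, ranges=[Range(0, 5, lambda k: 10 + k)])
```

In this sketch an explicit value shadows any range that also covers the same index, which matches the order of consultation described above.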
Contiguous Range
A contiguous range, called simply a range in Catecon to save space, is given by a starting index, a count of the succeeding indices it covers, and a start value that is incremented for each successive index.
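As a rough sketch of how such a range could be evaluated, the Python function below computes the value at an index without storing anything per index. The function name and the `step` parameter are assumptions made for illustration; the text only says the start value is incremented, so a step of 1 is assumed here.

```python
def contiguous_value(start_index, count, start_value, index, step=1):
    """Value of a contiguous range at index, or None if the index is outside it.

    Assumes the range covers start_index .. start_index + count - 1 and that
    the value grows by `step` (assumed to be 1) per index.
    """
    if not (start_index <= index < start_index + count):
        return None
    return start_value + (index - start_index) * step
```

For instance, `contiguous_value(10, 5, 100, 12)` gives 102 under these assumptions, since index 12 is two steps past the starting index.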
Random Range
A random range also has a starting index and a count, plus a min and max that bound the interval in which to generate a random number. Each time you compose with a data morphism containing a random range you get different random numbers. Compose with an identity map to have “static” random numbers, but at the cost of storage.
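The trade-off between fresh and “static” random numbers can be illustrated with a small Python sketch. The class `RandomRange` and the helper `freeze` are hypothetical names. `freeze` is only an analogy for composing with an identity map: it evaluates every index once and stores the results.

```python
import random


class RandomRange:
    # Hypothetical model of a random range; not Catecon's implementation.
    def __init__(self, start, count, lo, hi, rng=None):
        self.start, self.count = start, count
        self.lo, self.hi = lo, hi
        self.rng = rng or random.Random()

    def value(self, index):
        if not (self.start <= index < self.start + self.count):
            return None
        # A new number is drawn on every evaluation.
        return self.rng.uniform(self.lo, self.hi)


def freeze(rr):
    """Evaluate every index once and keep the results (costs storage)."""
    return {i: rr.value(i) for i in range(rr.start, rr.start + rr.count)}


rr = RandomRange(start=0, count=4, lo=-1.0, hi=1.0, rng=random.Random(0))
static = freeze(rr)  # one stored value per index
```

Reading `static[1]` repeatedly returns the same number, while calling `rr.value(1)` repeatedly draws a new one each time.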
URL Range
A URL range is one obtained by downloading a file. If the file is JSON, it needs to be an array, and each entry in the array needs to look like the data morphism’s codomain. There’s no validation that the data conforms to the codomain until you start evaluating the data. A URL range has a start index, and the count is determined from the length of the downloaded array. When you enter the URL and create the range, Catecon attempts to download the data and attach it to the range. When your diagram is saved, the URL data is removed. When your diagram is loaded, the data is downloaded again. In this manner the downloaded data is never stored in your diagram when it is saved or uploaded.
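The life cycle of a URL range can be sketched in Python as follows. Everything here is hypothetical: the class `UrlRange`, its methods, and the example URL are illustrations, and the download step is replaced by passing in the file's text directly so the sketch stays self-contained.

```python
import json


class UrlRange:
    # Hypothetical model of a URL range; not Catecon's implementation.
    def __init__(self, url, start):
        self.url = url
        self.start = start
        self.data = None  # filled in once the file has been fetched

    def attach(self, text):
        """Attach downloaded JSON text; it must be an array."""
        parsed = json.loads(text)
        if not isinstance(parsed, list):
            raise ValueError("a URL range expects a JSON array")
        self.data = parsed  # entries are not checked against the codomain

    @property
    def count(self):
        # The count comes from the length of the downloaded array.
        return 0 if self.data is None else len(self.data)

    def value(self, index):
        if self.data is None or not (self.start <= index < self.start + self.count):
            return None
        return self.data[index - self.start]

    def to_saved(self):
        # Only the URL and start index are kept; the data is dropped on save.
        return {"url": self.url, "start": self.start}


r = UrlRange("https://example.com/points.json", start=100)  # placeholder URL
r.attach("[[1.5, 2.5], [3.0, 4.0]]")
```

Note that `attach` accepts any array: as in the description above, a mismatch with the codomain would only surface when the values are used.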
CSV Files
Much data is saved as comma-separated values (.csv), though the separators are often tabs instead. For example, the GAIA data set is distributed as .csv.gz files.
The first ten columns of the first data row in the first file are:
solution_id | source_id | random_index | ref_epoch | ra | ra_error | dec | dec_error | parallax | parallax_error |
---|---|---|---|---|---|---|---|---|---|
1635378410781930000 | 65408 | 973786105 | 2015 | 44.99615 | 14.37993 | 0.005616 | 6.517028 | | |
This then appears to be a data morphism from ℕ to ℕ×ℕ×ℕ×ℤ×𝔽×𝔽×𝔽×𝔽×𝔽×𝔽 where the ℤ is for the reference epoch (watch out for year zero). Some columns have no data. These generate nulls which could lead to unexpected behavior depending on what your diagram is expecting.
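A minimal Python sketch of how one such row might be turned into a tuple matching that codomain, with empty cells becoming nulls. The names `COLUMN_TYPES` and `parse_row` are hypothetical. The sample line reuses the values from the table above and assumes that the last two columns are the empty ones.

```python
import csv
import io

# One converter per column: three naturals, an integer epoch, six floats.
COLUMN_TYPES = [int, int, int, int, float, float, float, float, float, float]


def parse_row(cells):
    """Convert the cells of one CSV row; empty cells become None."""
    return tuple(
        None if cell.strip() == "" else convert(cell)
        for convert, cell in zip(COLUMN_TYPES, cells)
    )


# Sample line built from the table above; the two trailing cells are empty.
sample = "1635378410781930000,65408,973786105,2015,44.99615,14.37993,0.005616,6.517028,,\n"
rows = [parse_row(cells) for cells in csv.reader(io.StringIO(sample))]
row = rows[0]
```

For tab-separated files, `csv.reader` accepts `delimiter="\t"`. Any consumer of `row` has to be prepared for `None` in the columns that had no data.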