New Celestia unified runtime database

Post #21 by **Lafuente_Astronomy** » 17.04.2019, 07:22

Well, that seems feasible, but it also generalizes some of the objects in Celestia by dividing them into those that can do fusion and those that cannot. Some objects may lie between those lines and have characteristics of both, which would cause issues that must be dealt with.

Still, I'm not really knowledgeable about those things, so you may want to take your catalog suggestions to the devs. They may have a better reply than mine, or further ways to improve the catalogs of objects, especially the universal catalog.

Post #22 by **pirogronian** » 17.04.2019, 19:25

Thank you for the comments.
Now another one from me :wink:

From the point of view of the core Celestia simulation, the ID is for cross-indexing only. All the program has to assume is that it is unique per object; it does not matter how it was assigned. The whole problem lies in coordinating the cataloging. That is why I suggested an assignment procedure based on the "origin catalog" rather than on the parent astronomical object: I imagine people would rather work with data from a particular catalog than with objects from a particular astronomical region (though I may be wrong).

So I believe that we are in fact talking about a work coordination model.
There are at least two possible situations:
1) cataloging is done by the core Celestia team and coordinated via the official forum, GitHub, etc.
2) cataloging is distributed among various people around the world who can't work closely together, but who all know the official Celestia webpage and repo.
I think it's the classical division into the core data distribution and various addons.
After some consideration I think the second case is much less probable, because cataloging is a job for an algorithm, while addons tend to bring extra data for individual objects or groups. So we may just sketch a workflow algorithm for the cataloging process. I propose something like:

Code:

get input catalog X
get pool of free IDs
for object in X do:
    if object exists in Celestia cross-index
        update object data
        update cross-indexes, if provided
    else
        set object new ID from given pool
        set object data
        set object cross-indexes, if provided
    endif
endfor

The main problem with this algorithm is the allocation of free IDs. Because the algorithm works catalog by catalog, I suggest allocating IDs per catalog. If catalogs are processed sequentially, the IDs can even be handed out linearly, one after another.

There is yet another problem: the format of the database. I think we can just provide several files, which are loaded on startup according to the config file or on user demand. The loading order could be left to the user or indicated by the file names (for example, alphabetical).
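The workflow above, together with per-catalog ID pools, could be sketched in Python roughly as follows. This is only an illustration under assumptions: the `CrossIndex` class, the `import_catalog` function, the entry layout and the two ID ranges are hypothetical names and values, not anything from the Celestia code base.

```python
from itertools import count


class CrossIndex:
    """Maps (catalog name, catalog number) pairs to internal object IDs."""

    def __init__(self):
        self.by_catalog = {}  # catalog name -> {catalog number: internal ID}
        self.objects = {}     # internal ID -> object data (a plain dict here)

    def find(self, designations):
        """Return the internal ID already known under any designation, else None."""
        for catalog, number in designations.items():
            found = self.by_catalog.get(catalog, {}).get(number)
            if found is not None:
                return found
        return None

    def link(self, internal_id, designations):
        """Record every designation of the object in the cross-index."""
        for catalog, number in designations.items():
            self.by_catalog.setdefault(catalog, {})[number] = internal_id


def import_catalog(index, entries, id_pool):
    """Merge one catalog; id_pool is an iterator over IDs reserved for it."""
    for entry in entries:
        internal_id = index.find(entry["designations"])
        if internal_id is None:
            internal_id = next(id_pool)  # new object: take a free ID
            index.objects[internal_id] = {}
        index.objects[internal_id].update(entry["data"])
        index.link(internal_id, entry["designations"])


index = CrossIndex()
hip_pool = count(1)          # hypothetical ID range for the first catalog
hd_pool = count(1_000_000)   # hypothetical ID range for the second catalog

import_catalog(index, [{"designations": {"HIP": 5542},
                        "data": {"seen_in_hip": True}}], hip_pool)
import_catalog(index, [{"designations": {"HD": 6961, "HIP": 5542},
                        "data": {"seen_in_hd": True}}], hd_pool)
```

After both imports the index holds a single object, because the second catalog entry carries an HIP cross-reference that is already known, so no ID is taken from the second pool.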

Post #23 by **Lafuente_Astronomy** » 18.04.2019, 06:58

Ahhh. Hopefully that would work. What does onetwothree have to say about that?

Post #24 by **pirogronian** » 18.04.2019, 09:38

I don't know yet. Hopefully @onetwothree will post his comment here soon :smile:

Post #25 by **Lafuente_Astronomy** » 18.04.2019, 09:41

Alright. I'll wait for it.

Post #26 by **onetwothree** » 18.04.2019, 21:15

Janus wrote:When I was looking at doing that, what I came up with is first having an internal index number that has nothing to do with any external DB at all.
What I mean is to have a separate DB for each catalog {HIP, HD, Gaia, whatever else}.

Pretty obvious solution. Basically that's what we have now, but internal index (II) == HIP number if II < 10^6 or II == Tycho if II > 10^9, so only HIP & Tycho are first-class catalogues. If we want to support more catalogues, II (or Universal Object Identifier, as pirogronian calls it) should be abstracted, i.e. be a mere pointer to star objects in memory.

With your in-memory db I see a problem: huge amounts of memory will be unused because catalogues are different and only small parts overlap, so if we have 2000000 stars from Gaia then 90% of the HIP entries will be empty. So I think we should have a smaller structure with the index and basic data, plus additional structures containing catalogue-specific data. (But this approach may actually require more memory! Testing is required to decide which one is better.)

Another problem is that stars ordered one way in catalogue A are ordered differently in catalogue B, so to convert an arbitrary star name into II we should keep additional memory structures for fast (O(log2(n))) reverse translation. On the other hand, these reverse translation structures will require additional memory, so a slower linear (O(n)) search might be a better solution.
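To make the trade-off concrete, here is a small Python sketch of both lookups. The catalogue numbers and the names `linear_lookup` and `binary_lookup` are made up for illustration; the real data layout in Celestia may look quite different.

```python
from bisect import bisect_left

# Hypothetical forward table: the position in the list is the internal
# index (II), the value is that star's number in some catalogue B.
catalogue_b_numbers = [907, 12, 5542, 88, 340]


def linear_lookup(number):
    """O(n): scan the forward table, no extra memory needed."""
    for ii, value in enumerate(catalogue_b_numbers):
        if value == number:
            return ii
    return None


# Extra structure for the fast path: (catalogue number, II) pairs
# sorted by catalogue number, plus the bare keys for bisect.
reverse_table = sorted((value, ii) for ii, value in enumerate(catalogue_b_numbers))
reverse_keys = [value for value, _ in reverse_table]


def binary_lookup(number):
    """O(log n): binary search over the sorted reverse table."""
    pos = bisect_left(reverse_keys, number)
    if pos < len(reverse_keys) and reverse_keys[pos] == number:
        return reverse_table[pos][1]
    return None
```

Both functions return the same II for a known number; the binary version pays for its speed with a second table of the same length as the first.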

Added after 12 minutes 48 seconds:

pirogronian wrote:
Code:
if object exists in Celestia cross-index

If you know that an object is in the cross-index, then it means that it's already loaded.

I started to write a proposal for how to fix this problem, but I have found an alternative solution. I need to think.

Post #27 by **pirogronian** » 18.04.2019, 21:28

@onetwothree
Isn't the cross-index array already an O(1) solution?

onetwothree wrote:If you know that an object is in cross-index then it means that it's already loaded

Yes, but we have to at least add it to the cross-index for the catalog being processed, and we may want to update some of its data if that catalog has more accurate values.

Post #28 by **onetwothree** » 18.04.2019, 21:29

pirogronian wrote:Isn't the cross-index array already an O(1) solution?

Correct. But only if we can programmatically convert catalogue names into a set of successive numbers. That's true, for example, for HIP.
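As a rough Python illustration of that O(1) case: when catalogue numbers are small successive integers, the number itself can serve as the array position. The table size, the sample pairs and the function names below are assumptions made for the sketch.

```python
NOT_PRESENT = -1


def build_dense_index(pairs, max_number):
    """Build a table where table[catalogue number] == internal index (II)."""
    table = [NOT_PRESENT] * (max_number + 1)
    for number, ii in pairs:
        table[number] = ii
    return table


# Hypothetical sample: two HIP numbers mapped to two internal indexes.
hip_to_ii = build_dense_index([(5542, 1000), (3, 7)], max_number=6000)


def lookup_hip(number):
    """O(1): one bounds check and one array access."""
    if 0 <= number < len(hip_to_ii) and hip_to_ii[number] != NOT_PRESENT:
        return hip_to_ii[number]
    return None
```

The table needs one slot for every number up to `max_number`, whether or not a star with that number is loaded, which is the unused-memory concern raised earlier in the thread.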

Post #29 by **Lafuente_Astronomy** » 18.04.2019, 21:35

onetwothree wrote:Another problem is that stars ordered one way in catalogue A are ordered differently in catalogue B, so to convert an arbitrary star name into II we should keep additional memory structures for fast (O(log2(n))) reverse translation. On the other hand, these reverse translation structures will require additional memory, so a slower linear (O(n)) search might be a better solution.

Perhaps the Universal Catalog could solve some of those problems. But if I remember correctly, it can create new ones as well. For your testing, why don't you give the stars a universal object identifier like the Celestia Catalog, then put them in regardless of which catalogs they belong to? Sure, it may increase memory use in the long run, but provided Celestia's graphics handling is improved so that it does not lag, that wouldn't be a problem at all. I don't mind Celestia becoming a few GB bigger. I mean, that's what Celestia Origin basically is, right?

Post #30 by **Janus** » 19.04.2019, 01:08

I believe I was unclear on something.
Put each catalog into its own DB in memory.
There is no reason at all that any two catalogs have to be the same size.
That is what the indexes are for, this also means you can have different numbers of stars available.

HIP::Star->Name[SOL] could be us
HD::Star->Name[SOL] could also be us
Gaia::Star->Name[SOL] could still be us
Celestia::Star->Name[SOL] could then point to any of them.

Searches are not an issue since cross catalog indexes are created ahead of time.
Somewhere there is a list of stars and all their names.
For instance, Theta Cassiopeiae is also Marfak, ? Ret, 33 Cassiopeiae, BD+54° 236, HD 6961, HIP 5542, HR 343, SAO 22070.
Start with a Celestia master index, an example here using 1000 at random.

CMI[1000]->HIP = 5542
CMI[1000]->HD = 6961
CMI[1000]->SAO = 22070

You then create indexes of CMI based on each catalog.

CMIDEX->HIP[5542] = 1000;
CMIDEX->HD[6961] = 1000;
CMIDEX->SAO[22070] = 1000;
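A minimal Python sketch of the same idea, using the numbers from the example above. The dictionary layout and the `build_cmidex` and `translate` helpers are illustrative assumptions, not an existing Celestia interface.

```python
# Master index: CMI number -> designation in each catalog that lists the star.
cmi = {
    1000: {"HIP": 5542, "HD": 6961, "SAO": 22070},
}


def build_cmidex(master):
    """Invert the master index: catalog name -> {catalog number: CMI number}."""
    cmidex = {}
    for cmi_number, designations in master.items():
        for catalog, number in designations.items():
            cmidex.setdefault(catalog, {})[number] = cmi_number
    return cmidex


cmidex = build_cmidex(cmi)


def translate(catalog_from, number, catalog_to):
    """Go from one catalog's number to another's via the master index."""
    cmi_number = cmidex.get(catalog_from, {}).get(number)
    if cmi_number is None:
        return None
    return cmi[cmi_number].get(catalog_to)
```

With these tables, `translate("SAO", 22070, "HIP")` goes through CMI 1000 and yields 5542, and an unknown number simply yields `None`.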

If you want to change which catalog information is displayed for a star, easy.
Pick a keyboard key that is not already used.
Each time it is hit, you change the catalog info displayed.
After cycling through all active catalogs, have it ask which you want to see the same way you search now by hitting enter.

The same basic idea can then be applied to bodies so there can be multiple catalogs of asteroids, comets, whatever else.

The hard work is all done outside celestia.
It is a display system, let it stick to displaying.

One byproduct of this is having a variable number of stars.
With proper indexing, you could show cataloged stars by decade or century of documentation.
Just musing.

Janus.

Post #31 by **Lafuente_Astronomy** » 19.04.2019, 03:16

Janus wrote:I believe I was unclear on something.

That actually is a good idea. In the Celestia Catalog, the ID of a single star must be constant regardless of its membership in other catalogs. For reference, we'll identify SOL as Celestia Catalog Number 1, or, if we had a subdivision of the Celestia Catalog for stars in the Milky Way, C1 1, with the first 1 representing galaxy #1 in the Celestia Catalog. That number obviously belongs to our home galaxy, the Milky Way.

Perhaps this can work, and hopefully the devs will agree with it or, if not, find a better alternative.

Added after 5 minutes 17 seconds:
Speaking of C1, it would also be great to have a Celestia Catalog for galaxies and other star-containing bodies outside of galaxies. I know there are catalogs for galaxies and similar objects, like the SDSS Catalog and others, so there should be universal object identifiers for those galaxies and other intergalactic objects as well. I suggest that if we ever make that catalog, it should be based on distance from us, regardless of position in the sky. So C1, the very first identification in the Celestia Catalog for Galaxies, should be none other than the Milky Way, then number 2 would be the galaxy nearest to the Milky Way, and so on. I do think that can be feasible unless Celestia's code dictates otherwise. However, I do believe that it can be changed, because improvements can happen.

Post #32 by **pirogronian** » 19.04.2019, 06:39

The comment by @Janus reflects the actual mechanism of cross-indexes, both in the legacy code and in my branch. Moreover, I believe it can be extended to non-stellar objects as well, because the cross-index has no need to know what type of object it is indexing. Grouping by distance can be done at runtime or with the already implemented User Categories. Objects in a particular galaxy can easily be grouped as orbiting its galactic barycenter, or by adding them to a galactic equivalent of a "planetary system".
Then, of course, option to change attributes by group is very welcome.

Post #33 by **Lafuente_Astronomy** » 19.04.2019, 07:16

pirogronian wrote:Objects in a particular galaxy can easily be grouped as orbiting its galactic barycenter, or by adding them to a galactic equivalent of a "planetary system".

If galaxies have barycenters (though I know they move, I'm not sure whether they move in an orbit at all), then we can name those barycenters after the groups they're in, e.g. Local Group Galactic Barycenter, Maffei 1 Galactic Barycenter, etc. Within a galaxy, we could create one galactic barycenter for all objects in the galaxy to orbit and another for globular clusters, since they orbit the galaxy differently from the majority of its objects, though a single barycenter for all of them may be enough. And depending on the information available on the galaxy's mass distribution or central black hole, we can place those barycenters at the galactic centers, or at least at the greatest concentration of mass in those galaxies and galactic systems.

pirogronian wrote:Moreover, I believe it can be extended to non-stellar objects as well, because the cross-index has no need to know what type of object it is indexing.

Well, I think you can create a Celestia Catalog for non-stellar and non-galactic objects, which would cover all nebulae, interstellar clouds, clusters and the like. Which object gets the first number is entirely up to you, as there are several such objects close to Earth and we may not know where to start.

Post #34 by **onetwothree** » 19.04.2019, 09:38

Spoiler: pirogronian wrote:
Code:

get input catalog X
get pool of free IDs
for object in X do:
    if object exists in Celestia cross-index
        update object data
        update cross-indexes, if provided
    else
        set object new ID from given pool
        set object data
        set object cross-indexes, if provided
    endif
endfor

That's basically correct, but I'd prefer to have this done not at runtime but at build time, so end users just have a stars.dat ready to use.

Post #35 by **Lafuente_Astronomy** » 19.04.2019, 10:56

onetwothree wrote:That's basically correct, but I'd prefer to have this done not at runtime but at build time, so end users just have a stars.dat ready to use.

That's actually the preferable option, as it means Celestia would not have to do this work while running and can focus on other things. A pre-prepared data pack would really be much more useful.

But then, that raises a question, for me at least: does that mean you developers have to write all the catalog identities of the stars into that single stars.dat file, or can they be distributed among several files, for example one stars.dat for HIP, one for Gaia, one for Tycho, etc.?

Post #36 by **onetwothree** » 19.04.2019, 11:59

I'd prefer a single file.

Post #37 by **Lafuente_Astronomy** » 19.04.2019, 13:13

Alright. That would work. Perhaps the same can be done for the .dat files of all other objects in Celestia, especially galaxies and other non-stellar and non-galactic objects.

Post #38 by **Janus** » 19.04.2019, 14:11

The easiest way I see to handle the indexes is a csv file, or if you like suffering and misery, an xml,
which is then turned into a binary cross index file.

You start with an external file that gives numbers to all the different catalogs.
Most likely a simple text file.
It does a simple mapping of catalogs to numbers.

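Code:

// Catalog numbers; entries without a value continue counting from the previous one.
enum Catalogs
   {
   CMI = 0,
   HIP = 1,
   HD,
   Tycho,
   Gaia,
   Asteroids1,
   Asteroids2,
   Comets1,
   Comets2,
   DSO1,
   DSO2,
   whateverelse
   };

// Kind of object a cross reference describes.
enum crosstype
   {
   Star = 1,
   Planet = 2,
   Asteroid,
   Comet,
   whatever
   };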

Then you have the csv file.
I use csv because spreadsheets handle it very well regardless of the OS, and tools to read it are simple regardless of the language.
The columns are catalog1, entry1, catalog2, entry2, notes.
Follow each catalog column with a catalog# column if you want to add a sanity check to parsing.
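Reading such a file could look roughly like this in Python. The `CATALOG_IDS` table mirrors the first few values of the enum above, while the function name, the error handling and the two sample rows are assumptions made for the sketch.

```python
import csv
import io

# Catalog name -> number, following the first entries of the enum above.
CATALOG_IDS = {"CMI": 0, "HIP": 1, "HD": 2, "Tycho": 3, "Gaia": 4}


def read_cross_references(text_file):
    """Yield (catalog1 id, entry1, catalog2 id, entry2, notes) tuples."""
    reader = csv.reader(text_file)
    for line_number, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 5:
            raise ValueError(f"line {line_number}: expected 5 columns, got {len(row)}")
        catalog1, entry1, catalog2, entry2, notes = (cell.strip() for cell in row)
        if catalog1 not in CATALOG_IDS or catalog2 not in CATALOG_IDS:
            raise ValueError(f"line {line_number}: unknown catalog name")
        yield (CATALOG_IDS[catalog1], int(entry1),
               CATALOG_IDS[catalog2], int(entry2), notes)


# Two sample rows built from the Theta Cassiopeiae example in this thread.
sample = io.StringIO("CMI,1000,HIP,5542,Theta Cassiopeiae\nHIP,5542,HD,6961,\n")
rows = list(read_cross_references(sample))
```

Each parsed row already carries the numeric catalog IDs, so a later step can write it straight into a binary record.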

The tool will then create a CATCROSS file.
The byte after the header is the block size.
{The total file size is a multiple of this number.}
{By the time the block size is greater than $FF, this will be rewritten.}
The next byte is the version.
Perhaps a uint64_t number of entries in this file.
Then $00 bytes {filler/align??} until new uses are found for them.

The actual cross references are as follows.

Code:

byte    : crosstype;
uint16 : catalog1;
uint64 : catalog1entry;
uint16 : catalog2;
uint64 : catalog2entry;
byte    : checkbyte;

This particular one gives 22 byte blocks.
Crosstype is a check that the catalogs match.
checkbyte is chosen so that all the bytes in the block sum to zero.
To verify it, read the block into a byte array, then add up all the bytes in it.
The sum & $FF should be zero if the entry is valid.
This provides a simple check of both the entry type and the values.
It also provides a uniform read block size for embedded systems {Android} where that will make a difference.
The read block size can be padded out from 22 to 32 if the underlying IO of the intended targets would benefit.
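The 22-byte block and its zero-sum check can be tried out with Python's `struct` module. Little-endian byte order and the function names are assumptions here; the post does not specify either.

```python
import struct

# crosstype (1 byte), catalog1 (2), catalog1entry (8), catalog2 (2), catalog2entry (8).
# "<" selects little-endian with no padding; the byte order is an assumption.
BODY = struct.Struct("<BHQHQ")
RECORD_SIZE = BODY.size + 1  # 21 bytes of fields plus the check byte


def pack_record(crosstype, catalog1, entry1, catalog2, entry2):
    """Pack one cross reference and append the zero-sum check byte."""
    body = BODY.pack(crosstype, catalog1, entry1, catalog2, entry2)
    checkbyte = (-sum(body)) & 0xFF  # makes the byte sum of the block 0 mod 256
    return body + bytes([checkbyte])


def unpack_record(block):
    """Validate the block size and check byte, then return the five fields."""
    if len(block) != RECORD_SIZE:
        raise ValueError("wrong block size")
    if sum(block) & 0xFF != 0:
        raise ValueError("check byte mismatch")
    return BODY.unpack(block[:BODY.size])


# Star (1), HIP (1) entry 5542 cross-referenced to HD (2) entry 6961.
record = pack_record(1, 1, 5542, 2, 6961)
```

Changing any single byte of `record` makes the sum non-zero, so `unpack_record` rejects the block.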

This puts all the cross references in one place.
In a human readable format if needed.

Just some thoughts.

Janus.

Post #39 by **Lafuente_Astronomy** » 19.04.2019, 21:25

That looks interesting. I'm not a programmer, and my only experience is making very small changes to my asterisms.dat and starnames.dat files, but the program you mentioned could be a great way to uniformly organize all objects into a single catalog while retaining their identities in the multiple catalogs they are members of.

That being said, is it also possible to make entirely new catalogs as subdivisions of the universal catalog? Say, for example, we divide the Celestia Catalog into subdivisions based on galaxies and identify those subdivisions by number, so that Celestia Catalog 1 refers to all the stars in one galaxy, Celestia Catalog 2 refers to all the stars in another galaxy, and so on. Is that possible or not?

Post #40 by **onetwothree** » 19.04.2019, 21:53

Everything is possible.

But such a change requires much more work.