Catalogues and consistency

Post #1 by **ajtribick** » 01.04.2008, 15:51

I've been involved in a discussion with Fridger about the binary star catalogues and various inconsistencies that appear to have sprung up. It seems there is no coherent policy on catalogue references at present, and it was suggested that Celestia should move to be consistent with SIMBAD.

----

One place where inconsistency occurs is the Struve catalogue. Grant Hutchison's hand-edited files render these as "Struve ###". Fridger uses "SIG ###", which Celestia renders as "σ ###". This is obviously inconsistent; furthermore, when Struve designations are written with the sigma, the convention appears to be to use an uppercase sigma, i.e. "Σ ###".

SIMBAD does not use (or accept) any of these formats. SIMBAD displays results from this catalogue as "** STF ###", with the note that the "**" is only used in SIMBAD and should be dropped. So SIMBAD appears to be suggesting these stars should be labelled as "STF ###".

----

Moving on to the Gliese catalogue and its various extensions: different number ranges use different names. Numbers below 1000 are from the second edition of the Catalogue of Nearby Stars (Gliese, 1969), numbers in the range 1000-3000 are from Nearby Star Data Published 1969-1978 (Gliese and Jahreiß, 1979), numbers in the range 3000-4000 are from the Preliminary Version of the Third Catalogue of Nearby Stars (Gliese and Jahreiß, 1991), and numbers of 9000 and above are from the Extension of the Gliese Catalogue (Woolley, Epps, Penton and Pocock, 1970). This has led to a variety of acronyms: Gl for the <1000 range, GJ for the 1000-3000 range, NN for the 3000-4000 range and Wo for the 9000+ range.

SIMBAD renders all references as "GJ ###" (this works because none of the numbers conflict); however, the note lists the alternative forms. It also states that NN and Wo should be avoided and GJ used instead. At present the datafiles use "Gliese ###" for stars below 1000 and "GJ ###" for stars above 1000, which is inconsistent with SIMBAD's display but not contrary to the recommendations listed there.
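The number ranges above can be captured in a small lookup table. The following Python sketch is purely illustrative: the function and class names are hypothetical, the range boundaries are assumed to be half-open (1000 ≤ n < 3000 and so on), and numbers from 4000 to 8999 are deliberately left unassigned because the summary does not cover them.

```python
from typing import NamedTuple, Optional


class GlieseRange(NamedTuple):
    acronym: str    # traditional acronym for this number range
    catalogue: str  # source publication, as summarised above


# Ranges are assumed half-open: low <= n < high (high=None means open-ended).
# Numbers 4000-8999 are not covered by the summary and return None.
_GLIESE_RANGES = [
    (1, 1000, GlieseRange("Gl", "Catalogue of Nearby Stars, 2nd ed. (Gliese, 1969)")),
    (1000, 3000, GlieseRange("GJ", "Nearby Star Data Published 1969-1978 (Gliese and Jahreiß, 1979)")),
    (3000, 4000, GlieseRange("NN", "Third Catalogue of Nearby Stars, prelim. (Gliese and Jahreiß, 1991)")),
    (9000, None, GlieseRange("Wo", "Extension of the Gliese Catalogue (Woolley et al., 1970)")),
]


def gliese_range(number: int) -> Optional[GlieseRange]:
    """Look up the traditional acronym and source catalogue for a Gliese number."""
    for low, high, info in _GLIESE_RANGES:
        if number >= low and (high is None or number < high):
            return info
    return None


def simbad_label(number: int) -> str:
    """SIMBAD-style display: one GJ prefix for every range."""
    return f"GJ {number}"
```

For example, `gliese_range(3618).acronym` is "NN", while `simbad_label(3618)` gives "GJ 3618": the traditional acronym stays recoverable even though only one prefix is displayed.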

----

A further question is whether we want to support (or display) the distinction between HD and HDE: as the note at SIMBAD explains, stars numbered 1-225300 are "HD ###", while stars numbered 225301-359083 are "HDE ###". SIMBAD uses and accepts either form interchangeably, but only displays the "HD" prefix.
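A minimal sketch of the HD/HDE rule, assuming designations of the form "HD 12345" or "HDE 12345" with no component suffix; the helper names are hypothetical and this is not Celestia's parser. Input is accepted with either prefix, display always uses "HD", and the traditional prefix can still be recovered from the number.

```python
import re

HD_MAX = 225300   # last number of the original Henry Draper Catalogue
HDE_MAX = 359083  # last number of the Henry Draper Extension

_HD_PATTERN = re.compile(r"^\s*HDE?\s*(\d+)\s*$", re.IGNORECASE)


def parse_hd(designation: str) -> int:
    """Accept 'HD n' or 'HDE n' interchangeably and return the number."""
    match = _HD_PATTERN.match(designation)
    if match is None:
        raise ValueError(f"not an HD/HDE designation: {designation!r}")
    number = int(match.group(1))
    if not 1 <= number <= HDE_MAX:
        raise ValueError(f"HD number out of range: {number}")
    return number


def traditional_prefix(number: int) -> str:
    """HD for 1-225300, HDE for 225301-359083."""
    return "HD" if number <= HD_MAX else "HDE"


def display_hd(number: int) -> str:
    """SIMBAD-style display: always the HD prefix."""
    return f"HD {number}"
```

Being lenient on input here means a mismatched prefix such as "HD 300000" is still accepted, mirroring SIMBAD's interchangeable use of the two forms.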

----

From a parser point of view, it would be good to be able to handle the various alternative forms without requiring the user to enter the right abbreviation for the right range, but from the display point of view some consistent plan would be good. Any thoughts?

Post #2 by **t00fri** » 01.04.2008, 17:39

Good English summary of what I have previously pointed out on various occasions ;-)

One reason why I have recently been reluctant to do further work on binary star orbits is precisely that, with the present conflicting definitions, this work has become trapped in one way or another.

We should really set out clear database and naming specs in order to make sensible progress!

When expanding the list of visual and spectroscopic orbits, we will have more name collisions in the future, since Grant's criteria for including stars in his database are intrinsically different from mine and basically incompatible. Moreover, his nearstars file is a (personal?) hand-edited collection from several catalogs, as far as I can tell. My two binary orbit files are based entirely on two distinct physical measuring techniques (namely visual and spectroscopic multiple stars), and can be arbitrarily expanded without changing the underlying criteria.
So far, my orbit data are based on two renowned, refereed scientific publications that everyone can check out! This is a completely transparent, scientifically valid and entirely documented situation.

I think Grant's list (stars within 20 ly) could easily be selected by a celx script or via Qt4 tools from a more general, physically motivated database.

In any case, I am not ready to cross out more of my visualbins or spectbins entries merely because there is another collision with Grant's nearstars. I have crossed out plenty already, just to avoid confrontation...
But this is not a good scientific policy at all. For outside (educated) users, there is hardly any scientific reason why I have left out more than a dozen entries of these publications...

Now we have this issue with Grant's "Struve" designation, which is not an identifier in Simbad, for example. I chose the nomenclature that the author of the spectroscopic binary paper (Pourbaix) chose.
Next comes Grant's 'Gliese' naming, which is not recognized by Simbad either... Do you really want me to rename any Gliese entries this way?

++++++++++++++++++++++++
In view of this nightmare, I strongly advocate agreeing on a Simbad-conformant naming syntax in the future!
++++++++++++++++++++++++

F.

Post #3 by **t00fri** » 01.04.2008, 17:52

Along the same lines, and as emerged in PMs with Andrew, even Chris' latest syntax patch for binaries is far from perfect.

Once I can start from clear-cut physical definitions of which objects, with which naming patterns, should appear in the various databases, I would simply read the 'starnames.dat' data into my .stc-generating PERL scripts and add all the alternative names listed there for the barycenter. That is an almost trivial exercise.

Anyway, right now there are still various "anomalies" in Chris' latest code.

In general, there are cases where components A and B have different HIP numbers. That case does not seem to be accounted for in the new syntax.

Moreover, I find it a bit odd that the barycenter in HIP notation, HIP xxxx, is NOT offered in the object preview below the command line! Only HIP xxxx A and HIP xxxx B. If the star has a special name instead, e.g. 13 Cet, then all three are offered:

13 Cet
13 Cet A
13 Cet B

Also, following Chris' newly advocated notation, "HIP 677 = ALF And" does not have any HIP 677 designations offered below the command line, neither HIP 677 A nor anything similar. No idea why this happens.

That should NOT happen...

I have generated new visualbins.stc and specbins.stc, using the notation recently advocated by Chris, for Andrew to play with. If more people want to test these out more thoroughly, just let me know.

F.

Post #4 by **ajtribick** » 01.04.2008, 18:11

t00fri wrote:
> When expanding the list of visual and spectroscopic orbits, we will have more name collisions in the future, since Grant's criteria for including stars in his database are intrinsically different from mine and basically incompatible. Moreover, his nearstars file is a (personal?) hand-edited collection from several catalogs, as far as I can tell. My two binary orbit files are based entirely on two distinct physical measuring techniques (namely visual and spectroscopic multiple stars), and can be arbitrarily expanded without changing the underlying criteria.
> So far, my orbit data are based on two renowned, refereed scientific publications that everyone can check out! This is a completely transparent, scientifically valid and entirely documented situation.

As for the hand-edited versus catalogue debate, it's unfortunate that the present parser forces us into a situation where the two approaches necessarily cause conflicts: both approaches have advantages and disadvantages. At present there is no clean way to replace a multiple star system, and there is no way for Celestia to select different solutions.

Take the case of Alpha Aurigae (Capella), from this thread. Pourbaix's catalogue gives one set of elements which omits certain parameters, while the study by Hummel gives a different view. Unfortunately there is no clean way to override one set of orbital elements or switch between the two. Some way of associating orbital (or other) parameters with their source would be good.

t00fri wrote:
> Moreover, I find it a bit odd that the barycenter in HIP notation, HIP xxxx, is NOT offered in the object preview below the command line! Only HIP xxxx A and HIP xxxx B. If the star has a special name instead, e.g. 13 Cet, then all three are offered:
>
> 13 Cet
> 13 Cet A
> 13 Cet B
>
> Also, following Chris' newly advocated notation, HIP 677 = ALF And does not have any HIP 677 designations offered below the command line, neither HIP 677 A nor anything similar. No idea why this happens.
>
> That should NOT happen...

What is needed here is some way of telling Celestia "this is the A/B/Aa/etc. component of the system" and applying the names and catalogue numbers automatically. Such a system would have to be robust enough to deal with cases such as Alpha Centauri, where the two components have different HIP numbers, or YY Geminorum, where YY Gem A=Castor Ca, YY Gem B=Castor Cb.
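One possible shape for such a system, sketched in Python under clearly stated assumptions: the `System` and `Component` classes are hypothetical and not part of Celestia, and the derived-name rule (every system name plus the component label, then the component's own catalogue numbers, then any independent aliases) is only one option. The HIP numbers used for Alpha Centauri A and B are 71683 and 71681.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    label: str  # component letter(s), e.g. "A", "B", "Ca"
    catalogue_ids: Dict[str, int] = field(default_factory=dict)  # this component's own numbers
    aliases: List[str] = field(default_factory=list)  # names it carries elsewhere


@dataclass
class System:
    names: List[str]  # names of the system as a whole
    components: List[Component]

    def component_names(self, comp: Component) -> List[str]:
        # 1. Every system name combined with the component label.
        names = [f"{name} {comp.label}" for name in self.names]
        # 2. The component's own catalogue numbers, which need not match
        #    those of its companions (as for Alpha Centauri A and B).
        names += [f"{cat} {num}" for cat, num in sorted(comp.catalogue_ids.items())]
        # 3. Independent names, e.g. YY Gem A for Castor Ca.
        names += comp.aliases
        return names
```

With this layout, the names for Alpha Centauri A come out as "ALF Cen A" and "HIP 71683", while Castor Ca additionally answers to "YY Gem A", without either catalogue number or alias having to be typed into the system-level name list.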

Post #5 by **t00fri** » 01.04.2008, 18:23

ajtribick wrote:
> ...
> As for the hand-edited versus catalogue debate, it's unfortunate that the present parser forces us into a situation where the two approaches necessarily cause conflicts: both approaches have advantages and disadvantages. At present there is no clean way to replace a multiple star system, and there is no way for Celestia to select different solutions.
>
> Take the case of Alpha Aurigae (Capella), from this thread. Pourbaix's catalogue gives one set of elements which omits certain parameters, while the study by Hummel gives a different view. Unfortunately there is no clean way to override one set of orbital elements or switch between the two. Some way of associating orbital (or other) parameters with their source would be good.

Completely agreed. Grant and I have already discussed various peculiarities/difficulties for the case of multiple systems. No clean solution so far, however.

But anyhow...

this issue (Hummel vs. Pourbaix) inevitably leads us to the fact that in Celestia's visualizations there is so far NO ROOM for displaying measurement uncertainties! Another topic that I have frequently raised without much response... The situation mentioned by Andrew is perfectly normal and easy to handle in an ordinary scientific framework.

It's just hard if visualization is at stake!

In a normal scientific environment, one must sort out only whether certain discrepancies are systematic or statistical in nature. Then it is straightforward how to account for differing measurements within the uncertainties given.
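For the purely statistical case, the standard textbook approach is an inverse-variance weighted mean, together with a check of how many combined standard deviations separate two results (a large value hints at a systematic discrepancy rather than a statistical one). A minimal Python sketch; the function names and the numbers in the example are hypothetical and are not taken from the Hummel or Pourbaix data.

```python
import math
from typing import Sequence, Tuple

Measurement = Tuple[float, float]  # (value, 1-sigma uncertainty)


def weighted_mean(measurements: Sequence[Measurement]) -> Tuple[float, float]:
    """Inverse-variance weighted mean and its 1-sigma uncertainty."""
    weights = [1.0 / sigma**2 for _, sigma in measurements]
    total = sum(weights)
    mean = sum(w * value for w, (value, _) in zip(weights, measurements)) / total
    return mean, math.sqrt(1.0 / total)


def tension(a: Measurement, b: Measurement) -> float:
    """Difference between two measurements in units of their combined sigma."""
    (x1, s1), (x2, s2) = a, b
    return abs(x1 - x2) / math.hypot(s1, s2)
```

Two hypothetical measurements of 10 ± 1 and 12 ± 1 combine to 11 ± 0.71 and differ by about 1.4 combined sigma, which on its own would not suggest a systematic problem.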

F.

Post #6 by **chris** » 01.04.2008, 20:06

ajtribick wrote:
> ...
> From a parser point of view, it would be good to be able to handle the various alternative forms without requiring the user to enter the right abbreviation for the right range, but from the display point of view some consistent plan would be good. Any thoughts?

Conforming to the same conventions used by SIMBAD seems like a good idea to me, too. We should be liberal in what designations we accept and strict about what we display. From your post, it sounds like the most appropriate strings to display are:

Struve: STF
Gliese: GJ
Henry Draper: HD

We could expand the name canonicalization function to translate Struve to STF, Gliese and various acronyms to GJ, and HDE to HD. Seems like this would mostly address the inconsistencies without breaking existing catalog files. The next step would be to modify the existing catalog files to use the standard abbreviations--it wouldn't affect Celestia, but it would make these files slightly more readable.
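A minimal sketch of such a canonicalization step, written in Python rather than Celestia's actual C++ code; the alias table and function name are hypothetical. It maps a recognised prefix followed by a number (and an optional component suffix) to the SIMBAD-style prefix, and leaves everything else untouched. Requiring a number after the prefix is what keeps a Bayer name such as "SIG Ori" from being mistaken for a Struve designation.

```python
import re

# Hypothetical alias table: upper-cased accepted prefix -> SIMBAD-style prefix.
_PREFIX_ALIASES = {
    "STRUVE": "STF", "STF": "STF", "SIG": "STF", "Σ": "STF",
    "GLIESE": "GJ", "GL": "GJ", "GJ": "GJ", "NN": "GJ", "WO": "GJ",
    "HD": "HD", "HDE": "HD",
}

# Prefix, catalogue number (optionally with a decimal part), optional component.
_DESIGNATION = re.compile(r"^\s*(\S+?)\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*$")


def canonicalize(name: str) -> str:
    """Accept many spellings of a catalogue designation, return one display form."""
    match = _DESIGNATION.match(name)
    if match is None:
        return name.strip()  # no catalogue number, e.g. Bayer names like "SIG Ori"
    prefix, number, component = match.groups()
    canonical = _PREFIX_ALIASES.get(prefix.upper())
    if canonical is None:
        return name.strip()  # unknown prefix (HIP, "13 Cet", ...) is left untouched
    label = f"{canonical} {number}"
    return f"{label} {component}" if component else label
```

Because only the display form changes, existing catalog files that still say "Struve" or "Gliese" would keep working, which is the point of being liberal on input.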

--Chris

Post #7 by **t00fri** » 01.04.2008, 20:16

chris wrote:
> ...
> We could expand the name canonicalization function to translate Struve to STF, Gliese and various acronyms to GJ, and HDE to HD. ...

From the PERL point of view it's trivial to change. Let's see what Grant has to say. I could just as easily retrieve all identifiers of a given HIP system from Simbad by batch script and implement the results in my PERL scripts. Once we agree on a procedure, its realization is NO problem whatsoever on my side.

The next, much harder point is how to avoid future clashes if I am to include more visual and spectroscopic systems in my two database files... I also don't like to cross out parts of published data sets because of such incompatibilities...

So before being able to proceed constructively, we MUST talk about a more promising definition of multiple star databases and their syntax!

The other major part of the above posts concerns the mentioned anomalies in your recent patch....

F.

Post #8 by **ajtribick** » 01.04.2008, 20:23

chris wrote:
> ...
> We could expand the name canonicalization function to translate Struve to STF, Gliese and various acronyms to GJ, and HDE to HD. ...

Would it be possible to move some aspects of catalogue names and cross-indices to script files? I'm not particularly sure how suitable Lua would be (perhaps the lack of regexp support might make this more difficult), but it might be worth considering, particularly if it would enable more cross-indices to be supported.

As for the binary stars issue, perhaps it is time to try and figure out a new system for the datafiles/parser to replace the current one, getting rid of such annoying quirks as the different RA conventions (which will no doubt become more of an issue now that the 16 kly limit has been smashed).

Post #9 by **chris** » 01.04.2008, 20:32

t00fri wrote:
> ...
> So before being able to proceed constructively, we MUST talk about a more promising definition of multiple star databases and their syntax!
>
> The other major part of the above posts concerns the mentioned anomalies in your recent patch....

What's necessary is a way to unambiguously identify stars and replace previous definitions so that catalogs can easily be overlaid. It should be possible to change the 'priority' of various catalogs simply by changing their loading order.
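A minimal sketch of load-order priority, assuming each catalog has already been parsed into a mapping from an unambiguous star identifier (here a HIP number) to a record; the function name and record layout are hypothetical, not Celestia's data structures. A later catalog simply replaces any earlier definition of the same star, so reordering the list changes the priority.

```python
from typing import Dict, Iterable, Mapping

StarRecord = Dict[str, object]  # hypothetical: arbitrary per-star fields


def overlay_catalogs(catalogs: Iterable[Mapping[int, StarRecord]]) -> Dict[int, StarRecord]:
    """Merge catalogs in loading order; later definitions replace earlier ones."""
    merged: Dict[int, StarRecord] = {}
    for catalog in catalogs:
        for star_id, record in catalog.items():
            merged[star_id] = dict(record)  # whole-record replacement, not a field merge
    return merged
```

Replacing whole records (rather than merging fields) keeps the outcome predictable: a star's definition always comes from exactly one catalog, the last one loaded that defines it.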

As for tab-completion of star names, nothing has changed: plain catalog numbers are never shown in the list. But something like "HD xxxx B" is not a catalog number: Celestia treats it as just a name string, so it does appear in the completion list. Coming up with a more consistent identification scheme for stars with some sort of explicit component indicator would take care of the problem by preventing HD xxxx B from appearing in the completion list (perhaps not what you had in mind.)

--Chris

Post #10 by **t00fri** » 01.04.2008, 20:36

ajtribick wrote:
> Would it be possible to move some aspects of catalogue names and cross-indices to script files? I'm not particularly sure how suitable Lua would be (perhaps the lack of regexp support might make this more difficult), but it might be worth considering, particularly if it would enable more cross-indices to be supported.
>
> As for the binary stars issue, perhaps it is time to try and figure out a new system for the datafiles/parser to replace the current one, getting rid of such annoying quirks as the different RA conventions (which will no doubt become more of an issue now that the 16 kly limit has been smashed).

But that's a PERFECT job for PERL, which is available on all operating systems and is easy to learn. Why Lua? That's not a database language in the first place. PERL can not only calculate extremely well, it is the "queen of regular expressions" and has innumerable modules for specialized jobs.
Moreover, PERL is almost human-readable and thus fulfils the all-important task of documenting our various data sets.

Certainly, we should unify the syntax in .dsc and .stc files. It's absurd that .dsc wants RA in hours while .stc wants it in degrees ;-). But this is really a simple change.
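The conversion itself is indeed simple: 24 hours of right ascension correspond to 360 degrees, so 1 h = 15°. A minimal Python sketch with hypothetical helper names, following the .dsc (hours) and .stc (degrees) conventions as described in this thread:

```python
def ra_hours_to_degrees(hours: float) -> float:
    """Right ascension in hours (.dsc convention) -> degrees (.stc convention)."""
    if not 0.0 <= hours < 24.0:
        raise ValueError(f"RA out of range: {hours} h")
    return hours * 15.0  # 24 h = 360 degrees


def ra_degrees_to_hours(degrees: float) -> float:
    """Inverse conversion, degrees -> hours."""
    if not 0.0 <= degrees < 360.0:
        raise ValueError(f"RA out of range: {degrees} deg")
    return degrees / 15.0


def hms_to_degrees(h: int, m: int, s: float) -> float:
    """Sexagesimal RA (hours, minutes, seconds) -> decimal degrees."""
    return ra_hours_to_degrees(h + m / 60.0 + s / 3600.0)
```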

F.

Post #11 by **t00fri** » 01.04.2008, 20:43

chris wrote:
> What's necessary is a way to unambiguously identify stars and replace previous definitions so that catalogs can easily be overlaid. It should be possible to change the 'priority' of various catalogs simply by changing their loading order.

Of course, catalog ordering is an important option that I have always exploited.

> As for tab-completion of star names, nothing has changed: plain catalog numbers are never shown in the list. But something like "HD xxxx B" is not a catalog number: Celestia treats it as just a name string, so it does appear in the completion list. Coming up with a more consistent identification scheme for stars with some sort of explicit component indicator would take care of the problem by preventing HD xxxx B from appearing in the completion list (perhaps not what you had in mind.)
>
> --Chris

Having overlaid MANY catalogs via PERL in the past, I found that in all cases the only universal reference one finds for stars everywhere is the HIP number. I used it as the bridging identifier in practically all cases.
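Using the HIP number as the bridging identifier amounts to a join on that key. A minimal Python sketch, assuming each catalogue has been reduced to a mapping from HIP number to that catalogue's designation; the function name is hypothetical, and stars without a HIP number are simply not handled here.

```python
from typing import List, Mapping, Tuple


def cross_match_by_hip(
    left: Mapping[int, str], right: Mapping[int, str]
) -> List[Tuple[int, str, str]]:
    """Join two catalogues on HIP number, keeping only stars present in both."""
    common = left.keys() & right.keys()  # set intersection of the HIP numbers
    return [(hip, left[hip], right[hip]) for hip in sorted(common)]
```

For instance, joining `{71683: "ALF Cen A", 677: "ALF And"}` with `{677: "Alpheratz", 32349: "Sirius"}` yields `[(677, "ALF And", "Alpheratz")]`.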

But the rest of your remark I don't understand, really.

Fact is that in some cases the completion list shows 3 entries per system (barycenter and components A and B), while in others it does not show the barycenter. I could not make out what your comment above says regarding this 'anomaly'. Sorry for my bad English ;-)

F.

Post #12 by **chris** » 01.04.2008, 20:56

ajtribick wrote:
> Would it be possible to move some aspects of catalogue names and cross-indices to script files? I'm not particularly sure how suitable Lua would be (perhaps the lack of regexp support might make this more difficult), but it might be worth considering, particularly if it would enable more cross-indices to be supported.

I'd rather have all issues related to catalog names and cross-indices sorted out in the core code than delegate them to scripts.

> As for the binary stars issue, perhaps it is time to try and figure out a new system for the datafiles/parser to replace the current one, getting rid of such annoying quirks as the different RA conventions (which will no doubt become more of an issue now that the 16 kly limit has been smashed).

The different RA convention is indeed annoying, but this can be addressed without completely replacing the parsers and data file formats. If we're going to take such a drastic and time consuming step as introducing new data file formats, there has to be a more compelling reason and a solid proposal for a better format. I'd recommend starting a new thread for that.

--Chris

Post #13 by **ajtribick** » 02.04.2008, 13:48

It may also be worth figuring out how to resolve the potential conflict between Bayer and variable star designations: the combinations "MU" and "NU" are both valid in variable star designations as well as being Latinisations of the Greek letters μ and ν. We don't have to worry about the other two-letter Latinisations, since "XI" and "PI" are not valid in variable star designations. SIMBAD distinguishes the two using "*" and "V*" as prefixes, but these prefixes are SIMBAD-only and not used elsewhere.
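The collision can be checked mechanically. In the standard variable-star lettering scheme, single letters run from R to Z, and two-letter designations use a first letter no later in the alphabet than the second (RR-ZZ, then AA-QZ), with J never used. A minimal Python sketch (function names hypothetical, not Celestia's parser) confirms that only MU and NU among the two-letter Greek abbreviations are also valid variable designations.

```python
import string
from typing import Set

# Letters used in letter-based variable star designations: A-Z without J.
_LETTERS = set(string.ascii_uppercase) - {"J"}

# The two-letter Latinised Greek abbreviations; all others have three letters.
_TWO_LETTER_GREEK = {"MU", "NU", "XI", "PI"}


def is_variable_letters(code: str) -> bool:
    """True if `code` is a valid letter-based variable star designation."""
    code = code.upper()
    if len(code) == 1:
        return "R" <= code <= "Z"  # R, S, ..., Z
    if len(code) == 2:
        first, second = code
        return first in _LETTERS and second in _LETTERS and first <= second
    return False  # from V335 onwards designations are numeric, not letters


def ambiguous_greek() -> Set[str]:
    """Two-letter Greek abbreviations that are also valid variable designations."""
    return {g for g in _TWO_LETTER_GREEK if is_variable_letters(g)}
```

"XI" and "PI" fail because their first letter comes after the second, which is exactly why they cannot collide.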

Post #14 by **granthutchison** » 02.04.2008, 15:33

I'm happy for any and all of "my" stc files to be edited to reflect any convention that is generally adopted.

Grant

Celestia Forums
