Hi all,
indeed, this is a completely new dimension. I discovered this amazing project on the net two days ago.
A team of enthusiasts has been most carefully collecting information about cities/towns on Earth, including precise lat-long coordinates, national naming, country, country code, ... and population since 1996!
The result is a monster database of 50000 entries that looks very, very interesting!
41245 entries are specified with precise coordinates and number of inhabitants and can be directly exploited for Celestia.
The site URL is:
http://www.world-gazetteer.com/home.htm
© by Stefan Helders http://www.world-gazetteer.com
Stefan Helders is from Germany and the database is free. He just asks that the above copyright statement be included.
Extracting and converting these 41245 cities/towns into Celestia format just amounted to hacking together a Perl script in 15 minutes; 1 second later, the generated *.ssc file was ready for a first test. All entries, of course, carry varying importance weights, calculated from the quoted number of inhabitants.
Otherwise, it would be impossible to handle 40k+ location labels;-).
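For readers who want to try a similar conversion, here is a minimal sketch in Python (not the Perl script mentioned above, which was not posted). It assumes a hypothetical TAB-separated layout of name, latitude, longitude, population; the real Gazetteer column order is not given in this thread. The `importance` formula is likewise only an illustrative guess, not the weighting actually used. The `Location` block fields (`LongLat`, `Importance`, `Type`) follow Celestia's location syntax as I understand it.

```python
import csv
import io
import math


def importance(population: int) -> float:
    """Hypothetical weight: scales with the square root of the population."""
    return round(0.02 * math.sqrt(max(population, 1)), 2)


def to_ssc(tsv_text: str) -> str:
    """Convert TAB-separated rows (name, lat, lon, population) to Location blocks.

    The column order is an assumption made for this sketch.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    blocks = []
    for row in reader:
        if len(row) < 4:
            continue
        name, lat, lon, pop = row[0], row[1], row[2], row[3]
        if not (lat and lon and pop):
            continue  # skip entries lacking coordinates or population
        blocks.append(
            f'Location "{name}" "Sol/Earth"\n'
            "{\n"
            f"    LongLat [ {float(lon):.4f} {float(lat):.4f} 0 ]\n"
            f"    Importance {importance(int(pop))}\n"
            '    Type "City"\n'
            "}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = "Hamburg\t53.55\t10.00\t1700000\nNowhere\t\t\t\n"
    print(to_ssc(sample))
```

Entries without coordinates or population are skipped, which mirrors the step from the 50000 raw entries to the subset that is usable for labels.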
Based on census data input, the Gazetteer authors use a sophisticated theoretical 'growth' formula to extrapolate the number of inhabitants to the current year. I checked many numbers explicitly and was indeed surprised by the accuracy achieved.
Finally, let me give you a flavour of this new dimension;-). Below is a thumbnail from where I live (you can virtually see my house;-)). You must click on it to appreciate the amazing agreement between the small town locations and the label positions based on my 32k (!!) Nightlights texture in 1600x1200 resolution!
The only "hair in the soup", as we say over here;-), is that Celestia does not support 8-bit font display and Unicode encoding. The 50000 names are in Latin1 = ISO-8859-1 encoding. Unfortunately, so far, Celestia skips all letters that have diacritic signs, Umlaute, etc. Too bad, we have got to work on that!
OK, here comes the image:
and in order to illustrate that this incredible precision also holds in remote corners of our globe, here is another image from the Himalaya (don't forget to click on the image!):
and another pe(a)rl chain of towns along the southern Andes. Virtually every light patch carries a label now!
Bye Fridger
A new dimension: 41245 cities & towns on earth!
-
Topic author: t00fri
- Developer
- Posts: 8772
- Joined: 29.03.2002
- Age: 22
- With us: 22 years 8 months
- Location: Hamburg, Germany
Last edited by t00fri on 11.12.2003, 23:30, edited 3 times in total.
-
- Posts: 1034
- Joined: 16.12.2002
- With us: 21 years 11 months
- Location: People's Republic Of Cork, Ireland
-
Brendan wrote:WOW I can't wait to see how much labeling there is around my hometown area (the Glens Falls, NY region).
Edited to add question:
Does this new location file include all of the locations that the other Earth city files have?
Brendan
Although I have not checked the individual entries (yet), the capitals should all be included in the big file. However, my original earth.ssc file, which was adapted from XEphem, had many observatory locations that may be found nowhere else.
Bye Fridger
Wow!!
... And I thought your previous UN file had a lot of cities in it. Many thanks to Stefan Helders and his group for obtaining all of this information and for making it available publicly and royalty free.
And to you Fridger, for finding the site and perling the data into a format usable by Celestia. This is so cool!
Thanks Fridger! Looking forward to seeing it on my own system.
By the way, one of the entries on the Southern Andes screenshot caught my eye ... "Los ngeles". Should this be "Los Angeles"?
Cheers,
-Don G.
-
- Developer
- Posts: 1863
- Joined: 21.11.2002
- With us: 22 years
don wrote:By the way, one of the entries on the Southern Andes screenshot caught my eye ... "Los ngeles". Should this be "Los Angeles"
Probably "Los Ángeles" in the original file, but as Fridger says, Celestia ignores all accented characters at present.
We had the same problem with the large IAU location files - I had to strip out a large number of accented characters and replace them with their unaccented equivalents.
Grant
-
don wrote:Wow!!
... And I thought your previous UN file had a lot of cities in it. Many thanks to Stefan Helders and his group for obtaining all of this information and for making it available publicly and royalty free.
And to you Fridger, for finding the site and perling the data into a format usable by Celestia. This is so cool!
Thanks Fridger! Looking forward to seeing it on my own system.
By the way, one of the entries on the Southern Andes screenshot caught my eye ... "Los ngeles". Should this be "Los Angeles"?
Cheers,
-Don G.
For once, these beer-drinking smileys are really something for my "German heart" :lol:
Bye Fridger
-
granthutchison wrote:
don wrote:By the way, one of the entries on the Southern Andes screenshot caught my eye ... "Los ngeles". Should this be "Los Angeles"
Probably "Los Ángeles" in the original file, but as Fridger says, Celestia ignores all accented characters at present.
We had the same problem with the large IAU location files - I had to strip out a large number of accented characters and replace them with their unaccented equivalents.
Grant
Don,
unfortunately, Grant's explanations are correct. This is also the reason why --so far-- I have hesitated to make the file publicly available. A lot of names are 'castrated' this way. But, if a number of you feel that they can live with this deficiency for the time being, let me know. I shall then upload it to the TextureFoundry.
The *.ssc file with 41245 entries is 3.75 MB unpacked and only 585 KB zipped.
Let me also emphasize that the original 50k database is ideally pre-conditioned for the application of Perl: it is just a TAB-separated table with many entries. So even Perl beginners or Perl revivers (Hi Don;-)) could easily extract whatever they are interested in... For the same reason, I most probably did not introduce additional bugs via pattern matching ambiguities etc. The precision of the coordinates is simply great, by the way.
The calculation of the Importance weights is still in progress, but by tomorrow or so, I will have optimized and finalized the weights.
Even with the castrated names, it is great fun to spot your familiar neighboring town etc...
Actually all we would need is 8bit Latin1 (ISO-xxxx-1) encoding support. Full 16bit Unicode encoding is not required for the names I chose for the Celestia file.
Bye Fridger
t00fri wrote:unfortunately, Grant's explanations are correct. This is also the reason why --so far-- I have hesitated to make the file publicly available.
Thank you for the confirmation Fridger. Would there be any "simple" way to perl a set of replacement ASCII characters?
t00fri wrote:Let me also emphasize that the original 50k data base is ideally pre-conditioned for the application of Perl, just a TAB-separated table with many entries.
Sure wish I had more time right now <sigh>. Could you e-mail it to me, for perusal and perl playtime at a later date?
-Don G.
PS. Glad you like the beer drinking smileys. German beer is definitely GOOD for the heart, as well as pure joy <smile>.
-
t00fri wrote:Actually all we would need is 8bit Latin1 (ISO-xxxx-1) encoding support. Full 16bit Unicode encoding is not required for the names I chose for the Celestia file.
Fridger, isn't it easy enough to write a simple program to make appropriate substitutions with the unaccented equivalents, given the relatively small number of accented characters involved? A few years back I wrote an Excel macro to perform a similar task on some GEOnet files - replacing their custom character set with Unicode. It was mildly tedious to create, but no more than that.
Grant
-
granthutchison wrote:
t00fri wrote:Actually all we would need is 8bit Latin1 (ISO-xxxx-1) encoding support. Full 16bit Unicode encoding is not required for the names I chose for the Celestia file.
Fridger, isn't it easy enough to write a simple program to make appropriate substitutions with the unaccented equivalents, given the relatively small number of accented characters involved? A few years back I wrote an Excel macro to perform a similar task on some GEOnet files - replacing their custom character set with Unicode. It was mildly tedious to create, but no more than that.
Grant
Well, this would be another easy challenge for Perl, once the transcription rules are known! This is where I am less self-confident;-). I read about similar projects/attempts on the web, which was rather discouraging. I simply was not aware of the diversity of the rules involved, e.g. that in Swedish, an 'a' with a little circle on top should be replaced by 'aa'. Lots of rules, indeed. If I had a summary sheet from the net somewhere, I would be most happy to hack such a transcription facility together.
Here is another one that is not quite trivial;-):
Western Sahara's capital: al-?Ay?n -> La'youn
Another problem I encountered was that in Perl the pattern matching with these accented characters did not function very reliably...
Bye Fridger
-
t00fri wrote:Well, this would be another easy challenge for Perl, once the transcripting rules are known!
I wasn't talking about precise transliteration, which is, as you say, a complex, language-dependent task. Just the simple stripping of accents, which would at least leave the user with an "undamaged" letter sequence they could easily associate with the more correct form of the name - such simple substitutions are very common in non-specialist text.
Perhaps there would also be a need for a few substitution rules for variant letters like edh, thorn, kra, eszett, etc, if these appear in the dataset.
While far from perfect, this would leave us with something more legible than the current labels, with their absent letters and unpronounceable results.
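The simple accent stripping described above could look roughly like the following Python sketch (nobody in this thread posted actual code, so this is only an illustration). It decomposes each character with the standard `unicodedata` module and drops the combining marks. The small `SPECIAL` table for edh, thorn, kra and eszett is an assumption about reasonable replacements, not an agreed rule set.

```python
import unicodedata

# Letters that do not decompose into "base letter + accent".
# The replacements chosen here are assumptions for illustration only.
SPECIAL = {
    "ß": "ss",  # eszett
    "ð": "d",   # edh
    "Ð": "D",
    "þ": "th",  # thorn
    "Þ": "Th",
    "ĸ": "q",   # kra, written q in modern Greenlandic orthography
}


def strip_accents(name: str) -> str:
    """Return the name with variant letters replaced and diacritics removed."""
    replaced = "".join(SPECIAL.get(ch, ch) for ch in name)
    decomposed = unicodedata.normalize("NFD", replaced)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for name in ["Los Ángeles", "Gießen", "Ísafjörður"]:
        print(name, "->", strip_accents(name))
```

This deliberately does no transliteration: "Å" becomes "A", not "Aa". Letters such as "ø" that neither decompose nor appear in the table pass through unchanged.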
t00fri wrote:Western Sahara's capital: al-?Ay?n -> La'youn
El Aaiún, Aaiún, Aiun, Laâyoune, La'youan ... These are not problems of converting accented Latin characters to unaccented versions, but of transliterating Berber-accented Arabic into the Latin alphabet as it is used in English, French and Spanish. We certainly don't want to get involved in that can of worms ...
t00fri wrote:Another problem I encountered was that in Perl the pattern matching with these accented characters did not function very reliably...
Tut. Surely not a deficiency in Perl?
I must say I haven't encountered this problem when writing character conversion routines in Excel or VB ...
Grant
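On the unreliable pattern matching with accented characters: the thread does not say what actually went wrong in Perl. One common cause of this kind of symptom in general is matching against raw bytes instead of decoded text. The Python sketch below only illustrates that general effect, assuming ISO-8859-1 input as stated earlier in the thread; it is not a diagnosis of the Perl script.

```python
import re
from typing import List

# "Los Ángeles" encoded as ISO-8859-1: the accented letter is one byte, 0xC1.
RAW = b"Los \xc1ngeles"


def words_from_bytes(data: bytes) -> List[bytes]:
    """Match on raw bytes: \\w only covers ASCII letters, digits and underscore."""
    return re.findall(rb"\w+", data)


def words_from_text(data: bytes, encoding: str = "iso-8859-1") -> List[str]:
    """Decode first, then match: \\w now covers accented letters as well."""
    return re.findall(r"\w+", data.decode(encoding))


if __name__ == "__main__":
    print(words_from_bytes(RAW))
    print(words_from_text(RAW))
```

Matching on bytes splits the name at the accented letter and loses it, which resembles the "Los ngeles" label; decoding before matching keeps the name whole.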