A new dimension: 41245 cities & towns on earth!

Topic author
t00fri
Developer
Posts: 8772
Joined: 29.03.2002
Age: 22
With us: 22 years 8 months
Location: Hamburg, Germany

Post #1 by t00fri » 11.12.2003, 20:00

Hi all,

Indeed, this is a completely new dimension. I discovered this amazing project two days ago on the net.

A team of enthusiasts has been most carefully collecting information about cities/towns on earth, including precise lat-long coordinates, national naming, country, country code, ... and population since 1996!

The result is a monster database of 50000 entries that looks very, very interesting!

41245 entries are specified with precise coordinates and number of inhabitants and can be used directly for Celestia.

The site URL is:
http://www.world-gazetteer.com/home.htm

© by Stefan Helders http://www.world-gazetteer.com

Stefan Helders is from Germany, and the database is free. He just asks that the above copyright statement be included.

Extracting and converting these 41245 cities/towns into Celestia format just amounted to hacking together a Perl script in 15 minutes; one second later, the generated *.ssc file was ready for a first test. Of course, all entries carry varying importance weights, calculated from the quoted number of inhabitants.

Otherwise, it would be impossible to handle 40k+ location labels;-).
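
A minimal sketch of such a conversion, written in Python rather than the Perl script actually used. The column order (name, country, population, latitude, longitude), the logarithmic weight formula, and the function names are assumptions for illustration only; neither the real Gazetteer layout nor the actual weighting is given in this thread.

```python
import math

def importance(population, scale=10.0):
    """Hypothetical weight that grows with log10 of the population (assumed formula)."""
    return round(scale * math.log10(max(population, 1)), 2)

def to_ssc(tsv_text, parent="Sol/Earth"):
    """Convert TAB-separated rows into Celestia Location blocks.

    Assumed columns: name, country, population, latitude, longitude.
    """
    blocks = []
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # row is too short to hold coordinates and population
        name, _country, pop, lat, lon = fields[:5]
        try:
            pop, lat, lon = int(pop), float(lat), float(lon)
        except ValueError:
            continue  # header row or entry with missing numbers
        blocks.append(
            f'Location "{name}" "{parent}"\n'
            "{\n"
            f"    LongLat [ {lon:.4f} {lat:.4f} 0 ]\n"
            f"    Importance {importance(pop)}\n"
            '    Type "City"\n'
            "}\n"
        )
    return "\n".join(blocks)

# Two made-up rows: one complete, one without coordinates or population.
sample = "Hamburg\tDE\t1700000\t53.55\t10.00\nNoCoords\tXX\t\t\t\n"
ssc = to_ssc(sample)
print(ssc)
```

Rows lacking coordinates or population are skipped, which mirrors the distinction above between the full database and the subset that is directly usable.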

Starting from census data, the Gazetteer authors use a sophisticated theoretical 'growth' formula to extrapolate the number of inhabitants to the current year. I checked many numbers explicitly and was indeed surprised by the accuracy achieved.
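
The Gazetteer's actual formula is not given here. As a rough illustration of the idea only, the sketch below assumes the simplest textbook model: a constant annual growth factor derived from two census counts and carried forward to a target year. The function name and the sample figures are made up.

```python
def extrapolate_population(pop_a, year_a, pop_b, year_b, target_year):
    """Assume constant exponential growth between two census counts
    and carry that rate forward (or backward) to target_year."""
    annual_factor = (pop_b / pop_a) ** (1.0 / (year_b - year_a))
    return pop_b * annual_factor ** (target_year - year_b)

# Made-up census figures for a hypothetical town.
estimate = extrapolate_population(100_000, 1990, 110_000, 2000, 2003)
print(round(estimate))
```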

Finally, let me give you a flavour of this new dimension;-). Below is a thumbnail of where I live (you can virtually see my house;-)). You must click on it to appreciate the amazing agreement between the small-town lights in my 32k (!!) Nightlights texture and the label positions, at 1600x1200 resolution!

The only "hair in the soup", as we say over here;-), is that Celestia supports neither 8-bit font display nor Unicode encoding. The 50000 names are in Latin1 = iso-8859-1 encoding. Unfortunately, so far, Celestia skips all letters that have diacritic signs, umlauts, etc. Too bad, we've got to work on that!

OK, here comes the image:

Image

and in order to illustrate that this incredible precision also holds in remote corners of our globe, here is another image from the Himalaya (don't forget to click on the image!):

Image

and another pe(a)rl chain of towns along the southern Andes. Virtually every light patch carries a label now!
Image

Bye Fridger
Last edited by t00fri on 11.12.2003, 23:30, edited 3 times in total.

JackHiggins
Posts: 1034
Joined: 16.12.2002
With us: 21 years 11 months
Location: People's Republic Of Cork, Ireland

Post #2 by JackHiggins » 11.12.2003, 22:39

Whoa!! :D (And I thought the CIA database was detailed...!)

If only for this alone, we should get a Unicode font for Celestia!

Great work fridger- can't wait to get it once it's available!
- Jack Higgins

Brendan
Posts: 296
Joined: 15.07.2003
With us: 21 years 4 months
Location: Bellows Falls, VT

Post #3 by Brendan » 12.12.2003, 01:13

WOW 8O I can't wait to see how much labeling there is around my hometown area (the Glens Falls, NY region).

Edited to add question:

Does this new location file include all of the locations that the other Earth city files have?

Brendan

Post #4 by t00fri » 12.12.2003, 01:25

Brendan wrote: WOW 8O I can't wait to see how much labeling there is around my hometown area (the Glens Falls, NY region).

Edited to add question:

Does this new location file include all of the locations that the other Earth city files have?

Brendan


Although I have not checked the individual entries (yet), the capitals should all be included in the big file. However, my original earth.ssc file, which was adapted from XEphem, had many observatory locations that may be found nowhere else.

Bye Fridger

don
Posts: 1709
Joined: 12.07.2003
With us: 21 years 5 months
Location: Colorado, USA (7000 ft)

Post #5 by don » 12.12.2003, 18:51

Wow!! Image

... And I thought your previous UN file had a lot of cities in it. Many thanks to Stefan Helders and his group for obtaining all of this information and for making it available publicly and royalty free.

And to you Fridger, for finding the site and perling the data into a format usable by Celestia. This is so cool!
Image

Thanks Fridger! Looking forward to seeing it on my own system.

By the way, one of the entries on the Southern Andes screenshot caught my eye ... "Los ngeles". Should this be "Los Angeles"?

Cheers, Image

-Don G.

granthutchison
Developer
Posts: 1863
Joined: 21.11.2002
With us: 22 years

Post #6 by granthutchison » 12.12.2003, 18:59

don wrote: By the way, one of the entries on the Southern Andes screenshot caught my eye ... "Los ngeles". Should this be "Los Angeles"
Probably "Los Ángeles" in the original file, but as Fridger says, Celestia ignores all accented characters at present. :cry:
We had the same problem with the large IAU location files - I had to strip out a large number of accented characters and replace them with their unaccented equivalents.

Grant

Post #7 by don » 12.12.2003, 19:24

Ahhhhh, by ignore, that means "not displayed", instead of converted to ASCII. Gotcha. Thanks!

Post #8 by t00fri » 12.12.2003, 22:42

don wrote:Wow!! Image

... And I thought your previous UN file had a lot of cities in it. Many thanks to Stefan Helders and his group for obtaining all of this information and for making it available publicly and royalty free.

And to you Fridger, for finding the site and perling the data into a format usable by Celestia. This is so cool!
Image

Thanks Fridger! Looking forward to seeing it on my own system.

By the way, one of the entries on the Southern Andes screenshot caught my eye ... "Los ngeles". Should this be "Los Angeles"?

Cheers, Image

-Don G.


These beer-drinking smileys are exceptionally well suited to my "German heart" :lol:

Bye Fridger

Post #9 by t00fri » 12.12.2003, 23:33

granthutchison wrote:
don wrote: By the way, one of the entries on the Southern Andes screenshot caught my eye ... "Los ngeles". Should this be "Los Angeles"
Probably "Los Ángeles" in the original file, but as Fridger says, Celestia ignores all accented characters at present. :cry:
We had the same problem with the large IAU location files - I had to strip out a large number of accented characters and replace them with their unaccented equivalents.

Grant


Don,

unfortunately, Grant's explanations are correct. This is also the reason why --so far-- I have hesitated to make the file publicly available. A lot of names are 'castrated' this way. But if a number of you feel that you can live with this deficiency for the time being, let me know. I shall then upload it to the TextureFoundry.
The *.ssc file with 41245 entries is 3.75 MB unpacked and only 585 KB zipped.

Let me also emphasize that the original 50k database is ideally pre-conditioned for the application of Perl: just a TAB-separated table with many entries. So even Perl beginners or Perl revivers (Hi Don;-)) could easily extract whatever they are interested in... For the same reason, I most probably did not introduce additional bugs via pattern-matching ambiguities etc. The precision of the coordinates is simply great, by the way.

The calculation of the Importance weights is still in progress, but by tomorrow or so, I will have optimized and finalized the weights.

Even with the castrated names, it is great fun to spot your familiar neighboring town etc...

Actually all we would need is 8-bit Latin1 (ISO-8859-1) encoding support. Full 16-bit Unicode encoding is not required for the names I chose for the Celestia file.
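
To illustrate why 8 bits suffice for these names: every Latin1 byte corresponds to the Unicode code point of the same value, so the names can be decoded without loss and re-encoded later if needed. A small sketch in Python, with sample names chosen only for illustration:

```python
import re

# Bytes as they would be stored in an ISO-8859-1 (Latin1) file.
raw = b"Los \xc1ngeles\tK\xf6ln\tMalm\xf6"

# Decode bytes into text first; each byte becomes exactly one character.
names = raw.decode("latin-1").split("\t")

# Pattern matching on decoded text sees whole characters, not raw bytes.
accented = [n for n in names if re.search(r"[^\x00-\x7f]", n)]

# Re-encoding to UTF-8 is lossless and reversible.
utf8_bytes = "\t".join(names).encode("utf-8")
print(names, accented)
```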

Bye Fridger

Post #10 by don » 13.12.2003, 00:18

t00fri wrote:unfortunately, Grant's explanations are correct. This is also the reason why --so far-- I have hesitated to make the file publicly available.
Thank you for the confirmation Fridger. Would there be any "simple" way to perl a set of replacement ASCII characters?


t00fri wrote:Let me also emphasize that the original 50k data base is ideally pre-conditioned for the application of Perl, just a TAB-separated table with many entries.

Sure wish I had more time right now <sigh>. Could you e-mail it to me, for perusal and perl playtime at a later date?

-Don G.

PS. Glad you like the beer-drinking smileys. German beer is definitely GOOD for the heart, as well as pure joy <smile>.

Post #11 by granthutchison » 13.12.2003, 00:22

t00fri wrote: Actually all we would need is 8-bit Latin1 (ISO-8859-1) encoding support. Full 16-bit Unicode encoding is not required for the names I chose for the Celestia file.
Fridger, isn't it easy enough to write a simple program to make appropriate substitutions with the unaccented equivalents, given the relatively small number of accented characters involved? A few years back I wrote an Excel macro to perform a similar task on some GEOnet files - replacing their custom character set with Unicode. It was mildly tedious to create, but no more than that.

Grant

Post #12 by t00fri » 13.12.2003, 00:51

granthutchison wrote:
t00fri wrote: Actually all we would need is 8-bit Latin1 (ISO-8859-1) encoding support. Full 16-bit Unicode encoding is not required for the names I chose for the Celestia file.
Fridger, isn't it easy enough to write a simple program to make appropriate substitutions with the unaccented equivalents, given the relatively small number of accented characters involved? A few years back I wrote an Excel macro to perform a similar task on some GEOnet files - replacing their custom character set with Unicode. It was mildly tedious to create, but no more than that.

Grant


Well, this would be another easy challenge for Perl, once the transcription rules are known! This is where I am less self-confident;-). I read about similar projects/attempts on the web, which was rather discouraging. I simply was not aware of the diversity of the rules involved, e.g. that in Swedish, an 'a' with a little circle on top should be replaced by 'aa'. Lots of rules, indeed. If I had a summary sheet from the net somewhere, I would be most happy to hack such a transcription facility together.

Here is another one that is not quite trivial;-):

Western Sahara's capital: al-ʿAyūn -> La'youn

Another problem I encountered was that in Perl the pattern matching with these accented characters did not function very reliably...

Bye Fridger

Post #13 by granthutchison » 13.12.2003, 01:36

t00fri wrote: Well, this would be another easy challenge for Perl, once the transcription rules are known!
I wasn't talking about precise transliteration, which is, as you say, a complex, language-dependent task. Just the simple stripping of accents, which would at least leave the user with an "undamaged" letter sequence they could easily associate with the more correct form of the name - such simple substitutions are very common in non-specialist text.
Perhaps there would also be a need for a few substitution rules for variant letters like edh, thorn, kra, eszett, etc, if these appear in the dataset.
While far from perfect, this would leave us with something more legible than the current labels, with their absent letters and unpronounceable results.
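
A minimal sketch of this kind of simple accent stripping, in Python. It relies on Unicode decomposition to drop combining marks and on a small hand-made table for letters that do not decompose. The table entries are assumptions for illustration, not an official rule set, and this is deliberately not a language-aware transliteration.

```python
import unicodedata

# Letters without a canonical decomposition; substitutions assumed for illustration.
SPECIAL = {
    "ð": "d", "Ð": "D",    # edh
    "þ": "th", "Þ": "Th",  # thorn
    "ß": "ss",             # eszett
    "ĸ": "q",              # kra
    "ø": "o", "Ø": "O",
    "æ": "ae", "Æ": "Ae",
}

def strip_accents(name):
    """Replace accented Latin letters with their unaccented base letters."""
    out = []
    for ch in name:
        if ch in SPECIAL:
            out.append(SPECIAL[ch])
            continue
        # NFD splits e.g. 'Á' into 'A' plus a combining acute accent.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)

for label in ["Los Ángeles", "Malmö", "Gießen", "Þórshöfn"]:
    print(label, "->", strip_accents(label))
```

Plain ASCII names pass through unchanged, so the function can be applied to every entry of a location file.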

t00fri wrote: Western Sahara's capital: al-ʿAyūn -> La'youn
El Aaiún, Aaiún, Aiun, Laâyoune, La'youan ... These are not problems of converting accented Latin characters to unaccented versions, but of transliterating Berber-accented Arabic into the Latin alphabet as it is used in English, French and Spanish. We certainly don't want to get involved in that can of worms ... :wink:

t00fri wrote:Another problem I encountered was that in Perl the pattern matching with these accented characters did not function very reliably...
Tut. Surely not a deficiency in Perl? :wink:
I must say I haven't encountered this problem when writing character conversion routines in Excel or VB ...

Grant

