41245 (!) World Cities & Towns for Download

Post #21 by **t00fri** » 18.12.2003, 22:19

Hi,

here is a little status report on the "dirty" tricks I am presently trying in order to reduce the overpopulation of labels in Holland & England ;-) in my huge Gazetteer database...

The green line is what is available for download so far. The blue curve is an "ingenious local" modification ;-) of my importance weight function that massively suppresses a certain range of populations => Holland :lol:


Bye Fridger

Post #22 by **wcomer** » 18.12.2003, 23:35

I'm not sure if these have been listed yet:

```
Kanchanadit Thailand 16502 91.67°N 99.47°E
Bacho Thailand 11071 66.17°N 101.65°E
Chawang Thailand 11230 62.67°N 102.05°E
```

The first one has an impossible latitude. The others are either not Thai cities or they have the wrong latitudes.

cheers,
Walton

Post #23 by **t00fri** » 18.12.2003, 23:49

Thanks Walton! No, this is "original" ;-). It will be corrected immediately...

In the first and second entries, the decimal point was simply shifted by one digit ;-)
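A minimal Python sketch of how such shifted decimal points might be caught automatically. It assumes a rough bounding box for Thailand (approximate values, not taken from the Gazetteer) and simply tests whether moving the decimal point one digit to the left puts the latitude inside that box; `THAILAND_BOX` and `check_latitude` are hypothetical names.

```python
# Rough bounding box of Thailand in decimal degrees (assumption: approximate).
THAILAND_BOX = (5.6, 20.5, 97.3, 105.6)  # lat_min, lat_max, lon_min, lon_max


def in_box(lat, lon, box):
    lat_min, lat_max, lon_min, lon_max = box
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


def check_latitude(lat, lon, box):
    """Classify an entry as 'ok', 'shifted' (decimal point one digit too far
    to the right; returns the candidate value) or 'unresolved'."""
    if in_box(lat, lon, box):
        return "ok", lat
    candidate = round(lat / 10.0, 4)  # move the decimal point one digit left
    if in_box(candidate, lon, box):
        return "shifted", candidate
    return "unresolved", lat
```

Applied to the first two entries above, this yields 9.167 and 6.617. A "shifted" result is only a candidate: the third entry would also produce one (6.267), although only the first two were identified as shifted, so every candidate still needs a cross-check against a reference source.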

Bye Fridger

Post #24 by **wcomer** » 19.12.2003, 05:27

Fridger,

I don't want to steal your thunder, but I've taken a stab at this as well.

Here is what I did.

For each city,
1) Found the first VT level for which the city is the most populous, capping off at level 16.
2) Found the logarithm (base-4) for the city's population rank for each of the prior 3 levels, if they existed.
3) Found the logarithm (base-4) for the city's population over the entire earth (note this caps to ~9.)
4) Averaged (#1, #2a_log + #2a_level-1, #2b + #2b_level-1, #2c_log + #2c_level-1)
5) Averaged (#1, #3, #4)
6) Importance Weight = 1+3200*(1/2)^#5
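One possible reading of steps 1 to 6, as a Python sketch. Several details are assumptions rather than Walton's stated method: VT level k is modeled as an equirectangular grid of 2^(k+1) × 2^k tiles (four times as many tiles per level), a city that never tops its tile is assigned the cap level 16, steps 2 and 4 are read as "a city of rank r at level k counts as level k + log4(r)", and step 3 is read as log4 of the city's population rank over the whole earth.

```python
import math
from collections import defaultdict

MAX_LEVEL = 16  # "capping off at level 16"


def tile_of(lat, lon, level):
    # Assumption: level k covers the globe with 2**(k+1) x 2**k
    # equirectangular tiles, so each level has four times as many tiles.
    cols, rows = 2 ** (level + 1), 2 ** level
    x = min(int((lon + 180.0) / 360.0 * cols), cols - 1)
    y = min(int((90.0 - lat) / 180.0 * rows), rows - 1)
    return x, y


def ranks_at_level(cities, level):
    """Population rank of every city within its tile (1 = most populous)."""
    tiles = defaultdict(list)
    for i, (_name, _pop, lat, lon) in enumerate(cities):
        tiles[tile_of(lat, lon, level)].append(i)
    rank = [0] * len(cities)
    for members in tiles.values():
        members.sort(key=lambda j: -cities[j][1])
        for r, j in enumerate(members, start=1):
            rank[j] = r
    return rank


def log4(x):
    return math.log(x, 4)


def importance_weights(cities, scale=3200.0):
    """cities: list of (name, population, lat, lon) tuples."""
    ranks = [ranks_at_level(cities, k) for k in range(MAX_LEVEL + 1)]
    by_pop = sorted(range(len(cities)), key=lambda j: -cities[j][1])
    earth_rank = {j: r for r, j in enumerate(by_pop, start=1)}
    weights = []
    for i in range(len(cities)):
        # Step 1: first level at which the city tops its tile (capped).
        first = next((k for k in range(MAX_LEVEL + 1) if ranks[k][i] == 1),
                     MAX_LEVEL)
        # Steps 2 and 4 (one reading): for each of up to 3 coarser levels k,
        # a city of rank r there "behaves like" level k + log4(r).
        estimates = [first] + [k + log4(ranks[k][i])
                               for k in range(first - 1, first - 4, -1) if k >= 0]
        step4 = sum(estimates) / len(estimates)
        # Step 3: log4 of the population rank over the whole earth.
        step3 = log4(earth_rank[i])
        # Step 5: average of steps 1, 3 and 4.
        step5 = (first + step3 + step4) / 3.0
        # Step 6: importance weight.
        weights.append(1.0 + scale * 0.5 ** step5)
    return weights
```

The `scale` argument plays the role of the 3200 in step 6; as noted below, a larger value gives a denser label display.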

This works well. It tends to give good localization of relative importances without pushing obscure locales too high up. It doesn't seem to ignore important cities either. I have checked much of the Earth's surface and do not see any major flaws. Overall it is not quite as dense as your version, but that can easily be changed by scaling with something larger than 3200. No attempt was made at normalization; this is entirely the result of trial and error.

The resulting files can be found (I've decided to pull the link, both to let Fridger finish his methodology and to avoid creating too many versions of the same concept. Anyone interested, send me a private message and I will send you the link to the file.)

cheers,
Walton

Post #25 by **don** » 19.12.2003, 07:15

Howdy Fridger,

Here's another town for the list -- nearest town to where we live:

Town: Calhan, Colorado USA
Population: 896 (as of 2000)
Latitude: 39.035N
Longitude: 104.296W

-Don G.

Post #26 by **don** » 19.12.2003, 07:31

A couple more local towns near us ...

Falcon, CO
Population 2000: unknown (about 1200)
Latitude: 38.933N
Longitude: 104.608W

Peyton, CO
Population 2000: unknown (about 200)
Latitude: 39.028N
Longitude: 104.482W

Ramah, CO
Population 2000: 117
Latitude: 39.121N
Longitude: 104.165W

Simla, CO
Population 2000: 663
Latitude: 39.141N
Longitude: 104.083W

Limon, CO
Population 2000: 2,071
Latitude: 39.263N
Longitude: 103.691W
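Celestia expects latitude and longitude in decimal degrees. Assuming the usual convention of north/east positive and south/west negative, a value should carry either a hemisphere letter or a minus sign, not both. A small Python helper (hypothetical, not part of any Celestia tool) that converts hemisphere-suffixed values and rejects one that carries both, such as "-104.296W":

```python
def parse_coordinate(text: str) -> float:
    """Convert '39.035N' / '104.296W' style strings to signed decimal degrees
    (assumed convention: north/east positive, south/west negative)."""
    text = text.strip().upper()
    hemisphere = text[-1]
    if hemisphere not in "NSEW":
        raise ValueError(f"missing hemisphere letter: {text!r}")
    value = float(text[:-1].rstrip("°"))  # tolerate an optional degree sign
    if value < 0:
        raise ValueError(f"give a sign or a hemisphere letter, not both: {text!r}")
    return -value if hemisphere in "SW" else value
```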

For anyone in the U.S.A., you can go here (http://www.epodunk.com/) to find this information for your town.

-Don G.

Post #27 by **don** » 19.12.2003, 07:58

Howdy Fridger,

While taking a closer look at some locations, I found many cities to be duplicated (exact names or very close names). It would also appear that many towns share exactly the same longitude and latitude, as multiple labels sit on top of one another even when zoomed in to 22 km.

Soooo, I was wondering if you could sort your resulting data file by a couple of different columns for duplicate city name checking, and duplicate long/lat for multiple city names checking ...

* File 1: Sort by Country then City Name
* File 2: Sort by Long then Lat (or Lat then Long)

You could probably eliminate *exact* country/city name and long/lat matches via Perl, but the remaining "very close" duplicates (e.g. Brussel/Brussels) would need to be eliminated manually.
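The check described here could look like the following Python sketch (a different language from the Perl mentioned, and not Fridger's script). It assumes each row is a dict with hypothetical keys `name`, `country`, `lat` and `lon`. Exact duplicates are grouped by a key function; near-duplicate names within a country are only reported as candidates, using `difflib.SequenceMatcher` with an arbitrary 0.9 similarity threshold.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def exact_duplicates(rows, key):
    """Group rows by key(row) and keep only groups with more than one row."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return {k: v for k, v in groups.items() if len(v) > 1}


def near_duplicate_names(rows, threshold=0.9):
    """Name pairs within one country whose similarity ratio >= threshold.
    These are candidates only; each pair still needs a manual decision."""
    by_country = defaultdict(list)
    for row in rows:
        by_country[row["country"]].append(row["name"])
    pairs = []
    for country, names in by_country.items():
        names = sorted(set(names))  # country, then city name order
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
                    pairs.append((country, a, b))
    return pairs
```

For example, "Brussel" and "Brussels" have a similarity ratio of about 0.93, so they are reported as a candidate pair. The pairwise comparison is quadratic per country, which should be acceptable for a one-off check but is not tuned for speed.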

I am just asking, not requesting <smile>.

-Don G.

Post #28 by **t00fri** » 19.12.2003, 08:27

Don,

I am a little bit surprised. Are you sure that only the earth-Gazetteer.ssc file is loaded?? Perhaps you also have other earth location files in your extras folder or still my world-capitals.ssc in the data dir?

Could you give me one example?

Bye Fridger

Post #29 by **don** » 19.12.2003, 09:17

Howdy Fridger,

Your bright and fresh morning mind is much more alert than my late-night sleepy mind. I'm very sorry. Yes, the Capitals file was still in the data dir <sigh>. I think I'd better go to bed now <laughing>. Long day...

Thank you Fridger. Have a good day!

-Don G.

Post #30 by **t00fri** » 19.12.2003, 14:55

Walton,

I gave your modified weights a quick try this morning. The result certainly appears much less crowded than mine, but to see the very small places I had to zoom in to the extreme. That is, of course, always a potential disadvantage. With 32k tiles, your scheme looks good, and so does my more recent one above.

I am not sure, however, whether I really understood your algorithm (this could partly also be a matter of English):

e.g. what does your statement:
..."the first VT level for which the city is the most populous"?

amount to in practice? How does this depend on the size of the actual VT tiles? etc. What if no tiles are used at all? What exactly did you calculate? By which means (Perl, Excel?)

What is "Found the logarithm (base-4)" supposed to mean?? Are you working with logarithms of base 4? If yes, why? Or do you mean base 'minus' 4? What base?

You were using a lot of different averaging steps. Is this a more or less empirical procedure, or did you have a more "theoretical" approach in mind that I failed to notice?

Bye Fridger

Post #31 by **wcomer** » 19.12.2003, 19:55

"the first VT level for which the city is the most populous"

Search for the Virtual Texture tile level at which the city is the most populous compared with all other cities that share the same tile. Hence the base-4 logarithm: each new level has four times the surface "area", thus four times as much space for labels. Also, it turns out that for the first 6 levels there is very good agreement between the log_4 of the number of populated tiles and the level number.
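The observation about the first 6 levels can be checked with a short Python sketch that counts the distinct tiles occupied by cities at each level. The tiling (2^(k+1) × 2^k equirectangular tiles at level k) is an assumption about the VT layout, and the function names are hypothetical.

```python
import math


def tile_of(lat, lon, level):
    # Assumed layout: 2**(level+1) columns x 2**level rows of equal-angle tiles.
    cols, rows = 2 ** (level + 1), 2 ** level
    x = min(int((lon + 180.0) / 360.0 * cols), cols - 1)
    y = min(int((90.0 - lat) / 180.0 * rows), rows - 1)
    return x, y


def populated_tile_profile(coords, max_level=6):
    """Return (level, populated tile count, log4 of that count) per level."""
    profile = []
    for level in range(max_level + 1):
        count = len({tile_of(lat, lon, level) for lat, lon in coords})
        profile.append((level, count, math.log(count, 4)))
    return profile
```

For reference, with that tiling a level-k texture has 2 × 4^k tiles in total, so if every tile were populated, log4 of the count would be exactly k + 0.5.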

I've explored using a log population model instead of log rank. The results are pretty similar, except that it avoids underweighting larger cities that are close to other large cities. However, that leads to clustering, although now only the smallest cities require close zooms. So I will try a blend of the two approaches and see whether the results are better.

Ideally, an importance weighting function would have the following results:
1) Scalable as list grows or changes.
2) Consistent label density across a wide range of geographies. Exceptions are areas with very little population (i.e. don't overemphasize extremely small outposts) and very large cities that are close together (i.e. don't ignore the second-largest city in the neighborhood just because it is next to a giant.)
3) Close magnification required for only the smallest of cities.

I will not be able to do anything else on this until after Christmas.

cheers,
Walton

Post #32 by **Neil** » 19.12.2003, 23:19

Fridger
Thanks for the links - I now have Jens' Earthnights VT installed. Of course, it led to some more misplacements coming to 'light'

Here's the latest list of New Zealand corrections (hope you're not getting sick of these!)

Inglewood 174.11 -39.15
Marsden Point 174.50 -35.85
Opotiki 177.28 -38.00
Pukekohe 174.90 -37.20
Turangi 175.81 -38.98
Twizel 170.10 -44.25
Waimate 171.05 -44.73 (This Waimate in the south is a proper town showing light spill; the original is a one-horse place)
Waipukurau 176.55 -40.00

Marsden Point is an addition to give a name to a large light patch, mainly an industrial area (oil refinery).

I am wondering where the data for these nightlight textures originate.
There are some apparent anomalies, e.g. a significant light patch west of Palmerston where no settlement exists. And the Taranaki area (the part of North Island sticking out westwards) has 3 bright patches that do not correspond to towns (which I have checked).
Finally, what do you think of the idea to CAPITALISE capital cities of countries and states?
cheers
Neil

Post #33 by **selden** » 19.12.2003, 23:50

Neil,

The nightlight textures are derived from a NASA image which was created by C. Mayhew & R. Simmon of NASA Goddard from data from the Defense Meteorological Satellites Program satellites. More info is available on the APOD Web page at http://antwrp.gsfc.nasa.gov/apod/ap001127.html and on the Visible Earth web page at http://visibleearth.nasa.gov/cgi-bin/viewrecord?5826

Supposedly, most or all of the lit areas that don't correspond to cities were caused by wildfires.

Post #34 by **t00fri** » 20.12.2003, 19:20

Neil,

I have added all your corrected places to the Gazetteer source database after cross-checking their coordinates against the Getty Thesaurus. Exception: Marsden Point. The Thesaurus did not list it, and several further items of information required by the main database were missing.

We have already committed ourselves to labeling the major oceans and continents in capitals. My proposal in the developer forum was rather, for 1.3.2, to use shades of one color (e.g. blue) for a qualitative gradation of population or importance. This would be a new degree of freedom, allowing us to display significant additional info about the locations without making things more crowded!
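A hypothetical Python sketch of the proposed shading: population is mapped on a logarithmic scale to a handful of discrete blue shades. The population range (100 to 10 million), the number of shades and the RGB values are arbitrary choices for illustration, not anything decided for Celestia.

```python
import math


def population_shade(pop, pop_min=100, pop_max=10_000_000, steps=5):
    """Map a population to one of `steps` shades of blue as an (r, g, b) tuple.
    Small places get a pale blue, large ones a saturated blue."""
    pop = min(max(pop, pop_min), pop_max)  # clamp to the chosen range
    span = math.log10(pop_max) - math.log10(pop_min)
    t = (math.log10(pop) - math.log10(pop_min)) / span  # 0.0 .. 1.0
    level = min(int(t * steps), steps - 1)  # discrete shade index
    frac = level / (steps - 1)
    rg = round(200 * (1 - frac))  # less red/green means a deeper blue
    return (rg, rg, 255)
```

A logarithmic scale is used because populations span several orders of magnitude; a linear scale would put almost every town into the palest shade.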

Don,

adding such small places is tricky, since you have not provided several of the entries required for each place in the source database. One might think of an additional file that lists such 'personal' and incomplete entries for inclusion by the Perl script in Celestia.

Bye Fridger

Post #35 by **don** » 21.12.2003, 03:05

Howdy Fridger,

I'm not sure what you mean by incomplete. But I can add them as a separate file here, since nobody else would probably be interested anyway <smile>.

Hope you're having a good weekend!

-Don G.

Post #36 by **t00fri** » 21.12.2003, 10:48

Don,

the Gazetteer 'master' database has a fixed format, consisting of tab-separated columns with these 10 entries:

place name
basic place name
administrative division
basic administrative division
country
cc
admin.center of country/region/both/none
current population
latitude
longitude

'place name' is e.g. the name (in UTF-8 encoding) that the local Indians use when they ride by your ranch :lol:

'basic place name' is the ISO-8859-1 (Latin-1) projected name that I have been using so far.

'cc' is the 2-letter country code,

just to name a few.

Celestia (so far) makes use only of four entries: 'basic place name', 'current population', 'latitude', 'longitude', the latter two in degrees with decimal fractions.

As soon as we have UTF-8 font display incorporated, switching to the UTF-8 names will cost me just one modified column index in the Perl script!
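To illustrate the column layout and the "one column index" switch, a Python sketch that parses one line of the master file into the four values Celestia uses. The column names follow the list of 10 entries; the snake_case identifiers, the `NAME_COLUMN` switch, and the assumptions that population is an integer and coordinates are signed decimal degrees are illustrative, not Fridger's Perl script.

```python
from dataclasses import dataclass

# Column order of the Gazetteer master file (10 tab-separated entries).
COLUMNS = [
    "place_name", "basic_place_name", "admin_division",
    "basic_admin_division", "country", "cc", "admin_center",
    "population", "latitude", "longitude",
]
# Switching the labels to UTF-8 names would mean pointing this at "place_name".
NAME_COLUMN = COLUMNS.index("basic_place_name")


@dataclass
class Location:
    name: str
    population: int
    latitude: float   # decimal degrees (assumed signed, north positive)
    longitude: float  # decimal degrees (assumed signed, east positive)


def parse_master_line(line: str) -> Location:
    """Extract the four values Celestia uses from one tab-separated line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    return Location(
        name=fields[NAME_COLUMN],
        population=int(fields[COLUMNS.index("population")]),
        latitude=float(fields[COLUMNS.index("latitude")]),
        longitude=float(fields[COLUMNS.index("longitude")]),
    )
```

Rejecting lines with the wrong column count is one way to enforce that every master entry has all 10 columns specified.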

So altogether, I find it sensible that all places that I enter into the master data base should have the 10 columns completely specified. Then I can use the same Perl script at a later stage to extract additional info (if needed) for all entries.

That's all. I could, however, just open up another file with "Celestia only" data (like yours), which I would read in with Perl /after/ the Gazetteer data has been read.
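A Python sketch of that two-stage read, under stated assumptions: the "Celestia only" file is taken to hold one tab-separated `name, population, latitude, longitude` line per place (lines starting with `#` are comments), and an extra entry is skipped if a Gazetteer entry with the same name and coordinates (rounded to 0.01°) already exists. The file format, the de-duplication rule and the function names are hypothetical.

```python
import csv
import io


def read_celestia_only(text):
    """Parse hypothetical 'Celestia only' lines: name<TAB>population<TAB>lat<TAB>lon."""
    entries = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        name, pop, lat, lon = row
        entries.append((name, int(pop), float(lat), float(lon)))
    return entries


def merge(gazetteer, extra):
    """Append extra entries after the Gazetteer entries, skipping any whose
    name and rounded coordinates are already present."""
    def key(entry):
        name, _pop, lat, lon = entry
        return name, round(lat, 2), round(lon, 2)

    seen = {key(e) for e in gazetteer}
    merged = list(gazetteer)
    for e in extra:
        if key(e) not in seen:
            merged.append(e)
            seen.add(key(e))
    return merged
```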

Bye Fridger

Post #37 by **don** » 21.12.2003, 19:42

Howdy Fridger,

Ahhh, Gazetteer-format incomplete. Yes, now I understand, and agree with you. Sorry I am so dense at times. My mind is quite fragmented right now, working on Celestia stuff, holiday stuff, and trying to down-size, clean up, and update a 10,000 file, 1,055 page web site. So please excuse my momentary mental lapses for a while <smile>.

Regarding your "Celestia only" data idea, I'm not sure that anyone else cares to know about little towns like the ones around where we live. After all, they are quite easy to add manually <smile>.

Cheers,

-Don G.

Post #38 by **t00fri** » 22.12.2003, 21:49

Hi all,

meanwhile, I did quite a few more tunings of my importance weights and many
corrections of lat-long coordinates, notably including numerous towns in
New Zealand [thanks a lot, Neil!]. So here is the next update of my
monster locations file for you to play with. I strongly encourage
feedback of any kind!

http://www.shatters.net/~t00fri/earth-Gazetteer-2.ssc.zip

The tuned weights produce a much less crowded label display compared
to the previous version, yet avoid extreme zooming-in to make all
labels show up eventually. Still, there are a few 'problem zones'
left: Paris, for example, and also the densely populated region in
Germany near Köln-Düsseldorf-Dortmund... But most other regions on
earth seem quite fine now, Great Britain and Holland, for example. Let
me know how things look on YOUR display! I suggest you have a look at
the string of pearls of small Caribbean islands and the many
non-overlapping labels showing up at some point... or northern India,
or the US, of course...

Walton's approach incorporates a few very good ideas that mine is missing.
His approach is better suited to account for local label density
fluctuations by exploiting information about the individual neighbors of a
given town!

So I guess, when Walton is again available, we'll start another pleasant
collaboration on this most interesting and important mathematical problem...

Enjoy.

Bye Fridger

Post #39 by **Neil** » 23.12.2003, 09:30

Fridger
Nice to see your updated gazetteer, but for New Zealand I am seeing many small places being displayed well before substantially larger centres while zooming in (this did not happen in your original gazetteer)

cheers
Neil

Post #40 by **t00fri** » 23.12.2003, 10:23

Yes indeed, and of course I understand why. So I have to continue working on a version 3 ;-)

Bye Fridger
