Informative Games With City Populations

Topic author
t00fri
Developer
Posts: 8772
Joined: 29.03.2002
Age: 22
With us: 22 years 8 months
Location: Hamburg, Germany

Informative Games With City Populations

Post #1 by t00fri » 12.12.2003, 10:27

Hi all,

now that I have such an interesting database of 41245 cities/towns on earth, it is very easy and most informative to play a few games with it.

As a start, I used my daily "professional math tool", Maple9, and with only 2 commands

Code:

people:=evalf(map(log10,readdata("log.txt",integer))):
histogram(people,area=count,axes=boxed,labels=["log10(population)","number of cities"],labeldirections=[horizontal,vertical],OPTS);

I generated this nice histogram from the file 'log.txt', containing the populations in each of the 41245 cities/towns. This file I just wrote out with Perl, of course.
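For readers without Maple, the same binning can be sketched in plain Python. This is only an illustration, not the worksheet used above: the helper names `read_populations` and `log10_histogram`, the default bin width, and the assumption that 'log.txt' holds one integer per line are all made up for the example.

```python
import math
from collections import Counter


def read_populations(path):
    """Read one integer population per line (assumed layout of log.txt)."""
    with open(path) as fh:
        return [int(line) for line in fh if line.strip()]


def log10_histogram(populations, bin_width=0.2):
    """Count cities per bin of log10(population); keys are lower bin edges."""
    counts = Counter()
    for p in populations:
        if p <= 0:
            continue  # log10 is undefined for zero or negative entries
        index = math.floor(math.log10(p) / bin_width)
        counts[round(index * bin_width, 10)] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Tiny made-up sample; the real input would be read_populations("log.txt").
    sample = [150, 12000, 15000, 1200000]
    for edge, n in log10_histogram(sample, bin_width=1.0).items():
        print(f"{edge:4.1f}  {n}")
```

Each key is the lower edge of a bin in log10(population), so a key of 4.0 with bin width 1.0 collects all towns between 10000 and 100000 people.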

[Image: histogram of the number of cities versus log10(population) for the world sample]

What does it tell us?

On the horizontal axis, I plotted the decadic logarithm of the populations (binned within certain ranges), and vertically you can read off the corresponding numbers of cities/towns. So on the horizontal axis, 2 = 100, 4 = 10000, 6 = 1000000 people!

You see from the plot that the largest single group of cities (14000+) on earth (or rather, in that database!) hosts between 10000 and 16000 people, while only a "few" (412) host around one million people.

AHA!

Why did I do that plot besides mere curiosity??

Of course, to optimize the calculation of an importance parameter for the location labels from this distribution... This is in progress!

Bye Fridger
Last edited by t00fri on 12.12.2003, 16:18, edited 1 time in total.

Christophe
Developer
Posts: 944
Joined: 18.07.2002
With us: 22 years 4 months
Location: Lyon (France)

Re: Informative Games With City Populations

Post #2 by Christophe » 12.12.2003, 14:10

Most certainly the symmetry of the graph is a sample artefact. It'd be interesting to draw the same graph with the comprehensive list of the 36000 French 'communes' for example.
Christophe

Re: Informative Games With City Populations

Post #3 by t00fri » 12.12.2003, 14:36

Christophe wrote: Most certainly the symmetry of the graph is a sample artefact. It'd be interesting to draw the same graph with the comprehensive list of the 36000 French 'communes' for example.


Christophe,

I have been asking myself this, too. Certainly (the left-hand side of) the distribution is not 'acceptance' corrected. So we agree for sure that notably towards the left, an unknown number of towns will be missing due to sampling bias against too small towns.

It seems, however, plausible to me that sufficiently small towns will become disfavoured eventually, since many things in daily life become increasingly difficult, if all infrastructure is missing!

A crucial argument to fold in is that my above distribution is a world average involving countries with huge populations. For Old Europe;-) a town of 10000 people may seem big, elsewhere probably not?

France and the rest of Europe are merely 'perturbations';-)

Hence, an intriguing speculation could be that the 'true' distribution is actually essentially a normal distribution around log10(population)~4.2. In other words, the missing towns on the left between 2..4 could just fill up the distribution such as to make the symmetry perfect;-).

As a physicist I know that almost everything in life is 'normally distributed';-)

Why don't you do your complete, i.e. unbiased control sample for France? I can easily project France out of my sample and we could compare...

Last but not least: despite being interesting per se, for the purpose of optimizing the importance weights of the labels, the left-hand side is entirely irrelevant...

Bye Fridger
Last edited by t00fri on 12.12.2003, 15:47, edited 1 time in total.

Re: Informative Games With City Populations

Post #4 by Christophe » 12.12.2003, 15:59

t00fri wrote: I have been asking myself this, too. Certainly (the left-hand side of) the distribution is not 'acceptance' corrected. So we agree for sure that notably towards the left, an unknown number of towns is missing due to sampling bias against too small towns. Yet, where is the threshold? 10000 or rather 20000, perhaps? Then the missing ones from your 36000 'communes' in France (and elsewhere) would not necessarily have to grossly distort the shape.

I've actually found that graph for France:
http://www.colloc.minefi.gouv.fr/colo_otherfiles_fina_loca/presentations/comm.html
It shows that most towns in France (57%) are very small (<500 inhab.)
However most people (49%) live in mid-sized or large towns (> 10000 inhab).

It should also be noted that France is a bit of an exception, since there are almost as many communes in France as in the rest of Europe (36681 / 80000).

t00fri wrote: It seems, however, plausible to me that sufficiently small towns will become disfavoured eventually, since many things in daily life become increasingly difficult, if all infrastructure is missing!

Yet people have telephones, electricity, gas, TV and other commodities in rural France, you know ;-)

t00fri wrote: As a physicist I know that almost everything in life is 'normally distributed';-)

If you apply the correct function to your dataset, everything can be!

t00fri wrote: One should also fold in that this is a world average involving countries with huge populations. For Old Europe;-) a town of 10000 people may seem big, elsewhere perhaps not?

I don't know, comparing countries and continents would be interesting.

t00fri wrote: Last but not least: despite being interesting per se, for the purpose of optimizing the importance weights of the labels, the left-hand side is entirely irrelevant...


I understand that, but the almost perfect symmetry of your graph puzzled me.
Christophe

Post #5 by t00fri » 12.12.2003, 16:04

Christophe,

I quickly did the same plot for France as above for the world. I got 932 French 'communes'.

[Image: the same histogram for the 932 French 'communes']

One can indeed see that there is a marked cut-off in the distribution around a few thousand people...
This strongly smells like a bias, at least against small French towns;-)

After all, the guys who collected the data are German;-)

Bye Fridger

Post #6 by t00fri » 12.12.2003, 16:30

Here comes India, showing quite a shift of the peak to the right, i.e. towards more people!
Also, the shape is distorted.
3473 towns are included.

[Image: the same histogram for the 3473 Indian towns]

Bye Fridger

wcomer
Posts: 179
Joined: 19.06.2003
With us: 21 years 5 months
Location: New York City

Post #7 by wcomer » 12.12.2003, 22:06

Fridger,

If you take just the right hand side, and plot on log-log chart, is it a straight line? Are we dealing with a power tail or exponential? This would say a lot about your normality hypothesis.
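The idea behind this test can be sketched numerically. On axes of log10(population) versus log10(count), a power tail is a straight line, while a Gaussian in log10(population) is a downward parabola; the second differences of equally spaced points tell the two apart. All coefficients below (9.0, 1.0, 4.0, 0.5) are made-up illustration values, not fitted to the data; only the centre 4.2 is taken from the thread.

```python
def second_differences(ys):
    """Second finite differences; all zero for points on a straight line."""
    return [ys[i + 1] - 2 * ys[i] + ys[i - 1] for i in range(1, len(ys) - 1)]


# Equally spaced x values standing in for log10(population).
xs = [4.5, 5.0, 5.5, 6.0, 6.5]

# Hypothetical power tail: log10(count) = a - b*x, a straight line.
power_tail = [9.0 - 1.0 * x for x in xs]

# Hypothetical Gaussian in x: log10(count) = a - v*(x - 4.2)^2, a parabola.
gaussian_tail = [4.0 - 0.5 * (x - 4.2) ** 2 for x in xs]

print(second_differences(power_tail))     # zero everywhere, up to rounding
print(second_differences(gaussian_tail))  # the same negative value everywhere
```

With real histogram counts the second differences would be noisy, so a line fit with its R^2 is the more practical check.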

-Walton

Post #8 by t00fri » 12.12.2003, 23:59

wcomer wrote: Fridger,

If you take just the right hand side, and plot on log-log chart, is it a straight line? Are we dealing with a power tail or exponential? This would say a lot about your normality hypothesis.

-Walton


Walton,

unfortunately, Maple9 does not offer logY scaling for histograms. Your proposal is interesting in so far as you propose to take the 'reliable' side of the distribution for making tests on its functional shape.

From my considerable 'eyeball experience' in these matters, I am convinced that the shape may indeed well be fitted with an exp{-v*(log10pop - 4.2)^2} behaviour.

I guess that I shall be 'pragmatic' now and just go on calculating optimized importance weights;-)

Bye Fridger

Post #9 by t00fri » 15.12.2003, 22:44

Hi Walton, Christophe and whoever is interested,

I have now completed my analysis of assigning the Importance weights for my monster locations file of 41245 entries such that the label density remains the same at all distances from earth. This means virtually no overlaps of labels despite the huge number. In order to fully enjoy the result, I really recommend large VTs, notably for the nightlights!

For those of you who might be interested in the mathematical derivation of my result, I have converted my extensively commented Maple9 worksheet into HTML. You can find it by clicking on this URL:

http://www.shatters.net/~t00fri/earth.html

I shall upload my monster locations file to my TextureFoundry very soon.

Bye Fridger

bravo but not an exponential tail.

Post #10 by wcomer » 16.12.2003, 00:53

Fridger,

Nicely done. I very much look forward to the finished product.

I should point out that the tail is power, NOT exponential. I pulled the data from your Maple file and took the Log10 of the population data.

Code:

4.654670339   3.942206542
5.097972276   3.566908655
5.541274213   3.074816441
5.98457615   2.596597096
6.427878087   2.021189299
6.871180024   1.477121255


This fits very well to a straight line of slope (-1.124) with R^2=0.9958. Including the peak only reduces the R^2 to 0.9852 (with slope=-1.037). I think we can safely conclude that there is no important bias in the right-hand side of the histogram, and therefore we are dealing with a power law. I doubt this will significantly affect your importance weighting function, but it might be worth checking. The Student-t distribution with 1 degree of freedom has a power tail of exponent (-1), so it would probably be a better fit than the Gaussian (or perhaps you prefer something else).
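The line fit can be checked with a few lines of plain Python. This is a minimal sketch of an ordinary least-squares fit on the six pairs listed above, read as (log10 population, log10 count); the helper name `fit_line` is made up for the example and the tool originally used for the fit is not stated in the thread.

```python
# The six (log10 population, log10 count) pairs quoted above.
points = [
    (4.654670339, 3.942206542),
    (5.097972276, 3.566908655),
    (5.541274213, 3.074816441),
    (5.98457615, 2.596597096),
    (6.427878087, 2.021189299),
    (6.871180024, 1.477121255),
]


def fit_line(pts):
    """Ordinary least-squares line; returns (slope, intercept, r_squared)."""
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    syy = sum((y - mean_y) ** 2 for _, y in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    # For a straight-line fit, R^2 equals the squared correlation coefficient.
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


slope, intercept, r2 = fit_line(points)
print(f"slope={slope:.3f}  R^2={r2:.4f}")
```

On these six points the result should agree with the slope and R^2 quoted above to the printed precision.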

cheers,
Walton

Re: bravo but not an exponential tail.

Post #11 by t00fri » 16.12.2003, 01:07

wcomer wrote: Fridger,

Nicely done. I very much look forward to the finished product.

I should point out that the tail is power, NOT exponential. I pulled the data from your Maple file and took the Log10 of the population data.

Code:

4.654670339   3.942206542
5.097972276   3.566908655
5.541274213   3.074816441
5.98457615   2.596597096
6.427878087   2.021189299
6.871180024   1.477121255


This fits very well to a straight line of slope (-1.124) with R^2=0.9958. Including the peak only reduces the R^2 to 0.9852 (with slope=-1.037). I think we can safely conclude that there is no important bias in the right-hand side of the histogram, and therefore we are dealing with a power law. I doubt this will significantly affect your importance weighting function, but it might be worth checking. The Student-t distribution with 1 degree of freedom has a power tail of exponent (-1), so it would probably be a better fit than the Gaussian (or perhaps you prefer something else).

cheers,
Walton


Walton,

first of all, the finished product is ready for download in this department!

Second, did you have a look at my figure that looks like a "triangle" (title: "Empirical Fit of Label Distribution")? I had to upload it several times with the y-labels replaced, since the Maple9 HTML conversion is buggy! Perhaps you saw an incorrect intermediate version?

I agree that log (label number) is linearly proportional to abs(log (Population-x0)) to good approximation, which is a power law.

Bye Fridger

I see it now

Post #12 by wcomer » 16.12.2003, 01:26

Fridger,

I see that triangle now. I doubted that it would have any significant effect, and I see now that it isn't even relevant. This was what you did all along. I don't use Maple, so I stopped paying attention after the first bit :P

Walton

