Brains wanted for galaxies classification

abramson · Post #1by **abramson** » 12.07.2007, 17:03

Hi all,

A project has been brought to my attention, which may be of interest for people in this forum:

http://www.galaxyzoo.org

They recruit volunteers to analyze galaxy images from the SDSS, corresponding to unclassified galaxies, and sort them, just by looking at them. They say that the brain is more effective than the algorithms they have for this task. It may well be.

Cheers,

Guillermo

(I found about this at the Bad Astronomer's http://www.badastronomy.com)

Post #2by **ElChristou** » 12.07.2007, 17:56

Cool... but I'm wondering who will verify the hundreds of classified objects by the internet users...

Post #3by **selden** » 12.07.2007, 18:08

It's statistical: they verify each other.
Several people will inspect each image. If most say it's a clockwise spiral, then that's what it is.

Post #4by **t00fri** » 12.07.2007, 18:14

I don't think this makes a lot of sense.

-- many brains are too valuable ;-)

for such "dumb" recognition tasks. I just doubt that one cannot write satisfactory and /reproducable/ recognition code! Professionally, we are doing analogous scientific jobs since a long time by computer. Our patterns are way more complicated to assign.

-- it seems to me that some people wanted to save their coding time after the successes of Seti@home.

http://setiathome.berkeley.edu/

Seti@home was/is quite a different story!

Bye Fridger

Post #5by **ElChristou** » 12.07.2007, 18:24

t00fri wrote:...I just doubt that one cannot write satisfactory and /reproducable/ recognition code!

Yeap, it's exactly what I was thinking... I tried their trail test and get 15/15 even with some a bit tricky. I suppose that even a russian bot can do exactly the same... but much faster... :lol:

Scytale · Post #6by **Scytale** » 13.07.2007, 11:39

t00fri wrote:I just doubt that one cannot write satisfactory and /reproducable/ recognition code!

As a programmer, I incline to agree. I don't think there's anything in there that can't be solved by a bit of image processing / normalization and some straightforward code. I don't even think you need perception per se (as in percieving the shape), you could probably distinguish between an elliptical and a spiral by local analysis of the color gradients inside the image, or by eyeballing the histogram long enough.

Maybe they just need the attention.

ANDREA · Post #7by **ANDREA** » 13.07.2007, 13:52

Scytale wrote:
t00fri wrote:I just doubt that one cannot write satisfactory and /reproducable/ recognition code!
As a programmer, I incline to agree. I don't think there's anything in there that can't be solved by a bit of image processing / normalization and some straightforward code. I don't even think you need perception per se (as in percieving the shape), you could probably distinguish between an elliptical and a spiral by local analysis of the color gradients inside the image, or by eyeballing the histogram long enough. Maybe they just need the attention.

Well, I slightly disagree on this. :wink:

I have a long experience of cooperation with professional astronomers on search of many different kinds of objects, and still I can say that in many situations (but not always, obviously) the automatic search can be doubtful or wrong.
An example: in NEO search and confirmation (NEO= Near Earth Object, i.e. comets and/or Minor Planets that could rise an impact risk for the Earth), when a new object is discovered, it?€™s most important to calculate an orbit as close to the real as possible.
So it?€™s necessary to calculate the most realistic orbit, and this is very difficult, given the very short known orbit arc (sometimes just a few days), and to search on old images (Palomar?€™s POSS, POSS-2, etc) the ones that could have registered its trail.
All this is performed with dedicated software, but from this point on, the human eye must check each image in search of the tiny trail (most of times VERY VERY subtle); the error range is very big, most of times many degrees, so we need to search all around the given point, as requested by the positional error.
It?€™s a difficult, time consuming and eye-straining job, but sometimes the results are very satisfactory.
I?€™ve been so fortunate that with the SpaceGuard RHP group (Andrea Pelloni, myself- Roberto Haver- Prof. Giuseppe Forti of Arcetri Observatory) I found two new precoveries of 2003 VB12 Sedna, so its orbit became 11 years longer, from Sep 1990 to Mar 2004, a total of 13.5 years.
But this happens too when automatic recognition software is unable to discover an object, e.g. because it?€™s too close to a star, or embedded in a galaxy?€™s nucleus, as happens with Supernovae (Sn) search.
Despite the many professional search teams and telescopes dedicated to Sn search, our amateur Team (Tim Puckett?€™s, here: http://www.cometwatch.com/ ) discovered a total of 166 Supernovae, an average of 30 in the last years, all missed by automatic systems.
And, last but obviously not least, there is the cost factor: why spend a lot of taxpayer?€™s money, when in the web there is a lot of willingly people that, at the nominal charge of a citation (and not so frequent, alas!) on scientific papers, dedicate a lot of time, efforts and experience to make the job, freely?
This happens actually e.g. with the StarDust Home Project, that needs the help of thousands of scanners for the zillions of images obtained from the StarDust Probe Aerogel panels, that probably have englobed micrometric dust particles from Wild 2 Comet and, most important, from Deep Space:
http://stardustathome.ssl.berkeley.edu/about.php
Here too it?€™s a problem of money savings: a minimal staff can take care and check the images scanned and highlighted by a lot of people (each image is checked by hundreds of scanners, so only the images with a high detection rate go to the further step, i.e. a closer and higher-res scan of that portion of aerogel).
And this is working flawlessly!

We have many examples on this matter.
So I agree with Fridger and Scytale that (almost) all can be done automatically, but at the same time I can add that many of these things can be done in an extremely cheaper way, by amateurs.
And I have no evidence of signs of disapproval on this by the scientific community. :wink:

Bye

Andrea

Post #8by **t00fri** » 13.07.2007, 18:14

Andrea,

I think your personal experience is based on aspects
that only marginally relate to the specific task that was
advertised.

The answers they ask "layman-brains" to provide are
comparably TRIVIAL pattern recognition tasks
compared to what we do routinely "in grand style" in
elementary particle physics, both experimental and
theoretical.

You should e.g. see for once the amazing degree of
sophistication the software for the forthcoming LHC
collider at CERN/Geneva has reached meanwhile! The
daily bread and butter pattern recognition tasks are
many orders of magnitude harder and --of course-
solved with state of the art /computer/ methods.

Just as a small illustration: each proton proton collision
at the LHC actually constitutes up to 25 individual
processes with many 100's of elementary particles in
the final state that need to be identified and measured.
Which final state particle belongs to which of the 25
initial collisions? There are special so-called Cerenkov
detectors where detected particles leave fractions of
circles of Cerenkov light in a VERY noisy background.
The pattern recognition software not only has to localize
the many parts of circle geometries, but also needs to
measure energy and momentum of each particle in
question. The required computer task is so huge that
the whole scientific world is sharing the computer load
via the socalled GRID global network.

Here is a typical event from the RHIC heavy ion collider,
to give you a flavour about the immense task it takes to
identify and reconstruct in 3d all these particle tracks!

Some small research group may just not find the right
pattern recognition software on the "scientific market"
or lack the programming experts for doing a custom
approach. But from a "big science" perspective, I can
assure you that this website call looks kind of amusing.

Bye Fridger

Post #9by **ElChristou** » 13.07.2007, 19:34

t00fri wrote:...Here is a typical event from the RHIC heavy ion collider...

Curiosity, what's the size of this event?

Post #10by **t00fri** » 13.07.2007, 19:43

ElChristou wrote:
t00fri wrote:...Here is a typical event from the RHIC heavy ion collider...

Curiosity, what's the size of this event?

Many cubic meters!

Here is a shot from the Atlas detector at the LHC! See that tiny human being in the lower part of the image?

Bye Fridger

Post #11by **t00fri** » 13.07.2007, 19:57

Here is the worldwide GRID computing network used to unravel the event patterns of the forthcoming LHC experiments, starting early 2008.

At the LHC, we are looking for Dark Matter, Supersymmetry and many other revolutionary new discoveries...

Bye Fridger

Fightspit · Post #12by **Fightspit** » 13.07.2007, 20:04

Also, we will see if the boson of Higgs will be proved.

Post #13by **t00fri** » 13.07.2007, 21:18

Fightspit wrote:Also, we will see if the boson of Higgs will be proved.

Absolutely. But in reality, most people are interested less in confirming the Standard Model of particle physics than finding something NEW that would finally kill it ;-)

Bye Fridger

ANDREA · Post #14by **ANDREA** » 13.07.2007, 22:50

t00fri wrote:Andrea, I think your personal experience is based on aspects that only marginally relate to the specific task that was advertised. The answers they ask "layman-brains" to provide are comparably TRIVIAL pattern recognition tasks compared to what we do routinely "in grand style" in elementary particle physics, both experimental and theoretical.
You should e.g. see for once the amazing degree of sophistication the software for the forthcoming LHC collider at CERN/Geneva has reached meanwhile! The daily bread and butter pattern recognition tasks are many orders of magnitude harder and --of course-
solved with state of the art /computer/ methods.
Just as a small illustration: each proton proton collision at the LHC actually constitutes up to 25 individual processes with many 100's of elementary particles in the final state that need to be identified and measured.
?€¦. The required computer task is so huge that the whole scientific world is sharing the computer load via the so called GRID global network.
Here is a typical event from the RHIC heavy ion collider?€¦.
Some small research group may just not find the right pattern recognition software on the "scientific market" or lack the programming experts for doing a custom approach. But from a "big science" perspective, I can assure you that this website call looks kind of amusing.
Bye Fridger

Fridger, under many aspects my personal experience is close to the task that was advertised here, but I hardly understand what you are speaking of about your work, I?€™m not acquainted with high-energy physics.
But my long experience on astronomy (even if as amateur, not as professional astronomer) suggests me that this project?€™s budget is enormously more relevant that those of most of the astronomy projects I know (and I suggest you not to speak loudly of this to your colleagues astronomers :lol:

).

Just three examples:

The StarDust-Home project (The Planetary Society- http://stardustathome.ssl.berkeley.edu/about.php), based on data collected by the NASA StarDust Mission to comet Wild2, whose I was speaking in my previous post, has a not so high budget, and for this reason they were compelled, from its start, to define a policy based on the use of freely working people.
Actually about twelve thousand people are scanning the images!

The Galaxy Zoo project (the University of Oxford, the University of Portsmouth and Johns Hopkins University- USA- http://www.galaxyzoo.org/Press.aspx ), bases its project on the StarDust one, same approach, as they admit, people and not scientists make the thankless task (and this is OK, none is obliged to do it).

The MPC (Minor Planet Center, Smithsonian Astrophysical Observatory, Cambridge, http://cfa-www.harvard.edu/iau/mpc.html), the organization that catalogues ALL the known Minor Planets, checking, comparing, computing, defining, accepting or rejecting thousands of daily measurements of MP positions, in order to identify as soon as possible potentially hazardous objects, is a two and a half people venture, with access to the Tamkin Foundation Computing high-speed computer network, but with a ridiculous budget that doesn?€™t allow for extra time, so when something goes wrong the overtime they use to solve it is always unpaid!

The Sky Morph project (NASA's AISR program- http://skyview.gsfc.nasa.gov/skymorph/skymorph.html ), that takes care, gathers, memorizes, checks practically all the film, plate and electronic astronomical images from NEAT, DSS, DSS2, HST, USNO and POSS!, obtained up to now, and makes them available to all the scientists or researchers that need them to find in the past the position of particular objects like NEOs, was about closing three years ago, due to missing funds. After a kind of popular revolution among professional and amateur astronomers like me, that sent to NASA thousands of blaming messages, they obtained funds to go on for some more time, but none knows what will happen in the next future.

In science as in life, and you surely know it, Fridger, scientists may be clever or less clever, lucky or unlucky, diplomatic or not, so IMHO (beg your pardon for the acronym, but I like it) you should not say how much your group is clever and able to buy or make all the stuff you need, compared with other people of small research groups, because may be that some or many of them had not the cleverness, or the fortune, or the ability to obtain what they need.
Or perhaps they are in the wrong place at the wrong time, or their studies are not so interesting for benefactors or sponsors, who knows. :wink:

.
Even if I?€™m sure it?€™s not so, yours looks like a boastful speech towards them.

I?€™m not a scientist, and I have no debts with any of the projects I reported here, but I highly respect who tries to make his work the best he can, even if in a ?€?scientifically incorrect?€

Scytale · Post #15by **Scytale** » 14.07.2007, 14:11

But Andrea, even with very limited funding, there are a lot of people out there who would get in for free on a GPL software project aimed at classifying all this content. Why would they try to round up 12,000 people to classify by hand, when you can round up 10 programmers pro-bono, a grid, and maybe a MCS or two (although, again, at a very superficial first glance, you don't need state-of-the-art software to go through these pictures). I'm not contesting your/their work methods, I'm actually interested in how these people thought up their projects.

After all, we got doctors and bioengineers all wired up to computers, and now the astronomers are breaking the line?

These being said, I actually think I would spend less time writing a program that classifies all those galaxies, than writing a website which is supposed to introduce zounds of people to the project.

ANDREA · Post #16by **ANDREA** » 14.07.2007, 14:37

Scytale wrote:But Andrea, even with very limited funding, there are a lot of people out there who would get in for free on a GPL software project aimed at classifying all this content. Why would they try to round up 12,000 people to classify by hand, when you can round up 10 programmers pro-bono, a grid, and maybe a MCS or two (although, again, at a very superficial first glance, you don't need state-of-the-art software to go through these pictures). I'm not contesting your/their work methods, I'm actually interested in how these people thought up their projects. After all, we got doctors and bioengineers all wired up to computers, and now the astronomers are breaking the line?
These being said, I actually think I would spend less time writing a program that classifies all those galaxies, than writing a website which is supposed to introduce zounds of people to the project.

Scytale, surely you are right, so I don't know the reasons, if any, of such a behaviour.
Perhaps a reason could be found in the fear to use some free of charge but VERY clever programmer for their task, this way showing their incapability for solving ?€?in house?€

Fenerit · Post #17by **Fenerit** » 14.07.2007, 15:02

ElChristou wrote:
t00fri wrote:...Here is a typical event from the RHIC heavy ion collider...

Curiosity, what's the size of this event?

Fridger, for curiosity, in a homelike paragon, how many KW require an event like this? Intend, just the event, nor computation nor preparatives; second, how many nuclear plants supply the energy to LHC?

Post #18by **t00fri** » 14.07.2007, 15:11

Andrea,

let me add a few more considerations that come to mind in the course of our discussion about "human galaxy classification" at the level of 1 million galaxies ;-)

Firstly, when it comes to amateur projects (even possibly in collaboration with some professional astronomers) I have little to say.

Actually, I perhaps wrongly assumed that the above Galaxy Zoo classification project was entirely professional in nature. That was at least the basis for my criticism. /Professionals/ should simply know better than vasting their time in classifying 1 million galaxies ONE by ONE! That is quite INDEPENDENT of the available budget.

Professional science is a highly competitive discipline, and the fights for financial support of research projects can be fierce. Yet in most cases, first rate projects are able to win adequate support from their funding agencies!

Should a /professional/ galaxy classification project indeed be judged to be such a first rate task, it will also manage to receive some financial support for hiring 1-2 programming experts as needed to realize the appropriate pattern recognition software!

It is true that to my repeated surprise many of the professional astronomers that I know personally are not very advanced with programming techniques. Compared to the physics training, the advanced mathematics and programming education at Universities is often on a significantly lower level for astronomers. I really don't understand why this seems to be so... After all, astronomy is strongly tied to computing. Yet, my information is first hand from my wife, since she has taught repeatedly math classes for astronomers at her University (besides her usual math & physics courses for physics students, of course ). Her experience matches perfectly with what we know from a very good friend ( our "best man"), who is a theoretical astrophysicist at the University Observatory in Hamburg-Bergedorf. He was trained as a theoretical particle physicist before switching to astrophysics.

Astronomers often use those large analysis software platforms that in many respects are "heavy" and not state-of-the-art. For example, to my great surprise only about 5 years back, the well-known IRAF analysis software merely supported 8 color bitplanes (i.e. just 256 colors!!!) And that in astronomy...
After writing a surprised email to them, the answer was "lack of menpower"...

In case of projects that involve amateurs, many aspects are completely different. Amateurs are entirely "pleasure driven" and do not cost money. On the other hand, they often lack the specific (and important) know-how of professionals which then has to be compensated by proceeding more on a time-intensive "low-tech" route...

Bye Fridger

Post #19by **t00fri** » 14.07.2007, 15:29

Fenerit wrote:
ElChristou wrote:
t00fri wrote:...Here is a typical event from the RHIC heavy ion collider...

Curiosity, what's the size of this event?

Fridger, for curiosity, in a homelike paragon, how many KW require an event like this? Intend, just the event, nor computation nor preparatives; second, how many nuclear plants supply the energy to LHC?

The electricity bill for the LHC is HUGE.

The nominal annual electricity consumption reaches some 390 GWh/year when all accelerators are in operation. The nominal consumption of the whole of CERN is 1000 GWh/year. See here

Note: G=Giga=1000*Mega!

We are certainly not talking about KWh here ;-)

Here is more info directly from CERN:
http://cernenviro.web.cern.ch/cernenvir ... at=5&doc=1

The LHC accelerator will have a power rating of around 200 megawatts (MW), which is equivalent to that of LEP. By way of comparison, the peak consumption for the Canton of Geneva is 400 MW.

In my laboratory DESY in Hamburg
http://www.desy.de/html/home/index.html

the electron proton collider HERA also used a significant fraction of the power of the whole city of Hamburg!

Bye Fridger

Fenerit · Post #20by **Fenerit** » 14.07.2007, 15:48

Sincerely I thought a lot of more, although the power consumption of my house is less of 390 x 10^6 KWh/year...