
A New Paper All About #yellowballs

PIA18909_fig1

There is a new Milky Way Project paper in the news today, concerning the #yellowballs that were found by Milky Way Project volunteers.

The Yellowballs appeared on the very first day of the Milky Way Project when user kirbyfood asked ‘what is this?’ and I wasn’t sure so jokingly called it a ‘#yellowball’, since that’s what it looked like. We use hashtags on talk.milkywayprojct.org, and that user, and many others, went off and tagged hundreds of the things over the next few months. Before we knew it there was a catalogue of nearly 1,000 of them. However, we still didn’t know what they really were, and so Grace Wolf-Chase, Charles Kerton, and other MWP collaborators have put a lot of effort into figuring it out. From the JPL press release:

So far, the volunteers have identified more than 900 of these compact yellow features. The next step for the researchers is to look at their distribution. Many appear to be lining the rims of the bubbles, a clue that perhaps the massive stars are triggering the birth of new stars as they blow the bubbles, a phenomenon known as triggered star formation. If the effect is real, the researchers should find that the yellow balls statistically appear more often with bubble walls.

This new paper is the fourth from the Milky Way Project, and adds to the Zooniverse’s growing list of 80+ publications made possible thanks to our amazing volunteers. You can see the complete set at zooniverse.org/publications.

The full list of volunteers who helped tag the yellowballs is shown below. Each and every one of you made a valuable contribution to this paper. Thank you to everyone who helped in this search!

KhalilaRedBird, lpspieler, greginak, LarryW, chelseanr, broomrider1970, Dealylama, Cruuux, Mirsathia, suelaine, sdewitt, stukii, kmasterdo, PattyD, HeadAroundU, Fezman92, Jakobswede, Jk478B27Ds395, Kerry_Wallis, iacomo, Ken Koester, ttfnrob, jules, Falconet, Caidoz13, Starsheriff, ascil, simonron, tyna_anna, gwolfchase, Greendragon00, Ranchi, kirbyjp, githensd, katieofoz, harbinjer, ycaruth1, embo, echong, Feylin, stock_footage, zookeeper, joke slayer, karvidsson, Furiat, Tyler Reynolds, Manjingos, cathcollins, legoeeyore, GabyB, eshafto, mtparrish, 59Vespa, amatire, TheScribblery, pschmal, Helice, norfolkharryuk, WilB, jamesw40k, koenvisser, dragonjools, Nocterror, nunyaB, hansbe, meheler, Cahethel, Alice, stellar190, mabbenson, Embyrr922, gnome_king, jumpjet2k, tchan, yoman93, and Loulouuse.

Combining Your Clicks with Milkman

I’ve been building a new app for the Milky Way Project called Milkman. It goes alongside Talk and allows you to see where everyone’s clicks go, and what the results of crowdsourcing look like. It’s open source, and a good step toward open science. I’d love feedback from citizen scientists and science users alike.

Milkman

Milkman is so called because it delivers data for the Milky Way Project, and maybe eventually some other Zooniverse projects too. You can access Milkman directly at explore.milkywayproject.org (where you can input a Zooniverse subject ID or search using galactic coordinates), or more usefully, you can get to Milkman via Talk – using the new ‘Explore’ button that now appears for logged-in users.

Clicking ‘Explore’ will show you the core view of Milkman: a display of all the clicks from all the volunteers who have seen that image and the current, combined results.

Screenshot 2014-09-09 09.14.38

Milkman 2

Milkman is a live, near-realtime view of the state of the science output from the current Milky Way Project. It might help people discussing items on Talk to understand what other objects are in the MWP images, and it hopefully shows how volunteers’ clicks are used.

Milkman uses a day-old clone of the main Zooniverse database, which means the clicks are at most 24 hours old. The clustering is performed using a technique called DBSCAN, which takes the vast array of clicks on each image and tries to automatically group them up. The resultant, averaged bubbles, EGOs, clusters, and galaxies are often better than any individual drawing, showing the power of crowdsourcing in action.
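To give a feel for what DBSCAN does with a pile of clicks, here is a minimal pure-Python sketch. It is not Milkman's actual code: the `eps` and `min_pts` values, the function names, and the idea of clustering only the (x, y) centres of the clicks are all assumptions made for illustration.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Every point within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep growing from it
    return labels

def average_clusters(points, labels):
    """Average the members of each cluster into one combined position."""
    groups = {}
    for p, lab in zip(points, labels):
        if lab >= 0:
            groups.setdefault(lab, []).append(p)
    return {lab: tuple(sum(c) / len(ps) for c in zip(*ps)) for lab, ps in groups.items()}

# Hypothetical click centres (in pixels) from several volunteers on one image.
clicks = [(100, 100), (102, 99), (98, 101), (101, 102),
          (300, 250), (302, 251), (299, 249), (500, 40)]
labels = dbscan(clicks, eps=5, min_pts=3)
centres = average_clusters(clicks, labels)
```

In this toy example the first four clicks and the next three form two groups, while the lone click at (500, 40) is left as noise, which is the behaviour that makes DBSCAN handy for crowdsourced data: stray clicks do not drag the averages around.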

Milkman is open source on GitHub and I’m happy to accept issues and feedback through the repo’s issues.

Immediate plans for Milkman include a navigable map on the homepage (to let you explore the whole galaxy), better links to other public astronomical data, and access to the current state of the reduced MWP2 catalogue as a whole. If you have ideas or requests either contact me or create an issue on GitHub.

New MWP paper outlines the powerful synergy between citizen scientists, professional scientists, and machine learning

bubble_gallery_sorted_v2

A new Milky Way Project paper was published to the arXiv last week. The paper presents Brut, an algorithm trained to identify bubbles in infrared images of the Galaxy.

Brut uses the catalogue of bubbles identified by more than 35,000 citizen scientists from the original Milky Way Project. These bubbles are used as a training set to allow Brut to discover the characteristics of bubbles in images from the Spitzer Space Telescope. This training data gives Brut the ability to identify bubbles just as well as expert astronomers!

The paper then shows how Brut can be used to re-assess the bubbles in the Milky Way Project catalog itself, and it finds that more than 10% of the objects in this catalog are really non-bubble interlopers. Furthermore, Brut is able to discover bubbles missed by previous searches too, usually ones that were hard to see because they are near bright sources.

At first it might seem that Brut removes the need for the Milky Way Project – but the truth is exactly the opposite. This new paper demonstrates a wonderful synergy that can exist between citizen scientists, professional scientists, and machine learning. The example outlined with the Milky Way Project is that citizens can identify patterns that machines cannot detect without training, while machine learning algorithms can use citizen science projects as input training sets, creating amazing new opportunities to speed up the pace of discovery. A hybrid model of machine learning combined with crowdsourced training data from citizen scientists can not only classify large quantities of data, but also address the weaknesses of each approach if deployed alone.

We’re really happy with this paper, and extremely grateful to Chris Beaumont (the study’s lead author) for his insights into machine learning and the way it can be successfully applied to the Milky Way Project. We will be using a version of Brut for our upcoming analysis of the new Milky Way Project classifications. It may also have implications for other Zooniverse projects.

If you’d like to read the full paper, it is freely available online at the arXiv – and Brut can be found on GitHub.

The Most Distant Bubble?

A little while ago Sarah Fitzmaurice, a work experience student at Zooniverse Oxford, spent a week working with the Milky Way Project database. She did some fun things with the data, including plotting the locations of many of the bubbles according to their distance from us. For many, the current canonical view of our own Galaxy comes from a combination of data sources, compiled by Robert Hurt, working at NASA JPL. The image is shown below, and you may recognise it: we use it as our Twitter/Facebook avatar. It is an artist’s impression based on several data sources and guided by astronomers.

The Milky Way may be our home in the Universe but we know startlingly little about it. One key missing piece of information for many objects in our Galaxy is their distance from us. From the Spitzer data alone, we do not know the distance to the bubbles in the MWP. For our first Data Release paper, we compared the MWP Bubble catalogue to known objects, some with distances, and this allowed us to find out how far away some of the bubbles are. This enables us to investigate how large and sometimes how massive they may be.

During her work experience week, Sarah plotted the bubbles with known distances onto Robert Hurt’s map of the Milky Way. The result is shown below. The bubbles are marked with crosses, and the size of the cross shows the relative size of the bubble. The distances to these bubbles were derived by comparing them to a known set of radio sources that are expected to look like bubbles in Spitzer data.

You can see that the bubbles generally follow the distribution of spiral arms and that it is easier to see the bubbles nearby than those farther away. This is good because it is roughly what we expect. This map also allows us to easily spot the isolated, nearby or most-distant bubbles in the project. Much of Sarah’s week was spent looking at each of the interesting bubbles and finding out some more about them.

Although there may well be more distant bubbles in the catalogue, Sarah’s map provides a candidate for ‘most distant bubble’ in the MWP. It is one of a pair of bubbles located on the far side of the Perseus arm, almost 45,000 light years away from the Sun – in the top part of the above image.

Using the new MWP coordinates tool we can take a look at this distant object, and two nice images of it are shown below. Our ‘most distant bubble’ is actually located within another larger, clearer bubble, an image of which is also given. This is a line-of-sight effect and they are not necessarily near each other.

This bubble is located literally on the other side of our Galaxy and is roughly 15 light years across. The fact that the two bubbles are positioned on top of each other makes it hard to decide which one is farther away. There are many more instances where bubbles lie on top of each other where it would be impossible to decide which is actually on top of which. The nebulous material of which these objects are made makes them hard to disentangle. In this case there are stars and IR objects on top of the smaller bubble that make it easier to pick out the nearer and farther bubble.

In this case, the distance value is derived from a radio source that we expect to be associated with a bubble. Both of these bubbles lie at roughly the correct position to be associated with the radio source. Since we know the radio source is very far away, we can say that the smaller bubble is most likely the object associated with the radio source.

These kinds of confusing caveats are one of the things that make Galactic astronomy difficult and challenging. For these reasons, this might be the most distant bubble we know of in the MWP – or it might not. Either way, this awesome little bubble has provided the opportunity to discuss the ways that we determine the distances to objects in the MWP catalogue, and how doing astronomy in our cosmic backyard is tricky territory indeed.

Triggered Star Formation

20120329-121951.jpg

There’s a new Milky Way Project paper out on the arXiv. It was submitted to the Astrophysical Journal last week and concerns the topic of the triggered formation of massive stars. This study was led by Sarah Kendrew and utilises the results of the first MWP paper (our catalogue of bubbles).

One of the main reasons for undertaking the MWP was to produce a large bubble catalogue that would allow statistical studies of star formation sites in our Galaxy. In the end, our first data release (DR1) produced a list of bubbles ten times larger than the previous best catalogue.

In this new study, we’ve used statistical techniques to see what correlations exist between the MWP bubbles and the RMS Catalogue: a well-used catalogue of infrared sources along the Galactic plane (a similar region to that covered by the Spitzer data used in the MWP).

The paper looks for any signs that there is a correlation between the positions of RMS sources and the positions of the MWP bubbles. Specifically we’re trying to see if such massive young stellar objects (MYSOs, stars being formed) are most commonly found on the rims of bubbles. If this is true, then it adds to evidence for a mode of star formation where the formation of some stars triggers the formation of others. In this case, young, hot stars blow out a bubble in the interstellar medium. During this process, clumps of material occur in which new stars condense and form.

This new study finds a strong correlation between MYSOs and the MWP bubbles. We find that two thirds of the MYSOs surveyed are associated with bubbles and 22% are associated with bubble rims. We also see that larger bubbles are more likely to have MYSOs on their rims – though one of the main issues we encountered is that the effect of line-of-sight confusion makes the situation complicated.
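As a rough illustration of what ‘associated with a bubble’ and ‘associated with a bubble rim’ can mean in practice, here is a small Python sketch. It is not the statistical method used in the paper: it treats bubbles as circles on a flat sky, and the rim limits (`rim_inner`, `rim_outer`) and function names are hypothetical values chosen only for this example.

```python
import math

def classify_source(source, bubble, rim_inner=0.8, rim_outer=1.2):
    """Return 'interior', 'rim' or 'outside' for one source/bubble pair."""
    (sx, sy), (bx, by, radius) = source, bubble
    # Separation in units of the bubble radius (flat-sky approximation).
    r = math.hypot(sx - bx, sy - by) / radius
    if r < rim_inner:
        return "interior"
    if r <= rim_outer:
        return "rim"
    return "outside"

def association_fractions(sources, bubbles):
    """Fraction of sources associated with any bubble, and with any rim."""
    near_any = on_rim = 0
    for s in sources:
        tags = {classify_source(s, b) for b in bubbles}
        if tags & {"interior", "rim"}:
            near_any += 1
        if "rim" in tags:
            on_rim += 1
    return near_any / len(sources), on_rim / len(sources)

# Made-up positions: one bubble of radius 10 and four sources.
bubbles = [(0, 0, 10)]
sources = [(1, 1), (10, 0), (30, 30), (0, 9)]
frac_bubble, frac_rim = association_fractions(sources, bubbles)
```

With these made-up numbers, three of the four sources fall in or on the bubble and two of them sit on the rim. A simple count like this cannot tell a real association from a chance alignment, which is exactly the line-of-sight problem mentioned above.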

This second paper is the first to follow on from the MWP DR1 paper, and there are more planned. You can read the paper on arXiv. The Milky Way Project itself, and this study, were presented at the UK/Germany National Astronomy Meeting this week in Manchester.

20120329-122008.jpg

Data Release 1

We submitted the first Milky Way Project paper to the Monthly Notices of the Royal Astronomical Society (MNRAS) in December and the referee has been very kind to us so far. We have our fingers crossed for acceptance soon. Thanks to recent media coverage and some awesome buzz at the recent AAS meeting we decided to go ahead and post our paper to the arXiv yesterday. In addition to the paper, which explains how the catalogue was created from all your bubble drawings, we have also made the data available on the MWP site. You can explore the data graphically or download various files on our data page.

DR1

Data release 1 (DR1) currently consists of a catalogue of large bubbles, a catalogue of small bubbles and a set of ‘heat maps’ (more on that in a moment). We are aiming to add green knots, red fuzzes, star clusters and galaxies to this list later in the year. We’ve called it DR1 because we also hope to refine and improve our catalogue – partly based on feedback from the community – and release a second set of data (DR2) later in 2012. Hopefully 2012 will be a big year for the project!

We have also nearly finished the process of creating our ‘heat maps’. These are the maps of raw clicks that show the true crowd-sourced view of where bubbles are located in our galaxy. They look amazing and are incredibly detailed and rich. They represent something new for the Zooniverse, and for the scientific community, and it will be interesting to see if they can be useful when released into the wild. If you’d like a sneak peek you can download one of the 3°x2° regions of the galaxy here and try it out – this is the region seen in the image above. It is a 5.4 MB FITS file, centred around 18° Galactic longitude and shows the raw bubble drawings that were used in the DR1 release. Every bubble is given the same, tiny opacity and so as the bubbles coincide we start to see the regions of the sky where everyone agrees that bubbles are present. (A 220 MB file of public Spitzer data for this region can also be downloaded as FITS here, for comparison.)
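The idea of stacking drawings with a tiny, equal opacity can be sketched in a few lines of Python. This is only an illustration and not the code used to make the DR1 heat maps: it draws circular rings rather than the elliptical shapes volunteers actually draw, and the `opacity` and `thickness` values are assumptions.

```python
import math

def heat_map(bubbles, width, height, opacity=0.05, thickness=1.0):
    """Add every drawn ring to a pixel grid with the same small opacity."""
    grid = [[0.0] * width for _ in range(height)]
    for cx, cy, radius in bubbles:
        for y in range(height):
            for x in range(width):
                d = math.hypot(x - cx, y - cy)
                # A pixel belongs to the ring if it lies close to the drawn radius.
                if abs(d - radius) <= thickness:
                    grid[y][x] += opacity
    return grid

# Three overlapping drawings of (roughly) the same bubble, plus one elsewhere.
drawings = [(10, 10, 5), (10, 10, 5), (11, 10, 5), (30, 30, 3)]
grid = heat_map(drawings, width=40, height=40)
```

Pixels where several drawings overlap end up brighter than pixels touched by a single drawing, so agreement between volunteers shows up directly as intensity in the map.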

The other big change that we need to make to the site in the next few days is the release of an official ‘authors’ page, crediting all our citizen scientist volunteers. 40,000+ individuals have taken part in the MWP and those who contributed to DR1 will be credited on the site soon. I’ll blog when that happens to let you know.

Keep Clicking!

All of this doesn’t mean the MWP is over though: far from it. In fact, the classifications you make now will be collectively refining and improving the data we have produced so far. We have plans, which I’ll explain at a later time, to modify the MWP interface so that each classification contributes more efficiently to the final result. We also have new data to come in 2012 that will mean we can search for bubbles in whole new regions of the sky. Very exciting and there is much to look forward to in 2012!

Creating a Bubble Catalogue

In recent weeks, I’ve spent much of my time figuring out how to use all of your drawings to determine where the bubbles are in the Spitzer data. About a month ago we had a breakthrough. Thanks to a lengthy conversation with MWP science guru Matthew Povich, I realised that one of the reasons it is so hard to determine where a bubble should be drawn is that sometimes there is no right answer! There are many bubbles in the MWP that people would disagree on how to draw – the reason is that there is often not necessarily a right answer to the question “where is the bubble?”.

An example of just such a bubble is shown below, with all user drawings shown next to it. You can see that this bubble just isn’t that easy to draw and that there are even two or three structures within the image that one could call a bubble. Instead of trying to make this fit a rigid one-bubble definition, we realised that we should be using the human ability to recognise patterns. After all – this is exactly what you are all so good at, and computers are sometimes not.

Matthew and I decided that what we should do in these instances is simply allow two (or even three) bubbles to be deemed as ‘real’. The inner, red structure is a kind of bubble, and so is the open-ended green bubble just outside of it. One could also perceive a third bubble just below and to the left of these, and many people appear to have drawn just that. (This is in addition to the multitude of smaller bubbles around the edge, of course.) Whatever catalogue is produced by our data reduction, it probably should include at least the first two structures if enough people draw them.

This decision has made creating a cleaned bubble catalogue much easier. The data reduction process described in my February blog post is still the process I’m using, although it has been greatly refined. More importantly, since February an enormous number of new bubbles have been drawn and this means the averaging process produces better results. Below you can see some results of the latest efforts and hopefully you’ll agree that what is being produced is a good catalogue, based on what you have all drawn. For the sake of testing, I am using one 3-by-2 degree section of the data. This is the region +12 degrees from the galactic centre and contains several interesting and complex features – which makes it a good testing ground.

Below you can see the 3×2 degree tile on its own, with all of your 7,000+ bubbles drawn on top and with the resultant ‘cleaned’ bubbles as well. You can click on any of the images to see the full version.

I have also been looking into other techniques for extracting the bubbles as the crowd sees them. Below you can see just the raw bubble data, drawn by users for this tile. With the background removed, we can use a simple contrast ratio to create a threshold, which we use to cut-out the bubbles from the original image.
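One way to read the thresholding step is sketched below in Python. This is my own minimal interpretation rather than the exact procedure: it assumes the raw drawings have already been stacked into a grid of intensities (`heat`), and it takes the ‘contrast ratio’ to be a fraction (`ratio`, a hypothetical parameter) of the brightest value in that grid.

```python
def cut_out(image, heat, ratio=0.5):
    """Keep image pixels only where the stacked drawings are bright enough."""
    peak = max(max(row) for row in heat)
    threshold = ratio * peak  # threshold set relative to the brightest pixel
    return [
        [pix if h >= threshold else 0 for pix, h in zip(image_row, heat_row)]
        for image_row, heat_row in zip(image, heat)
    ]

# Tiny 2x2 example: pixel values from the image and stacked drawing intensities.
image = [[1, 2], [3, 4]]
heat = [[0.0, 0.3], [0.4, 0.1]]
masked = cut_out(image, heat, ratio=0.5)
```

Only the two pixels where the drawings pile up survive; everything else is set to zero, leaving a cut-out of the original image shaped by the crowd.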

This is another method for extracting data, and although it is harder to define a rigid catalogue of bubbles using this method, it may still have use in mapping regions of star formation in our galaxy.

What Are Yellowballs?

Some users of the Milky Way Project’s Talk site have tagged images containing what look like small yellow knots. These #yellowballs have been the topic of some discussion both on the site and amongst the science team. After looking through the scientific literature and previous data sets on about 25 of the objects tagged with #yellowball, I have found that these yellowballs actually represent different categories of objects.

GLM_33730-0023_mosaic_I24M1

Some are compact or ultra-compact HII regions. Such objects can be thought of as small but very bright bubbles, so bright in fact that they have saturated the images in 8 and 24 micron (green and red), resulting in the appearance of a yellow ball. These small bubbles represent very early stages of the formation of massive stars, and are as such very interesting objects for the Milky Way Project science team! Examples include AMW43377df (above) and AMW435d93f (below). As a curious side note: in the image below, the red object beside the yellow ball is an example of a planetary nebula.

A yellowball image containing a planetary nebula.

The rest of the investigated objects turned out to be not bright enough to be compact HII regions. Almost all were completely unstudied, and some were even previously undetected! So what could they be? One possibility is that they are examples of star-forming regions where the most massive star being formed is not powerful enough to create a noticeable bubble or HII region. This class of objects has not been studied enough in the past, mainly because of the great difficulty in detecting them! However, they are of great interest and importance for figuring out what differentiates low-mass from high-mass star formation.

In summary, yellowballs are of great interest to the Milky Way Project science team, and we encourage you to keep tagging them. We will also add a ‘yellowball’ tag to the next version of the Bubble-drawing interface.

Yellowballs are composed of different classes of objects and most of them have been too faint to catch the eyes of astronomers in previous years. Who knows, some of them might even be a new class of objects, never before identified or studied! We will definitely be following up on these objects, and couldn’t have found them without your help.

Reducing the Data

I’ve spent much of the past two weeks messing about with different ways to reduce down over 200,000 bubbles (now almost 220,000) into a sensible catalogue. This gets very messy so I will try and explain what I’ve been up to in stages. This is a process called data reduction and for a citizen science, crowd-sourced project like the MWP, it can get complicated. I thought it may interest some of you to see where we currently are in the process of turning your clicks into results.

The key part of the data reduction problem is that we have a very large set of data – the massive number of bubbles that have been drawn – and need to decide which among them are ‘similar’ to each other. We need to keep some flexibility in our definition of similarity because right now, I’m not sure what ‘similar’ means.

Essentially, bubbles are ‘similar’ when two people draw a similarly sized bubble in a similar location. This is something that sounds remarkably easy to say but was hard to do well in code. Comparing 200,000 bubbles to each other is obviously computationally intensive.

Screen shot 2011-02-22 at 10.23.07

In the end I decided that since the size of bubbles was a consideration then I would move across the galaxy, looking on ever-decreasing orders of size. To do this I split the galaxy into 2×2 degree boxes and take each box in turn. In each box I see if there are bubbles here that are of the order of the size of the box (meaning they have a maximum diameter that is between a half- and a whole-box). If there are bubbles on that scale I run a clustering algorithm and pick out groups of these bubbles with central positions clustered to within one quarter of the box size. If a cluster is found, those bubbles are then saved and removed from the whole list. I then divide the box into four and repeat until no bubbles are found.

Screen shot 2011-02-22 at 10.22.42

This method means that when a box contains no bubbles, we need not continue down in size scale, but when it does contain bubbles we always split and inspect the four child boxes. In this way we move through the galaxy, in ever-decreasing boxes, but in a fairly efficient manner.
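The box-splitting search described above can be sketched roughly as follows. This is a simplified Python illustration, not my actual reduction code: bubbles are reduced to (x, y, diameter) tuples, the greedy `group_by_centre` helper is a stand-in for the clustering step, and `min_size` and `min_votes` are hypothetical parameters added to make the example self-contained.

```python
import math

def group_by_centre(bubbles, tol):
    """Greedy stand-in for clustering: group bubbles whose centres lie within tol."""
    groups = []
    for b in bubbles:
        for g in groups:
            if math.hypot(b[0] - g[0][0], b[1] - g[0][1]) <= tol:
                g.append(b)
                break
        else:
            groups.append([b])
    return groups

def mean_bubble(group):
    """Average position and diameter of a group of similar drawings."""
    return tuple(sum(v) / len(group) for v in zip(*group))

def reduce_box(bubbles, x0, y0, size, min_size=0.05, min_votes=2):
    """Search one square box, then recurse into its four child boxes."""
    inside = [b for b in bubbles
              if x0 <= b[0] < x0 + size and y0 <= b[1] < y0 + size]
    if not inside or size < min_size:
        return []  # empty box: no need to look at smaller scales
    # Bubbles 'of the order of the box': diameter between half a box and a whole box.
    on_scale = [b for b in inside if size / 2 < b[2] <= size]
    found = []
    for group in group_by_centre(on_scale, size / 4):
        if len(group) >= min_votes:
            found.append(mean_bubble(group) + (len(group),))
            for b in group:
                inside.remove(b)  # saved bubbles leave the working list
    half = size / 2
    for dx in (0, half):
        for dy in (0, half):
            found += reduce_box(inside, x0 + dx, y0 + dy, half, min_size, min_votes)
    return found

# Made-up drawings in one 2x2 degree box: three of a large bubble,
# two of a small one, and one drawing nobody else agreed with.
drawings = [(1.0, 1.0, 1.5), (1.1, 0.9, 1.4), (0.9, 1.1, 1.6),
            (0.3, 0.3, 0.2), (0.32, 0.28, 0.22), (1.7, 0.4, 0.3)]
catalogue = reduce_box(drawings, 0.0, 0.0, 2.0)
```

Each entry in `catalogue` is (x, y, diameter, number of drawings). The large bubble is picked up at the 2 degree scale, the small one only after three rounds of splitting, and the lone drawing is never saved because nobody else drew it.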

We also have to perform the same analysis with an offset grid. This is exactly the same but making sure we catch bubbles that had fallen on the borders of boxes.

Once we have passed across the galaxy on all size scales, we need to make sure we’ve cleaned up the duplicates created by the offset grid. We do this by considering our newly created list of ‘clean’ bubbles and running through them in order of size. When we find bubbles of a similar size and location they are combined, according to the number of users that drew that bubble. This can be done more easily now that there are far fewer bubbles (in my tests we have dropped to around 5% of the initial number by this stage).
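A minimal Python sketch of this clean-up pass might look like the following. The tolerances (`pos_tol`, `size_tol`) and the exact weighting are assumptions for illustration; the only thing taken from the description above is that bubbles are visited in order of size and combined according to how many users drew them.

```python
import math

def merge_duplicates(clean, pos_tol=0.25, size_tol=0.25):
    """Merge bubbles of similar size and location, weighting by number of users.

    clean is a list of (x, y, diameter, votes) tuples.
    """
    merged = []
    # Run through the bubbles in order of size, largest first.
    for x, y, d, n in sorted(clean, key=lambda b: b[2], reverse=True):
        for i, (mx, my, md, mn) in enumerate(merged):
            close = math.hypot(x - mx, y - my) <= pos_tol * md
            similar = abs(d - md) <= size_tol * md
            if close and similar:
                total = mn + n
                # Vote-weighted average of position and size.
                merged[i] = ((mx * mn + x * n) / total,
                             (my * mn + y * n) / total,
                             (md * mn + d * n) / total,
                             total)
                break
        else:
            merged.append((x, y, d, n))
    return merged

# Two near-identical bubbles (e.g. one from each grid) and one unrelated bubble.
clean = [(1.0, 1.0, 1.0, 3), (1.1, 1.0, 1.0, 1), (5.0, 5.0, 0.2, 4)]
final = merge_duplicates(clean)
```

Here the first two entries collapse into a single bubble with four votes, positioned closer to the version that more users drew, while the third is left alone.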

Results

My initial run only looked at bubbles in the longitude range 0-30 degrees. Below are three images, showing one image from the MWP set (one of my favourites as lots of people see it differently). You can see the image, as it is shown to MWP users. Below that you see, overlaid in blue, the original bubbles as drawn by the users. In the third image you can see the same, but this time displaying the ‘cleaned’ results. In the original set the bubbles all have the same opacity, such that when they pile up you can see the similarities. The cleaned set gives the bubbles opacities according to their scores (more opaque bubbles mean more users drew them).

GLM011680081mosaicI24M1

mwp_test_all_bubbles

mwp_test_clean_bubbles

It should be noted that the cleaned image does not yet display arcs, but rather always shows an entire ellipse. This is because I am not yet including the bubble cut-outs (which you can make out in the middle image) in the data reduction. These will be included at a later time.

You can see that I’m still getting some duplication at the end of the process – I may need to sweep across the final catalogue looking for similar bubbles until I reach a convergence when all bubbles are ‘unique’. I have been experimenting with this with mixed results but will continue my efforts.

If you’re still reading, I look forward to reading your comments. As I continue to make adjustments and progress with this reduction, I shall blog the results again. Many members of the science team are also having a go at this problem and so the final result may be quite different in the end as we improve things. I hope that this is an interesting insight into some of what goes on behind the scenes of the MWP.

Your Favourite Images

When you’re drawing bubbles, star clusters and everything else all over the Milky Way, you have the option to click a little ‘star’ button to mark an image as a favourite. These are then visible in the ‘My Galaxy’ portion of the site. Primarily this is done to let you keep hold of the images that you like the most. A side effect though is that we can see which images are collectively seen as the best by the Milky Way Project community.

Below you can see the 10 most-favourited images from the Milky Way Project. I’ll let the images speak for themselves. You can click on any of them to jump into Milky Way Talk where you can learn more about them or make a comment. These images also exist as a collection in Talk, where you can also comment and discuss them as a group.