All posts by ttfnrob

Astronomer, Google, Father to humans and animals

A New Paper All About #yellowballs


There is a new Milky Way Project paper in the news today, concerning the #yellowballs that were found by Milky Way Project volunteers.

The yellowballs appeared on the very first day of the Milky Way Project, when user kirbyfood asked ‘what is this?’ I wasn’t sure, so I jokingly called it a ‘#yellowball’, since that’s what it looked like. We use hashtags on talk.milkywayproject.org, and that user, and many others, went off and tagged hundreds of the things over the next few months. Before we knew it there was a catalogue of nearly 1,000 of them. However, we still didn’t know what they really were, and so Grace Wolf-Chase, Charles Kerton, and other MWP collaborators have put a lot of effort into figuring it out. From the JPL press release:

So far, the volunteers have identified more than 900 of these compact yellow features. The next step for the researchers is to look at their distribution. Many appear to be lining the rims of the bubbles, a clue that perhaps the massive stars are triggering the birth of new stars as they blow the bubbles, a phenomenon known as triggered star formation. If the effect is real, the researchers should find that the yellow balls statistically appear more often with bubble walls.
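The statistical check described in the press release could be sketched as follows. This is a minimal, hypothetical illustration, not the analysis from the paper: it assumes bubbles are circles given as (x, y, radius) in pixel coordinates, counts a yellowball as "on a wall" if it lies within a tolerance `tol` of a rim, and compares that fraction with the fraction for randomly placed control points in the same field. All function names and parameter values are illustrative.

```python
import math
import random


def near_rim(point, bubbles, tol):
    """True if the point lies within `tol` pixels of any bubble rim (circle)."""
    x, y = point
    for bx, by, r in bubbles:
        if abs(math.hypot(x - bx, y - by) - r) <= tol:
            return True
    return False


def rim_fraction(points, bubbles, tol):
    """Fraction of points that sit on (near) a bubble wall."""
    if not points:
        return 0.0
    return sum(near_rim(p, bubbles, tol) for p in points) / len(points)


def rim_excess(yellowballs, bubbles, width, height, tol=2.0, n_random=10000, seed=0):
    """Return (observed rim fraction, rim fraction for uniformly random positions)."""
    rng = random.Random(seed)  # fixed seed so the control sample is reproducible
    control = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n_random)]
    return rim_fraction(yellowballs, bubbles, tol), rim_fraction(control, bubbles, tol)


# Hypothetical field: one bubble centred at (50, 50) with radius 20 px.
bubbles = [(50.0, 50.0, 20.0)]
yellowballs = [(70.0, 50.0), (50.0, 30.0), (10.0, 10.0)]
observed, expected = rim_excess(yellowballs, bubbles, width=100.0, height=100.0)
print(f"on a rim: {observed:.2f} observed vs {expected:.2f} for random positions")
```

If yellowballs really favour bubble walls, the observed fraction should sit well above the random one; a real analysis would also need a proper significance estimate and a more realistic control sample.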

This new paper is the fourth from the Milky Way Project, and adds to the Zooniverse’s growing list of 80+ publications made possible thanks to our amazing volunteers. You can see the complete set at zooniverse.org/publications.

The full list of volunteers who helped tag the yellowballs is shown below. Each and every one of you made a valuable contribution to this paper. Thank you to everyone who helped in this search!

KhalilaRedBird, lpspieler, greginak, LarryW, chelseanr, broomrider1970, Dealylama, Cruuux, Mirsathia, suelaine, sdewitt, stukii, kmasterdo, PattyD, HeadAroundU, Fezman92, Jakobswede, Jk478B27Ds395, Kerry_Wallis, iacomo, Ken Koester, ttfnrob, jules, Falconet, Caidoz13, Starsheriff, ascil, simonron, tyna_anna, gwolfchase, Greendragon00, Ranchi, kirbyjp, githensd, katieofoz, harbinjer, ycaruth1, embo, echong, Feylin, stock_footage, zookeeper, joke slayer, karvidsson, Furiat, Tyler Reynolds, Manjingos, cathcollins, legoeeyore, GabyB, eshafto, mtparrish, 59Vespa, amatire, TheScribblery, pschmal, Helice, norfolkharryuk, WilB, jamesw40k, koenvisser, dragonjools, Nocterror, nunyaB, hansbe, meheler, Cahethel, Alice, stellar190, mabbenson, Embyrr922, gnome_king, jumpjet2k, tchan, yoman93, and Loulouuse.

Our Sentimental Galaxy

More than 25,000 comments have been made on Milky Way Project Talk since the project began in 2010. That’s a lot of content in itself, beyond the classification data collected through the MWP’s main interface.

I’ve been using the Python-based Natural Language Toolkit (NLTK) to perform what’s called sentiment analysis on Zooniverse Talk data. Some of the most stunning results come from the Milky Way Project’s rich dataset.

The process is oddly simple – thanks mostly to NLTK’s great documentation. You train an algorithm to recognise positive and negative words and phrases in text, then go through all the MWP subjects in Talk, looking at the things people say about them and recording whether the comments are positive or negative. If a comment is really positive (e.g. people say ‘stunning’, ‘wonderful’, ‘brilliant’) then it gets a score around 1. If it’s negative (e.g. people say ‘horrible’, ‘stupid’, ‘disgusting’) then it gets a score around 0. Of course, most subjects come in somewhere in between.
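NLTK ships its own classifiers (for example `nltk.classify.NaiveBayesClassifier`), but to show the idea without extra dependencies, here is a hand-rolled two-class naive Bayes sketch. It is not the code from the GitHub repository: the regex tokenizer, the add-one smoothing, and the tiny training phrases are assumptions for illustration. The score it returns is the probability that a comment is positive, so it lands near 1 for glowing comments and near 0 for negative ones.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; a crude stand-in for NLTK's tokenizers."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Two-class (pos/neg) multinomial naive Bayes with add-one smoothing.
    Both labels must appear in the training data."""

    def __init__(self):
        self.word_counts = {"pos": Counter(), "neg": Counter()}
        self.doc_counts = Counter()

    def train(self, labelled_texts):
        for text, label in labelled_texts:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))

    def _log_score(self, tokens, label):
        vocab_size = len(set(self.word_counts["pos"]) | set(self.word_counts["neg"]))
        total = sum(self.word_counts[label].values())
        # Log prior for the label, then one smoothed log likelihood per token.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for tok in tokens:
            score += math.log((self.word_counts[label][tok] + 1) / (total + vocab_size))
        return score

    def positive_probability(self, text):
        """P(pos | text): near 1 for positive comments, near 0 for negative ones."""
        tokens = tokenize(text)
        diff = self._log_score(tokens, "neg") - self._log_score(tokens, "pos")
        # 1 / (1 + e^diff), written two ways to avoid overflow for large |diff|.
        if diff > 0:
            return math.exp(-diff) / (1.0 + math.exp(-diff))
        return 1.0 / (1.0 + math.exp(diff))


# Hypothetical training phrases; a real run would use a large labelled corpus.
clf = NaiveBayesSentiment()
clf.train([
    ("stunning wonderful image", "pos"),
    ("brilliant and beautiful", "pos"),
    ("horrible blurry mess", "neg"),
    ("stupid disgusting artefact", "neg"),
])
print(clf.positive_probability("What a stunning, brilliant bubble"))  # roughly 0.8
print(clf.positive_probability("horrible and stupid"))                # roughly 0.33
```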

So here are the results: the 20 most positively commented-on images from the MWP. It’s a lovely set, and you can see why people were so positive about these images.

On the flip side, here are the 20 most negatively commented-on images. You see a mix of difficult-to-classify and blown-out images.
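Turning per-comment scores into top and bottom lists comes down to averaging the scores for each subject and sorting. This is a minimal sketch, assuming each comment has already been given a score between 0 and 1; the subject IDs are hypothetical, and `min_comments` is an assumed knob to stop subjects with a single comment from dominating either list.

```python
from collections import defaultdict
from statistics import mean


def rank_subjects(scored_comments, n=20, min_comments=1):
    """scored_comments: iterable of (subject_id, score in [0, 1]).
    Returns (most_positive, most_negative), each a list of (subject_id, mean_score)."""
    by_subject = defaultdict(list)
    for subject_id, score in scored_comments:
        by_subject[subject_id].append(score)
    means = [(sid, mean(scores)) for sid, scores in by_subject.items()
             if len(scores) >= min_comments]
    means.sort(key=lambda item: item[1], reverse=True)  # highest mean first
    return means[:n], means[::-1][:n]


# Hypothetical scored comments for three subjects.
comments = [("A", 0.9), ("A", 0.7), ("B", 0.1), ("C", 0.5)]
top, bottom = rank_subjects(comments, n=2)
print([sid for sid, _ in top], [sid for sid, _ in bottom])
```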

I’m now looking at ways to use this sort of sentiment analysis to extract interesting images from Talk and highlight them to moderators and science teams. It’s something I’ve been toying with on-and-off for several projects – not just the MWP. The Zooniverse Advent Calendar seems like a great time to share and see what people think of this idea.

You can find my code on GitHub along with other examples. As well as the MWP there are galleries for Galaxy Zoo and Snapshot Serengeti.

Combining Your Clicks with Milkman

I’ve been building a new app for the Milky Way Project called Milkman. It goes alongside Talk and allows you to see where everyone’s clicks go, and what the results of crowdsourcing look like. It’s open source, and a good step toward open science. I’d love feedback from citizen scientists and science users alike.


Milkman is so called because it delivers data for the Milky Way Project, and maybe eventually some other Zooniverse projects too. You can access Milkman directly at explore.milkywayproject.org (where you can input a Zooniverse subject ID or search using galactic coordinates), or more usefully, you can get to Milkman via Talk – using the new ‘Explore’ button that now appears for logged-in users.

Clicking ‘Explore’ will show you the core view of Milkman: a display of all the clicks from all the volunteers who have seen that image and the current, combined results.


Milkman is a live, near-realtime view of the state of the science output from the current Milky Way Project. It might help people discussing items on Talk to understand what other objects are in the MWP images, and it hopefully shows how volunteers’ clicks are used.

Milkman uses a day-old clone of the main Zooniverse database, which means it is at most 24 hours behind the live project. The clustering is performed using a technique called DBSCAN, which takes the vast array of clicks on each image and tries to group them automatically. The resulting averaged bubbles, EGOs, clusters, and galaxies are often better than any individual drawing, showing the power of crowdsourcing in action.
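For readers curious about what DBSCAN does with the clicks, here is a minimal pure-Python sketch. It is not Milkman's code: it reduces each drawing to a single (x, y) click position, and the `eps` (neighbourhood radius in pixels) and `min_samples` values are hypothetical. A click with at least `min_samples` neighbours within `eps` (counting itself) seeds a cluster, clusters grow through other such dense clicks, and isolated clicks are labelled noise (-1). Each cluster is then averaged into one consensus position.

```python
import math


def dbscan(points, eps, min_samples):
    """Label each (x, y) point with a cluster index, or -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if math.hypot(xi - xj, yi - yj) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:  # only core points keep expanding
                queue.extend(j_neighbours)
    return labels


def cluster_centres(points, labels):
    """Average the clicks in each cluster into one consensus position."""
    sums = {}
    for (x, y), label in zip(points, labels):
        if label == -1:
            continue
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


# Hypothetical clicks from several volunteers on one image.
clicks = [(10, 10), (11, 10), (10, 11), (50, 50), (51, 51), (50, 51), (200, 5)]
labels = dbscan(clicks, eps=3.0, min_samples=3)
centres = cluster_centres(clicks, labels)
print(labels, centres)
```

The neighbour search is a brute-force O(n²) scan, which is fine for the clicks on a single image; scikit-learn's `sklearn.cluster.DBSCAN` offers an optimised implementation with the same `eps` and `min_samples` parameters.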

Milkman is open source on GitHub, and I’m happy to receive feedback through the repo’s issue tracker.

Immediate plans for Milkman include a navigable map on the homepage (to let you explore the whole galaxy), better links to other public astronomical data, and access to the current state of the reduced MWP2 catalogue as a whole. If you have ideas or requests either contact me or create an issue on GitHub.

New MWP paper outlines the powerful synergy between citizen scientists, professional scientists, and machine learning


A new Milky Way Project paper was published to the arXiv last week. The paper presents Brut, an algorithm trained to identify bubbles in infrared images of the Galaxy.

Brut uses the catalogue of bubbles identified by more than 35,000 citizen scientists from the original Milky Way Project. These bubbles are used as a training set to allow Brut to discover the characteristics of bubbles in images from the Spitzer Space Telescope. This training data gives Brut the ability to identify bubbles just as well as expert astronomers!

The paper then shows how Brut can be used to re-assess the bubbles in the Milky Way Project catalog itself, and it finds that more than 10% of the objects in this catalog are really non-bubble interlopers. Brut is also able to discover bubbles missed by previous searches, usually ones that were hard to see because they are near bright sources.
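The re-assessment step can be pictured as: train a classifier on volunteer-labelled examples, score every catalogue entry, and flag the low scorers as likely interlopers. This sketch deliberately swaps in a very simple nearest-centroid classifier rather than Brut's actual method (which is described in the paper), and the two-number feature vectors, object IDs, and 0.5 threshold are all hypothetical.

```python
import math


def train_centroids(examples):
    """examples: list of (feature_vector, label), label 'bubble' or 'other'.
    Returns the mean feature vector for each label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, value in enumerate(features):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def bubble_score(features, centroids):
    """Score in [0, 1]: 1 at the 'bubble' centroid, 0 at the 'other' centroid."""
    d_bubble = math.dist(features, centroids["bubble"])
    d_other = math.dist(features, centroids["other"])
    if d_bubble + d_other == 0:
        return 0.5
    return d_other / (d_bubble + d_other)


def flag_interlopers(catalogue, centroids, threshold=0.5):
    """Return IDs of catalogue entries that score as probably not bubbles."""
    return [obj_id for obj_id, features in catalogue
            if bubble_score(features, centroids) < threshold]


# Hypothetical volunteer-labelled training set and catalogue entries.
training = [([1.0, 0.9], "bubble"), ([0.9, 1.0], "bubble"),
            ([0.1, 0.0], "other"), ([0.0, 0.1], "other")]
centroids = train_centroids(training)
catalogue = [("MWP-1", [0.9, 0.9]), ("MWP-2", [0.1, 0.2])]
print(flag_interlopers(catalogue, centroids))
```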

At first it might seem that Brut removes the need for the Milky Way Project, but the truth is exactly the opposite. This new paper demonstrates a wonderful synergy that can exist between citizen scientists, professional scientists, and machine learning. The Milky Way Project shows that citizen scientists can identify patterns that machines cannot detect without training, and that machine learning algorithms can then use citizen science catalogues as training sets, creating amazing new opportunities to speed up the pace of discovery. A hybrid model of machine learning combined with crowdsourced training data from citizen scientists can not only classify large quantities of data, but also address the weaknesses of each approach when deployed alone.

We’re really happy with this paper, and extremely grateful to Chris Beaumont (the study’s lead author) for his insights into machine learning and the way it can be successfully applied to the Milky Way Project. We will be using a version of Brut for our upcoming analysis of the new Milky Way Project classifications. It may also have implications for other Zooniverse projects.

If you’d like to read the full paper, it is freely available online at the arXiv, and Brut can be found on GitHub.

1,000,000 Classifications and 7 Languages

The Milky Way Project has now passed one million classifications since its relaunch a few months ago. The project is currently 75% complete, meaning there are still many, many images left to classify. That’s fine, because the project has recently become truly international, with citizen scientists around the world now able to participate in English, Spanish, German, French, Indonesian, Polish and Danish. There are more languages on the way too!

So to celebrate passing our 1,000,000 milestone I thought I’d share the homepage counter in all seven languages:

If you’re interested in helping to translate the Milky Way Project, get in touch with rob@zooniverse.org. There are many other translatable projects too!

A New Batch of Milky Way Project Data Has Arrived

After a busy December and January, we ran out of data a few weeks ago after 600,000+ classifications of the new images, but the wait is over! Last night a whole new, bigger batch of data was added to the Milky Way Project. Here are a few examples of what you might see in the data:

These new data come from the GLIMPSE 2 survey – a comprehensive infrared survey of the middle part of our galaxy. We’re also going to add some of the GLIMPSE 1 data (from the old version of the Milky Way Project) back into the site, but with the new colour stretch. We’re doing that to check the system works, but also because new features and structures will be visible with the change in data and colour palette.

We’re still crunching the data from the new classifications, but we’ve been able to extract lists of galaxies, EGOs and star clusters that you have found. We hope to share those with you soon.

So hop on over to milkywayproject.org, and let’s add another 600,000 classifications and continue mapping the galaxy.

The Project Is Complete… But Not For Long

After a fantastic (re)launch in December and a busy January, the Milky Way Project was doing well and was about 93% complete… until about 8 hours ago. Last night, the social media powerhouse that is IFLS pointed tens of thousands of people our way and in an hour they finished the project. This is obviously great news for science but some people might be wondering what happens next. 


The good news is that we have more data! The bad news is that it won’t be ready for another few weeks. In the meantime we are also working on producing some results from all your work, and you can continue to discuss things on Talk. We’ll let everyone know when we have more images to classify but for now: thank you for all your hard work and attention.

We shall return!

New Data, New Look: A Brand New Milky Way Project

The Milky Way Project (MWP) is complete. Over about three years, 50,000 volunteers trawled all our images multiple times and drew more than 1,000,000 bubbles and several million other objects, including star clusters, green knots, and galaxies. We have produced several papers already and more are on the way. It’s been a huge success, but there’s even more data!

And so it is with glee that we announce the brand new Milky Way Project! It’s got more data, more objects to find, and it’s even more gorgeous.

The new MWP is being launched to include data from different regions of the galaxy in a new infrared wavelength combination. The new data consists of Spitzer/IRAC images from two surveys: Vela-Carina, which is essentially an extension of GLIMPSE covering Galactic longitudes 255°–295°, and GLIMPSE 3D, which extends GLIMPSE 1+2 to higher Galactic latitudes (at selected longitudes only). The images combine 3.6, 4.5, and 8.0 µm in the “classic” Spitzer/IRAC color scheme.  There are roughly 40,000 images to go through.

A pair of EGOs shines below a bright star cluster

The latest Zooniverse technology and design is being brought to bear on this big data problem. We are using our newest features to retire images with nothing in them (as determined by the volunteers, of course) and to give more screen time to those parts of the galaxy where there are lots of pillars, bubbles and clusters – as well as other things. We’re marking more objects – bow shocks, pillars, EGOs – and getting rid of some older ones that either aren’t visible in the new data or weren’t as scientifically useful as we’d hoped (specifically: red fuzzies and green knots).
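A retirement rule of the kind described here might look like the sketch below. The thresholds (`blank_limit`, `max_classifications`) and the idea of recording each classification as the number of objects a volunteer marked are assumptions for illustration, not the Zooniverse's actual settings: a subject retires early if its first few volunteers all mark nothing, while busier images keep collecting classifications up to a cap.

```python
from dataclasses import dataclass, field


@dataclass
class Subject:
    subject_id: str
    # Hypothetical record: one entry per classification, the number of objects marked.
    classifications: list = field(default_factory=list)


def should_retire(subject, blank_limit=5, max_classifications=30):
    """Retire early if the first `blank_limit` volunteers all marked nothing;
    otherwise keep showing the image until it has `max_classifications`."""
    counts = subject.classifications
    if len(counts) >= blank_limit and all(n == 0 for n in counts[:blank_limit]):
        return True
    return len(counts) >= max_classifications


print(should_retire(Subject("empty-field", [0, 0, 0, 0, 0])))  # True: retired as blank
print(should_retire(Subject("busy-field", [0, 0, 3, 0, 0])))   # False: someone saw something
```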

We’ve also upgraded to the newest version of Talk, and have kept all your original comments so you can still see the previous data and the objects that were found there. The new Milky Way Project is teeming with more galaxies, star clusters and unknown objects than the original MWP.

It’s very exciting! There are tens of thousands of images from the Spitzer Space Telescope to look through. By telling us what you see in this infrared data, we can better understand how stars form. Dive in now and start classifying at www.milkywayproject.org – we need your help to map and measure our galaxy.

New Milky Way Project Poster

I’ve been diving into the bubbles database recently and ended up creating cutouts of all 3,744 large bubbles from the DR1 data release. From there it was an easy enough job to create this new Milky Way Project poster. It uses all 3,744 bubbles at least once (several are used more than once).

MWP Logo Mosaic of Bubbles
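One simple way to build a photomosaic that uses every cutout at least once is to rank both tiles and cutouts by brightness and spread the cutout ranks evenly across the tile ranks. This is a hypothetical sketch, not how the poster was actually made; it assumes each tile and each cutout is summarised by a single brightness value, and that there are at least as many tiles as cutouts (so some cutouts repeat, as several do on the poster).

```python
def assign_cutouts(tile_brightness, cutout_brightness):
    """Map each mosaic tile to a cutout index so that brighter tiles get brighter
    cutouts and every cutout is used at least once."""
    n_tiles, n_cutouts = len(tile_brightness), len(cutout_brightness)
    if n_tiles < n_cutouts:
        raise ValueError("need at least as many tiles as cutouts to use every cutout")
    tiles_by_rank = sorted(range(n_tiles), key=lambda t: tile_brightness[t])
    cutouts_by_rank = sorted(range(n_cutouts), key=lambda c: cutout_brightness[c])
    assignment = [None] * n_tiles
    for rank, tile in enumerate(tiles_by_rank):
        # Consecutive tile ranks advance by at most one cutout rank (since
        # n_cutouts <= n_tiles), and the last tile reaches the brightest cutout,
        # so no cutout is skipped.
        assignment[tile] = cutouts_by_rank[rank * n_cutouts // n_tiles]
    return assignment


# Hypothetical brightness values for five tiles and two bubble cutouts.
print(assign_cutouts([0.9, 0.1, 0.5, 0.3, 0.7], [0.8, 0.2]))
```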

I’m currently working on three new Milky Way Project papers and will be blogging about them in the next weeks and months.