“Integrating Parallel and Distributed Data Mining Algorithms into the NASA Earth Exchange (NEX)” was one of ten proposals selected from among fifty-four submissions to the Computational Modeling Algorithms and Cyberinfrastructure (CMAC) program under NASA’s ROSES 2011 call for proposals. The proposal team is led by Dr. Nikunj Oza (NASA) as principal investigator, with Bryan Matthews (SGT) as co-investigator and several colleagues from Code TN, California State University Monterey Bay (CSUMB), and the University of Minnesota. The project will run for two years.
NEX combines state-of-the-art supercomputing, Earth system modeling, workflow management, NASA remote sensing data feeds, and a knowledge sharing platform to deliver a complete work environment where users can explore and analyze large datasets, run modeling codes, collaborate on new or existing projects, and quickly share results among Earth science communities. NEX is currently under development. A portion of the NASA Ames Pleiades supercomputer and a substantial amount of disk space have been set aside for NEX, and part of the virtual machine management infrastructure has been completed. Some global Landsat-based science work has already been done as part of a pilot study. NEX is acquiring data from several NASA Distributed Active Archive Centers (DAACs) at a rate of about two terabytes per day. Current work is focused on developing seamless integration between the NEX web portal and the supercomputing assets.
NEX currently offers only a limited set of data mining algorithms: the NASA Ames-developed anomaly detection algorithms IMS and Orca. Neither of these algorithms exploits the full power of Pleiades. NASA Ames has also developed distributed data mining algorithms that have yet to be integrated into NEX. As a result, users who want to analyze NEX-hosted data must install and run their own or third-party algorithms on this vast amount of data. Running an analysis tool the user did not write requires the user to obtain the tool, figure out how it works, write and run code to convert the data from its stored format into the format the tool expects, run the tool, write and run code to convert the results into the desired form, and then assess the results. A further complication is that many tools are not ready to run within NASA’s supercomputing environment, which powers the NEX platform.
BACKGROUND: The objective of this project is to augment NEX with several existing components developed through various NASA programs:
This project will extend the anomaly detection framework to incorporate data mining algorithms developed at NASA and the University of Minnesota. It will also allow other users to add their own machine learning, data mining, statistics, and other data analysis methods, so that they can make the best use of NEX’s distributed supercomputing assets when analyzing NEX datasets.
NASA PROGRAM FUNDING: Computational Modeling Algorithms and Cyberinfrastructure (CMAC) program
COLLABORATORS: PI: Nikunj C. Oza (NASA), co-I: Bryan Matthews (SGT), co-I: Rama Nemani (NASA), co-I: Vipin Kumar (University of Minnesota), co-I: Andrew Michaelis (CSUMB), and co-I: Petr Votava (CSUMB)
Contact: Nikunj C. Oza