The NSF Astronomy Division recognized the value that the SDSS data would have to the astronomy community when it began its support of the construction and commissioning of the SDSS in 1994. The Astrophysical Research Consortium (ARC), which manages the SDSS, was required to provide the NSF with an acceptable plan for the distribution of the data to the astronomy community. The SDSS management prepared a public data distribution plan and submitted it to the Program Manager for Advanced Technologies and Instrumentation in the late fall of 1998. The plan was subsequently peer reviewed by six astronomers, under the Program Manager’s direction, and their comments were incorporated into the plan. After several iterations between the reviewers and the Program Manager, the SDSS management produced the plan presented in this document, which we will refer to as the 1999 NSF-SDSS Plan. The ATI Program Manager approved the plan in April 1999 and it has been the foundation of our schedule to release the data to the astronomy community. We propose to augment this schedule with an early release in mid-2001, which will precede the first scheduled release in January 2003 as specified in the 1999 NSF-SDSS Plan. This document describes the 1999 NSF-SDSS Plan, including the schedule and the data products, and then describes the early data release.
The SDSS data products vary in complexity and size. The full volume of data is beyond the capability of most users to store or utilize effectively. Nevertheless, a single user can undertake a significant research project with just the positions and redshifts of “only” one million galaxies. Some research projects do not require data from the complete SDSS area and hence can be accomplished before the observing phase is complete by using data that are processed and calibrated as the survey progresses. Some require significantly more information about each object than the simple redshift catalog provides, such as the measured photometry of objects and corrected image frames. Some projects may choose to use different calibration procedures or even different processing algorithms. These latter projects require the type of computing facilities that only major computing centers possess. This data distribution plan takes these various possibilities into consideration.
There is also a trade-off between the prompt availability of the data to users during the survey and the integrity of the calibrations. Recalling data is like recalling Firestone tires: it damages the credibility of the entire Survey. It is in everyone’s best interest to ensure that the released data are of high quality and have reliable calibrations.
The schedule for the release to the astronomy community, as defined in the 1999 NSF-SDSS Plan, is shown in the milestone-chart in Figure 1. Each milestone corresponds to a specific date and the release of a well-defined percentage of the final data sample. The size of the final data sample is determined by the five-year baseline survey.
The interplay between the photometric survey and the spectroscopic survey defines the SDSS observing strategy by imposing two well-defined “points of no return” on the data processing. The first occurs when a set of imaging data is determined to be good enough to allow target selection. The second occurs when the spectroscopic reductions are good enough to complete a particular “tile” on the sky. The first event is a particularly hard boundary: once we drill plates with hole positions fixed for individual targets and obtain their spectra, it would be very costly to discover that we had made an erroneous selection of objects and be obliged to re-tile, re-drill, and re-observe. Thus it is essential for the efficiency of the survey that we have a clearly defined "point of no return" for target selection. In effect, the scientific requirements related to the homogeneity of the spectroscopic survey define the timing and other procedures for acceptance of the photometric data. The schedule for the data distribution is referenced to these two points of no return.
In order to provide a statistically meaningful version of the data archive, we will release the data in yearly quanta. The complexity of the SDSS data and the need for time-consuming, repeated verifications of the calibrations create a latency: the interval between the time a quantum of data is processed and calibrated and the time the data quality is determined to have met survey requirements. The latency also includes the time it takes to package the data for distribution. This latency, which is similar to the one adopted by COBE, was expected to be eighteen months at the time of the first release. We expect that it could be gradually decreased to one year by the fifth year of the survey. As noted later, bringing the calibration up to survey requirements has taken longer than we had planned when the NSF approved the plan.
Figure 1 shows the milestones for the original 1999 plan. They determine the date of each data release and the fraction of the final data sample released. The triangles in Figure 1 define the critical milestones: the beginning of the survey, the dates that define the last observation to be included within each yearly quantum of data that will be released to the astronomy community, and the end of survey observations. We show these for the two main data components, the photometric catalog and the spectroscopic sample. The intermediate dates were chosen to be July 1, since the survey’s primary focus is the North Galactic Cap. The planning assumptions that determined the schedule are that observations of the northern sky are made during the first two quarters of every year, that the third quarter is largely lost to the monsoon season, and that the northern sky can be observed for about one month during the fourth quarter. The gray line shows the accumulation of imaging data. The black line shows the accumulation of processed and calibrated imaging data; it shows when plates can be drilled, thus enabling the spectroscopic survey. The dashed line shows the accumulation of spectroscopic data. The data obtained prior to the “points of no return” are quantized by the mid-year milestones and will be released at the times shown by the tips of the arrows. The time to process the spectroscopic data is included in the length of the gray arrow. The first release specified in the 1999 NSF-SDSS Plan will occur in January 2003, eighteen months after the July 2001 milestone. The vertical positions of the arrows define the total percentage of the final data sample available to the astronomy community at the time of the data release. The initial latency was intended to provide extra time to completely revise the calibration procedure if a problem were to be discovered during the first year of the survey.
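The relation between a mid-year milestone and its release date is simple date arithmetic, and a short sketch may make it concrete. The code below is only an illustration and is not part of the 1999 NSF-SDSS Plan; the helper name add_months is hypothetical, and it assumes milestones fall on the first day of a month.

```python
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift a first-of-month date forward by a whole number of months.

    Hypothetical helper for illustration; it assumes start.day is valid
    in the target month, which always holds for first-of-month milestones.
    """
    total = start.year * 12 + (start.month - 1) + months
    return date(total // 12, total % 12 + 1, start.day)


# July 2001 milestone plus the eighteen-month initial latency.
milestone = date(2001, 7, 1)
first_release = add_months(milestone, 18)
print(first_release)  # 2003-01-01
```

With the one-year latency hoped for by the fifth year of the survey, the same call with 12 in place of 18 would place a release on July 1 of the year following its milestone.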
Table 1. Data Products
    Product                                      Size     Form
 1. Complete Redshift Catalog                    2 GB     CD-ROM, ftp
 2. Compact Photometric Catalog                  60 GB    CD-ROM, ftp
 3. Survey Description (Status, Calibrations)    1 GB     CD-ROM, www
 4. Full photometric catalog                     400 GB   On-line, SX
 5. Atlas Images                                 1.5 TB   On-line, SX
 6. Compressed Sky Map                           300 GB   On-line, ftp
 7. ID Spectra                                   60 GB    On-line, SX
 8. Calibrations                                 5 GB     On-line, SX, ftp
 9. Corrected Imaging Frames                     15 GB    On-line, ftp
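To give a rough sense of the volumes implied by Table 1, the listed sizes can simply be summed. The sketch below is illustrative only: the variable names are ours, the sizes are copied from Table 1 as given, and the conversion of 1.5 TB to 1500 GB is an assumption.

```python
# (product, size in GB, distribution form), copied from Table 1.
# Assumption: 1.5 TB is taken as 1500 GB.
PRODUCTS = [
    ("Complete Redshift Catalog", 2, "CD-ROM, ftp"),
    ("Compact Photometric Catalog", 60, "CD-ROM, ftp"),
    ("Survey Description (Status, Calibrations)", 1, "CD-ROM, www"),
    ("Full photometric catalog", 400, "On-line, SX"),
    ("Atlas Images", 1500, "On-line, SX"),
    ("Compressed Sky Map", 300, "On-line, ftp"),
    ("ID Spectra", 60, "On-line, SX"),
    ("Calibrations", 5, "On-line, SX, ftp"),
    ("Corrected Imaging Frames", 15, "On-line, ftp"),
]

# Split the total by whether the product is distributed on CD-ROM or on-line.
cd_rom_gb = sum(size for _, size, form in PRODUCTS if "CD-ROM" in form)
online_gb = sum(size for _, size, form in PRODUCTS if "On-line" in form)
total_gb = cd_rom_gb + online_gb

print(cd_rom_gb, online_gb, total_gb)  # 63 2280 2343
```

On these figures the CD-ROM products account for only a few percent of the total, which is consistent with the remark that most users cannot store the full volume but can still work with the catalogs.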
The survey started three months later than we had forecast eighteen months ago. We are also finding that photometric calibrations with a precision of 2% are very difficult and are taking much more time than we had planned. Both of these factors would be expected to delay the date of the first release of data. While the observing phase of the survey did not begin until April of this year, science-quality data were obtained during the commissioning period, late 1998 and the first three quarters of 1999. This data sample has yielded impressive results and we have concluded that the data, once properly processed and calibrated, are worth distributing to the astronomy community. We expect to apply survey-quality calibrations to the commissioning data before March 2001. For these reasons, we now plan to release a statistically significant sample of the commissioning data during the second quarter of 2001, more than a year ahead of the date of the initial release specified in the 1999 NSF-SDSS Plan. Moreover, we intend to maintain the intermediate milestones for data release specified in the 1999 NSF-SDSS Plan, albeit with smaller fractions of the total data sample. We will maintain the date for the final data release even though observations will continue until the end of March 2005.
6.1. Details of the Early Data Release
We propose a new data distribution schedule because the long commissioning period allowed us to accumulate about 400 square degrees of scientifically valuable data before we resumed observations after repairing the secondary mirror. The chart in Figure 2 shows the milestones for the new schedule for releasing the data. Somewhat arbitrarily, we made January 1, 2000 the effective starting date of the survey in Figure 2, because it simplifies the graphical presentation. Nevertheless, we plan to use that date as the starting point for the schedule for the release of data to the astronomy community.
Regular accumulation of raw photometric data began in April 2000, and will end in the summer of 2004, except for some limited opportunities at the end of 2004. As the observing phase draws to a close, the opportunity to take imaging data, carry out target selection, drill plates and take spectra of the associated portion of the sky vanishes. Observations for the spectroscopy are now expected to begin in the last month of 2000 and are scheduled to continue to the end of March 2005. In order to meet the final milestone for the release of the complete sample we will have to decrease the latency to nine months.
Table 2. Dates for SDSS Data Release
Release          Date           Photometry   Spectroscopy
Early release    1-July-2001    5%           0%
Release 1        1-Jan-2003     15%          7%
Release 2        1-Jan-2004     47%          33%
Release 3        1-Oct-2004     68%          60%
Release 4        1-July-2005    88%          85%
Final            1-July-2006    100%         100%
Figure 2. Milestones and data fractions for the new SDSS Data Distribution Plan.
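The schedule in Table 2 can also be read as a lookup: for any date, which release is the most recent, and what fraction of the final sample is public? The following is a minimal sketch under that reading, not an official SDSS tool. The names RELEASES and available_on are hypothetical, and the dates and percentages are copied from Table 2.

```python
from datetime import date

# (release name, release date, photometry %, spectroscopy %), from Table 2.
RELEASES = [
    ("Early release", date(2001, 7, 1), 5, 0),
    ("Release 1", date(2003, 1, 1), 15, 7),
    ("Release 2", date(2004, 1, 1), 47, 33),
    ("Release 3", date(2004, 10, 1), 68, 60),
    ("Release 4", date(2005, 7, 1), 88, 85),
    ("Final", date(2006, 7, 1), 100, 100),
]


def available_on(day: date):
    """Return (name, photometry %, spectroscopy %) for the latest release
    dated on or before `day`, or None if nothing has been released yet."""
    latest = None
    for name, released, photometry, spectroscopy in RELEASES:
        if released <= day:
            latest = (name, photometry, spectroscopy)
    return latest


print(available_on(date(2004, 6, 1)))  # ('Release 2', 47, 33)
```

The loop relies on RELEASES being listed in chronological order, as it is in Table 2.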
The early data release will contain nearly 400 square degrees of area on the equator, in both the Northern and Southern skies. There will also be a small selected area of about 5 square degrees in the Northern Galactic Cap, which we observed in Spring 2000, to support the First Look Survey (FLS) of the SIRTF program. The release will also contain the early spectra that were obtained from the same areas of the sky. We propose to use two processes to make the data available to the astronomy community: open access and controlled access. In the former case, we will put all of the products, except the Atlas Images, the Science Data Base (SX), and the Corrected Frames, on the SDSS website. The latter (controlled access) will contain SX and the Atlas Images, as shown in Figure 3. These are exactly the same services as the ones provided for the SDSS Collaboration. We feel that the step from supporting the approximately 200 users in the Collaboration to supporting the whole astronomy community of several thousand is a major one, so we need to proceed carefully: many of the Fermilab resources are shared with the entire experimental program at Fermilab.
Open Access will consist of a web-based interface containing a database of gif images of the data, with clickable access to the catalog information and a simple search engine for the full photometric catalog. This will serve as a finding chart (hereafter: chart). At the same time we will also provide on-line ftp access to the Compact Photometric Catalog and Calibration information, and to Status Information via the SDSS web site. These services will be built in collaboration with Jim Gray (Microsoft), and the necessary hardware will be provided by a grant from Microsoft Research. The ftp site will also contain the corrected frames for all the internal files for the 5 square degree FLS area of the sky, with the documentation required to use the system.
Controlled Access will provide the same data products as will be found on the web-based interface. In addition, it will provide access to the high-performance search engine (SDSS Science Archive – SX) built for the survey. This will enable much larger and more sophisticated queries. We propose to create an atlas image server that will be accessible to the astronomy community. It will not only supply the individual atlas images, but it will be able to reconstruct a corrected frame in FITS format from the atlas images.
Each user will be requested to apply for an account to use the Fermilab Computing Resources, just as all members of the Collaboration have done. If we find that the resources are easily managed within the available personnel and funds, we will make SX accessible through our website. The funds to implement the SX and create the Atlas Image Server (not yet implemented) for the astronomy community were requested in the August 2000 proposal to the NSF.
Figure 3 shows the services the SDSS public data distribution system will provide. The Open Access components are the Finding Chart, the FTP Server and the WWW server. The Controlled Access side will provide access to the Science Archive (SX) and later to the Atlas Image Server. As we understand the usage patterns better, we may consider moving the Atlas Image service to the Open Access side. The facilities for these early services should be considered as a beta test of our ultimate Data Distribution System. These will be continued until the final Data Distribution System is in place.
