The data collection and data management workflow currently used at sector 8-ID is summarized below:
Step 1,2 :
The area-detector (AD) software writes images to a remote disk server (DServe) through a network mount point.
The storage capacity currently allocated to 8-ID on DServe is around 8-10 TB (terabytes).
About 8 TB is enough for the beam-line staff to store 1-2 cycles' worth of data before the disks must be cleaned up.
Step 3,4 :
The SPEC script that controls the area detector, along with other data acquisition equipment, transfers each collected dataset using GridFTP to two remote storage systems: (i) Lustre and (ii) HDFS (Hadoop Distributed File System).
The HDFS copy is used for running the MapReduce analysis.
The Lustre copy is used for long-term storage and for sharing with users through Globus Online.
However, the beam-line staff does not trust Lustre storage because of its reliability track record.
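The dual transfer in Steps 3 and 4 can be sketched as follows. This is a minimal illustration, not the actual SPEC-driven transfer: the endpoint URLs, the local path and the dataset name are hypothetical, and it assumes the HDFS side is reachable through a GridFTP server. The sketch only builds `globus-url-copy` command lines (one per destination) and does not execute them.

```python
import shlex

# Hypothetical endpoints; the real host names and paths are not given in this document.
LUSTRE_URL = "gsiftp://lustre.example.org/data/8id/"
HDFS_URL = "gsiftp://hdfs-gateway.example.org/data/8id/"


def build_transfer_commands(local_dir, dataset_name, destinations=(LUSTRE_URL, HDFS_URL)):
    """Return one globus-url-copy command (as an argv list) per destination."""
    source = f"file://{local_dir.rstrip('/')}/{dataset_name}/"
    commands = []
    for dest in destinations:
        target = f"{dest.rstrip('/')}/{dataset_name}/"
        # -r copies the directory recursively; -cd creates missing destination directories.
        commands.append(["globus-url-copy", "-r", "-cd", source, target])
    return commands


if __name__ == "__main__":
    # Hypothetical mount point and dataset name, for illustration only.
    for cmd in build_transfer_commands("/net/dserve/8id", "sample_0001"):
        print(shlex.join(cmd))
```

Building the commands separately from running them keeps the sketch easy to inspect; a real script would pass each list to a process launcher and check the exit status before treating the dataset as transferred.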
Step 5 :
The files are made available through Globus Connect. However, 8-ID is not currently making active use of Globus Online (GO).
The collected dataset is typically an IMM file, in which images are stored as a stream of bytes, each image accompanied by a fixed-size header. The meta-information gathered by the SPEC script is stored in a .batchinfo file.
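To illustrate the general idea of a byte stream in which every image carries a fixed-size header, here is a small sketch. The header layout below is hypothetical and is NOT the real IMM header; it also assumes the header comes before the pixel data, which this document does not state.

```python
import io
import struct

# Hypothetical header layout, NOT the real IMM header: four little-endian
# unsigned 32-bit integers (frame index, rows, cols, payload size in bytes).
HEADER = struct.Struct("<IIII")


def write_frame(stream, index, rows, cols, pixels):
    """Append one frame: fixed-size header followed by the raw pixel bytes."""
    stream.write(HEADER.pack(index, rows, cols, len(pixels)))
    stream.write(pixels)


def read_frames(stream):
    """Yield (index, rows, cols, pixels) until the stream is exhausted."""
    while True:
        raw = stream.read(HEADER.size)
        if not raw:
            return  # clean end of stream
        if len(raw) < HEADER.size:
            raise ValueError("truncated header")
        index, rows, cols, nbytes = HEADER.unpack(raw)
        pixels = stream.read(nbytes)
        if len(pixels) < nbytes:
            raise ValueError("truncated image payload")
        yield index, rows, cols, pixels


if __name__ == "__main__":
    buf = io.BytesIO()
    write_frame(buf, 0, 2, 2, bytes([1, 2, 3, 4]))
    write_frame(buf, 1, 2, 2, bytes([5, 6, 7, 8]))
    buf.seek(0)
    for index, rows, cols, pixels in read_frames(buf):
        print(index, rows, cols, len(pixels))
```

Because the header size is fixed, a reader can step through the file frame by frame without an external index, which is the property that makes this kind of format convenient for streaming acquisition.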
The datasets are named as follows:
The analysis is done using MapReduce on HDFS: the user passes the dataset path to a Matlab script, which submits a job to Hadoop through an ActiveMQ-backed pipeline.
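The hand-off described above (a submitter enqueues a job request, a worker picks it up and launches the analysis) can be sketched with an in-process queue. This is only a stand-in: `queue.Queue` replaces the ActiveMQ broker, Python replaces the Matlab submitter, and the message fields are hypothetical.

```python
import json
import queue

# In-process stand-in for the ActiveMQ broker; a real deployment would use a
# messaging client instead. The message fields below are hypothetical.
job_queue = queue.Queue()


def submit_job(dataset_path, batchinfo_path):
    """Producer side: enqueue a JSON job request (the role of the submitting script)."""
    message = json.dumps({"dataset": dataset_path, "batchinfo": batchinfo_path})
    job_queue.put(message)
    return message


def next_job():
    """Consumer side: decode the next request, as a Hadoop-facing worker might."""
    return json.loads(job_queue.get_nowait())


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    submit_job("/data/8id/sample_0001.imm", "/data/8id/sample_0001.batchinfo")
    print(next_job())
```

Decoupling submission from execution in this way lets the submitter return immediately while jobs wait in the queue until a worker is free.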
The data is periodically backed up on tape drives and kept there indefinitely.
Suresh maintains a list of the runs stored on each tape drive for recovery.
Files are removed from DServe every two cycles. (TODO: Add file removal policy on Lustre).
- Want a way to automatically keep the generated analysis results together with the captured data.
- Lustre needs to be more reliable.
Sector 2 and 32
The data collection and data management workflow currently used at sectors 2 and 32 is summarized below:
Step 1,2 :
The AD software writes images directly to a locally attached RAID drive.
The RAID has about 12 TB of storage.
The storage requirements vary a lot between experiments and detector types.
The slower camera generates about 3 TB in a week.
The faster camera can generate up to 10 TB in 3 days.
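The figures above give a rough sense of how quickly the local RAID fills. The short calculation below is a sketch that assumes an initially empty 12 TB RAID and continuous acquisition at the quoted rates; real usage will be burstier.

```python
RAID_CAPACITY_TB = 12.0  # approximate capacity quoted above

# Approximate data rates quoted above, converted to TB per day.
RATES_TB_PER_DAY = {
    "slower camera": 3.0 / 7.0,   # about 3 TB in a week
    "faster camera": 10.0 / 3.0,  # up to 10 TB in 3 days
}


def days_until_full(capacity_tb, rate_tb_per_day, used_tb=0.0):
    """Days of continuous acquisition before the free space is exhausted."""
    return (capacity_tb - used_tb) / rate_tb_per_day


if __name__ == "__main__":
    for camera, rate in RATES_TB_PER_DAY.items():
        days = days_until_full(RAID_CAPACITY_TB, rate)
        print(f"{camera}: about {days:.1f} days to fill {RAID_CAPACITY_TB:.0f} TB")
```

Under these assumptions the slower camera fills the RAID in about 28 days, while the faster camera fills it in about 3.6 days.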
Step 3 :
The beam-line staff selects and copies the datasets from detector machines to a separate workstation.
The copy is done manually over the network mount point.
Step 4 :
Reconstruction is done on the data.
If the dataset is too large for the reconstruction to finish within a few days, they hand over the data and the reconstruction software to the user.
Step 5 :
The reconstruction results and the original dataset are copied to a set of external hard drives that the users bring with them.
Step 6 :
The results and the original dataset are copied to Lustre over a network mount point.
There is no tape-based backup.
The data is mostly kept on the local RAID drive for about 3 months or until the drive runs out of space.
Datasets are removed starting from the oldest collected date.
They keep datasets for a few collaborators for a much longer time.
Before removing the files, an email is sent to the users notifying them about the file removal.
The beam-line staff makes sure to hear back from the users before removing a dataset.
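The removal practice described above could be expressed as a small selection rule. The sketch below is one possible reading, not an existing tool: the `Dataset` fields, the 90-day stand-in for "about 3 months", and the `disk_full` switch (which drops the age threshold when space runs out) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Dataset:
    name: str
    collected: date
    owner: str
    user_confirmed: bool = False  # True once the user has replied to the removal notice


def removal_candidates(datasets, today, max_age_days=90,
                       long_term_owners=frozenset(), disk_full=False):
    """Return datasets eligible for removal, oldest collection date first.

    A dataset is eligible only if its owner is not a long-term collaborator and
    the user has replied to the notice. The age threshold is ignored when
    disk_full is True.
    """
    cutoff = today - timedelta(days=max_age_days)
    eligible = [
        d for d in datasets
        if (disk_full or d.collected < cutoff)
        and d.owner not in long_term_owners
        and d.user_confirmed
    ]
    return sorted(eligible, key=lambda d: d.collected)
```

Keeping the rule as a pure function that only returns candidates leaves the actual deletion, and the email notification that precedes it, as separate and auditable steps.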
- Many manual steps could be automated, e.g. copying files from the detector machine to the reconstruction machine.
- Policy-based management of user data, e.g. what to delete and when, and automatic reminders to users about file deletion.
- Lustre should be more reliable.
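As an illustration of the first wish-list item, the manual copy from the detector machine to the reconstruction machine could be scripted along the following lines. This is a minimal sketch that assumes both locations are visible as directories on the machine running it (for example through the existing network mount point); the function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_dataset(src_dir, dst_dir):
    """Copy every file under src_dir to dst_dir and verify each copy by checksum."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    copied = []
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        if sha256_of(src) != sha256_of(dst):
            raise IOError(f"checksum mismatch for {src}")
        copied.append(dst)
    return copied
```

Verifying each file after the copy costs an extra read of the data, but it gives a clear signal that a dataset arrived intact before the original is considered for removal.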