Accession Number ADA575859
Title Discretized Streams: A Fault-Tolerant Model for Scalable Stream Processing.
Publication Date Dec 2012
Media Count 17p
Personal Author H. Li; M. Zaharia; S. Shenker; T. Das; T. Hunter
Abstract Many 'big data' applications need to act on data arriving in real time. However, current programming models for distributed stream processing are relatively low-level, often leaving the user to worry about consistency of state across the system and fault recovery. Furthermore, the models that provide fault recovery do so in an expensive manner, requiring either hot replication or long recovery times. We propose a new programming model, discretized streams (D-Streams), that offers a high-level functional API, strong consistency, and efficient fault recovery. D-Streams support a new recovery mechanism, parallel recovery of lost state, that improves efficiency over the traditional replication and upstream backup schemes in streaming databases and, unlike previous systems, also mitigate stragglers. We implement D-Streams as an extension to the Spark cluster computing engine that lets users seamlessly intermix streaming, batch, and interactive queries. Our system can process over 60 million records/second at sub-second latency on 100 nodes.
Keywords Computer programming
D-streams (Discretized streams)
Distributed data processing
Fault tolerant computing


 
Source Agency Non Paid ADAS
NTIS Subject Category 62B - Computer Software
Corporate Author California Univ., Berkeley. Dept. of Electrical Engineering and Computer Science.
Document Type Technical report
Title Note Technical rept.
NTIS Issue Number 1319
Contract Number FA8650-11-C-7136
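
Illustrative note: the abstract names the discretized-stream idea without showing it. The sketch below is one possible reading, assuming that a stream is cut into fixed time intervals and that each interval is processed by a deterministic batch step that carries state forward. It is not the Spark API or the report's implementation; the names discretize and running_counts, the sample records, and the 1.0-second interval are hypothetical.

```python
from collections import Counter, defaultdict

def discretize(records, interval):
    """Group (timestamp, value) records into per-interval batches (hypothetical helper)."""
    batches = defaultdict(list)
    for ts, value in records:
        batches[int(ts // interval)].append(value)
    # Return batches in time order; intervals with no records are skipped here.
    return [batches[k] for k in sorted(batches)]

def running_counts(batches):
    """Apply a deterministic batch step per interval, carrying the count state forward."""
    state = Counter()
    snapshots = []
    for batch in batches:
        state = state + Counter(batch)   # new state derived only from old state + batch
        snapshots.append(dict(state))
    return snapshots

# Assumed sample input: (timestamp in seconds, key)
records = [(0.1, "a"), (0.4, "b"), (1.2, "a"), (1.9, "a"), (2.5, "b")]
batches = discretize(records, interval=1.0)
snapshots = running_counts(batches)
```

Because each step in this sketch depends only on the previous state and the interval's input, rerunning running_counts on the same batches reproduces any lost snapshot, which is the property that recomputation-based recovery relies on.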
