  Author: Cristina L. Abad; Yi Lu; Roy H. Campbell
  Title: DARE: Adaptive Data Replication for Efficient Cluster Scheduling
  Type: Conference Article
  Year: 2011
  Publication: IEEE International Conference on Cluster Computing, 2011
  Pages: 159-168
  Keywords: MapReduce, replication, scheduling, locality
  Abstract: Placing data as close as possible to computation, a concern commonly referred to as the data locality problem, is common practice in data-intensive systems. By analyzing existing production systems, we confirm the benefit of data locality and find that data vary in popularity and in how strongly their accesses are correlated. We propose DARE, a distributed adaptive data replication algorithm that helps the scheduler achieve better data locality. DARE solves two problems (how many replicas to allocate for each file and where to place them) using probabilistic sampling and a competitive aging algorithm that run independently at each node. It takes advantage of remote data accesses that already occur in the system and therefore incurs no extra network usage. Using two mixed-workload traces from Facebook, we show that DARE improves data locality by more than a factor of 7 with the FIFO scheduler in Hadoop and achieves more than 85% data locality with the FAIR scheduler using delay scheduling. Turnaround time and job slowdown are reduced by 19% and 25%, respectively.
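The abstract names DARE's two per-node mechanisms but not their details. The sketch below is a minimal illustration under stated assumptions, not the paper's exact algorithm: each node holds a bounded set of dynamic replicas; when a task reads a block remotely, the node keeps the already-transferred copy with probability `p` (probabilistic sampling); when the budget is full, an aging sweep halves popularity counters round-robin and evicts the first replica whose counter drops below a threshold. The class `ReplicaCache` and the parameters `capacity`, `p`, and `threshold` are hypothetical names chosen for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ReplicaCache:
    """Hypothetical per-node store of dynamic replicas (illustration only).

    capacity:  assumed budget of dynamic replicas this node may hold.
    p:         assumed probability of keeping a block that was just read remotely.
    threshold: a replica is evicted once aging drops its counter below this.
    """
    capacity: int
    p: float
    threshold: int = 1
    rng: random.Random = field(default_factory=random.Random)
    counters: dict = field(default_factory=dict, init=False)  # block id -> popularity
    _order: list = field(default_factory=list, init=False)    # round-robin aging order
    _hand: int = field(default=0, init=False)                 # current aging position

    def __post_init__(self):
        if self.capacity < 1:
            raise ValueError("capacity must be at least 1")
        if self.threshold < 1:
            raise ValueError("threshold must be at least 1")  # else aging never evicts

    def on_local_read(self, block_id):
        """A task read block_id locally; bump its counter if it is a dynamic replica."""
        if block_id in self.counters:
            self.counters[block_id] += 1

    def on_remote_read(self, block_id):
        """A task on this node just fetched block_id over the network.

        Probabilistic sampling: with probability p keep the copy that was
        already transferred, so the new replica costs no extra network traffic.
        Returns the id of the evicted replica, or None.
        """
        if block_id in self.counters or self.rng.random() >= self.p:
            return None
        evicted = self._evict() if len(self.counters) >= self.capacity else None
        self.counters[block_id] = 1
        self._order.append(block_id)
        return evicted

    def _evict(self):
        """Aging: sweep replicas round-robin, halving each counter, and evict
        the first one whose counter falls below threshold."""
        while True:
            self._hand %= len(self._order)
            victim = self._order[self._hand]
            self.counters[victim] //= 2
            if self.counters[victim] < self.threshold:
                del self.counters[victim]
                self._order.pop(self._hand)
                return victim
            self._hand += 1


cache = ReplicaCache(capacity=2, p=1.0)  # p=1.0 keeps every remote read, for a deterministic demo
cache.on_remote_read("blk-a")
cache.on_remote_read("blk-b")
for _ in range(3):
    cache.on_local_read("blk-a")       # blk-a becomes popular (counter 4)
print(cache.on_remote_read("blk-c"))   # blk-b (halving drops its counter from 1 to 0)
print(sorted(cache.counters))          # ['blk-a', 'blk-c']
```

In this sketch no node coordinates with any other: files that are read remotely more often are sampled more often, so popular files tend to accumulate more replicas across the cluster while aging lets replicas of cooling files be reclaimed. That is one plausible way independent per-node decisions could answer both "how many replicas" and "where".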
  Language: English