Class Meetings
Tuesdays and Thursdays (9/27-12/6)
UCSC: noon-1:45pm PT, BE-156, LANL: 1-2:45pm MT, Access Grid Conf. Room
Wednesdays (10/10-12/5)
UCSC: 10am-noon PT, E2-369, LANL: 11am-1pm MT, carlosmalt
Schedule and Readings
Please retrieve the readings from the Web. Contact me if you have trouble finding them. The schedule and readings might change to accommodate guest speakers.
Guest speakers so far:
10/31: Byung-Gon Chun (Yahoo! Research): Deconstructing Production MapReduce Workloads
11/8: Fred Douglis (EMC/DataDomain): Data Deduplication
11/13: Ike Nassi (UCSC): Flash
Thursday, September 27
Introduction
Tuesday, October 2
No meeting
Thursday, October 4
Uniprocessor file systems
Paper 1: M. K. McKusick, W. N. Joy, S. J. Leffler and R. S. Fabry, "A Fast File System for UNIX," ACM Transactions on Computer Systems 2(3), August 1984, pages 181–197.
Paper 2: M. Rosenblum and J. K. Ousterhout, "The Design and Implementation of a Log-Structured File System," ACM Transactions on Computer Systems 10(1), February 1992, pages 26-52.
Tuesday, October 9
File system usage patterns
Paper 1: M. G. Baker, J. H. Hartman, M. Kupfer, K. Shirriff, and J. Ousterhout, "Measurements of a Distributed File System," Proceedings of the 13th ACM Symposium on Operating Systems Principles (Monterey, CA), October 1991, pages 198-212.
Paper 2: D. Roselli, J. Lorch, and T. Anderson, "A Comparison of File System Workloads," Proceedings of the 2000 USENIX Technical Conference, June 2000.
[optional] W. Vogels, "File system usage in Windows NT 4.0," Proceedings of the 17th ACM Symposium on Operating Systems Principles (Kiawah, SC), December 1999.
Thursday, October 11
RAID
Paper 1: P. Cao, S. B. Lim, S. Venkataraman, and J. Wilkes, "The TickerTAIP Parallel RAID Architecture," ACM Transactions on Computer Systems 12(3), August 1994, pages 237–269.
Paper 2: J. Wilkes, R. Golding, C. Staelin, and T. Sullivan, "The HP AutoRAID Hierarchical Storage System," ACM Transactions on Computer Systems 14(1), February 1996, pages 108–136.
Tuesday, October 16
Distributed File Systems: Beginnings
Paper 1: D. Hitz, J. Lau, and M. Malcolm, "File System Design for an NFS File Server Appliance," Proceedings of the Winter 1994 USENIX Conference, January 1994, pages 235–246.
Paper 2: J. H. Hartman and J. K. Ousterhout, “The Zebra striped network file system,” ACM Transactions on Computer Systems, vol. 13, no. 3, pp. 274–310, 1995.
Thursday, October 18
Distributed File Systems: Chunks and Objects
Paper 1: S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google file system,” in SOSP ’03, (Bolton Landing, NY), ACM, Oct. 2003.
Paper 2: S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long, and C. Maltzahn, “Ceph: A scalable, high-performance distributed file system,” in OSDI’06, (Seattle, WA), Nov. 2006.
Tuesday, October 23
SRL/ISSDM Symposium
Today is the annual SRL/ISSDM symposium. The class is invited to attend this all-day event, at which students of the UCSC Systems Research Lab present their work to invited guests from industrial and governmental research labs. Much of this work is directly related to this class.
Thursday, October 25
Distributed File Systems: P2P
Paper 1: I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, “Chord: A scalable peer-to-peer lookup service for Internet applications,” in Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM ’01), (San Diego, CA), pp. 149–160, 2001.
Paper 2: A. Muthitacharoen, R. Morris, T. M. Gil, and B. Chen, “Ivy: A read/write peer-to-peer file system,” in Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI), (Boston, MA), Dec. 2002.
Tuesday, October 30
Byung-Gon Chun (Yahoo! Research): Deconstructing Production MapReduce Workloads
Abstract:
Large-scale data analytics has become widespread. As the number of users increases, so too does the demand for improved performance, leading to efforts in academia and industry alike. However, performance improvements are inherently tied to the underlying workload: different workloads demand different and often opposing optimizations. To understand datacenter workloads, we characterize a two-month MapReduce trace from a Yahoo! production cluster.
Our analysis aims to answer two high level questions: (1) can we succinctly characterize a workload's resource demands? and (2) can we explain a workload's performance as measured by job completion times? In response to the former, we show that Yahoo!'s MapReduce workload is highly predictable and can be characterized by a small number of classes that capture both a job's resource consumption and its semantic intent. Toward the latter, we present an in-depth evaluation of job completion times, identifying new factors that impact performance, quantifying the impact of previously discussed factors and identifying gaps that remain to be explored.
This is joint work with Kay Ousterhout (UCB), Amar Kamat (Yahoo!), and Sylvia Ratnasamy (UCB).
Thursday, November 1
Performance Virtualization
Paper 1: T. Kaldewey, A. Povzner, T. Wong, R. Golding, S. A. Brandt, and C. Maltzahn, “Virtualizing disk performance,” in RTAS 2008, (St. Louis, Missouri), April 2008.
Paper 2: A. Povzner, D. Sawyer, and S. Brandt, “Horizon: Efficient deadline-driven disk I/O management for distributed storage systems,” in HPDC 2010, 2010.
Tuesday, November 6
Security
Paper 1: J. D. Strunk, G. R. Goodson, M. L. Scheinholtz, C. A. N. Soules, and G. R. Ganger, “Self-securing storage: Protecting data in compromised systems,” in Proceedings of the 4th Symposium on Operating Systems Design and Implementation (OSDI), pp. 165–180, Oct. 2000.
Paper 2: A. W. Leung, E. L. Miller, and S. Jones, “Scalable security for petascale parallel file systems,” in Proceedings of SC07, (Reno, NV), Nov. 2007.
Thursday, November 8
Fred Douglis (EMC): Deduplication
Paper 1: B. Zhu, K. Li, and H. Patterson, “Avoiding the disk bottleneck in the Data Domain deduplication file system,” in FAST 2008, (San Jose, CA), 2008.
Paper 2: M. Lillibridge, K. Eshghi, D. Bhagwat, V. Deolalikar, G. Trezise, and P. Camble, “Sparse indexing: Large scale, inline deduplication using sampling and locality,” in FAST 2009, (San Jose, CA), February 24-27 2009.
Paper 3: G. Wallace, F. Douglis, H. Qian, P. Shilane, S. Smaldone, M. Chamness, and W. Hsu, “Characteristics of backup workloads in production systems,” in FAST 2012, (San Jose, CA), February 2012. [optional]
Paper 4: D. T. Meyer and W. J. Bolosky, “A study of practical deduplication,” in FAST 2011, (San Jose, CA), February 2011. [optional]
Tuesday, November 13
Flash (instructor: Ike Nassi)
Paper 1: M. Balakrishnan, D. Malkhi, V. Prabhakaran, T. Wobber, M. Wei, and J. D. Davis, “CORFU: A shared log design for flash clusters,” in NSDI’12, (San Jose, CA), April 2012.
Paper 2: N. Liu, J. Cope, P. Carns, C. Carothers, R. Ross, G. Grider, A. Crume, and C. Maltzahn, “On the role of burst buffers in leadership-class storage systems,” in MSST/SNAPI 2012, (Pacific Grove, CA), April 16 - 20 2012.
Thursday, November 15
No meeting
Tuesday, November 20
Big Data: Key/Value Stores
Paper 1: G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, “Dynamo: Amazon’s highly available key-value store,” in Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP ’07), pp. 205–220, 2007.
Paper 2: R. Geambasu, A. A. Levy, T. Kohno, A. Krishnamurthy, and H. M. Levy, “Comet: An active distributed key-value store,” in OSDI’10, (Vancouver, BC, Canada), October 4-6 2010.
Tuesday, November 27
Storage of array data
Paper 1: E. Soroush, M. Balazinska, and D. Wang, “ArrayStore: A storage manager for complex parallel array processing,” in SIGMOD ’11, (Athens, Greece), June 12–16 2011.
Paper 2: A. Seering, P. Cudre-Mauroux, S. Madden, and M. Stonebraker, “Efficient versioning for scientific array databases,” in ICDE’12, 2012.
Thursday, November 29
Large Data Capture
Paper 1: J. Bent, G. Gibson, G. Grider, B. McClelland, P. Nowoczynski, J. Nunez, M. Polte, and M. Wingate, “PLFS: A checkpoint filesystem for parallel applications,” in SC09, (Portland, OR), November 2009.
Paper 2: D. Bigelow, S. Brandt, J. Bent, and H. Chen, “Valmar: High-bandwidth real-time streaming data management,” in MSST ’12, (Lake Arrowhead, CA), May 6-10 2012.
Tuesday, December 4
Final presentations
Thursday, December 6
Final presentations
Friday, December 14
Project reports due