Available to read online.
Preface to the translated distributed systems papers
A chronicle of SQL and NoSQL
SMAQ: storage, computation, and query for massive data

I. Google paper series
1. Preface to the Google paper series translations
2. The anatomy of a large-scale hypertextual Web search engine (translated, reposted)
3. Web search for a planet: the Google cluster architecture (translated)
4. GFS: the Google File System (translated)
5. MapReduce: Simplified Data Processing on Large Clusters (translated)
6. Bigtable: A Distributed Storage System for Structured Data (translated)
7. Chubby: The Chubby lock service for loosely-coupled distributed systems (translated)
8. Sawzall: Interpreting the Data--Parallel Analysis with Sawzall (translated, reposted)
9. Pregel: A System for Large-Scale Graph Processing (translated)
10. Dremel: Interactive Analysis of WebScale Datasets (translated, reposted)
11. Percolator: Large-scale Incremental Processing Using Distributed Transactions and Notifications (translated, reposted)
12. MegaStore: Providing Scalable, Highly Available Storage for Interactive Services (translated, reposted)
13. Case Study GFS: Evolution on Fast-forward (translated)
14. Google File System II: Dawn of the Multiplying Master Nodes
15. Tenzing - A SQL Implementation on the MapReduce Framework (translated)
16. F1-The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business
17. Elmo: Building a Globally Distributed, Highly Available Database
18. PowerDrill: Processing a Trillion Cells per Mouse Click
19. Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers
20. Spanner: Google’s Globally-Distributed Database (translated, reposted)
21. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure (notes)
22. Omega: flexible, scalable schedulers for large compute clusters
23. CPI2: CPU performance isolation for shared compute clusters
24. Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams (translated)
25. F1: A Distributed SQL Database That Scales
26. MillWheel: Fault-Tolerant Stream Processing at Internet Scale (translated)
27. B4: Experience with a Globally-Deployed Software Defined WAN
28. The Datacenter as a Computer
29. Google Brain: Building High-level Features Using Large Scale Unsupervised Learning
30. Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing (translated, reposted)
31. Large-scale cluster management at Google with Borg
Collected translations of the Google paper series (compilation)

II. Distributed theory series
00. Appraising Two Decades of Distributed Computing Theory Research
0. Preface to the distributed theory series translations
1. A brief history of Consensus, 2PC and Transaction Commit (translated)
2. The Byzantine Generals Problem (translated) --Leslie Lamport
3. Impossibility of distributed consensus with one faulty process (translated)
4. Leases: the lease mechanism (translated)
5. Time, Clocks, and the Ordering of Events in a Distributed System (translated) --Leslie Lamport
6. On the history of Paxos
7. The Part Time Parliament (translated, reposted) --Leslie Lamport
8. How to Build a Highly Available System Using Consensus (translated)
9. Paxos Made Simple (translated) --Leslie Lamport
10. Paxos Made Live - An Engineering Perspective (translated)
11. 2 Phase Commit (translated)
12. Consensus on Transaction Commit (translated) --Jim Gray & Leslie Lamport
13. Why Do Computers Stop and What Can Be Done About It? (translated) --Jim Gray
14. On Designing and Deploying Internet-Scale Services (translated) --James Hamilton
15. Single-Message Communication (translated)
16. Implementing fault-tolerant services using the state machine approach
17. Problems, Unsolved Problems and Problems in Concurrency
18. Hints for Computer System Design
19. Self-stabilizing systems in spite of distributed control
20. Wait-Free Synchronization
21. White Paper Introduction to IEEE 1588 & Transparent Clocks
22. Unreliable Failure Detectors for Reliable Distributed Systems
23. Life beyond Distributed Transactions: an Apostate’s Opinion (translated, reposted)
24. Distributed Snapshots: Determining Global States of a Distributed System --Leslie Lamport
25. Virtual Time and Global States of Distributed Systems
26. Timestamps in Message-Passing Systems That Preserve the Partial Ordering
27. Fundamentals of Distributed Computing: A Practical Tour of Vector Clock Systems
28. Knowledge and Common Knowledge in a Distributed Environment
29. Understanding Failures in Petascale Computers
30. Why Do Internet services fail, and What Can Be Done About It?
31. End-To-End Arguments in System Design
32. Rethinking the Design of the Internet: The End-to-End Arguments vs. the Brave New World
33. The Design Philosophy of the DARPA Internet Protocols (translated, reposted)
34. Uniform consensus is harder than consensus
35. Paxos made code - Implementing a high throughput Atomic Broadcast
36. RAFT: In Search of an Understandable Consensus Algorithm
Collected translations of the distributed theory series (compilation)

III. Database theory series
0. A Relational Model of Data for Large Shared Data Banks --E.F.Codd 1970
1. SEQUEL: A Structured English Query Language 1974
2. Implementation of a Structured English Query Language 1975
3. System R: Relational Approach to Database Management 1976
4. Granularity of Locks and Degrees of Consistency in a Shared DataBase --Jim Gray 1976
5. Access Path Selection in a RDBMS 1979
6. The Transaction Concept: Virtues and Limitations --Jim Gray
7. 2PC (two-phase commit): Notes on Data Base Operating Systems --Jim Gray
8. 3PC (three-phase commit): Nonblocking Commit Protocols
9. MVCC: Multiversion Concurrency Control-Theory and Algorithms --1983
10. ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging-1992
11. A Comparison of the Byzantine Agreement Problem and the Transaction Commit Problem --Jim Gray
12. A Formal Model of Crash Recovery in a Distributed System - Skeen, D. Stonebraker
13. What Goes Around Comes Around - Michael Stonebraker, Joseph M. Hellerstein
14. Anatomy of a Database System -Joseph M. Hellerstein, Michael Stonebraker
15. Architecture of a Database System (translated, reposted) -Joseph M. Hellerstein, Michael Stonebraker, James Hamilton

IV. Large-scale storage and computation (NoSQL theory series)
0. Towards Robust Distributed Systems: Brewer's 2000 PODC keynote
1. The CAP theorem
2. Harvest, Yield, and Scalable Tolerant Systems
3. On CAP
4. The BASE model: BASE an Acid Alternative
5. Eventual consistency
6. Scalability design patterns
7. Scalability principles
8. The NoSQL ecosystem
9. scalability-availability-stability-patterns
10. The 5 Minute Rule and the 5 Byte Rule (translated)
11. The Five-Minute Rule Ten Years Later and Other Computer Storage Rules of Thumb
12. The Five-Minute Rule 20 Years Later (and How Flash Memory Changes the Rules)
13. The debate over MapReduce
14. MapReduce: a major step backwards
15. MapReduce: a major step backwards (II)
16. MapReduce and parallel databases: friends or foes? (reposted)
17. MapReduce and Parallel DBMSs-Friends or Foes (translated)
18. MapReduce: A Flexible Data Processing Tool (translated)
19. A Comparison of Approaches to Large-Scale Data Analysis (translated)
20. Can MapReduce no longer hold up? (reposted)
21. Beyond MapReduce: an overview of graph computation
22. Map-Reduce-Merge: simplified relational data processing on large clusters
23. MapReduce Online
24. Graph Twiddling in a MapReduce World
25. Spark: Cluster Computing with Working Sets
26. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
27. Big Data Lambda Architecture
28. The 8 Requirements of Real-Time Stream Processing
29. The Log: What every software engineer should know about real-time data's unifying abstraction
30. Lessons from Giant-Scale Services

V. Basic algorithms and data structures
1. A summary of methods for processing massive data sets
2. A summary of methods for processing massive data sets (continued)
3. Consistent Hashing And Random Trees
4. Merkle Trees
5. Scalable Bloom Filters
6. Introduction to Distributed Hash Tables
7. B-Trees and Relational Database Systems
8. The log-structured merge-tree (translated)
9. lock free data structure
10. Data Structures for Spatial Database
11. Gossip
12. lock free algorithm
13. The Graph Traversal Pattern

VI. Basic systems and practical experience
1. The data structures and algorithmic principles behind MySQL indexes
2. Dynamo: Amazon’s Highly Available Key-value Store (translated, reposted)
3. Cassandra - A Decentralized Structured Storage System (translated, reposted)
4. PNUTS: Yahoo!’s Hosted Data Serving Platform (translated, reposted)
5. An introduction to, and reflections on, Yahoo!'s distributed data platform PNUTS (reposted)
6. LevelDB: a fast, lightweight key-value storage library (translated)
7. The theoretical foundations of LevelDB
8. LevelDB: implementation (translated)
9. The LevelDB SSTable format in detail
10. The LevelDB Bloom filter implementation
11. Sawzall: principles and applications
12. Storm: principles and implementation
13. Designs, Lessons and Advice from Building Large Distributed Systems --Jeff Dean
14. Challenges in Building Large-Scale Information Retrieval Systems --Jeff Dean
15. Experiences with MapReduce, an Abstraction for Large-Scale Computation --Jeff Dean
16. Taming Service Variability, Building Worldwide Systems, and Scaling Deep Learning --Jeff Dean
17. Large-Scale Data and Computation: Challenges and Opportunities --Jeff Dean
18. Achieving Rapid Response Times in Large Online Services --Jeff Dean
19. The Tail at Scale (translated) --Jeff Dean & Luiz André Barroso
20. How To Design A Good API and Why it Matters
21. Event-Based Systems: Architect's Dream or Developer's Nightmare?
22. Autopilot: Automatic Data Center Management

VII. Other auxiliary systems
1. The ganglia distributed monitoring system: design, implementation, and experience
2. Chukwa: A large-scale monitoring system
3. Scribe: a way to aggregate data and why not, to directly fill the HDFS?
4. Benchmarking Cloud Serving Systems with YCSB
5. A brief overview of Dynamo, Dremel, ZooKeeper, and Hive

VIII. Hadoop related
0. Hadoop Reading List
1. The Hadoop Distributed File System (translated)
2. HDFS scalability: the limits to growth (translated)
3. Name-node memory size estimates and optimization proposal
4. HBase Architecture (translated)
5. HFile: A Block-Indexed File Format to Store Sorted Key-Value Pairs
6. HFile V2
7. Hive - A Warehousing Solution Over a Map-Reduce Framework
8. Hive – A Petabyte Scale Data Warehouse Using Hadoop

When reposting, please credit the author: phylips@bmy 2011-4-30
Author: 石默研
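In cloud IaaS services, the boundary between "storage" and "compute" is clear: customers consume and pay for each on demand, separately. This holds not only for dedicated storage services such as S3, object storage, file storage, and NAS. Even with the most basic virtual machine service, ECS, the customer still has to choose the storage, and what they choose is a cloud disk. Its location is transparent to the user, who need not care whether it is local to the compute node. In fact, the user need not care where the compute node itself is, so "local" has little meaning. As cloud services continue to develop, the boundary between storage and compute is becoming ever clearer, both in the consumption model and in the technology.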
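In PaaS-layer database services, two situations arise. In one, the customer selects and scales storage and compute separately, just as with IaaS. In the other, storage and compute are purchased as a fixed, bundled architecture: the size can be specified, but the two cannot be selected, deployed, or scaled independently of each other.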
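What drives these different consumption models is, in essence, the technical architecture of the database product deployed in the cloud: storage-compute separation or storage-compute coupling. For a monolithic database the distinction means little, so the discussion mainly concerns distributed databases, where the differing characteristics of the two architectures have prompted fairly broad debate in the industry.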
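To make the difference between the two consumption models concrete, here is a minimal sketch of the provisioning arithmetic. It is an illustration only: the `NodeShape` type, both function names, and all the numbers are hypothetical and do not describe any particular database product or cloud price list. It assumes a coupled cluster is built from identical nodes that each bring a fixed amount of compute and storage, while a separated deployment sizes its compute nodes and its storage capacity independently.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NodeShape:
    """Hypothetical fixed node size in a coupled (storage + compute) cluster."""
    vcpus: int
    storage_tb: int


def coupled_provisioned(need_vcpus: int, need_storage_tb: int,
                        shape: NodeShape) -> Tuple[int, int]:
    """Coupled: every node adds both resources, so the node count is set by
    whichever dimension needs more nodes, and the other is over-provisioned."""
    nodes = max(math.ceil(need_vcpus / shape.vcpus),
                math.ceil(need_storage_tb / shape.storage_tb))
    return nodes * shape.vcpus, nodes * shape.storage_tb


def separated_provisioned(need_vcpus: int, need_storage_tb: int,
                          vcpus_per_compute_node: int) -> Tuple[int, int]:
    """Separated: compute nodes are sized for the vCPU demand alone, and
    storage is bought as capacity, independent of the compute node count."""
    compute_nodes = math.ceil(need_vcpus / vcpus_per_compute_node)
    return compute_nodes * vcpus_per_compute_node, need_storage_tb


if __name__ == "__main__":
    # Assumed workload: storage-heavy, light on compute.
    shape = NodeShape(vcpus=16, storage_tb=4)
    print(coupled_provisioned(32, 200, shape))   # (vCPUs, TB) provisioned
    print(separated_provisioned(32, 200, 16))    # (vCPUs, TB) provisioned
```

Under these assumed numbers, the coupled cluster needs 50 nodes to hold 200 TB and therefore provisions 800 vCPUs for a workload that needs 32, while the separated deployment provisions 32 vCPUs and 200 TB. The sketch ignores replication, network cost, and performance, all of which matter in a real comparison.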
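First, then: given that storage and compute infrastructure is becoming increasingly independent, where do database services with coupled storage and compute come from? It is not hard to see that most such services on cloud platforms are built on database products that originated and evolved outside the cloud, in response to enterprise on-premises requirements. In other words, their original design had nothing to do with the cloud; they were only later deployed on it in pursuit of a different business model. Most database products with storage-compute separation, by contrast, were designed for the cloud environment from the outset. This is a good place to clarify the currently very popular notion of "cloud native". Many people conflate cloud-adapted deployment with cloud native and assume that anything deployed on the cloud is cloud native. In fact the term means exactly what it says: "native" to the "cloud environment", not moved in from elsewhere. A cloud-native product grew up on the cloud; a non-cloud-native product migrated to it. By the same logic, understanding the equally popular NFT in depth first requires a correct understanding of what "blockchain native" means.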
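By now the "cloud" question should be fairly clear: storage-compute separation is the cloud-native architecture and storage-compute coupling is not. Readers will probably have few doubts on this point. The next question is whether "cloud native" is necessarily better. For enterprise requirements, which is superior: separation or coupling?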
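Nothing in the world is absolutely good or absolutely bad. The coupled design arose naturally in the course of meeting enterprise needs. For distributed databases it played a positive historical role in three respects: replacing traditional monolithic databases, satisfying data locality for organizations that adopt a business-unitization strategy, and making it convenient to build sharded (split-database, split-table) products through secondary development on an existing, mature database system. In those circumstances it was reasonable not to consider the cloud trend or cloud design requirements.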
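However, the history of the past few decades has shown that computer technology develops extremely quickly, in software and hardware alike, and database technology is no exception.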
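First, looking further ahead: from the standpoint of computer science, and driven by the broader cloud computing trend, it is increasingly evident that compute and storage technologies are developing along relatively independent paths. One can expect compute technologies, architectures, and products to advance far beyond today's state of the art, and storage technologies, architectures, and products to reach a new stage that cannot be fully foreseen, while becoming increasingly "intelligent". Judging by current trends, that future is not far off. Storage-compute separation is undoubtedly suited to the possibilities of that future, because it was born for exactly this, while storage-compute coupling may eventually become moot. And judging by how advanced database technology is actually developing internationally, the great majority of new, cutting-edge database technologies and products are cloud native. In other words, they adopt storage-compute separation, with few exceptions.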
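(One might speculate that the architecture of today's commercial computers, with disks attached locally, was also driven by the business model under which enterprises and individuals use computers, and was not necessarily the inevitable result of technology.)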
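Second, looking closer to hand: for enterprises now undergoing digital transformation, replacing traditional monolithic databases is an urgent need, and a large body of evidence has shown that cloud-native databases can fully meet the practical requirements of such business transformation: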
There are many more examples...
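One final point deserves emphasis: for enterprises that treat a cloud strategy as their core technical and business development strategy, a cloud-native architecture is naturally the best fit, both now and in the future.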
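Perhaps it can be put this way: the coupled storage-compute architecture is an important transitional stage in the evolution of modern distributed database technology, and its historical role is undeniable. In the near future, however, the shift of distributed database architectures toward cloud native will become ever more evident, and its pace will keep accelerating.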
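The tide of the world surges forward; those who follow it prosper, and those who resist it perish. Choosing to follow the tide and trend of history is generally wise.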