Data lake
A data lake is a system or repository of data stored in its natural/raw format, usually object blobs or files. A data lake is usually a single store of data including raw copies of source system data, sensor data, social data, etc., and transformed data used for tasks such as reporting, visualization, advanced analytics and machine learning. A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and binary data (images, audio, video).
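To illustrate the "store raw, transform later" idea, here is a minimal sketch that lands a raw record and a transformed, analysis-ready copy side by side in a lake-style layout; the directory names, zone layout, and field names are hypothetical, and a real lake would typically use an object store bucket rather than a local path.

```python
import json
from pathlib import Path
from datetime import datetime, timezone

# Hypothetical lake root; in practice this would be an object store
# bucket (e.g. S3 or GCS) rather than a local directory.
LAKE_ROOT = Path("data-lake")

def land_raw(record: dict, source: str) -> Path:
    """Write the record exactly as received into the raw zone."""
    ts = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S%f")
    path = LAKE_ROOT / "raw" / source / f"{ts}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record))
    return path

def publish_curated(record: dict, source: str) -> Path:
    """Write a cleaned, analysis-ready copy into the curated zone."""
    curated = {k.lower(): v for k, v in record.items() if v is not None}
    path = LAKE_ROOT / "curated" / source / "latest.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(curated))
    return path

if __name__ == "__main__":
    event = {"DeviceId": "sensor-42", "Temperature": 21.5, "Note": None}
    land_raw(event, source="sensors")          # raw copy, schema untouched
    publish_curated(event, source="sensors")   # transformed copy for reporting
```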
Apache Hadoop
Apache Hadoop (pronounced /həˈduːp/) is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware.
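As a sketch of the MapReduce model, here is a word-count mapper and reducer written for Hadoop Streaming, which pipes input lines over standard input and tab-separated key/value pairs over standard output; the script layout and role selection via a command-line argument are our own illustrative choices.

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming: run with 'mapper' or 'reducer' as argv[1]."""
import sys

def mapper() -> None:
    # Map phase: emit (word, 1) for every word on every input line.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer() -> None:
    # Reduce phase: input arrives sorted by key, so counts for the same
    # word are contiguous and can be summed with a single running total.
    current_word, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word == current_word:
            count += int(value)
        else:
            if current_word is not None:
                print(f"{current_word}\t{count}")
            current_word, count = word, int(value)
    if current_word is not None:
        print(f"{current_word}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "mapper" else reducer()
```

A job would typically be submitted by passing this script to the Hadoop Streaming jar as both the -mapper and -reducer commands; the framework handles splitting the input, sorting by key, and shuffling between the two phases.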
Stored procedure
A stored procedure (also termed proc, storp, sproc, StoPro, StoredProc, StoreProc, sp, or SP) is a subroutine available to applications that access a relational database management system (RDBMS). Such procedures are stored in the database data dictionary. Uses for stored procedures include data-validation (integrated into the database) or access-control mechanisms. Furthermore, stored procedures can consolidate and centralize logic that was originally implemented in applications.
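A minimal sketch of defining and calling a stored procedure from application code, assuming a MySQL server and the mysql-connector-python driver; the connection details, procedure name, and table are made up for the illustration.

```python
import mysql.connector  # assumes the mysql-connector-python package

# Hypothetical connection details.
conn = mysql.connector.connect(host="localhost", user="app",
                               password="secret", database="shop")
cur = conn.cursor()

# Define a procedure that centralizes a validation rule inside the database:
# it refuses to record an order with a non-positive quantity.
cur.execute("""
    CREATE PROCEDURE place_order(IN p_item_id INT, IN p_qty INT)
    BEGIN
        IF p_qty <= 0 THEN
            SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'quantity must be positive';
        END IF;
        INSERT INTO orders (item_id, qty) VALUES (p_item_id, p_qty);
    END
""")

# Applications then call the stored logic instead of re-implementing it.
cur.callproc("place_order", (42, 3))
conn.commit()
cur.close()
conn.close()
```

Because the validation lives in the procedure, every application that calls place_order gets the same rule, which is the consolidation benefit described above.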
Computability
Computability is the ability to solve a problem in an effective manner. It is a key topic of the field of computability theory within mathematical logic and the theory of computation within computer science. The computability of a problem is closely linked to the existence of an algorithm to solve the problem. The most widely studied models of computability are the Turing-computable and μ-recursive functions, and the lambda calculus, all of which have computationally equivalent power.
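To make the lambda-calculus model concrete, here is a small sketch that encodes natural numbers as Church numerals using plain Python lambdas and defines addition purely by function application; the helper names are ours, chosen only for the illustration.

```python
# Church numerals: the number n is the function that applies f to x n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))               # n + 1
add  = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))  # m + n

def to_int(n):
    """Decode a Church numeral by counting how many times it applies +1 to 0."""
    return n(lambda k: k + 1)(0)

one = succ(zero)
two = succ(one)
three = add(one)(two)
print(to_int(three))  # -> 3: arithmetic computed with nothing but function application
```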
Bigtable
Bigtable is a fully managed wide-column and key-value NoSQL database service for large analytical and operational workloads, offered as part of the Google Cloud portfolio. Bigtable development began in 2004. It is now used by a number of Google applications, such as Google Analytics, web indexing, MapReduce (which is often used for generating and modifying data stored in Bigtable), Google Maps, Google Books search, "My Search History", Google Earth, Blogger.com, Google Code hosting, YouTube, and Gmail.
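As a sketch of the wide-column/key-value data model, the following models a Bigtable-style table in plain Python: rows are addressed by a row key and hold column families whose qualifiers store timestamped cell versions. This is a conceptual illustration of the data model, not the Google Cloud Bigtable client API.

```python
import time
from collections import defaultdict

class WideColumnTable:
    """Toy wide-column store: row key -> column family -> qualifier -> [(timestamp, value)]."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    def put(self, row_key, family, qualifier, value):
        # Each write appends a new timestamped cell version, newest first.
        cells = self.rows[row_key][family][qualifier]
        cells.insert(0, (time.time(), value))

    def get(self, row_key, family, qualifier):
        # Reads return the most recent version by default.
        cells = self.rows[row_key][family][qualifier]
        return cells[0][1] if cells else None

# Rows are sparse: different rows can carry entirely different columns.
table = WideColumnTable()
table.put("com.example/index.html", "contents", "html", "<html>...</html>")
table.put("com.example/index.html", "anchor", "cnn.com", "example anchor text")
print(table.get("com.example/index.html", "contents", "html"))
```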
BigQuery
BigQuery is Google's fully managed, serverless data warehouse that enables scalable analysis over petabytes of data. It is a Platform as a Service (PaaS) that supports querying using a dialect of SQL. It also has built-in machine learning capabilities. BigQuery was announced in May 2010 and made generally available in November 2011. BigQuery provides external access to Google's Dremel technology, a scalable, interactive ad hoc query system for analysis of nested data.
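A minimal sketch of issuing a SQL query from Python, assuming the google-cloud-bigquery client library and application-default credentials are available; the public dataset and columns referenced here are illustrative.

```python
from google.cloud import bigquery  # assumes the google-cloud-bigquery package

# Assumes application-default credentials and a project configured in the environment.
client = bigquery.Client()

# Standard SQL dialect; no clusters or servers to provision or size.
query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""

# client.query() submits the job; iterating the result waits for completion
# and streams rows back from the serverless execution engine.
for row in client.query(query).result():
    print(row["name"], row["total"])
```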