30 September 2018

Big Data Glossary


Term


Definition


1
The idea of minimising the complexity of something by hiding the details and just providing the relevant information. It is about providing a high-level specification rather than going into lots of detail about how something works. In the cloud, for instance, in an IaaS delivery model, the infrastructure is abstracted from the user.
2
Determining who or what can go where, when, and how.
3
Algorithms for complex analysis of either structured or unstructured data. It includes sophisticated statistical models, machine learning, neural networks, text analytics, and other advanced data-mining techniques. Advanced analytics does not include database query and reporting or OLAP cubes.
4
ALC
The process of maintaining a piece of code so that it is consistent and predictable as it is changed to support business requirements.
5
API
A defined protocol that allows computer programs to use functionality and data from other software systems.
6
In information processing, the design approach taken in developing a program or system.
7
The process by which database or file data that is seldom used or outdated, but that is required for historical or audit reasons, is copied to a cheaper form of storage. The storage medium may be online, tape, or optical disc. Companies are using the cloud as a means of archiving data.
8
Software that allows organisations to record all information about their hardware and software. Most such applications capture cost information, license information, and so on. Such information belongs in the configuration management database.
9
ACID
Atomicity, consistency, isolation, and durability: the main requirements for guaranteed transaction processing.
10
A check on the effectiveness of a task or set of tasks, and how the tasks are managed and documented. Auditing is also a process that is used within organisations to ensure that the data is secure and in compliance with regulatory organisations.
11
A trace of a sequence of events in a clerical or computer system. The trail usually identifies the creation or modification of any element in the system, who did it, and (possibly) why it was done.
12
The process by which the identity of a person or computer process is verified.
13
A utility that copies databases, files, or subsets of databases and files to a storage medium. This copy can be used to restore the data in case of system failure.
14
Technically, the range of frequencies over which a device can send or receive signals. The term is also used to denote the maximum data transfer rate, measured in bits per second, that a communications channel can handle.
15
A non-interactive process that runs in a queue, usually when the system load is lowest, and is generally used for processing batches of information in a serial and usually efficient manner. Early computers were capable of only batch processing.
16
An effective way of doing something. It can relate to anything from writing program code to IT governance.
17
The capability to manage a huge volume of disparate data, at the right speed and within the right time frame, to allow real-time analysis and reaction. Big data is typically broken down by three characteristics, including volume (how much data), velocity (how fast that data is processed), and variety (the various types of data).
18
Developed by Google as a distributed storage system intended to manage highly scalable structured data. Data is organised into tables with rows and columns. Unlike a traditional relational database model, Bigtable is a sparse, distributed, persistent, multidimensional sorted map. It is intended to store huge volumes of data across commodity servers.
19
Making the necessary connections among software components so that they can interact.
20
Using a person’s unique physical characteristics to prove their identity to a computer, for example by using a fingerprint scanner or voice analyser.
21
A component or device with an input and an output whose inner workings need not be understood by or accessible to the user.
22
In computer programming, a program that accepts requests from one software layer or component and translates them into a form that can be understood by another layer or component.
23
A technology that connects multiple components so they can talk to one another. In essence, a bus is a connection capability. A bus can be software (such as an enterprise service bus) or hardware (such as a memory bus).
24
The codification of rules and practices that constitute a business.
25
BPEL
A computer language based on WSDL (Web Services Description Language, an XML format for describing web services) and designed for programming business services.
26
BPM
A technology and methodology for controlling the activities — both automated and manual — needed to make a business function.
27
A technique for transforming how a business operates into a codified source so that it can be translated into software.
28
BPaaS
A whole business process provided as a service involving little more than a software interface, such as a parcel delivery service.
29
Constraints or actions that refer to the actual commercial world but may need to be encapsulated in service management or business applications.
30
An individual function or activity that is directly useful to the business.
31
An efficient method of storing data in memory so that future requests for that data can be served more quickly.
32
CoE
A group of key people from all areas of the business and operations that focus on best practices. A center of excellence provides a way for groups within the company to collaborate. This group also becomes a force for change, because it can leverage its growing knowledge to help business units benefit from the experience.
33
The management of change in operational processes and applications. Change management is critical when IT organisations are managing software infrastructure in conjunction with new development processes. All software elements have to be synchronised so that they work as intended.
34
A computing model that makes IT resources such as servers, middleware, and applications available as services to business organisations in a self-service manner.
35
A database that stores data by column rather than by row, in contrast to a traditional relational database, which stores data in rows.
36
CEP
A technique for tracking, analysing, and processing data as an event happens. This information is then processed and managed based on business rules and processes.
37
A piece of computer software that can be used as a building block in larger systems. Components can be parts of business applications that have been made accessible through web service-related standards and technologies, such as WSDL, SOAP, and XML. See: Web Service.
38
The complete description of the way in which the constituent elements of a software product or system interrelate, both in functional and physical terms.
39
The management of configurations, normally involving holding configuration data in a database so that the data can be managed and changed where necessary.
40
CMDB
In general, a repository of service management data.
41
In computer programming, a data structure or object used to manage collections of other objects in an organised way.
42
CMS
A system that provides methods and tools to capture, manage, store, preserve, and deliver content and documents related to organisational processes. The technologies include document management, records management, imaging, workflow management, web content management, and collaboration.
43
COBIT
An IT framework with a focus on governance and managing technical and business risks.
44
Data that is placed in storage rather than used in real-time.
45
Software used to identify potential data-quality problems. If a customer is listed multiple times in a customer database because of variations in the spelling of her name, the data-cleansing software makes corrections to help standardise the data.
46
Data access to a variety of data stores, using consistent rules and definitions that enable all the data stores to be treated as a single resource.
47
Data that is moving across a network or in memory for processing in real-time.
48
A subset of a data warehouse that is designed to focus on a specific set of business information.
49
The process of exploring and analysing large amounts of data to find patterns.
50
A technique or process that helps organisations understand the content, structure, and relationships of their data. This process also helps them validate their data against technical and business rules.
51
Characteristics of data such as consistency, accuracy, reliability, completeness, timeliness, reasonableness, and validity. Data-quality software ensures that data elements are represented in a consistent way across different data stores or systems, making the data more trustworthy across the enterprise.
52
A process by which the format of data is changed so that it can be used by different applications.
53
A large data store containing the organisation’s historical data, which is used primarily for data analysis and data mining. It is the data system of record.
54
A computer system intended to store large amounts of information reliably and in an organised fashion. Most databases provide users with convenient access to the data, along with helpful search capabilities.
55
DBMS
Software that controls the storage, access, deletion, security, and integrity of primarily structured data within a database.
56
A term used in both computing and telephony to indicate an organised map of devices, files, or people.
57
The ability to process and manage the processing of algorithms across many different nodes in a computing environment.
58
Defining the connections among applications before processing to improve speed, at the cost of some flexibility.
59
The capability to expand or shrink a computing resource in real-time, based on need.
60
EDI
The practice of businesses exchanging information electronically instead of using traditional paper-based methods. This includes transmitting documents like purchase orders, advance ship alerts, and invoices.
61
When hardware, software, or a combination of both duplicates the functionality of a computer system in a different, second system. The behaviour of the second system will closely resemble the original functionality of the first system.
62
ERP
A packaged set of business applications that combine business rules, processes, and data management into a single integrated environment to support a business.
63
ESB
A packaged set of middleware services that are used to communicate between business services in a secure and predictable manner.
64
ER
A data management approach that graphically represents relationships between data. This allows developers to create new relationships between data sources without complex programming.
65
XML
A way of presenting data as plain-text files that has become the lingua franca of SOA. In XML, as in HTML, data is delimited in tags that are enclosed in angle brackets (< and >), although the tags in XML can have many more meanings.
66
XSLT
A computer language, based on XML, that specifies how to change one XML document into another.
67
ELT
Tools for locating and loading data into a business application so that it can be later transformed. This is similar to ETL (see its entry) but is associated with big data integration processes.
68
ETL
Tools for locating and accessing data from a data store (data extraction), changing the structure or format of the data so it can be used by the business application (data transformation), and applying the data to the business application (data load).
69
The capability of a system to provide uninterrupted service despite the failure of one or more of the system’s components.
70
The combination of disparate things so that they can act as one — as in federated states, data, or identity management — and to make sure that all the right rules apply.
71
A support structure for developing and managing software products.
72
The ability to ensure that corporate or governmental rules and regulations are conformed with. Governance is combined with compliance and security issues across computing environments.
73
An important software design concept, especially in relation to components, referring to the amount of detail or functionality — from fine to coarse — provided in a service component. One software component can do something quite simple, such as calculate a square root; another has a great deal of detail and functionality to represent a complex business rule or workflow. The first component is fine-grained, and the second is coarse-grained. Developers often aggregate fine-grained services into coarse-grained services to create a business service.
74
A step beyond distributed processing, involving large numbers of networked computers (often geographically dispersed and possibly of different types and capabilities) that are harnessed to solve a common problem. A grid computing model can be used instead of virtualisation in situations that require real-time performance, where latency is unacceptable.
75
An Apache-managed software framework derived from MapReduce and Bigtable. Hadoop allows applications based on MapReduce to run on large clusters of commodity hardware. Hadoop is designed to parallelise data processing across computing nodes to speed computations and hide latency. Two major components of Hadoop exist: a massively scalable distributed file system that can support petabytes of data and a massively scalable MapReduce engine that computes results in batch.
76
HDFS
A versatile, resilient, clustered approach to managing files in a big data environment. HDFS is not the final destination for files. Instead, it is a data “service” that offers a unique set of capabilities needed when data volumes and velocity are high.
77
The act of subdividing and isolating elements of a physical server into fractions, each of which can run an operating system or an application.
78
A computing environment that includes the use of public and private clouds as well as data centre resources in a coordinated fashion.
79
Software that allows multiple operating systems to share a single hardware host. The hypervisor sits at the lowest level of the environment, directly above the hardware, and uses a thin layer of code to enable dynamic resource sharing. The hypervisor makes it seem like each operating system has the resources all to itself.
80
Keeping track of a single user’s (or asset’s) identity throughout the engagement with a system or set of systems.
81
A database structure where information is managed and processed in memory rather than on disk.
82
A process using software to link data sources in various departments or regions of the organisation with an overall goal of creating more reliable, consistent, and trusted information.
83
ITIL
A framework and set of standards for IT governance based on best practices.
84
The fundamental systems necessary for the ordinary operation of anything, be it a country or an IT department. The physical infrastructure that people rely on includes roads, electrical wiring, and water systems. In IT, infrastructure includes basic computer hardware, networks, operating systems, and other software that applications run on top of.
85
Services provided by the infrastructure. In IT, these services include all the software needed to make devices talk to one another, for starters.
86
IaaS
Infrastructure, including a management interface and associated software, provided to companies from the cloud as a service.
87
ISO
An organisation that has developed more than 17,000 international standards, including standards for IT service management and corporate governance of information technology.
88
The capability of a product to interface with many other products; usually used in the context of software.
89
Deferring the necessary connections among applications to when the connection is first needed. Late binding allows more flexibility for changes than early binding does, but it imposes some cost in processing time.
90
The amount of time lag before a service executes in an environment. Some applications require less latency and need to respond in near real-time, whereas other applications are less time-sensitive.
91
Any application that is more than a few years old. When applications can’t be disposed of and replaced easily, they become legacy applications. The good news is that they are still doing something useful, and selected pieces of code can be turned into business services with new standardised interfaces.
92
An open-source operating system based upon and similar to UNIX. In cloud computing, Linux is the dominant operating system, primarily because it is supported by a large number of vendors.
93
The vast majority of websites run on the Linux operating system managed by a Linux web hosting service using the LAMP (Linux, Apache, MySQL, PHP) software stack.
94
LAMP
An increasingly popular open-source approach to building web applications. LAMP is a software bundle made up of the Linux operating system, the Apache web server, a MySQL database, and a scripting language such as PHP, Perl, or Python.
95
An approach to distributed software applications in which components interact by passing data and requests to other components in a standardised way that minimises dependencies among components. The emphasis is on simplicity and autonomy. Each component offers a small range of simple services to other components.
96
Designed by Google as a way of efficiently executing a set of functions against a large amount of data in batch mode. The “map” component distributes the programming problem or tasks across a large number of systems and handles the placement of the tasks in a way that balances the load and manages recovery from failures. After the distributed computation is completed, another function called “reduce” aggregates all the elements back together to provide a result.
97
A way of encoding information that uses plain text containing special tags often delimited by angle brackets (< and >). Specific markup languages are based on XML to standardise the interchange of information between different computer systems and services.
98
A program (possibly installed on a web page) that combines content from more than one source, such as Google Maps and a real estate listing service.
99
MOM
A precursor to the enterprise service bus. See: Enterprise Service Bus (ESB), a set of packaged middleware services.
100
The definitions, mappings, and other characteristics used to describe how to find, access, and use the company’s data and software components.
101
A container of consistent definitions of business data and rules for mapping data to its actual physical locations in the system.
102
Multipurpose software that lives at a layer between the operating system and application in distributed computing environments.
103
An application that a business cannot afford to be without at any time.
104
MDDS
Organises data in a matrix-like structure, whose dimensions correspond to different qualities. These dimensions may encompass time, product, location, and additional factors. The matrix stores several metrics, such as sales income, amount sold, or customer count, in the cells where they intersect.
105
MDMS
A specific type of database management system that efficiently stores data in a multidimensional array. This storage method is designed to enhance the efficiency of data warehousing and online analytical processing (OLAP) tasks. By permitting the inclusion of several dimensions, it facilitates intricate queries and analysis.
106
This refers to the situation where a single instance of an application runs on a SaaS vendor’s servers, but serves multiple client organisations (tenants), keeping all their data separate. In a multitenant architecture, a software application partitions its data and configuration so that each customer has a customised virtual application instance.
107
An open-source relational database management system that uses SQL.
108
The connection of computer systems (nodes) by communications channels and appropriate software.
109
A set of technologies that created a broad array of database management systems that are distinct from relational database systems. One major difference is that SQL is not used as the primary query language. These database management systems are also designed for distributed data stores.
110
OODBMS
A database management system where data is stored as an object that is closely aligned with an application.
111
A movement in the software industry that makes programs available along with the source code used to create them so that others can inspect and modify how programs work. Changes to source code are shared with the community at large.
112
Making analytics part of a business process.
113
A networking system in which nodes in a network exchange data directly instead of going through a central server.
114
A guarantee that data stored in a database won’t be changed without permission, and that it will remain available for as long as it is important to the business.
115
PaaS
A cloud service that abstracts the computing services, including the operating software and the development and deployment and management life cycle. It sits on top of Infrastructure as a Service.
116
The most widely used open-source relational database.
117
A statistical or data-mining solution consisting of algorithms and techniques that can be used on both structured and unstructured data (together or individually) to determine future outcomes. It can be deployed for prediction, optimisation, forecasting, simulation, and many other uses.
118
As opposed to a public cloud, which is generally available, a private cloud is a set of computing resources within the corporation that serves only the corporation, but that is set up to operate in a cloud-like manner in regard to its management.
119
A high-level, end-to-end structure useful for decision making and normalising how things get done in a company or organisation.
120
A set of rules that computers use to establish and maintain communication among themselves.
121
Making resources available to users and software. A provisioning system makes applications available to users and makes server resources available to applications.
122
A resource that is available to any consumer either as a fee-per-transaction service or as a free service. It does not have deep security or a well-defined SLA.
123
RFID
A technology that uses small, inexpensive chips attached to products (or even animals) that then transmit a unique identification number over a short distance to a special radio transmitter/receiver.
124
A form of processing in which a computer system accepts and updates data at the same time, feeding back immediate results that influence the data source.
125
A class of applications that demand timely response to actions that take place out in the world. Typical examples include automated stock trading and RFID.
126
A single source for all the metadata needed to gain access to a web service or software component.
127
RDBMS
A database management system that organises data in defined tables.
128
A database for software and components, with an emphasis on revision control and configuration management (where they keep the good stuff, in other words).
129
REST
An architectural style designed specifically for the Internet and the most commonly used mechanism for connecting one web resource (a server) to another web resource (a client). A RESTful API provides a standardised way to create a temporary relationship (also called “loose coupling”) between and among web resources.
130
A set of compute, storage, or data services that are combined to be used across hybrid environments.
131
The time from the moment at which a transaction is submitted by a user or an application to the moment at which the final result of that transaction is made known to the user or application.
132
In regard to hardware, the capability to go from small to large amounts of processing power with the same architecture. It also applies to software products such as databases, in which case it refers to the consistency of performance per unit of power as hardware resources increase.
133
A computer programming language that is interpreted and has access to all or most operating system facilities. Common examples include Perl, Python, Ruby, and JavaScript. It is often easier to program in a scripting language, but the resulting programs generally run more slowly than those created in compiled languages such as C and C++.
134
SSL
A popular method for making secure connections over the Internet, first introduced by Netscape.
135
SAML
A standard framework for exchanging authentication and authorisation information (that is, credentials) in an XML format called assertions.
136
In computer programming, what the data means as opposed to the formatting rules (syntax).
137
A purposeful activity carried out for the benefit of a known target. Services are often made up of a group of component services, some of which may also have component services. Services always transform something, and they complete by delivering an output.
138
A directory of IT services provided across the enterprise, including information such as service description, access rights, and ownership.
139
A single point of contact for IT users and customers to report any issues they may have with the IT service (or, in some cases, with IT’s customer service).
140
Monitoring and optimising service to ensure that it meets the critical outcomes that the customer values and the stakeholders want to provide.
141
SLA
A document that captures the understanding between a service user and a service provider regarding quality and timeliness.
142
SOA
An approach to building applications that implements business processes or services by using a set of loosely coupled black-box components orchestrated to deliver a well-defined level of service.
143
In IT, an application with a single narrow focus, such as human resources management or inventory control, with no intention or preparation for use by others.
144
SOAP
A protocol specification for exchanging data. Along with REST, it is used for storing and retrieving data in the Amazon storage cloud.
145
SaaS
The delivery of computer applications over the Internet.
146
A database that is optimised for data related to where an object is in a given space.
147
A core set of common, repeatable best practices and protocols that have been agreed on by a business or industry group. Typically, vendors, industry user groups, and end-users collaborate to develop standards based on the broad expertise of a large number of stakeholders. Organisations can leverage these standards as a common foundation and innovate on top of them.
148
SAN
A high-speed network of interconnected storage devices. These storage devices might be servers, optical disc drives, or other storage media. The difference between a SAN and a NAS (Network Attached Storage) is that a SAN runs at a higher speed than a NAS, while a NAS is generally easier to install and provides a file system.
149
An analytic computing platform that is focused on speed. Data is continuously analysed and transformed in memory before it is stored on a disk. This platform allows the analysing of large volumes of data in real-time.
150
Data that has a defined length and format. Examples of structured data include numbers, dates, and groups of words and numbers called strings (for example, a customer’s name, address, and so on).
151
SQL
The most popular computer language for accessing and manipulating databases.
152
The process of analysing unstructured text, extracting relevant information, and transforming it into structured information that can be leveraged in various ways.
153
The rate at which transactions are completed in a system.
154
TQM
A popular quality-improvement program.
155
A computer action that represents a business event, such as debiting an account. When a transaction starts, it must either complete or not happen at all.
156
TLS
The successor to SSL, often loosely described as a newer name for it.
157
Data that does not follow a specified data format. Unstructured data can be text, video, images, and so on.
158
A metered service that acts like a public service based on payment for the use of a measured amount of a component or asset.
159
Virtual memory is the use of a disk to store active areas of memory to make the available memory appear larger. In a virtual environment, one computer runs software that allows it to emulate another machine. This kind of emulation is commonly known as virtualisation.
160
A software component created with an interface consisting of a WSDL definition, an XML schema definition, and a WS-Policy definition. Collectively, these definitions could be called a service contract or, alternatively, an API.
161
WSDL
An XML (Extensible Markup Language) format for describing web services.
162
WS
A policy framework that provides a way of expressing the capabilities, requirements, and characteristics of software components in a Web Services system.
163
This is a sequence of task-oriented steps needed to carry out a business process.
164
A language for defining and describing the structure of XML documents.
165
XSD
The description of what can be in an XML document.

Table 23: Big Data Glossary
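
The MapReduce entry in the table describes a "map" step that spreads work across many systems and a "reduce" step that aggregates the results. The following is a minimal single-machine sketch of that idea as a word count, not how Hadoop or Google implement it. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Aggregate all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a real cluster each document (or split) would be mapped on a different node;
    # here everything runs in one process to keep the sketch self-contained.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Calling `word_count(["big data", "big table"])` counts "big" twice and the other two words once each. The load balancing and failure recovery mentioned in the glossary entry are deliberately left out.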
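
The caching entry in the table can be illustrated with a small in-memory cache placed in front of a slow lookup. This is a sketch under stated assumptions: the class name `SimpleCache` and the `loader` callback are hypothetical, and a production cache would also need eviction and invalidation, which are omitted here.

```python
class SimpleCache:
    """Keep results in memory so repeated requests skip the slow loader."""

    def __init__(self, loader):
        self._loader = loader   # function that fetches a value from the slow source
        self._store = {}        # in-memory copy of values already fetched
        self.misses = 0         # how many times the slow source was actually used

    def get(self, key):
        if key not in self._store:
            # Cache miss: fetch once from the slow source and remember the result.
            self.misses += 1
            self._store[key] = self._loader(key)
        # Cache hit (or freshly stored value): answer from memory.
        return self._store[key]
```

After the first `get` for a given key, later requests for the same key are answered from the dictionary and `misses` no longer increases.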
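
The transaction entry in the table states that a transaction must either complete or not happen at all. A minimal sketch of that all-or-nothing behaviour, using Python's standard `sqlite3` module with an in-memory database, might look like this. The `accounts` table, the account names, and the helper functions `make_demo_db` and `transfer` are assumptions made up for the example.

```python
import sqlite3


def make_demo_db():
    """Create a throwaway in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def transfer(conn, source, target, amount):
    """Move `amount` between two accounts; return False if it was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (source,)
            ).fetchone()
            if balance < 0:
                # Abort: the debit above is undone by the rollback.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
        return True
    except ValueError:
        return False
```

A transfer that would overdraw the source account leaves both balances untouched, while a valid transfer updates both rows together.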


Reference(s)
Book
Foster, I., Ghani, R., Jarmin, R. S., Kreuter, F. & Lane, J. (2016) Big Data and Social Science: A Practical Guide to Methods and Tools. Chemical Rubber Company Press: United States of America (USA), Florida (FL), Palm Beach, Boca Raton. [ISBN: 9781498751407]. [Available on: Amazon: https://amzn.to/3D5tCVn].
Book
Hurwitz, J., Nugent, A., Halper, F. & Kaufman, M. (2013) Big Data For Dummies. John Wiley & Sons: United States of America (USA), New Jersey (NJ), Hudson, Hoboken. [ISBN: 9781118504222]. [Available on: Amazon: https://amzn.to/3N96FoA].
Book
Srinivasan, S. (2017) Guide to Big Data Applications. Springer International Publishing: United States of America (USA), New York (NY). [ISBN: 9783319538167]. [Available on: Amazon: https://amzn.to/3DabXvw].
Book
Zikopoulos, P. C. & Eaton, C. (2012) Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. McGraw-Hill Osborne: United States of America (USA), New York (NY). [ISBN: 9780071790536]. [Available on: Amazon: https://amzn.to/3FgVbh3].

Reference (or cite) Article
Kahlon, R. S. (2018) Big Data Glossary [Online]. dkode: United Kingdom, England, London. [Published on: 2018-09-30]. [Article ID: RSK666-0000135]. [Available on: dkode | Ravi - https://ravi.dkode.co/2018/09/big-data-glossary.html].
