

[Coursera] Course 3. Prepare Data for Exploration

WEEK 1 - Data types and structures

[Definition]

-First-party data: Data collected by an individual or group using their own resources

-Second-party data: Data collected by a group directly from its audience and then sold

-Third-party data: Data provided by outside sources who did not collect it directly

 

-Discrete data: Data that is counted and has a limited number of values

-Continuous data: Data that is measured and can have almost any numeric value

 

-Nominal data: A type of qualitative data that is categorized without a set order

-Ordinal data: A type of qualitative data with a set order or scale
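A small Python sketch of the nominal/ordinal difference. The genre list and the `Satisfaction` scale are made-up examples: nominal values can be listed but their order means nothing, while ordinal values carry a rank, so sorting and comparing follow the scale rather than the alphabet.

```python
from enum import IntEnum

# Nominal data (hypothetical values): categories with no inherent order.
movie_genres = ["Comedy", "Drama", "Action", "Horror"]

# Ordinal data (hypothetical scale): each level gets an integer rank.
class Satisfaction(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

responses = [Satisfaction.HIGH, Satisfaction.LOW, Satisfaction.MEDIUM]

# Sorting follows the defined scale, not alphabetical order of the names.
ranked = sorted(responses)
print([level.name for level in ranked])  # ['LOW', 'MEDIUM', 'HIGH']

# Comparisons between levels are meaningful for ordinal data.
print(Satisfaction.HIGH > Satisfaction.MEDIUM)  # True
```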

 

-Internal data: Data that lives within a company's own systems

-External data: Data that lives and is generated outside of an organization

 

-Structured data: Data organized in a certain format such as rows and columns

-Unstructured data: Data that is not organized in any easily identifiable manner

 

-Wide data: Data in which every data subject has a single row with multiple columns to hold the values of various attributes of the subject

-Long data: Data in which each row is one time point per subject, so each subject will have data in multiple rows
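To make the wide/long difference concrete, here is a minimal pure-Python sketch that reshapes wide rows into long rows. The data and the helper `wide_to_long` are hypothetical: in the wide form each country has one row with a column per year; in the long form each country/year pair gets its own row.

```python
# Wide format (hypothetical data): one row per subject, one column per time point.
wide = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]

def wide_to_long(rows, id_col, var_name="year", value_name="value"):
    """Reshape wide rows into long rows: one row per subject per time point."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue  # the identifier is repeated on every long row instead
            long_rows.append({id_col: row[id_col], var_name: key, value_name: value})
    return long_rows

long_data = wide_to_long(wide, id_col="country", value_name="population")
for r in long_data:
    print(r)
```

If pandas is available, `DataFrame.melt` (with `id_vars`, `var_name`, `value_name`) performs the same kind of reshape.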

 

 

-Population: All possible data values in a certain dataset

-Sample: A part of a population that is representative of the population

-Data model: A model that is used for organizing data elements and how they relate to one another

-Data elements: Pieces of information, such as people's names, account numbers, and addresses

-Data type: A specific kind of data attribute that tells what kind of value the data is
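A quick sketch of population vs. sample, assuming simple random sampling (one common way to aim for a representative sample; it does not guarantee one). The "population" of customer ages is randomly generated for illustration only.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 customers, generated for illustration.
rng = random.Random(42)  # seeded so the run is reproducible
population = [rng.randint(18, 80) for _ in range(1000)]

# Simple random sample: every member has an equal chance of being picked.
sample = rng.sample(population, k=100)

print("population mean:", round(statistics.mean(population), 1))
print("sample mean:    ", round(statistics.mean(sample), 1))
```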

 

 

[Levels of data modeling]

-Conceptual: Entity names, entity relationships

-Logical: Entity names, entity relationships, attributes, primary keys, foreign keys

-Physical: Primary keys, foreign keys, table names, column names, column data types

(Source: https://www.1keydata.com/datawarehousing/data-modeling-levels.html)
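At the physical level everything is spelled out: table names, column names, column data types, primary keys and foreign keys. Below is a sketch of what that can look like as SQLite DDL run through Python's built-in `sqlite3` module. The customer/order schema is hypothetical; `PRAGMA table_info` and `PRAGMA foreign_key_list` are SQLite's own way of reading the model back.

```python
import sqlite3

# Hypothetical physical model: tables, columns, types, PK and FK are all explicit.
DDL = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    city        TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    order_date  TEXT,
    amount      REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# table_info rows: (cid, name, type, notnull, default, pk)
for table in ("customers", "orders"):
    for cid, name, col_type, notnull, default, pk in conn.execute(f"PRAGMA table_info({table})"):
        print(table, name, col_type, "PK" if pk else "")

# foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
for fk in conn.execute("PRAGMA foreign_key_list(orders)"):
    print(f"orders.{fk[3]} -> {fk[2]}.{fk[4]}")
```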

 


WEEK 2 - Bias, credibility, privacy, ethics, and access

[Definition]

-Bias: A preference in favor of or against a person, group of people, or thing

-Data bias: A type of error that systematically skews results in a certain direction

 

-Sampling bias: When a sample isn't representative of the population as a whole

-Observer bias (also called experimenter bias or research bias): The tendency for different people to observe things differently

-Interpretation bias: The tendency to always interpret ambiguous situations in a positive or negative way

-Confirmation bias: The tendency to search for or interpret information in a way that confirms pre-existing beliefs
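A toy simulation of sampling bias. All numbers are invented: the population mixes two age groups with different screen time, and the "biased" survey only reaches the younger group, so its average drifts away from the population average while a random sample from everyone stays close.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: daily screen time (hours) for two age groups.
younger = [rng.gauss(6.0, 1.0) for _ in range(500)]
older = [rng.gauss(3.0, 1.0) for _ in range(500)]
population = younger + older

# Biased sample: the survey only reaches younger users.
biased_sample = rng.sample(younger, k=100)

# Random sample drawn from the whole population.
random_sample = rng.sample(population, k=100)

print("population mean:   ", round(statistics.mean(population), 2))
print("random sample mean:", round(statistics.mean(random_sample), 2))
print("biased sample mean:", round(statistics.mean(biased_sample), 2))
```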

 

-Ethics: Well-founded standards of right and wrong that prescribe what humans ought to do, usually in terms of rights, obligations, benefits to society, fairness, or specific virtues

-Data ethics: Well-founded standards of right and wrong that dictate how data is collected, shared, and used

-GDPR: General Data Protection Regulation of the European Union

-Data interoperability: The ability of data systems and services to openly connect and share data

[Aspects of data ethics]

  • Ownership: Individuals own the raw data they provide and they have primary control over its usage, how it's processed, and how it's shared
  • Transaction transparency: All data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data
  • Consent: An individual's right to know explicit details about how and why their data will be used before agreeing to provide it
  • Currency: Individuals should be aware of financial transactions resulting from the use of their personal data and the scale of these transactions
  • Privacy: Preserving a data subject's information and activity any time a data transaction occurs
  • Openness: Free access, usage, and sharing of data

WEEK 3 - Databases: Where data lives

[Definition]

-Relational database: A database that contains a series of related tables that can be connected via their relationships

-Primary key: An identifier that references a column in which each value is unique

-Foreign key: A field within a table that is a primary key in another table

-Metadata: Data about data

-Metadata repository: A database specifically created to store metadata

-Data governance: A process to ensure the formal management of a company's data assets

-Sorting data: Arranging data into a meaningful order to make it easier to understand, analyze, and visualize

-Filtering: Showing only the data that meets specific criteria while hiding the rest
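A minimal sketch of these ideas with Python's built-in `sqlite3` module. The two tables and their rows are hypothetical: `customer_id` is the primary key of `customers` and a foreign key in `orders`, the tables are connected through that key with a JOIN, then the result is filtered (`WHERE`) and sorted (`ORDER BY`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

# Hypothetical related tables linked by customer_id.
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    amount REAL
);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 80.0), (12, 1, 55.0);
""")

# Join through the key, keep orders over 30 (filtering), largest first (sorting).
rows = conn.execute("""
    SELECT c.name, o.order_id, o.amount
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.customer_id
    WHERE o.amount > 30
    ORDER BY o.amount DESC
""").fetchall()
for row in rows:
    print(row)  # ('Ben', 11, 80.0) then ('Ana', 12, 55.0)

# A foreign key must point at an existing primary key value.
fk_rejected = False
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 5.0)")  # no customer 99
except sqlite3.IntegrityError as err:
    fk_rejected = True
    print("rejected:", err)
```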

[3 common types of metadata]

-Descriptive: Metadata that describes a piece of data and can be used to identify it at a later point in time

-Structural: Metadata that indicates how a piece of data is organized and whether it is part of one, or more than one, data collection

-Administrative: Metadata that indicates the technical source of a digital asset
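A small sketch of how the three metadata types could sit next to each other for a couple of photo files. The catalog, file names and field names are all hypothetical; the two helper functions show descriptive metadata being used to find a file later and structural metadata being used to see which files belong to a collection.

```python
# Hypothetical catalog: metadata for each file grouped by type.
catalog = {
    "IMG_0142.jpg": {
        # Descriptive: what the item is, so it can be identified later
        "descriptive": {"title": "Store opening", "keywords": ["store", "opening"]},
        # Structural: how the item is organized and which collections it belongs to
        "structural": {"collections": ["marketing_photos", "press_kit"], "sequence": 3},
        # Administrative: technical source of the digital asset
        "administrative": {"file_type": "JPEG", "created": "2021-05-03"},
    },
    "IMG_0143.jpg": {
        "descriptive": {"title": "Team photo", "keywords": ["team", "staff"]},
        "structural": {"collections": ["marketing_photos"], "sequence": 4},
        "administrative": {"file_type": "JPEG", "created": "2021-05-03"},
    },
}

def find_by_keyword(catalog, keyword):
    """Use descriptive metadata to locate files later on."""
    return [name for name, meta in catalog.items()
            if keyword in meta["descriptive"]["keywords"]]

def files_in_collection(catalog, collection):
    """Use structural metadata to list the files in a collection."""
    return [name for name, meta in catalog.items()
            if collection in meta["structural"]["collections"]]

print(find_by_keyword(catalog, "team"))                  # ['IMG_0143.jpg']
print(files_in_collection(catalog, "marketing_photos"))  # ['IMG_0142.jpg', 'IMG_0143.jpg']
```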

[SQL practices]

 

In-depth guide_SQL best practices.pdf

 

[Regular Expressions Tutorial]

https://www.regular-expressions.info/tutorialcnt.html

 



WEEK 4 - Organizing and protecting your data

[Definition]

-Naming conventions: Consistent guidelines that describe the content, date, or version of a file in its name

-Data security: Protecting data from unauthorized access or corruption by adopting safety measures

[Best practices when organizing data]

  • Naming conventions
  • Foldering
  • Archiving older files
  • Align your naming and storage practices with your team
  • Develop metadata practices
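A sketch of a naming convention enforced in code. The pattern `YYYYMMDD_project_description_vNN.ext` is only an assumed team convention for illustration; `build_filename` and `follows_convention` are hypothetical helpers, one to generate names consistently and one (a regular expression) to check existing names.

```python
import re
from datetime import date

def build_filename(project, description, version, ext, on=None):
    """Build a name following the assumed pattern YYYYMMDD_project_description_vNN.ext."""
    on = on or date.today()
    parts = [on.strftime("%Y%m%d"), project, description, f"v{version:02d}"]
    # Replace runs of whitespace with underscores so names stay consistent
    cleaned = [re.sub(r"\s+", "_", p.strip()) for p in parts]
    return "_".join(cleaned) + "." + ext.lstrip(".")

# Date, then letters/digits/underscores, then a two-digit version, then an extension.
NAME_PATTERN = re.compile(r"^\d{8}_[A-Za-z0-9_]+_v\d{2}\.[a-z0-9]+$")

def follows_convention(name):
    """Return True if a file name matches the assumed convention."""
    return bool(NAME_PATTERN.match(name))

name = build_filename("SalesReport", "Q2 summary", 2, "csv", on=date(2021, 5, 10))
print(name)                                     # 20210510_SalesReport_Q2_summary_v02.csv
print(follows_convention(name))                 # True
print(follows_convention("final report 2.csv")) # False
```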

[Security measures]

-Encryption: Uses a unique algorithm to alter data and make it unusable by users and applications that don't know the algorithm. The algorithm is saved as a key, which can be used to reverse the encryption.

-Tokenization: Replaces data elements with randomly generated data referred to as tokens. The original data is stored in a separate location and mapped to the tokens, and permission is needed to use both the tokenized data and the token mapping. Even if the tokenized data is hacked, the original data is still safe and secure in a separate location.
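A toy sketch of tokenization. `TokenVault` is a hypothetical class, and a plain in-memory dict stands in for the separate, access-controlled store a real system would use. Tokens come from Python's `secrets` module, so they are random and reveal nothing about the original value; only a caller with permission can map a token back.

```python
import secrets

class TokenVault:
    """Toy vault mapping random tokens to original values (in-memory, illustration only)."""

    def __init__(self):
        self._token_to_value = {}

    def tokenize(self, value):
        # Random token: unrelated to the original value, so it is useless if leaked.
        token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        return token

    def detokenize(self, token, has_permission):
        # Only authorized callers may use the mapping to recover the original data.
        if not has_permission:
            raise PermissionError("not authorized to detokenize")
        return self._token_to_value[token]

vault = TokenVault()
card = "4111 1111 1111 1111"  # a widely used test card number, not real data
token = vault.tokenize(card)

# The working dataset stores only the token, never the original value.
orders = [{"order_id": 1, "card": token}]
print(orders)
print(vault.detokenize(token, has_permission=True))
```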