WEEK 1 - Data types and structures
[Definition]
First-party data | Second-party data | Third-party data |
Data collected by an individual or group using their own resources | Data collected by a group directly from its audience and then sold | Data collected from outside sources who did not collect it directly |
Discrete data | Continuous data |
Data that is counted and has a limited number of values | Data that is measured and can have almost any numeric value |
Nominal data | Ordinal data |
A type of qualitative data that is categorized without a set order | A type of qualitative data with a set order or scale |
Internal data | External data |
Data that lives within a company's own systems | Data that lives and is generated outside of an organization |
Structured data | Unstructured data |
Data organized in a certain format such as rows and columns | Data that is not organized in any easily identifiable manner |
Wide data | Long data |
Data in which every data subject has a single row with multiple columns to hold the values of various attributes of the subject | Data in which each row is one time point per subject, so each subject will have data in multiple rows |
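As a rough illustration of the wide and long layouts, the sketch below reshapes a small made-up table in plain Python. The country labels, yearly values, and the `wide_to_long` helper are hypothetical and exist only for this example.

```python
# Wide data: one row per subject, one column per year (values are made up).
wide_rows = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def wide_to_long(rows, id_column, value_name):
    """Turn each non-id column into its own row (one time point per subject)."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            long_rows.append(
                {id_column: row[id_column], "year": column, value_name: value}
            )
    return long_rows


long_rows = wide_to_long(wide_rows, id_column="country", value_name="population")
for row in long_rows:
    print(row)
```

Each subject now appears in two rows, one per year, which is the long layout described above.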
-Population: All possible data values in a certain dataset
-Sample: A part of a population that is representative of the population
-Data model: A model that is used for organizing data elements and how they relate to one another
-Data elements: Pieces of information, such as people's names, account numbers, and addresses
-Data type: A specific kind of data attribute that tells what kind of value the data is
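To make the population and sample definitions concrete, here is a minimal sketch that draws a simple random sample from a made-up population. The ages, the population size of 1,000, and the sample size of 100 are all assumptions for illustration.

```python
import random

# Hypothetical population: the ages of all 1,000 members of some group.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [rng.randint(18, 80) for _ in range(1000)]

# Simple random sample: every member has the same chance of being picked.
sample = rng.sample(population, k=100)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
print(f"population mean: {population_mean:.1f}")
print(f"sample mean:     {sample_mean:.1f}")
```

If the sample is representative, its mean should land close to the population mean.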
[Levels of data modeling]
Feature | Conceptual | Logical | Physical |
Entity Names | O | O | |
Entity Relationships | O | O | |
Attributes | | O | |
Primary Keys | | O | O |
Foreign Keys | | O | O |
Table Names | | | O |
Column Names | | | O |
Column Data Types | | | O |
(Source: https://www.1keydata.com/datawarehousing/data-modeling-levels.html)
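The physical level is the one that names tables, columns, and column data types. The sketch below shows what that might look like, using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their columns are hypothetical and do not come from the course material.

```python
import sqlite3

# Hypothetical physical model for two related entities.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
        amount      REAL NOT NULL
    );
    """
)

# PRAGMA table_info lists each column's name, declared type, and primary-key flag.
for cid, name, col_type, notnull, default, pk in conn.execute(
    "PRAGMA table_info(orders)"
):
    print(name, col_type, "PK" if pk else "")
```

A conceptual model would stop at "customers place orders". The table names, column names, and data types appear only at the physical level.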
WEEK 2 - Bias, credibility, privacy, ethics, and access
[Definition]
-Bias: A preference in favor of or against a person, group of people, or thing
-Data bias: A type of error that systematically skews results in a certain direction
Sampling bias | Observer bias (experimenter/research bias) | Interpretation bias | Confirmation bias |
When a sample isn't representative of the population as a whole | The tendency for different people to observe things differently | The tendency to always interpret ambiguous situations in a positive or negative way | The tendency to search for or interpret information in a way that confirms pre-existing beliefs |
-Ethics: Well-founded standards of right and wrong that prescribe what humans ought to do, usually in terms of rights, obligations, benefits to society, fairness, or specific virtues
-Data ethics: Well-founded standards of right and wrong that dictate how data is collected, shared, and used
-GDPR: General Data Protection Regulation of the European Union
-Data interoperability: The ability of data systems and services to openly connect and share data
[Aspects of data ethics]
- Ownership: Individuals own the raw data they provide and they have primary control over its usage, how it's processed, and how it's shared
- Transaction transparency: All data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data
- Consent: An individual's right to know explicit details about how and why their data will be used before agreeing to provide it
- Currency: Individuals should be aware of financial transactions resulting from the use of their personal data and the scale of these transactions
- Privacy: Preserving a data subject's information and activity any time a data transaction occurs
- Openness: Free access, usage, and sharing of data
WEEK 3 - Databases: Where data lives
[Definition]
-Relational database: A database that contains a series of related tables that can be connected via their relationships
-Primary key: An identifier that references a column in which each value is unique
-Foreign key: A field within a table that is a primary key in another table
-Metadata: Data about data
-Metadata repository: A database specifically created to store metadata
-Data governance: A process to ensure the formal management of a company's data assets
-Sorting data: Arranging data into a meaningful order to make it easier to understand, analyze, and visualize
-Filtering: Showing only the data that meets specific criteria while hiding the rest
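The definitions above can be tried out together in a small sketch: two related tables linked by a primary key and a foreign key, then filtered and sorted in one query. It uses Python's built-in `sqlite3` module. The `departments` and `employees` tables and every value in them are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE departments (
        dept_id INTEGER PRIMARY KEY,   -- primary key: unique per row
        name    TEXT
    );
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        salary  INTEGER,
        dept_id INTEGER REFERENCES departments (dept_id)  -- foreign key
    );
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'Finance');
    INSERT INTO employees VALUES
        (10, 'Ana', 52000, 1),
        (11, 'Ben', 48000, 2),
        (12, 'Cho', 61000, 1);
    """
)

# Join through the foreign key, filter with WHERE, sort with ORDER BY.
rows = conn.execute(
    """
    SELECT e.name, d.name, e.salary
    FROM employees AS e
    JOIN departments AS d ON d.dept_id = e.dept_id
    WHERE e.salary >= 50000
    ORDER BY e.salary DESC
    """
).fetchall()
print(rows)  # [('Cho', 'Sales', 61000), ('Ana', 'Sales', 52000)]
```

The `WHERE` clause does the filtering and `ORDER BY` does the sorting. The join works only because `employees.dept_id` refers to the primary key of `departments`.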
[3 common types of metadata]
Descriptive | Structural | Administrative |
Metadata that describes a piece of data and can be used to identify it at a later point in time | Metadata that indicates how a piece of data is organized and whether it is part of one, or more than one, data collection | Metadata that indicates the technical source of a digital asset |
[SQL practices]
[Regular Expressions Tutorial]
https://www.regular-expressions.info/tutorialcnt.html
WEEK 4 - Organizing and protecting your data
[Definition]
-Naming conventions: Consistent guidelines that describe the content, date, or version of a file in its name
-Data security: Protecting data from unauthorized access or corruption by adopting safety measures
[Best practices when organizing data]
- Naming conventions
- Foldering
- Archiving older files
- Align your naming and storage practices with your team
- Develop metadata practices
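One possible way to keep file names consistent is to generate them from a single helper. The pattern below (project, description, date as YYYYMMDD, two-digit version) is only an assumed convention for illustration, and `build_file_name` is a hypothetical helper. A real team would agree on its own pattern.

```python
from datetime import date


def build_file_name(project, description, created, version, extension):
    """Combine content, date (YYYYMMDD), and version into one file name."""
    return f"{project}_{description}_{created:%Y%m%d}_v{version:02d}.{extension}"


name = build_file_name("SalesReport", "Q2Summary", date(2023, 6, 5), 2, "csv")
print(name)  # SalesReport_Q2Summary_20230605_v02.csv
```

Putting the date in year-month-day order means an alphabetical sort of the file names is also a chronological sort.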
[Security measures]
Encryption | Tokenization |
Uses a unique algorithm to alter data and make it unusable by users and applications that don't know the algorithm. The algorithm is saved as a key, which can be used to reverse the encryption. | Replaces data elements with randomly generated data referred to as tokens. The original data is stored in a separate location and mapped to the tokens. Accessing the original data requires permission to use both the tokenized data and the token mapping, so even if the tokenized data is hacked, the original data remains safe in its separate location. |
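The tokenization idea can be sketched in a few lines of Python. This is a toy, in-memory illustration and not a production design. The `TokenVault` class is hypothetical and the card number is a made-up value. A real system would keep the mapping in a separate, access-controlled store.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault mapping random tokens to original values."""

    def __init__(self):
        self._token_to_value = {}

    def tokenize(self, value):
        # The token is random, so it reveals nothing about the original value.
        token = secrets.token_hex(8)
        self._token_to_value[token] = value
        return token

    def detokenize(self, token):
        # Only code with access to the vault can recover the original.
        return self._token_to_value[token]


vault = TokenVault()
card_number = "4111-1111-1111-1111"  # made-up example value
token = vault.tokenize(card_number)

print(token)                    # random 16-character hex string
print(vault.detokenize(token))  # original value, recovered through the vault
```

A dataset that holds only the token can be shared or analyzed without exposing the card number, because the original can be recovered only through the vault.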