DB notes (https://engineering.videoblocks.com/web-architecture-101-a3224e126947)
Improve DB performance - this can be done with:
1.) Indexes
2.) Partitioning at the table level (note: partitioning can also be done at the DB level, e.g. by adding more servers - horizontal scaling, or sharding)
Horizontal scaling / partitioning - split data row-wise, e.g. 100 employees in one table and the next 200 in another.
Vertical scaling / partitioning - split data column-wise: store some columns in one table and the rest in another (e.g. Class and Syllabus).
https://www.sqlshack.com/database-table-partitioning-sql-server/
3.) Purging
4.) Archiving
Purging is the process of freeing up space in the database or of deleting obsolete data that is not required by the system. The purge process can be based on the age of the data or the type of data.
Archive Process
Archiving is the process of backing up the obsolete data that will be deleted during the purge process. During the archive process, data will be moved from the main transactional tables to the backup tables.
5.) Sharding (at the DB level, not the table level) - sharding is the equivalent of "horizontal partitioning" applied across servers.
When you shard a database, you create replicas of the schema, and then divide what data is stored in each shard based on a shard key. For example, I might shard my customer database using CustomerId as a shard key - I'd store the range 0-10000 in one shard and 10001-20000 in a different shard. When choosing a shard key, the DBA will typically look at data-access patterns and space issues to ensure that load and space are distributed across shards evenly.
*Tables can be partitioned into up to 1024K-1 separate partitions. Any table can be partitioned except those tables containing columns with LONG or LONG RAW datatypes. You can, however, use tables containing columns with CLOB or BLOB datatypes.
Good examples of horizontal scaling are Cassandra, MongoDB, Google Cloud Spanner ..
and a good example of vertical scaling is MySQL
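A minimal sketch of the range-based shard routing described above; the shard names and CustomerId boundaries are invented for illustration:

```python
# Range-based shard routing; shard names and CustomerId boundaries are
# made up for illustration.
SHARD_RANGES = [
    (0, 10000, "shard_a"),      # CustomerIds 0-10000
    (10001, 20000, "shard_b"),  # CustomerIds 10001-20000
]

def shard_for(customer_id):
    """Return the shard that stores this CustomerId."""
    for low, high, shard in SHARD_RANGES:
        if low <= customer_id <= high:
            return shard
    raise ValueError(f"no shard covers CustomerId {customer_id}")
```

Every query for a given customer is first routed by this function, so each shard only ever holds and scans its own slice of the data.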
6.) If your application is bound by read performance, you can add caches or database replicas
7.) Reporting DB - offload heavy reporting queries to a separate copy of the data
8.) https://www.oracle.com/technetwork/database/database-technologies/timesten/overview/timesten-imdb-cache-101293.html
9.) Add caching for constant queries (e.g. "get all dog pics") - use Redis
https://stackoverflow.com/questions/5400163/when-to-redis-when-to-mongodb
https://static.simonwillison.net/static/2010/redis-tutorial/
10.) Multiple servers - like we have today - the same data copied into multiple DBs
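The cache-aside idea above (cache the result of a constant query like "get all dog pics") can be sketched like this; a plain dict stands in for Redis here, where a real app would use a Redis client and a key TTL:

```python
import time

# Cache-aside sketch: a dict stands in for Redis (a real app would use a
# Redis client and set the key with a TTL).
_cache = {}
TTL_SECONDS = 60

def get_all_dog_pics(fetch_from_db):
    """Return the cached result if still fresh, else query the DB and cache it."""
    entry = _cache.get("all_dog_pics")
    if entry is not None and time.time() - entry[0] < TTL_SECONDS:
        return entry[1]                        # cache hit: no DB work
    result = fetch_from_db()                   # cache miss: hit the DB once
    _cache["all_dog_pics"] = (time.time(), result)
    return result
```

A second call within the TTL is served entirely from the cache, so the database is queried only once.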
->
Oracle SQL is declarative programming - you tell it what you want, not how to get it.
Declarative programming is when you write your code in such a way that it describes what you want to do, and not how you want to do it. It is left up to the compiler to figure out the how.
Examples of declarative programming languages are SQL and Prolog.
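To make the declarative/imperative split concrete, here is the same question answered both ways against an in-memory SQLite table (an illustrative stand-in for Oracle):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary INT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("a", 100), ("b", 300), ("c", 200)])

# Declarative: describe WHAT you want; the engine decides how to find it.
declarative = [row[0] for row in
               conn.execute("SELECT name FROM emp WHERE salary > 150")]

# Imperative: spell out HOW -- walk every row and filter by hand.
imperative = []
for name, salary in conn.execute("SELECT name, salary FROM emp"):
    if salary > 150:
        imperative.append(name)

assert sorted(declarative) == sorted(imperative) == ["b", "c"]
```

Both produce the same answer; the declarative form leaves the access strategy (scan, index, etc.) to the engine's optimizer.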
BEST DOC->
https://pdfs.semanticscholar.org/presentation/2fd4/2646536d516d4cb31cf2ade30f95cb3b6d5e.pdf
http://www.agiledata.org/essays/umlDataModelingProfile.html
--> partition pruning
Good question - if I have horizontal partitioning on months like
JAN-FEB
MAR-APR
how does an index work on them when I need to pull data?
When SELECTing from the table, the first thing done is to decide which partition(s) may contain the desired data. This is "partition pruning" and uses the "partition key" (which is apparently to be year). Then the select is applied to the subtables that are relevant, using whatever index is appropriate. In that case it is a waste to have INDEX(year, ...), since you are already pruned down to the year.
Don't use partitioning unless you expect at least a million rows. (That would be a lot of years.)
Don't use partitioning unless you have a use case where it will help you. (Apparently not your case.)
Don't have columns for the parts of a datetime, when it is so easy to compute the parts: YEAR(date), MONTH(date), etc.
Don't index columns with low cardinality; the Optimizer will end up scanning the entire table anyway -- because it is faster. (eg: month='05')
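A toy model of the pruning step described above: rows are split by month-range partitions, and a query on the partition key only ever touches the matching sub-table. The partition layout here is invented for illustration:

```python
# Rows grouped into month-range partitions (layout invented for illustration).
PARTITIONS = {
    ("JAN", "FEB"): [("JAN", "order-1"), ("FEB", "order-2")],
    ("MAR", "APR"): [("MAR", "order-3"), ("APR", "order-4")],
}

def select_by_month(month):
    """Partition pruning: skip every partition whose key range can't match."""
    hits = []
    for key_range, rows in PARTITIONS.items():
        if month not in key_range:
            continue                  # pruned: this partition is never read
        hits.extend(r for r in rows if r[0] == month)
    return hits
```

A query for MAR reads only the MAR-APR partition; the JAN-FEB one is never scanned, which is exactly why an extra index on the partition key itself is redundant.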
--->
Partitioning - better manageability
Indexes - better performance
https://richardfoote.wordpress.com/2008/08/28/regular-index-vs-partitions-the-best-of-both-worlds/
---->
Types of partitions -
Oracle provides the following partitioning methods:
Range Partitioning (JAN-FEB) (MAR-APR), so 2 queries can run in parallel, much faster
List Partitioning (West sales) (East Sales))
Hash Partitioning -With hash partitioning, you spread your rows uniformly, seemingly randomly over the partitions (in fact, along a hash function on one of the columns). The advantage you expect here is that every single query will be decomposed into parallel queries on all the partitions with smaller data volume for each; the results will then of course have to be centrally combined, though.
Composite Partitioning -combo of any above
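Each method boils down to a row-to-partition mapping; hypothetical routing functions sketching the three basic ones:

```python
# Each partitioning method is essentially a row -> partition mapping.
def range_partition(month):
    """Range: JAN-FEB in partition 0, MAR-APR in partition 1."""
    return 0 if month in ("JAN", "FEB") else 1

def list_partition(region):
    """List: explicit value buckets, e.g. West sales vs East sales."""
    return {"West": 0, "East": 1}[region]

def hash_partition(customer_id, n_partitions=4):
    """Hash: spread rows uniformly (seemingly randomly) over partitions."""
    return hash(customer_id) % n_partitions
```

Composite partitioning just composes two of these, e.g. range first, then hash within each range.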
Nosql vs sql
https://www.thegeekstuff.com/2014/01/sql-vs-nosql-db/?utm_source=tuicool
-->
-->https://pdfs.semanticscholar.org/presentation/2fd4/2646536d516d4cb31cf2ade30f95cb3b6d5e.pdf
Oracle uses B+ trees.
In a B+ tree, the pointers to records are stored only at the leaf nodes, so internal nodes hold just keys. This leaves space to store more index entries per node.
-->
InnoDB and MYISAM, are storage engines for MySQL.
These two differ on their locking implementation: InnoDB locks the particular row in the table, and MyISAM locks the entire MySQL table.
You can specify the engine with ENGINE=MyISAM or ENGINE=InnoDB when creating a table.
ISAM - indexed sequential access method (developed by IBM for mainframes to index data)
Each MyISAM table is stored on disk in three files (if it is not partitioned).
.frm file to store the definition of the table
The data file has a .MYD (MYData) extension.
The index file has a .MYI (MYIndex) extension.
-->
MySql generally has three layers:
Top layer : connection handling, authentication, security
Middle layer: query parsing, analysis, optimization, caching and built-in functions
Bottom layer: storage engines
See comment below from Eric Vanier, “A storage engine is a software module that a database management system uses to create, read, update data from a database like for example MySQL Innodb and MyIsam.”
-->
MONGO storage engine-
The storage engine is the component of the database that is responsible for managing how data is stored, both in memory and on disk. MongoDB supports multiple storage engines, as different engines perform better for specific workloads. Choosing the appropriate storage engine for your use case can significantly impact the performance of your applications.
➤ WiredTiger Storage Engine (Default)
WiredTiger is the default storage engine starting in MongoDB 3.2. It is well-suited for most workloads and is recommended for new deployments. WiredTiger provides a document-level concurrency model, checkpointing, and compression, among other features.
In MongoDB Enterprise, WiredTiger also supports Encryption at Rest. See Encrypted Storage Engine.
➤ In-Memory Storage Engine
In-Memory Storage Engine is available in MongoDB Enterprise. Rather than storing documents on-disk, it retains them in-memory for more predictable data latencies.
➤ MMAPv1 Storage Engine (Deprecated as of MongoDB 4.0)
MMAPv1 is the default storage engine for MongoDB versions 3.0 and earlier.
-->>
B-tree vs binary tree, and B+ tree vs B-tree ->
https://techdifferences.com/difference-between-b-tree-and-binary-tree.html
Explain plan - like the planned route.
Execution plan - like the route actually taken; this can change.
When you ask Oracle for EXPLAIN PLAN, it shows you a predicted plan without running the statement. Use DBMS_XPLAN to display plans, including per-step statistics for a statement after execution.
The EXPLAIN PLAN estimate is made without running the query, whereas the actual execution plan can change depending on the schema or table sizes.
EXPLAIN PLAN command — This displays an execution plan for a SQL statement without actually executing the statement.
V$SQL_PLAN — A dynamic performance view introduced in Oracle 9i that shows the execution plan for a
SQL statement that has been compiled into a cursor and stored in the cursor cache.
---->
Ways to get explain plan-
https://blogs.oracle.com/sql/how-to-create-an-execution-plan#dbmsxplan
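Oracle's EXPLAIN PLAN / DBMS_XPLAN aren't available in a quick sketch, so here SQLite's EXPLAIN QUERY PLAN stands in to show the idea: the engine reports its intended access path without ever executing the query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT)")

# Ask for the plan only -- the SELECT itself is never run.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM emp WHERE id = 7"
).fetchall()
for step in plan:
    print(step[-1])   # a SEARCH step using the integer primary key
```

The reported step shows a keyed SEARCH rather than a full-table SCAN, because the predicate is on the primary key.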
Reasons the DB chooses a different execution plan:
1.) different schemas
2.) different data sizes for the same table
Logical order in which a statement is evaluated (but this can change based on the execution engine):
FROM [MyTable]
ON [MyCondition]
JOIN [MyJoinedTable]
WHERE [...]
GROUP BY [...]
HAVING [...]
SELECT [...]
ORDER BY [...]
Index types (have indexes on join columns):
B-tree index (most common). Note: if you query with LOWER(name) where name has a plain index, the index won't be used; a function-based index is needed instead.
Function-based index - define an expression and index it, e.g. index (a+b) and query WHERE (a+b) < 10.
Bitmap indexes - used for low-cardinality columns like boolean flags or gender.
Bitmap join indexes.
Compressed indexes - similar to B-tree but with fewer leaf blocks, so less I/O.
Domain indexes - for custom indexing, e.g. on video or geospatial data.
https://www.databasejournal.com/features/oracle/article.php/3739071/Oracle-Indexing---What-Where-When.htm
https://docs.oracle.com/cd/B14117_01/appdev.101/b10795/adfns_in.htm
https://www.drupal.org/project/drupal/issues/83738
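The LOWER(name) point above, demonstrated with SQLite expression indexes standing in for Oracle function-based indexes: a plain index on name can't serve a lower(name) predicate, but an index on the expression itself can:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT)")
conn.execute("CREATE INDEX emp_name ON emp(name)")

def plan_for(sql):
    """Last line of the query plan for a statement."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[-1][-1]

query = "SELECT * FROM emp WHERE lower(name) = 'bob'"
plan_before = plan_for(query)   # SCAN: the b-tree on raw name is unusable

# An index on the expression makes the predicate indexable.
conn.execute("CREATE INDEX emp_lower_name ON emp(lower(name))")
plan_after = plan_for(query)    # SEARCH using the expression index

print(plan_before)
print(plan_after)
```

The before/after plans flip from a scan to a keyed search once the expression index exists.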
Guidelines for creating an index -
Create an index if you frequently want to retrieve less than about 15% of the rows in a large table. This threshold percentage varies greatly, however, according to the relative speed of a table scan and how clustered the row data is about the index key. The faster the table scan, the lower the percentage; the more clustered the row data, the higher the percentage.
Index columns that are used for joins to improve join performance.
Primary and unique keys automatically have indexes, but you might want to create an index on a foreign key; see Chapter 3, "Maintaining Data Integrity Through Constraints" for more information.
Small tables do not require indexes; if a query is taking too long, then the table might have grown from small to large.
Some columns are strong candidates for indexing. Columns with one or more of the following characteristics are good candidates for indexing:
Values are unique in the column, or there are few duplicates.
There is a wide range of values (good for regular indexes).
There is a small range of values (good for bitmap indexes).
*When creating a composite index, column ORDER matters - a query can use the index only if it constrains the leading column(s).
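A sketch of why column order matters, again using SQLite's planner as a stand-in: an index on (dept, salary) serves a dept predicate, but a salary-only predicate can't use the leading column and falls back to a scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (dept TEXT, salary INT)")
conn.execute("CREATE INDEX emp_dept_sal ON emp(dept, salary)")

def plan_for(sql):
    """Last line of the query plan for a statement."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[-1][-1]

# Leading column constrained: the composite index is usable.
by_dept = plan_for("SELECT * FROM emp WHERE dept = 'eng'")
# Leading column unconstrained: the planner falls back to a scan.
by_salary = plan_for("SELECT * FROM emp WHERE salary > 100")

print(by_dept)     # SEARCH via emp_dept_sal
print(by_salary)   # SCAN
```

An index on (salary, dept) would reverse which query benefits, which is why you order the columns by how your queries filter.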
------->
Triggers – A trigger is a procedure that is run either before or after an action (such as a create, update, or delete) is performed on a row in a database table.
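A minimal, hypothetical AFTER INSERT trigger in SQLite syntax (Oracle trigger syntax differs in detail) that stamps an audit row whenever an order is created:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INT);
    CREATE TABLE audit  (order_id INT, note TEXT);
    -- Fires automatically after every INSERT into orders.
    CREATE TRIGGER orders_audit AFTER INSERT ON orders
    BEGIN
        INSERT INTO audit VALUES (NEW.id, 'order created');
    END;
""")
conn.execute("INSERT INTO orders (amount) VALUES (150)")
print(conn.execute("SELECT * FROM audit").fetchall())  # [(1, 'order created')]
```

The application never touches the audit table; the database fills it in as a side effect of the insert.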
->
-->
Database cursors – A database cursor is effectively a handle to the results of a SQL query, enabling you to move forward and backward through the result set one or more records at a time.
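Stepping through a result set with a cursor, sketched with Python's sqlite3 (note: DB-API cursors are forward-only; scrollable forward-and-backward cursors are a driver feature, e.g. in Oracle's interfaces):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INT)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

cur = conn.execute("SELECT n FROM t ORDER BY n")
print(cur.fetchone())    # (1,)  -- one record at a time
print(cur.fetchmany(2))  # [(2,), (3,)] -- a batch of records
```

The cursor is a handle onto the result set; rows are pulled on demand rather than materialized all at once.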
---->tYPES OF DB
Object/relational databases. Object/relational databases (ORDBs), or more properly object/relational database management systems (ORDBMSs), add new object storage capabilities to RDBMSs. ORDBs, add new facilities to integrate management of traditional fielded data, complex objects such as time-series and geo-spatial data and diverse binary media such as audio, video, images, and (sometimes) Java applets. ORDBMSs basically add to RDBMSs features such as defined data types, for example you could define a data type called SurfaceAddress that has all of the attributes and behaviors of an address, as well as the ability to navigate objects in addition to an RDBMS’s ability to join tables. By implementing objects within the database, an ORDBMS can execute complex analytical and data manipulation operations to search and transform multimedia and other complex objects. ORDBs support the robust transaction and data-management features of RDBMSs while at the same time offer a limited form of the flexibility of object-oriented databases. Because of ORDBMSs relational foundation, database administrators work with familiar tabular structures and data definition languages (DDLs) and programmers access them via familiar approaches such as SQL3, JDBC, and proprietary call interfaces.
Object databases. Object databases (ODBs), also known as object-oriented databases (OODBs) or object-oriented database management systems (OODBMSs), nearly seamlessly add database/persistence functionality to object programming languages. In other words, full-fledged objects are implemented in the database. They bring much more than persistent storage of programming language objects. ODBs extend the semantics of Java to provide full-featured database programming capability, via new class libraries specific to the ODB vendor, while retaining native language compatibility. A major benefit of this approach is the unification of the application and database development into a seamless model. As a result, applications require less code, use more natural persistence modeling, and code bases are easier to maintain. Object-oriented developers can write complete database applications with a modest amount of additional effort without the need to marshal their objects into flatten data structures for storage, as a result forgoing the marshalling overhead inherent with other persistence mechanism technologies (such as RDBs). This one-to-one mapping of application objects to database objects provides higher performance management of objects and enables better management of the complex interrelationships between objects.
Native XML databases. Native XML databases store information as XML documents following one of two approaches: First, a native XML database will either store a modified form of the entire XML document in the file system, perhaps in a compressed or pre-parsed binary form. Second, a native XML database may opt to map the structure of the document to the database, for example mapping the Document Object Model (DOM) to internal structures such as Elements, Attributes, and Text – exactly what is mapped depends on the database. The most important difference between these approaches, from the point of view of an application developer, is the way they are accessed: with the first approach the only interface to the data is XML and related technologies, such as XPath (a language designed specifically for addressing parts of an XML document, visit www.w3.org for details) or the DOM whereas with the second approach the database should be accessible via standard technologies such as JDBC. The important thing to understand about native XML databases is that they work with the internal structures of the XML documents – they don’t just store them as a binary large object (BLOB) in the database.
Flat files. Flat files, such as .TXT or .CSV (comma separated value) files, are commonly used to store data. A single file can be used to store one type of data structure, such as customer information or sales transaction information, or through a coding and formatting strategy the structures of several types of data structures. One approach to flat file organization is either to have data values separated by a pre-defined character, such as a comma, or tag, such as in an XML document. Another common approach is to delimit data values by size – the first 20 characters of the row represent the first name of the customer, the next 20 characters represent the surname, and so on.
Hierarchical databases. Hierarchical databases link data structures together like a family tree such that each record type has only one owner, for example an order is owned by only one customer. Hierarchical structures were widely used in the first mainframe database management systems, and are still a very common source of data in many large organizations. Hierarchical databases fell out of favor with the advent of relational databases due to their lack of flexibility because it wouldn’t easily support data access outside the original design of the data structure. For example, in the customer-order schema you could only access an order through a customer, you couldn’t easily find all the orders that included the sale of a widget because the schema isn’t designed to allow that.
Prevalence layer. Klaus Wuestefeld defines prevalence as “transparent persistence, fault-tolerance and load-balancing of the execution of the business logic of an information system through the use of state snapshots as well as command and query queuing or logging”. A prevalence layer is effectively a simple persistence framework that serializes objects.
->http://www.agiledata.org/essays/relationalDatabases.html
------->
VARCHAR is variable-length.
CHAR is fixed length.
-->
CHAR
Used to store character string value of fixed length.
The maximum no. of characters the data type can hold is 255 characters.
It can be faster than VARCHAR (a commonly quoted figure is 50%, but this depends on the workload).
Uses static memory allocation.
VARCHAR
Used to store variable length alphanumeric data.
The maximum this data type can hold:
Pre-MySQL 5.0.3: 255 characters.
MySQL 5.0.3+: up to 65,535 bytes, shared with the rest of the row (the row-size limit).
It's slower than CHAR.
Uses dynamic memory allocation*.
-->
Data Type Storage Required
TINYINT 1 byte
SMALLINT 2 bytes
MEDIUMINT 3 bytes
INT, INTEGER 4 bytes
BIGINT 8 bytes
FLOAT(p) 4 bytes if 0 <= p <= 24, 8 bytes if 25 <= p <= 53
FLOAT 4 bytes
DOUBLE [PRECISION], REAL 8 bytes
DECIMAL(M,D), NUMERIC(M,D) Varies; see following discussion
BIT(M) approximately (M+7)/8 bytes
-->
VARCHAR is reserved by Oracle to support distinction between NULL and empty string in future, as ANSI standard prescribes.
VARCHAR2 does not distinguish between a NULL and empty string, and never will.
---->
FOREIGN KEYs and JOINs don't do the same thing!
A FOREIGN KEY enforces data integrity, making sure the data conforms to some rules when it is added to the DB.
A JOIN is used when you extract/query data from the DB by giving rules how to select the data.
JOINs work whether or not there is an FK.
FKs work whether you extract data with or without JOINs.
CONCLUSION: FK and JOIN don't allow you to achieve the same goal!
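The FK-vs-JOIN distinction, demonstrated with SQLite: the FK blocks a bad insert at write time, while the JOIN just combines rows at query time and works independently of the constraint:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite FKs are off by default
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         cust_id INT REFERENCES customer(id));
""")
conn.execute("INSERT INTO customer VALUES (1, 'Ana')")
conn.execute("INSERT INTO orders VALUES (10, 1)")      # OK: customer 1 exists

try:
    conn.execute("INSERT INTO orders VALUES (11, 99)") # no customer 99
except sqlite3.IntegrityError as e:
    print("FK rejected it:", e)            # data integrity enforced on write

rows = conn.execute("""
    SELECT c.name, o.id FROM customer c
    JOIN orders o ON o.cust_id = c.id
""").fetchall()
print(rows)   # [('Ana', 10)] -- the JOIN works regardless of the FK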