Planet MySQL

Planet MySQL - https://planet.mysql.com
  1. Recently one of our clients approached Mydbops with query slowness in a MySQL environment. They had deployed new code to generate large reports for their year-end analytics data. After the deployment the queries were extremely slow and they struggled a lot, so they approached us for a solution. On analysis, their OLAP database was, as expected, IO bound, with 100% of the disk IOPS utilised during report generation. The queries were starving for disk IO, which slowed the whole process.

Problem statement:

The reports focus mainly on two large log tables (emp_Report_model, emp_details). The report generator (a stored procedure) uses COUNT(*) statements to build the aggregated data on each call, which is required for their business purpose. COUNT(*) is terribly slow in MySQL 5.7 because it has to count all the rows in the table (MySQL 8.0 has InnoDB parallel read threads that can make COUNT(*) faster). A MySQL index cannot help either: since each call aggregates almost the complete data set (around 90% of the rows), the queries end up as full table scans (FTS). Further analysis showed that these tables receive an INSERT-only workload; there are no UPDATEs or DELETEs on them. We therefore proposed to overcome the problem by migrating the data to ClickHouse.

What is ClickHouse?

ClickHouse is an open source column-oriented database management system capable of real-time generation of analytical data reports using SQL queries (ClickHouse website). The major limitation of MySQL to ClickHouse replication is that we can only apply INSERT statements (appends) from MySQL. ClickHouse does not support replicating UPDATEs and DELETEs, which makes sense for a columnar database.

ClickHouse installation:

The installation is quite straightforward; the steps are available on the ClickHouse official web site.

yum install rpm-build epel-release
curl -s https://packagecloud.io/install/repositories/altinity/clickhouse/script.rpm.sh | sudo bash
yum install -y mysql-community-devel python34-devel python34-pip gcc python-devel libevent-devel gcc-c++ kernel-devel libxslt-devel libffi-devel openssl-devel python36 python36-devel python36-libs python36-tools

ClickHouse server:

yum install -y clickhouse-server clickhouse-client

ClickHouse MySQL replication library:

pip3 install clickhouse-mysql

ClickHouse startup:

[root@mydbopslabs192 ~]# /etc/init.d/clickhouse-server status
clickhouse-server service is stopped
[root@mydbopslabs192 ~]# /etc/init.d/clickhouse-server start
Start clickhouse-server service: /etc/init.d/clickhouse-server: line 166: ulimit: open files: cannot modify limit: Operation not permitted
Path to data directory in /etc/clickhouse-server/config.xml: /var/lib/clickhouse/
DONE
[root@mydbopslabs192 ~]# /etc/init.d/clickhouse-server status
clickhouse-server service is running

[root@mydbopslabs192 ~]# clickhouse-client
ClickHouse client version 19.17.4.11.
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 19.17.4 revision 54428.

mydbopslabs192 show databases;

SHOW DATABASES

┌─name────┐
│ default │
│ system  │
└─────────┘

2 rows in set. Elapsed: 0.003 sec.

The installation is all set. Next, I need to migrate the data from MySQL to ClickHouse and configure replication for those tables.

Data migration from MySQL to ClickHouse:

Step 1 (Dump the ClickHouse-compatible schema structure from MySQL):

First, I need to migrate the MySQL table structures to ClickHouse.
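Before running the schema-conversion and migration commands in the following steps, the clickhouse-mysql tool needs an account on the source MySQL server that can read the tables and the binary log. The exact privileges can vary with the tool version, so treat the statements below as a rough sketch; the account name and password simply mirror the --src-user and --src-password values used later in this post, and the host mask is only an example.

-- Run on the source MySQL server (192.168.168.191).
-- Assumed minimal privilege set for a binlog-reading migration tool:
-- SELECT for the initial data copy, REPLICATION SLAVE to read the binlog,
-- REPLICATION CLIENT to read the binlog position. Adjust the host mask as needed.
CREATE USER 'clickhouse'@'192.168.168.%' IDENTIFIED BY 'Click@321';
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'clickhouse'@'192.168.168.%';
GRANT SELECT ON data_Analytics.* TO 'clickhouse'@'192.168.168.%';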
MySQL and ClickHouse have different data types, so we cannot apply the MySQL structure to ClickHouse as-is. The document below provides a neat comparison between MySQL and ClickHouse data types.

https://shinguz.ch/blog/clickhouse-data-types-compared-to-mysql-data-types/

Let us convert the table structures from MySQL to ClickHouse using the clickhouse-mysql tool.

[root@mydbopslabs192 ~]# clickhouse-mysql --src-host=192.168.168.191 --src-user=clickhouse --src-password=Click@321 --create-table-sql-template --with-create-database --src-tables=data_Analytics.emp_Report_model,data_Analytics.emp_details > data_Reports_Jan21st.sql
2020-01-21 09:03:40,150/1579597420.150730:INFO:Starting
2020-01-21 09:03:40,150/1579597420.150977:DEBUG:{'app': {'binlog_position_file': None, 'config_file': '/etc/clickhouse-mysql/clickhouse-mysql.conf', 'create_table_json_template': False,
2020-01-21 09:03:40,223/1579597420.223511:DEBUG:Connect to the database host=192.168.168.191 port=3306 user=clickhouse password=Click@321 db=data_Analytics
2020-01-21 09:03:40,264/1579597420.264610:DEBUG:Connect to the database host=192.168.168.191 port=3306 user=clickhouse password=Click@321 db=data_Analytics

Dumping the table structure:

[root@mydbopslabs192 ~]# less data_Reports_Jan12th.sql | grep CREATE
CREATE DATABASE IF NOT EXISTS data_Analytics;
CREATE TABLE IF NOT EXISTS data_Analytics.emp_details (
CREATE DATABASE IF NOT EXISTS data_Analytics;
CREATE TABLE IF NOT EXISTS data_Analytics.emp_Report_model (

[root@mydbopslabs192 ~]# cat data_Reports_Jan12th.sql | head -n7
CREATE DATABASE IF NOT EXISTS data_Analytics;
CREATE TABLE IF NOT EXISTS data_Analytics.emp_details (
    WatchID Nullable(String),
    JavaEnable Nullable(Int32),
    Title Nullable(String),
    GoodEvent Nullable(Int32),
    EventTime Nullable(DateTime),

Step 2 (Import the schema structure into ClickHouse):

[root@mydbopslabs192 ~]# clickhouse-client -mn < data_Reports_Jan12th.sql
[root@mydbopslabs192 ~]#
[root@mydbopslabs192 ~]# fg
clickhouse-client

mydbopslabs192 use data_Analytics

USE data_Analytics

Ok.

0 rows in set. Elapsed: 0.001 sec.

mydbopslabs192 show tables;

SHOW TABLES

┌─name─────────────┐
│ emp_Report_model │
│ emp_details      │
└──────────────────┘

2 rows in set. Elapsed: 0.003 sec.

Step 3 (Migrate the data and keep replication in sync):

Before configuring replication, the source MySQL server must be configured with the following variables.

Mandatory MySQL settings:

server-id        = <your id>
binlog_format    = ROW
binlog_row_image = FULL

Now, we can configure the replication in two ways:

i)  Migrate the existing data, then configure the replication
ii) Migrate the existing data and configure the replication in one step

i) Migrate the existing data, then configure the replication:

Command to migrate the existing data:

[root@mydbopslabs192 ~]# clickhouse-mysql --src-host=192.168.168.191 --src-user=clickhouse --src-password=Click@321 --migrate-table --src-tables=data_Analytics.emp_Report_model --dst-host=127.0.0.1 --dst-schema data_Analytics --dst-table emp_Report_model --log-file=emp_Report_model.log

[root@mydbopslabs192 ~]# less emp_Report_model.log | grep -i migra
'migrate_table': True,
'table_migrator': {'clickhouse': {'connection_settings': {'host': '127.0.0.1',
2020-01-21 11:04:57,744/1579604697.744533:INFO:List for migration:
2020-01-21 11:04:57,744/1579604697.744947:INFO:Start migration data_Analytics.emp_Report_model
2020-01-21 11:04:57,891/1579604697.891935:INFO:migrate_table. sql=SELECT WatchID,JavaEnable,Title,GoodEvent,EventTime,Eventdate,CounterID,ClientIP,ClientIP6,RegionID,UserID,CounterClass,OS,UserAgent,URL,Referer,URLDomain,RefererDomain,Refresh,IsRobot,RefererCategories,URLCategories,URLRegions,RefererRegions,ResolutionWidth,ResolutionHeight,ResolutionDepth,FlashMajor,FlashMinor,FlashMinor2,NetMajor,NetMinor,UserAgentMajor,UserAgentMinor,CookieEnable,JavascriptEnable,IsMobile,MobilePhone,MobilePhoneModel,Params,IPNetworkID,TraficSourceID,SearchEngineID,SearchPhrase,AdvEngineID,IsArtifical,WindowClientWidth,WindowClientHeight,ClientTimeZone,ClientEventTime,SilverlightVersion1,SilverlightVersion2,SilverlightVersion3,SilverlightVersion4,PageCharset,CodeVersion,IsLink,IsDownload,IsNotBounce,FUniqID,HID,IsOldCounter,IsEvent,IsParameter,DontCountHits,WithHash,HitColor,UTCEventTime,Age,Sex,Income,Interests,Robotness,GeneralInterests,RemoteIP,RemoteIP6,WindowName,OpenerName,HistoryLength,BrowserLanguage,BrowserCountry,SocialNetwork,SocialAction,HTTPError,SendTiming,DNSTiming,ConnectTiming,ResponseStartTiming,ResponseEndTiming,FetchTiming,RedirectTiming,DOMInteractiveTiming,DOMContentLoadedTiming,DOMCompleteTiming,LoadEventStartTiming,LoadEventEndTiming,NSToDOMContentLoadedTiming,FirstPaintTiming,RedirectCount,SocialSourceNetworkID,SocialSourcePage,ParamPrice,ParamOrderID,ParamCurrency,ParamCurrencyID,GoalsReached,OpenstatServiceName,OpenstatCampaignID,OpenstatAdID,OpenstatSourceID,UTMSource,UTMMedium,UTMCampaign,UTMContent,UTMTerm,FromTag,HasGCLID,RefererHash,URLHash,CLID,YCLID,ShareService,ShareURL,ShareTitle,IslandID,RequestNum,RequestTry FROM data_Analytics.emp_Report_model

Validating the row count in ClickHouse after the initial migration:

mydbopslabs192 select count(*) from data_Analytics.emp_Report_model;

┌─count()─┐
│ 8873898 │
└─────────┘

1 rows in set. Elapsed: 0.005 sec.

Configuring the replication:

[root@mydbopslabs192 ~]# clickhouse-mysql --src-host=192.168.168.191 --src-user=clickhouse --src-password=Click@321 --src-tables=data_Analytics.emp_Report_model --dst-host=127.0.0.1 --dst-schema data_Analytics --dst-table emp_Report_model --src-resume --src-wait --nice-pause=1 --log-level=info --csvpool --mempool-max-flush-interval=60 --mempool-max-events-num=1000 --pump-data --src-server-id=100 --log-file=emp_Report_model_Replication.log

2020-01-21 11:22:18,974/1579605738.974186:INFO:CSVWriter() self.path=/tmp/csvpool_1579605738.9738157_d643efe5-5ae0-47df-8504-40f61f2c139f.csv
2020-01-21 11:22:18,976/1579605738.976613:INFO:CHCSWriter() connection_settings={'port': 9000, 'host': '127.0.0.1', 'password': '', 'user': 'default'} dst_schema=data_Analytics dst_table=emp_Report_model
2020-01-21 11:22:18,976/1579605738.976936:INFO:starting clickhouse-client process
2020-01-21 11:22:19,160/1579605739.160906:INFO:['data_Analytics.emp_Report_model']
2020-01-21 11:22:19,166/1579605739.166096:INFO:['data_Analytics.emp_Report_model']
2020-01-21 11:22:19,170/1579605739.170744:INFO:['data_Analytics.emp_Report_model']
(END)

ii) Migrate the existing data and configure the replication in one step:

Here we need to pass the --migrate-table flag together with the replication command.
[root@mydbopslabs192 ~]# clickhouse-mysql --src-host=192.168.168.191 --src-user=clickhouse --src-password=Click@321 --src-tables=data_Analytics.emp_Report_model --dst-host=127.0.0.1 --dst-schema data_Analytics --dst-table emp_Report_model --src-resume --src-wait --nice-pause=1 --log-level=info --csvpool --mempool-max-flush-interval=60 --mempool-max-events-num=1000 --pump-data --src-server-id=100 --migrate-table --log-file=emp_Report_model_replication_mig.log

[root@mydbopslabs192 ~]# less emp_Report_model_replication_mig.log | grep -i mig
2020-01-21 11:27:53,263/1579606073.263505:INFO:List for migration:
2020-01-21 11:27:53,263/1579606073.263786:INFO:Start migration data_Analytics.emp_Report_model
2020-01-21 11:27:53,316/1579606073.316788:INFO:migrate_table. sql=SELECT WatchID,JavaEnable,Title,GoodEvent,EventTime,Eventdate,CounterID,ClientIP,ClientIP6,RegionID,UserID,CounterClass,OS,UserAgent,URL,Referer,URLDomain,RefererDomain,Refresh,IsRobot,RefererCategories,URLCategories,URLRegions,RefererRegions,ResolutionWidth,ResolutionHeight,ResolutionDepth,FlashMajor,FlashMinor,FlashMinor2,NetMajor,NetMinor,UserAgentMajor,UserAgentMinor,CookieEnable,JavascriptEnable,IsMobile,MobilePhone,MobilePhoneModel,Params,IPNetworkID,TraficSourceID,SearchEngineID,SearchPhrase,AdvEngineID,IsArtifical,WindowClientWidth,WindowClientHeight,ClientTimeZone,ClientEventTime,SilverlightVersion1,SilverlightVersion2,SilverlightVersion3,SilverlightVersion4,PageCharset,CodeVersion,IsLink,IsDownload,IsNotBounce,FUniqID,HID,IsOldCounter,IsEvent,IsParameter,DontCountHits,WithHash,HitColor,UTCEventTime,Age,Sex,Income,Interests,Robotness,GeneralInterests,RemoteIP,RemoteIP6,WindowName,OpenerName,HistoryLength,BrowserLanguage,BrowserCountry,SocialNetwork,SocialAction,HTTPError,SendTiming,DNSTiming,ConnectTiming,ResponseStartTiming,ResponseEndTiming,FetchTiming,RedirectTiming,DOMInteractiveTiming,DOMContentLoadedTiming,DOMCompleteTiming,LoadEventStartTiming,LoadEventEndTiming,NSToDOMContentLoadedTiming,FirstPaintTiming,RedirectCount,SocialSourceNetworkID,SocialSourcePage,ParamPrice,ParamOrderID,ParamCurrency,ParamCurrencyID,GoalsReached,OpenstatServiceName,OpenstatCampaignID,OpenstatAdID,OpenstatSourceID,UTMSource,UTMMedium,UTMCampaign,UTMContent,UTMTerm,FromTag,HasGCLID,RefererHash,URLHash,CLID,YCLID,ShareService,ShareURL,ShareTitle,IslandID,RequestNum,RequestTry FROM data_Analytics.emp_Report_model

[root@mydbopslabs192 ~]# less emp_Report_model_replication_mig.log | grep -i process
2020-01-21 11:28:01,071/1579606081.071054:INFO:starting clickhouse-client process

Validating the count after inserting a few records into MySQL (the source):

mydbopslabs192 select count(*) from data_Analytics.emp_Report_model;

┌─count()─┐
│ 8873900 │
└─────────┘

MySQL to ClickHouse replication is working as expected.

Performance comparison for an OLAP workload (MySQL vs ClickHouse):

COUNT(*) in MySQL:

mysql> select count(*) from emp_Report_model;
1 row in set (32.68 sec)

COUNT(*) in ClickHouse:

mydbopslabs192 select count(*) from emp_Report_model;
1 rows in set. Elapsed: 0.007 sec.
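A short note on why the ClickHouse COUNT(*) comes back almost instantly: a MergeTree table keeps the row count of every data part in its metadata, so a plain COUNT(*) can be answered without scanning column data. Assuming the migrated tables ended up on a MergeTree-family engine, the same numbers can be cross-checked from the system.parts table:

-- Per-table row counts as tracked in part metadata; for a MergeTree table
-- the sum should match SELECT count(*) on the table itself.
SELECT
    table,
    sum(rows) AS total_rows,
    count()   AS active_parts
FROM system.parts
WHERE active
  AND database = 'data_Analytics'
GROUP BY table;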
Aggregated query in MySQL:

mysql> select emp_Report_model.WatchID,emp_Report_model.JavaEnable,emp_Report_model.Title,emp_Report_model.RegionID from emp_Report_model inner join emp_details on emp_Report_model.WatchID=emp_details.WatchID and emp_Report_model.RegionID=emp_details.RegionID and emp_Report_model.UserAgentMajor=emp_details.UserAgentMajor where emp_Report_model.SocialSourcePage is not null and emp_details.FetchTiming != 0 order by emp_Report_model.WatchID;
292893 rows in set (1 min 2.61 sec)

Aggregated query in ClickHouse:

mydbopslabs192 select emp_Report_model.WatchID,emp_Report_model.JavaEnable,emp_Report_model.Title,emp_Report_model.RegionID from emp_Report_model inner join emp_details on emp_Report_model.WatchID=emp_details.WatchID and emp_Report_model.RegionID=emp_details.RegionID and emp_Report_model.UserAgentMajor=emp_details.UserAgentMajor where emp_Report_model.SocialSourcePage is not null and emp_details.FetchTiming != 0 order by emp_Report_model.WatchID;
292893 rows in set. Elapsed: 1.710 sec. Processed 9.37 million rows, 906.15 MB (7.75 million rows/s., 749.15 MB/s.)

Yes, ClickHouse performs very well with COUNT(*) and analytical queries.

Query model    MySQL         ClickHouse
count(*)       33 seconds    0.1 seconds
OLAP query     63 seconds    1.7 seconds

(Graph: MySQL vs ClickHouse, a pictorial representation of the queries tested.)

Though ClickHouse excels at analytics workloads, it has its own limitations too. Now we have another happy customer at Mydbops whose analytics dashboards run much faster.

Featured image credit: Stephen Dawson on Unsplash
  2. Galera is the best solution for high availability and is used by many people worldwide. Galera performs synchronous replication (strictly speaking, certification-based replication) to keep the data up to date across the nodes of the group. In this blog I explain how Galera replication works. For better understanding, I have made an architecture diagram describing the replication flow, and I have also explained the key terms used in the diagram.

Architecture flow diagram:

What is a writeset?

A writeset contains all the changes made to the database by the transaction, together with the append_key entries of the changed rows.

What is an append_key?

The append_key registers the key of the data changed by the transaction. The key of a row is represented in three parts: DATABASE NAME, TABLE NAME and PRIMARY KEY. If the table does not have a PRIMARY KEY, a hash of the modified data becomes part of the writeset instead.

What is certification in Galera?

Certification in Galera is performed to detect conflicts and to maintain data consistency across the group. It is performed before the transaction commits.

What is the CVV (Central Certification Vector)?

The CVV is used to detect conflicts. The modified keys are added into the Central Certification Vector; if an added key is already part of the vector, conflict resolution checks are triggered (a short status-counter sketch for observing certification activity follows below).

I hope this blog helps someone working with Galera Cluster. I will come up with my next blog soon. Thanks!!!
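As a small practical addition to the certification discussion above, certification activity can be observed on any node through the standard Galera/wsrep status counters; the query below is only a quick observation sketch, not part of the replication flow itself.

-- Sketch: certification-related counters on a Galera node.
-- wsrep_local_cert_failures : local transactions that failed certification
-- wsrep_local_bf_aborts     : local transactions aborted by replicated ones
-- wsrep_cert_deps_distance  : average certification interval (potential parallelism)
SHOW GLOBAL STATUS WHERE Variable_name IN
    ('wsrep_local_cert_failures',
     'wsrep_local_bf_aborts',
     'wsrep_cert_deps_distance');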
  3. In this blog I am going to go through the most important requirements that NDB Cluster 8.0 is based on. I am also going to list a number of consequences these requirements have on the product and on what it supports.

On slideshare.net I uploaded a presentation of the NDB Cluster 8.0 requirements. In this blog and several accompanying ones I am going to present the reasoning these requirements led to in terms of software architecture, data structures and so forth.

The requirements on NDB Cluster 8.0 are the following:

1) Unavailability of less than 30 seconds per year (Class 6 availability)
2) Predictable latency
3) Transparent distribution and replication
4) Write and read scalability
5) Highest availability even with 2 replicas
6) Support for SQL, LDAP, file system interfaces, ...
7) Mixed OLTP and OLAP for real-time data analysis
8) Follow HW development for CPUs, networks, disks and memories
9) Follow HW development in cloud setups

The original requirement for NDB Cluster was to support only Class 5 availability. Telecom providers have kept increasing the number of subscribers per telecom database, driving the requirement up to Class 6 availability. NDB Cluster has a proven track record of more than 15 years of handling Class 6 availability.

The requirement on predictable latency means that we need to be able to handle transactions involving around twenty operations within 10 milliseconds, even when the cluster is working at high load.

To make sure that application development is easy, we opted for a model where distribution and replication are transparent to the application code. This means that NDB Cluster is one of the very few DBMSs that support auto-sharding requirements.

High write scalability has been a major requirement in NDB from day one. NDB Cluster can handle tens of millions of transactions per second, whereas most competing DBMS products based on replication protocols can only handle tens of thousands of transactions per second.

We used an arbitration model to avoid the requirement of 3 replicas. With NDB Cluster 8.0 we fully support 3 and 4 replicas as well, but even with 2 replicas we get the same availability that competing products based on replication protocols require 3 replicas for.

The original requirements for NDB did not include a SQL interface; an efficient API was much more important for telecom applications. However, when meeting customers of a DBMS it was obvious that a SQL interface was needed, so this requirement was added in the early 2000s. Most early users of NDB Cluster still opted for a more direct API, which is why NDB Cluster today has an LDAP interface through OpenLDAP, a file system interface through HopsFS, and a lot of products that use the NDB API (C++), ClusterJ (Java) and the NDB NodeJS API.

The development model of NDB also makes it possible to handle complex queries in an efficient manner. Thus, in the development of NDB Cluster 8.0 we added the requirement to better support OLAP use cases on the OLTP data stored in NDB Cluster. We have already made very significant improvements in this area by supporting parallelised filters and, to a great extent, parallelisation of join processing in the NDB data nodes. This is an active development area for the coming generations of NDB Cluster.

NDB Cluster started its development in the 1990s. Already then we could foresee some of the HW development that was going to happen, and the product has been able to scale as HW has become more and more scalable.
Today this means that each node in NDB Cluster can scale to 64 cores, and data nodes can scale to 16 TB of memory and at least 100 TB of disk data, while benefiting greatly from ever higher network bandwidth.

Finally, modern deployments often happen in cloud environments. Clouds are based on an availability model with regions, availability domains and failure domains. Thus the NDB Cluster software needs to make it possible to use the locality in these HW configurations efficiently.
  4. NDB Cluster was originally developed as a network database for the telecom network. I worked in an EU project between 1991 and 1995 that focused on a pre-standardisation effort on UMTS, which later became standardised under the term 3G. I worked in a part of the project where we simulated the network traffic in such a 3G network, and I focused my attention especially on the requirements this created on a network database in the telecom network. In the same period I also dived deeply into the research literature on DBMS implementation.

The following requirements from the 3G studies emerged as the most important:

1) Class 5 availability (less than 5 minutes of unavailability per year)
2) High write scalability as well as high read scalability
3) Predictable latency down to milliseconds
4) Efficient API
5) Failover in crash scenarios within seconds, or even subseconds with a real-time OS

In another blog, on the influences leading to the use of an asynchronous programming model in NDB Cluster, we derive the following requirements on the software architecture:

1) Fail-fast architecture (implemented through the ndbrequire macro in NDB)
2) Asynchronous programming (provides much tracing information in crashes)
3) Highly modular SW architecture
4) JAM macros to track the SW in crash events

In another blog I present the influences that led to NDB Cluster using a shared-nothing model.

One important requirement that NDB Cluster is fairly unique in addressing is high write scalability. Most DBMSs solve this by grouping together large numbers of small transactions to make commits more efficient, which means that most DBMSs have a very high cost for committing a transaction. Modern replicated protocols have actually made this worse: in most of them, all transactions have to commit in a serial fashion. Commit handling is therefore a major bottleneck in many modern DBMSs, often limiting their transaction rates to tens of thousands of commits per second.

NDB Cluster went another path and essentially commits every single row change separately from any other row change. Thus the cost of executing 1000 transactions with 1000 operations per transaction is exactly the same as the cost of executing 1 million single-row transactions.

To achieve the grouping we used the fact that we are working in an asynchronous environment, with several levels of piggybacking of messages. One of the most important points here is that one socket is used to transport many thousands of simultaneous database transactions. With NDB Cluster 8.0.20 we use multiple sockets between data nodes, and this scales another 10-20x to ensure that HW limitations, not the NDB software, are the bottleneck.

The asynchronous programming model ensures that we can handle thousands of operations each millisecond, and that switching from working on one transaction to another is a matter of tens to hundreds of nanoseconds. In addition, we can handle these transactions independently in a number of different data nodes, and even within different threads within the same data node. Thus we can handle tens of millions of transactions per second even within a single data node.

The protocol we used for this is a variant of the two-phase commit protocol, with some optimisations based on the linear two-phase commit protocol. However, the requirement of Class 5 availability meant that we had to solve the blocking problem of the two-phase commit protocol.
We solved this by recreating the state of the failed transaction coordinators in a surviving node as part of failover handling. This means that we are never blocked by a failure as long as a sufficient number of nodes remains to keep the cluster operational.
  5. The requirements of Class 5 availability and immediate failover had two important consequences for NDB Cluster. The first is that we wanted a fail-fast architecture: as soon as we detect any kind of inconsistency in our internal data structures, we immediately fail and rely on the failover and recovery mechanisms to make the failure almost unnoticeable. The second is that we opted for a shared-nothing model in which all replicas are able to take over immediately.

The shared-disk model requires replay of the REDO log before failover is completed, and while this can be made fast, it is not immediate. In addition, as one quickly realises, the shared-disk model relies on an underlying shared-nothing storage service, and a shared-disk implementation can never be more available than the shared-nothing storage service it builds on. Thus it is actually possible to build a shared-disk DBMS on top of NDB Cluster.

The most important research paper influencing the shared-nothing model used in NDB is the paper presented at VLDB 1992 called "Dynamic Data Distribution in a Shared-Nothing Multiprocessor Data Store". Obviously it was also required to fully understand the ARIES model, presented in 1992 as well by a team at IBM. However, NDB Cluster actually chose a very different model, since we wanted a logical REDO log coupled with a checkpoint model, a model that in fact changed a few times over the history of NDB Cluster.