Database design optimization - MariaDB (MySQL)

Skillz

So I run a decent-sized database setup. It collects Distributed Computing "credits" and other metrics revolving around Distributed Computing projects. It's quick, but I think it can be faster, maybe. So here's what I know about it.


It's 3 databases that store all the information. One of them is a "static" database that keeps data that doesn't get updated all that frequently. The other two databases are where most of the magic happens. They are near-identical clones of one another. While one of the databases is being updated with new stats, the other one is being read by the web server. When the update is completed, they switch: the other database begins its update and the web server starts reading from the database that was just updated.
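
I don't actually know the exact mechanics the original builder used for the switch, but one simple way to coordinate it is a tiny pointer row in the static database that records which copy is live; the names below are made up purely to illustrate the idea:

  -- Purely illustrative sketch: a one-row table in the static database
  -- that says which of the two stats databases is currently live.
  -- (Database/table/column names are placeholders, not the real setup.)
  CREATE TABLE IF NOT EXISTS static_db.active_copy (
      id      TINYINT PRIMARY KEY,
      live_db VARCHAR(64) NOT NULL
  );

  -- The update job flips the pointer once its refresh finishes,
  -- and the web server reads this row to decide which database to query.
  UPDATE static_db.active_copy SET live_db = 'stats_b' WHERE id = 1;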

Now there is one particular query that takes ~20 minutes to complete. It's an "overall" ranking that combines everyone's scores from all projects and then ranks them.
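
I don't have the exact SQL handy, but the shape of that overall ranking is roughly the following; table and column names here are placeholders, and MariaDB has supported window functions like RANK() since 10.2:

  -- Simplified sketch of the overall ranking: sum each user's credit
  -- across all projects, then rank users by the total.
  -- (overall_ranks / project_credits / credit are placeholder names.)
  INSERT INTO overall_ranks (user_id, total_credit, overall_rank)
  SELECT user_id,
         SUM(credit)                             AS total_credit,
         RANK() OVER (ORDER BY SUM(credit) DESC) AS overall_rank
  FROM   project_credits
  GROUP BY user_id;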

I've tried taking this query, changing the storage engine to InnoDB, and splitting the query up into 4 chunks. It's ranking 2M+ users, so each chunk works on roughly 500k users, and then it combines them all back into the table in the proper sort order. This took around 45 minutes or more.

So I switched it back to MyISAM with one single process doing the work.

I'm really not sure how else I can make this work faster.

The hardware is Samsung U.2 NVMe 1.9TB PM9A3 drives. Each database is stored on its own drive, using symbolic links to achieve this. I've tried putting the entire database in RAM (the server has over 750GB of RAM) to see if that would speed anything up, but no change.

The CPUs are a pair of Xeon Gold 6154.

I feel this is a CPU limitation and going with a much faster CPU would see some speed up, but I don't know that it would be very drastic.
 
Do you have indexes created on the specific data in question for the tables being queried?

Do you know the job that runs for the report? Is it using good syntax to prevent things like full table scans where they aren't needed?

I'm thinking a good index and an optimized query will help a ton. This is where something like SQL Server Enterprise with its query analyzer would REALLY help to see what's eating time on your job.
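
If it helps, MariaDB can at least show you what indexes already exist without any extra tooling; a small sketch with placeholder table/column names:

  -- List the indexes on the table the ranking job reads from
  SHOW INDEX FROM project_credits;

  -- A composite index covering the grouping column and the summed column
  -- can let the engine satisfy the GROUP BY from the index alone
  -- (table/column names are placeholders)
  CREATE INDEX idx_user_credit ON project_credits (user_id, credit);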
 
It does use indexing. I inherited this database/setup from someone who got tired of maintaining it. I'm also far from a "database guy" and have learned everything I know so far using AI.

I don't know if it's doing full table scans. How can I figure that out?
 
Honestly I would run a query analyzer to see what your query is doing. Maybe search for a 3rd party tool for the DB software you're using?
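
MariaDB also has a built-in EXPLAIN (and ANALYZE, in 10.1+) that will at least show full table scans; the query below is only a placeholder standing in for your ranking query:

  -- Prefix the slow query with EXPLAIN; "ALL" in the type column
  -- means the table is being read with a full table scan.
  EXPLAIN
  SELECT user_id, SUM(credit)
  FROM   project_credits
  GROUP BY user_id;

  -- MariaDB 10.1+ can also execute the query and report actual row counts:
  ANALYZE FORMAT=JSON
  SELECT user_id, SUM(credit)
  FROM   project_credits
  GROUP BY user_id;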
 
I'm in the same boat with some reporting tools we have for work. One of our jobs has about 20 queries, and they take about 5 minutes each to run.

It only gets run once a month though, so I haven't put a lot of time into optimization. The table is very narrow, just 3 fields, but it's fairly large.
 
Honestly I would do a query analyzer to see what your query is doing. Search for a 3rd party tool for the DB software you're using?

I wouldn't even know where to look, what to look for, or how to understand what it tells me. Haha. Everything I've learned so far, or rather gotten accomplished so far, has been with AI.

I tried using an "optimizer" tool, only for it to basically tell me I need to switch all the tables to InnoDB, which is definitely the wrong move.
 
Most of that stuff is well beyond what I understand.

This is just a hobby/project web site, so spending a whole lot of time on it isn't really something I can do. If I was making money off this, sure, I'd take the time to learn everything I possibly can, even going as far as taking some courses.

Another thing I noticed: when I built this new database server to replace the old one, I installed the latest version of MariaDB. Despite it being a much more powerful server, the query in question took multiple minutes longer to complete, somewhere in the 25+ minute range, while the old server was doing it in 15 to 20 minutes. I couldn't figure it out.

The new server had much faster CPUs and much more RAM, but it has enterprise U.2 NVMe drives instead of consumer ones.

Old server: dual Intel Xeon E5-2630 CPUs, 128GB RAM, consumer-grade M.2 NVMe drives (a Samsung 970 and some other Samsungs)
New server: dual Intel Xeon Gold 6154 CPUs, 768GB RAM, enterprise-grade U.2 NVMe drives (Samsung PM9A3)

For the life of me I couldn't figure out why this new server was doing the same exact queries and taking longer.

At first I thought somehow the consumer drives were capable of doing this much faster. So that's when I put the whole database in a RAM drive on the new server (it's around 80GB in size) and ran the query. It still took just as long. So the drives aren't the bottleneck.

After a while, with AI help, we discovered the culprit was the newer version of MariaDB. The old server was running MariaDB 10.3 while the new server was running 10.11. Apparently somewhere around 10.4+ the MariaDB optimizer was changed drastically, causing the new server to take a good bit longer for the same query. So I had to tell the new server to use the old optimizer behavior, and it instantly started doing the same query a tad faster than the older server.
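
I won't claim these are the exact settings that mattered in my case, but for anyone else who runs into this, the knobs live in MariaDB's optimizer system variables; a rough example of inspecting them and reverting one default that changed in 10.4:

  -- Show the current optimizer flags and the selectivity setting
  SELECT @@optimizer_switch;
  SELECT @@optimizer_use_condition_selectivity;

  -- The default for this variable changed from 1 to 4 in MariaDB 10.4;
  -- setting it back to 1 restores the older 10.3-style behavior.
  -- (Which variable actually matters for a given query will differ.)
  SET GLOBAL optimizer_use_condition_selectivity = 1;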
 
I haven't worked with this particular DB/tech.

In general:
  1. CPU cycles are taken up processing rows
  2. Memory is taken up processing columns/bytes
  3. Apply filters as early as possible on small tables
  4. Range-partition or cluster-index large tables by the dimension IDs used most often for filtering
  5. Normalize and join by IDs (and index the join keys) as much as possible
  6. Avoid situations where you have to join 2 large tables directly. If necessary, create intermediate objects
  7. Execute the query in small chunks at a time to figure out which step is causing the slowdown, then optimize it
  8. Always do equijoins. Avoid less-than, greater-than, etc.
  9. Avoid NOT IN clauses. Try a LEFT JOIN plus a WHERE clause as an alternative to NOT IN (see the sketch after this list)
  10. Avoid Cartesian joins. Understand the unique keys of the data you are joining
  11. Avoid unnecessary DISTINCT, ORDER BY, UNION, etc.
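
To illustrate point 9, the usual rewrite of a NOT IN into a LEFT JOIN anti-join looks like this (placeholder tables, and it assumes the subquery column has no NULLs):

  -- Instead of filtering with NOT IN:
  SELECT u.user_id
  FROM   users u
  WHERE  u.user_id NOT IN (SELECT b.user_id FROM banned_users b);

  -- the same result expressed as a LEFT JOIN anti-join:
  SELECT u.user_id
  FROM   users u
  LEFT JOIN banned_users b ON b.user_id = u.user_id
  WHERE  b.user_id IS NULL;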
 
@Skillz These are all great points, but as far as point 7 goes, could you try to figure out which specific query or queries in the script are taking the longest? If you post that specific query or queries here, maybe we'll have some ideas on how to speed them up.
 