Database design optimization - MariaDB (MySQL)

Skillz

So I run a decent-sized database setup. It collects Distributed Computing "credits" and other metrics revolving around Distributed Computing projects. It's quick, but I think it can be faster, maybe. So here's what I know about it.


It's three databases that store all the information. One of them is a "static" database that keeps data that doesn't get updated all too frequently. The other two databases are where almost all the magic happens. These databases are near-identical clones of one another. While one of the databases is being updated with new stats, the other one is being read by the web server. When the database being updated is finished, they switch: the other database begins its update and the web server starts reading from the database that was just updated.

Now there is one particular query that takes ~20 minutes to complete. It's an "overall" ranking that combines everyone's scores from all projects and then ranks them.

I've tried taking this query, changing the database engine to InnoDB, and splitting the query up into 4 chunks. That's 2M+ users it's ranking, so each chunk works on roughly 500k users. Then it combines them all back into the table in the proper sort order. This took around 45 minutes or more.

So I switched it back to MyISAM and a single process to handle it.

I'm really not sure of how else I can make this so it works faster.

The hardware is Samsung 1.9TB PM9A3 U.2 NVMe drives. Each database is stored on its own drive, using symbolic links to achieve this. I've tried putting the entire database in RAM (over 750GB of RAM) to see if that would speed anything up, but no change.

The CPUs are a pair of Xeon Gold 6154.

I feel this is a CPU limitation and that going with a much faster CPU would see some speedup, but I don't know that it would be very drastic.
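
Conceptually, the overall ranking boils down to something like the sketch below (the table and column names are just placeholders; the real query from the script gets posted further down the thread):

sql
-- Placeholder names only: sum each CPID's credit across all projects, then
-- rank by the total. MariaDB 10.2+ can compute the rank with a window
-- function in the same pass.
SELECT cpid,
       SUM(credit) AS total_credit,
       RANK() OVER (ORDER BY SUM(credit) DESC) AS overall_rank
FROM per_project_credits
GROUP BY cpid;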
 
Do you have indexes created on the specific columns in question for the tables being queried?

Do you know the job that runs the report? Is it using good syntax that prevents things like full table scans where they aren't needed?

I'm thinking a good index and an optimized query will help a ton. This is where something like SQL Enterprise with its query analyzer would REALLY help to see what's eating time on your job.
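
If you want to see what indexes you already have, something like this in the MariaDB client will list them (swap in whatever your main stats table is actually called; boinc_user is just a guess here):

sql
-- Lists every index on the table along with cardinality estimates, plus the
-- full table definition. The table name is an assumption.
SHOW INDEX FROM boinc_user;
SHOW CREATE TABLE boinc_user;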
 
It does use indexing. I inherited this database/setup from someone who got tired of maintaining it. I'm also far from a "database guy" and have learned everything I know so far using AI.

I don't know if it's doing full table scans. How can I figure that out?
 
Honestly I would do a query analyzer to see what your query is doing. Search for a 3rd party tool for the DB software you're using?
 
I'm in the same boat with some reporting tools we have for work. One of our jobs has about 20 queries, and they take about 5 minutes each to run.

It only gets run once a month though, so I haven't put a lot of time into optimization. The table is very narrow, just 3 fields, but it's fairly large.
 
Honestly I would do a query analyzer to see what your query is doing. Search for a 3rd party tool for the DB software you're using?

I wouldn't even know where to look, what to look for, or how to understand what it tells me. Haha. Everything I've learned so far, or rather gotten accomplished so far, has been with AI.

I tried using an "optimizer" tool only for it to basically tell me I need to switch all the tables to InnoDB which is definitely the wrong move.
 
Most of that stuff is well beyond what I understand.

This is just a hobby/project web site, so spending a whole lot of time on it isn't really something I can do. If I was making money off this, sure, I'd take the time to learn everything I possibly can, even going as far as taking some courses that teach this stuff.

Another thing I noticed: when I built this new database server to replace the old one, I installed the latest version of MariaDB. Despite it being a much more powerful server, the query in question still took multiple minutes longer to complete, somewhere in the 25+ minute range, while the old server was doing it in between 15 and 20 minutes. I couldn't figure it out.

The new server had much faster CPUs and much more RAM, but it has enterprise U.2 NVMe drives.

Old server: dual Intel Xeon E5-2630 CPUs, 128GB RAM, consumer-grade M.2 NVMe drives (Samsung 970 and some other Samsungs)
New server: dual Intel Xeon Gold 6154 CPUs, 768GB RAM, enterprise-grade U.2 NVMe drives (Samsung PM9A3).

For the life of me, I couldn't figure out why this new server was running the same exact queries and taking longer.

At first I thought somehow the consumer drives were capable of doing this much faster. So that's when I put the whole database on a RAM drive on the new server (it's around 80GB in size) and ran the query. It still took just as long, so the drives aren't the bottleneck.

After a while, with AI help, we discovered the culprit was the newer version of MariaDB. The old server was running MariaDB 10.3 while the new server was running 10.11. Apparently somewhere around 10.4+ the MariaDB optimizer changed drastically, which caused the new server to take a good bit longer on the same query. So I had to tell the new server to use the old optimizer behavior, and it instantly started doing the same query a tad bit faster than the older server.
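
For reference, the optimizer knobs that usually get checked and rolled back after a 10.3 to 10.4+ upgrade look something like this (the exact setting that mattered in my case may differ, so treat this as a hedged starting point rather than the definitive fix):

sql
-- Check what the optimizer is currently doing:
SHOW VARIABLES LIKE 'optimizer_use_condition_selectivity';
SHOW VARIABLES LIKE 'optimizer_switch';

-- Roll back two of the 10.4-era changes for one session and re-test the query.
-- (optimizer_use_condition_selectivity defaulted to 1 in 10.3; rowid_filter is
-- an optimizer_switch flag added in 10.4.)
SET SESSION optimizer_use_condition_selectivity = 1;
SET SESSION optimizer_switch = 'rowid_filter=off';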
 
I haven't worked with this particular DB/tech.

In general:
  1. CPU cycles are taken up processing rows
  2. Memory is taken up processing columns/bytes
  3. Apply filters as early as possible on small tables
  4. Range-partition or cluster-index large tables by the dimension IDs used most for filtering
  5. Normalize and join by IDs (and index the join keys) as much as possible
  6. Avoid situations where you have to join two large tables directly. If necessary, create intermediate objects
  7. Execute the query in small chunks at a time to figure out which step is causing the slowdown, then optimize it
  8. Always do equijoins. Avoid less-than, greater-than, etc.
  9. Avoid NOT IN clauses. Try a LEFT JOIN with a WHERE clause to filter instead (see the sketch after this list)
  10. Avoid Cartesian joins. Understand the unique keys of the data you are joining
  11. Avoid unnecessary DISTINCT, ORDER BY, UNION, etc.
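
For point 9, a hedged example of the rewrite (hypothetical table names):

sql
-- Instead of:
--   SELECT ... FROM users u WHERE u.id NOT IN (SELECT user_id FROM banned);
-- an anti-join keeps only the rows with no match in the other table:
SELECT u.*
FROM users u
LEFT JOIN banned b ON b.user_id = u.id
WHERE b.user_id IS NULL;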
 
@Skillz These are all great points, but as far as point 7 goes, could you try to figure out which specific query or queries in the script is taking the longest? If you post that specific query or queries here, maybe we'll have some ideas on how to speed it up.
 
How is the ranking accomplished, a computed score? Is that calculated when the row is updated? Use EXPLAIN to figure out how the query is executed.

Running 3 DBs is not necessarily faster than running 1.
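
For example (guessing at the table and column names), just prefix the slow statement with EXPLAIN and look at the type and rows columns of the output:

sql
-- type = ALL means a full table scan; "rows" is the estimated number of rows
-- examined. The table/column names here are assumptions.
EXPLAIN
SELECT cpid, SUM(metric1) AS total
FROM boinc_user
GROUP BY cpid;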
 
New server: dual Intel Xeon Gold 6154 CPUs, 768GB RAM, enterprise-grade U.2 NVMe drives (Samsung PM9A3).
How much of that RAM gets consumed during the long running query? If not much, you could create a ramdrive using imdisk and copy everything needed for the query into a DB on the ramdrive. I tried this to benchmark SQL Server with HammerDB and got 10x more performance simply by virtue of the DB file being on the ramdrive.
 
The CPUs are a pair of Xeon Gold 6154.
For best performance, use Process Lasso to ensure that the DB engine does not spawn any threads on the second CPU. If it does and the threads on different CPUs need to talk to each other, they will get hit with the horrendously high socket to socket latency. That's the simplest way to avoid performance issues until you can figure out how to put the other CPU to good use while bypassing the socket latency hit.
 
How is the ranking accomplished, a computed score? Is that calculated when the row is updated? Use EXPLAIN to figure out how the query is executed.

Running 3 DBs is not necessarily faster than running 1.

It adds up all the BOINC credits for the user and then ranks them against everyone else's scores.

Running three databases is faster. While one is being updated, the other one is being read by the web server. Then it flips when the update process is completed. This ensures the database being updated isn't also being hit by the web server. Each database is stored on its own individual NVMe drive.


How much of that RAM gets consumed during the long running query? If not much, you could create a ramdrive using imdisk and copy everything needed for the query into a DB on the ramdrive. I tried this to benchmark SQL Server with HammerDB and got 10x more performance simply by virtue of the DB file being on the ramdrive.

I've put the whole database in a ramdisk before and it wasn't any faster.

For best performance, use Process Lasso to ensure that the DB engine does not spawn any threads on the second CPU. If it does and the threads on different CPUs need to talk to each other, they will get hit with the horrendously high socket to socket latency. That's the simplest way to avoid performance issues until you can figure out how to put the other CPU to good use while bypassing the socket latency hit.

Hmm, this is something I didn't think about. I'm not running Windows, but I could use taskset to pin it to the socket the NVMe drives are connected to directly. Not sure it will really speed things up, but it wouldn't hurt either.

I haven't had a chance to work on this as I've been busy with other things, mostly moving the whole site to a new web server, which is taking a bit more time than I anticipated.
 
It's not a hardware issue; it's the design of the DB/queries/stored procs. You can have a table that stores the data needed to feed the web server: a static table that you update after you compute the results.

Are you computing results as a scheduled task? Are you assuming everyone will need to be updated? The BOINC data, is it just all in one table? If you are just doing a sum by user and project, it should not take that long, especially if it is a stored procedure.

I never saw that kind of time with hardware from the late 1990s and multi-million-row RDBMSes. And that was on spinning platters.
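
A rough sketch of the "compute it, then swap in a static table" idea, with made-up table names (the web server only ever reads ranking_static; the rebuild happens off to the side and the swap is one atomic RENAME):

sql
-- Hypothetical names throughout; assumes ranking_static already exists and
-- boinc_user holds per-project rows keyed by cpid.
DELIMITER //
CREATE PROCEDURE rebuild_overall_ranking()
BEGIN
  DROP TABLE IF EXISTS ranking_new;
  CREATE TABLE ranking_new LIKE ranking_static;

  INSERT INTO ranking_new (cpid, total_credit)
  SELECT cpid, SUM(metric1)
  FROM boinc_user
  GROUP BY cpid;

  -- Atomic swap: readers never see a half-built table.
  RENAME TABLE ranking_static TO ranking_old,
               ranking_new    TO ranking_static;
  DROP TABLE ranking_old;
END //
DELIMITER ;

That's essentially the same flip-flop idea already being used at the database level, just done per table inside one database.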
 
I don't have the level of knowledge on this to answer your questions, but AI has helped me a lot with this setup, specifically with converting the PHP files to work with PHP 7+, as it was all coded for PHP 5 and wouldn't work in 7. It also tried to help me speed up the process by changing the table engine from MyISAM to InnoDB so we could do the batch processing in chunks that it designed for me, basically splitting the work up into 4 smaller chunks and running them concurrently. While this was faster for that specific step, overall it was much, much slower, because after it did this it then had to do some kind of combining of the data back into one, which took so long that the overall process was nearly 3x longer than before.
Based on our previous conversations, I can answer his questions:


1. "Are you computing results as a scheduled task?"


No, it's continuous processing:


  • Downloads run all day in a loop (checks all projects → 15 min sleep → repeat)
  • When a project's stats file timestamp changes, it downloads and processes that data immediately
  • The rollover happens once daily at 8:00 PM EDT (0000 UTC) - this is the scheduled part

2. "Are you assuming everyone will need to be updated?"


Yes, full table processing:


  • When a project's stats file updates, ALL users/teams/hosts for that project are re-parsed
  • Data flows: XML file → staging tables (project_userx) → main table (boinc_user)
  • The system doesn't do delta updates - it's a full reload per project per update

3. "The boinc data, is it just all in one table?"


Yes, single monolithic tables:


  • boinc_user - all users from all projects
  • boinc_team - all teams from all projects
  • boinc_host - all hosts from all projects
  • Each has a proj column (VARCHAR) to identify which project
  • Millions of total rows across all projects

4. "If you are just doing sum by user, project, it should not take that long, especially if it is a stored procedure"


No stored procedures - using Perl scripts with direct SQL:


  • UPDATE statements are constructed in Perl and executed directly
  • Not using stored procedures or bulk operations
  • From the code snippets in logs, appears to be row-by-row or small-batch operations
  • Takes ~1 hour per database to update (with 2 databases alternating)

Additional Context He Should Know:


  • MyISAM tables (not InnoDB) - table-level locking
  • Dual database flip-flop - while one updates, website reads from the other
  • You attempted chunking/splitting queries previously but it didn't help
  • The rollover process shifts historical columns: metric1 → days1 → days2 etc.

The way this works is kind of simple. Each one of the BOINC projects updates .xml files on their servers daily. Sometimes a couple of times per day, sometimes once per day. It's up to them to decide how often they update these files.

There are basically three key files: host.gz, user.gz and team.gz.

host.gz tracks all of the hosts attached to the project and updates the points each host has earned.
user.gz is similar to host.gz, but does this for each user that runs the project.
team.gz is similar to both of the above, but does the updates based on the points of the teams those users are part of.

My system takes these files, extracts them, then populates the data into the database. It then calculates, based on the new numbers, how many points the user, team, or host has earned in this current update and the current day, and stores the new total credit.

Since not all teams, users, or hosts get new points every day, or every update, there is no real way to determine this without comparing the data between them all each time. So that's why it just updates them all.
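
For what it's worth, that comparison can itself be done in SQL. Something like this (using the staging/main table names from the AI summary above, with metric1 as a stand-in for the credit column) would pull out just the new or changed rows for one project:

sql
-- 'someproject' and metric1 are stand-ins; new users show up with u.id IS NULL,
-- existing users only if their credit actually changed.
SELECT s.id
FROM project_userx AS s
LEFT JOIN boinc_user AS u
       ON u.proj = 'someproject' AND u.id = s.id
WHERE u.id IS NULL
   OR u.metric1 <> s.metric1;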
 
Good god, this is negating the purpose of an RDBMS lol. Is there a reason you are downloading every 15 minutes? Can you not just download one file at a fixed time, call it midnight, specify that the download is for the day, and then process it? I am assuming the download includes only whoever is active for that time interval?
You can determine which user/team has updates, no? You should only need to update the rows with changes.
Can you post the 3 files somewhere so I can take a look? Sanitise the names if you wish.

I like doing DB optimisation.
 
Good god, this is negating the purpose of an RDBMS lol. Is there a reason you are downloading every 15 minutes? Can you not just download one file at a fixed time, call it midnight, specify that the download is for the day, and then process it? I am assuming the download includes only whoever is active for that time interval?
You can determine which user/team has updates, no?

The download process takes longer than 15 minutes. I'm not sure of the actual time, but it checks all the projects. If a project's files have changed since the last check, it downloads them. If not, it moves on to the next one.

This way, users who, like me, check their stats multiple times a day to see how they're progressing can easily do so on a single page for all their projects. When a project updates, they'll know their new totals within about 1.5 hours.

I asked AI to specifically take a look at the one process that takes the longest (the one that takes ~20 minutes) and tell me about it. This is what it said:

Here's The Problem - I Found It!


Looking at the generate_data() subroutine (lines 214-254), this is what's killing performance:


The Bottleneck (Lines 224-244):




sql
INSERT INTO overall_boinc_userx
(select
cpid,
sum(metric1),
sum(metric2),
sum(metric4),
...
from boinc_user
group by cpid)

What This Does:


  1. Full table scan of boinc_user - reads EVERY user from EVERY project (millions of rows)
  2. GROUP BY cpid - groups all users with the same CPID across all projects
  3. Aggregates totals - sums credits/RAC/etc. for each unique CPID
  4. Inserts into temp table - writes aggregated results to overall_boinc_userx

Then at line 250-253, it updates team assignments with another correlated subquery:




sql
update overall_boinc_userx a
set a.t_team = (Select b.team from boinc_user_teams b where a.t_cpid = b.cpid)

Why It's Slow:


  1. No incremental updates - recalculates EVERYTHING from scratch every time
  2. Full table scan on millions of rows with MyISAM table-level locking
  3. GROUP BY on unindexed column (probably no index on cpid)
  4. Correlated subquery for team updates (runs once per user; a join-based rewrite is sketched right below)
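
A hedged rewrite of that team-assignment step as one multi-table UPDATE (same tables as in the script; note that a plain JOIN leaves users without a team row untouched, whereas the original subquery would set them to NULL):

sql
UPDATE overall_boinc_userx AS a
JOIN boinc_user_teams AS b ON b.cpid = a.t_cpid
SET a.t_team = b.team;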

Why Chunking Didn't Help:


You can't easily chunk a GROUP BY operation that needs to aggregate across the entire dataset. Even with InnoDB, the query structure is the problem, not the storage engine.




What Your Database Expert Is Getting At:


He's saying: "You shouldn't need to recalculate everything from scratch."


A better design would be:


  1. Static summary table - boinc_combined_user that's already computed
  2. Update only changed records - when a project updates, only recalculate affected CPIDs
  3. Proper indexes on (proj, cpid) or (cpid)
  4. Stored procedure doing set-based operations more efficiently

Does this match what you're experiencing? The script takes ~1 hour because it's doing this massive aggregation across millions of rows every single time it runs.
 
Perfect! Now I can give you a clear answer for your database expert.


Answer to His Question:


YES, there ARE indexes on the key fields:


Looking at the index output:


boinc_user table HAS indexes on:


  • PRIMARY KEY: (proj, id) - composite index
  • cpid index: (cpid, proj) - composite index (row 15-16 in the output)
    • Cardinality shows ~3,066,027 unique CPIDs out of ~6,132,055 total users
    • This means the index exists and is being used

boinc_combined_user table HAS:


  • PRIMARY KEY: (cpid)
  • Multiple other indexes for various ranking queries



BUT - Here's The Real Problem:


Even WITH proper indexes, the query design is still fundamentally slow because:


From the script (line 224-244):




sql
INSERT INTO overall_boinc_userx
(select cpid, sum(metric1), sum(metric2), sum(metric4), ...
from boinc_user
group by cpid)

What This Does:


  1. Reads ALL 6+ million rows from boinc_user (every user from every project)
  2. Groups by cpid (~3 million unique CPIDs)
  3. Sums all metrics for each CPID across all projects
  4. Inserts ~3 million aggregated rows into temp table
  5. Then does a correlated subquery update for teams (line 250)

Why It's Slow Even With Indexes:


  1. MyISAM table-level locking - entire boinc_user table locks during the read
  2. Full table aggregation - must process all 6M+ rows every single time
  3. Not incremental - recalculates EVERYTHING from scratch, even if only one project updated
  4. Temporary table operations - 3M+ rows written to temp table, then merged



What Your Expert Is Saying:


He's pointing out that you're doing a full recalculation when you should be doing incremental updates.


Better Design Would Be:


  • boinc_combined_user is already populated (static)
  • When PrimeGrid updates → only UPDATE the CPIDs that changed in PrimeGrid
  • Keep running totals instead of recalculating from scratch
  • Use something like:



sql
UPDATE boinc_combined_user
SET metric1 = metric1 - old_pgrid_value + new_pgrid_value
WHERE cpid IN (affected_cpids_from_pgrid)
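
To make that placeholder a bit more concrete (still a sketch, with a hypothetical pgrid_delta table holding the old and new per-project values for each affected CPID), the same idea as a join-based update:

sql
-- pgrid_delta is hypothetical: (cpid, old_value, new_value) for the CPIDs
-- whose PrimeGrid credit changed in this update.
UPDATE boinc_combined_user AS c
JOIN pgrid_delta AS d ON d.cpid = c.cpid
SET c.metric1 = c.metric1 - d.old_value + d.new_value;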

His "late 1990s hardware" comment means: "Even old hardware could SUM a few million rows quickly IF you were doing it right with stored procedures and incremental updates, not full recalculation."

Yes, it uses indexes. Another reply from AI to help me answer you properly. I do read what it says and try to make sense of it, to make sure it's actually telling me the right things, and from what I can tell, it is correct.
 