Database design optimization - MariaDB (MySQL)

Am I understanding this right? Boinc_User table has ProjectID and UserID as PK? Why not just UserID and add index on ProjectID? Typically you try to normalise the data so that you only update what you need to update. so User in one table, Project in another, the daily transactions you download is a separate one. You update User and Project if there are changes.

like I said, provide the db as an export through sqldump and I can take a look at it and see what can be done. That gives me the schema without the data, you just tell me approximatly how many rows there are in each table.

something like this I think, I googled.

mysqldump -u [username] -p [database_name] > [output_filename].sql
 
Not sure what you are asking here, so I had to ask AI again. Haha.

YES! Your database expert is 100% correct and has identified the fundamental design flaw.


Current Design Problem:


Right now, boinc_user has:


  • PRIMARY KEY: (proj, id) - composite key

This means the same user appears multiple times in the table:




+-------+--------+----------+--------+
| proj | id | nick | cpid |
+-------+--------+----------+--------+
| pgrid | 12345 | Skillz | abc123 |
| seti | 12345 | Skillz | abc123 | <- SAME person, different project
| mine | 67890 | Skillz | abc123 | <- SAME person, different ID on this project
+-------+--------+----------+--------+

The Problems:


  1. User data is duplicated across every project they participate in
  2. Same user can have different IDs on different projects (that's why CPID exists)
  3. Can't directly update "user Skillz" - have to update each (proj, id) combination separately
  4. To get cross-project totals, must GROUP BY cpid and scan millions of duplicate user records



What He's Recommending (Proper Normalization):


Design 1: Normalized Structure




sql
-- Table 1: Users (one row per unique user)
CREATE TABLE users (
cpid CHAR(32) PRIMARY KEY,
nick VARCHAR(250),
country VARCHAR(50),
create_date INT,
-- other user attributes
);

-- Table 2: Projects (one row per project)
CREATE TABLE projects (
proj_id CHAR(6) PRIMARY KEY,
name VARCHAR(100),
url VARCHAR(255),
-- other project attributes
);

-- Table 3: User-Project Stats (daily transaction data)
CREATE TABLE user_project_stats (
cpid CHAR(32),
proj_id CHAR(6),
stat_date DATE,
total_credit DOUBLE,
rac DOUBLE,
project_user_id INT, -- their ID on that specific project
-- other daily metrics
PRIMARY KEY (cpid, proj_id, stat_date),
INDEX (proj_id, stat_date),
INDEX (cpid)
);

Benefits:


  1. User data stored ONCE - update "Skillz" in one place
  2. Only stats are repeated (by date/project) - this is transactional data
  3. Incremental updates: When PrimeGrid updates:



sql
-- Only update the rows that changed
UPDATE user_project_stats
SET total_credit = NEW_VALUE, rac = NEW_VALUE
WHERE proj_id = 'pgrid' AND stat_date = '2025-10-24'
AND cpid IN (list_of_changed_users);

  1. Cross-project totals become easy:



sql
-- Fast aggregation (only sums latest stats, not all history)
SELECT cpid, SUM(total_credit)
FROM user_project_stats
WHERE stat_date = CURDATE()
GROUP BY cpid;



Why Current Design Is Slow:


Your current overallboinc.pl script does:




sql
SELECT cpid, sum(metric1), sum(metric2)...
FROM boinc_user
GROUP BY cpid

This:


  • ✗ Reads 6 million+ rows (every user×project combination)
  • ✗ Most of that data hasn't changed since last run
  • Recalculates everything from scratch
  • ✗ Takes ~1 hour because it's processing millions of unchanged rows

With normalized design:




sql
-- Get today's stats for users in changed projects
-- Only processes NEW data
SELECT cpid, SUM(total_credit)
FROM user_project_stats
WHERE stat_date = CURDATE()
AND proj_id IN ('pgrid', 'mine') -- only projects that updated today
GROUP BY cpid



His Key Point: "You only update what you need to update"


Current Design:


  • PrimeGrid updates with 50,000 users → must reprocess ALL 6M rows to recalculate totals

Normalized Design:


  • PrimeGrid updates with 50,000 users → only INSERT/UPDATE those 50,000 rows in user_project_stats
  • Overall totals can be maintained incrementally or recalculated from current day's stats only



Summary:


Your expert is saying: "The problem isn't the queries or indexes - it's that you've denormalized the data in a way that forces you to recalculate everything repeatedly instead of doing incremental updates."


He's absolutely right. This is a database design issue that no amount of query optimization or better hardware will fix.


The real fix: Redesign to separate:


  1. Static user data (stored once)
  2. Static project data (stored once)
  3. Transactional stats data (daily updates only)

This would make your updates orders of magnitude faster because you'd only process what actually changed.

And I'm still a little lost on what this means. Let me read it over a few more times, maybe it'll get it. Give me a few minutes to figure this out please.
 
You need a User table that has a say autonumber as the PK (LocalUserID for example), the the Boinc User table, then Project table. Add LocalUserID to the BoincUser table, then you need another table, let's call it LocalUserProject where you list the LocalUserID with the ProjectID. That way your references are taken care of. These are probably low change tables, so they form the basis of your sums.
 
Oh I need to wait on my reply. You keep editing your post when I reply. HAHA!

Doing a myslqdump of one of the databases will be over 65GB in size. That's not really feasible for me to upload that anywhere.

So what are you asking for specifically? The database structure without all the data? As this will be smaller, but that command you gave will be the whole database + data = huge database file.
 
No, just exports the db without the data. I don't know mariadb's particular db design export syntax

I'll take a look later when I have some time.
 
I sent you a PM with it.

Its the same as MySQL. You just use the --no-data flag with the mysqldump command. Which I gave you in a PM to download. Its tiny, less than 1MB lol

Much better than 65GB+.
 
that is a lot of tables. the scripts you generated didn't work as an import in mssql cus the syntax was not correct so I just installed mariadb and looked at some of the tables. In general, text make poor keys, so for project you should have a ProjectID that is integer and is the PK.
 
Become a Patron!
Back
Top