Database design optimization - MariaDB (MySQL)

Correct. That is how I get the totals for the current update, the days total, yesterdays total, week total, etc...
This is done for each user, host and team.
 
Splitting them up even more, in VMs even, wouldnt really solve anything. Each VM instance will still take just as long to update the database as the current setup.
May help if the DB engine can't take maximum advantage of available threads. There are two databases in your case but still only one DB engine (the brains) that may or may not be maxing out the available hardware resources. Splitting them into four VMs would ensure that four DB engines are able to work in parallel.
 
Correct. That is how I get the totals for the current update, the days total, yesterdays total, week total, etc...
This is done for each user, host and team.
Why are you updating yesterday's total with today's number?
 
May help if the DB engine can't take maximum advantage of available threads. There are two databases in your case but still only one DB engine (the brains) that may or may not be maxing out the available hardware resources. Splitting them into four VMs would ensure that four DB engines are able to work in parallel.
Err, fixing the workflow is the right solution.
 
May help if the DB engine can't take maximum advantage of available threads. There are two databases in your case but still only one DB engine (the brains) that may or may not be maxing out the available hardware resources. Splitting them into four VMs would ensure that four DB engines are able to work in parallel.

MariaDB (and most other database engines) can already run multiple parallel jobs concurrently without a problem. My dual database design already solves the biggest problem with tables being locked during updates. While that database has its tables locked for updates, the other database is free for the web server to utilize to fetch the data for users visiting the site.

Splitting the database up into separate instances is the fundamental issue I am having here. If I can split the database up into multiple VMs, and it speeds things up, then I can do that with a single instance as well. Which I've already tried splitting the data up into chunks to process concurrently. The problem lies with merging all that data back into a single data set which makes the whole process take 3x longer. If I did this with separate VMs, each with its own database and set of data, all that data will then need to be combined at some point for the overall credits stuff which is the process that takes by far the longest on the setup.

Which means this solution would probably take even longer as it'll have the overhead of multiple VMs, more network traffic for them to communicate with one another, etc...

During normal processing the CPU maxes out at one thread being fully utilized. This is because the actual processing of the data is single threaded. Read what happened when I split this up into multiple chunks. The processor maxed out a little over 4 fully utilized threads during this processing, but the end dropped back down to 1 fully utilized thread to merge all the data.
 
Why are you updating yesterday's total with today's number?

I'm not, todays total gets moved to the "yesterdays" column at the end of the day. The only thing that gets updated is the current update. That gets added to the todays total. The historical data doesn't get updated, it just gets moved to the appropriate location. IE: Todays total gets moved to yesterdays, yesterdays gets moved to 2 days ago, 2 days ago gets moved to 3 days ago, etc... for 28 days.

This allows the site to display:
Recent Update total
Todays total
Yesterdays total
2 days ago total
Last 7 days total
Last 28 days total


(By total, i mean the amount of points earned in that time period)
 
I'm not, todays total gets moved to the "yesterdays" column at the end of the day. The only thing that gets updated is the current update. That gets added to the todays total. The historical data doesn't get updated, it just gets moved to the appropriate location. IE: Todays total gets moved to yesterdays, yesterdays gets moved to 2 days ago, 2 days ago gets moved to 3 days ago, etc... for 28 days.

This allows the site to display:
Recent Update total
Todays total
Yesterdays total
2 days ago total
Last 7 days total
Last 28 days total


(By total, i mean the amount of points earned in that time period)
Err, maybe you should just have date and score table and generate the view based on today's date?

Let me think about it a bit and come up with a workflow that is a lot lighter than what you are doing.

Is there a global boinc team id?
 
Last edited:
Splitting the database up into separate instances is the fundamental issue I am having here. If I can split the database up into multiple VMs, and it speeds things up, then I can do that with a single instance as well. Which I've already tried splitting the data up into chunks to process concurrently. The problem lies with merging all that data back into a single data set which makes the whole process take 3x longer.
How about change the application queries so it knows from user ID which DB instance to send the request to? This way you never have to merge the data.
 
How about change the application queries so it knows from user ID which DB instance to send the request to? This way you never have to merge the data.

The merged data is the total BOINC credits across all the projects. Having the application do this, on the fly every time totaling all these will just move the computation to the web server which would then definitely get overwhelmed with the number of bots that have grown exponentially over the last couple years. Even if I block all those bots (which I try to do, they're tricky) the regular user base checking the site on a daily basis will most like result in the same problem.

The overall boinc credits is stored in a table that gets calculated every time a project has an update. This means this calculation only happens once per database update and the web server just queries that total.
 
what does it mean when a host doesn't have userid field in the xml?
 
so, are we still doing this or not?

Sorry, I recently moved the web server to a new server and I've been tracking down errors/bugs in it for over a week now. Its been my main priority to get the site running on this web server that's hosted in a datacenter, instead of the one that's sitting in my living room.

Its just not a huge priority, mostly because I think it will require a lot of work to implement if if can be improved and realistically the time it takes a database to fully update during an update process doesn't really break anything. So I am just letting it ride for now.

Another main issue is the fact that I only have one database server for now. Which would mean I'd have to do the testing/updates/fixes on the live database and/or create a new database setup on the server so it doesn't disrupt the production database. Not something I really want to do. A new database server is in the works specifically for testing stuff like this.
 
you can have more than 1 db in a db server, you are already running 3. same with webserver. key is to just do delta update as opposed to full.
 
key is to just do delta update as opposed to full.
I asked our "professional" team this exact question and they came back with "no, it can't be done with SQL Server".

God forbid, if they launched a cloud service or something.

"1 month downtime while we move around a few terabytes of data".

"hoooge hoooge data", one of them is fond of saying.
 
I asked our "professional" team this exact question and they came back with "no, it can't be done with SQL Server".

God forbid, if they launched a cloud service or something.

"1 month downtime while we move around a few terabytes of data".

"hoooge hoooge data", one of them is fond of saying.
what? what do you mean it cannot be done with sql server?
 
what? what do you mean it cannot be done with sql server?
How would I know? That's what they told me. I didn't look into it. I'm certainly not going to look into it for them and do their job. I already found them the article showing how to use SQL Server for IIS session persistence. Before I did that, they were like, it's not possible to do round robin load balancing between two IIS web servers due to session persistence issues. Now that it's working fine, not even a thank you or even an acknowledgement that I showed them the light.
 
Become a Patron!
Back
Top