UPDATE FROM CASE


Posted by dragonlady87_711 (Ranked on Backgammon (Funcom) Ladder) on April 14, 2002 at 08:15:26:

Case's Ladder
Infrastructure Overhaul 2002
Prepared by Case

Introduction

As many of you are aware, we have been having problems with the site being slow
during peak periods over the last several months. We have been working on a
variety of solutions to help speed up the site. To some degree, we have been
successful. To a large degree, we have not. I have spent hundreds of hours over
the past sixty days figuring out the best way to transition from what we have
to where we need to go.

This short article will attempt to fill you in on those details. I hope that
this will satisfy the curiosity that many of our users have, and show many of
you that we have been working hard on these issues and are taking the steps
needed to correct them.

The Problem

In one word: Growth. Case's Ladder has experienced tremendous growth in the
last six months, faster than at any other time in our history. Our user base,
matches played, and tournaments hosted have all been skyrocketing! While this
can be considered a good thing, it is hard for a small company to budget
resources to plan for growth that 95% of the time isn't going to happen. We
simply were not ready to handle the surge.

Just for the record, we are talking about a great deal of traffic. We receive
over two million page views a day. Not hits, page views. A page view is a
complete page, including the images and banners. If you're talking about hits,
we do over five million per day! I point this out because some people think
we're talking about 10,000 hits per day or so.
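To put those daily totals in perspective, here is a rough back-of-the-envelope conversion to per-second rates. It assumes traffic is spread evenly over 24 hours, which it certainly is not (peak periods will be well above these averages), and it uses the "over two million" and "over five million" figures as round lower bounds.

```python
# Back-of-the-envelope traffic rates from the daily totals quoted above.
# Assumption: load is spread evenly across the day; real peaks are higher.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

page_views_per_day = 2_000_000  # "over two million page views a day"
hits_per_day = 5_000_000        # "over five million" hits per day


def per_second(daily_total: int) -> float:
    """Average events per second for a given daily total."""
    return daily_total / SECONDS_PER_DAY


page_views_per_second = per_second(page_views_per_day)
hits_per_second = per_second(hits_per_day)
hits_per_page_view = hits_per_day / page_views_per_day

print(f"{page_views_per_second:.1f} page views per second on average")
print(f"{hits_per_second:.1f} hits per second on average")
print(f"{hits_per_page_view:.1f} hits per page view")
```

Even as a flat average, that is roughly 23 page views and 58 hits every second, around the clock.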

When the ladder first started over six long years ago, we had one server.
Everything ran from this. Then we grew and the site started to slow down. So we
added a second server (named CGI). When players posted a match result, it was
done on the CGI server instead of the main site. This kept the pages loading
fast and matches posting smoothly.

Then we grew some more. The main site was slow again. We took our most heavily
used pages from the web site, and put them onto their own server (WWW2). The
main site was fast again!

After this we launched tournaments. Since they were based on a completely
different set of programs and were being maintained by a new programmer, we put
them on WWW2. That way our programmer could experiment with tournaments without
slowing down match reporting for regular ladder games. We grew some more.

Not much changed from the above layout for a long time. We added some extra
servers to handle additional functions (such as moving Find Player to WWW3).
The general approach was to install faster servers for CGI and WWW2 when things
started to slow down. There is a limit to how often you can do this, and we
recently ran into it.

In general, all of our servers need to talk with the CGI database. It contains
all the player records, match results, Gold histories, staff management tools,
ladder leaderboard information, Hall of Fame, etc. You can't get information on
players without the CGI server being involved in some manner.

Tournaments are in a similar boat. Even though we have three servers helping to
serve up tournament pages, they all rely on the information stored on the
database on WWW2. When WWW2 slows down too much, the other two machines slow
down also. In addition, each tournament machine sometimes needs to get
information from the player databases on CGI.

This all worked fine until we started hitting very high numbers of users.
Throwing more and more powerful servers into these spots is not very cost
effective. I'm sure most of you are familiar with this - to get the latest
greatest computer you pay double the cost of a pretty fast computer that's a
few months old. It makes no sense for us to spend $10,000 for extra powerful
machines for WWW2 and CGI and then have to buy $15,000 machines in six months.
We needed a better solution.

The Solution

So, here's what we have been working on (again, just a fraction of the servers
so you get the idea):

We are shifting to spreading our ladders and tournaments over many machines
instead of only relying on one larger box. We will have a master machine (CGI)
that holds certain key data that is common to all the machines (for example,
you can find out what server the Spades ladder might be stored on by asking the
master).
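For the technically curious, the master's "which server holds this ladder?" job can be sketched as a simple lookup table. This is only an illustration of the idea, not our actual software: the class name, the server names (cgi1, cgi3) and the example.com URL format are all made up for the example.

```python
# A minimal sketch of the master lookup described above.
# MasterDirectory, the server names, and the URL layout are hypothetical.


class MasterDirectory:
    """Stand-in for the master machine's league-to-server table."""

    def __init__(self):
        self._server_for_league = {}

    def assign(self, league: str, server: str) -> None:
        """Record (or change) which server hosts a league."""
        self._server_for_league[league] = server

    def lookup(self, league: str) -> str:
        """Return the server that hosts the league, or raise LookupError."""
        try:
            return self._server_for_league[league]
        except KeyError:
            raise LookupError(f"no server assigned for league {league!r}") from None


master = MasterDirectory()
master.assign("Spades", "cgi3")
master.assign("Backgammon", "cgi1")


def route_request(league: str) -> str:
    """Ask the master where a league lives, then build the address to use."""
    server = master.lookup(league)
    return f"http://{server}.example.com/ladder/{league.lower()}"


print(route_request("Spades"))
```

The point of the design is that the master only answers the small "where is it?" question; the heavy work of serving the ladder happens on whichever machine it names.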

Each CGI machine will be configured to handle a certain number of leagues. Both
ladders and tournaments will run on these servers. This will allow us to make
sure leagues will run faster and more reliably. For example, companies that we
have contracts with to provide ladders might run on their own machine, as might
someone who wants to buy a server just for their own league (people have
asked!), while the rest of the ladders run on other servers.

We'll be able to monitor the usage on each machine and move leagues around. If,
for example, we see that one server is starting to slow down and that one
league on that machine has 10,000 players and is running 500 tournaments a day,
we can simply transfer that league to a different server that has more capacity.
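Here is a rough sketch of what that kind of rebalancing could look like. Everything in it is an assumption made for illustration: the load score (players plus ten times the daily tournaments), the capacity numbers, and the league and server names are invented, and real decisions would be based on measured usage.

```python
# Sketch: move the busiest league off an overloaded server.
# The load formula, capacities, and names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class League:
    name: str
    players: int
    tournaments_per_day: int

    @property
    def load(self) -> int:
        # Made-up score: each daily tournament counts as ten players.
        return self.players + 10 * self.tournaments_per_day


@dataclass
class Server:
    name: str
    capacity: int
    leagues: list = field(default_factory=list)

    @property
    def load(self) -> int:
        return sum(league.load for league in self.leagues)

    @property
    def spare(self) -> int:
        return self.capacity - self.load


def rebalance(servers):
    """Move busy leagues from overloaded servers to the one with most room.

    Returns a list of (league, from_server, to_server) moves.
    """
    moves = []
    for src in servers:
        while src.load > src.capacity and len(src.leagues) > 1:
            league = max(src.leagues, key=lambda lg: lg.load)
            others = [s for s in servers if s is not src]
            dest = max(others, key=lambda s: s.spare, default=None)
            if dest is None or dest.spare < league.load:
                break  # nowhere to put it; time to add another machine
            src.leagues.remove(league)
            dest.leagues.append(league)
            moves.append((league.name, src.name, dest.name))
    return moves


busy = League("BigLeague", players=10_000, tournaments_per_day=500)
medium = League("MediumLeague", players=6_000, tournaments_per_day=100)
small = League("SmallLeague", players=2_000, tournaments_per_day=20)

cgi1 = Server("cgi1", capacity=20_000, leagues=[busy, medium])
cgi2 = Server("cgi2", capacity=20_000, leagues=[small])

moves = rebalance([cgi1, cgi2])
print(moves)
```

When no existing server has room for the busy league, the loop simply stops, which is the signal to bring in another machine.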

Another advantage of this setup concerns maintenance and upkeep. With the
exception of critical failures, we should have the ability to move ladders
from one machine to another when doing upgrades. Instead of shutting
down the entire site we can just move those ladders to a backup machine. In
extreme cases (such as a hard drive failure) only part of the site would be
down. We want to avoid downtime, but I think you'll all agree it's better to
have 75% of our ladders working than to have them all unavailable.
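The two cases above (a planned move to a backup machine, and an unplanned failure that takes down only part of the site) can be sketched like this. The directory is just a plain dictionary here, and the server names and the even one-ladder-per-server layout are assumptions chosen so the numbers are easy to follow.

```python
# Sketch: planned drain to a backup machine vs. an unplanned failure.
# Server names and the ladder layout are hypothetical.


def drain(directory: dict, server: str, backup: str) -> list:
    """Reassign every ladder on `server` to `backup`; return what moved."""
    moved = sorted(name for name, host in directory.items() if host == server)
    for name in moved:
        directory[name] = backup
    return moved


def available_fraction(directory: dict, down: set) -> float:
    """Fraction of ladders whose server is not in the `down` set."""
    up = sum(1 for host in directory.values() if host not in down)
    return up / len(directory)


directory = {
    "Spades": "cgi1",
    "Hearts": "cgi2",
    "Chess": "cgi3",
    "Backgammon": "cgi4",
}

# Unplanned failure: cgi3 loses a hard drive, the other ladders stay up.
after_failure = available_fraction(directory, down={"cgi3"})

# Planned upgrade: move cgi2's ladders to a backup first, then take it down.
moved = drain(directory, "cgi2", "backup1")
during_upgrade = available_fraction(directory, down={"cgi2"})

print(after_failure, moved, during_upgrade)
```

With four equally loaded machines, losing one leaves three quarters of the ladders running, and a planned upgrade need not take anything offline at all.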

The best thing about this design is that it scales very well. If we start
seeing lots of traffic, we can simply add another "cheap" machine and move some
leagues. It's much easier for us to go out and buy a $2,000 machine on short
notice than it is to have to custom order a $10k super machine, then shut down
the site and transfer all the data to the new machine.

Another great advantage to this new design is that we avoid having tournament
servers talking back and forth. Since the tournaments will be hosted on the
same machine as the player database, they will not need to open an external
connection to get player information. This should speed things up greatly.

The Timeline

We've been working on a variety of ways to address our load problems. We
have finally settled on this design. It's the most work, but it has the best
long-term payoff. It's going to take a while to get things working right -- we
have already started converting software on our development servers. What
we're talking about requires changing literally hundreds of programs that run
the site.

Thanks to the support of our premium members over the last few months, we have
the resources to tackle a project of this size. This wasn't something we could
even consider doing three months ago - we only had two programmers! Now we have
four and have one more starting next week.

We're hoping to get the guts of this new software working in a test environment
within a week. We're going to be moving all of our hardware to a new hosting
facility in the very near future as well. We plan on rolling out the new design
when we move to the new hosting company (AT&T).

At the new hosting facility, we will have ten times the bandwidth that we
currently have available. This will definitely help to speed up the site. But
long term, this new design is going to be the key towards growing Case's Ladder
in the future.

I thank all of you for your patience and understanding as we have worked on
these issues. It's probably going to take a couple of weeks to get all the kinks
out once we have moved, but I am sure you will see that it is well worth it!
I'm very excited about this new design (in fact I'm writing this at 2AM because
I wanted to share our progress with you).

Thanks again for your support, and I hope you feel that your membership money
is being spent wisely. If you have any feedback, please post in the forums!

Thanks,
Case





Copyright Policy

Copyright 1996 - 2024 Case's Ladder / Thulium Software, LLC. All Rights Reserved.