Statistics – Simple Logging Design

Introduction

This article is the first in a series focussed on statistics, logging and graphing of web application events and information.

When creating a web application it is important to keep track of everything a user does. Some people may think that this is a little over the top but the more information we can gather within a web application the better. It also allows simple and easy moderation practices for those that are moderators within the application.

Pre-Planning

This is by far the most important stage in development. During this stage you are going to need to create proper data structures that in theory will not have to be changed (although in practice it is extremely rare for a structure to never change). This means that you are going to create the general database that will handle most of the current, up to date information and another section that will be used for logging.

Within the main database we are going to have simple things like a user table, a user settings table, a user profile table and a user information table. On tables that hold general information it is important to add a time stamp column that holds the value of last modified. The user table should also have a column that tells us when the user’s profile was created. These values aren’t going to really help us when gathering information for a single user, simply because the sample of data that we can compare them to is so small. Fortunately these simple values are excellent when we need to pull up values quickly in our system to display user information.

Single User Sample

The logging database is going to help us create a large sample of a single user. This is where we are going to constantly insert information and never delete or modify the records. In one of these tables we can store the values that tell us when a user has had one of their comments deleted by a moderator. Within this table we are going to log the moderator’s id, the user’s id, the time stamp that this action was performed on, and a numeric value that reflects the reason that the comment was removed. With this data we can count the comments that are still alive and well and count the comments that were deleted. Dividing the comments that are still alive and well by the comments that were deleted gives us a ratio that we can work with.

Note: It is a good idea to store these counts along with the ratio and the time at which the ratio was calculated. This will allow us to track user behaviour over time.
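
The ratio and the snapshot described in the note could be sketched roughly as follows. The function names and the array keys are illustrative assumptions, not part of the original design, and the sketch returns null when no comments have been deleted because the division is undefined in that case.

```php
<?php
// Hypothetical helper: live comments divided by deleted comments.
function commentRatio(int $liveComments, int $deletedComments): ?float
{
    // With no deleted comments the ratio is undefined, so return null
    // and let the caller decide how to rank such users.
    if ($deletedComments === 0) {
        return null;
    }
    return $liveComments / $deletedComments;
}

// Hypothetical snapshot row: the counts, the ratio and when it was calculated.
function ratioSnapshot(int $userId, int $live, int $deleted, int $now): array
{
    return [
        'user_id'       => $userId,
        'live'          => $live,
        'deleted'       => $deleted,
        'ratio'         => commentRatio($live, $deleted),
        'calculated_at' => $now,
    ];
}
```

Inserting one snapshot row per calculation, rather than updating a single value, keeps with the rule that the logging tables are only ever appended to.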

Multiple User Sample

Given that we have a ratio of how well each user follows the rules when commenting and taking part in discussion, we can filter out the users that are not contributing to the community/application. We can also sort users by this ratio and from there dig into their statistics, activities and logs.

Because we have logged all of the individual comments from this user we can closely examine the user interactions by graphing comments, removed comments, and comments removed for x.
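
As a small sketch of the sorting and filtering step, assume each user is an array with a precomputed 'ratio' value. The function names and the row shape are hypothetical.

```php
<?php
// Sort users from the lowest ratio to the highest.
function sortByRatio(array $users): array
{
    usort($users, fn(array $a, array $b): int => $a['ratio'] <=> $b['ratio']);
    return $users;
}

// Keep only the users whose ratio falls below a chosen minimum.
function belowThreshold(array $users, float $minRatio): array
{
    return array_values(
        array_filter($users, fn(array $u): bool => $u['ratio'] < $minRatio)
    );
}
```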

Password Recovery {Theory}

Password Recovery is a must have in any web application and you as a software engineer need to make sure that you handle this process properly. There are two methods that I like to use but in this article I will only be using one of them. Before you start programming it is a good idea to go through some of the larger web applications to see what they are doing. You may want to modify your process.

Before we begin I should probably make sure that you know just what you are in for before trying this. You are going to need to have knowledge of a server side language, the ability to send mail on your server, store Cookies and Sessions, along with having a running database with access to user emails as a unique field in the database.

Stage 1

The password recovery system is going to first require the user to input their email and submit it for checking. On this processing step we will do the following.
1. Check that the email follows proper formatting and there are no bad characters.
2. Check the email against the database to make sure that the user does exist and they have an active account.
3. Check the password recovery table to make sure that the user has not had their password reset in the last 15 minutes.
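
The three checks above can be sketched as follows. The database lookups are left out: the sketch assumes the caller has already fetched whether the user is active and the time of their last reset request, and all function and constant names are illustrative.

```php
<?php
// The 15 minute window from step 3, in seconds.
const RESET_COOLDOWN_SECONDS = 15 * 60;

// Step 1: format check using PHP's built-in email filter.
function hasValidEmailFormat(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

// Step 3: true while the last reset request is still inside the window.
function isInCooldown(?int $lastRequestAt, int $now): bool
{
    if ($lastRequestAt === null) {
        return false; // the user has never requested a reset
    }
    return ($now - $lastRequestAt) < RESET_COOLDOWN_SECONDS;
}

// All three steps together; $userIsActive stands in for the step 2 lookup.
function canStartRecovery(string $email, bool $userIsActive, ?int $lastRequestAt, int $now): bool
{
    return hasValidEmailFormat($email)
        && $userIsActive
        && !isInCooldown($lastRequestAt, $now);
}
```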

Now that we have checked and bypassed anything that will put a large hold on the password recovery process we can move on to actually storing the information and content that is required to reset the user’s password.

Because we are going to be resetting the user’s password we need to make sure that the user supplied us with a proper email. Even if they did not, we are going to show a success screen saying that an email has been sent. This way a bot that is entering random values to discover emails will not be able to find them, since every submission returns the same result.

Note: Brute force should be checked for on all form submissions and thus a lock out system should be added, but that is a whole other article.

We now need to generate the items that are going to be stored, because we will be storing information in a cookie and inside of a session. I know developers who store two keys, but I just store a time stamp in the database, break it up into segments, add alphanumeric characters and hash the result. This way I can keep a time stamp in the database and keep the information secretly stored in a session specific to the user.

Note: When storing time only use the generating function once and store the value into a variable so we have the same time when committing to the database.

Now that we have the information set into the appropriate variables we can store the information via insert to a table that holds the user id, key and time stamp. This way we can track our two keys and a time stamp of when they tried to reset their password. If we wish to lock the user out of the system for a certain amount of time we can simply change the key to null and check for this. If the (current time – date stored) < (15 * 60) and there is no key then we can just display an error message.
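
A rough sketch of the key generation and the lock out check follows. It assumes the alphanumeric characters come from a random salt kept in the session and that SHA-256 is the hash. Both are my choices for illustration, as are the function names and the example user id.

```php
<?php
// Hypothetical helper: 16 random bytes as 32 hexadecimal (alphanumeric) characters.
function newSalt(): string
{
    return bin2hex(random_bytes(16));
}

// Break the time stamp into segments, join them with the salt, then hash.
function buildRecoveryKey(int $timestamp, string $salt): string
{
    $segments = str_split((string) $timestamp, 2);
    return hash('sha256', implode($salt, $segments));
}

// Locked out: the key was set to null and the window has not passed yet.
function isLockedOut(?string $storedKey, int $storedAt, int $now): bool
{
    return $storedKey === null && ($now - $storedAt) < (15 * 60);
}

// Generate the time once and reuse it for the key and for the database row.
$now  = time();
$salt = newSalt();                        // would be kept in the session
$key  = buildRecoveryKey($now, $salt);
$row  = ['user_id' => 42, 'key' => $key, 'created_at' => $now]; // 42 is a placeholder id
```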

When everything is properly stored into the database we can send the user an email. This email will contain a link to the password recovery page along with a key which will be the one saved in the cookie. This way we can pass two keys back and know that the only way for this user to be the wrong user is if they have the email account as well.

Stage 2

This page is going to check for a key that is alphanumeric and of a certain length. With this key we are going to check to make sure it is the same as the cookie and that it is in the database. If the user is in the database we can pull the rest of the information including their email from the other table to make sure that we are able to regenerate one of their keys assuming that we used it.

Finally we need to make sure that the generated key matches the one that is stored inside of the session and the date is within the allotted time. From here we need to first delete the record in the database, generate a random password, hash it and store it into the system, and send them a copy of this new password via email.

During this last portion you need to handle the case that the page was accessed with improper values or missing values. eg: if the user does not have the cookie set or the time is out of range we need to handle the case and remove the key from the database so we know that that user won’t be able to try and reset their password for another 15 minutes. You can also bump up the time that is stored in the database depending on what was missing in case of a bad attempt which will allow you to easily lock a user out.
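
The Stage 2 checks might look something like this. The sketch assumes a 64 character key and a database row shaped as an array with 'key' and 'created_at' entries. It leaves out regenerating the key from the session, and the function names are hypothetical.

```php
<?php
// The key must be alphanumeric and of the expected length.
function isWellFormedKey(string $key, int $length = 64): bool
{
    return strlen($key) === $length && ctype_alnum($key);
}

// True only when the URL key matches the cookie and the database row,
// and the request is still inside the allotted time.
function verifyRecovery(string $urlKey, string $cookieKey, ?array $row, int $now, int $window = 900): bool
{
    if (!isWellFormedKey($urlKey) || $row === null) {
        return false; // improper or missing values
    }
    // hash_equals() compares strings in constant time.
    if (!hash_equals($cookieKey, $urlKey)) {
        return false;
    }
    if (!hash_equals((string) $row['key'], $urlKey)) {
        return false;
    }
    return ($now - $row['created_at']) < $window;
}
```

On a false result the caller would remove or null the key in the database, as described above, so that the user has to wait before trying again.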

URL Authentication – A New Approach

It is time for web developers and software engineers to take a new approach when checking the validity of URLs and emails that are provided by users. ICANN has decided that it is going to allow new suffixes and the ability to host a website on a domain that has no suffix, i.e. ‘http://codewithdesign/index.php’. This change means that regular expressions can no longer require a suffix, or that last dot, in a root URL.

How Will The New Domains Affect Me?

The new domain scheme is going to change how your website validates proper emails and URLs which can be a rather large change on websites that do not have formatting called from one place. Because of this change you will have to overhaul your website/blog/app to support new URLs and emails.

How Should I Go About Making The Change

If you haven’t done so already it is a good idea to make sure that your formatting and checking is coming from one place and will be able to handle the errors accordingly. The first thing that is required is either a functions file or a class file that will support multiple formats of input. This way when something like this changes you only need to update the code once.

Ways To Check URLs And Emails

When working with a URL you can change your regular expression to just check for proper characters and the presence of ‘http://’ or ‘https://’, but there is a much more fun way to check for the new format as well. You should still be using a filter or a regular expression but make sure that your version of PHP is high enough if you are going to use a filter.
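
One way to keep the checks in a single place is a small class like the sketch below. The class name and the exact patterns are my own assumptions. The point is only that the host pattern treats dots as optional, so a domain with no suffix passes.

```php
<?php
// Hypothetical class: the one place that all format checks are called from.
final class InputFormat
{
    // Host labels made of letters, digits and hyphens; dots are optional.
    private const HOST = '/^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(?:\.[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)*$/i';

    public static function isValidHost(string $host): bool
    {
        return preg_match(self::HOST, $host) === 1;
    }

    // Requires http:// or https:// and a well formed host, suffix or not.
    public static function isValidUrl(string $url): bool
    {
        $parts = parse_url($url);
        if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
            return false;
        }
        return in_array(strtolower($parts['scheme']), ['http', 'https'], true)
            && self::isValidHost($parts['host']);
    }

    // Splits on the last @ and checks each side; a simplified local part pattern.
    public static function isValidEmail(string $email): bool
    {
        $at = strrpos($email, '@');
        if ($at === false || $at === 0) {
            return false;
        }
        $local  = substr($email, 0, $at);
        $domain = (string) substr($email, $at + 1);
        return preg_match('/^[A-Za-z0-9._%+-]+$/', $local) === 1
            && self::isValidHost($domain);
    }
}
```

When the rules change again, only this class needs updating.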

Check For An Existing Email

Because of the new URL scheme we are going to need to handle the case that the user is from a website such as ‘http://bearattacksarenotgood/’. This will require us to check for an existing email without checking for a proper suffix, since a user can have the email ‘calebjonasson@bearattacksarenotgood’.

We can easily check for an existing email by using a function called ‘checkdnsrr’, but first we need to split the email name from the domain, which will give us ‘calebjonasson’ and ‘bearattacksarenotgood’. We can accomplish this with explode, which breaks a string apart on a character, together with list, which assigns the pieces to variables.

<?php
//load a variable with an email.
$emailAddress = 'calebjonasson@bearattacksarenotgood';
//split the email into the user name and domain name.
//explode() is used because split() was removed in PHP 7.
list($userName, $domainName) = explode('@', $emailAddress, 2);
//use checkdnsrr to check that the domain has a mail (MX) record.
if(checkdnsrr($domainName, 'MX'))
{
    //return value on success.
    return true;
}else{
    //return value on failure.
    return false;
}

?>

Pinging A Server

It is possible to ping a server through PHP and it is also possible to just use cURL to request the page information and check the response headers. The first thing I will show you is how to send a request which will allow you to check the ping of the server in question.

My preferred method of doing so is to install Net_Ping onto the server. This is the best solution that I have found on the web to this day and is very simple to use. Here is a tutorial from Code Diesel.
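
For the cURL route mentioned above, a minimal sketch might request only the headers and look at the status code. The function names, the five second timeout and the choice to treat 2xx and 3xx codes as reachable are all assumptions made for the example.

```php
<?php
// Hypothetical helper: returns the HTTP status code, or 0 when the request fails.
function fetchStatusCode(string $url, int $timeoutSeconds = 5): int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // ask for headers only
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not echo the response
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
    $ok = curl_exec($ch);
    return $ok === false ? 0 : (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

// Treat any 2xx or 3xx status as a server that is up and answering.
function isReachable(int $statusCode): bool
{
    return $statusCode >= 200 && $statusCode < 400;
}
```

A call such as isReachable(fetchStatusCode($url)) then gives a simple yes or no answer for the server in question.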

Redundant Data Class in PHP {Theory}

As a server side programmer you are going to be spending a lot of time working with data that is pulled from the database and in order to make sure that you do not have any loose ends it is a good idea to handle this situation with a class. This article is not going to cover how to code a data class nor is it going to help you create a specific data class. This article is here to show you how to make your data redundant and error proof.

Pulling all of your information

When pulling the information off the get go you are going to want to do one thing and one thing only: query all of the data that you can use in the class. The query is going to take place inside of a function and will most likely be used every time that you are in need of the data in the database. Once the query to obtain this information has taken place then you are going to either have results or no results and the easy way to check is simply by checking the number of rows returned.

Say we were trying to access user information for a user page. It would be a good idea to query and make sure that all of the information is available. In this query we are most likely going to be joining up to multiple tables because having all of the user information stored in one table would create a fair amount of overhead and would increase the time it took to go through the table. For this reason we break things apart.

An Example Of Database Tables

A well structured database table is keyed on an auto incrementing ID. Every table has one, but we cannot assume that inserting a record into tbl_user and tbl_user_profile at the same time will give us IDs that match. Because that assumption is flawed, we take the auto incremented ID from tbl_user and store it in a new column on the other user tables. This way we can simply join tbl_user.user_id to tbl_user_profile.user_nid. Now that we have a relationship in the database that works and is pretty redundant in theory, we still have to deal with the situation of a record not being created properly upon user registration, and thus we are led to…

Enforcing Existing Tables

Remember back up at the top when I was talking about the query all function either working or not working? Well, upon returning zero rows this function tells us one of two things (assuming the SQL was written properly). The first possibility is that this user does not exist, and the second is that the user is missing a record in one of the tables. This could mean that something was deleted, or maybe the table was recovered from a backup at a later date and there was a record missing. Either way, if checking the user table confirms the user’s existence then we know that we are dealing with missing data, and this is where the checking functions come into place.

In a well written data handling class there will be functions that exist to pull results from each of the individual tables. These functions are excellent for checking that a user does exist in the following table and are a great way to quickly and easily pull sections of content based on the user which are handy when loading content on an interval via AJAX. But now we are getting a little off topic. Creating a function that can pull information and return a number as a status and creating another function that will insert defaults into the table is an excellent way to make sure that you do not lose data and that all of the data is being pulled properly.

Recap

When using the class you are most likely just going to need to get the information which means that you will be using a function that behaves like the queryAll that I was talking about. If the user does not exist then you can simply return false.

The next step is to create functions that will check individual tables based on that original tbl_user.user_id (or whatever yours is) starting with the initial tbl_user which the rest of the tables are based on. This will tell you if it was created in the first place or not.

If we have gotten far enough to know that the user does exist and the user simply doesn’t have a record in one of the tables then we can query each of the tables and find out where the record is missing. Now that we know the table we can simply insert blank data into the table and maybe send a notification to the user that they may need to update a certain part of their profile.

Knowing when we have an error in table creation

Through this data class it is just a matter of adding in an error log message to let the administrators tell if the application has a bug in it but this shouldn’t be a problem if you are handling proper inserts and updates through SQL and checking for an affected row upon creation.
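
Pulling the recap together, a data class along these lines would cover the query all function, the per table checks, the default insert and the error log message. This is only a sketch under assumptions: it uses PDO, the email and bio columns are invented for the example, and only the tbl_user and tbl_user_profile tables from above are handled.

```php
<?php
// Hypothetical data class; table names follow the article, columns are examples.
final class UserData
{
    private PDO $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
    }

    // Pull everything in one joined query; null means zero rows came back.
    public function queryAll(int $userId): ?array
    {
        $stmt = $this->db->prepare(
            'SELECT u.user_id, u.email, p.bio
               FROM tbl_user u
               JOIN tbl_user_profile p ON p.user_nid = u.user_id
              WHERE u.user_id = ?'
        );
        $stmt->execute([$userId]);
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        return $row === false ? null : $row;
    }

    // Per table check: is there a record for this user in the given table?
    private function exists(string $table, string $column, int $userId): bool
    {
        $stmt = $this->db->prepare("SELECT COUNT(*) FROM $table WHERE $column = ?");
        $stmt->execute([$userId]);
        return (int) $stmt->fetchColumn() > 0;
    }

    // Returns the data, repairing a missing profile record; false if no such user.
    public function load(int $userId)
    {
        $row = $this->queryAll($userId);
        if ($row !== null) {
            return $row;
        }
        if (!$this->exists('tbl_user', 'user_id', $userId)) {
            return false; // the user was never created
        }
        if (!$this->exists('tbl_user_profile', 'user_nid', $userId)) {
            $ins = $this->db->prepare("INSERT INTO tbl_user_profile (user_nid, bio) VALUES (?, '')");
            $ins->execute([$userId]);
            // Check for an affected row and tell the administrators either way.
            if ($ins->rowCount() !== 1) {
                error_log("Could not create default profile for user $userId");
                return false;
            }
            error_log("Repaired missing tbl_user_profile record for user $userId");
        }
        return $this->queryAll($userId) ?? false;
    }
}
```

Usage would be a matter of passing in an existing connection, for example new UserData($pdo), and calling load() with the user id wherever the user page needs its data.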

cURL 403 Error Returning

The other day at work we ran into an issue where the server would return a 403 error page when retrieving page information from a cURL call. After searching around the web for a while thinking that we had a server permission issue on our hands it ended up just being a PHP problem.

In order to make a cURL request to your own server you must first make sure that the session has been closed prior to any cURL commands. This is because PHP locks the session while a page has it open, so the primary file that you are working from is going to lock out the secondary file that you are trying to bring in. Destroying the session releases the lock but also discards the session data; if you need to keep the data, session_write_close() releases the lock without discarding it.

<?php
session_start();

//authentication code.

//release the session lock first.
//session_destroy() also discards the session data;
//session_write_close() releases the lock and keeps it.
session_destroy();

//cURL code.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields_string);
$response = curl_exec($ch);
curl_close($ch);

//now we can start the session again
session_start();

//do some other stuff...

My only guess as to why PHP does this is to protect itself from breaking sessions when multiple pages access them at once, which is a pretty good safeguard to have in place. Better error reporting would have been nice though.

PHP: Array Of Bad Words

When creating applications that are going to be used by hundreds of thousands of people, it is important to make sure that you have the proper facilities in place to handle curse words that are entered by users. This can be done by checking against an array of bad words.

The code is simply…

<?php
//foul language array
$this->badWords = array('word1', 'word2','word3');
//Now you just need to go through your string and make comparisons.
?>

Rather than posting the array directly onto the blog, I would prefer to keep the site safe for all readers and not have it indexed with foul language and racial slurs, which is why I am offering the array via a text file within a compressed zip.
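
The comparison step itself could be sketched like this, using the placeholder words from above. The function names are hypothetical, and matching whole words only (so that a bad word inside a longer word is ignored) is an assumption about the desired behaviour.

```php
<?php
// Placeholder list; the real words are deliberately not reproduced here.
$badWords = ['word1', 'word2', 'word3'];

// True when the text contains any listed word as a whole word, ignoring case.
function containsBadWord(string $text, array $badWords): bool
{
    foreach ($badWords as $word) {
        // preg_quote() escapes any regex characters inside the word itself.
        if (preg_match('/\b' . preg_quote($word, '/') . '\b/i', $text) === 1) {
            return true;
        }
    }
    return false;
}

// Replace each listed word with asterisks of the same length.
function maskBadWords(string $text, array $badWords): string
{
    foreach ($badWords as $word) {
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
        $text = preg_replace($pattern, str_repeat('*', strlen($word)), $text);
    }
    return $text;
}
```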

PHP Automatically Include Classes

Pulling In Content

PHP has a nice small feature that allows us to automatically load classes that we use in our web application. This will check the designated location’s files and pull in any classes that we reference inside of the application.

//__autoload() was removed in PHP 8; spl_autoload_register() replaces it.
spl_autoload_register(function ($class) {
    include $class . '.php';
});
 
$car = new carObject();
$house = new houseObject();
$boat = new boatObject();

Pull Classes From A Path

It is nice to keep all application classes in the same directory. Here is a quick fix to the above code to import everything from the classes folder on the server.

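//__autoload() was removed in PHP 8; spl_autoload_register() replaces it.
spl_autoload_register(function ($class) {
    //assumes the class folder sits next to this file.
    include __DIR__ . '/class/' . $class . '.php';
});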
 
$car = new carObject();
$house = new houseObject();
$boat = new boatObject();

This is a small quick fix that works because include accepts any string as the path to load.