Regular Expressions – Data Validation In PHP

Hi! I am back from a serious headache, and today I am going to talk about regular expressions. The idea is to validate data submitted by the users of our online store, which will help us stop accepting invalid emails and the like. Also, using a new page created for feedback submission, we would like to route each message to the right department depending on which words our customers mention in it. Let us get started.

Here is my feedback page:


Basically, we don’t want crazy people submitting fake data just to annoy us, and to avoid that we use regular expressions. This is a huge subject and I will not exhaust it; instead, I will touch on the basics so that you can go out there for more details.

postfeedback.php file:

 #Here I define some variables and then check to make sure
 #that they are set in the _POST array.
 $name = '';
 $email = '';
 $feedback = '';

 if(isset($_POST['name'], $_POST['email'], $_POST['feedback'])){
   $name = trim($_POST['name']); //use addslashes
   $email = trim($_POST['email']); //use addslashes
   $feedback = trim($_POST['feedback']); //use addslashes
 }

 #This will return an array of two items (an email is separated
 #by an '@' symbol and so the two sides are stored in our array)
 $email_array = explode('@', $email);

From the above code, you might have noticed certain new things: trim(), addslashes() and explode(). When storing data in a database, you want to clean it up first so that no characters get interpreted by the database as commands. trim() strips whitespace from both ends of a string, and addslashes() puts a backslash in front of quotes and backslashes. For real database work, prepared statements are the safer choice. explode() breaks the given string at the delimiter ('@') and stores the pieces in an array.
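
Here is a small sketch of what these three functions do to a string. The sample values are made up for illustration and stand in for whatever arrives in the _POST array.

```php
<?php
// Hypothetical raw input, standing in for $_POST['email'].
$raw = "  jane@example.com \n";

// trim() only strips whitespace (spaces, tabs, newlines) from both ends.
$email = trim($raw);

// addslashes() puts a backslash before quotes, backslashes and NUL bytes.
$escaped = addslashes("O'Reilly");

// explode() splits on the delimiter; a well-formed address gives two parts.
$email_array = explode('@', $email);

echo $email, "\n";                          // jane@example.com
echo $escaped, "\n";                        // O\'Reilly
echo implode(' | ', $email_array), "\n";    // jane | example.com
```

If count($email_array) is not exactly 2, the address has no '@' or more than one, which is already a cheap first check.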

  #my intention here was to put together some feedback and then email
  #it to the respective department. So, let us get some variables
  #ready here:
  $toaddress = ''; //default
  if(strstr($feedback, 'shop')){    //if 'shop' is in $feedback
      $toaddress = '';
  }else if(strstr($feedback, 'delivery')){ //if 'delivery' is in..
      $toaddress = '';
  }else if(strstr($feedback, 'bill')){
      $toaddress = '';
  }
  #Even more variables here.
  if(strlen($email) < 6){    //strlen() returns length of string
     echo 'That email address is invalid';
     exit; //stop here so that no email is sent
  }

  $subject = 'feedback from website';

  #"\n" must be in double quotes; in single quotes it stays a literal \n
  $mailcontent = 'Customer name: '.$name."\n".
                 'Customer email: '.$email."\n".
                 "Customer comments:\n ".$feedback."\n";

  $fromaddress = 'From:';

  //invoke mail() function to send the email
   mail($toaddress, $subject, $mailcontent, $fromaddress); //use mail()


Looking at the code above, you might have thought to yourself that there must be better ways of doing the validation! You are right, and it is always good to try out different approaches before settling on one.
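
One such approach, as a rough sketch, is to let PHP's built-in filter_var() do the email check instead of counting characters. The helper name is_valid_email() is my own invention for this example.

```php
<?php
// Hypothetical helper: true when PHP's built-in filter accepts the address.
function is_valid_email(string $email): bool
{
    // filter_var() returns the filtered string on success, or false on failure.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(is_valid_email('jane@example.com')); // bool(true)
var_dump(is_valid_email('not-an-email'));     // bool(false)
```

This only checks the shape of the address. It cannot tell you whether the mailbox really exists.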

Other String Manipulation functions:

   strstr(haystack, needle); #case sensitive
   strchr(haystack, needle); #alias of strstr()
   strrchr(haystack, needle);#from the last occurrence of a character
   stristr(haystack, needle); #not case sensitive

   strpos(haystack, needle)  #returns position (numerical)
   strrpos(haystack, needle) #returns position of
                             #the last occurrence of the needle

   $sample = 'Hello world';
   strpos($sample, 'o');    #returns 4
   strpos($sample, 'o', 5); #returns 7 (3rd param is the starting point)

   $result = strpos($sample, 'H');
   if($result === false){
     echo 'Not found';
   }else{
     echo 'Found at position: '.$result;
   }
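
The three equals signs in that check matter. A quick sketch of why: 'H' sits at position 0, and 0 counts as false in a loose comparison.

```php
<?php
$sample = 'Hello world';

// strpos() returns 0 here because 'H' is the very first character.
$result = strpos($sample, 'H');

// Loose comparison: 0 == false is true, so this reports the wrong answer.
$loose  = ($result == false)  ? 'Not found' : 'Found';

// Strict comparison: only a real boolean false means "not found".
$strict = ($result === false) ? 'Not found' : 'Found at position: '.$result;

echo $loose, "\n";   // Not found
echo $strict, "\n";  // Found at position: 0
```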

Let us assume your customers were really outraged by your service and they just unleashed really mean feedback. To protect your employees, you want to replace every cuss word with some other word. Let us do that!

  #--------------------string replacing ---------------------------#
  $offcolor = array('fr***', 'nkdhs', 'k*-^', 'cuss');

  #Now we have our array of cuss words to replace; ($offcolor)
  #So, let us use str_replace() function to achieve this goal
  $feedback = str_replace($offcolor, '%!&@', $feedback);

  #What about substrings? Here is how to replace substrings:
  #substr_replace(string, replacement, start) replaces everything
  #from position start to the end of the string. With a start
  #of -1, only the last character is replaced with 'X'
  $test = 'Your customers are the best in the world';
  $changed = substr_replace($test, 'X', -1); //...in the worlX

  #More string manipulation functions and examples:
  substr($test, 1); //our customers are the best in the world
  substr($test, -10); // the world
  substr($test, 0, 4); // Your
  substr($test, 5, -13); //customers are the best (from position 5
                         //up to the 13th character-to-last)
  strcmp(str1, str2);     #compare strings (case-sensitive)
  strcasecmp(str1, str2); #not case-sensitive
  strnatcmp(str1, str2);  #natural ordering ('2' before '10'), case-sensitive
  strncasecmp(str1, str2, len); #first len characters, not case-sensitive
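
To see how the comparison functions differ, here is a short sketch with made-up strings. All four return 0 when the strings are equal, a negative number when str1 sorts first and a positive number when str2 sorts first, so I only test the sign.

```php
<?php
// Case matters for strcmp() but not for strcasecmp().
$same_case_sensitive   = strcmp('Hello', 'hello') === 0;
$same_case_insensitive = strcasecmp('Hello', 'hello') === 0;

// Plain comparison goes character by character, so 'img10' sorts before 'img2'.
$plain_puts_img10_first  = strcmp('img2.png', 'img10.png') > 0;

// Natural ordering reads the digits as numbers, so 'img2' sorts before 'img10'.
$natural_puts_img2_first = strnatcmp('img2.png', 'img10.png') < 0;

// Only the first 5 characters are compared, ignoring case.
$same_prefix = strncasecmp('Hello world', 'HELLO there', 5) === 0;

var_dump($same_case_sensitive);      // bool(false)
var_dump($same_case_insensitive);    // bool(true)
var_dump($plain_puts_img10_first);   // bool(true)
var_dump($natural_puts_img2_first);  // bool(true)
var_dump($same_prefix);              // bool(true)
```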


So far, we have not used any regular expression functions!
I am going to keep this as simple as possible to avoid confusion, because regular expressions are not very easy to understand, yet they are so useful!

   #Regular Expressions - Smart Form
   [a-zA-Z] #both lowercase and uppercase letters
   [a-z]    #only lowercase letters
   [aeiou]  #only vowels
   [^a-z]   #any character that is NOT a lowercase letter

   [[:alnum:]] #------ Alphanumeric characters
   [[:alpha:]] #------ Alphabetic characters
   [[:lower:]] #------ Lowercase letters
   [[:upper:]] #------ Uppercase letters
   [[:digit:]] #------ Decimal digits
   [[:xdigit:]]#------ Hexadecimal digits
   [[:punct:]] #------ Punctuation
   [[:blank:]] #------ Tabs and spaces
   [[:space:]] #------ Whitespace characters
   [[:cntrl:]] #------ Control characters
   [[:print:]] #------ All printable characters
   [[:graph:]] #------ All printable characters except for space

   [[:alnum:]]+ #------ means --> at least one alphanumeric character

   (very )*large #------ matches: 'large', 'very large', 'very very large' and so on

   #counted subexpressions (no space allowed inside the braces)
   (very ){1,3} #------ matches -> 'very ', 'very very ', 'very very very '

   ^crazy #----- matches 'crazy' at the start of a string
   com$   #----- matches 'com' at the end of a string
   ^[a-z]$ #---- matches a string made of a single character from a to z

    com|edu|net #----- matches com, edu or net
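
If you want to try these patterns out, the sketch below runs a few of them through preg_match(). The test strings are my own, and each pattern is wrapped in / delimiters, which preg_match() requires. Note the space inside (very ), so that the words are separated.

```php
<?php
// Each row: pattern, subject, and whether I expect a match.
$checks = [
    ['/^[[:alnum:]]+$/',      'abc123',          true],
    ['/^[[:alnum:]]+$/',      'abc 123',         false],
    ['/^(very )*large$/',     'very very large', true],
    ['/^(very ){1,3}large$/', 'large',           false],
    ['/^crazy/',              'crazy idea',      true],
    ['/com$/',                'example.com',     true],
    ['/^[a-z]$/',             'ab',              false],
    ['/^(com|edu|net)$/',     'edu',             true],
];

$all_as_expected = true;
foreach ($checks as [$pattern, $subject, $expected]) {
    // preg_match() returns 1 on a match and 0 when there is none.
    $matched = preg_match($pattern, $subject) === 1;
    if ($matched !== $expected) {
        $all_as_expected = false;
    }
    printf("%-24s %-18s %s\n", $pattern, "'$subject'", $matched ? 'match' : 'no match');
}
```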

Let us take a quick look at what the special characters mean when used outside and inside square brackets:

#-----------------Outside brackets------------------------|
 # Character | Meaning                      |
 #     \     | Escape character             |
 #     ^     | Match at start of string     |
 #     $     | Match at the end of string   |
 #     .     | Match any character except \n|
 #     |     | Start of branching (OR)      |
 #     (     | Start subpattern             |
 #     )     | End of subpattern            |
 #     *     | Repeat 0 or more times       |
 #     +     | Repeat 1 or more times       |
 #     {     | Start min/max quantifier     |
 #     }     | End min/max quantifier       |
 #     ?     | Mark a subpattern as optional|
 #----------------Inside brackets -------------------------|
 #     \     | Escape character             |
 #     ^     | NOT - if used in initial pos |
 #     -     | Specify character ranging    |


We are almost done here. Now that we know some regular expressions, we should be able to use them to validate our data, particularly the email and the message: we search the message for keywords and then decide where to send it!

#we check if the $email passes the test here. This will probably
#not work for all kinds of emails.
if(!preg_match('~^[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z]{2,4}$~i', $email)){
   echo '<p>That is not a valid email address!</p>'.
        '<p>Please try again by visiting the previous page.</p>';
   exit;
}
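
To show what "not all kinds of emails" means in practice, here is a sketch that runs the same pattern against a few made-up addresses. The last two are perfectly legal addresses, but the pattern turns them away.

```php
<?php
$pattern = '~^[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z]{2,4}$~i';

// Hypothetical sample addresses.
$samples = [
    'jane@example.com',      // accepted
    'jane@example',          // rejected: no dot and ending after the '@'
    'jane@example.museum',   // rejected: ending is longer than 4 letters
    'jane+news@example.com', // rejected: '+' is not in the allowed set
];

$accepted = [];
foreach ($samples as $address) {
    $accepted[$address] = preg_match($pattern, $address) === 1;
    echo $address, ' => ', $accepted[$address] ? 'accepted' : 'rejected', "\n";
}
```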

 $toaddress = ''; //default value
 #we are using the $feedback variable from the start of our post above
 #preg_match() needs delimiters (here /) around the pattern

 if(preg_match('/shop|service|retail/', $feedback)){ #use | (OR)
    $toaddress = '';
 }else if(preg_match('/deliver|fulfil/', $feedback)){ #use | (OR)
    $toaddress = '';
 }else if(preg_match('/bill|account/', $feedback)){    #use | (OR)
    $toaddress = '';
 }

 #Prefer string functions over regex for a fixed string like this.
 #This is just an example to demonstrate that you can also use regex
 if(preg_match('/bigcustomer\.com/', $feedback)){
    $toaddress = '';
 }
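
The same routing idea can be wrapped in a small function, roughly like the sketch below. The function name route_feedback() and all the example.com addresses are placeholders I made up, and the i flag makes the keyword match case-insensitive.

```php
<?php
// Hypothetical helper: pick a department address based on keywords.
function route_feedback(string $feedback): string
{
    // Placeholder addresses; swap in your real department mailboxes.
    $routes = [
        '/shop|service|retail/i' => 'retail@example.com',
        '/deliver|fulfil/i'      => 'fulfilment@example.com',
        '/bill|account/i'        => 'accounts@example.com',
    ];

    foreach ($routes as $pattern => $address) {
        if (preg_match($pattern, $feedback) === 1) {
            return $address; // the first matching department wins
        }
    }

    return 'feedback@example.com'; // default when no keyword matches
}

echo route_feedback('My delivery was late'), "\n";    // fulfilment@example.com
echo route_feedback('Wrong amount on my bill'), "\n"; // accounts@example.com
echo route_feedback('Just saying hello'), "\n";       // feedback@example.com
```

Keeping the patterns in one array means a new department is one extra line rather than another else-if branch.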

Finally, we will look at splitting strings using regular expressions.


    #Splitting strings using Regular expressions
    #(split() and each() were removed from PHP, so we use
    #preg_split() and foreach instead)
    $address = '';
    $arr = preg_split('/\.|@/', $address);
    foreach ($arr as $value) {
       echo '<br />'.$value;
    }
    #Results ---------------------------------------
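
As a quick illustration of what the split gives back, here is the same idea with a made-up address (the one I used originally is not shown here):

```php
<?php
// Hypothetical address, used only for this example.
$address = 'username@example.com';

// Split wherever a literal '.' or an '@' appears.
$arr = preg_split('/\.|@/', $address);

foreach ($arr as $value) {
    echo $value, "\n";
}
// prints:
// username
// example
// com
```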

That is it for today. I hope you found this helpful in some way. If you have any questions, please let me know so that I can answer them. Next time we will tackle functions and code reuse. If you have spotted any errors, please let me know through the comments section; I will really appreciate it.

