The Hidden Dangers of Crafting Your Own Regular Expressions for Input Validation


In the realm of software development, ensuring the security of your application is paramount. One common practice to secure applications is through input validation, which ensures that only properly formed data can interact with your system. Regular expressions, or regex, are a powerful tool for this purpose, capable of matching patterns within strings with precision and efficiency. However, the journey to crafting secure and reliable regular expressions is fraught with challenges, especially when they are self-made. In this blog post, we delve into the dangers of relying on self-crafted regular expressions for input validation, using Django, a popular Python web framework, as our guiding example.

Regex: A Double-Edged Sword

Regular expressions are like a Swiss Army knife for string manipulation, capable of performing complex validations and searches within text. Yet, their power comes with significant risk. An incorrectly written regex can introduce vulnerabilities, leading to security breaches such as denial of service (DoS) attacks, data leakage, and unauthorized access.

Complexity and Maintenance

One of the primary challenges with regex is their complexity. A seemingly simple task, like email validation, can result in a regex pattern that is difficult to read and maintain. Over time, as the application evolves, maintaining these expressions becomes a Herculean task, especially if the original developer is no longer available.

For example, consider this Django model field validation using a custom regex for email validation:

from django.core.exceptions import ValidationError
from django.db import models
import re

class User(models.Model):
    email = models.CharField(max_length=255)

    def clean(self):
        # Custom pattern: hard to read, and stricter than real-world addresses.
        email_regex = re.compile(r"^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$")
        if not email_regex.match(self.email):
            raise ValidationError("Invalid email format")

While this regex might seem adequate, it rejects many valid email addresses, such as those with a plus sign in the local part or a top-level domain longer than three characters, leading to false negatives. Moreover, it’s not immediately clear to a new developer what patterns this regex is supposed to allow or block, increasing the risk of errors during future modifications.
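These gaps are easy to demonstrate. The sketch below runs the same pattern against a handful of addresses; the sample addresses and the helper name check_samples are illustrative choices made for this example.

```python
import re

# Same pattern as in the model above.
EMAIL_REGEX = re.compile(r"^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$")

# Illustrative sample addresses (assumed for this sketch), all syntactically valid.
SAMPLES = [
    "user@example.com",
    "first+tag@example.com",  # plus-addressing in the local part
    "user@example.info",      # four-letter top-level domain
    "user@example.museum",    # six-letter top-level domain
]


def check_samples(samples=SAMPLES):
    """Return a dict mapping each address to whether the regex accepts it."""
    return {address: bool(EMAIL_REGEX.match(address)) for address in samples}


if __name__ == "__main__":
    for address, accepted in check_samples().items():
        print(f"{address}: {'accepted' if accepted else 'rejected'}")
```

Only the first address gets through: \w does not include the plus sign, and \w{2,3} caps every dot-separated suffix at three characters.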

Security Vulnerabilities

Self-crafted regular expressions can inadvertently open up attack vectors. For instance, poorly designed regex can be susceptible to what’s known as Regular Expression Denial of Service (ReDoS) attacks. These occur when an attacker provides a specially crafted input that takes a long time to evaluate, effectively halting the application’s execution.

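To illustrate, consider a regex designed to match single-quoted strings that may contain backslash escapes:

nested_quotes_regex = re.compile(r"'(?:[^'\\]|\\.)*'")

As written, the two alternatives can never match the same character, so the engine has only one way to consume any input and the pattern runs in linear time. That safety is fragile, though. Dropping the backslash from the character class (leaving [^']) or adding a + after the class lets the same text be matched in more than one way. An attacker can then send a long, unterminated string that forces the engine to try an exponential number of combinations before giving up, consuming excessive time and resources and potentially taking the server down. Regular expressions are very powerful, but that power comes with a hidden cost.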
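The email pattern from the earlier model is a concrete candidate for this kind of attack. In \w+([\.-]?\w+)* the separator is optional, so a run of word characters can be divided between the inner \w+ and the outer * in exponentially many ways. The rough sketch below times that pattern against inputs that can never match; the helper name time_failed_match and the input lengths are arbitrary choices for illustration, and absolute timings will vary by machine.

```python
import re
import time

# Same email pattern as in the earlier model example.
EMAIL_REGEX = re.compile(r"^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$")


def time_failed_match(length):
    """Time one failed match on `length` word characters followed by '!'."""
    payload = "a" * length + "!"  # contains no '@', so the match must fail
    start = time.perf_counter()
    matched = EMAIL_REGEX.match(payload)
    elapsed = time.perf_counter() - start
    assert matched is None
    return elapsed


if __name__ == "__main__":
    # Lengths are kept small on purpose; larger values get slow very quickly.
    for length in (8, 12, 16, 20):
        print(f"{length:>2} chars: {time_failed_match(length):.6f}s")
```

Each extra character roughly doubles the work, so an input of only a few dozen characters can keep a worker busy for a very long time. Capping input length before matching limits the damage, but it does not fix the pattern.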

Leveraging Django’s Built-in Validators

One way to mitigate these risks is by utilizing Django’s built-in validators, which are thoroughly tested and optimized for performance and security. Django offers a range of validators that can be easily applied to model fields, forms, and more, ensuring robust input validation without the need for complex, custom regex.

EmailField

For example, instead of the custom email validator shown earlier, Django’s EmailField can be used:

from django.db import models

class User(models.Model):
    email = models.EmailField()

Django’s EmailField applies the framework’s well-tested email validator behind the scenes, accepting a wide range of valid email formats while rejecting malformed ones. This not only simplifies the code but also enhances security and maintainability. Keep in mind that model field validators run when full_clean() is called (as ModelForm does during validation), not automatically on save().

URLField

Django’s URLField is another powerful ally in the quest for secure input validation. Its validator checks that the value is a well-formed URL with an allowed scheme (http, https, ftp, or ftps by default), so values such as javascript: URLs are rejected before they reach your data. Utilizing URLField eliminates the need for custom regex, thereby reducing the risk of security vulnerabilities.

from django.db import models

class MyModel(models.Model):
    website = models.URLField(max_length=200)

This simplicity means developers don’t have to reinvent the wheel and can focus on what matters most: building secure and effective web applications.

Secure Password Validation with Password Validators

Password security is critical to protecting user data and preventing unauthorized access. Django offers customizable password validators that can be used to enforce strong password policies, such as minimum length, complexity, and common password checks. By configuring these validators in Django’s settings, developers can implement robust password requirements without resorting to custom regular expressions that may not cover all security bases.

AUTH_PASSWORD_VALIDATORS = [
    {
        'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
        'OPTIONS': {
            'min_length': 9,
        },
    },
    {
        'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
    },
]

Integrating Validators for Custom Fields

Beyond the built-in field validators, Django allows for the integration of custom validators for any model field, providing a flexible way to apply specific validation rules without directly writing regex. These custom validators can be used to enforce business logic, such as validating an ISBN number, a credit card number, or any other pattern that requires more than the basic validators offer.

from django.core.exceptions import ValidationError
from django.db import models
from django.utils.translation import gettext_lazy as _

def validate_even(value):
    if value % 2 != 0:
        raise ValidationError(
            _('%(value)s is not an even number'),
            params={'value': value},
        )

class MyModel(models.Model):
    even_field = models.IntegerField(validators=[validate_even])
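Some of the formats mentioned above, such as ISBNs, carry a check digit that a regular expression cannot verify. As a rough illustration, the function below checks an ISBN-13 in plain Python. The name is_valid_isbn13 is a hypothetical helper, not part of Django; wrapping it in a function that raises ValidationError on failure would let it sit in a field’s validators list just like validate_even.

```python
def is_valid_isbn13(value: str) -> bool:
    """Return True if `value` is a 13-digit ISBN with a correct check digit."""
    # Hyphens and spaces are presentation only, so strip them first.
    digits = value.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    # ISBN-13 weights digits alternately by 1 and 3; the weighted total,
    # check digit included, must be divisible by 10.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


if __name__ == "__main__":
    print(is_valid_isbn13("978-0-306-40615-7"))
    print(is_valid_isbn13("978-0-306-40615-8"))
```

Expressing the rule as ordinary code keeps it readable and testable, which is the main advantage over encoding as much as possible in a single pattern.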

Further Reading

The following Django and Django REST framework documentation pages describe the fields and validators available for models, forms, and REST APIs.

  • Model fields - Fields that can be configured for Django models.
  • Form fields - Fields that can be configured for Django forms.
  • DRF Serializers - Serializers for using Django Rest Framework for API validation

Conclusion

While crafting your own regular expressions for input validation might seem like a flexible solution, it’s fraught with potential pitfalls. The complexity, difficulty in maintenance, and security vulnerabilities pose significant risks. In frameworks like Django, leveraging built-in validators offers a safer, more reliable path. These validators are designed with security and efficiency in mind, allowing developers to focus on building features without compromising on security.

In conclusion, before reaching for regex to solve a validation problem, consider whether a built-in solution exists that meets your needs. It’s often the safer, more efficient choice. Remember, in software development, simplicity and reliability are key to security.


by PullRequest

April 3, 2024