Developers, Please encode your URLs

If you like it, put a # on it!

Uniform Resource Locators (URLs) are a funny thing. They seem simple, yet they carry so many small, intricate rules that the moment you try to explain what a URL is and how to parse one correctly, you find yourself in a sea of complexity.

The inner workings of URLs

Let’s start with the basics. A URL identifies a resource and tells you which method you should use to access it. An example of this is:

https://www.appsflyer.com/why-appsflyer/

Here the resource is the https://www.appsflyer.com/why-appsflyer/ page. The method used is https, the secure version of the Hypertext Transfer Protocol.

Turning to the specifications, two main RFCs govern URL usage. Even though RFC3986 updates RFC1738, both formats are still widely used by applications today.

**RFC3986**

<scheme>://<user>@<host>:<port>/<url-path>

**RFC1738**

<scheme>://<user>:<password>@<host>:<port>/<url-path>

Note that in the updated RFC3986, using a password in the URL is deprecated. This will be important later on!
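To make the layout above concrete, here is a minimal sketch that splits two URLs into their parts using the standard URL class available in browsers and Node.js. The second URL, including its host, credentials and port, is a made-up example and not something taken from a real system.

```javascript
// Parse the article's example URL with the built-in WHATWG URL class.
const simple = new URL("https://www.appsflyer.com/why-appsflyer/");
console.log(simple.protocol); // "https:"
console.log(simple.hostname); // "www.appsflyer.com"
console.log(simple.pathname); // "/why-appsflyer/"

// Hypothetical URL that uses every part of the RFC1738-style layout.
const full = new URL("https://alice:s3cret@internal.example:8443/reports/daily");
console.log(full.username); // "alice"
console.log(full.password); // "s3cret"
console.log(full.hostname); // "internal.example"
console.log(full.port);     // "8443"
console.log(full.pathname); // "/reports/daily"
```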

A primer on URL encoding

Web browsers request pages from web servers by using URLs. URLs can only be sent over the internet using the ASCII character set. However, URLs often contain characters outside this set, or characters that have a special meaning inside a URL, so they have to be encoded into a valid format. URL encoding replaces unsafe characters with a percent sign (%) followed by two hexadecimal digits.

As an example, URLs cannot contain spaces.

Therefore a space in a URL is encoded to %20.

So the URL appsflyer.com/hello everyone

would be encoded into appsflyer.com/hello%20everyone

When the web server receives this URL, it can decode it and convert it back into appsflyer.com/hello everyone

For a reference of all the URL encodings, visit https://www.w3schools.com/tags/ref_urlencode.ASP
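The round trip described above can be sketched with JavaScript's built-in encodeURIComponent and decodeURIComponent. The input strings are only illustrative.

```javascript
// Encode a path segment that contains a space, then decode it again.
const raw = "hello everyone";
const encoded = encodeURIComponent(raw);
console.log(encoded);                     // "hello%20everyone"
console.log(decodeURIComponent(encoded)); // "hello everyone"

// Characters with a special meaning inside a URL are encoded as well.
console.log(encodeURIComponent("#")); // "%23"
console.log(encodeURIComponent("&")); // "%26"

// Non-ASCII characters are encoded as their UTF-8 bytes.
console.log(encodeURIComponent("é")); // "%C3%A9"
```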

URLs within micro-services

There are many different methods you can use for inter-service communication, such as REST, gRPC, and message brokers such as Kafka or RabbitMQ. In this story, we’ll be focusing on microservice communication using HTTP REST.

Here we have a service that receives a URL with two parameters:

  • ‘site’: controls which other microservice the query gets sent to;

  • ‘user’: identifies a user in the system.

[Figure: a service which takes two parameters and builds a new URI]

In this example we added some code that validates the user and creates a new URL which gets sent to an internal microservice. Validating a user from a URL parameter that is controlled by the client is not a secure design practice, but this does happen in the real world. We’ve taken this idea as an example to help demonstrate why encoding URLs is so important.
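The original code is not reproduced here, so the following is only a rough sketch of what such a service might look like. The names validateUser and buildInternalUrl, the allow-list of users, and the internal.example domain are all assumptions made for illustration.

```javascript
// Hypothetical allow-list standing in for a real user lookup.
const KNOWN_USERS = new Set(["alice", "bob"]);

function validateUser(user) {
  return KNOWN_USERS.has(user);
}

// Deliberately naive: both values are concatenated into the URL as-is.
function buildInternalUrl(site, user) {
  if (!validateUser(user)) {
    throw new Error("unknown user");
  }
  return "https://" + site + ".internal.example/profile?user=" + user;
}

console.log(buildInternalUrl("billing", "alice"));
// "https://billing.internal.example/profile?user=alice"
```

With well-behaved input this works as intended, which is exactly why the problem tends to go unnoticed.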

Now in this case we did not explicitly write any code to carry out URL decoding or encoding. However, when processing the request, the server engine automatically decodes the parameters into the proper variables. Let’s see the same example but with special characters:

[Figure: parameters are encoded in the browser and decoded in code]
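As a small illustration of this automatic decoding, the sketch below uses the standard URL and URLSearchParams classes as a stand-in for whatever the server engine does internally. The service.example host and the parameter values are invented for the example.

```javascript
// The request as it travels over the wire: the space is sent as %20 and "#" as %23.
const incoming = new URL("https://service.example/lookup?site=my%20site%23a&user=alice");

// Reading the parameters decodes them, with no explicit decode call in our code.
const site = incoming.searchParams.get("site");
const user = incoming.searchParams.get("user");

console.log(site); // "my site#a"
console.log(user); // "alice"
```

By the time our own code sees the value, the special characters are back in their raw form.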

URL tampering

In the above examples, the site parameter is not validated and no encoding is applied. This allows us to tamper with the generated URL and control the request.

In the request below, we added a %23 (#) so that every character following it in the generated URL is treated as a fragment.

To identify the issue without access to the source code or internal servers, we first try to make the service fetch data from our own external server. Once we see our external server receive a request, we can let the games begin!

[Figure: injecting a # to break URL building]
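Here is a rough sketch of that first probe. It assumes a URL builder that simply concatenates its inputs, and the names buildInternalUrl, internal.example and attacker.example are hypothetical.

```javascript
// Hypothetical builder that concatenates its inputs without encoding them.
function buildInternalUrl(site, user) {
  return "https://" + site + ".internal.example/profile?user=" + user;
}

// What the client sends: the site value ends with %23, an encoded "#".
const params = new URLSearchParams("site=attacker.example%23&user=alice");
const site = params.get("site"); // decoded to "attacker.example#"
const user = params.get("user");

const target = new URL(buildInternalUrl(site, user));
console.log(target.hostname); // "attacker.example"
console.log(target.pathname); // "/"
console.log(target.hash);     // "#.internal.example/profile?user=alice"
```

Everything the service meant to append has become a fragment, which is never sent to the server, so the request goes to the external host instead.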

The next step is to bypass authorization checks by manipulating the URL.

[Figure: overriding the second variable by injecting data into the first parameter]

In this case our payload controls everything after https:// and ends with a # sign, which tells the request sender that whatever follows is a URL fragment.

The result is that a request is sent with a different user to the one that was validated — bypassing the user validation check!
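A minimal sketch of this bypass, again under the assumption of a builder that concatenates unencoded input. The function names, the user list and the internal.example domain are hypothetical.

```javascript
// Hypothetical validation: only "alice" and "bob" are known users.
const KNOWN_USERS = new Set(["alice", "bob"]);

function buildInternalUrl(site, user) {
  if (!KNOWN_USERS.has(user)) {
    throw new Error("unknown user");
  }
  return "https://" + site + ".internal.example/profile?user=" + user;
}

// The payload supplies host, path and its own user, then ends with "#".
const site = "billing.internal.example/profile?user=admin#";
const user = "alice"; // passes validation

const target = new URL(buildInternalUrl(site, user));
console.log(target.hostname);                 // "billing.internal.example"
console.log(target.searchParams.get("user")); // "admin"
console.log(target.hash);                     // "#.internal.example/profile?user=alice"
```

The validated value "alice" ends up in the fragment, while the receiving service only ever sees user=admin.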

Note that we couldn’t just change the user parameter, because it is validated by our user validation function. We could also assume that the user parameter comes from our SSO provider, so it can’t be tampered with.

Another common scenario:

In this example we have two dynamic query parameters — “key” & “user” — which are controlled by the client. Note that the “user” parameter is validated properly.

Using the same parameter manipulation, we can add data to the “key” parameter that injects new variables into the requested URL.

[Figure: overriding the user variable in the requested resource]
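The same idea can be sketched for the two query parameters. The builder buildLookupUrl and the store.internal.example host are hypothetical, and the “user” value is assumed to have already passed validation.

```javascript
// Hypothetical builder with two client-controlled query parameters.
function buildLookupUrl(key, user) {
  return "https://store.internal.example/items?key=" + key + "&user=" + user;
}

// What the client sends: key carries an encoded "&user=admin#".
const params = new URLSearchParams("key=abc%26user%3Dadmin%23&user=alice");
const key = params.get("key");   // decoded to "abc&user=admin#"
const user = params.get("user"); // "alice", the validated value

const target = new URL(buildLookupUrl(key, user));
console.log(target.searchParams.get("key"));  // "abc"
console.log(target.searchParams.get("user")); // "admin"
console.log(target.hash);                     // "#&user=alice"
```

The injected & starts a new parameter, and the trailing # pushes the legitimate user value into the fragment.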

Fixing this

JavaScript has a function called encodeURIComponent, which lets us safely place a component into a URI without worrying about users injecting content to break out of our parameter.
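As a sketch of the fix, the hypothetical builder below (buildLookupUrl and store.internal.example are assumptions, not the article's original code) passes each value through encodeURIComponent before placing it in the query string.

```javascript
// Hypothetical builder that encodes every client-controlled value.
function buildLookupUrl(key, user) {
  return (
    "https://store.internal.example/items?key=" +
    encodeURIComponent(key) +
    "&user=" +
    encodeURIComponent(user)
  );
}

// The same kind of payload that previously overrode the user parameter.
const safe = buildLookupUrl("abc&user=admin#", "alice");
console.log(safe);
// "https://store.internal.example/items?key=abc%26user%3Dadmin%23&user=alice"

const target = new URL(safe);
console.log(target.searchParams.get("key"));  // "abc&user=admin#"
console.log(target.searchParams.get("user")); // "alice"
console.log(target.hash);                     // ""
```

The payload now arrives as one opaque key value, and the validated user is the only user the receiving service sees.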

Other frameworks and languages have other solutions. You can look up your language here: https://rosettacode.org/wiki/URL_encoding

Common feedback I receive from developers is that they first need to check that the receiving service decodes the data. The answer is simple: in 99% of cases the receiving service uses a common framework that decodes the data automatically. Of course, full testing should be performed after any change, but encoding usually does not affect the application.

[Figure: with encoding, the key arrives exactly as requested, with no injection possibilities]

To summarize

When passing data between microservices or to other external services, it is vital to verify that you are building the target URL in a safe way.

Going down memory lane, in the early days of software development twenty years ago, we had a similar problem with SQL. Developers would build SQL queries without realizing that custom queries could be injected into them. Today, most developers are familiar with SQL Injection, but not with URL Injection.

Please share this knowledge with all of your developer friends. Whether this is new to you or something you already knew, we would love to hear your feedback.

As for security researchers: try adding a %23 (#) or %26 (&) to your payloads. Weird behavior is usually a good sign.