How I found and fixed a bug in PHP's standard library

栏目: IT技术 · 发布时间: 4年前

内容简介:I’ve been a software developer at EBANX since 2016. There we have a mechanism calledI was deploying some code once and I noticed something weird in the logs. Some requests had two Host headers once they arrived in the release candidate server. Normally we

I’ve been a software developer at EBANX since 2016. There we have a mechanism called Release Candidate . Every time we deploy something new to our monolith PHP system we end up deploying it first to the release candidate server. Then we elect a few random requests and proxy it from our live normal servers to the release candidate server, if those requests go well with no errors then the deploy occurs in the normal API servers, if an error occurs the proxying mechanism is disabled and the responsible for the code can better debug the problem in the release candidate server and rollback the code if needed.

I was deploying some code once and I noticed something weird in the logs. Some requests had two Host headers once they arrived in the release candidate server. Normally we simply copy the headers that came from the original request, so the request should arrive in the release candidate with the host header containing the url of the original server. In the cases where two Hosts were present the top one was the url of the release candidate server and the second one was the original one containing the original api server. This puzzled me, the code I was modifying had nothing to do with the proxying mechanism, and after looking at older logs I found out that this always occurred a few times. Since this mechanism is critical to the safety of our deploys I dropped everything and started to investigate.

Naturally I assumed that the bug was somewhere in our proxying code, after digging for a while I couldn’t find anything wrong with the code. This is the function we use to rewrite the Header of the request to the release candidate with the original host:

private static function adjustRequestTarget(RequestInterface $request): RequestInterface {
  $original_host = $request->getHeaderLine('Host');
  $new_uri = $request->getUri()->withHost(static::getReleaseCandidateInstanceHost());
  return $request
    ->withUri($new_uri)
    ->withHeader('Host', $original_host);
}

We are using Guzzle as the http client. Internally the headers are represented as an array inside the class GuzzleHttp\Psr7\Request. I noticed that when calling the withHeader function the Host header is shifted to the end of array, but regardless of this detail I couldn’t reproduce the issue when testing locally with the exact same requests that caused the problem in production.

I ended up noticing something pretty interesting. The stream option being passed during the request was different on my tests than the production code, this made guzzle use a different http handler, with stream set as true it uses the PHP stream handler, with it set as false it uses cURL, I was testing it with false, changing it to true made me able to reproduce the issue when replaying a request that caused the problem in production, now I just needed to drill down enough in the code that was called to determine what was causing the problem.

After cloning the PHP source and reading for a while I finally found the culprit, the problem is caused by a combination of the change of header array order that the previously referenced adjustRequestTarget function causes, and the way the http fopen wrapper handles the http headers.

Inside the http_fopen_wrapper.c code we have these two interesting snippets:

if ((s = strstr(t, "host:")) &&
                (s == t || *(s-1) == '\r' || *(s-1) == '\n' ||
                             *(s-1) == '\t' || *(s-1) == ' ')) {
                 have_header |= HTTP_HEADER_HOST;
            }
/* Send Host: header so name-based virtual hosts work */
if ((have_header & HTTP_HEADER_HOST) == 0) {
    smart_str_appends(&req_buf, "Host: ");
    smart_str_appends(&req_buf, ZSTR_VAL(resource->host));
    if ((use_ssl && resource->port != 443 && resource->port != 0) ||
        (!use_ssl && resource->port != 80 && resource->port != 0)) {
        smart_str_appendc(&req_buf, ':');
        smart_str_append_unsigned(&req_buf, resource->port);
    }
    smart_str_appends(&req_buf, "\r\n");
}

The first snippet searches in the header string for an instance of the host: string in the beginning of a line, if it finds one there then it enables the have_header HTTP_HEADER_HOST flag, later the second snippet checks that flag and if it’s not enabled then it appends a Host header with the target address.

Turns out that all the affected requests had a header with the string host: somewhere in them before the Host header (which was shifted to the end of array due to us changing it back to the original host), this made the check fail and the wrapper thought we didn’t have a host header because strstr only returns the first instance of a string. So the wrapper ended up appending a new Host header with the target url and it still sends the one that we were trying to send originally causing the issue.

The following test script:

<?php
$opts = array(
  'http'=>array(
    'method'=>"GET",
    'header'=>"RandomHeader: foobar\r\n" .
              "Cookie: foo=bar\r\n" .
              "Host: c764dd43.ngrok.io  \r\n"
  )
);
$context = stream_context_create($opts);
$fp = fopen('http://c764dd43.ngrok.io', 'r', false, $context);
fpassthru($fp);

Will generate the following raw headers:

GET / HTTP/1.0
RandomHeader: foobar
Cookie: foo=bar
Host: c764dd43.ngrok.io
X-Forwarded-For: 2804:14c:8781:8c3a:3805:2fbe:1845:15e6

Now if we change the RandomHeader value to something that contains the host: string:

<?php
$opts = array(
  'http'=>array(
    'method'=>"GET",
    'header'=>"RandomHeader: localhost:8080\r\n" .
              "Cookie: foo=bar\r\n" .
              "Host: testcustomheader  \r\n"
  )
);
$context = stream_context_create($opts);
$fp = fopen('http://c764dd43.ngrok.io', 'r', false, $context);
fpassthru($fp); (edited)

We’ll get two Host headers in the raw request:

GET / HTTP/1.0
Host: c764dd43.ngrok.io
RandomHeader: localhost:8080
Cookie: foo=bar
Host: testcustomhostheader
X-Forwarded-For: 2804:14c:8781:8c3a:3805:2fbe:1845:15e6

According to the RFC 7320 a server must respond with a 400 (bad request) to requests that contain more than one Host header, so every request affected by this bug should break on a properly implemented server.

With this information I decided to open a bug on the PHP issue tracker. After a few days of inactivity I decided I had enough knowledge to try to implement a fix myself!

I cloned the PHP git repository from github and compiled it from the master branch.

I changed the snippet that checks if the host header is present to check for every occurence of the string:

while ((s = strstr(s, "host:"))) {
	if (s == t || *(s-1) == '\r' || *(s-1) == '\n' ||
	                 *(s-1) == '\t' || *(s-1) == ' ') {
		 have_header |= HTTP_HEADER_HOST;
	}
	s++;
}

I recompiled, ran the test script again and it worked! Now I needed to find out the project contributing guidelines.

I found out that PHP has a test suite composed by phpt files, these files are a mixture of plaintext and php with different sections. The most important sections are FILE and EXPECTED , inside the file section there should be a PHP script that should output something to stdout. The EXPECTED section contains the expected output, the test script then compares the output produced by the script inside FILE to the output on the EXPECTED section, if they match the test passes, if they don’t then it fails. It’s very simple and effective. I changed the script I was using to reproduce the issue to become a .phpt file so we could have automated tests for this bug:

--TEST--
Bug #79265 (Improper injection of Host header when using fopen for http requests)
--INI--
allow_url_fopen=1
--SKIPIF--

--FILE--
<?php
require 'server.inc';

$responses = array(
    "data://text/plain,HTTP/1.0 200 OK\r\n\r\n",
);

$pid = http_server("tcp://127.0.0.1:12342", $responses, $output);

$opts = array(
  'http'=>array(
    'method'=>"GET",
    'header'=>"RandomHeader: localhost:8080\r\n" .
              "Cookie: foo=bar\r\n" .
              "Host: userspecifiedvalue\r\n"
  )
);
$context = stream_context_create($opts);
$fd = fopen('http://127.0.0.1:12342/', 'rb', false, $context);
fseek($output, 0, SEEK_SET);
echo stream_get_contents($output);
fclose($fd);

http_server_kill($pid);

?>
--EXPECT--
GET / HTTP/1.0
Connection: close
RandomHeader: localhost:8080
Cookie: foo=bar
Host: userspecifiedvalue

I commited the fix alongside with the test script and opened a pull request . It was merged a few hours later.


以上所述就是小编给大家介绍的《How I found and fixed a bug in PHP's standard library》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

智能主义

智能主义

周鸿祎 / 中信出版集团股份有限公司 / 2016-11-1 / CNY 49.00

大数据和人工智能迅猛发展,对社会和商业的影响日益深刻,从学术界到企业界,智能化时代必将来临,已经成为共识。而此次变革,将会开启新一轮的发展浪潮。企业家、互联网以及传统企业、个人,应当如何理解这一轮的发展,如何行动以抓住智能化所带来的众多机遇,成为所有人持之以恒的关注热点。 周鸿祎作为最具洞察力的互联网老兵、人工智能领域成功的先行者,通过总结360公司的战略布局、产品规划、方法论实践,从思想到......一起来看看 《智能主义》 这本书的介绍吧!

MD5 加密
MD5 加密

MD5 加密工具

Markdown 在线编辑器
Markdown 在线编辑器

Markdown 在线编辑器