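PHP's filter function filter_var, when used with the FILTER_VALIDATE_EMAIL filter, checks whether a string is formatted as an email address. If it is, the function returns the filtered string; if not, it returns FALSE.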
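As a quick illustration of that return convention, here is a minimal sketch. The helper name check_email and both sample inputs are invented for this example and are not taken from any real project.

```php
<?php
// Hypothetical helper: a thin wrapper around filter_var() for illustration.
function check_email($input)
{
    // Returns the input string when it matches the email filter,
    // or false when it does not.
    return filter_var($input, FILTER_VALIDATE_EMAIL);
}

var_dump(check_email('user@example.com')); // the string itself
var_dump(check_email('not-an-email'));     // false, there is no @ sign
```

Because a failed check returns FALSE rather than throwing, callers should compare the result with === false instead of relying on truthiness alone.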

However, this email check is extremely crude, and careless use of it can easily cause security problems.

The implementation of FILTER_VALIDATE_EMAIL can be found in the file ext/filter/logical_filters.c of https://github.com/php/php-src , in the function void php_filter_validate_email(PHP_INPUT_FILTER_PARAM_DECL).

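The implementation shows that FILTER_VALIDATE_EMAIL is ultimately based on regular expressions. There are two candidate patterns, and which one is used depends on whether filter_var is called with the FILTER_FLAG_EMAIL_UNICODE flag. Without that flag, the following regular expression is used: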

const char regexp1[] = "/^(?!(?:(?:\\x22?\\x5C[\\x00-\\x7E]\\x22?)|(?:\\x22?[^\\x5C\\x22]\\x22?)){255,})(?!(?:(?:\\x22?\\x5C[\\x00-\\x7E]\\x22?)|(?:\\x22?[^\\x5C\\x22]\\x22?)){65,}@)(?:(?:[\\x21\\x23-\\x27\\x2A\\x2B\\x2D\\x2F-\\x39\\x3D\\x3F\\x5E-\\x7E]+)|(?:\\x22(?:[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x21\\x23-\\x5B\\x5D-\\x7F]|(?:\\x5C[\\x00-\\x7F]))*\\x22))(?:\\.(?:(?:[\\x21\\x23-\\x27\\x2A\\x2B\\x2D\\x2F-\\x39\\x3D\\x3F\\x5E-\\x7E]+)|(?:\\x22(?:[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x21\\x23-\\x5B\\x5D-\\x7F]|(?:\\x5C[\\x00-\\x7F]))*\\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-+[a-z0-9]+)*\\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-+[a-z0-9]+)*)|(?:\\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\\]))$/iD";
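From the PHP side, the choice between the two patterns comes down to the third argument of filter_var. The sketch below is only an illustration: the sample address is invented, and the script file is assumed to be saved as UTF-8.

```php
<?php
// Invented sample address with a non-ASCII local part (file assumed UTF-8).
$address = 'niceändsimple@example.com';

// No flag: the ASCII-only pattern is applied.
$ascii_only = filter_var($address, FILTER_VALIDATE_EMAIL);

// FILTER_FLAG_EMAIL_UNICODE: the Unicode-aware pattern is applied instead.
$unicode = filter_var($address, FILTER_VALIDATE_EMAIL, FILTER_FLAG_EMAIL_UNICODE);

var_dump($ascii_only);
var_dump($unicode);
```

The flag only relaxes what the local part may contain; it does not make the check any stricter with respect to the special characters discussed here.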

Here is that regular expression again, tidied up slightly:

/^(?!(?:(?:"?\\[\x00-~]"?)|(?:"?[^\\"]"?)){255,})
(?!(?:(?:"?\\[\x00-~]"?)|(?:"?[^\\"]"?)){65,}@)
(?:(?:[!#-'*+\-\/-9=?^-~]+)|(?:"(?:[\x01-\x08\x0B\x0C\x0E-\x1F!#-\[\]-\x7F]|(?:\\[\x00-\x7F]))*"))
(?:\.(?:(?:[!#-'*+\-\/-9=?^-~]+)|(?:"(?:[\x01-\x08\x0B\x0C\x0E-\x1F!#-\[\]-\x7F]|(?:\\[\x00-\x7F]))*")))*
@
(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-+[a-z0-9]+)*\.){1,126}){1,}
(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-+[a-z0-9]+)*)
|(?:\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})
|(?:(?!(?:.*[a-f0-9][:\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))
|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))
?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\]))$/iD

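Some of the hex escapes stand for non-printable characters and cannot be shown as literal characters, but that does not matter. The key point is that this pattern accepts a great many special characters, such as !#-'*+-/-9=?^-~ and the double quote, and it even accepts punycode. It amounts to little more than checking that an @ sign is present. Using the online address validity checker RFC822 email address validator, you can see that all of the following addresses are considered valid.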

!#-'*+-/-9=?^[email protected]
"<script>"@126.com
script@xn--6qq308a

Consequently, in brutelogic's XSS email test ( https://brutelogic.com.br/tests/input-formats.php?email=[email protected] ), setting the email parameter to

"<svg/onload=alert(1)"@x.y

triggers XSS.
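To make the risk concrete, here is a minimal sketch that runs this payload through filter_var and then escapes it before output. The variable names are illustrative, and htmlspecialchars is shown only as one common way to escape for an HTML context; it is an assumption about how a page might defend itself, not a description of the test page above.

```php
<?php
// The payload from the text: a quoted local part that carries markup.
$email = '"<svg/onload=alert(1)"@x.y';

// Validation succeeds, so the string comes back unchanged, markup included.
$validated = filter_var($email, FILTER_VALIDATE_EMAIL);
var_dump($validated === $email);

// Escape at output time; passing validation does not make the value HTML-safe.
$safe = htmlspecialchars((string) $validated, ENT_QUOTES, 'UTF-8');
echo $safe, PHP_EOL;
```

FILTER_VALIDATE_EMAIL only answers whether the string has the shape of an address. Whether it is safe to embed in HTML, SQL, or a shell command is a separate question that has to be handled at the point of use.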