Sunday, July 31, 2016

Minimizing apache and PHP

Recently I was looking at migrating a simple website in which every page but one was static.

The simplest thing here would be to use nginx. It's simple, fast, modern, and should make it dead easy to get an A+ on the Qualys SSL Labs test.

But that non-static page? A trivial contact form. Fill in a box, the back-end sends the content of the box as an email message.

The simplest thing here in days gone by would have been to put together a trivial CGI script. Only nginx doesn't do CGI, at least not directly. Not only that, but writing the CGI script and doing it well is pretty hard.

So, what about PHP? Now, PHP has gotten itself a not entirely favourable reputation on the security front. Given the frequent security updates, not entirely undeserved. But could it be used for this?

For such a task, all you need is the mail() function. Plus maybe a quick regex and some trivial string manipulation. All that is in core, so you don't need very much of PHP at all. For example, you could build with the following flags:

--disable-all
--disable-cli
--disable-phpdbg
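
Putting that together, the build might look something like this - the prefix here is illustrative, and the --with-apxs2 flag only matters if you go the mod_php route discussed below:

./configure \
    --prefix=/opt/php-min \
    --disable-all \
    --disable-cli \
    --disable-phpdbg \
    --with-apxs2=/opt/httpd/bin/apxs
make
make install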

So, no modules. Far less to go wrong. On top of that, you can disable a bunch of things in php.ini:

file_uploads = Off [change]
allow_url_fopen = Off [change]
allow_url_include = Off [default]
display_errors = Off [default]
expose_php = Off [change]

Furthermore, you could start disabling functions to your heart's content:

disable_functions = php_uname, getmyuid, getmypid, passthru, leak, listen, diskfreespace, tmpfile, link, ignore_user_abort, shell_exec, dl, set_time_limit, exec, system, highlight_file, source, show_source, fpassthru, virtual, posix_ctermid, posix_getcwd, posix_getegid, posix_geteuid, posix_getgid, posix_getgrgid, posix_getgrnam, posix_getgroups, posix_getlogin, posix_getpgid, posix_getpgrp, posix_getpid, posix_getppid, posix_getpwnam, posix_getpwuid, posix_getrlimit, posix_getsid, posix_getuid, posix_isatty, posix_kill, posix_mkfifo, posix_setegid, posix_seteuid, posix_setgid, posix_setpgid, posix_setsid, posix_setuid, posix_times, posix_ttyname, posix_uname, proc_open, proc_close, proc_get_status, proc_nice, proc_terminate, phpinfo

Once you've done that, you end up with a pretty hardened PHP install. And if all it does is take in a request and issue a redirect to a static target page, it doesn't even need to generate any HTML output.
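
As a concrete illustration, the handler might end up as little more than the following sketch - the field names, addresses, and target pages are all made up, and it assumes PCRE survived the cut-down build and that mail() has a working sendmail behind it:

<?php
// Minimal contact-form handler: validate, mail(), redirect.
// All names below are illustrative. No HTML is ever produced;
// the browser is simply bounced to a static page.
$from    = isset($_POST['email'])   ? $_POST['email']   : '';
$message = isset($_POST['message']) ? $_POST['message'] : '';

// A crude address check; rejecting whitespace also blocks
// header injection via embedded newlines.
if (!preg_match('/^[^@\s]+@[^@\s]+\.[^@\s]+$/', $from)) {
    header('Location: /error.html');
    exit;
}

mail('webmaster@example.com', 'Contact form submission',
     $message, 'From: ' . $from);

header('Location: /thanks.html');
exit;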

Then, how to talk to PHP? The standard way to integrate PHP with nginx is FPM. Certainly, if this were a high or even moderate traffic site, that would be fine. But it means leaving FPM running permanently, which is a bit of a pain and a resource hog for one page that might get used once a week.
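
(For the record, the FPM wiring on the nginx side is only a few lines, something like the following - the socket path is illustrative. It's not the mechanics that are the problem, it's the always-on daemon.)

location = /contact.php {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/run/php-fpm.sock;
}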

So how about forwarding to apache? Integration using mod_php is an absolute doddle. OK, it's still running permanently, but you can dial down the process count and it's pretty lightweight. But we have a similar issue to the one we faced with PHP - the default build enables a lot of things we don't need. I normally build apache with:

--enable-mods-shared=most
--enable-ssl

but in this case you can reduce that to:

--enable-modules=few
--disable-ssl
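
So the whole apache build boils down to something like this (the prefix, again, is illustrative):

./configure \
    --prefix=/opt/httpd-min \
    --enable-modules=few \
    --disable-ssl
make
make install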

Now, there is the option of --enable-modules=none, but I couldn't actually get apache to start at all with that - some modules appear to be essential (mod_authz_host, mod_dir, mod_mime, and mod_log_config at least), and going below the "few" setting is entering unsupported territory.

You can restrict apache even further through configuration: enable only the PHP handler, return an error for any other page, and listen only on localhost. (I like the concept of the currently experimental mod_allowmethods, as we might only want POST in this case. Normally, disabling methods with current apache versions involves mod_rewrite, which is one of the more complex modules.)
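
A sketch of what that httpd.conf might contain - the port and paths are illustrative, and it assumes mod_php is loaded:

Listen 127.0.0.1:8080

DocumentRoot "/var/www/form"
<Directory "/var/www/form">
    Require all denied
    <Files "contact.php">
        Require all granted
        SetHandler application/x-httpd-php
    </Files>
</Directory>

On the nginx side, all that's then needed is a single location forwarding the form to that loopback port:

location = /contact.php {
    proxy_pass http://127.0.0.1:8080;
}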

In the end, we elected to solve the problem a different way, but it was still an instructive exercise.

The above would be suitable for one particular use case. For a general service, it would be completely useless. Most providers and distributions tend to build with the kitchen sink enabled, because you don't know what your users or customers might need at runtime. They might build everything as shared modules and package each module separately (although that ends up being a pain to manage), or rely on the user to explicitly enable and disable modules as necessary.

In Tribblix, I've tended to avoid breaking something like apache or PHP up into multiple packages. There's one exception: the PHP interface to postgresql is split out into a separate package. This is simply because it links against the postgresql shared library, so I ship that part separately to avoid forcing postgresql to be installed as a dependency.

Saturday, July 30, 2016

Building Tribblix packages

Software in Tribblix is delivered in packages, which come from one of three sources - an illumos build, a bootstrap distribution (OpenIndiana or OpenSXCE depending on hardware architecture), and native Tribblix packages.

The illumos packages are converted from the IPS repo created during a build of illumos-gate, using the repo2svr4 script in the tribblix-build repo. There's also a script, ips2svr4, in the same repo that's used to construct an SVR4 package from one installed on a system using IPS packaging, such as OpenIndiana. The OpenSXCE packages are shipped as-is.

(The use of another distro to provide components was expedient during early bootstrapping. Over time, the fraction of the OS provided by that other distribution has shrunk dramatically. At the present time, it's mostly X11.)

What of the other packages, those natively built on Tribblix?

Those are described in the build repo.

In the build repo, there are a number of top-level scripts. Key among these is dobuild, the primary software builder. Basically, it unpacks a source archive, then runs configure, make, and make install. It can apply patches, run scripts before and after the configure step, and knows how to handle most things that are driven by autoconf.

There are some other scripts of note. The genpkg and create_pkg scripts go from a build to a package. The pkg_tarball script is an easy way to do a straight conversion of an archive to a package. There are scripts to create the package catalogs.

For each package, there is a directory named after the package, containing the files used in the build.

At the very minimum, you need a pkginfo file (this is a fragment; the build process creates the rest of the actual pkginfo file). There's the possibility of using fixit and fixinstall scripts to fix up any errant behaviour from the make install step before actually creating packages. There are depend files listing package dependencies, and alias files listing user-friendly aliases for packages.
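
For illustration, such a pkginfo fragment for an imaginary package might hold little more than the identity fields - the values here are made up, not taken from the repo:

PKG=TRIBfoo
NAME=foo - an imaginary example package
VERSION=1.2.3
CATEGORY=application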

However, how do you know how a package was actually built? Even for packages created with the dobuild script, there are a lot of flags that could have been provided. And a lot of software doesn't fit into the configure style of build in any case.

What I actually did was have a big text file containing the commands I used to build each package. Occasionally with some very unprintable comments about some of the steps I had to take to get things to build. (So simply adding that file to the repo was never going to be a sensible way forward.)

So what I've done is split those notes up and created a file build.sh for each package, which contains the instructions used to create that package. It assumes that the THOME environment variable points to the parent of the build repo, and that there's a parallel tarballs directory containing the archives. (Many of the scripts, unfortunately, assume a certain value for that location, which is the location on my own machine. Yes, that should be fixed.)
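
To give a flavour, a made-up build.sh for that same imaginary package might look like this - the dobuild invocation is purely illustrative, not lifted from the repo:

# hypothetical recipe for an imaginary package foo-1.2.3;
# assumes ${THOME}/tarballs/foo-1.2.3.tar.gz exists
cd ${THOME}/build
./dobuild foo-1.2.3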

There are a number of caveats here.

The first is that some packages don't have a build.sh file. Yet. Some of these are my own existing packages, which were built outside Tribblix. Some go back to the very earliest days, and notes as to how they were built have been lost in the mists of time - these will be added whenever that package is next built.

The second is that the build recipe was valid at the time it was last used. If you were to run the recipe now, it might not work, due to changes in the underlying system - packages are not rebuilt unless they need to be, so the recipes can go all the way back to the very first release. Even if it does run, it might not generate the same output. (This is really down to autoconf, which gropes around the system looking for things it can use, so running a recipe again might pull in additional dependencies. Occasionally this causes problems and I need to explicitly enable or disable certain features. In some cases, you have to uninstall packages to make the build run in a sane manner.)

The third is that, while the build recipe looks like a shell script, and in many cases will actually function as one, it's really a recipe that you cut and paste into a terminal. At least, that's what I do. Sometimes that's necessary, because there was some manual hacky workaround that's only recorded in the build script as a comment.

This has been an outstanding TODO item for a while now, so I'm glad to have got it out of the way.