The Trouble with Tribbles...

Thursday, June 05, 2025

Is the Information Security industry succeeding?

Yesterday I had a trip up to London and had a wander round Infosecurity Europe. It was an interesting day, lots of things to see, many interesting conversations.

The show itself is huge. We've clearly come out of the doldrums of the last few years where shows had become tiny. And this was a dedicated infosec event, not just one part of a larger IT event.

Going by the size of the event, the number of exhibitors, the number of attendees, the size and extravagance of the displays, I think it's fair to say that Information Security as a business sector is doing very well. There's clearly a huge amount of vendor cash to splash around, and a confidence that customers have plenty of cash to buy the products on offer.

But is making money the correct definition of success here?

Most of the industry has a focus on detection and remediation. The pitch is that your systems are horrendously insecure and you need to give vendor X lots of money so they can detect a failure and help get your business back on its feet.

There was very little, in fact almost nothing, aimed at actually building more secure systems. (Even training and awareness is really nothing more than glossing over the cracks.) Maybe the closest is things aimed at the supply chain, but even that's basically detection of someone else's vulnerabilities.

So, in terms of actually building better systems, the Infosecurity industry is failing. It's not even addressing the problem.

(I would say that one definition of success for an information security company would be for it to do such a good job it's no longer needed. Clearly that's not going to be in many business plans.)

Furthermore, a string of high-profile hacks and breaches clearly indicates that the industry is failing to keep businesses secure.

Tuesday, May 13, 2025

Random thoughts on a Next Generation Tribblix

I have a little private project called xTribblix.

What's the x stand for? eXtreme? eXtraordinary? eXperimental? neXt generation?

Honestly, I don't know. It doesn't matter, it's just a little bucket I can drop things in to. But essentially, a set of experiments around changing Tribblix that allows me to do interesting things. The aim would be that, if successful, they get folded back into regular Tribblix; if unsuccessful then it's a learning experience.

It's just the logical continuation of the drive I've always had to make Tribblix faster, leaner, cleaner, fitter, easier, more secure. While retaining compatibility and functionality.

There are a few bits of illumos that really ought to be removed. Printing is a prime example - CUPS is a better, more modern implementation, maintained, familiar to everyone, what most Solaris people wanted anyway, and to be honest printing *isn't* an illumos core competency, so it's an ideal target to be outsourced. That's a clear example with a superior replacement already available; most subsystems might have someone crawl out of the woodwork who's inconvenienced by their removal.

So far, I've simply looked at things and decided to implement many of the simple ones for the next release(s) without the need for a separate experimental release. This isn't new, it's been going on for many releases already, and so far I've managed not to break anything that matters.

Some of the things done already (some will be in the next release):

grub deprecated
update DEFAULT_MAXPID to allow pid > 30000 (eg 99999 like smartos)
delete ftpusers, as there's no illumos ftpd
long usernames now silent rather than warning
removed uucp, and removed the nuucp user
zones based on core-tribblix need to worry less about what to remove
overlays based on core-tribblix with the actual images having a driver layer on top, so cloud/virtual images can slim down
replace /usr/xpg4/bin/more with a link to less
replace pax with the heirloom version
create /var/adm/loginlog by default
increase PASSLENGTH in /etc/default/passwd to 8
remove /etc/log and /var/adm/log, latter only used by volcopy
transformed away and eliminated most uses of isaexec
remove /usr/games
remove all legacy printing
remove libadt_jni
remove ilb
remove the old as on x86, everything should use gas
remove oawk and man page (and ref in awk.1)
remove newform, listusers, asa
no longer install doctools by default
drop the closed iconv bits, as they're useless
remove libfru* on x86
replace sendmail with the upstream
deprecate mailwrapper

A lot of this is simple package manipulation as I convert the IPS repo produced by an illumos build into SVR4 packages, mostly avoiding the need to patch the source or the build.

There's a lot more that could be done, some examples of what I'm thinking of include:

xpgN by default (replace regular binaries in /usr/bin)
sort out cpp (last remaining closed bin)
everything 64-bit
remove /etc links more aggressively
no ucb at all [except mebbe install...]
see if there are any expensive and unused kstats we could remove
firewall on by default
passwd blocklists by default
extendedFILE(7) enabled by default (although not necessary if everything is 64-bit!)
refactor packages so they are along sensible boundaries (with reducing the number of distinct packages being the goal)

Now all I need is some time to implement all this...

Thursday, April 24, 2025

On efficiency and resilience in IT

Years ago, I was in a meeting when a C-level executive proclaimed:

IT systems run at less than 10% utilization on average, so we're moving to the cloud to save money.

The logic behind this was that you could run systems in the cloud that were the size you needed, rather than the size you had on the floor.

Of course, this particular claim was specious. Did he know the average utilization of our systems, I asked. He did not. (It was at least 30%.)

Furthermore, measuring CPU utilization is just one aspect of a complex multidimensional space. Systems may have spare CPU cycles, but are hitting capacity limits on memory, memory bandwidth, network bandwidth, storage and storage bandwidth. It's rare to have a system so well balanced that it saturates all parameters equally.

Not only that, but the load on all systems fluctuate, even on very short timescales. There will always be troughs between the peaks. And, as we all know, busy systems tend to generate queues and congestion - or, as a technical term, higher utilization leads to increased latency.

Attempting to build systems that maximise efficiency implies minimizing waste. But if you always consider spare capacity as wasted capacity, then you will always get congested systems and slow response. (Just think about queueing at the tills in a supermarket where they've staffed them for average footfall.)

So guaranteeing performance and response time implies a certain level of overprovisioning.

Beyond that, resilient systems need to have sufficient capacity to not only handle normal fluctuations in usage, but abnormal usage due to failures and external events. And resilient design needs to have unused capacity to take up the slack when necessary.

In this case, a blinkered focus on efficiency not only leads to poor response, it also makes systems brittle and incapable of responding if a problem occurs.

A simple way to build resiliency is to have redundant systems - provision spare capacity that springs into action when needed. In such an active-passive configuration, the standby system might be idle. It doesn't have to be - you might use redundant systems for development/test/batch workloads (this presupposes you have a mechanism like Solaris zones to provide strong workload isolation).

Going to the cloud might solve the problem for a customer, but the cloud provider has exactly the same problem to solve, on a larger scale. They need to provision excess capacity to handle the variability in customer workloads. Which leads to the creation of interesting pricing models - such as reserved instances and the spot markets on AWS.

Tuesday, April 08, 2025

Understanding emission scopes, or failing to

I've been trying to get my head around all this Scope 1, Scope 2, Scope 3 emissions malarkey. Although it appears that lots of people smarter than me are struggling with it.

Having spent a while looking at how the Scopes are defined, I can understand how this can be difficult.

OK, Scope 1 is an organisation's direct emissions. Presumably an organisation knows what it's doing and how it's doing it, so getting the Scope 1 emissions from that ought to be fairly straightforward.

And Scope 2 is electricity, steam, heating and cooling purchased from someone else. I'm immediately suspicious here because this is a weirdly specific categorisation. But at least it should be easy to calculate - there's a conversion factor but at least you know the usage because it's on a bill you have to pay.

Then Scope 3 is - everything else. The fact that there are 15 official categories included ought to be a big red flag. That it's problematic is shown by the fact so many organisations have problems with it. (And by the growth of an industry to solve the problem for you.)

Personally, I wouldn't have defined it this way. If the idea is to evaluate emissions across the supply chain, then dumping almost all the emissions into the vaguest bucket is always going to be problematic.

So, why wasn't Scope 2 simply defined as the combined Scope 1 emissions of everyone providing services to the organisation. (That includes upstream and downstream, suppliers and employees, by the way.) That has 2 advantages I can see:

It's easy to calculate, because Scope 1 is pretty easy to calculate for all the providers of services (and they may well be doing it anyway), and an organisation ought to know who's providing services to it
It makes Scope 2 bigger (obviously) because there's more included, and therefore makes Scope 3 smaller, so uncertainties in Scope 3 matter less
Because you can better identify the contributors to your Scope 2 emissions, it's easier to know where to start making improvement efforts

I presume there's some reason it wasn't done this way, but I can't immediately see it.

Friday, April 04, 2025

What is this AI anyway?

AI is all the rage right now. It's everywhere, you can't avoid it.

But what is AI?

I'm not going to try and answer that here. What I will do, though, is state the question somewhat differently:

What is meant by "AI" in a given context?

And this matters, because the words we use are important.

The reality is that when you see AI mentioned it really could be almost anything. Some things AI might mean are:

Copilot
ChatGPT
Gemini
Some other specific off the shelf public LLM
Anything involving any off the shelf LLM
A custom domain-specific LLM
Machine learning
Pattern matching
Image recognition
Any old computer program
One of the AI companies

And there's always the possibility that someone has simply slapped AI on a product as a marketing term with no AI involved.

This persistent abuse of terminology is really unhelpful. Yesterday I went to a very interesting event for conversations about Hopes and Fears around AI.

Am I hopeful or fearful about AI? It depends which of the above definitions you mean.

There are certain uses of what might now be lumped in with AI that have proven to be very successful, but in many cases they're really machine learning, and have actually been around for a long time. I'm very positive about those (for example, helping in medical diagnoses).

On the other hand, if the AI is a stochastic parrot trained via large scale abuse of copyright while wreaking massive environmental damage, then I'm very negative about that.

So I think it's important to get away from sticking the AI label onto everything that might have some remote association with a computer program, and be far more careful in our terminology.

Tuesday, March 25, 2025

Tribblix on SPARC: sparse devices in an LDOM

I recently added a ddu like capability to Tribblix.

In that article I showed the devices in a bhyve instance. As might be expected there really aren't a lot of devices you need to handle.

What about SPARC, you might ask? Even if you don't, I'll ask for you.

Running Tribblix in a LDOM, this is what you see:

root@sparc-m32:/root# zap ddu
Device SUNW,kt-rng handled by n2rng in TRIBsys-kernel-platform [installed]
Device SUNW,ramdisk handled by ramdisk in TRIBsys-kernel [installed]
Device SUNW,sun4v-channel-devices handled by cnex in TRIBsys-ldoms [installed]
Device SUNW,sun4v-console handled by qcn in TRIBsys-kernel-platform [installed]
Device SUNW,sun4v-disk handled by vdc in TRIBsys-ldoms [installed]
Device SUNW,sun4v-domain-service handled by vlds in TRIBsys-ldoms [installed]
Device SUNW,sun4v-network handled by vnet in TRIBsys-ldoms [installed]
Device SUNW,sun4v-virtual-devices handled by vnex in TRIBsys-kernel-platform [installed]
Device SUNW,virtual-devices handled by vnex in TRIBsys-kernel-platform [installed]

It's hardly surprising, but that's a fairly minimal list.

It does make me wonder whether to produce a special SPARC Tribblix image precisely to run in an LDOM. After all, I already have slightly different variants on x86 designed for cloud in general, and one for EC2 specifically, that don't need the whole variety of device drivers that the generic image has to include.

Sunday, March 23, 2025

Expecting an AI boom?

I recently went down to the smoke, to Tech Show London.

There were 5 constituent shows, and I found what each sub-show was offering - and the size of each component - quite interesting.

There wasn't much going on in Devops Live, to be honest. Relatively few players had shown up, nothing terribly interesting.

There wasn't that much in Big Data & AI World either. I was expecting much more here, and what there was seemed to be on the periphery. More support services than actual product.

The Cloud & Cyber Security Expo was middling, not great, and there was an AI slant in evidence. Not proper AI, but a sprinkling of AI dust on things just to keep up with the Joneses.

Cloud and AI Infrastructure had a few bright spots. I saw actual hardware on the floor - I had seen disk shelves over in the Big Data section, but here I spotted a Tape Library (I used to use those a lot, haven't seen much in that area for a while) and a VDI blade. Talked to a few people, including the Zabbix and Tailscale stands.

But when it came to Data Centre World, that was buzzing. It was about half the overall floor area, so it was far and away the dominant section. Tremendous diversity too - concrete, generators, power cables, electrical switching, fiber cables, cable management, thermal management, lots of power and cooling. Lots and lots of serious physical infrastructure.

There was an obvious expectation on display that there's a massive market around high-density compute. I saw multiple vendors with custom rack designs - rear-door and liquid cooling in evidence. Some companies addressing the massive demand for water.

If these people are at a trade show, then the target market isn't the 3 or 4 hyperscalers. What's being anticipated in this frenzy is very much companies building out their own datacentre facilities, and that's very much an interesting trend.

There's a saying "During a gold rush, sell shovels". What I saw here was a whole army of shovel-sellers getting ready for the diggers to show up.