kernelslacker
08 December 2008 @ 02:02 pm
Some questions related to virtual memory never seem to go away.

Here are some of my favorites, which I just love getting asked at least once every couple of weeks (despite not being a VM hacker).

  • My computer is using N MB of swap space, and there's memory free! This sucks!

    To understand this, think about what happens to infrequently used pages of a process when there's memory pressure: the OS swaps them out and uses those pages of RAM for something more useful. They stay in swap until something touches them again, even after memory frees up.
  • My computer has N MB of swap space free, and my processes still get oom-killed. The kernel is buggy!

    When something got oom-killed, you ran out of _memory_, not swap space, and there was nothing else that could be swapped out to free up memory.
  • My computer had N MB of RAM free, and my processes still got oom-killed! The kernel sucks!

    There are different zones of memory. Having hundreds of MB of HIGHMEM available is irrelevant if something really needs memory it can use for DMA (which is a very tight resource).

And my all-time favorite..
  • I run without any swap space configured, and my app that allocates hundreds of gigs of memory keeps getting oom-killed. How do I stop the kernel doing this?

Answers on a postcard for that one ...
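The zones answer can be illustrated with a toy user-space model. This is not kernel code: the zone names mirror the kernel's DMA/NORMAL/HIGHMEM zones, but `struct toy_node`, `toy_alloc` and the page counts are hypothetical. The point it shows is that a request may only be satisfied from the zones it is allowed to use, falling back to lower zones but never to higher ones, so lots of free HIGHMEM does nothing for a DMA-constrained request.

```c
#include <stddef.h>

/* Toy model of memory zones, ordered from most to least constrained.
 * The names mirror the kernel's zones; everything else is illustrative. */
enum toy_zone { TOY_DMA = 0, TOY_NORMAL = 1, TOY_HIGHMEM = 2, TOY_NR_ZONES = 3 };

struct toy_node {
	size_t free_pages[TOY_NR_ZONES];
};

/* Try to allocate `pages` pages for a request whose highest usable zone is
 * `highest`. Try the highest permitted zone first, then fall back to lower
 * zones, never to higher ones.
 * Returns the zone used, or -1 if no permitted zone has enough free pages
 * (in the real kernel, reclaim and possibly the OOM killer come in here). */
int toy_alloc(struct toy_node *node, enum toy_zone highest, size_t pages)
{
	for (int z = (int)highest; z >= 0; z--) {
		if (node->free_pages[z] >= pages) {
			node->free_pages[z] -= pages;
			return z;
		}
	}
	return -1;
}
```

With 4 free DMA pages and 100000 free HIGHMEM pages, a 16-page request limited to DMA still fails, while the same request allowed to use HIGHMEM succeeds: the situation behind the third complaint.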
 
 
kernelslacker
"Look for example at the fact that Ubuntu has usually better hardware support, if we all were on the same kernel the others could take the drivers we put in there and have hardware support that is just as good as Ubuntu."

Does no-one else see the hypocrisy in this statement? Here's how it reads to me... "It would be great if everyone just shipped the Ubuntu kernel and debugged the random crap we merge that we don't have the resources to do ourselves".

If only there were some kind of process of getting drivers merged upstream to kernel.org. Perhaps then we COULD be on the same kernel. Oh wait, there is a process. Ubuntu just chooses to ignore it.

This idea makes absolutely no sense whatsoever when a distro is patching the kernel to hell.

Having distros ship the same version of major components is utterly pointless unless everyone is on the same page, and stops making moronic decisions like "let's replace a major piece of security functionality with something else because the one upstream is complicated".
 
 
kernelslacker
Something that's great about conferences is that they often cause discussion which leads to thinking out loud, and suddenly an idea comes up that seems so obvious it smacks of "why hasn't anyone done this yet?".

During my talk at fudcon (yes, I said I wasn't going to do one, but it ended up being just a Q&A instead of a 'state of the union' kind of talk), there was quite a bit of talk about the forthcoming kernel modesetting feature, and how awesome it's going to be to be able to capture oopses when we've locked up in X. This led to "I don't know how many times I've seen a kernel panic scroll off the top of the screen, and I didn't have a serial console hooked up".

I boot all my boxes with appropriate grub lines to make them spew out the serial port, but I don't have enough serial cables to keep every box permanently rigged up to a console (I know there are multiplexer-type things; I don't have, nor want, one of those either). A lot of the time, the box I run minicom on isn't necessarily powered up either. So a lot of the time, that serial output goes nowhere.

After we panic, we just sit there in a for (;;); loop forever.

The 'duhhh' moment that came up on the weekend was..

Why not make that infinite loop do something like:

for (;;) {
	wait for a minute
	send the dmesg ring buffer out again
}

As well as this giving you some time to find/hook up a serial cable/start minicom, it would also have the advantage that it would print out the whole dmesg buffer, which may contain additional useful clues as well as the oops.
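A toy user-space model of that idea, assuming a small ring buffer in place of the real dmesg buffer. `toy_log`, `toy_log_append`, `toy_log_read` and `toy_panic_loop` are hypothetical names, not kernel functions, and the round limit exists only so the sketch terminates; the proposal itself loops forever.

```c
#include <stddef.h>
#include <stdio.h>

#define TOY_LOG_SIZE 64 /* hypothetical; the real dmesg buffer is far larger */

/* Toy ring buffer standing in for the dmesg buffer: once full, new text
 * overwrites the oldest text. */
struct toy_log {
	char buf[TOY_LOG_SIZE];
	size_t start; /* index of the oldest byte */
	size_t len;   /* number of valid bytes */
};

void toy_log_append(struct toy_log *log, const char *msg)
{
	for (; *msg; msg++) {
		if (log->len < TOY_LOG_SIZE) {
			log->buf[(log->start + log->len) % TOY_LOG_SIZE] = *msg;
			log->len++;
		} else {
			log->buf[log->start] = *msg;  /* overwrite the oldest byte */
			log->start = (log->start + 1) % TOY_LOG_SIZE;
		}
	}
}

/* Copy the log, oldest byte first, into dst (which must hold TOY_LOG_SIZE
 * bytes). Returns the number of bytes copied. */
size_t toy_log_read(const struct toy_log *log, char *dst)
{
	for (size_t i = 0; i < log->len; i++)
		dst[i] = log->buf[(log->start + i) % TOY_LOG_SIZE];
	return log->len;
}

/* After a "panic", repeatedly wait and then re-send the whole log.
 * Returns the total number of bytes written to `out`. */
size_t toy_panic_loop(const struct toy_log *log, int rounds,
		      void (*wait)(void), FILE *out)
{
	char copy[TOY_LOG_SIZE];
	size_t total = 0;

	for (int r = 0; r < rounds; r++) {
		if (wait)
			wait();               /* e.g. sleep for a minute */
		size_t n = toy_log_read(log, copy);
		total += fwrite(copy, 1, n, out);
		fflush(out);
	}
	return total;
}
```

Inside the kernel, the waiting and the output would have to use whatever still works after a panic; the sketch only shows the buffer side, re-sending everything still in the buffer, oldest text first, on every round.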
 
 
kernelslacker
16 June 2008 @ 04:58 pm
Like many other Fedora contributors, I'll be at Fudcon this coming weekend. (And the Red Hat summit on Wednesday).
Trying to decide whether or not to do the usual 'state of the union' kernel talk that I do at every fudcon. Whilst there are thousands and thousands of changes each release, and usually two upstream kernel point releases between each Fedora release, the number of user-visible changes for people to get excited about is dramatically less. For the most part, the majority of the changes that happen are, well.. dull. We have a lot of churn just cleaning up crufty old code, hundreds of trivial bug fixes (and many not-so-trivial ones), but feature wise.. I can't recall anything recently that users would be excited about since the addition of things like kvm & tickless support, which was several releases back now.

Cribbing from slide 21 of Greg's recent talk, which shows a brief list of what went into 2.6.26 so far, nothing really jumps out as something that users are clamoring for. Kgdb is something that will likely prove useful for us to get more debug info out of people seeing problems if they're a) set up for using kgdb, and b) have a bug which allows kgdb type poking (ie, no use in lockup situations). The addition of PAT will surely make the Xorg developers happy, and maybe some other folks too, but everything else is just 'more of the same'. More new drivers. More fixes. Incremental improvements.

Perhaps it's not coincidental that 2.6.25 feels like a pretty solid release. Shipping F9 with that kernel was definitely a good idea, and early indications seem to show that .26 will be a decent update for it in a few weeks' time. 2.6.25 wasn't flawless, but a lot of the really silly things seem to be getting fixed up quickly by the -stable process. Some of the more obscure bugs (like some awkwardness with timers on some chipsets, notably ATI ones) will hopefully be fixed when we rebase to .26.

With F10 still being many moons away, moving forward to whatever .27 brings looks likely.

We'll likely be adding in some bits and pieces that don't make the merge window for .27 too. The usual wireless breakage^Wupdates du jour, probably the DRM modesetting stuff again, nouveau update etc.

That's about all I'd have to say in a slot at Fudcon, so now that I've said it, perhaps I'll pass on speaking this time around, so people won't have to choose between the dull kernel talk or something more interesting. (Additionally, I'll get to sit in on one extra talk).
 
 
Current Music: Jesu - Sundown
 
 
kernelslacker
21 December 2007 @ 05:03 pm
(This post reads a little more 'shout out to my homies' than I originally anticipated, but it is what it is, and you people kick ass)

2007 has been a good year for the Fedora kernel.

At the end of last year, we doubled the size of the Fedora kernel team when we hired Chuck Ebbert. Whilst Fedora had a number of contributors to the kernel before this, I was the only employee at that time who was working exclusively on the Fedora kernel full-time.

I expected the first few months of the year to go slowly, as Chuck got to grips with processes etc. Boy was I wrong. Within no time at all, Chuck was pushing out updates for our older releases faster than I could keep track. It soon got to a stage where he was single-handedly taking responsibility for the updates, whilst I concentrated on beating rawhide into shape for what would become Fedora 7. Not one to slack off, Chuck also lent a hand there too. It became apparent real quickly we had got the right guy for the job.

Fedora 7 was a pretty solid release, despite being based on 2.6.21, which wasn't the best kernel we've had. A few months of bug fixing got it into shape in short order. F7 was also the first distro release to ship a tickless kernel. The work put in by Thomas Gleixner to nail down a huge number of timer-related bugs, both during development and after release, cannot be overstated. Things would have looked a lot different without his involvement.

F7 was the first release we did where I actually felt 'comfortable' about saying we had wireless support. It still wasn't fantastic, but it was a much-improved experience over earlier releases. Between John Linville pushing bits into Fedora as fast as they were going into the wireless-dev tree, and the return of Dan Williams to the desktop team after spending a while working on OLPC, the wireless experience in Fedora 8 totally rocks. Maybe I've just been lucky in my wireless chipsets, but I've yet to have F8 fail me. (And the bug reports seem a lot better this time around too).

Aside from technical changes, there have also been a number of process changes this year. First, the creation of the Fedora kernel list, which has grown to several hundred members, whilst still maintaining a healthy signal-to-noise ratio. (Traffic is still light, but sees occasional bursts of activity.) Shortly afterwards came the creation of #fedora-kernel, which has been really useful for various coordination, to the point that I rarely look at internal Red Hat irc any more.

A few months ago, the Fedora kernel team grew some more, when we hired our third full-time Fedora hacker, Kyle McMartin. Things sure feel different to a year ago, when I feared going on vacation and leaving things hanging until I returned. Kyle & Chuck pick up the ample slack that I create. Whilst I've been schmoozing at conferences, meetings and other blah, these guys have been rocking out pushing updates, fixing bugs, and doing, well, doing what they do. Rock on dudes.

We also picked up Dave Airlie this year, who has been a valuable addition to Red Hat. Although he's on a different team, he's been keeping Nouveau and DRM in general in good shape in Fedora this year. I also handed off AGP maintenance this year to him, freeing up even more time for slacking off for me.

Some effort has been spent this year trying to document various tribal knowledge at the Fedora project wiki to hopefully make things easier for future contributors, and also to try and make it easier for people to help themselves provide better bug reports. The 'bug triage' page there has turned into something really useful that I think will only continue to grow more awesome over time. It'll no doubt be handy when we eventually get around to having a 'fedora kernel bug day', which we've been talking about doing for months now, but never quite getting around to. I think we'll have to make this happen soon.

So in summary, a busy year for the Fedora kernel team. Next year, more of the same, but louder, faster, better, more.

[update: I'm sure I've forgotten the valuable contributions of other people over the last year too. Don't feel slighted if I missed you. You also rock, I'm just an amnesiac].
 
 
kernelslacker
26 October 2007 @ 01:31 pm
I really dislike this stage of Fedora development, just before we're about to ship. We've frozen on 2.6.23 for F8, which should be out in a few weeks. In the meantime, the blocker list continues to dwindle, and upstream storms ahead towards 2.6.24. With it at -rc1 already, by the time we release, upstream will be well on its way towards a final 2.6.24, so a big rebase update will probably occur within weeks of F8's release.

There's no shortage of bugs that need fixing, but not much in the critical sense of "can't install / update".

I've been rebasing devel/ to 2.6.24rc, but haven't done any builds yet, so that people focus on beating F8 into shape rather than running off after the latest shiny thing.
 
 
kernelslacker
17 October 2007 @ 02:33 pm

$ git diff v2.6.22..v2.6.23 |diffstat
..
7203 files changed, 406268 insertions(+), 339071 deletions(-)

$ git diff v2.6.23.. |diffstat
8193 files changed, 639323 insertions(+), 372811 deletions(-)


And the merge window isn't even closed yet. That's a lot of churn.
The arch/i386 & arch/x86_64 to arch/x86 merge caused a lot of code motion here, so a lot of files just got moved around with no actual changes. There have already been quite a few cleanups on top of that work, however, deleting duplication and places where we needlessly diverged between the two architectures.

Probably a lot more of that kind of clean up to come too.
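The net growth is easy to work out from those summary lines. A small C sketch, assuming diffstat's plural summary format exactly as printed above (singular forms such as "1 file changed" would not match); `diffstat_net` is a hypothetical helper name.

```c
#include <stdio.h>

/* Parse a diffstat summary line such as
 * "7203 files changed, 406268 insertions(+), 339071 deletions(-)"
 * and compute the net change in line count.
 * Returns 0 on success, -1 if the line does not have the expected shape. */
int diffstat_net(const char *summary, long *files, long *net)
{
	long ins, del;

	if (sscanf(summary, "%ld files changed, %ld insertions(+), %ld deletions(-)",
		   files, &ins, &del) != 3)
		return -1;
	*net = ins - del;
	return 0;
}
```

For the two summaries above this gives a net growth of 67197 lines for v2.6.22..v2.6.23 and 266512 lines from v2.6.23 to the tree at the time of writing. When git diff is not detecting renames, a moved file counts once as deleted and once as inserted, so pure code motion inflates both raw counts while largely cancelling out of the net figure.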
 
 
kernelslacker
29 August 2007 @ 07:18 pm
Spent some time today writing documentation on diagnosing common bugs, exposing various tricks that have come in handy whilst fixing Fedora kernel bugs.

It's been a while since I last did something like this. It's made a refreshing change from just plodding through bugzilla.
Now hopefully others can learn from this, and also plod along at home.
 
 
kernelslacker
02 July 2007 @ 11:11 pm
Talking with quite a few people at OLS last week, it seems there are still quite a few misconceptions about just how patched various kernels were throughout the history of Red Hat. One particularly egregious statement I heard was "Early Red Hat kernels had ~2000 patches".

Here's some hard facts on exactly how many patches were in each release.
(I don't have access to kernels earlier than these, but kernels before RHL6 were probably even less patched).
Caveats:

  • All stats are based on the last state that CVS was in when I ran "grep ^%patch *.spec | wc -l", so they may not match the kernel that ended up on the ISO for that release.
  • Some releases contain -ac patches, which are roll-ups, sometimes of dozens of patches.
    In later releases, this practice has been deprecated (and also Alan rarely does -ac releases any more).
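The grep pipeline in the first caveat simply counts lines that begin with %patch. A C sketch of the same count for a single spec file, for anyone wanting to reproduce the numbers; `count_patch_lines` is a hypothetical helper, and like the grep it counts %patch directives, not the individual patches rolled up inside something like an -ac patch.

```c
#include <stdio.h>
#include <string.h>

/* Count lines that begin with "%patch", mirroring what
 * `grep ^%patch file.spec | wc -l` reports for one spec file. */
long count_patch_lines(FILE *spec)
{
	char line[4096];
	long count = 0;
	int at_line_start = 1;

	while (fgets(line, sizeof line, spec)) {
		if (at_line_start && strncmp(line, "%patch", 6) == 0)
			count++;
		/* A line longer than the buffer arrives in several chunks; only
		 * the first chunk starts at the beginning of a line. */
		at_line_start = strchr(line, '\n') != NULL;
	}
	return count;
}
```

Indented %patch lines are not counted, matching the anchored ^ in the grep.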


Release | Number of patches | Version
Red Hat Linux 7.0 | 70 | 2.2.24 !
Red Hat Linux 7.1 | 118 | 2.4.20
Red Hat Linux 7.2 | 118 | 2.4.20
Red Hat Linux 7.3 | 120 | 2.4.20
Red Hat Linux 8 | 117 | 2.4.20
Red Hat Linux 9 | 143 | 2.4.20


Wow, that takes me back. As the version number shows, these releases were all more or less the same kernel.
The big difference was RHL9 had nptl, and a few other 2.6 backports. These were pretty invasive patches.


Release | Number of patches | Version
Fedora Core 1 | 105 | 2.4.22
Fedora Core 2 | 78 | 2.6.10
Fedora Core 3 | 108 | 2.6.12
Fedora Core 4 | 85 | 2.6.17
Fedora Core 5 | 145 | 2.6.20
Fedora Core 6 | 191 | 2.6.20
Fedora 7 | 164 | 2.6.21
Fedora 8-devel (rawhide) | 63 | 2.6.22-rc7


A few interesting points on the Fedora kernels.
  • FC1 never rebased off of its original kernel, so there were quite a few > 2.4.22 patches backported to that.
    It still had nptl etc from RHL9, so rebasing was painful back then.
  • FC5 & FC6 have quite high patchcounts, because currently they've stopped following mainline.
    After seeing the fallout from 2.6.21 during F7's development, we decided to skip a release, and hope that .22 works out to be better. When we rebase, approximately half those patches go away.
  • F7 is carrying quite a few patches that really should get pushed through -stable. Some of them may even be pending inclusion. Lots of them are backports from 2.6.22-rc
  • rawhide is still quite high. I need to do a patch bombing soon to try and get a bunch of the bits we're carrying pushed upstream. Should be able to get this back under 50 when the merge window opens up for 2.6.23.


Now the fun ones..
Release | Number of patches | Version
AS 2.1 | 487 | 2.4.9
AS 2.1 (ia64) | 272 | 2.4.18
RHEL3 | 395 | 2.4.21
RHEL4 | 1078 | 2.6.9
RHEL5 | 881 | 2.6.18


So over time, RHEL releases accumulate lots of patches. The pain-point of not rebasing means they tend to build up over time. RHEL4 being the 'winner', picking up just over a thousand patches in 2.5 years, though RHEL5 is catching up fast, nearly reaching that number already, after being released earlier this year.
This acceleration of patches may be related to the fact that upstream is moving a lot faster than it was around the time of earlier releases. We're finding and fixing a lot more stuff, a lot faster, and the upstream process seems to show no signs of slowing down (if anything, people are trying to make it go faster).

So, although the claim that "Red Hat had thousands of patches" isn't true today, it might become true once we freeze a future RHEL release.
 
 
kernelslacker
19 June 2007 @ 10:17 pm
GregKH on the future of enterprise kernels suggests one possible future scenario..

On every major update, the kernel is updated to the latest kernel.org release, much
like the consumer products are (Fedora, openSUSE, Ubuntu, Mandriva, etc.)
This will ensure that any upstream update for drivers and new features will be automatically included.

* Pro: All of the latest kernel drivers and features will be automatically supported and
included by the distro, enabling the Partners to focus on upstream kernel.org development
and not worry about backporting things to older kernel versions. All bugfixes and security
updates that the vendor has not included in their minor updates are also pulled in at this
time (and there are a lot of them.)

* Con: Partners whose code is not present in kernel.org releases for whatever reason
(do not want it, incompatible licenses, etc.) will have to do a bit more work in tracking
the new releases, although this should only be slightly more than the current amount of
development and testing that they currently do.


I don't think this is realistic.

The big problem with this scenario is that it ignores the fact that kernel.org kernels are on the whole significantly less stable these days than they used to be. With the unified development/stable model, we introduce a lot of half-baked untested code into the trees, and this typically doesn't get stabilised until after a distro rebases to that kernel for their next release, and uncovers all the nasty problems with it whilst it's in beta.
As well as pulling 'all bugfixes and security updates', a rebase pulls in all sorts of unknown new problems.

It isn't just new code that is problematic either. With each upstream point revision, we fix x regressions and introduce y new ones. This isn't going to make enterprise customers paying lots of $ each year very happy.
Greg points out earlier in his write-up that some customers decide to stay on earlier revisions because they fear regressions in production systems that much. I fear such a move would only make this more common.

Next, some upstream kernel releases are real stinkers. Sometimes the timing is right, the planets line up, and you manage to base a product on a solid base. 2.6.18 was pretty well-rounded, for example, and I'm glad we moved up to it for RHEL5 rather than our original plan of 2.6.17 plus an extra 3 months of in-house stabilisation. On the flip side, basing Fedora 7 on 2.6.21 may not have been such a great idea, judging from the fallout in bugzilla.
In an ideal world, the upstream -stable process would be working to such an extent that any 'must haves' that fix up most of the damage would get backported from 2.6.22-rc, but due to (perfectly valid) acceptance criteria in -stable like 'must not be too big', not everything that is needed makes it. End result -- I'm holding out for 2.6.22 to 'fix the world'.

But of course, when we rebase, there will be a host of new problems to deal with.

When we stop rebasing the kernel for a new RHEL release, we typically spend 3 months doing nothing but shaking out the bugs, and a lot of these bugs aren't fixed in the next upstream version at that time either, so it's more than just backporting we're talking about here. Sustained bug-fixing without introducing new problems along the way isn't easy, and it sure as hell isn't fun, which probably explains why so few people actually enjoy this work. The upstream kernel model as-is isn't designed for this kind of activity. (See, for example, the negative reaction to the recurring suggestion of a 'bugfix only' release).

Finally, the 'ABI' issue. Rebasing a kernel decimates any semblance of an ABI.
Functions disappear, change prototypes, change semantics etc. Someone :-) wrote a nice 'stable API nonsense' document for the kernel Documentation dir explaining exactly why. "Get your code upstream" isn't the universal answer. As much as I agree with Greg's stance on binary kernel modules, I don't see nvidia, vmware & co opening their code overnight. Fedora users know only too well how often these modules don't build/work when we rebase. This kind of breakage in an update isn't acceptable for the people paying for those expensive support contracts. Our existing ABI promises aren't perfect (sometimes screw-ups still occur, and by god do we hear about it when they do), but the tools here are getting better, and for the most part, a binary module built against RHEL5-GA will work on RHEL5-U1 and beyond.
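To make "functions disappear, change prototypes" concrete, here is a toy check in the spirit of the per-symbol checksums that CONFIG_MODVERSIONS attaches to exported symbols. It is a sketch only: `struct toy_export`, `toy_abi_breaks`, the symbol names and the checksum values are hypothetical, and this is not how any real ABI-checking tool is implemented.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of an exported kernel symbol: its name plus
 * a checksum standing in for a hash of its prototype. */
struct toy_export {
	const char *name;
	unsigned long crc;
};

/* Count symbols from `old_syms` (what a binary module was built against)
 * that are missing from `new_syms` or whose checksum changed. */
size_t toy_abi_breaks(const struct toy_export *old_syms, size_t n_old,
		      const struct toy_export *new_syms, size_t n_new)
{
	size_t breaks = 0;

	for (size_t i = 0; i < n_old; i++) {
		const struct toy_export *match = NULL;

		for (size_t j = 0; j < n_new; j++) {
			if (strcmp(old_syms[i].name, new_syms[j].name) == 0) {
				match = &new_syms[j];
				break;
			}
		}
		if (!match || match->crc != old_syms[i].crc)
			breaks++;  /* function disappeared or changed prototype */
	}
	return breaks;
}
```

After a rebase, a non-zero count is the "module doesn't load" experience Fedora users know; an enterprise ABI promise amounts to keeping this count at zero for the symbols covered.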

So, what is the ideal answer to Greg's dilemma? I'm not sure, but I don't think the status quo is going to change much any time soon. The current situation isn't perfect, but as Greg summarised, it's, for the most part, well understood, even if it does involve more work all round for everyone.
 
 
kernelslacker
09 June 2007 @ 09:26 pm
Sigh, where to begin.
I had hoped that yesterday's build was going to be 'the one' to go to updates-testing.

A bunch of new patches arrived upstream, in the form of another -stable rc.
I set off the build, and shortly afterwards, linville noticed that one of the patches we were carrying was now obsolete.
I cancelled the build in progress, and watched his build crawl along for several hours.

Then, about 3 hours in, one of the builders ran out of disk space, aborting the whole build.
I waited a while for it to be cleaned up, and resubmitted the build.
At this point, I was feeling pretty demotivated to do much else, so I went out for a beer.

This morning, I got up to check that the build had completed, expecting it to be ready for testing, and found a message on #fedora-devel that the 3224 build doesn't boot.

So now I get to play 'find the bad patch' game again. Hopefully, I can get this narrowed down and fixed up quickly.
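'Find the bad patch' is essentially a bisection over the patch series. A sketch, assuming the kernel boots with none of the patches applied, fails with all of them, and stays broken once the bad patch is in. `find_bad_patch`, `boots_fn` and `fake_boots` are hypothetical names; each real call to the predicate would be a full build-and-boot cycle.

```c
#include <stdbool.h>
#include <stddef.h>

/* Caller-supplied predicate: build with the first `n` patches of the series
 * applied and report whether the result boots. */
typedef bool (*boots_fn)(size_t n, void *ctx);

/* Binary search for the first patch that breaks booting.
 * Returns the 1-based index of the offending patch. */
size_t find_bad_patch(size_t total, boots_fn boots, void *ctx)
{
	size_t good = 0, bad = total; /* prefix `good` boots, prefix `bad` does not */

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;
		if (boots(mid, ctx))
			good = mid;
		else
			bad = mid;
	}
	return bad;
}

/* Example predicate for trying the search out: pretends the patch whose
 * 1-based index is *(size_t *)ctx is the one that breaks booting. */
bool fake_boots(size_t n, void *ctx)
{
	return n < *(size_t *)ctx;
}
```

Because the interval halves on every step, a series of 164 patches needs at most 8 build-and-boot cycles instead of up to 164.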

I feel like I've lost my kernel building mojo.
 
 
kernelslacker
07 June 2007 @ 03:18 pm
The keen observers of rawhide will have noticed that I started building F8 kernels already.
So far, nothing really amazing, just a rebase to 2.6.22rc4, and some specfile jiggery-pokery to move away from using rpm's %patch directive to a hand crafted macro that allows patch application with -F1
(%patch won't let you pass arbitrary flags to patch(1), and the code in rpm to handle this made my eyes bleed).
The benefits of patches with no fuzz should be obvious: the diffs end up applying in the right places. Fuzz has caused problems in the past, to the extent that we once had a bug that made the kernel not boot.
That was real 'fun' to track down. Fun that I'd rather not repeat.
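The -F option sets patch(1)'s maximum fuzz factor (GNU patch defaults to 2): with fuzz f, up to f lines of leading and trailing context may be ignored when looking for a place to apply a hunk. The toy below is a simplified model, not patch's real algorithm (which also prefers the offset recorded in the hunk header); `count_matches` is a hypothetical name. It only counts how many positions in a file a hunk could match, to show how extra fuzz multiplies the candidate places.

```c
#include <stddef.h>
#include <string.h>

/* Count the positions in `file` where the hunk's expected lines match once
 * `fuzz` lines are dropped from each end of the hunk. */
size_t count_matches(const char **file, size_t n_file,
		     const char **hunk, size_t n_hunk, size_t fuzz)
{
	if (2 * fuzz >= n_hunk)
		return 0;                     /* nothing left to match against */

	const char **core = hunk + fuzz;      /* context with edges dropped */
	size_t n_core = n_hunk - 2 * fuzz;
	size_t matches = 0;

	for (size_t pos = 0; pos + n_core <= n_file; pos++) {
		size_t i = 0;
		while (i < n_core && strcmp(file[pos + i], core[i]) == 0)
			i++;
		if (i == n_core)
			matches++;
	}
	return matches;
}
```

For a file containing "unlock(); return 0;" twice, a four-line hunk whose outer context lines are unique matches once with fuzz 0 but twice with fuzz 1, which is exactly how a fuzzy hunk can land in the wrong place.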

Other than this, pretty boring stuff. About half the patches we carried in F7 are now upstream, including the big firewire rewrite. Hopefully by F8 we can stop carrying the wireless stuff too.

With the schedule for F8 being just 5 months, it's uncertain just how far forward we'll go. With each upstream kernel release taking 2-3 months, and .22 coming 'soon', it sounds like we'll be lucky to get .23 into F8 and beaten into shape by release, especially with conference season just beginning to ramp up and various developers disappearing.

Some lessons were learned from the F7 release process though.
We did a last minute rebase to the latest upstream -stable, which brought in the "Dell laptop won't boot without maxcpus=1" bug at the last minute, when it was too late to fix. Next time around, we're not going to do this.
Given we're nearly always doing a kernel update within a week after release, the only patches going into the kernel during the final week (or maybe two weeks) will be individual backports of patches that have been upstream for a few days with no regressions, and that meet one of the critical criteria.
Where critical means:

  • Fixes a "machine won't boot" bug.
  • Fixes an "ethernet doesn't work" bug (thus preventing updates being downloaded)
  • Fixes a data corruption bug.
  • Fixes a _critical_ security bug.
    (Lower impact things like information leakage can wait until update 1 on the day of release).


There are possibly some other criteria I'm missing, this will get
fleshed out some more on fedora-kernel-list.
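The late-freeze rule above can be written down as a small predicate. This is a hypothetical encoding for illustration only: the enum, the struct, `accept_in_final_weeks` and the `min_days` parameter (standing in for "a few days", which the rule leaves unquantified) are all made up, and, as noted, the criteria list itself isn't final.

```c
#include <stdbool.h>

/* Hypothetical categories for a candidate late patch. */
enum fix_kind {
	FIX_NO_BOOT,           /* "machine won't boot" */
	FIX_NO_ETHERNET,       /* ethernet broken, so updates can't be fetched */
	FIX_DATA_CORRUPTION,
	FIX_CRITICAL_SECURITY,
	FIX_MINOR_SECURITY,    /* e.g. information leakage: waits for update 1 */
	FIX_OTHER
};

struct late_patch {
	enum fix_kind kind;
	bool is_backport_of_upstream; /* individual backport of an upstream fix */
	int days_upstream;            /* how long it has been upstream */
	bool regressions_reported;
};

/* Accept only upstream backports that have settled for `min_days` without
 * regressions and that fall into one of the critical categories. */
bool accept_in_final_weeks(const struct late_patch *p, int min_days)
{
	if (!p->is_backport_of_upstream || p->regressions_reported ||
	    p->days_upstream < min_days)
		return false;

	switch (p->kind) {
	case FIX_NO_BOOT:
	case FIX_NO_ETHERNET:
	case FIX_DATA_CORRUPTION:
	case FIX_CRITICAL_SECURITY:
		return true;
	default:
		return false;
	}
}
```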
 
 
Current Music: Hybrid - Until Tomorrow (feat. John Graham)
 
 
kernelslacker
07 June 2007 @ 02:41 pm
People keep asking me when the Fedora 7 kernel update is going out.
The answer is 'soon' (maybe as soon as tomorrow).

There's still a slew of things that need fixing. For the first round of updates, the focus is almost entirely on fixing the critical bugs like "It crashes during boot". Looking at the first round of bugs that came in during the first week after F7's release, it's pretty horrific. The vast majority seem to be from libata (something of a mix of SATA and PATA bugs).
This was in part expected, as the switchover from ye-olde-IDE to libata pata was something of a gutsy decision to make. It wasn't expected that it'd be this bad.

So, for those asking "will this update fix my suspend/resume" "will my ipw3945 work better?" etc, the answer is "not yet".

The overall quality of 2.6.21 is pretty horrific. It saw the introduction of a lot of new code fundamental to the operation of the kernel (the tickless stuff, for example), massive updates to areas such as ACPI, and just to mix things up, we switched from a known-crap-but-tried-and-tested IDE system to a-bleeding-edge-but-hopefully-with-signs-of-promise libata based system. Lots of changes == lots of fallout the first time it goes into a production OS.

It's likely that we'll skip a 2.6.21 update for FC6, and do an update straight to 2.6.22 for both FC6 and F7 when that gets released.

So, doom and gloom, and business as usual on the kernel front.
On the plus side, I get to go see Skinny Puppy tonight. \o/
 
 
Current Music: Hybrid - I Choose Noise
 
 
kernelslacker
20 March 2007 @ 05:38 pm
Now that the Fedora kernel team is more than one person, it's taken a while to train people to add Chuck to the Cc: of fedora kernel related mails. To not have to go through this again, today I set up Fedora-kernel-list.

Hopefully it'll be useful for other purposes too, and maybe even attract some new lurk^Wcontributors.