Discussion:
Remedies for suboptimal mmap performance on zfs
Iwan Aucamp
2012-05-28 19:06:46 UTC
I'm getting sub-optimal performance with an mmap-based database
(mongodb) running on ZFS on Solaris 10u9.

System is Sun-Fire X4270-M2 with 2xX5680 and 72GB (6 * 8GB + 6 * 4GB)
ram (installed so it runs at 1333MHz) and 2 * 300GB 15K RPM disks

- a few mongodb instances are running with moderate I/O and a total
RSS of 50 GB
- a service which logs quite heavily (5 GB every 20 mins) is also
running (max 2 GB RAM use) - log files are compressed to bzip2 after
some time.

Database performance is quite horrid though - it seems that ZFS does not
know how to manage the allocation between the page cache and the ARC, and
the ARC seems to win most of the time.

I'm thinking of doing the following:
- relocating the mmapped (mongo) data to a ZFS filesystem that caches
only metadata (sketched below)
- reducing the ZFS ARC to 16 GB

Are there any other recommendations, and is the above likely to improve
performance?
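
Roughly what I have in mind (pool/dataset names are placeholders, and I
haven't tried this yet):

  # dedicated dataset for the mongodb data files, caching metadata only
  zfs create -o primarycache=metadata tank/mongodb

  # /etc/system entry to cap the ARC at 16 GB (value in bytes, needs a reboot)
  set zfs:zfs_arc_max = 17179869184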

--
Iwan Aucamp
Andrew Gabriel
2012-05-28 19:39:21 UTC
Post by Iwan Aucamp
I'm getting sub-optimal performance with an mmap-based database
(mongodb) running on ZFS on Solaris 10u9.
System is Sun-Fire X4270-M2 with 2xX5680 and 72GB (6 * 8GB + 6 * 4GB)
ram (installed so it runs at 1333MHz) and 2 * 300GB 15K RPM disks
- a few mongodb instances are running with moderate I/O and a total
RSS of 50 GB
- a service which logs quite heavily (5 GB every 20 mins) is also
running (max 2 GB RAM use) - log files are compressed to bzip2 after
some time.
Database performance is quite horrid though - it seems that ZFS does
not know how to manage the allocation between the page cache and the
ARC, and the ARC seems to win most of the time.
- relocating the mmapped (mongo) data to a ZFS filesystem that caches
only metadata
- reducing the ZFS ARC to 16 GB
Are there any other recommendations, and is the above likely to improve
performance?
1. Upgrade to S10 Update 10 - this has various performance improvements,
in particular related to database type loads (but I don't know anything
about mongodb).

2. Reduce the ARC size so RSS + ARC + other memory users < RAM size.
I assume the RSS includes whatever caching the database does. In
theory, a database should be able to work out what's worth caching
better than any filesystem can guess from underneath it, so you want to
configure more memory in the DB's cache than in the ARC. (The default
ARC tuning is unsuitable for a database server.)
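
To see what you are working with before picking a number, the current ARC
size and the big RSS consumers are easy to check (standard Solaris commands;
interpret against your own workload):

  kstat -p zfs:0:arcstats:size   # current ARC size, in bytes
  prstat -s rss -n 10            # top processes sorted by RSS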

3. If the database has some concept of blocksize or recordsize that it
uses to perform i/o, make sure the filesystems it is using are configured
with the same recordsize. The ZFS default recordsize (128kB) is usually
much bigger than database blocksizes. This is probably going to have
less impact with an mmaped database than a read(2)/write(2) database,
where it may prove better to match the filesystem's record size to the
system's page size (4kB, unless it's using some type of large pages). I
haven't tried playing with recordsize for memory mapped i/o, so I'm
speculating here.
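
If you do experiment with it, note that recordsize only applies to blocks
written after the change, so the data files have to be copied in again (or
recreated) to pick it up. A sketch, with an illustrative dataset name:

  zfs set recordsize=4k tank/mongodb   # match the 4kB page size mentioned above
  zfs get recordsize tank/mongodb
  # then rewrite/copy the data files so they actually use the new recordsize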

Blocksize or recordsize may apply to the log file writer too, and it may
be that this needs a different recordsize and therefore has to be in a
different filesystem. If it uses write(2) or some variant rather than
mmap(2) and doesn't document this in detail, Dtrace is your friend.
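
For example, something like this shows whether the log writer uses write(2)
and at what sizes ("logger" is just a placeholder - substitute the real
process name):

  dtrace -n 'syscall::write:entry /execname == "logger"/ { @["write size"] = quantize(arg2); }'
  dtrace -n 'syscall::mmap*:entry /execname == "logger"/ { @[probefunc] = count(); }'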

4. Keep plenty of free space in the zpool if you want good database
performance. If you're more than 60% full (S10U9) or 80% full (S10U10),
that could be a factor.
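
A quick check (pool name illustrative) - the CAP column is the one to watch:

  zpool list tank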

Anyway, there are a few things to think about.
--
Andrew
Lionel Cons
2012-05-28 19:46:00 UTC
I'm getting sub-optimal performance with an mmap-based database (mongodb)
running on ZFS on Solaris 10u9.
System is Sun-Fire X4270-M2 with 2xX5680 and 72GB (6 * 8GB + 6 * 4GB) ram
(installed so it runs at 1333MHz) and 2 * 300GB 15K RPM disks
- a few mongodb instances are running with moderate I/O and a total RSS
of 50 GB
- a service which logs quite heavily (5 GB every 20 mins) is also
running (max 2 GB RAM use) - log files are compressed to bzip2 after
some time.
Database performance is quite horrid though - it seems that ZFS does not
know how to manage the allocation between the page cache and the ARC, and
the ARC seems to win most of the time.
- relocating the mmapped (mongo) data to a ZFS filesystem that caches
only metadata
- reducing the ZFS ARC to 16 GB
Are there any other recommendations, and is the above likely to improve
performance?
The only recommendation which will lead to results is to use a
different OS or filesystem. Your choices are
- FreeBSD with ZFS
- Linux with BTRFS
- Solaris with QFS
- Solaris with UFS
- Solaris with NFSv4, use ZFS on independent fileserver machines

There's a rather mythical rewrite of the Solaris virtual memory
subsystem called VM2 in progress, but it will still take a long time
until it becomes available to customers, and there are no real data
yet on whether it will help with mmap performance. It won't be
available for OpenSolaris successors like Illumos either (likely
never; at least the Illumos leadership doesn't see the need for it and
instead recommends rewriting applications to not use mmap).

Lionel
Richard Elling
2012-05-28 20:10:57 UTC
Post by Lionel Cons
I'm getting sub-optimal performance with an mmap-based database (mongodb)
running on ZFS on Solaris 10u9.
System is Sun-Fire X4270-M2 with 2xX5680 and 72GB (6 * 8GB + 6 * 4GB) ram
(installed so it runs at 1333MHz) and 2 * 300GB 15K RPM disks
- a few mongodb instances are running with moderate I/O and a total RSS
of 50 GB
- a service which logs quite heavily (5 GB every 20 mins) is also
running (max 2 GB RAM use) - log files are compressed to bzip2 after
some time.
Database performance is quite horrid though - it seems that ZFS does not
know how to manage the allocation between the page cache and the ARC, and
the ARC seems to win most of the time.
- relocating the mmapped (mongo) data to a ZFS filesystem that caches
only metadata
- reducing the ZFS ARC to 16 GB
Are there any other recommendations, and is the above likely to improve
performance?
The only recommendation which will lead to results is to use a
different OS or filesystem. Your choices are
- FreeBSD with ZFS
- Linux with BTRFS
- Solaris with QFS
- Solaris with UFS
- Solaris with NFSv4, use ZFS on independent fileserver machines
There's a rather mythical rewrite of the Solaris virtual memory
subsystem called VM2 in progress, but it will still take a long time
until it becomes available to customers, and there are no real data
yet on whether it will help with mmap performance. It won't be
available for OpenSolaris successors like Illumos either (likely
never; at least the Illumos leadership doesn't see the need for it and
instead recommends rewriting applications to not use mmap).
This is a mischaracterization of the statements given. The illumos team
says they will not implement Oracle's VM2 for valid, legal reasons.
That does not mean that mmap performance improvements for ZFS
cannot be implemented via other methods.

The primary concern for mmap files is that the RAM footprint is doubled.
If you do not manage this via limits, there can be a fight between the
page cache and ARC over a constrained RAM resource.
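
One way to see whether that fight is actually happening (the kstat names are
standard; the interpretation is, as always, workload-dependent):

  kstat -p zfs:0:arcstats:c   # the ARC's target size - watch it shrink under pressure
  vmstat 5                    # a sustained nonzero 'sr' (scan rate) column means the
                              # page scanner is running, i.e. RAM is genuinely tight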
-- richard

--
ZFS Performance and Training
***@RichardElling.com
+1-760-896-4422
Lionel Cons
2012-05-28 21:18:58 UTC
Post by Lionel Cons
The only recommendation which will lead to results is to use a
different OS or filesystem. Your choices are
- FreeBSD with ZFS
- Linux with BTRFS
- Solaris with QFS
- Solaris with UFS
- Solaris with NFSv4, use ZFS on independent fileserver machines
There's a rather mythical rewrite of the Solaris virtual memory
subsystem called VM2 in progress, but it will still take a long time
until it becomes available to customers, and there are no real data
yet on whether it will help with mmap performance. It won't be
available for OpenSolaris successors like Illumos either (likely
never; at least the Illumos leadership doesn't see the need for it and
instead recommends rewriting applications to not use mmap).
This is a mischaracterization of the statements given. The illumos team
says they will not implement Oracle's VM2 for valid, legal reasons.
That does not mean that mmap performance improvements for ZFS
cannot be implemented via other methods.
I'd like to hear what those other methods would be. The lack of mmap
performance is only a symptom of a more severe disease. Just doing
piecework and altering the VFS API to integrate ZFS/ARC/VM with each
other doesn't fix the underlying problems.

I've assigned two of my staff, one familiar with the FreeBSD VM and
one familiar with the Linux VM, to look at the current VM subsystem,
and their preliminary reports point to disaster. If Illumos does not
initiate a VM rewrite project of its own which makes the VM aware of
NUMA, power management and other issues, then I predict nothing less
than the downfall of Illumos within a couple of years, because the
performance impact is dramatic and makes the Illumos kernel no longer
competitive.
Despite these findings, of which Sun was aware for a long time, and
despite the number of ex-Sun employees working on Illumos, I don't see
the commitment to launch such a project. That's why I said "likely
never", unless of course someone slams Garrett's head with sufficient
force on a wooden table to make him see the reality.

The reality is:
- Modern x86 server platforms are now all NUMA or NUMA-like. Lack of
NUMA support leads to bad performance
- They all use some kind of serialized link between CPU nodes, be it
HyperTransport or QuickPath, with power management. If power
management is active and has reduced the number of active links
between nodes, and the OS doesn't manage this correctly, you'll get
bad performance. The Illumos VM isn't even remotely aware of this fact
- Based on simulator testing we see that in a simulated environment
with 8 sockets almost 40% of kernel memory accesses are _REMOTE_
accesses, i.e. not local to the node doing the access
These are all preliminary results; I expect the remainder of the
analysis will take another 4-5 weeks before we present the findings to
the Illumos community. But I can already say it will be a slap in the
face for those who think that Illumos doesn't need a better VM system.
Post by Richard Elling
The primary concern for mmap files is that the RAM footprint is doubled.
It's not only that the RAM footprint is doubled; the data are copied
between the ARC and the page cache multiple times. You can argue that
memory and in-memory copy operations are cheap, but this, together
with the lack of NUMA awareness, is a real performance killer.

Lionel
Richard Elling
2012-05-28 21:40:22 UTC
[Apologies to the list, this has expanded past ZFS, if someone complains, we can
move the thread to another illumos dev list]
Post by Lionel Cons
Post by Lionel Cons
The only recommendation which will lead to results is to use a
different OS or filesystem. Your choices are
- FreeBSD with ZFS
- Linux with BTRFS
- Solaris with QFS
- Solaris with UFS
- Solaris with NFSv4, use ZFS on independent fileserver machines
There's a rather mythical rewrite of the Solaris virtual memory
subsystem called VM2 in progress, but it will still take a long time
until it becomes available to customers, and there are no real data
yet on whether it will help with mmap performance. It won't be
available for OpenSolaris successors like Illumos either (likely
never; at least the Illumos leadership doesn't see the need for it and
instead recommends rewriting applications to not use mmap).
This is a mischaracterization of the statements given. The illumos team
says they will not implement Oracle's VM2 for valid, legal reasons.
That does not mean that mmap performance improvements for ZFS
cannot be implemented via other methods.
I'd like to hear what those other methods would be. The lack of mmap
performance is only a symptom of a more severe disease. Just doing
piecework and altering the VFS API to integrate ZFS/ARC/VM with each
other doesn't fix the underlying problems.
I've assigned two of my staff, one familiar with the FreeBSD VM and
one familiar with the Linux VM, to look at the current VM subsystem,
and their preliminary reports point to disaster. If Illumos does not
initiate a VM rewrite project of its own which makes the VM aware of
NUMA, power management and other issues, then I predict nothing less
than the downfall of Illumos within a couple of years, because the
performance impact is dramatic and makes the Illumos kernel no longer
competitive.
Despite these findings, of which Sun was aware for a long time, and
despite the number of ex-Sun employees working on Illumos, I don't see
the commitment to launch such a project. That's why I said "likely
never", unless of course someone slams Garrett's head with sufficient
force on a wooden table to make him see the reality.
- Modern x86 server platforms are now all NUMA or NUMA-like. Lack of
NUMA support leads to bad performance
SPARC has been NUMA since 1997 and Solaris changed the scheduler
long ago.
Post by Lionel Cons
- They all use some kind of serialized link between CPU nodes, be it
HyperTransport or QuickPath, with power management. If power
management is active and has reduced the number of active links
between nodes, and the OS doesn't manage this correctly, you'll get
bad performance. The Illumos VM isn't even remotely aware of this fact
- Based on simulator testing we see that in a simulated environment
with 8 sockets almost 40% of kernel memory accesses are _REMOTE_
accesses, i.e. not local to the node doing the access
These are all preliminary results; I expect the remainder of the
analysis will take another 4-5 weeks before we present the findings to
the Illumos community. But I can already say it will be a slap in the
face for those who think that Illumos doesn't need a better VM system.
Nobody said illumos doesn't need a better VM system. The statement was that
illumos is not going to reverse-engineer Oracle's VM2.
Post by Lionel Cons
Post by Richard Elling
The primary concern for mmap files is that the RAM footprint is doubled.
It's not only that the RAM footprint is doubled; the data are copied
between the ARC and the page cache multiple times. You can argue that
memory and in-memory copy operations are cheap, but this, together
with the lack of NUMA awareness, is a real performance killer.
Anybody who has worked on a SPARC system for the past 15 years is well
aware of NUMA-ness. We've been living in a NUMA world for a very long
time, a world where the processors were slow and far-memory latency was
much, much worse than what we see in the x86 world.

I look forward to seeing the results of your analysis and experiments.
-- richard

--
ZFS Performance and Training
***@RichardElling.com
+1-760-896-4422
Iwan Aucamp
2012-05-28 20:25:32 UTC
Post by Andrew Gabriel
Post by Iwan Aucamp
- relocating the mmapped (mongo) data to a ZFS filesystem that caches
only metadata
- reducing the ZFS ARC to 16 GB
Are there any other recommendations, and is the above likely to improve
performance?
1. Upgrade to S10 Update 10 - this has various performance improvements,
in particular related to database type loads (but I don't know anything
about mongodb).
2. Reduce the ARC size so RSS + ARC + other memory users < RAM size.
I assume the RSS includes whatever caching the database does. In
theory, a database should be able to work out what's worth caching
better than any filesystem can guess from underneath it, so you want to
configure more memory in the DB's cache than in the ARC. (The default
ARC tuning is unsuitable for a database server.)
3. If the database has some concept of blocksize or recordsize that it
uses to perform i/o, make sure the filesystems it is using are configured
with the same recordsize. The ZFS default recordsize (128kB) is usually
much bigger than database blocksizes. This is probably going to have
less impact with an mmaped database than a read(2)/write(2) database,
where it may prove better to match the filesystem's record size to the
system's page size (4kB, unless it's using some type of large pages). I
haven't tried playing with recordsize for memory mapped i/o, so I'm
speculating here.
Blocksize or recordsize may apply to the log file writer too, and it may
be that this needs a different recordsize and therefore has to be in a
different filesystem. If it uses write(2) or some variant rather than
mmap(2) and doesn't document this in detail, Dtrace is your friend.
4. Keep plenty of free space in the zpool if you want good database
performance. If you're more than 60% full (S10U9) or 80% full (S10U10),
that could be a factor.
Anyway, there are a few things to think about.
Thanks for the feedback. I cannot really do 1, but I will look into
points 3 and 4, in addition to 2 - which is what I hope to achieve with
my second point. I would still like to know, though, whether it is
recommended to cache only metadata for mmapped files (the mongodb data
files) - the way I see it, this should get rid of the double caching
that is being done for mmapped files.
Richard Elling
2012-05-28 20:34:18 UTC
question below...
Post by Andrew Gabriel
Post by Iwan Aucamp
- relocating the mmapped (mongo) data to a ZFS filesystem that caches
only metadata
- reducing the ZFS ARC to 16 GB
Are there any other recommendations, and is the above likely to improve
performance?
1. Upgrade to S10 Update 10 - this has various performance improvements,
in particular related to database type loads (but I don't know anything
about mongodb).
2. Reduce the ARC size so RSS + ARC + other memory users < RAM size.
I assume the RSS includes whatever caching the database does. In
theory, a database should be able to work out what's worth caching
better than any filesystem can guess from underneath it, so you want to
configure more memory in the DB's cache than in the ARC. (The default
ARC tuning is unsuitable for a database server.)
3. If the database has some concept of blocksize or recordsize that it
uses to perform i/o, make sure the filesystems it is using are configured
with the same recordsize. The ZFS default recordsize (128kB) is usually
much bigger than database blocksizes. This is probably going to have
less impact with an mmaped database than a read(2)/write(2) database,
where it may prove better to match the filesystem's record size to the
system's page size (4kB, unless it's using some type of large pages). I
haven't tried playing with recordsize for memory mapped i/o, so I'm
speculating here.
Blocksize or recordsize may apply to the log file writer too, and it may
be that this needs a different recordsize and therefore has to be in a
different filesystem. If it uses write(2) or some variant rather than
mmap(2) and doesn't document this in detail, Dtrace is your friend.
4. Keep plenty of free space in the zpool if you want good database
performance. If you're more than 60% full (S10U9) or 80% full (S10U10),
that could be a factor.
Anyway, there are a few things to think about.
Thanks for the feedback. I cannot really do 1, but I will look into points 3 and 4, in addition to 2 - which is what I hope to achieve with my second point. I would still like to know, though, whether it is recommended to cache only metadata for mmapped files (the mongodb data files) - the way I see it, this should get rid of the double caching that is being done for mmapped files.
I'd be interested in the results of such tests. You can change the primarycache
parameter on the fly, so you could test it in less time than it takes for me to type
this email :-)
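
For example (dataset name is only illustrative):

  zfs set primarycache=metadata tank/mongodb   # stop caching file data in the ARC
  zfs get primarycache tank/mongodb            # confirm the setting
  zfs set primarycache=all tank/mongodb        # revert to the default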
-- richard

--
ZFS Performance and Training
***@RichardElling.com
+1-760-896-4422
Jim Klimov
2012-05-28 22:10:09 UTC
Post by Richard Elling
I'd be interested in the results of such tests. You can change the primarycache
parameter on the fly, so you could test it in less time than it takes for me to type
this email :-)
I believe it would also take some time for memory distribution
to settle, expiring ARC data pages and actually claiming the
RAM for the application... Right? ;)

//Jim
Daniel Carosone
2012-05-29 01:29:47 UTC
Post by Richard Elling
I'd be interested in the results of such tests.
Me too, especially for databases like postgresql where there's a
complementary cache-size tunable within the db that often needs to be
turned up, since they implicitly rely on some filesystem caching as an L2.

That's where this gets tricky: L2ARC has the opportunity to make a big
difference, where the entire db won't all fit in memory (regardless of
which subsystem has jurisdiction over that memory). If you exclude
data from ARC, you can't spill it to L2ARC.
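
The two knobs involved, for reference (dataset name illustrative) - and since
the L2ARC is fed from blocks leaving the ARC, primarycache=metadata starves it
of data blocks no matter what secondarycache says:

  zfs get primarycache,secondarycache tank/mongodb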

For the mmap case: does the ARC keep a separate copy, or does the vm
system map the same page into the process's address space? If a
separate copy is made, that seems like a potential source of many
kinds of problems - if it's the same page then the whole premise is
essentially moot and there's no "double caching".

--
Dan.
Iwan Aucamp
2012-05-29 19:42:21 UTC
Post by Daniel Carosone
For the mmap case: does the ARC keep a separate copy, or does the vm
system map the same page into the process's address space? If a
separate copy is made, that seems like a potential source of many
kinds of problems - if it's the same page then the whole premise is
essentially moot and there's no "double caching".
As far as I understand it, in the mmap case the page cache is distinct
from the ARC (i.e. the simplified flow for reading from disk with mmap
is DSK -> ARC -> page cache), and only the page cache gets mapped into
the process's address space - which is what results in the double caching.
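
The split is visible in the kernel memory breakdown (on recent Solaris 10
updates; the exact categories vary a bit by release):

  echo ::memstat | mdb -k
  # 'ZFS File Data' covers the ARC; the mmapped copy shows up in the other
  # buckets (page cache / anon)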

I have two other general questions regarding the page cache with ZFS +
Solaris:
- Does anything other than mmap still use the page cache?
- Is there a parameter similar to /proc/sys/vm/swappiness that can
control how long unused pages in the page cache stay in physical RAM if
there is no shortage of physical RAM? And if not, how long will unused
pages in the page cache stay in physical RAM given there is no shortage
of physical RAM?
Bob Friesenhahn
2012-05-31 01:55:40 UTC
- Is there a parameter similar to /proc/sys/vm/swappiness that can control how long unused pages in the page cache stay in physical RAM
if there is no shortage of physical RAM? And if not, how long will unused pages in the page cache stay in physical RAM given there
is no shortage of physical RAM?
Absent pressure for memory, no longer referenced pages will stay in
memory forever. They can then be re-referenced in memory.

Bob
--
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Jeff Bacon
2012-06-01 12:27:27 UTC
Post by Iwan Aucamp
I'm getting sub-optimal performance with an mmap-based database
(mongodb) running on ZFS on Solaris 10u9.
System is Sun-Fire X4270-M2 with 2xX5680 and 72GB (6 * 8GB + 6 * 4GB)
ram (installed so it runs at 1333MHz) and 2 * 300GB 15K RPM disks
- a few mongodb instances are running with moderate I/O and a total
RSS of 50 GB
- a service which logs quite heavily (5 GB every 20 mins) is also
running (max 2 GB RAM use) - log files are compressed to bzip2 after
some time.
Database performance is quite horrid though - it seems that ZFS does not
know how to manage the allocation between the page cache and the ARC, and
the ARC seems to win most of the time.
Or to be more accurate, there is no coordination that I am aware of between the VM page cache and the ARC. Which, for all the glories of ZFS, strikes me as a *doh*face-in-palm* how-did-we-miss-this sorta thing. One of these days I need to ask Jeff and Bill what they were thinking.

We went through this 9 months ago - we wrote MongoDB, which attempted to mmap() whole database files for the purpose of skimming back and forth through them quickly (think column-oriented database). Performance, um, sucked.

There is a practical limit to the amount of RAM you can shove into a machine - and said RAM gets slower as you have to go to quad-rank DIMMs, which Nehalem can't run at full speed - for the sort of box you speak of, your top end at 1333MHz is 96GB, last I checked. (We're at 192GB in most cases.) So while, yes, copying the data around between the VM and the ARC is doable, in quantities large enough to invariably blow the CPU L3 it may not be the most practical answer.

It didn't help of course that
a) said DB was implemented in Java - _please_ don't ask - which is hardly a poster child for implementing any form of mmap(), not to mention that it spins up a ton of threads
b) said machine _started_ with 72 2TB Constellations and a pack of Cheetahs arranged in 7 pools, resulting in ~700 additional kernel threads roaming around, all of which got woken up on any heavy disk access (yes they could have all been in one pool - and yes there is a specific reason for not doing so)

but and still.

We managed to break ZFS as a result. There are a couple of cases filed. One is semi-patched, the other we're told simply can't be fixed in Solaris 10. Fortunately we understand the conditions that create the breakage, and work around it by Just Not Doing That(tm). In your configuration, I can almost guarantee you will not run into them.
Post by Iwan Aucamp
- relocating the mmapped (mongo) data to a ZFS filesystem that caches
only metadata
- reducing the ZFS ARC to 16 GB
Are there any other recommendations, and is the above likely to improve
performance?
Well... we ended up
(a) rewriting MongoDB to use in-process "buffer workspaces" and read()/write() to fill/dump the buffers to disk (essentially, giving up on mmap())
(b) moving most of the workload to CentOS and using the Solaris boxes as big fast NFSv3 fileservers (NFSv4 didn't work out so well for us) over 10G, because for most workloads it runs 5-8% faster on CentOS than Solaris, and we're primarily a CentOS shop anyway so it was just easier for everyone to deal with - but this has little to do with mmap() difficulties

Given what I know of the Solaris VM, VFS and of ZFS as implemented - admittedly incomplete, and my VM knowledge is based mostly on SVR4 - it would seem to me that it is going to take some Really Creative Thinking to work around the mmap() problem - a tweak or two ain't gonna cut it.

-bacon
Jeff Bacon
2012-06-01 12:33:29 UTC
Post by Richard Elling
I'd be interested in the results of such tests. You can change the primarycache
parameter on the fly, so you could test it in less time than it
takes for me to type this email :-)
-- Richard
Tried that. Performance headed south like a cat with its tail on fire. We didn't bother quantifying, it was just that hideous.

(You know, us northern-hemisphere people always use "south" as a "down" direction. Is it different for people in the southern hemisphere? :) )

There are just too many _other_ little things running around a normal system for which NOT having primarycache is too painful to contemplate (even with L2ARC); while I can envisage situations where one might want to turn it off, they're very, very few and far between.

-bacon
Iwan Aucamp
2012-06-01 13:41:00 UTC
Post by Jeff Bacon
Post by Richard Elling
I'd be interested in the results of such tests. You can change the primarycache
parameter on the fly, so you could test it in less time than it
takes for me to type this email :-)
-- Richard
Tried that. Performance headed south like a cat with its tail on fire. We didn't bother quantifying, it was just that hideous.
(You know, us northern-hemisphere people always use "south" as a "down" direction. Is it different for people in the southern hemisphere? :) )
There are just too many _other_ little things running around a normal system for which NOT having primarycache is too painful to contemplate (even with L2ARC); while I can envisage situations where one might want to turn it off, they're very, very few and far between.
Thanks for the valuable feedback, Jeff, though I think you might have
misunderstood - the idea is to make a ZFS filesystem just for the files
being mmapped by mongo, and to disable ARC data caching only where the
double caching is involved (i.e. for the mmapped files) - leaving the
rest of the system with the ARC, and taking the ARC out of the picture
only for MongoDB.
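
Roughly (names illustrative):

  zfs create -o primarycache=metadata tank/mongodb   # only the mmapped files go here
  zfs get -r primarycache tank                       # everything else keeps the default 'all'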

Jeff Bacon
2012-06-01 12:36:58 UTC
Post by Richard Elling
Anybody who has worked on a SPARC system for the past 15 years is well
aware of NUMA-ness. We've been living in a NUMA world for a very long
time, a world where the processors were slow and far-memory latency was
much, much worse than what we see in the x86 world.
I look forward to seeing the results of your analysis and
experiments.
-- Richard
like, um, seconded. Please.

I'm very curious to learn of a "VM2" effort. (Sadly, I spend more time nowadays with my nose stuck in Cisco kit than in Solaris - well, not sadly, they're both interesting - but I'm out of touch with much of what's going on in the Solaris world anymore.) It makes sense, though. And perhaps it's well overdue. The basic notions of the VM subsystem haven't changed in what, 15 years? Ain't-broke-don't-fix, sure, but ...

-bacon