I have enjoyed my time here at geekswithblogs (even the green monster) but I will be moving my blog to codebetter.com. Topics the same, URL different.
This blog will be moving to http://codebetter.com/blogs/gregyoung
Some of you have probably seen a post from last Tuesday entitled Floating Point Fun. If you have not read it, I would recommend going back and reading it before continuing. In that post I discuss some of the interesting things that can happen when dealing with floating point math in C#. It is important to note that these items did not happen in version 1.x of the framework.
The root of these problems is that when in a register the floating point is treated with a different precision than when it is being held in memory. As such you can run into cases where you are comparing a Float32 or a Float64 against an 80 bit register based float. These equality comparisons (or conversions to other types such as an integer) can obviously fail due to the difference in precision.
After tracing through the generated assembly, I found a great reference on the subject at David Notario's Blog. David correctly points out that this is not a CLR/JIT issue; in fact, changes like this were alluded to in the CLR spec (there is a quote from the ECMA spec on his blog, or see http://dotnet.di.unipi.it/EcmaSpec/PartitionI/cont11.html#_Toc527182172).
There was some documentation on this breaking change in 2.0. Here is the listing from the breaking changes documentation:
In the CLR model, we assert that arguments, locals, and return values (things which you can't tell their size) can be expanded at will. When you say float, it means anything greater than or equal to a float. So we can sometimes give you what you asked for 'or better'. When we do this, we can spill 'extra' data, almost like a 'you only asked for 15 precision points, but congratulations! We gave you 18!'. If someone expected the floating point precision to always remain the exactly the same, they could be affected. In order to faciliate performance improvements and better scenarios, the CLR may rewrite (as in this case) parts of the register. For example, things that used to truncate because of spilling, no longer do. We make these kinds of changes all the time. We believe this is an appropriate change, and it is even called out specifically in the CLI specification, as something which can, and will occur with different iterations:
What makes these changes particularly nasty is that you are forced to second guess how the JIT works in order to provide consistent results. In my previous post I used the example of
float f = 97.09f;
f = (f * 100f);
int tmp = (int)f;
Console.WriteLine(tmp);
This code will work in either debug or release mode when a debugger is attached, because having the debugger attached disables the JIT optimizations that cause the problem. As I describe in the previous post, it fails when run without the debugger. If we wanted it to work all of the time we would need to write it in the following form.
float f = 97.09f;
f = (float)(f * 100f);
int tmp = (int)f;
Console.WriteLine(tmp);
The explicit cast to a float forces the value to be narrowed back to a float32; without the narrowing it will actually be in a register as an 80 bit float. As such we end up with the predictable behavior of always producing the correct result of 9709.
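The effect of the narrowing can be sketched without depending on what any particular JIT does. This is only an illustration under an assumption: it uses a double to stand in for the wider register value (the real register may be 80 bits), and the class and method names are made up for the example.

```csharp
using System;

public static class NarrowingDemo
{
    // Multiplies, explicitly narrows the product to float32, then truncates.
    public static int TruncateNarrowed(float f)
    {
        float product = (float)(f * 100f); // the cast forces rounding to float32
        return (int)product;
    }

    // Stand-in for the register case: the product keeps extra precision
    // (double here, as an assumption) and is truncated without narrowing.
    public static int TruncateWide(float f)
    {
        double product = (double)f * 100.0; // keeps the small residual below 9709
        return (int)product;
    }

    public static void Main()
    {
        float f = 97.09f;
        Console.WriteLine(TruncateNarrowed(f)); // 9709
        Console.WriteLine(TruncateWide(f));     // 9708
    }
}
```

The float 97.09f is stored as roughly 97.0899963, so the wide product is roughly 9708.99963. Rounding that to float32 lands on 9709, while truncating it directly gives 9708.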
The problem I have with this behavior is that it is a leaky abstraction. In order to have our code work properly (and to be efficient) we need to know exactly how the compiler and the JIT intends to optimize our code. This introduces a logical problem though as by its very definition we do not know how the JIT will optimize our code. The JIT very well could place this into a register at some times and not at others or the JIT run on a different platform could offer a different behavior than the JIT we tested with.
This becomes especially nasty when dealing with constants; consider the following code.
float f = 97.09f;
f = (f * 100f);
bool test = f == 97.09f * 100f;
Console.WriteLine(test);
What is the value of test? The abstraction leaks for both the compiler and the JIT. To start with, are the floats actually being calculated at runtime or is the compiler smart enough to realize that they are constants? In this particular case the C# compiler generates instructions for the first floating point operation but recognizes that the second is a constant value and as such pre-computes the value. These types of scenarios are exactly the type of thing that compilers look for when optimizing.
If the compiler did not recognize the constant expression this might work, as both of the calculations would have been done with their results being saved in a register; at that point we would actually have to look at how the JIT handled this case. Both of these items may change based upon environment.
The CLR has basically left the choice to the language as to how it wants to handle these cases. Visual C++ has handled this by providing compiler switches. The link is also interesting as it deals with how the switches apply as well to optimizations that occur within the compiler that can cause further issues. C# does not have many such optimizations at this point but it is only a matter of time before they get introduced.
I would therefore propose that C# should be given switches as well (similar to those available for C++) which could allow for the automatic narrowing of floating point values.
It is often brought up that C# does it the way it does for performance reasons; it is obviously faster to leave values in registers when possible as opposed to narrowing them. The only consistent way of doing it is through the use of narrowing. From all of the studies I have seen, C# is primarily used for business applications where consistency (and reduction of programmer thought) is the primary goal and quite often run-time speed is sacrificed in order to better meet these goals (think abstractions). If an argument can be made for C++ to have an option of a precise switch, I would imagine a better argument can in fact be made for C#.
Based upon this I would also propose that the default behavior of the compiler should be to support consistent operations (/fp:precise in C++).
This switch would not eliminate people from writing code that was dependent upon how the compiler/JIT treated things; it would however force the programmer to make a conscious decision by setting the switch that they were assuming the risks associated with the performance gains. VC++ by default runs with /fp:precise so I would not think it a large jump to make the C# compiler consistent.
As a note for the people I am sure will say, “don't do this .. use a precision range or round instead“. These are simply examples ... I am fine with using these solutions (in fact I normally use range checks). The problem is that code like this crops up regularly and it creates a very subtle problem (that did not exist in 1.x). That, and there are times when you actually want (validly) to do an equivalence test on two floating point numbers that should have a consistent value (i.e. results of the same calculation). If these operations are to be disallowed, that is fine as well .. but let's completely disallow them and have the compiler generate a warning/error in that circumstance.
This issue is known by very few. If you agree with the concepts here I ask you to either leave a comment below or to link to the post on your blog. Hopefully getting this knowledge more into the mainstream will both reduce the number of bugs caused by this subtlety and bring more focus on it by those with the power to change it.
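For reference, the "precision range or round" workarounds mentioned above might look something like the following. This is a minimal sketch; the helper name NearlyEqual and the tolerance of 0.01 are arbitrary choices for the example, not a recommendation for any particular application.

```csharp
using System;

public static class FloatCompare
{
    // True when a and b differ by no more than the given absolute tolerance.
    public static bool NearlyEqual(float a, float b, float tolerance)
    {
        return Math.Abs(a - b) <= tolerance;
    }

    // Rounds to the nearest integer instead of truncating toward zero.
    public static int RoundToInt(float value)
    {
        return (int)Math.Round(value);
    }

    public static void Main()
    {
        float f = 97.09f;
        f = f * 100f;
        Console.WriteLine(NearlyEqual(f, 9709f, 0.01f));
        Console.WriteLine(RoundToInt(f));
    }
}
```

Both helpers give the same answer whether the product was narrowed to float32 or left with its small residual, which is why they sidestep the problem rather than fix it.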
I have been searching for an answer to this one and I am perplexed.
http://msdn2.microsoft.com/en-us/library/system.type.getgenericarguments.aspx returns an array of the generic arguments .. what I can't figure out is if they will ever be out of order. The arguments themselves have a position on them, is it possible that I get them back out of order where I would need to re-order them ...
basically what I am doing is something similar to the following on a generic type definition
string[] Params = new string[typeArguments.Length];
for (int i = 0; i < typeArguments.Length; i++) {
    Params[i] = typeArguments[i].Name;
}
GenericTypeParameterBuilder[] typeParams = outputType.DefineGenericParameters(Params);
Debug.Assert(typeParams.Length == typeArguments.Length);
for (int i = 0; i < typeParams.Length; i++) {
    GenericTypeParameterBuilder builder = typeParams[i];
    Type OriginalType = typeArguments[i];
    builder.SetGenericParameterAttributes(OriginalType.GenericParameterAttributes);
    builder.SetInterfaceConstraints(OriginalType.GetGenericParameterConstraints());
}
I have to say as well that I am rather unimpressed with the bridge I am forced to put up here .. it would be a lot nicer to just pass through the generic arguments I already have as opposed to creating a string [] then iterating through .. maybe I am missing something with the API?
and for those who know me well .. you probably know what I am working on .. (hint, the topic this is posted in)
Update: it seems the documentation has been updated to reflect the return being sorted http://msdn2.microsoft.com/en-us/library/system.type.getgenericarguments.aspx
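If you would rather verify the ordering than trust the documentation, a defensive check is cheap. The sketch below is only an illustration (the class and method names are invented here); it relies on Type.GetGenericArguments, Type.IsGenericParameter and Type.GenericParameterPosition, and compares each parameter's declared position with its index in the returned array.

```csharp
using System;
using System.Collections.Generic;

public static class GenericOrderCheck
{
    // Returns true when every generic parameter of the definition sits at the
    // array index that matches its declared GenericParameterPosition.
    public static bool IsInDeclarationOrder(Type genericTypeDefinition)
    {
        Type[] args = genericTypeDefinition.GetGenericArguments();
        for (int i = 0; i < args.Length; i++)
        {
            if (!args[i].IsGenericParameter || args[i].GenericParameterPosition != i)
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsInDeclarationOrder(typeof(Dictionary<,>)));
    }
}
```

A check like this could sit next to the Debug.Assert in the code above, so a surprise in ordering fails loudly instead of silently mismatching constraints.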
I knew there was a reason I kept Junfeng Zhang's Blog on my list (even during the slow months). I hadn’t checked the blog in a few weeks but reading it now just made my day.
There are two new items listed on the blog. The first is that someone fixed a huge security hole; I have actually run into this particular security hole. Junfeng calls it kernel object name squatting. I have never heard it called by this name but it is a pretty simple problem. Kernel objects are shared between processes by name; the question is who protects them from prying eyes?
Let’s suppose you have the following code (note this is a trivial example and likely has bugs in it but it should illustrate the point).
static void Main(string[] args) {
    bool Created;
    Mutex m = new Mutex(false, "MutexWeWillSteal2", out Created);
    if (Created) {
        Console.WriteLine("MutexCreated");
    }
    for (int i = 0; i < 100; i++) {
        Thread.Sleep(200);
        bool havelock = false;
        try {
            havelock = m.WaitOne(5000, false);
            if (havelock) {
                Console.WriteLine("acquired lock");
                Thread.Sleep(500);
            } else {
                Console.WriteLine("Unable to acquire lock");
            }
        } finally {
            if (havelock) {
                m.ReleaseMutex();
            }
        }
    }
}
As we can see this application is simply starting up, creating a mutex if it does not exist already then simply obtaining and releasing the lock. You can quite easily bring up two of these applications to notice that they are synchronizing with each other. Using a named mutex like this is extremely common in order to synchronize two processes.
The problem with mutex is not as great as some objects as I can apply an ACL to prevent people not at a certain level from accessing it. Unfortunately I still suffer from a denial of service attack from applications at my own level. One can quite easily use the debugger (or other tools) to find out the names of the objects I am using (!handle in windbg will bring this right up for me). Once I have that name I can write a bit of code such as the following.
static void Main(string[] args) {
    bool Created;
    Mutex m = new Mutex(true, "MutexWeWillSteal2", out Created);
    m.WaitOne(-1, false);
    Thread.Sleep(int.MaxValue);
}
Provided this code acquires the mutex before our other processes do, our other processes will just fail to acquire it. We have effectively made the other application unable to do anything (the basis of a denial of service attack).
What is being introduced in LH is the ability for me to make my two processes share a namespace. As such their namespace can be protected. The malicious program can start, but one can prevent it from having access to the mutex. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/object_namespaces.asp explains exactly how it works, but basically the processes that want to share the data define a boundary (and the requirements to get into the boundary that they both share):
This function requires that you specify a boundary that defines how the objects in the namespace are to be isolated. Boundaries can include security identifiers (SID), session numbers, or any type of information defined by the application. The caller must be within the specified boundary for the create operation to succeed.
The second post I found really interesting was condition variables http://msdn.microsoft.com/library/en-us/dllproc/base/using_condition_variables.asp. Condition Variables are one of the 3 locking mechanisms (along with mutex and semaphore). I do not quite understand the excitement over it (perhaps POSIX compatibility?). I was under the possibly misinformed impression that it was roughly equivalent to Events in windows.
Basically I can in POSIX use
pthread_cond_signal() which alerts one waiting thread
pthread_cond_broadcast() which alerts all of my waiting threads
In Win32 I can use
PulseEvent() but there are two types of events
AutoResetEvent which will only let one thread through and ManualResetEvent which I can use to let some or all through
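For comparison, the closest thing to a condition variable in managed code is probably Monitor.Wait together with Monitor.Pulse (wake one waiter) and Monitor.PulseAll (wake all waiters). The sketch below is my own illustration of that pattern, not anything from the linked posts; the BlockingQueue class is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class BlockingQueue
{
    private readonly object _sync = new object();
    private readonly Queue<int> _items = new Queue<int>();

    public void Enqueue(int item)
    {
        lock (_sync)
        {
            _items.Enqueue(item);
            Monitor.Pulse(_sync); // wake one waiting thread, if any
        }
    }

    public int Dequeue()
    {
        lock (_sync)
        {
            // Re-check the condition in a loop after every wake-up.
            while (_items.Count == 0)
            {
                Monitor.Wait(_sync); // releases the lock while waiting, reacquires before returning
            }
            return _items.Dequeue();
        }
    }
}

public static class ConditionDemo
{
    public static void Main()
    {
        BlockingQueue queue = new BlockingQueue();
        Thread consumer = new Thread(delegate() { Console.WriteLine(queue.Dequeue()); });
        consumer.Start();
        queue.Enqueue(42);
        consumer.Join();
    }
}
```

The difference from an event is that the wait is tied to a lock and a condition that gets re-checked, so a wake-up that arrives before the waiter is ready is not lost the way a pulsed event can be.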
I am now morbidly curious on the subject; it has been way too long since I was at this level
I was poking through Jason Haley's blog today when I came across one of his interesting links (which are usually pretty interesting btw). It pointed to a post by Phillip Haack titled A Testing Mail Server for Unit Testing Email Functionality. I generally enjoy reading about other’s unit testing experiences as I often gain quite a bit of perspective (and get to see a lot of problems I may not otherwise get to see).
Basically what he has done is take an open source SMTP server and use it to allow his unit tests to check whether an email that was sent through the .NET email libraries actually arrived at its recipient. Let me say that he has come up with a very inventive way of automating this integration test. I personally think that his code is extremely useful for automated integration testing. I have to admit, I have done automated integration tests for emailing and I used a far less elegant solution.
As I was reading through the post I came across the comment.
As for the semantic arguments around whether this really constitutes an Integration Test as opposed to a Unit Test, please don’t bore me with your hang-ups. Either way, it deserves a test and what better way to test it than using something like MbUnit or NUnit.
Well, let me start by saying it is an integration test at best (I might even classify this as a user acceptance type test) and has no place being a unit test.
He did however hit the appropriate unit test on the head earlier in his post when he suggested an EmailProvider (I say EmailService but that truly is semantics) which could then be mocked for testing purposes. I would personally either create my own abstraction of an “Email” class or use the one from System.Net.Mail as a parameter instead of passing parameters but the reasons for this will have to wait for another post.
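To make the mocking approach concrete, here is a rough sketch of what such a unit test could be built on. Everything in it is hypothetical: IEmailProvider, RecordingEmailProvider and WelcomeMailer are names invented for this illustration and are not taken from Phil's code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction over whatever actually sends the mail.
public interface IEmailProvider
{
    void Send(string to, string from, string subject, string body);
}

// Hand-rolled test double: records recipients instead of touching the network.
public class RecordingEmailProvider : IEmailProvider
{
    public readonly List<string> Recipients = new List<string>();

    public void Send(string to, string from, string subject, string body)
    {
        Recipients.Add(to);
    }
}

// Hypothetical code under test: it only knows about the abstraction.
public class WelcomeMailer
{
    private readonly IEmailProvider _provider;

    public WelcomeMailer(IEmailProvider provider)
    {
        _provider = provider;
    }

    public void Welcome(string address)
    {
        _provider.Send(address, "nobody@example.com", "Welcome", "Thanks for signing up.");
    }
}

public static class MockDemo
{
    public static void Main()
    {
        RecordingEmailProvider recorder = new RecordingEmailProvider();
        new WelcomeMailer(recorder).Welcome("phil@example.com");
        Console.WriteLine(recorder.Recipients.Count);
    }
}
```

A test written against the recorder can only fail because of our own code; no port, firewall or server configuration is involved.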
That said let’s look at the test case he created to test that email actually got delivered.
DotNetOpenMailProvider provider = new DotNetOpenMailProvider();
NameValueCollection configValue = new NameValueCollection();
configValue["smtpServer"] = "127.0.0.1";
configValue["port"] = "8081";
provider.Initialize("providerTest", configValue);
TestSmtpServer receivingServer = new TestSmtpServer();
try {
    receivingServer.Start("127.0.0.1", 8081);
    provider.Send("phil@example.com", "nobody@example.com", "Subject to nothing", "Mr. Watson. Come here. I need you.");
} finally {
    receivingServer.Stop();
}
// So Did It Work?
Assert.AreEqual(1, receivingServer.Inbox.Count);
ReceivedEmailMessage received = receivingServer.Inbox[0];
Assert.AreEqual("phil@example.com", received.ToAddress.Email);
I can identify a few places that could cause this test to fail that would not cause our mock to fail otherwise.
1) We did not properly setup our configValue["smtpServer"] configuration
2) We did not properly setup our configValue["port"] configuration
3) Someone is already listening on our configured port
4) We did not properly start our testing server or it is failing in some way (i.e. configuration)
5) We have a firewall that prevents the communication from working between the client and our email server
6) There is no accessible network between the client and the server
7) The Ethernet elves ate the address that the mail was sent to
8) Numerous other environmental factors
Are you noticing a pattern in these items? We could have configuration failures, failures in our test code, or environmental failures; not a single case is actually testing our code beyond what a mock would do. When our test fails it only tells us that the environment that the test was run in was not appropriate for the test code and as such the test failed. I thought that unit tests were supposed to test your code?
The test results are not dependent upon the code in question; they are dependent upon the testing environment.
If we were to try to come up with an analogy for the type of unit test this was we could say it is like testing your car by turning it on. When the car turns on we can have a grand celebration that our car did in fact turn on, but what if the car does not turn on. All we have gained from this test is that the car did not turn on; we have no insight as to why the car did not turn on.
Any time we have a unit test which gives us no insight as to why a failure is occurring we need to re-think the value of the unit test.
What if the test succeeds, does that tell us our code works? No it tells us that the email client we used was compatible with our test email server and that our testing environment was setup appropriately to run the test.
Unit tests are overhead for a system. They are no different than documentation; they need to pull their weight in order to stay around. Let me go back to the quoted statement and say that this does not actually deserve a test. This particular test carries with it a lot of baggage:
1) You have to have the email server library included/maintained with your project
2) You have to have a system ready to run the email server (i.e. nothing running on the port it uses)
3) You have to have an environment where you won’t have firewalls
4) You have to have TCP running (I know this is pretty common but it is still a prerequisite)
Given all of this baggage and the fact that it doesn't really test anything.. I say that this test does not deserve to be around.
What is really scaring me is how this test came into being a unit test using TDD. TDD would have stopped this test from ever appearing. Where’s the red bar? The only way to make the test fail where a mock would not, provided you have a working method of sending an email (which was implied, in that they are using System.Web.Mail), is to change the test or the environment.
I apologize for the code not being indented , I am using word 2007 and no matter what I do it does not seem to want to indent properly (I have tried copy/pasting to notepad, you name it), if you know how to get this working please contact me. See picture of word version http://geekswithblogs.net/images/geekswithblogs_net/gyoung/4418/r_word.JPG
update I manually indented all of it but I am still wanting to know how to avoid this
Everyone seems to enjoy performance posts so …. I saw this question in the advanced .net group and found it fairly interesting.
I have a comma separated list of integer values what's the quickest way to turn this into a integer array.
Being the geek that I am I set off immediately to find out just how quick I could make an algorithm fly.
That said let’s start out with the most obvious answer, we will split the string and call int.Parse() on the string elements. We can use this code to benchmark our other results against.
static int[] Split(string _Numbers) {
    string[] pieces = _Numbers.Split(',');
    int[] ret = new int[pieces.Length];
    for (int i = 0; i < pieces.Length; i++) {
        ret[i] = int.Parse(pieces[i]);
    }
    return ret;
}
Next we will need some data to feed into the routine. I chose "1,2,3,4,5,6,7,8,9,10,121,1000,10000" as my testing data. Running through this data 1,000,000 times takes a total of 4.39 seconds on my system (in release mode). I think that we can do better than this!!
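The timing loop itself is not shown in this post, so here is one way it could be written. Treat it as an assumption about the harness rather than the exact code behind the numbers: it uses System.Diagnostics.Stopwatch, and the class name SplitBenchmark and method name Time are made up for the example.

```csharp
using System;
using System.Diagnostics;

public static class SplitBenchmark
{
    // The baseline from above: split on commas and parse each piece.
    public static int[] Split(string numbers)
    {
        string[] pieces = numbers.Split(',');
        int[] ret = new int[pieces.Length];
        for (int i = 0; i < pieces.Length; i++)
        {
            ret[i] = int.Parse(pieces[i]);
        }
        return ret;
    }

    // Runs the baseline the requested number of times and returns the elapsed time.
    public static TimeSpan Time(string input, int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            Split(input);
        }
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        string data = "1,2,3,4,5,6,7,8,9,10,121,1000,10000";
        Console.WriteLine(Time(data, 1000000));
    }
}
```

As with any micro-benchmark, run it in release mode without a debugger attached, and expect the absolute numbers to differ from machine to machine.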
Using an old trick, I figured that I would just make the string a char [] and iterate through it, taking each character and subtracting ‘0’ from its char code (it just so happens that ‘9’ – ‘0’ = 9, how convenient!). Applying this methodology leaves us with the following code.
static int[] Iterate(string _Numbers, ref int _Count) {
    int[] buffer = new int[4];
    char[] chars = _Numbers.ToCharArray();
    _Count = 0;
    int holder = 0;
    for (int i = 0; i < chars.Length; i++) {
        if (chars[i] == ',') {
            buffer[_Count] = holder;
            holder = 0;
            _Count++;
            if (_Count == buffer.Length) {
                // Out of room: double the buffer and copy the old values (4 bytes per int).
                int[] tmp = buffer;
                buffer = new int[tmp.Length * 2];
                Buffer.BlockCopy(tmp, 0, buffer, 0, tmp.Length * 4);
            }
        } else {
            holder = holder * 10 + chars[i] - '0';
        }
    }
    // Store the final number, which is not followed by a comma.
    buffer[_Count] = holder;
    _Count++;
    return buffer;
}
This code also has some oddities involved with it since it does not know initially the size of the int [] that it needs to pass back. In order to support this, it grows its int [] as it needs to (by doubling). This can be an expensive operation so avoiding it is best. Also since it is doubling its array, it has a new parameter Count which it uses to return the total number of elements in the array it returns (it may return a 32 element array that only uses 18 elements).
As for performance, the exact code above will handle the same data as our first test in about .5 seconds on my machine with a buffer size of 4. Not bad, roughly 10% of our first try! To show the importance of the buffer though, if we make the initial size 16 the code finishes in .4 seconds!
There are still some areas we can optimize though. Keep in mind that this code is creating 1,000,000 char []. This is a pretty expensive operation, by using unsafe code we can avoid doing this. Here is the code
static unsafe int[] Unsafe1(string _Numbers, ref int _Count) {
    int[] buffer = new int[64];
    _Count = 0;
    int total = _Numbers.Length;
    int holder = 0;
    // Pin the string so we can walk its characters without copying them.
    fixed (char* a = _Numbers) {
        char* c = a;
        while (total > 0) {
            if (*c == ',') {
                buffer[_Count] = holder;
                holder = 0;
                _Count++;
                if (_Count == buffer.Length) {
                    int[] tmp = buffer;
                    buffer = new int[tmp.Length * 2];
                    Buffer.BlockCopy(tmp, 0, buffer, 0, tmp.Length * 4);
                }
            } else {
                holder = holder * 10 + *c - '0';
            }
            c++;
            total--;
        }
    }
    // Store the final number, which is not followed by a comma.
    buffer[_Count] = holder;
    _Count++;
    return buffer;
}
Again we use the same buffering mechanism that we used on the last entry. The main difference here is that instead of creating a char [] we use unsafe code to iterate through the string. With a buffer size of 16, this code runs through the 1,000,000 iterations in .37 seconds.
In running these, many of you may notice that for small bits of data that the iterate function may actually run faster than the unsafe code. This can be the case but as you add more data, the gap will grow larger in the favor of the unsafe code (it also uses less memory as it does not duplicate the memory from the original string).
Use the framework Luke!
We have had one problem thus far with our code: it's ugly. Having to get back an array then use a separate counter from the length of the array to loop is .. well, ugly. As an example consider the following.
int foo = 0;
int[] values = Iterate(numbers, ref foo);
for (int i = 0; i < foo; i++) {
}
//CANT DO
for (int i = 0; i < values.Length; i++) {
}
Personally I prefer the second way of doing this as it makes more sense to programmers. Luckily generics in the framework can do just this for us. By changing our return type from an int [] to a generic list we can keep most of the speed and offer a better interface.
static unsafe List<Int32> UseGeneric(string _Numbers) {
    List<int> ret = new List<int>(64);
    int total = _Numbers.Length;
    int tmp = 0;
    fixed (char* a = _Numbers) {
        char* c = a;
        while (total > 0) {
            if (*c == ',') {
                ret.Add(tmp);
                tmp = 0;
            } else {
                tmp = tmp * 10 + *c - '0';
            }
            total--;
            c++;
        }
    }
    // Add the final number, which is not followed by a comma.
    ret.Add(tmp);
    return ret;
}
This is a much nicer interface than what we had before and is still very performant. 1,000,000 iterations with this takes approximately the same time as our unsafe example! The reason this works about the same as our unsafe example is that it does pretty much the same thing internally with its buffer as what we were implementing on our own (of course, it does a much nicer job encapsulating it than we were, so this is probably the best overall solution). Basically as you add items, it also doubles its internal array size then copies over the old data in order to allow you to continue adding data.
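If unsafe code is not an option in your project, the same loop can be written against the string indexer. This variant was not part of the timings in this post, so I make no claim about how it compares; it is only a sketch (the names SafeParser and Parse are mine) and, like the versions above, it assumes the input contains nothing but digits and commas.

```csharp
using System;
using System.Collections.Generic;

public static class SafeParser
{
    // Parses a comma separated list of non-negative integers without unsafe code.
    // Assumes the input contains only digits and commas; no validation is done.
    public static List<int> Parse(string numbers)
    {
        List<int> result = new List<int>(64);
        if (numbers.Length == 0)
        {
            return result; // nothing to parse
        }

        int current = 0;
        for (int i = 0; i < numbers.Length; i++)
        {
            char c = numbers[i];
            if (c == ',')
            {
                result.Add(current);
                current = 0;
            }
            else
            {
                current = current * 10 + (c - '0');
            }
        }
        result.Add(current); // the last number has no trailing comma
        return result;
    }

    public static void Main()
    {
        List<int> values = Parse("1,2,3,4,5,6,7,8,9,10,121,1000,10000");
        for (int i = 0; i < values.Count; i++)
        {
            Console.WriteLine(values[i]);
        }
    }
}
```

Because it returns a List<int>, the caller can loop to values.Count and never needs a separate counter.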
Another solution, which was offered up in the discussion by Ernst Kuschke, is syntactic sugar (although not very performant), so I figured it was definitely worth including simply for its elegance (and it should probably be a good general method for doing this). Sugar like this is always more maintainable; converting this to support doubles or another type would be a lot easier than with our other examples.
public static int[] doTheThing(string commadelimitedInts) {
    return Array.ConvertAll<string, int>(
        commadelimitedInts.Split(new string[] { "," }, StringSplitOptions.RemoveEmptyEntries),
        new Converter<string, int>(intToString));
}
public static int intToString(string strInt) {
    int i;
    int.TryParse(strInt, out i);
    return i;
}
Ernst takes advantage of the ConvertAll method to produce a very short bit of code that does the task. It has performance characteristics similar to our baseline as well (while being significantly shorter). If performance is not your main goal (it is quite rare that we are doing millions of these transforms in an application), this is definitely the way to go as it is quite maintainable.
Update: Ernst's friend Piers came up with an even more elegant method of calling this
public static int[] doTheThing(string commadelimitedInts) {
    return Array.ConvertAll<string, int>(
        commadelimitedInts.Split(new string[] { "," }, StringSplitOptions.RemoveEmptyEntries),
        new Converter<string, int>(System.Convert.ToInt32));
}
The elegance of this solution is definitely apparent!
Now to put our various solutions to the real test I have made a big string [1..1000] that we will run through 1,000,000 times. All algorithms that use array growing will be seeded with an initial buffer size of 64 to make things fair between them while also forcing their weakness of having to grow the buffer a few times.
Algorithm | Execution Time
Split (Base Line) | 00:05:42.703
Syntactic Sugar with Delegate | 00:06:36.703
Iteration | 00:00:22.312
Unsafe | 00:00:16.90625
Unsafe with generic return | 00:00:18.937
Analysis
As you can see, performance for our base line (simple split) and the delegate method completely degrades under larger values, as one would expect. As I mentioned earlier, the unsafe methods squeaked ahead of the iteration due to not having to create the char array. Of course, the real winner here is the generic return: it comes in second but makes our code readable while remaining extremely fast.
Can you come up with an algorithm for this problem? Post it here!
I am sure by now that most know how floating point approximations work on a computer. They can be quite interesting. This has to be the weirdest experience I have ever had with them though.
Open a new console application in .NET 2.0 (set to build in release mode /debug:pdbonly should be the default) it is important for me to note that all of this code runs fine in 1.x.
Paste the following code into your main function
float f = 97.09f;
int tmp = (int) (f * 100.0f);
Console.WriteLine(tmp);
Output: 9708
Interesting eh? It gets more interesting!
float f = 97.09f;
float tmp = f * 100.0f;
Console.WriteLine(tmp);
Output: 9709
This is very interesting when taken in context with the operation above. Let’s stop for a minute and think about what we said should happen. We told it to take f and multiply It by 100.0 storing the intermediate result as a floating point, and to then take that floating point and convert it to an integer. When we run the second example, we can see that if we do the operation as a floating point, it comes out correctly. So where is the disconnect?
Let’s try to explicitly tell the compiler what we want to do.
float f = 97.09f;
f = (f * 100f);
int tmp = (int)f;
Console.WriteLine(tmp);
Output: 9709 (with a debugger attached, 9708 without in release mode!!) DEBUG:PDBONLY (even with no debug information through advanced settings)
Wow this has become REALLY interesting. What on earth happened here?
Let’s look at some IL to get a better idea of what’s going on here.
.locals init (
    [0] float32 single1,
    [1] float32 single2)
L_0000: ldc.r4 97.09
L_0005: stloc.0
L_0006: ldloc.0
L_0007: ldc.r4 100
L_000c: mul
L_000d: stloc.1
L_000e: ldloc.1
L_000f: call void [mscorlib]System.Console::WriteLine(float32)
L_0014: ret
This is our floating point example that prints the correct value (as a float)
.locals init (
    [0] float32 single1,
    [1] int32 num1)
L_0000: ldc.r4 97.09
L_0005: stloc.0
L_0006: ldloc.0
L_0007: ldc.r4 100
L_000c: mul
L_000d: conv.i4
L_000e: stloc.1
L_000f: ldloc.1
L_0010: call void [mscorlib]System.Console::WriteLine(int32)
L_0015: ret
This is our floating point example that came out wrong above
.locals init (
    [0] float32 single1,
    [1] int32 num1)
L_0000: ldc.r4 97.09
L_0005: stloc.0
L_0006: ldloc.0
L_0007: ldc.r4 100
L_000c: mul
L_000d: stloc.0
L_000e: ldloc.0
L_000f: conv.i4
L_0010: stloc.1
L_0011: ldloc.1
L_0012: call void [mscorlib]System.Console::WriteLine(int32)
L_0017: ret
This is our floating point example that gets it right when debugger is attached but not without
Interesting. The only significant difference between the one that never works and the one that works only when a debugger is attached is that the latter stores and then loads our value back onto the stack before issuing the conv.i4 on the value.
L_000c: mul
L_000d: stloc.0
L_000e: ldloc.0
L_000f: conv.i4
Basically these instructions are telling it to take the result from the multiplication (pop it off of the stack) and store them back into location0 which is our floating point variable. It then says to take that floating point variable and push it onto the stack so it can be used for the cast operation. This is probably something that should be handled for us (by the C# compiler) in the case of our first example so that it works as well as the 3rd example.
The “debugger/no debugger” problem is still our big problem though. The fact that JIT optimizations are changing behavior of identical IL is frankly kind of scary. My initial thought upon seeing the changes we just identified was that the operation was being optimized away by the JIT (storing and loading the same value on the stack seems like just the thing the JIT optimizer would be looking for) thus causing the problem.
The next step in tracking this down will be to look at the native code being generated.
Note: In order to do this you have to enable “Native Debugging” in Visual Studio.
00000000 push esi
00000001 sub esp,8
00000004 fld dword ptr ds:[00C400D0h]
0000000a fld dword ptr ds:[00C400D4h]
00000010 fmulp st(1),st
00000012 fstp qword ptr [esp]
00000015 fld qword ptr [esp]
00000018 fstp qword ptr [esp]
0000001b movsd xmm0,mmword ptr [esp]
00000020 cvttsd2si esi,xmm0
00000024 cmp dword ptr ds:[02271084h],0
0000002b jne 00000037
0000002d mov ecx,1
00000032 call 7870D79C
00000037 mov ecx,dword ptr ds:[02271084h]
0000003d mov edx,esi
0000003f mov eax,dword ptr [ecx]
00000041 call dword ptr [eax+000000BCh]
00000047 call 78776B48
0000004c mov ecx,eax
0000004e mov eax,dword ptr [ecx]
00000050 call dword ptr [eax+64h]
00000053 add esp,8
00000056 pop esi
00000057 ret
This is our native code when started without the debugger (attach to process when it's running): 9708
00000000 push esi
00000001 sub esp,10h
00000004 mov dword ptr [esp],ecx
00000007 cmp dword ptr ds:[00918868h],0
0000000e je 00000015
00000010 call 79441146
00000015 fldz
00000017 fstp dword ptr [esp+4]
0000001b xor esi,esi
0000001d mov dword ptr [esp+4],42C22E14h
00000025 fld dword ptr ds:[00C51214h]
0000002b fmul dword ptr [esp+4]
0000002f fstp dword ptr [esp+4]
00000033 fld dword ptr [esp+4]
00000037 fstp qword ptr [esp+8]
0000003b movsd xmm0,mmword ptr [esp+8]
00000041 cvttsd2si eax,xmm0
00000045 mov esi,eax
00000047 mov ecx,esi
00000049 call 78767DE4
0000004e call 78767BBC
00000053 nop
00000054 nop
00000055 add esp,10h
00000058 pop esi
00000059 ret
This is our native code when started with the debugger: 9709
(I am fairly certain this disables at least some forms of JIT optimizations)
Unfortunately, when looking at the native code it does not appear that this push/pop is being removed. I have to admit that I am very rusty on my assembly language, but my uneducated guess here would be that the difference comes from the change from dword values to qword values. In the version that does not work, the operation is done on QWORD values; in the version that does work, it is done on DWORD values.
If we look we can see that in the working example it is done in dwords, then changed to a qword:
0000002b fmul dword ptr [esp+4]
0000002f fstp dword ptr [esp+4]
00000033 fld dword ptr [esp+4]
00000037 fstp qword ptr [esp+8]
In the non-working example all operations are done with qwords
00000010 fmulp st(1),st
00000012 fstp qword ptr [esp]
00000015 fld qword ptr [esp]
00000018 fstp qword ptr [esp]
My (again uneducated) guess is that what is happening is that the higher precision of the qword is picking up a small residual causing the result to be off (just slightly i.e. 98.9999999997). This could easily cause the behavior being seen.
Basically this is not so much a bug as it is an oddity. The CLR is treating floats internally (when it's time to do calculations) as if they were float64s (I would imagine because context switching from floating point to MMX is kind of slow? Again, not my area of specialty). This can cause other issues as well if you have something in a register (fresh from a calculation) and something in memory, as they are in different formats: the one in the register is still in a native 64 bit format, whereas the one in memory gets widened to 64 bits in order to be compared (as such they will not be equal)...
Back to our first example .. you remember how it was missing the
L_000d: stloc.0
L_000e: ldloc.0
before the conversion to an integer? It is failing because it is using the 64 bit version of the float value (still in a register) that has not yet been converted back to a 32 bit version.
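A minimal sketch of the effect, under a couple of assumptions of mine: the constant 42C22E14h in the listing is the bit pattern of 97.09f, and I am assuming the other operand is 100f (which fits the 9708 and 9709 results). The method names are made up for the example. Widening to double stands in for the extra register precision, and the explicit (float) cast plays the role of the stloc/ldloc pair by forcing the product back to 32 bits before it is truncated. What the plain un-cast expression gives depends on the JIT and the hardware, so it is printed without claiming a value.

```csharp
using System;

public static class FloatPrecisionDemo
{
    // Multiply at double precision, standing in for a wide register value.
    public static int TruncateWide(float a, float b)
    {
        return (int)((double)a * (double)b);
    }

    // The explicit (float) cast rounds the product to 32 bits before truncating.
    public static int TruncateNarrow(float a, float b)
    {
        return (int)(float)(a * b);
    }

    public static void Main()
    {
        float a = 97.09f;
        float b = 100f; // assumed operand, not taken from the listing
        Console.WriteLine(TruncateWide(a, b));   // 9708
        Console.WriteLine(TruncateNarrow(a, b)); // 9709
        Console.WriteLine((int)(a * b));         // depends on JIT and hardware
    }
}
```

If a result has to be stable, the explicit cast is the piece to reach for, since it does not rely on the JIT deciding to spill the value to memory.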
I took my best uneducated guess, hopefully someone smarter than I can come through here and either confirm what I have said or identify the real problem :)
update: I finally found a resource on this and it seems I am in the right ballpark http://blogs.msdn.com/davidnotario/archive/2005/08/08/449092.aspx
Another good question is, why is this doing anything at runtime :) Couldn't we multiply the two constants at compile time?
As some of you may have realized, I am in the process of re-implementing my AOP framework to fully support generics right now (figured I might as well as I am white boarding it for open source deployment anyways). I have come across numerous issues in dealing with generics. Today I sent an email to the castle project group (who are going through a similar task in supporting generics in Dynamic Proxy). I figured I would post that email here as well in case others have thoughts on some of the issues I present.
I am going to dump out some of my experiences here. I have already shared some of these with hammett off list, but they may help in the design of the next version of dynamic proxy (and definitely bring up discussion points).
The goal of truly supporting generics is the ability to reuse the generic proxy. This goal is not easily realized, and I am beginning to question whether or not it is even worthwhile to create generic proxies.
Interceptors:
In order to have a functional generic dynamic proxy system I need to support generic aware point cuts.
Foo<T>.SomeMethod .. applies to all proxy instances
Foo<SomeClass>.SomeMethod .. only applies to proxy instances where T=SomeClass
Foo<T>.SomeMethod where T is ISomething .. dynamic application where T implements ISomething
This makes generation of generic proxies nasty at best. If we want to build a reusable proxy we have to build a superset of all defined behavior for any derived versions then conditionally not do anything at those interception points. We can move this behavior out to our interceptor cache ( i.e. simply pass a better context and return null representing no action to be taken) but this is still placing code into our proxies that we know for a fact will never be used.
ex:
an after interceptor defined only on Foo<int> must be placed on Foo<T> and checked in an if for all other classes .. This would allow us to reuse the generic proxy but has a trade off of performance for the other proxies.
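A rough sketch of how the three kinds of point cut could be checked with reflection. PointcutMatcher and its method names are hypothetical (they are not from my framework or from Dynamic Proxy), and the constrained case only looks at the first type argument to keep things short.

```csharp
using System;
using System.Collections.Generic;

public static class PointcutMatcher
{
    // Foo<T>.SomeMethod: matches every closed type built from the open definition.
    public static bool MatchesOpen(Type openDefinition, Type candidate)
    {
        return candidate.IsGenericType
            && candidate.GetGenericTypeDefinition() == openDefinition;
    }

    // Foo<SomeClass>.SomeMethod: matches only that exact closed type.
    public static bool MatchesClosed(Type closedType, Type candidate)
    {
        return candidate == closedType;
    }

    // Foo<T>.SomeMethod where T is ISomething: matches when the first type
    // argument is assignable to the required type.
    public static bool MatchesConstrained(Type openDefinition, Type required, Type candidate)
    {
        return MatchesOpen(openDefinition, candidate)
            && required.IsAssignableFrom(candidate.GetGenericArguments()[0]);
    }

    public static void Main()
    {
        Console.WriteLine(MatchesOpen(typeof(List<>), typeof(List<int>)));            // True
        Console.WriteLine(MatchesClosed(typeof(List<int>), typeof(List<string>)));    // False
        Console.WriteLine(MatchesConstrained(typeof(List<>), typeof(IComparable),
                                             typeof(List<string>)));                  // True
    }
}
```

A reusable Foo<T> proxy would have to run the second and third checks at every interception point, which is exactly the conditional code described above.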
This problem becomes compounded when dealing with mixins as someone could define a mixin only to apply to Foo<SomeClass> and not to Foo<T>. As such our previous solution of managing this in an interceptor cache becomes invalid so we are forced to create separate proxies for closed types in many cases. This will alleviate the problem of having garbage interceptor code but now we are also losing the ability to reuse our generic proxy.
This also adds a level of complexity to the cache. When given a Foo<SomeClass> you would first look in the cache for a Foo<SomeClass> .. if Foo<SomeClass> did not exist you would then have to analyze the metadata defining any aspects for Foo<SomeClass> to determine whether you could use Foo<T> instead or needed to generate a specific Foo<SomeClass> proxy. Provided we only ever _ADD_ behavior to a proxy, we could reuse the open type proxies by inheriting from them for the more specific proxy, with the inherited proxy adding further functionality to the base proxy ... This becomes interesting for interceptors, as the closed definition may or may not add interception points, meaning we have to inspect the metadata heavily in order to determine whether we actually need to create a subclass or whether we are simply receiving different interceptors at the same locations (an easy operation but kind of annoying).
Of course to do things right you would want an operation that could also REMOVE an interception. Example: I want to add an interceptor on all List<T> except for List<MyUseVeryOftenClass> ... this operation obviously makes inheritance a bit more tricky.
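The lookup order described above could be sketched roughly like this. ProxyCache is a hypothetical class, the strings stand in for generated proxy types, and the "needs its own proxy" set stands in for the metadata analysis, which is the hard part and is not shown.

```csharp
using System;
using System.Collections.Generic;

public class ProxyCache
{
    private readonly Dictionary<Type, string> m_Proxies = new Dictionary<Type, string>();
    // Closed types whose aspect metadata differs from the open definition.
    private readonly HashSet<Type> m_NeedsOwnProxy = new HashSet<Type>();

    public void Register(Type subject, string proxyName) { m_Proxies[subject] = proxyName; }
    public void RequireOwnProxy(Type closedType) { m_NeedsOwnProxy.Add(closedType); }

    // Returns the proxy to use, or null when a new one has to be generated.
    public string Resolve(Type subject)
    {
        string proxy;
        if (m_Proxies.TryGetValue(subject, out proxy))
            return proxy; // exact hit on the closed (or non-generic) type
        if (subject.IsGenericType
            && !m_NeedsOwnProxy.Contains(subject)
            && m_Proxies.TryGetValue(subject.GetGenericTypeDefinition(), out proxy))
            return proxy; // nothing specific defined, reuse the open proxy
        return null;
    }

    public static void Main()
    {
        ProxyCache cache = new ProxyCache();
        cache.Register(typeof(List<>), "ListProxy<T>");
        cache.RequireOwnProxy(typeof(List<string>));
        Console.WriteLine(cache.Resolve(typeof(List<int>)));            // ListProxy<T>
        Console.WriteLine(cache.Resolve(typeof(List<string>)) == null); // True
    }
}
```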
Another issue I have come across in dealing with generic mixins is the following situation.
public class A<T,V> {}
public interface B<T,V>{}
public class BImplementor<T,V> : B<T,V> {}
I want to mix in B with A through BImplementor .. when I go to generate my proxy there are two related issues.
1) Are T,V pointing to the same thing? :)
2) If not, what should B's T,V be... I need some way of being given this metadata
Say they have defined .. (in some fictitious language that I will use throughout this)
Class A<T,V> mixin B, BImplementor<A, B>
My proxy would need to be defined something similar to the following ..
public class AProxy <T,V,A,B> : B<A,B> {
}
because the T,V does not match up .. when I go to create an instance of this proxy I need to know what to put into A and B as they will not be provided. One could quite easily force a closed definition in the aspect definition i.e.
Class A<T,V> mixin B, BImplementor<int, double>
which would help many of these issues
The flip side of this is if we want them to match up (i.e. we want to reuse the same parameters of the original class)
public class AProxy<T,V> : B<T,V> {
}
perhaps an aspect definition similar to
class A<T,V> mixin B, BImplementor<T,V>
But there should also be a fair amount of error handling to ensure that all values of T and V are in fact valid for A and B of the related interface
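One way that kind of check could look, as a sketch only: Type.MakeGenericType throws an ArgumentException when the supplied arguments do not fit the type parameters, so it can be used to test a pairing before a proxy is generated. IMixin<T> and MixinValidator are hypothetical names for the example.

```csharp
using System;

// Hypothetical mixin interface whose type parameter is constrained.
public interface IMixin<T> where T : struct { }

public static class MixinValidator
{
    // True when the supplied arguments can close the open generic definition.
    public static bool CanClose(Type openDefinition, params Type[] typeArguments)
    {
        try
        {
            openDefinition.MakeGenericType(typeArguments);
            return true;
        }
        catch (ArgumentException)
        {
            // Wrong number of arguments or a violated constraint.
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CanClose(typeof(IMixin<>), typeof(int)));    // True
        Console.WriteLine(CanClose(typeof(IMixin<>), typeof(string))); // False
    }
}
```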
I guess the real point I am trying to make is that there are all sorts of fun conditions where we end up just using a closed proxy anyways .. How often will we really get to use an open proxy, are the times that we can use them worth all of the complexity? Similar behaviors could have been had in 1.x by allowing inheritance within definitions (and inheriting proxies) but I have yet to see an implementation that did this .. ex:
Class A mixin B, BImplementor
applies not only to A but all classes that inherit from A
In my particular case I have even further problems because I support multiple non-pure base class aggregation with a pattern similar to this
http://www.codeproject.com/csharp/smip.asp . The additional problem here is that while A<T> and B<T> may have no public fields (I therefore inherit from A<T>) .. I may run into a point where B<T> ends up having a public field later on (as such I would need to inherit from B<T> and aggregate A<T> in order to support the public fields)
Just some food for thought :)
Cheers,
Greg
Earlier I posted (and deleted) about the Queue<T> class not implementing ICollection<T>. From some research, this is by design.
From: http://msdn2.microsoft.com/en-US/library/92t2ye13.aspx
"Some collections that limit access to their elements, like the Queue class and the Stack class, directly implement the ICollection interface."
If you look, the interfaces are also very different from each other in what they include. The generic one includes methods such as ... Add, Remove, and Clear which do not exist on the non-generic ICollection.
So conclusion .. Queue is correct but I have to say that having the generic and non-generic ICollections that represent completely different things is a bit confusing at best :-/
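A quick way to see this for yourself with reflection (the helper name is mine):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class QueueInterfacesDemo
{
    // True when 'type' can be used wherever 'iface' is expected.
    public static bool Implements(Type type, Type iface)
    {
        return iface.IsAssignableFrom(type);
    }

    public static void Main()
    {
        Console.WriteLine(Implements(typeof(Queue<int>), typeof(ICollection)));      // True
        Console.WriteLine(Implements(typeof(Queue<int>), typeof(ICollection<int>))); // False
        Console.WriteLine(Implements(typeof(List<int>), typeof(ICollection<int>)));  // True
    }
}
```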
Be very careful when using Array.Sort in 2.0. I had posted a bug report about this a while ago http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=62029e14-2d0b-4250-a163-1583034db250
The behavior observed was originally in ArrayList (which uses Array.Sort internally), so keep in mind that this applies to it as well.
Array.Sort(Items, 0, Items.Length, Comparer.Default); //takes 1 minute
Array.Sort(Items2, 0, Items.Length, null); //takes 250 ms
Items and Items2 are both clones of the same object[]
What is tricky about this code is that the second call does not actually call Array.Sort .. it calls Array.Sort<object>
First call:
IL_0083: ldsfld class [mscorlib]System.Collections.Comparer [mscorlib]System.Collections.Comparer::Default
IL_0088: call void [mscorlib]System.Array::Sort(class [mscorlib]System.Array, int32, int32, class [mscorlib]System.Collections.IComparer)
Second Call:
IL_00c9: ldnull
IL_00ca: call void [mscorlib]System.Array::Sort<object>(!!0[], int32, int32, class [mscorlib]System.Collections.Generic.IComparer`1<!!0>)
Ah .. :)
Basically the issue is that there are 2 distinct sorting implementations .. one that is Array.Sort and one that is Array.Sort<T> .. they do not use the same algorithm. From what I understand some changes were made in how pivots are chosen.
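To see the overload resolution on its own, here is a small sketch. Pick is a hypothetical pair of methods with the same parameter shapes as the two sort overloads; they only report which one the compiler chose. The last call shows that casting the null to the non-generic IComparer is one way to get back to the non-generic overload.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class OverloadDemo
{
    // Same parameter shape as the non-generic sort overload.
    public static string Pick(Array array, int index, int length, IComparer comparer)
    {
        return "non-generic";
    }

    // Same parameter shape as the generic sort overload.
    public static string Pick<T>(T[] array, int index, int length, IComparer<T> comparer)
    {
        return "generic";
    }

    public static void Main()
    {
        object[] items = { 3, 1, 2 };
        Console.WriteLine(Pick(items, 0, items.Length, Comparer.Default)); // non-generic
        Console.WriteLine(Pick(items, 0, items.Length, null));             // generic
        Console.WriteLine(Pick(items, 0, items.Length, (IComparer)null));  // non-generic
    }
}
```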
Hoping that this works…
In the creation of my dynamic proxy I ran into some “interesting” fringe conditions....
I started off, as all do, needing a very simple interceptor generator; this is btw very easily done if anyone is thinking about attempting it. I later decided to add mixin support, which was a bit more interesting. That said, let's get into some background information to help explain the issues.
Mixins, for those who are not aware, involve the dynamic aggregation (either at compile, link, or runtime) of multiple objects. When I first did mixins I only supported interface/implementor pairs (I'll explain why after the example). Here's a basic hand done example of what occurs when we are implementing an interface/implementor pair.
public class BasicClass {
public virtual int Method() {
Console.WriteLine("BasicClass:Method");
return -1;
}
public BasicClass() {}
}
public interface IBar {
void Go();
}
Our implementer
public class BarImplementer : IBar {
public void Go() {
Console.WriteLine("BarImplementer::Go");
}
}
and finally the aggregation class that would be generated to show the “mixin“ behavior
//inherit from our subject and add the interface
public class BasicClassWithIBarProxy : BasicClass, IBar {
private BasicClass m_Subject; //encapsulate our subject
private BarImplementer m_Bar; //encapsulate our implementer
//override method and broker the call to our subject.
public override int Method() {
Console.WriteLine("BasicClassProxy:Method");
return m_Subject.Method();
}
public void Go() { //needed for IBar
m_Bar.Go();
}
//allow our subject and implementer to be given to us upon construction
public BasicClassWithIBarProxy(BasicClass _Subject, BarImplementer _BarImplementer) {
m_Subject = _Subject;
m_Bar = _BarImplementer;
}
}
To make this generic one could say that my dynamic proxy generation went through the following steps
- Iterate through interface implementor pairs
- Add interface to class declaration
- Add parameter to constructor
- Encapsulate the class
- Iterate through the methods of the interface adding method redirection to encapsulated implementor
- Iterate through each event creating a quick handler that bubbles the event
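The steps above can be sketched roughly as follows. This is not my actual generator: it emits C# source text purely so the result is readable, it handles a single interface/implementor pair, and it only redirects parameterless void methods (events and interception are left out). ProxySourceSketch and Generate are names made up for the example.

```csharp
using System;
using System.Reflection;
using System.Text;

public interface IBar { void Go(); }
public class BarImplementer : IBar { public void Go() { Console.WriteLine("BarImplementer::Go"); } }
public class BasicClass { }

public static class ProxySourceSketch
{
    // Emits source for a proxy of 'subject' that mixes in one interface/implementor pair.
    public static string Generate(Type subject, Type iface, Type implementor)
    {
        string name = subject.Name + "With" + iface.Name + "Proxy";
        string field = "m_" + iface.Name;
        StringBuilder sb = new StringBuilder();
        // add the interface to the class declaration
        sb.AppendLine("public class " + name + " : " + subject.Name + ", " + iface.Name + " {");
        // encapsulate the implementor
        sb.AppendLine("private " + implementor.Name + " " + field + ";");
        // add a parameter to the constructor
        sb.AppendLine("public " + name + "(" + implementor.Name + " implementor) { "
            + field + " = implementor; }");
        // redirect each interface method to the encapsulated implementor
        foreach (MethodInfo method in iface.GetMethods())
        {
            if (method.ReturnType != typeof(void) || method.GetParameters().Length != 0)
                continue; // the sketch skips anything with parameters or a return value
            sb.AppendLine("public void " + method.Name + "() { "
                + field + "." + method.Name + "(); }");
        }
        sb.AppendLine("}");
        return sb.ToString();
    }

    public static void Main()
    {
        Console.Write(Generate(typeof(BasicClass), typeof(IBar), typeof(BarImplementer)));
    }
}
```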
As one can see, the generated class is simply a proxy of BasicClass that also aggregates IBar, passing the method calls off to an implementor object that knows how to handle the calls received for IBar. You will notice that in this case I am not passing context. I did this for simplicity of the example; the real server does pass context information to the implementor. This is a wonderful way of performing my aggregations, but as I knew would happen, I ran into a case where the code that I needed to aggregate was not mine, nor did it support an interface. At this point I decided to support multiple base class aggregation.
I am quite sure the astute reader just thought “wait this is a single inheritance based environment“. Well actually the very astute reader probably knows that there are numerous simulated multiple inheritance patterns available :) I chose to use David Esparza-Guerrero's pattern listed here http://www.codeproject.com/csharp/smip.asp to do my simulated multiple inheritance. To make a long read short (though I recommend reading it)... it uses implicit operations to allow for the simulation. Note that this aggregation based method only supports public contract MI not true MI within the object itself (although with a bit of hacking with reflections to avoid scope issues ... actually just no ... don't do that :))
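For those who do not want to read the whole article, here is a rough sketch of the general idea as I understand it, not code taken from the article. Engine, Radio and Car are made-up classes: Car aggregates the two parts and converts implicitly to either, so it can be handed to anything expecting one of them.

```csharp
using System;

public class Engine { public string Start() { return "engine started"; } }
public class Radio { public string Play() { return "radio playing"; } }

// Aggregates both parts and converts implicitly to either one.
public class Car
{
    private readonly Engine m_Engine = new Engine();
    private readonly Radio m_Radio = new Radio();

    public static implicit operator Engine(Car car) { return car.m_Engine; }
    public static implicit operator Radio(Car car) { return car.m_Radio; }
}

public static class SimulatedInheritanceDemo
{
    public static string StartIt(Engine engine) { return engine.Start(); }
    public static string PlayIt(Radio radio) { return radio.Play(); }

    public static void Main()
    {
        Car car = new Car();
        Console.WriteLine(StartIt(car)); // Car converts to Engine
        Console.WriteLine(PlayIt(car));  // Car converts to Radio
    }
}
```

Note that the conversion hands out the aggregated object, so only its public contract is reachable this way, which is the limitation mentioned above.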
Using such a methodology becomes a _bit_ more interesting as we are not assured to be successful every time! The major culprit here is public variables (the astute reader may also point out that non-virtual methods are a problem as well). Public variables are a nightmare because there is no way of intercepting them to pass the call through to the subject object. While it's quite easy to say “use properties“, it is more difficult in practice when you do not necessarily know what you will be operating on, nor will you control everything you operate on. Due to this I decided to take a best stab at it.
The first bit that I added was de