[cffi-devel] how to treat expected failures in tests

Anton Vodonosov avodonosov at yandex.ru
Fri Jan 13 06:52:08 UTC 2012


11.01.2012, 20:31, "Jeffrey Cunningham" <jeffrey at jkcunningham.com>:
> I really have no idea what is common practice in standard Unit Testing
> protocols - it isn't my background (which is mathematics). The only reason
> I suggested the additions is that it is useful information, some of which
> is lost if you don't have all four cases. And in my consulting practice I
> have used all four and seen them in use by others in one form or another
> in most test settings.

Maybe you are right: when a test marked as a "known failure" passes,
we should draw the user's and developer's attention to it instead of
just marking it "OK".

My concern is that I want to keep things as simple as possible.
Support for known failure / unexpected OK would require unifying the
way it is represented across all the testing frameworks used by CL
libraries. Taking into account that some testing frameworks do not
even compile on some lisps, I am considering ways to postpone this
level of result detail until we have a reliable way to deliver the
results.

Also, I display not the status of each individual test, but an
aggregated status of the whole test suite. If all the failures are
"known", the aggregated status will be "known failure". If all the
OKs are unexpected, the aggregated status is "unexpected OK". But if
both known failures and unexpected OKs are present, how do we combine
them? Probably just as "fail", and expect the maintainer to click the
status to open the full library log and find the details there.
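
To illustrate the aggregation rule I have in mind, here is a minimal
sketch in Common Lisp; the status keywords and the AGGREGATE-STATUS
function are made-up names used only for illustration, not actual
code from the test grid:

  (defun aggregate-status (statuses)
    "STATUSES is a list of :OK, :FAIL, :KNOWN-FAIL, :UNEXPECTED-OK keywords.
  Returns a single keyword summarizing the whole test suite."
    (cond
      ;; any plain failure makes the whole suite fail
      ((member :fail statuses) :fail)
      ;; both kinds of surprises present - fall back to :FAIL and let
      ;; the maintainer open the full log for the details
      ((and (member :known-fail statuses)
            (member :unexpected-ok statuses))
       :fail)
      ((member :known-fail statuses) :known-fail)
      ((member :unexpected-ok statuses) :unexpected-ok)
      (t :ok)))

  ;; (aggregate-status '(:ok :known-fail :ok))            => :KNOWN-FAIL
  ;; (aggregate-status '(:ok :known-fail :unexpected-ok)) => :FAIL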

> There are many good descriptions of binary hypothesis testing, here is
> one: [...] from http://cnx.org/content/m11531/latest/

> (the two models in this setting would be something like H1='test
> passes' and H0='test fails')
>

Whether a test fails or passes is not a hypothesis, but a given
measurement - we know the test status from the test suite. I have the
impression you are speaking not about tests marked as "known failure",
but about error handling tests, where we expect particular code to
signal an error and the test verifies that the error is really
signaled. If the error is signaled, the test passes; if it is not
signaled while expected, the test fails. That is a separate question,
which I leave to the test developers.
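
For what it is worth, a minimal sketch of such an error handling test
in plain Common Lisp; EXPECT-ERROR is a made-up helper here, not part
of any particular test framework:

  (defun expect-error (thunk)
    "Call THUNK; return :OK if it signals an ERROR, :FAIL otherwise."
    (handler-case
        (progn (funcall thunk) :fail) ; no error signaled - test fails
      (error () :ok)))                ; expected error signaled - test passes

  ;; (expect-error (lambda () (parse-integer "abc"))) => :OK
  ;; (expect-error (lambda () (parse-integer "42")))  => :FAIL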

If we take test pass/fail as given measurements, the hypothesis pair
a user is interested in is H0: "This library version is broken" and
H1: "I can use this version of the library; it correctly implements
the specified functions".

Another pair of hypotheses, important for a developer, is:
H0: "My recent changes did not break anything" and
H1: "My recent changes introduced new bugs". That's where annotating
the given measurement "test fails" with a "known failure" attribute
helps.

> One might argue that Bayes testing procedures are not appropriate in
> software verification tests but I think this would be short-sighted. 

You are right. QA professionals approach the problem using
statistical methods. I remember a university course about software
reliability that described methods to predict the number of
undetected bugs remaining in a system, the probability of failure
during use of the system, etc.
