Saturday, April 21, 2007

this app can break ... Notepad Bug challenge ...

Jon Bach is another brilliant thinker, speaker and coach in the area of software testing. Along with his "illustrious" and "one and only" brother James Bach, Jon has been doing some pioneering work in the area of exploratory testing and the human aspect of testing.

While I was reading Jon's interview by another famous "Michael", the braidy tester Michael Hunter, I bumped into an interesting bug that Jon mentions. It happens to be one of Jon's most interesting bugs.

In Jon's words: "Run notepad.exe and type "this app can break". Save it as test.txt, close it, then open it again. You might see that it now shows all rectangles, as if the font couldn’t be loaded."

There are other mentions of this bug here ("Bush hid the facts").

I bet as you read this, you will be trying to see if the bug appears --- Bingo ... what an interesting bug ...

I did some very initial investigation around "nailing down" the bug. I am posting my notes as they are -- I cannot wait until I have completely nailed down the bug. It is too tempting for me to blog about it....

Successful cases
===================

this pap can break
this app can creak
this app nan break
thin app can break
shin app can break
shis app can break

this app can breal

aaaa aaa aaa aaaaa
bbbb bbb bbb bbbbb
aaaa bbb ccc ddddd

1111 111 111 11111 (interesting)


Unsuccessful ones
=====================

1a11 2b2 3c3 4ddd4
1aaa 2bb 3cc 4dddd
this cap can break (interesting)
THIS APP CAN BREAK (interesting too)

Few other observations
======================

Does not happen a second time in the same file.
Does not happen when a file has shown the garbage once and you then create a copy of it, open the copy and check.

Investigation continues …..

What are your takes?

Finally, when you give up and run out of ideas about what is happening -- see here, here (good ones) and here... there are some interesting discussions and paths of investigation …. But don’t read (cheat) until you have given it your best try … I would say using Google would also amount to cheating …

BTW -- have you seen the following testing challenges put up in the blog world in recent days?

1. Elisabeth Hendrickson’s Triangle challenge
2. Matthew Heusser’s challenge


Shrini

8 comments:

alan said...

How would you go about isolating the root cause of this bug (without using a search engine)?

I may first try to determine the shortest string that demonstrates the problem. A better first step may be to determine if a longer string can cause the problem. I guess you could also play with numbers, punctuation, and spacing. It would also be smart to try different patterns (e.g. "app this can break").

At some point, I would hope that most testers realize that there's something "interesting" in the way notepad reads short text files.

A possible next question would be "is this bug specific to notepad?". Create the same file, rename it to .htm and load it in IE...it displays correctly. What about another text editor? I created the same file in the popular (and wonderful) notepad2 (http://sourceforge.net/projects/notepad2)...and it has the same bug.

This tells me that there's something wrong in the way Windows loads small text files.

Now, I've learned something that will influence where I look next. I know that the CreateFile function is used to open file handles (I cheated and had some domain knowledge). I fired up a debugger (ntsd.exe ships free with Windows or from microsoft.com), and set a breakpoint on CreateFile. I hit the breakpoint, stepped through the source a bit (symbol files for most MS products are on http://msdl.microsoft.com/download/symbols) and (eventually) saw a call to IsTextUnicode. I saw many other calls, but this was the first thing I saw that I thought could be the culprit.

I have some background in fonts and globalization, so maybe I had an advantage. Anyway, at this point, I loaded the "bad" file again (under the debugger), set a breakpoint on IsTextUnicode and saw that it returns true when fed a file with the "this app can break" text. I quickly loaded a non-failure case, "this app can't break", and saw that IsTextUnicode returned false.

An educated guess tells me that IsTextUnicode may be the culprit. A review of the documentation confirms that it doesn't always distinguish small samples of text, but that this behavior is expected (for better or for worse).

At this point, I know that the IsTextUnicode function thinks that the "bad" text is Unicode and uses this information to display the text incorrectly. I poked around inside IsTextUnicode for a few minutes, but didn't learn anything that wasn't in the documentation on MSDN.
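The same check can be repeated without a debugger by calling IsTextUnicode directly. Below is a minimal Python sketch, assuming a Windows machine; the helper name is_text_unicode is made up for illustration, and passing NULL as the third argument asks the API to apply all of its tests. It only reports what the API says on the machine it runs on, so results may differ between Windows versions.

```python
import ctypes
import sys


def is_text_unicode(data: bytes):
    """Ask the Win32 IsTextUnicode API about a buffer (hypothetical helper).

    Returns True or False on Windows, and None on any other platform,
    where the API does not exist.
    """
    if sys.platform != "win32":
        return None
    advapi32 = ctypes.WinDLL("advapi32")
    func = advapi32.IsTextUnicode
    # BOOL IsTextUnicode(const VOID *lpv, int iSize, LPINT lpiResult)
    func.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.POINTER(ctypes.c_int)]
    func.restype = ctypes.c_int
    buf = ctypes.create_string_buffer(data, len(data))
    # A NULL lpiResult means "use all available tests".
    return bool(func(buf, len(data), None))


if __name__ == "__main__":
    for sample in (b"this app can break", b"this app can't break"):
        print(sample, is_text_unicode(sample))
```

Feeding it the strings from the "Successful cases" and "Unsuccessful ones" lists is one way to explore the pattern without saving and reopening a file each time.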

Shrini Kulkarni said...

Alan,

You have a great reply. Thanks for posting it here. Your approach demonstrates how a tester can go step by step (isolating the possible culprits), thereby narrowing the search space. I also like the way you structured the reply.

My point in asking for isolation of the issue without a search engine was to push the tester to use his own brain as much as possible before shouting for more ideas ....

Keep coming back to my blog

Thanks
Shrini

chris said...

I tried this, and found that, instead of rectangles, I got Chinese characters. When I used Altavista, I got a nonsense phrase: "桴 □the worried thoughts ash 挠 □□knocks □".

Could there be a Unicode translation problem?

alan said...

It's not real Unicode text - it's gibberish. If you get boxes instead of the Chinese gibberish, it just means that the font you are using in notepad doesn't contain those glyphs.

Because the IsTextUnicode function falsely determines the short string to be Unicode, notepad treats each pair of bytes in the string as a single Unicode code point.

It's not a translation of anything, it's just misinterpreted data.
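The misinterpretation can be reproduced outside notepad. Here is a small Python sketch, assuming the bytes get read as UTF-16 little-endian (the format mentioned elsewhere in these comments); the function name misread_as_utf16le is made up for illustration.

```python
def misread_as_utf16le(text: str) -> str:
    """Encode text as ASCII, then decode the same bytes as UTF-16LE.

    Every pair of ASCII bytes becomes one 16-bit code unit, which is
    what a reader does when it wrongly assumes the file is UTF-16.
    """
    raw = text.encode("ascii")
    # errors="replace" keeps odd-length input from raising an exception.
    return raw.decode("utf-16-le", errors="replace")


if __name__ == "__main__":
    garbled = misread_as_utf16le("this app can break")
    for ch in garbled:
        print(f"U+{ord(ch):04X}", ch)
```

The 18 ASCII bytes turn into 9 code points. The first pair, "th", is bytes 74 68, which read little-endian give U+6874, the same character chris saw at the start of his result.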

Ben Simo said...

Shrini,

I created two files: one that has the bug, and one that does not.

I then opened the files in other editors and discovered that the file that does not display properly in notepad is standard ASCII. The file that does display properly in notepad is not standard ASCII: it has two bytes before the text starts and each letter of the text is followed by a null byte. A Google search for the two-byte header sequence revealed that this sequence identifies the following text as stored in UTF-16 (little-endian).
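To make the two layouts concrete, here is a hedged Python sketch that builds both byte sequences and looks for a byte-order mark at the start. The function name describe_bom is an assumption for illustration; the BOM constants come from Python's standard codecs module, and the UTF-16 little-endian mark is the two bytes FF FE.

```python
import codecs


def describe_bom(raw: bytes) -> str:
    """Report which byte-order mark, if any, starts the buffer."""
    if raw.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16 little-endian with BOM"
    if raw.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16 big-endian with BOM"
    return "no BOM"


text = "this app can break"
plain = text.encode("ascii")                               # one byte per letter
utf16 = codecs.BOM_UTF16_LE + text.encode("utf-16-le")     # header + letter/null pairs

if __name__ == "__main__":
    print(describe_bom(plain), plain[:4])
    print(describe_bom(utf16), utf16[:6])
```

A file without a BOM gives the reader nothing to go on, so it has to guess the encoding from the content.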

This shows me that the problem is not only in opening the file, but in saving the file. It appears that the contents of the text cause notepad to save text in different formats. The text that does not display properly in notepad displays properly in some of the other editors I tried while the text that does display in notepad does not display in some other editors.

As a user of notepad, I always assumed that it saved normal text in standard ASCII and not UTF8 or UTF16. My assumption was wrong.

So now I ask the question: How should notepad save text?

Ben
QualityFrog.com

alan said...

Notepad saves as Unicode when it thinks the text is unicode.

Saving a file containing the "this app can break" text results in an ascii text file on disk containing this text. However, if the file is opened (exposing the bug), and saved, it gets saved as unicode.

I can't use PRE tags, so forgive the unformatted text below. bat.txt is a file created in notepad with the text above.

C:\temp2>debug bat.txt
-d
1385:0100 74 68 69 73 20 61 70 70-20 63 61 6E 20 62 72 65 this app can bre
1385:0110 61 6B 00 00 00 00 00 00-00 00 00 00 34 00 74 13 ak..........4.t.

Note that there is nothing in the file except for the ascii text characters at this point.
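For anyone without debug.exe at hand, a similar dump can be produced with a few lines of Python. This is only a sketch: the hex_dump function is made up for illustration, and the starting offset of 0x100 is chosen to mimic the listing above.

```python
def hex_dump(raw: bytes, base: int = 0x100) -> list:
    """Format bytes 16 per line: offset, hex values, printable text."""
    lines = []
    for offset in range(0, len(raw), 16):
        chunk = raw[offset:offset + 16]
        left = " ".join(f"{b:02X}" for b in chunk[:8])
        right = " ".join(f"{b:02X}" for b in chunk[8:])
        hex_part = left + ("-" + right if right else "")
        # Non-printable bytes are shown as dots, as debug does.
        shown = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{base + offset:04X} {hex_part} {shown}")
    return lines


if __name__ == "__main__":
    for line in hex_dump(b"this app can break"):
        print(line)
```

Unlike debug, this prints only the bytes it is given, so the trailing memory contents seen after "ak" in the listing above do not appear.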

Gigin Mon said...

Some more observations about the note pad bug are added..

Rectangles are displayed as question marks!

Steps..

1. Run notepad.exe

2. Type "this app can break" in the newly opened notepad.

3. Save the file as "test.txt" and close it.

4. Open the file and copy the rectangles (which are displayed as the content).

5. Open another notepad and paste the same.

6. Save the file as "test1.txt", click 'OK' for the pop up and close it.

7. Open the new file “test1.txt”…Question marks are displayed... :)



Rectangles are displayed as rectangles...

Steps..

1. Run notepad.exe

2. Type "this app can break".

3. Save the file as "test.txt".

4. Open the file and copy the rectangles (which are displayed as the content) and paste the same as the next line.

5. Save the file and close it.

6. Open the file. Rectangles are displayed in both the lines...:)

Arul Mariappan said...

Hi Shrini,

Some more sentences which yield the same result (all "squares") if you type them in notepad.exe, save, and open the file again.

MS app can break
java app can break
VB app can break
.NET app can break
your app can break
mine app can break
bill app can break
gate app can break
this mpp can break
bank app can break
PS app can break
U.S. app can break

I am still analysing this notepad bug, and will keep posting my findings and observations on your blog.

Thanks,
Arul