
This puzzle appeared in my local paper. I am unable to solve it, but maybe you guys can take a crack at it.

Page 1 Part 1

Page 2 Part 2

They were scanned at a fairly high resolution, so you should be able to see more detail when you zoom in.

Feel free to suggest any tags.

UPDATE: After a short absence, here are the underlined bold characters from the page:

MzJoYWnrZXI1NTdbambaSvbm1vqo

More information (I don't know if this is relevant):

Paper: Morgen Avisen Jyllands-Posten

Date Published: Friday 3rd March 2017

Update:

BMP file of the page (thanks, @MikeThammer)

Update 12/03/2017: According to this Reddit post, this is from the Danish Cyber Defence (although no source is given).

  • I am not sure this will ever be solved here without a text version, especially of the second picture. – Techidiot Mar 03 '17 at 18:37
  • Well, the first page (this appears to be a copy of it) seems to be part of a little byte-code interpreter. The second page looks as if it's base64-encoded or something of the kind, but I personally don't feel like typing it in to see whether it turns out to be a program for that virtual machine :-). – Gareth McCaughan Mar 03 '17 at 18:38
  • I have a PDF of the scan. If somebody has a program that can extract the text from it, that would be awesome. –  Mar 03 '17 at 18:45
  • @Memhave I am actually part of a project right now that is building a program that uses PDFBox to extract text from PDFs... I can tell you that getting all the text cleanly out of PDFs is annoyingly difficult, and you're unlikely to get 100% extraction from any tool. – n_plum Mar 03 '17 at 18:48
  • I transcribed the first few lines of the second page and base64-decoded them. They yield something with clear repeating structure every 32 bits, which is good because the virtual machine has 32-bit instructions (so I shouldn't have called it byte-code). But whichever end the low byte is at, either the second or the fourth instruction seems to be one that doesn't exist (corresponding to a 0 entry in the opcode table). – Gareth McCaughan Mar 03 '17 at 18:54
  • What architecture is the instruction set on the first page for? –  Mar 03 '17 at 19:11
  • But, aside from that, it looks as if the early parts do correspond to instructions for the virtual machine, loading literal values into a register and writing them out. It starts by writing out something beginning with "Wr_ng", where the underscore is actually character 31. (But I suspect there is an error in my transcription, or my brain, and it's actually an "o", which differs from that by only a few bits.) – Gareth McCaughan Mar 03 '17 at 19:18
  • @Memhave A made up one, I think. – Gareth McCaughan Mar 03 '17 at 19:18
  • Ah, and the "wrong" fourth instruction doesn't matter because the second one is actually a jump. (It writes to the program counter.) – Gareth McCaughan Mar 03 '17 at 19:19
  • 3
    "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA." Clearly there's someone trapped in a computer. – Ian MacDonald Mar 03 '17 at 19:23
  • @Silenus I think that is just the other pages showing through in the scan. The paper may be thin enough that you can see a little of the other side. – n_plum Mar 03 '17 at 20:21
  • The architecture seems to be Intel 80x86, @Memhave. It is now linked in the community wiki. – humn Mar 03 '17 at 21:01
  • They are indeed other pages, so it's safe to ignore those (I think). –  Mar 03 '17 at 22:32
  • Are the faint images the opposite side bleeding through in the scan or a watermark on this side? – Dr Xorile Mar 04 '17 at 01:10
  • @DrXorile They are images from other pages bleeding through –  Mar 04 '17 at 09:47
  • Almost all the lines end with a hyphen, and hyphens never appear anywhere other than at line ends (though I didn't check that). The only two lines missing the hyphen are the very last one and the 6th line from the start of the text (I looked through the text for other cases but didn't find any). This is curious, but I guess the missing hyphen on the sixth line might just be a small mistake that slipped past whoever edited and reviewed this. – Victor Stafusa - BozoNaCadeia Mar 04 '17 at 15:48
  • It could be that hyphens are just like any other character, only placed at line breaks to be sneaky. The underlined characters could even come from backspace-underline pairs. For that matter, hyphen-linebreak could be a recurring character pair. – humn Mar 05 '17 at 04:38
  • @humn The VM implementation is for x86, but the actual code is for the VM, which uses its own kinda-RISC-ish instruction set. – Gareth McCaughan Mar 06 '17 at 17:25
  • Please find the original .bmp file here. It's much better than the one scanned from the newspaper: https://fe-ddis.dk/SiteCollectionImages/FE/grafik_og_billeder/3zone-billeder/hacker-opgave2017.bmp – MikeThammer Mar 08 '17 at 20:10
  • Using the information collected here, I wrote an interpreter in C#. The program asks for a password, which will obviously be used to decrypt something contained in the data. I would post the interpreter, but I don't have enough privileges. – David J Mar 10 '17 at 19:41
  • @DavidJ Maybe I can post it if you post a link? Or maybe @VictorStafusa would be nice enough to add it to the community answer. –  Mar 14 '17 at 11:19
  • I think I solved some more of it. I wrote a simple disassembler, and it turned out that the program is an RC4 implementation protecting some content inside the disk file. Part of the key can be guessed from some known bytes of the stream cipher's pseudo-random keystream. – David J Mar 14 '17 at 19:25
  • @DavidJ Post that as an answer. Just say at the start of your answer that it's a partial answer. – Victor Stafusa - BozoNaCadeia Mar 14 '17 at 19:59
  • Sorry, but I don't have enough reputation. I completed the assembler code with the missing instructions here (http://pastebin.com/TChuYF29). I did not test it, because I used some C# code to do the job. Anyway, the idea should be clear, and any bugs should be easy to fix. – David J Mar 14 '17 at 21:04
  • A friend of mine solved this but didn't give me the solution. However, I can confirm it is from the Danish Cyber Defence. I'm new to Puzzling. I have the URL you end up on when you solve the puzzle, and I can post it here (without the solution) if that is allowed :) @Memhave – Bolli Mar 16 '17 at 09:24
  • I'd much prefer to see the process of solving the question as opposed to just the solution. –  Mar 22 '17 at 12:58

3 Answers

8

This is a community wiki non-answer for transcribing all that stuff.

Transcription of the program

Language: Intel x86 assembly language (NASM syntax)

%define REG(r) [REGS + r * 4]
%define PTR(p) [MEM + p]

U5_LE:
    mov ecx, 0x200
    mov edi, MEM
    mov esi, DISK                   ; DISK = page 2 text?
    rep movsb                       ; copy 1st 512 bytes from DISK to MEM
SPIN:
    mov edx, REG(63)                ; fetch instruction  ( REG(63) = program counter )
    mov edx, PTR(edx)               ; edx = 32-bit instruction
    add WORD REG(63), 4
    mov WORD REG(0), 0

    mov ebp, edx
    shr ebp, 21
    and ebp, 77o                    ; ebp = bits 26-21 of inst'n = destination register #
    mov esi, edx
    shr esi, 15
    and esi, 77o                    ; esi = bits 20-15 of instruction
    mov edi, edx
    shr edi, 9
    and edi, 77o                    ; edi = bits 14-9 of instruction

    mov eax, edx
    shr eax, 27                     ; eax = bits 31-27 of instruction = op code
    mov eax, [OP_TABLE + eax * 4]
    jmp eax                         ; execute instruction

OP_TABLE:
    dd OP_LOAD_B, OP_LOAD_H, OP_LOAD_W, 0, OP_STORE_B, OP_STORE_H, OP_STORE_W, \
    0, OP_ADD, OP_MUL, OP_DIV, OP_NOR, 0, 0, 0, 0, OP_MOVI, 0, OP_CMOV, 0, 0,  \
    0, 0, 0, OP_IN, OP_OUT, OP_READ, OP_WRITE, 0, 0, 0, OP_HALT

OP_LOAD_W:
    mov eax, REG(esi)
    add eax, REG(edi)
    mov eax, PTR(eax)
    mov REG(ebp), eax
    jmp SPIN

OP_MUL:
    mov eax, REG(esi)
    mul DWORD REG(edi)
    mov REG(ebp), eax
    jmp SPIN

OP_MOVI:                 ; -- MOVe Immediate (constant) value to register # ebp
    mov eax, edx         ; (edx = 32-bit instruction)
    mov ecx, edx
    shr eax, 5
    and eax, 0xffff      ;  eax =  bits 20-5  of instruction
    and ecx, 37o         ;  ecx =  bits  4-0  of instruction
    shl eax, cl          ;  eax = (bits 20-5) * 2^(bits 4-0)
    mov REG(ebp), eax    ; (ebp =  bits 26-21 of instruction)
    jmp SPIN

OP_CMOV:                 ; -- Conditional MOVe
    mov eax, REG(edi)    ;                      (edi = bits 14-9 of instruction)
    test eax, eax        ; IF  register # edi is not 0
    jz .F
    mov eax, REG(esi)    ;                     (esi = bits 20-15 of instruction)
    mov REG(ebp), eax    ; THEN move value from register # esi to register # ebp
.F:
    jmp SPIN

OP_OUT:
    push DWORD REG(ebp)
    call putchar
    add esp, 4
    jmp SPIN

OP_READ:
    mov ecx, 0x200
    mov esi, REG(esi)    ; (esi = bits 20-15 of instruction)
    shl esi, 9
    lea esi, [DISK + esi]
    mov edi, REG(ebp)
    lea edi, PTR(edi)
    rep movsb            ; read 512-byte DISK block whose number is in register # esi
    jmp SPIN

This program appears to be an emulator for a small virtual machine, but it looks incomplete. Several opcode-table entries are 0, and most of the named entries (OP_LOAD_B, OP_STORE_W, OP_ADD, OP_NOR, OP_IN, OP_HALT and others) have no handler in the listing. I guess that completing the program is part of the challenge.


Transcription of the text

To make it easier to coordinate efforts and double-check the transcription, please transcribe the text in blocks, adding both the image block and the textual transcription below. Preserve the newlines and the hyphens, since we don't know whether they are important. Use <pre></pre> blocks instead of the standard four-space indentation, for two reasons:

  1. They save some space (answers are limited to 30K characters, and I am afraid of hitting that limit).

  2. They allow <kbd></kbd> to be used to mark underlined text.

Also, be careful about characters that are too easy to confuse:

  • l and 1. You can distinguish them because 1 has a tail and l doesn't.

  • o, O and 0. 0 is rectangular, O is diamond-shaped, and o is smaller and round.

  • q and g. g has a small hook at the end of its descender. Be especially careful with a q that sits directly above a letter whose ascender could be mistaken for the hook of a g.

  • 6 and b. 6 is very square; b is rounded.

  • 8 and B. B has a straight left side; 8 does not.

Also, it is important that each part be reviewed by at least one person other than the transcriber, to catch typos in the transcription. Two or more reviewers are better.

First part

[Transcribed by Victor Stafusa, reviewed by humn, second review welcome]

first part

h+ABgKIA4IfCjYkhh+ACo4micpng-
CiCHAAAgzyEHIIcAACDP4A0ghwAA-
IM/hBICHAAAgz+AMIIcA-
ACDPJQAghwAAIM+gDCCHAAAg-
z+EGIIcAACDPIgMghwAAIM8gD-
SCHAAAgzyAMIIcAACDP4QYghwAA
IM/hBiCHAAAgz6AMIIcAACD-
PYA4ghwAAIM9gDiCHAAAgzyAEII-
cAACDPoQAghwAAIM8AAAD4goAA-
ZoKgCmKCwAvhIAsAANAAAABH6oAA-
hyACIs8gAACHIA0gzyAAAIcgDmD-
PIAAAhyANYM8gAACHIAAl-
zyAAAIcgByHPIAAAhyAMoM8gAA-
CHIAwgzyAAAIcgAyLPIAAAhyAAJc-
8gAACHIAygzyAAAIcgByHP-
IAAAhyAHIc8gAACHIA3gzyAAAIcg-
ByHPIAAAhyAEIM8gAACHIA-
ChzyAAAPgAAABHv4AAhyAAIl88-
8gBHvvIAhyAAIEe+8gCHwAA0g-
sAAYYKAACCCoAApXyooAELrcgC-
HIAAgQuvyAIcgAGJHP/IAl/yu-
AIcgDaJH/IAA0qoAAIcgACBCinIA-
hyAAKUKq8gCHIAuiR/yAAFqA-
AABSgCgAhyADpEc8gACX/KgA-
hyAUYkf8gACHIAggzyAAAIcgAm-
LPIAAAhyAKoM8gAACHIAAlzyAAA-
IcgDaDPIAAAhyAMIM8gAACH-
IANizyAAAIcgBmHPIAAAhy-
AOoM8gAACHIAbhzyAAAIcgDGD-
PIAAAhyADos8gAACHIA0gzyAAAIcg-
DeDPIAAAhyAG4c8gAA-
CHIAAlzyAAAIcgAKPPIAAAhyACIs-
8gAACHIAkgzyAAAIcgBWHPIAAAhy-
AF-

Second part

TODO: Please transcribe it.

Third part

TODO: Please transcribe it.

...

Add the remaining parts here...

Last part

[Transcribed by Silenus, reviewed by Victor Stafusa, second review welcome]

last part

NWdNu+qgtKoyqoHspnuURTOjUVw-
Gy75nE7fcM0doZWLOI-
u50d+mbS46Z1+AqfXaA6/-
rT9AqdvNt4-
iA0yYRQwidXoRtqSzankRtG71Nn-
6vikbXTKSvk8ZrbdjYCxVoCDVpx-
pXRxT2hGC8OPhLk9537gU4-
rUoirNDzkuytvIQF+sH-
belshtabXTuWap0g2V+ZhA2w-
g1vZi78Fld/uP9VsDPNzgkD-
3jtXPPe0Xi-
l1AzDJN/7VWSKI9AfUXKrEx2Rl-
R3IR+18B7wsoRwCn9qbExo3meD-
vCER2NVcM81GNXcZV5JORUnE-
rVEtQJqwU4Rr1doiibD/JWu8n-
zLiYMG25C0QDQCGAx1FuhXGmqEA-
3Dy6fbgr6D+aUc2bUCviBv69-
UiHl+l7N7kkkdWKl5vwok8suTh/-
zgYXJNvQuFf19WtE6VIdabzOBc-
c1PwR3u0k/0eoglxqYqcD5L-
INh2iTzrMbW0SMGm5lhYoPXelf-
wsFbLQFBmR8Oz8Fwe0h/UbAw1w-
9JddVlDmHr0NDd+fbPL/N6Qe-
DPMxp3W2x-
FnKOPagBnmpxjOyeECNor/-
UhfbC5HCS0ojdS+ShclM/-
Y9rPb0bUC7zj7wdSG3BusbNa-
chJorOjDP9RBnGmBmPr9kxFKQqg-
dQfSvlGu+Z3H5HO/5OEKbDJps-
lPQzB5cWbFacYkKqNrvyCZQi-
LsM+omRCI3gM

Also, some of the characters on page 2 are underlined (see here). The underlined characters are (at least) MzJYWNrZXI1NT6amt6a5vbm1vqo. Perhaps these are the only characters that are important and the rest of the string can be ignored.

humn
  • I think this is a good lead. I will see if I can find something to output the second page as text, while leaving the question open for a (hopefully) complete solution. –  Mar 04 '17 at 00:43
  • I will take a closer look tomorrow and list all the underlined and bold characters. –  Mar 04 '17 at 01:06
  • @Memhave Is the underlining just photocopy bleed-through from the next page? – Lawrence Mar 04 '17 at 13:30
  • @Lawrence Looking at the scanned photocopy, no. The underlines are clearly in the text and clearly not part of what was on the next page. Also, the underlined letters seem to be bold. – Victor Stafusa - BozoNaCadeia Mar 04 '17 at 13:32
  • @VictorStafusa You're right. My mistake. – Lawrence Mar 04 '17 at 13:35
  • @Silenus I think that the newlines might be important. Look at the sequence of A's at the end of the first column. Some lines clearly break much earlier than they would otherwise need to. – Victor Stafusa - BozoNaCadeia Mar 04 '17 at 13:44
  • @Lawrence Victor is correct, they are not bleed-through: they are bold and underlined. I should be able to get them in a few hours. –  Mar 04 '17 at 15:16
  • The underlined characters look suspiciously like a Discord token. However, they are missing the - and . characters. – bb216b3acfd8f72cbc8f899d4d6963 Mar 04 '17 at 15:17
  • Hyphens could mean it's part of the same string, while a space denotes a separator. –  Mar 04 '17 at 17:41
5

When I looked at the underlined characters in the base64-encoded part, I got the sequence: MzJoYWNrZXI1NTd6amt6aS5vbmlvbgo.

This base64-decodes to 32hacker557zjkzi.onion, the address of a Tor (.onion) website.

The assembly code and the whole base64 string are on that site. The assembly code is in a file named u5emu.asm and the string is in a file named disk.img.b64.

So maybe the emulator/VM is some sort of Ultra 5 or SPARC thingy that can mount the .img file?

johndon
4

I started reversing the VM; might as well post what I have so far. :)

  • The VM is little-endian.
  • The VM uses 32-bit fixed-length instructions. The opcode is stored in the high 5 bits (loadw, mul, movi, etc.). The remaining 27 bits vary depending on the opcode.
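  • There are 64 registers. The last register (reg[63]) is a 16-bit program counter. Its low 16 bits are incremented by 4 right after an instruction is fetched, before that instruction executes, so a jump is simply a write to reg[63].
  • The low 16 bits of reg[0] are also cleared before every instruction (mov WORD REG(0), 0), so reg[0] normally reads as zero.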
  • Although instructions are always 4 bytes, they don't need to be 4-byte aligned.

Let us now document each instruction. In our notation, r6 means a register (identified by a 6-bit code) and immX means an X-bit immediate constant.

As Victor Stafusa has already mentioned, the x86 code provided in Image 1 is incomplete and only implements 6 instructions: loadw, mul, movi, cmov, out, and read. The rest need to be figured out from the base64-encoded program in Image 2.

OP_LOAD_W

Encoding:

  • Bits [31:27]: Opcode (00010)
  • Bits [26:21]: Dest (r6)
  • Bits [20:15]: X (r6)
  • Bits [14:9]: Y (r6)
  • Bits [8:0]: Unused

Operation:

Dest = mem[X + Y]; // 32-bit little-endian load from byte address X + Y

OP_MUL

Encoding:

  • Bits [31:27]: Opcode (01001)
  • Bits [26:21]: Dest (r6)
  • Bits [20:15]: X (r6)
  • Bits [14:9]: Y (r6)
  • Bits [8:0]: Unused

Operation:

Dest = X * Y; // Unsigned multiplication; only the low 32 bits of the product are kept.

OP_MOVI

Encoding:

  • Bits [31:27]: Opcode (10000)
  • Bits [26:21]: Dest (r6)
  • Bits [20:5]: X (imm16)
  • Bits [4:0]: Y (imm5)

Operation:

Dest = X << Y;

OP_CMOV

Encoding:

  • Bits [31:27]: Opcode (10010)
  • Bits [26:21]: Dest (r6)
  • Bits [20:15]: Src (r6)
  • Bits [14:9]: X (r6)
  • Bits [8:0]: Unused

Operation:

if (X != 0) {
    Dest = Src;
}

OP_OUT

Encoding:

  • Bits [31:27]: Opcode (11001)
  • Bits [26:21]: Char (r6)
  • Bits [20:0]: Unused

Operation:

putchar(Char);

OP_READ

Encoding:

  • Bits [31:27]: Opcode (11010)
  • Bits [26:21]: Dest (r6)
  • Bits [20:15]: Sector (r6)
  • Bits [14:0]: Unused

Operation:

for (i = 0; i < 512; i++) {
    mem[Dest + i] = disk[(Sector*512) + i];
}
  • Maybe the disk referenced in OP_READ is the same disk image (disk.img.b64) as in @johndon's answer. –  Mar 07 '17 at 21:51