jkt's blog

Blößmrt

Entries from March 2014.

Trojita 0.4 "Ukraine" is released
2014-03-05

Hi all,
we are pleased to announce version 0.4 of Trojitá, a fast Qt IMAP e-mail client. For this release, a lot of changes were made under the hood, but of course there are some changes that are visible to the user as well.

Improvements:

This release has been tagged in git as "v0.4". You can also download a tarball (GPG signature). Prebuilt binaries for multiple distributions are available via the OBS .

This release is dedicated to the people of all nations living in Ukraine. We are no fans of political messages in software announcements, but we also cannot remain silent when unmarked Russian troops are marching over a free country. The Trojitá project was founded in a republic formerly known as Czechoslovakia. We were "protected" by foreign aggressors twice in the 20th century — first in 1938 by the Nazi Germany, and second time in 1968 by the occupation forces of the USSR. Back in 1938, Adolf Hitler used the same rhetorics we hear today: that a national minority was oppressed. In 1968, eight people who protested against the occupation in Moscow were detained within a couple of minutes, convicted and sent to jail. In 2014, Moscowians are protesting on a bigger scale, yet we all see the cops arresting them on Youtube — including those displaying blank signs.

This is not about politics, this is about morality. What is happening today in Ukraine is a barbaric act, an occupation of an innocent country which has done nothing but stopped being attracted to their more prominent eastern neighbor. No matter what one thinks about the international politics and the Crimean independence, this is an act which must be condemned and fiercely fought against. There isn't much what we could do, so we hope that at least this symbolic act will let the Ukrainians know that the world's thoughts are with them in this dire moment. За вашу и нашу свободу, indeed!

Finally, we would like to thank Jai Luthra, Danny Rim, Benjamin Kaiser and Yazeed Zoabi, our Google Code-In students, and Stephan Platz, Karan Luthra, Tomasz Kalkosiński and Luigi Toscano, people who recently joined Trojitá, for their code contributions.

The Trojitá developers

Tags: gentoo, kde, qt, trojita.
Trojita 0.4.1, a security update for CVE-2014-2567
2014-03-20

Summary

An SSL stripping vulnerability was discovered in Trojitá, a fast Qt IMAP e-mail client. User's credentials are never leaked, but if a user tries to send an e-mail, the automatic saving into the "sent" or "draft" folders could happen over a plaintext connection even if the user's preferences specify STARTTLS as a requirement.

Background

The IMAP protocol defines the STARTTLS command which is used to transparently upgrade a plaintext connection to an encrypted one using SSL/TLS. The STARTTLS command can only be issued in an unauthenticated state as per the IMAP's state machine.

RFC 3501 also allows for a possibility of the connection jumping immediately into an authenticated state via the PREAUTH initial response. However, as the STARTTLS command cannot be issued once in the authenticated state, an attacker able to intercept and modify the network communication might trick the client into a state where the connection cannot be encrypted anymore.

Affected versions

All versions of Trojitá up to 0.4 are vulnerable. The fix is included in version 0.4.1.

Remedies

Connections which use the SSL/TLS form the very beginning (e.g. the connections using port 993) are secure and not vulnerable.

Possible impact

The user's credentials will never be transmitted over a plaintext connection even in presence of this attack.

Because Trojitá proceeded to use the connection without STARTTLS in face of PREAUTH, certain data might be leaked to the attacker. The only example which we were able to identify is the full content of a message which the user attempts to save to their "Sent" folder while trying to send a mail.

We don't believe that any other data could be leaked. Again, user's credentials will not be leaked.

Acknowledgement

Thanks to Arnt Gulbrandsen on the imap-protocol ML for asking what happens when we're configured to request STARTTLS and a PREAUTH is received, and to Michael M Slusarz for starting that discussion.

Tags: gentoo, kde, qt, trojita.
Tagged pointers, and saving memory in Trojita
2014-03-26

One of the improvements which were mentioned in the recent announcement of Trojitá, a fast Qt e-mail client, were substantial memory savings and speed improvements. In the rest of this post, I would like to explain what exactly we have done and how it matters. This is going to be a technical post, so if you are not interested in C++ or software engineering, you might want to skip this article.

Planting Trees

At the core of Trojitá's IMAP implementation is the TreeItem, an abstract class whose basic layout will be familiar to anyone who has worked with a custom QAbstractItemModel reimplementation. In short, the purpose of this class is to serve as a node in the tree of items which represent all the data stored on a remote IMAP server.

The structure is tree-shaped because that's what fits both the QAbstractItemModel's and the IMAP way of working. At the top, there's a list of mailboxes. Children of these mailboxes are either other, nested mailboxes, or lists of messages. Below the lists of messages, one can find individual e-mails, and within these e-mails, individual body parts as per the recursive nature of the MIME encapsulation. (This is what enables messages with pictures attached, e-mail forwarding, and similar stuff. MIME is fun.) This tree of items is used by the QAbstractItemModel for keeping track of what is where, and for issuing the QModelIndex instances which are used by the rest of the application for accessing, requesting and manipulating the data.

When a QModelIndex is used and passed to the IMAP Model, what matters most is its internalPointer(), a void * which, within Trojitá, always points to an instance of some TreeItem subclass. Everything else, like the row() and column(), are actually not important; the pointer itself is enough to determine everything about the index in question.

Each TreeItem has to store a couple of interesting properties. Besides the usual Qt-mandated stuff like pointer to the parent item and a list of children, there are also application-specific items which enable the code to, well, actually do useful things like printing e-mail subjects or downloading mail attachments. For a mailbox, this crucial information might be the mailbox name. For a message, the UID of the message along with a pointer to the mailbox is enough to uniquely identify everything which is needed.

Lazy Loading

Enter the lazy loading. Many people confirm that Trojitá is fast, and plenty of them are not afraid to say that it is blazingly fast. This speed is enabled by the fact that Trojitá will only do the smallest amount of work required to bring the data over the network (or from disk, for that matter). If you open a huge mailbox with half a million messages, perhaps the GMail's "All messages" account, or one's LKML archive, Trojitá will not start loading half a million of subjects. Instead, the in-memory TreeItem nodes are created in a special state "no data has been requested yet". Trojitá still creates half a million items in memory, but these items are rather lightweight and only contain the absolute minimum of data they need for proper operation.

Some of these "empty" nodes are, eventually, consulted and used for item display -- perhaps because a view is attached to this model, and the view wants to show the recent mail to the user. In Qt, this usually happens via the data() method of the QAbstractItemModel, but other methods like rowCount() have a very similar effect. Whenever more data are needed, the state of the tree node changes from the initial "no data have been requested" to "loading stuff", and an asynchronous request for these data is dispatched. An important part of the tale is that the request is indeed completely asynchronous, so you won't see any blocking whatsoever in the GUI. The QTreeView will show an animation while a subtree is expanded, the message viewer might display a spinner, and the mail listing shows greyed-out "Loading..." placeholder instead of the usual message subjects.

After a short while, the data arrive and the tree node is updated with the extracted contents -- be it e-mail subject, or perhaps the attached image of dancing pigs. As the requested data are now here, the status of the tree node is updated from the previous "loading stuff" into "done". At the same time, an appropriate signal, like dataChanged or rowsInserted, is emitted. Requesting the same data again via the classic MVC API will not result in network requests, but everything will be accommodated from the local cache.

What we see now is that there is just a handful of item states, yet the typical layout of the TreeItem looks roughly like this:

enum class FetchingStatus {
    INITIAL_NOTHING_REQUESTED_YET,
    LOADING,
    DONE,
    FAILED
};
class TreeItem {
    TreeItem *m_parent;
    QList<TreeItem*> m_children;
    FetchingStatus m_status;
};

On a 64bit system, this translates to at least three 64bit words being used -- one for the painter to the parent item, one (or much more) for storage of the list of children, and one more for storing the enum FetchingStatus. That's a lot of space, given we have just created half a million of these items.

Tagged Pointers

An interesting property of a modern CPU is that the data structures must be aligned properly. A very common rule is that e.g. a 32bit integer can only start at memory offset which is a multiple of four. In hex, this means that an address, or a pointer value, could end with 0x0, or 0x4, or 0x8, or 0xc. The detailed rules are platform-specific and depend on the exact data structure which we are pointing to, but the important message is that at least some of the low bits in the pointer address are always going to be zero. Perhaps we could encode some information in there?

Turns out this is exactly what pointer tagging is about. Instead of having two members, one TreeItem * and one FetchingStatus, these are squashed into a single pointer-sized value. The CPU can no longer use the pointer value directly, all accesses have to go via an inlined function which simply masks away the lowest bits which do bring a very minor performance hit, but the memory conservation is real.

For a real-world example, see this commit in Trojitá.

Using Memory Only When Needed

Back to our example of a mailbox with 500k messages. Surely a user is only going to see a small subset of them at once, right?

That is indeed the case. We still have to at least reserve space for 500k items for technical reasons, but there is certainly no need to reserve space for heavy stuff like subjects and other headers. Indeed, in Trojitá, we track the From/To/Cc/Bcc headers, the subjects, various kinds of timestamps, other envelope items and similar stuff, and this totals a couple hundred bytes per each message. A couple hundred bytes is not much (pun intended), but "a couple hundred bytes" times "half a million" is a ton of memory.

This got implemented here. One particular benchmark which tests how fast Trojitá resynchronizes a mailbox with 100k of messages showed immediate reduction in memory usage from previous 45 MB to 25 MB. The change, again, does come with a cost; one now has to follow one more pointer redirection, and one has to perform one more dynamic allocation for each message which is actually visible. That, however, proves to be negligible during typical usage.

Measure, Don't Guess

As usual with optimizing, the real results might sometimes be surprising. A careful reader and an experienced Qt programmer might have noticed the QList above and shuddered in horror. In fact, Trojitá now uses QVector in its place, but when I was changing the code, using std::vector sounded like a no-brainer. Who needs the copy-on-write semantics here anyway, so why should I pay its price in this context? These data (list of children of an item) are not copied that often, and copying a contiguous list of pointers is pretty cheap anyway (it surely is dwarfed by dynamic allocation overhead). So we should just stick with std::vector, right?

Well, not really. It turned out that plenty of these lists are empty most of the time. If we are looking at the list of messages in our huge mailbox, chances are that most of these messages were not loaded yet, and therefore the list of children, i.e. something which represents their inner MIME structure, is likely empty. This is where the QVector really shines. Instead of using three pointers per vector, like the GCC's std::vector does, QVector is happy with a single pointer pointing to a shared null instance, something which is empty.

Now, factor of three on an item which is used half a million times, this is something which is going to hurt. That's why Trojitá eventually settled on using QVector for the m_children member. The important lesson here is "don't assume, measure".

Wrapping up

Thanks to these optimization (and a couple more, see the git log), one particular test case now runs ten times faster while simultaneously using 38% less memory -- comparing the v0.4 with v0.3.96. Trojitá was pretty fast even before, but now it really flies. The sources of memory diet were described in today's blog post; the explanation on how the time was cut is something which will have to wait for another day.

Tags: gentoo, kde, qt, trojita.