1 point by vnikme 5 hours ago | 1 comment
  • vnikme 5 hours ago
    I built Telegram crawlers that need to monitor thousands of channels simultaneously. TDLib works great for interactive clients, but when you run it as a crawler it develops a memory problem: after a week or two of watching 2000+ channels, it quietly fills up RAM and needs a restart.

    The root cause is that TDLib caches everything (messages, users, dialogs, stickers, animations) because it was designed for a UI client that needs instant access to recent history. MessagesManager, ContactsManager, DialogsManager and the rest of the ~150 managers accumulate state indefinitely. There's no eviction because a chat app doesn't want to evict.

    My use case is different: I don't need any of that. I have my own Postgres database. I track pts per channel myself. I just need a reliable, typed C++ interface to the raw MTProto protocol — something like Telethon but native, with no business logic attached.
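    For readers who haven't done this, here is a minimal sketch of what "tracking pts per channel" can look like. It follows the sequencing rule from Telegram's update-handling docs (compare local pts + pts_count against the incoming pts). `PtsTracker` and its methods are hypothetical names for illustration, not something from the repo, and the in-memory map stands in for a database table.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical helper (not part of autoproto): remembers the last applied
// pts for each channel and decides what to do with an incoming update.
class PtsTracker {
public:
    enum class Action { Apply, Skip, FetchDifference };

    // Classify an update that carries (pts, pts_count) for a channel.
    Action classify(std::int64_t channel_id, std::int32_t pts,
                    std::int32_t pts_count) const {
        auto it = last_pts_.find(channel_id);
        if (it == last_pts_.end()) {
            return Action::FetchDifference;  // no local state yet: sync first
        }
        const std::int32_t expected = it->second + pts_count;
        if (expected == pts) return Action::Apply;  // next update in sequence
        if (expected > pts) return Action::Skip;    // already applied earlier
        return Action::FetchDifference;             // gap: updates were missed
    }

    // Record the pts after applying an update or a fetched difference.
    void set(std::int64_t channel_id, std::int32_t pts) {
        last_pts_[channel_id] = pts;
    }

private:
    // One integer per channel; in a crawler this would live in the database.
    std::unordered_map<std::int64_t, std::int32_t> last_pts_;
};
```

    On `FetchDifference` the caller would request the channel difference from the stored pts and then call `set` with the pts returned by the server.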

    So I forked TDLib and stripped it down.

    What I removed:
    - All ~150 manager classes (MessagesManager, ContactsManager, etc.)
    - tddb/: SQLite, Binlog, all persistence
    - tde2e/: end-to-end encryption for secret chats
    - td_api: TDLib's own high-level abstraction layer
    - JSON/C/JNI/CLI interfaces

    What's left:
    - Full MTProto protocol layer (crypto, key exchange, sessions)
    - Auto-generated typed C++ API from Telegram's .tl schema
    - TDLib's actor framework and async infrastructure
    - Bot token + phone auth + 2FA
    - Stateless session: export as base64 string, restore on next run
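    To illustrate the "export as base64 string, restore on next run" idea, here is a self-contained sketch. The `Session` layout (one DC id byte followed by the auth key) and the `export_session` / `restore_session` names are assumptions made up for this example; the actual format autoproto uses is not described in this post.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical session contents; the real layout may differ.
struct Session {
    std::uint8_t dc_id = 0;
    std::vector<std::uint8_t> auth_key;  // MTProto auth keys are 256 bytes
};

static const std::string kAlphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 (RFC 4648) with '=' padding.
std::string base64_encode(const std::vector<std::uint8_t>& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); i += 3) {
        const std::size_t left = in.size() - i;  // bytes left in this group
        std::uint32_t n = static_cast<std::uint32_t>(in[i]) << 16;
        if (left > 1) n |= static_cast<std::uint32_t>(in[i + 1]) << 8;
        if (left > 2) n |= in[i + 2];
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += left > 1 ? kAlphabet[(n >> 6) & 63] : '=';
        out += left > 2 ? kAlphabet[n & 63] : '=';
    }
    return out;
}

// Returns std::nullopt on malformed input instead of guessing.
std::optional<std::vector<std::uint8_t>> base64_decode(const std::string& in) {
    if (in.size() % 4 != 0) return std::nullopt;
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < in.size(); i += 4) {
        std::uint32_t n = 0;
        int pad = 0;
        for (std::size_t j = 0; j < 4; ++j) {
            const char c = in[i + j];
            n <<= 6;
            if (c == '=') {
                ++pad;
                continue;
            }
            const std::size_t pos = kAlphabet.find(c);
            if (pad > 0 || pos == std::string::npos) return std::nullopt;
            n |= static_cast<std::uint32_t>(pos);
        }
        // Padding is only valid in the final group, at most two characters.
        if (pad > 2 || (pad > 0 && i + 4 != in.size())) return std::nullopt;
        out.push_back(static_cast<std::uint8_t>((n >> 16) & 0xFF));
        if (pad < 2) out.push_back(static_cast<std::uint8_t>((n >> 8) & 0xFF));
        if (pad < 1) out.push_back(static_cast<std::uint8_t>(n & 0xFF));
    }
    return out;
}

// Serialize as: [dc_id][auth_key bytes...], then base64.
std::string export_session(const Session& s) {
    std::vector<std::uint8_t> raw;
    raw.push_back(s.dc_id);
    raw.insert(raw.end(), s.auth_key.begin(), s.auth_key.end());
    return base64_encode(raw);
}

std::optional<Session> restore_session(const std::string& text) {
    const auto raw = base64_decode(text);
    if (!raw || raw->empty()) return std::nullopt;
    Session s;
    s.dc_id = (*raw)[0];
    s.auth_key.assign(raw->begin() + 1, raw->end());
    return s;
}
```

    The point of the design is that the caller owns the string: it can sit in an environment variable or a database row, and the library itself writes nothing to disk.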

    Result: ~104k lines of code vs ~400k in original TDLib. Memory use stays flat from startup: the internal state is a handful of auth keys and DC config, nothing that grows with usage.

    The public API looks like this:

        auto client = mtproto::Client::create({.api_id = ID, .api_hash = "..."});
        client->auth_with_bot_token("TOKEN");
        client->on_update([](auto updates) { /* raw telegram_api:: types */ });
        client->send(
            td::telegram_api::make_object<td::telegram_api::updates_getChannelDifference>(
                std::move(input_channel),
                td::telegram_api::make_object<td::telegram_api::channelMessagesFilterEmpty>(),
                pts, 100, false
            ),
            [](td::Result<td::telegram_api::object_ptr<
                   td::telegram_api::updates_ChannelDifference>> result) {
                // fully typed response
            }
        );
    
    Updating to a new Telegram protocol layer is one command — CMake auto-regenerates all C++ types from the .tl schema file.

    Repo: https://github.com/vnikme/autoproto

    Would be interested to hear from anyone who has hit the same TDLib memory issue, or who has needed raw MTProto access from C++ for other reasons.