
Q&A: Petr Malyukov on Building Decentralized Real-Time Communications Infrastructure for the AI Era


Petr Malyukov

Had a chance to do a Q&A with Petr Malyukov, co-founder and CEO of dTelecom, a decentralized real-time communications (dRTC) infrastructure powered by AI and built on the Solana blockchain.


Question: Petr, for those who aren’t familiar with decentralized or blockchain-based technology, how would you explain dTelecom? What problem are you solving?


Petr: If I had to say it in the simplest way, I’d call dTelecom ‘real-time communication you can drop into a product.’ Most teams already know what it means to add payments, maps, or messaging through an API. We’re aiming for that same level of familiarity with voice, video, and chat.

The decentralization piece is not something I expect customers to think about day-to-day. The experience should feel straightforward. You’re a developer, you use APIs and SDKs, you ship features. The network happens in the background.

And the problem is the one you feel as soon as your usage becomes real. As traffic grows, so does the need for reliability. Costs start behaving in ways you didn’t plan for. Routing and performance become harder to control. If everything is sitting on one provider’s stack, you’re exposed to their outages, their pricing changes, their regional constraints. We’re building a network model where capacity can scale with demand and media can be handled closer to where users are, without forcing teams into unfamiliar workflows.


Question: Most people use tools like Zoom, Teams, or WhatsApp every day without thinking about the infrastructure behind them. Why did you believe it needed to be reinvented?


Petr: Most of the time, the infrastructure doesn’t matter at all, because everything is working. People see the interface, but they don’t see the backend. And honestly, that’s how it should be.


But the moment something breaks, you suddenly learn what the infrastructure really is. Does the call degrade or does it recover quickly? Is it one region or half the world? Do people get dropped and have to rejoin? How long does it take to restore traffic? For a business, that turns into very tangible damage in missed revenue, disrupted workflows, broken SLA promises, and eventually a loss of trust.

The other piece is cost. In a centralized model, you’re paying for a whole stack (data centers, bandwidth, spare capacity for spikes) and you’re paying the markup that’s baked into that stack. As usage grows, you often feel like you’re getting punished for success, with more minutes, higher bills, and not necessarily better economics. Also, you’re tied to the provider’s decisions regarding pricing changes, product roadmap shifts, and coverage gaps. Your product ends up shaped by what the platform can or won’t do.


So for us, reinventing it was more like looking at the direction the world is going. We saw more video, more voice, and more interactive experiences, and realized that the underlying model hasn’t kept up with the way people actually use communication now. We wanted an approach where capacity can be supplied by many operators, performance is measurable, and the economics don’t collapse as soon as you scale.


Question: What makes dTelecom meaningfully different from traditional communication platforms in terms of cost, privacy, and performance?


Petr: The biggest difference is that we started with a different foundation. We didn’t want to win by piling features onto the same centralized model.

On cost, the idea is simple. Workloads can run across independent nodes, so scaling doesn’t force you to pay more for one provider’s servers. It becomes a question of adding capacity to the network. That’s also why we can price services like speech-to-text very competitively. For example, our newly launched x402 STT is $0.005 per minute. This is much lower than comparable rates from OpenAI Whisper ($0.006), Google Cloud STT ($0.016), or AWS Transcribe ($0.024).
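The per-minute rates quoted above can be turned into a quick monthly cost comparison. A minimal sketch, using only the rates as stated in this interview (vendor pricing may change, and the volume figure is an illustrative assumption):

```python
# Monthly STT cost comparison using the per-minute rates quoted above.
# Rates are as stated in the interview; actual vendor pricing may change.
RATES_PER_MIN = {
    "dTelecom x402 STT": 0.005,
    "OpenAI Whisper": 0.006,
    "Google Cloud STT": 0.016,
    "AWS Transcribe": 0.024,
}

def monthly_cost(minutes: int, rate_per_min: float) -> float:
    """Cost in USD for a given number of transcribed minutes."""
    return minutes * rate_per_min

# Illustrative volume: 100,000 minutes of audio per month.
minutes = 100_000
for name, rate in RATES_PER_MIN.items():
    print(f"{name}: ${monthly_cost(minutes, rate):,.2f}/month")
```

At that volume the gap is no longer cents: the cheapest and most expensive rates differ by a factor of roughly five, which is the kind of spread that changes whether a feature ships at all.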

On privacy and control, centralized systems tend to pull a lot of traffic and metadata into a small number of aggregation points. We’re designing for the opposite direction, with more locality, more control over where media is routed and processed, and patterns that teams can use when they have jurisdiction or compliance constraints.


For some customers, that might be as simple as region-aware routing. For others, it can extend to deployment choices that keep them closer to their own infrastructure requirements.


On performance, I care a lot about accountability. In centralized systems, you’re trusting a vendor’s internal processes. In a node-based model, performance can be measured in a way that’s harder to hand-wave. Nodes can build reputation based on uptime and throughput, and there can be consequences for underperformance.


Question: You’ve mentioned that your new x402-powered speech-to-text service is more cost-efficient compared to competitors. Why is that important, especially for voice-first products and AI agents?


Petr: Speech-to-text is one of those things where the economics matter immediately because it’s a volume business by nature. You pay for minutes of audio, and those minutes add up fast if you’re transcribing support, sales, lessons, streams, or daily internal calls.

So a difference that looks small on paper becomes big at scale and starts changing product decisions. If transcription is expensive, teams start rationing it. They might turn it into a premium feature, limit language support, or avoid ‘always-on’ use cases. But when the cost drops, you can design more freely. You can keep transcription on, generate summaries, make everything searchable, support more languages, and measure call quality.


For AI agents, it’s even more important. Agents talk a lot. They generate audio minutes constantly. If speech services are expensive or pricing is hard to predict, teams put brakes on the agent. This shows up as time limits, feature restrictions, and extra human oversight just to control spend. Cheaper, clearer per-minute costs make it easier for agents to run continuously and for product teams to scale them without hampering the experience.
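To see why per-minute pricing dominates an agent budget, it helps to run the arithmetic for an always-on agent. A minimal sketch with illustrative assumptions (8 listening hours per day, a 30-day month, and the $0.005 and $0.024 per-minute rates mentioned earlier):

```python
# Rough STT spend estimate for a continuously running voice agent.
# Hours, days, and rates below are illustrative assumptions.

def agent_monthly_stt_cost(hours_per_day: float,
                           days: int,
                           rate_per_min: float) -> float:
    """STT spend in USD for an agent that listens hours_per_day, every day."""
    return hours_per_day * 60 * days * rate_per_min

# An agent listening 8 hours/day across a 30-day month:
low = agent_monthly_stt_cost(8, 30, 0.005)    # cheaper rate
high = agent_monthly_stt_cost(8, 30, 0.024)   # pricier rate
print(f"at $0.005/min: ${low:.2f}/month, at $0.024/min: ${high:.2f}/month")
```

The absolute numbers are small for one agent, but they multiply by the number of agents running in parallel, which is why teams impose time limits and oversight when the rate is high or unpredictable.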


Question: AI is becoming part of almost every digital product. How is AI improving real-time communication today, and how does dTelecom use it to create better user experiences?


Petr: I think AI is pushing real-time communication into something closer to a workspace, where you’re capturing what happened, making it usable, and removing friction that used to be accepted as normal.


The obvious improvements are things like transcription and translation. You don’t lose details, and language stops being a hard barrier. Voice enhancement is another one, as people are not always calling from quiet rooms with perfect microphones. Also, moderation matters more than people want to admit.


For us, AI sits inside the real-time stack. It’s part of how the session becomes more stable and more useful. The network can monitor conditions and respond when routes degrade. We’re also working on access patterns where autonomous agents can use speech services programmatically without the usual overhead of accounts, keys, and manual billing workflows.


At the end of the day, we’re aiming for clearer audio, fewer interruptions, transcripts and translations when you need them, and an experience that holds up as voice becomes a real interface. It’s not only for humans, but also for software that speaks and listens.


Question: Who is dTelecom built for right now, and how are your customers using it in practice?


Petr: Right now, we’re focused on teams that have real usage and feel real pain. Often that’s startups or SMEs, say, 50 to 200 people, where communication is core to the workflow and there’s already an engineering team paying for RTC.

In practice, they embed voice, video, and chat into their apps through the SDK and API. They’re coming because they need reliability, predictable scaling, and better control as usage grows. The use cases tend to be very concrete: customer support and live assistance, interactive education, marketplaces and live commerce, community-driven products, and any environment where latency and session stability directly affect retention.


We also build end-user products because they keep us honest. dMeet is our open-source video conferencing app for teams and businesses. Froggy is a Telegram-based live-streaming mini-app for creators and communities. Some people want a ready-made experience first, before they ever think about building their own.


Question: Looking at the bigger picture, where do you see the communication and collaboration market heading over the next five years?


Petr: I think communication will feel more like a capability that’s embedded everywhere. Calls won’t disappear, but more interaction will happen inside products, within support flows, learning environments, creator tools, and community spaces.


AI will be the most visible shift because it changes the shape of participation. You’ll see agents joining conversations in real time, summarizing, coordinating, handling routine interactions, and helping creators stay present even when they’re offline. That’s going to change expectations about what a live experience can include.


Lastly, infrastructure will be evaluated differently. Buyers will demand more accountability through measurable performance, more control over data locality, and deployment flexibility.


Question: A bit about yourself and your journey that got you to dTelecom.


Petr: I grew up around telecommunications. My dad worked as a technician at a telecom company, and he brought computers into our home early. I still remember him building a ZX Spectrum, and later our first Pentium. That shaped me more than I realized at the time. I got used to taking things apart, building small projects, trying to make them work (and sometimes trying to sell them).


I studied Applied Informatics in Economics, but in my second year I launched my first business and eventually left university to build companies full-time. Over the next 17 years, I built products across different areas. Some worked. Some didn’t. But the process taught me discipline: what it actually takes to ship, iterate, and find product-market fit. Communication kept showing up as a recurring theme, and that’s also where my partnership with my co-founder and CTO, Vadim Filimonov, became deeper.


In 2019 we launched a communications app with an AI translator. That’s where the infrastructure pain stopped being theoretical. Problems related to latency, reliability, routing, and cost became our daily reality. Because we were relying on someone else’s stack, we kept hitting the same ceiling.


Eventually, we were forced to start thinking outside the box, which led us to the dTelecom we’re building today.

