Security Ops News


> Post #47160613 by shin_lao | 138 points | 42 comments | 2h ago
Jane Street Hit with Terra $40B Insider Trading Suit
(No body text)
> Post #47157224 by tintinnabula | 347 points | 117 comments | 7h ago
Jimi Hendrix was a systems engineer
(No body text)
> Post #47159302 by shrikaranhanda | 130 points | 27 comments | 4h ago
First Website (1992)
(No body text)
> Post #47161160 by jnord | 57 points | 12 comments | 59m ago
RAM now represents 35 percent of bill of materials for HP PCs
(No body text)
> Post #47157398 by thellimist | 147 points | 74 comments | 7h ago
Making MCP cheaper via CLI
(No body text)
> Post #47158975 by iamskeole | 65 points | 32 comments | 5h ago
How Will OpenAI Compete?
(No body text)
> Post #47112299 by cs702 | 75 points | 26 comments | 3d ago
Artist who "paints" portraits on glass by hitting it with a hammer
(No body text)
> Post #47154399 by andreynering | 211 points | 353 comments | 10h ago
Windows 11 Notepad to support Markdown
(No body text)
> Post #47160526 by zyoralabs | 16 points | 1 comment | 2h ago
Show HN: ZSE – Open-source LLM inference engine with 3.9s cold starts
I've been building ZSE (Z Server Engine) for the past few weeks — an open-source LLM inference engine focused on two things nobody has fully solved together: memory efficiency and fast cold starts.

The problem I was trying to solve: Running a 32B model normally requires ~64 GB VRAM. Most developers don't have that. And even when quantization helps with memory, cold starts with bitsandbytes NF4 take 2+ minutes on first load and 45–120 seconds on warm restarts — which kills serverless and autoscaling use cases.

What ZSE does differently:

Fits 32B in 19.3 GB VRAM (70% reduction vs FP16) — runs on a single A100-40GB

Fits 7B in 5.2 GB VRAM (63% reduction) — runs on consumer GPUs

Native .zse pre-quantized format with memory-mapped weights: 3.9s cold start for 7B, 21.4s for 32B — vs 45s and 120s with bitsandbytes, ~30s for vLLM

All benchmarks verified on Modal A100-80GB (Feb 2026)
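As a back-of-envelope sanity check on the 19.3 GB figure: 4-bit weights plus one fp16 scale per group land in the right ballpark. (The group size of 64 and fp16 scales here are illustrative assumptions, not ZSE's documented layout; the gap up to 19.3 GB would come from non-quantized layers, embeddings, and runtime buffers.)

```python
def quantized_footprint_gb(n_params, bits=4, group_size=64, scale_bytes=2):
    """Rough VRAM estimate for group-quantized weights:
    packed low-bit weights plus one fp16 scale per group.
    These are illustrative assumptions, not ZSE's actual format."""
    weights = n_params * bits / 8              # packed 4-bit weights
    scales = (n_params / group_size) * scale_bytes  # one fp16 scale per group
    return (weights + scales) / 1e9

print(round(quantized_footprint_gb(32e9), 1))  # prints 17.0 (weights + scales only)
print(round(quantized_footprint_gb(7e9), 1))   # prints 3.7
```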

It ships with:

OpenAI-compatible API server (drop-in replacement)

Interactive CLI (zse serve, zse chat, zse convert, zse hardware)

Web dashboard with real-time GPU monitoring

Continuous batching (3.45× throughput)

GGUF support via llama.cpp

CPU fallback — works without a GPU

Rate limiting, audit logging, API key auth
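Since the server speaks the OpenAI chat-completions wire format, any standard client should work against it. A minimal stdlib-only sketch, assuming the default local endpoint (the host, port, and API key below are placeholders; check the `zse serve` output for the real values):

```python
import json
import urllib.request

# Hypothetical local endpoint; the actual host/port comes from `zse serve`.
URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "messages": [{"role": "user", "content": "Say hello in one word."}],
    "max_tokens": 16,
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",  # only if key auth is enabled
    },
)
# With the server running, uncomment to send the request:
# resp = urllib.request.urlopen(req)
# print(json.load(resp)["choices"][0]["message"]["content"])
```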

Install:

    pip install zllm-zse
    zse serve Qwen/Qwen2.5-7B-Instruct

For fast cold starts (one-time conversion):

    zse convert Qwen/Qwen2.5-Coder-7B-Instruct -o qwen-7b.zse
    zse serve qwen-7b.zse   # 3.9s every time

The cold start improvement comes from the .zse format storing pre-quantized weights as memory-mapped safetensors — no quantization step at load time, no weight conversion, just mmap + GPU transfer. On NVMe SSDs this gets under 4 seconds for 7B. On spinning HDDs it'll be slower.
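The mmap-vs-eager distinction can be seen with a toy stand-in (this uses a plain byte file and numpy, not the real .zse layout): mapping returns almost immediately because the OS pages bytes in lazily, while an eager load pays for reading and converting every weight up front.

```python
import numpy as np
import os
import tempfile
import time

# Toy stand-in for pre-quantized weights on disk (not the real .zse layout).
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
np.random.randint(0, 255, size=1_000_000, dtype=np.uint8).tofile(path)

# Eager load: read and "dequantize" every byte before serving anything.
t0 = time.perf_counter()
eager = np.fromfile(path, dtype=np.uint8).astype(np.float32) / 255.0
eager_time = time.perf_counter() - t0

# mmap load: just set up the mapping; bytes are faulted in on first touch
# (e.g. during the GPU transfer), so "load" is near-instant.
t0 = time.perf_counter()
mapped = np.memmap(path, dtype=np.uint8, mode="r")
mmap_time = time.perf_counter() - t0

print(mmap_time < eager_time)
```

The same data is reachable either way; mmap just defers the I/O, which is why the approach rewards NVMe (fast page-ins) and penalizes spinning disks.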

All code is real — no mock implementations. Built at Zyora Labs. Apache 2.0.

Happy to answer questions about the quantization approach, the .zse format design, or the memory efficiency techniques.

> Post #47153798 by surprisetalk | 310 points | 476 comments | 11h ago
Bus stop balancing is fast, cheap, and effective
(No body text)